The `test_diff()` function tests coefficients of a 'proDAFit'
object. It provides a Wald test for individual
coefficients and a likelihood ratio F-test to compare the
original model with a reduced model. The `result_names()`
method gives a quick overview of which coefficients are
available for testing.

    test_diff(fit, contrast, reduced_model = ~1,
      alternative = c("two.sided", "greater", "less"),
      pval_adjust_method = "BH", sort_by = NULL, decreasing = FALSE,
      n_max = Inf, verbose = FALSE)

    # S4 method for proDAFit
    result_names(fit)

Argument | Description |
---|---|
fit | an object of class 'proDAFit'. Usually, this is produced by calling |
contrast | an expression or a string specifying which contrast is tested. It can be a single coefficient (to see the available options use |
reduced_model | If you don't want to test an individual coefficient, you can specify a reduced model and compare it with the original model using an F-test. This is useful to find out how a set of parameters affects the goodness of the fit. If neither a |
alternative | a string that decides how the hypothesis test is done. This parameter is only relevant for the Wald test specified using the `contrast` argument. Default: |
pval_adjust_method | a string that indicates the method used to adjust the p-values for multiple testing. It must match the options in |
sort_by | a string that specifies the column used to sort the resulting data.frame. Default: |
decreasing | a boolean that indicates whether the order is reversed. Default: |
n_max | the maximum number of rows returned by the method. Default: |
verbose | a boolean that signals whether the method prints informative messages. Default: |
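
As a rough illustration of what `pval_adjust_method`, `sort_by`, `decreasing`, and `n_max` control, the following base R sketch mimics these steps on the p-values from the first example on this page. It does not call `test_diff()`; the data.frame `res` and the use of `p.adjust()`, `order()`, and `head()` are assumptions about equivalent behavior, not the package's actual implementation.

```r
# Illustrative sketch in base R; it does not call test_diff() itself.
# The p-values are the `pval` column of the first example on this page.
res <- data.frame(
  name = paste0("protein_", 1:10),
  pval = c(0.63066796, 0.70057164, 0.02960657, 0.15879472, 0.82920433,
           0.81128343, 0.19201270, 0.53561684, 0.72667595, 0.01098839),
  stringsAsFactors = FALSE
)

# pval_adjust_method = "BH": Benjamini-Hochberg adjustment over all proteins
res$adj_pval <- p.adjust(res$pval, method = "BH")

# sort_by = "pval", decreasing = FALSE: order rows by ascending p-value
res_sorted <- res[order(res$pval, decreasing = FALSE), ]

# n_max = 3: keep at most the first three rows
top <- head(res_sorted, n = 3)
print(top)
```

With these inputs, the adjusted values agree with the `adj_pval` column shown in the example output. The adjustment here is applied to all proteins before truncating. The `n_max = 3` example on this page is consistent with that order, since its adjusted p-values match an adjustment over all ten proteins rather than over three.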
The `result_names()` function returns a character vector.

The `test_diff()` function returns a `data.frame` with one row per protein,
containing the key parameters of the statistical test. Depending on the kind
of test (Wald or F-test), the content of the `data.frame` differs.

The Wald test, which can be considered equivalent to a t-test, returns a `data.frame` with the following columns:

Column | Description |
---|---|
name | the name of the protein, extracted from the rowname of the input matrix |
pval | the p-value of the statistical test |
adj_pval | the multiple testing adjusted p-value |
diff | the difference that particular coefficient makes. In differential expression analysis this value is also called log fold change, which is equivalent to the difference on the log scale. |
t_statistic | the `diff` divided by the standard error `se` |
se | the standard error associated with the `diff` |
df | the degrees of freedom, which describe the amount of available information for estimating the `se`. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior. |
avg_abundance | the estimate of the average abundance of the protein across all samples |
n_approx | the approximated information available for estimating the protein features, expressed as a multiple of the information contained in one observed value |
n_obs | the number of samples a protein was observed in |
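
The relationship between `diff`, `se`, `t_statistic`, `df`, and `pval` can be checked by hand. The sketch below uses three rows copied from the first example output on this page. It assumes that the two-sided p-value comes from a Student t distribution with `df` degrees of freedom; the data.frame `wald` and its `reported_*` columns are introduced only for this illustration.

```r
# Values copied from three rows of the first example output on this page
wald <- data.frame(
  name = c("protein_1", "protein_3", "protein_10"),
  diff = c(0.13868715, 1.17063438, -0.81158997),
  se   = c(0.2668286, 0.3534815, 0.1811515),
  df   = c(4, 4, 4),
  reported_t    = c(0.5197612, 3.3117272, -4.4801736),
  reported_pval = c(0.63066796, 0.02960657, 0.01098839)
)

# t_statistic: the diff divided by its standard error
wald$t_statistic <- wald$diff / wald$se

# Assumption: two-sided p-value from a Student t distribution with `df`
# degrees of freedom (corresponds to alternative = "two.sided")
wald$pval <- 2 * pt(-abs(wald$t_statistic), df = wald$df)

print(wald[, c("name", "t_statistic", "reported_t", "pval", "reported_pval")])
```

For these rows the recomputed values agree with the reported ones up to rounding of the printed inputs.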

The F-test returns a `data.frame` with the following columns:

Column | Description |
---|---|
name | the name of the protein, extracted from the rowname of the input matrix |
pval | the p-value of the statistical test |
adj_pval | the multiple testing adjusted p-value |
f_statistic | the ratio of the difference of the normalized deviances of the original model and the reduced model, divided by the standard deviation |
df1 | the difference between the number of coefficients in the original model and the number of coefficients in the reduced model |
df2 | the degrees of freedom, which describe the amount of available information for estimating the se. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior. |
avg_abundance | the estimate of the average abundance of the protein across all samples |
n_approx | the information available for estimating the protein features, expressed as a multiple of the information contained in one observed value |
n_obs | the number of samples a protein was observed in |
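
To see how `f_statistic`, `df1`, and `df2` relate to `pval`, the sketch below recomputes the p-value for three rows copied from the F-test example output on this page. It assumes that the p-value is the upper tail of an F distribution with `df1` and `df2` degrees of freedom; the data.frame `ftest` and its `reported_pval` column exist only for this illustration.

```r
# Values copied from three rows of the F-test example output on this page
ftest <- data.frame(
  name = c("protein_3", "protein_4", "protein_6"),
  f_statistic = c(3.693300e+00, 1.256128e-02, 3.670676e-06),
  df1 = c(1, 1, 1),
  df2 = c(8.645761, 9.283242, 9.283242),
  reported_pval = c(0.08813637, 0.91314840, 0.99851187)
)

# Assumption: p-value is the upper tail probability of F(df1, df2).
# Note that df2 does not have to be an integer.
ftest$pval <- pf(ftest$f_statistic, df1 = ftest$df1, df2 = ftest$df2,
                 lower.tail = FALSE)

print(ftest[, c("name", "f_statistic", "pval", "reported_pval")])
```

Comparing the `pval` and `reported_pval` columns side by side shows whether this assumption is consistent with the reported results.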

To test whether a coefficient is different from zero with a Wald
test, use the `contrast` function argument. To test whether two
models differ with an F-test, use the `reduced_model`
argument. Depending on the test that is conducted, the function
returns slightly different data.frames.

The function is designed to follow the principles of the
base R test functions (i.e. `t.test` and `wilcox.test`) and the
functions designed for collecting the results of high-throughput
testing (i.e. `limma::topTable` and `DESeq2::results`).
The `contrast` argument is inspired by `limma::makeContrasts`.


    # "t-test"
    syn_data <- generate_synthetic_data(n_proteins = 10)
    fit <- proDA(syn_data$Y, design = syn_data$groups)
    result_names(fit)
    #> [1] "Condition_1" "Condition_2"
    test_diff(fit, Condition_1 - Condition_2)
    #>          name       pval  adj_pval        diff t_statistic        se df
    #> 1   protein_1 0.63066796 0.8292043  0.13868715   0.5197612 0.2668286  4
    #> 2   protein_2 0.70057164 0.8292043 -0.09230335  -0.4133169 0.2233234  4
    #> 3   protein_3 0.02960657 0.1480328  1.17063438   3.3117272 0.3534815  4
    #> 4   protein_4 0.15879472 0.4800317  0.31384244   1.7293959 0.1814752  4
    #> 5   protein_5 0.82920433 0.8292043  0.05036218   0.2302352 0.2187423  4
    #> 6   protein_6 0.81128343 0.8292043  0.05249717   0.2550194 0.2058556  4
    #> 7   protein_7 0.19201270 0.4800317  0.23846547   1.5677433 0.1521075  4
    #> 8   protein_8 0.53561684 0.8292043 -0.14429122  -0.6768613 0.2131769  4
    #> 9   protein_9 0.72667595 0.8292043 -0.12732876  -0.3750301 0.3395161  4
    #> 10 protein_10 0.01098839 0.1098839 -0.81158997  -4.4801736 0.1811515  4
    #>    avg_abundance n_approx n_obs
    #> 1       18.22209 3.003094     3
    #> 2       20.00984 3.989574     4
    #> 3       17.52142 1.363857     1
    #> 4       21.28096 6.000000     6
    #> 5       21.21086 4.977908     5
    #> 6       19.59506 5.345718     5
    #> 7       23.08283 6.000000     6
    #> 8       19.06041 4.267228     4
    #> 9       20.00061 4.983345     5
    #> 10      23.52646 6.000000     6

    suppressPackageStartupMessages(library(SummarizedExperiment))
    se <- generate_synthetic_data(n_proteins = 10, n_conditions = 3,
                                  return_summarized_experiment = TRUE)
    colData(se)$age <- rnorm(9, mean = 45, sd = 5)
    colData(se)
    #> DataFrame with 9 rows and 4 columns
    #>                     group true_dropout_curve_position true_dropout_curve_scale
    #>                  <factor>                   <numeric>                <numeric>
    #> Condition_1-1 Condition_1                        18.5                     -1.2
    #> Condition_1-2 Condition_1                        18.5                     -1.2
    #> Condition_1-3 Condition_1                        18.5                     -1.2
    #> Condition_2-1 Condition_2                        18.5                     -1.2
    #> Condition_2-2 Condition_2                        18.5                     -1.2
    #> Condition_2-3 Condition_2                        18.5                     -1.2
    #> Condition_3-1 Condition_3                        18.5                     -1.2
    #> Condition_3-2 Condition_3                        18.5                     -1.2
    #> Condition_3-3 Condition_3                        18.5                     -1.2
    #>                            age
    #>                      <numeric>
    #> Condition_1-1 45.4767700483905
    #> Condition_1-2 42.6859029002183
    #> Condition_1-3 37.6555892195272
    #> Condition_2-1 45.7634325276076
    #> Condition_2-2 53.8688130565859
    #> Condition_2-3 41.7596453324248
    #> Condition_3-1  44.000912621659
    #> Condition_3-2 48.4462186648859
    #> Condition_3-3  45.180727549183

    # Refit on the SummarizedExperiment; the design is inferred from the
    # coefficient names printed below
    fit <- proDA(se, design = ~ group + age)
    result_names(fit)
    #> [1] "Intercept"        "groupCondition_2" "groupCondition_3" "age"
    test_diff(fit, "groupCondition_2", n_max = 3, sort_by = "pval")
    #>          name       pval  adj_pval     diff t_statistic        se df
    #> 1   protein_1 0.05766252 0.3596771 1.186875    2.453854 0.4836778  5
    #> 10 protein_10 0.10541452 0.3596771 1.063993    1.973770 0.5390665  5
    #> 3   protein_3 0.10790314 0.3596771 1.296400    1.955533 0.6629396  5
    #>    avg_abundance n_approx n_obs
    #> 1       18.20726 2.643421     1
    #> 10      18.11640 4.664379     4
    #> 3       20.20987 8.362519     8

    # F-test
    test_diff(fit, reduced_model = ~ group)
    #>          name       pval  adj_pval  f_statistic df1      df2 avg_abundance
    #> 1   protein_1 0.83525553 0.9985119 5.159081e-02   1 2.926664      18.20726
    #> 2   protein_2 0.17019359 0.8509679 2.250220e+00   1 8.403543      20.07761
    #> 3   protein_3 0.08813637 0.8509679 3.693300e+00   1 8.645761      20.20987
    #> 4   protein_4 0.91314840 0.9985119 1.256128e-02   1 9.283242      23.16640
    #> 5   protein_5 0.63759269 0.9985119 2.371096e-01   1 9.283242      21.20843
    #> 6   protein_6 0.99851187 0.9985119 3.670676e-06   1 9.283242      22.53936
    #> 7   protein_7 0.41659423 0.9985119 7.228662e-01   1 9.283242      20.41743
    #> 8   protein_8 0.93818225 0.9985119 7.179518e-03   1 2.790716      18.39177
    #> 9   protein_9 0.65665475 0.9985119 2.190582e-01   1 5.861225      19.13118
    #> 10 protein_10 0.31029173 0.9985119 1.276726e+00   1 4.947621      18.11640
    #>    n_approx n_obs
    #> 1  2.643421     1
    #> 2  8.120301     8
    #> 3  8.362519     8
    #> 4  9.000000     9
    #> 5  9.000000     9
    #> 6  9.000000     9
    #> 7  9.000000     9
    #> 8  2.507474     1
    #> 9  5.577983     5
    #> 10 4.664379     4