The `test_diff()` function tests coefficients of a 'proDAFit' object. It provides a Wald test for individual coefficients and a likelihood ratio F-test for comparing the original model with a reduced model. The `result_names()` method gives a quick overview of which coefficients are available for testing.

test_diff(fit, contrast, reduced_model = ~1,
  alternative = c("two.sided", "greater", "less"),
  pval_adjust_method = "BH", sort_by = NULL, decreasing = FALSE,
  n_max = Inf, verbose = FALSE)

# S4 method for proDAFit
result_names(fit)

Arguments

fit

an object of class 'proDAFit'. Usually, this is produced by calling `proDA()`.

contrast

an expression or a string specifying which contrast is tested. It can be a single coefficient (to see the available options use result_names(fit)) or any linear combination of them. The contrast is always compared against zero. Thus, to find out if two coefficients differ use coef1 - coef2.

reduced_model

If you don't want to test an individual coefficient, you can specify a reduced model and compare it with the original model using an F-test. This is useful to find out how a set of parameters affects the goodness of the fit. If neither a contrast nor a reduced_model is specified, the default is a comparison with an intercept-only model (i.e. just the average across conditions). Default: ~ 1.

alternative

a string that decides how the hypothesis test is done. This parameter is only relevant for the Wald-test specified using the `contrast` argument. Default: "two.sided"
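As a rough illustration of what the three `alternative` settings mean, the base R sketch below computes the corresponding p-values by hand. It assumes the test statistic follows a t-distribution with `df` degrees of freedom and that "greater" refers to a contrast larger than zero, as in `t.test`. The values and variable names are hypothetical, and the code is not taken from the proDA source.

```r
# Hypothetical values, not produced by proDA
t_statistic <- 2.1
df <- 5

# "greater": probability of a statistic at least this large (upper tail)
p_greater <- pt(t_statistic, df = df, lower.tail = FALSE)
# "less": probability of a statistic at most this large (lower tail)
p_less <- pt(t_statistic, df = df, lower.tail = TRUE)
# "two.sided": twice the tail probability beyond |t|
p_two_sided <- 2 * pt(abs(t_statistic), df = df, lower.tail = FALSE)

print(c(two.sided = p_two_sided, greater = p_greater, less = p_less))
```

Under these assumptions the two one-sided p-values sum to one, and the two-sided p-value is twice the smaller of them.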

pval_adjust_method

a string that indicates the method used to adjust the p-values for multiple testing. It must match one of the options in p.adjust. Default: "BH"
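To see what the adjustment does, the following sketch applies base R's `p.adjust` to the unadjusted p-values printed in the first example of the Examples section. With `method = "BH"` the result agrees with the `adj_pval` column shown there, up to rounding. The vector is copied by hand from that output; the names `pval` and `adj_pval` are reused only for readability.

```r
# Unadjusted p-values copied from the "t-test" example output on this page
pval <- c(0.63066796, 0.70057164, 0.02960657, 0.15879472, 0.82920433,
          0.81128343, 0.19201270, 0.53561684, 0.72667595, 0.01098839)

# Benjamini-Hochberg adjustment, the default of pval_adjust_method
adj_pval <- p.adjust(pval, method = "BH")
print(signif(adj_pval, 7))

# All method names accepted by p.adjust
print(p.adjust.methods)
```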

sort_by

a string that specifies the column that is used to sort the resulting data.frame. Default: NULL which means the result is sorted by the order of the input matrix.

decreasing

a boolean to indicate if the order is reversed. Default: FALSE

n_max

the maximum number of rows returned by the method. Default: Inf

verbose

boolean that signals if the method prints informative messages. Default: FALSE.

Value

The `result_names()` function returns a character vector.

The `test_diff()` function returns a `data.frame` with one row per protein containing the key parameters of the statistical test. The content of the `data.frame` depends on the kind of test (Wald or F-test).

The Wald test, which can be considered equivalent to a t-test, returns a `data.frame` with the following columns:

name

the name of the protein, extracted from the rowname of the input matrix

pval

the p-value of the statistical test

adj_pval

the multiple testing adjusted p-value

diff

the difference that particular coefficient makes. In differential expression analysis this value is also called log fold change, which is equivalent to the difference on the log scale.

t_statistic

the diff divided by the standard error se

se

the standard error associated with the diff

df

the degrees of freedom, which describe the amount of available information for estimating the se. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

avg_abundance

the estimate of the average abundance of the protein across all samples.

n_approx

the approximated information available for estimating the protein features, expressed as multiple of the information contained in one observed value.

n_obs

the number of samples a protein was observed in
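The relation between the `diff`, `se`, and `t_statistic` columns can be checked by hand. The sketch below uses three rows (protein_1, protein_3, and protein_10) copied from the "t-test" output in the Examples section; the variable names are illustrative.

```r
# Values copied from the Wald test example output on this page
diff_est   <- c(0.13868715, 1.17063438, -0.81158997)
se         <- c(0.2668286, 0.3534815, 0.1811515)
t_reported <- c(0.5197612, 3.3117272, -4.4801736)

# t_statistic is described as the diff divided by the standard error
t_recomputed <- diff_est / se
print(cbind(t_reported, t_recomputed))
```

The recomputed values agree with the reported ones up to the rounding of the printed output.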

The F-test returns a `data.frame` with the following columns:

name

the name of the protein, extracted from the rowname of the input matrix

pval

the p-value of the statistical test

adj_pval

the multiple testing adjusted p-value

f_statistic

the difference between the normalized deviances of the original model and the reduced model, divided by the standard deviation.

df1

the difference of the number of coefficients in the original model and the number of coefficients in the reduced model

df2

the degrees of freedom, which describe the amount of available information for estimating the se. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

avg_abundance

the estimate of the average abundance of the protein across all samples.

n_approx

the information available for estimating the protein features, expressed as multiple of the information contained in one observed value.

n_obs

the number of samples a protein was observed in
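As a sketch of how such a statistic is usually turned into a p-value, the code below takes the upper tail of an F-distribution with `df1` and `df2` degrees of freedom. This is an assumption about the general form of an F-test, not a copy of the proDA implementation. The input values are taken from the protein_3 row of the F-test example in the Examples section.

```r
# Values copied from the protein_3 row of the F-test example on this page
f_statistic <- 3.693300
df1 <- 1
df2 <- 8.645761

# Upper-tail probability of the F-distribution (assumed form of the test)
p_f <- pf(f_statistic, df1 = df1, df2 = df2, lower.tail = FALSE)

# With df1 = 1, an F-test is equivalent to a two-sided t-test on sqrt(F)
p_t <- 2 * pt(sqrt(f_statistic), df = df2, lower.tail = FALSE)

print(c(p_f = p_f, p_t = p_t))
```

The agreement of the two values for `df1 = 1` mirrors the statement above that the Wald test can be considered equivalent to a t-test.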

Details

To test if a coefficient is different from zero with a Wald test, use the contrast function argument. To test if two models differ with an F-test, use the reduced_model argument. Depending on the test that is conducted, the function returns slightly different data.frames.

The function is designed to follow the principles of the base R test functions (e.g. t.test and wilcox.test) and of the functions designed for collecting the results of high-throughput testing (e.g. limma::topTable and DESeq2::results).

See also

The contrast argument is inspired by limma::makeContrasts.

Examples

# "t-test"
syn_data <- generate_synthetic_data(n_proteins = 10)
fit <- proDA(syn_data$Y, design = syn_data$groups)
result_names(fit)
#> [1] "Condition_1" "Condition_2"
test_diff(fit, Condition_1 - Condition_2)
#>          name       pval  adj_pval        diff t_statistic        se df
#> 1   protein_1 0.63066796 0.8292043  0.13868715   0.5197612 0.2668286  4
#> 2   protein_2 0.70057164 0.8292043 -0.09230335  -0.4133169 0.2233234  4
#> 3   protein_3 0.02960657 0.1480328  1.17063438   3.3117272 0.3534815  4
#> 4   protein_4 0.15879472 0.4800317  0.31384244   1.7293959 0.1814752  4
#> 5   protein_5 0.82920433 0.8292043  0.05036218   0.2302352 0.2187423  4
#> 6   protein_6 0.81128343 0.8292043  0.05249717   0.2550194 0.2058556  4
#> 7   protein_7 0.19201270 0.4800317  0.23846547   1.5677433 0.1521075  4
#> 8   protein_8 0.53561684 0.8292043 -0.14429122  -0.6768613 0.2131769  4
#> 9   protein_9 0.72667595 0.8292043 -0.12732876  -0.3750301 0.3395161  4
#> 10 protein_10 0.01098839 0.1098839 -0.81158997  -4.4801736 0.1811515  4
#>    avg_abundance n_approx n_obs
#> 1       18.22209 3.003094     3
#> 2       20.00984 3.989574     4
#> 3       17.52142 1.363857     1
#> 4       21.28096 6.000000     6
#> 5       21.21086 4.977908     5
#> 6       19.59506 5.345718     5
#> 7       23.08283 6.000000     6
#> 8       19.06041 4.267228     4
#> 9       20.00061 4.983345     5
#> 10      23.52646 6.000000     6

suppressPackageStartupMessages(library(SummarizedExperiment))
se <- generate_synthetic_data(n_proteins = 10, n_conditions = 3,
                              return_summarized_experiment = TRUE)
colData(se)$age <- rnorm(9, mean=45, sd=5)
colData(se)
#> DataFrame with 9 rows and 4 columns
#>                     group true_dropout_curve_position true_dropout_curve_scale
#>                  <factor>                   <numeric>                <numeric>
#> Condition_1-1 Condition_1                        18.5                     -1.2
#> Condition_1-2 Condition_1                        18.5                     -1.2
#> Condition_1-3 Condition_1                        18.5                     -1.2
#> Condition_2-1 Condition_2                        18.5                     -1.2
#> Condition_2-2 Condition_2                        18.5                     -1.2
#> Condition_2-3 Condition_2                        18.5                     -1.2
#> Condition_3-1 Condition_3                        18.5                     -1.2
#> Condition_3-2 Condition_3                        18.5                     -1.2
#> Condition_3-3 Condition_3                        18.5                     -1.2
#>                            age
#>                      <numeric>
#> Condition_1-1 45.4767700483905
#> Condition_1-2 42.6859029002183
#> Condition_1-3 37.6555892195272
#> Condition_2-1 45.7634325276076
#> Condition_2-2 53.8688130565859
#> Condition_2-3 41.7596453324248
#> Condition_3-1  44.000912621659
#> Condition_3-2 48.4462186648859
#> Condition_3-3  45.180727549183
fit <- proDA(se, design = ~ group + age)
result_names(fit)
#> [1] "Intercept" "groupCondition_2" "groupCondition_3" "age"
test_diff(fit, "groupCondition_2", n_max = 3, sort_by = "pval")
#>          name       pval  adj_pval     diff t_statistic        se df
#> 1   protein_1 0.05766252 0.3596771 1.186875    2.453854 0.4836778  5
#> 10 protein_10 0.10541452 0.3596771 1.063993    1.973770 0.5390665  5
#> 3   protein_3 0.10790314 0.3596771 1.296400    1.955533 0.6629396  5
#>    avg_abundance n_approx n_obs
#> 1       18.20726 2.643421     1
#> 10      18.11640 4.664379     4
#> 3       20.20987 8.362519     8

# F-test
test_diff(fit, reduced_model = ~ group)
#>          name       pval  adj_pval  f_statistic df1      df2 avg_abundance
#> 1   protein_1 0.83525553 0.9985119 5.159081e-02   1 2.926664      18.20726
#> 2   protein_2 0.17019359 0.8509679 2.250220e+00   1 8.403543      20.07761
#> 3   protein_3 0.08813637 0.8509679 3.693300e+00   1 8.645761      20.20987
#> 4   protein_4 0.91314840 0.9985119 1.256128e-02   1 9.283242      23.16640
#> 5   protein_5 0.63759269 0.9985119 2.371096e-01   1 9.283242      21.20843
#> 6   protein_6 0.99851187 0.9985119 3.670676e-06   1 9.283242      22.53936
#> 7   protein_7 0.41659423 0.9985119 7.228662e-01   1 9.283242      20.41743
#> 8   protein_8 0.93818225 0.9985119 7.179518e-03   1 2.790716      18.39177
#> 9   protein_9 0.65665475 0.9985119 2.190582e-01   1 5.861225      19.13118
#> 10 protein_10 0.31029173 0.9985119 1.276726e+00   1 4.947621      18.11640
#>    n_approx n_obs
#> 1  2.643421     1
#> 2  8.120301     8
#> 3  8.362519     8
#> 4  9.000000     9
#> 5  9.000000     9
#> 6  9.000000     9
#> 7  9.000000     9
#> 8  2.507474     1
#> 9  5.577983     5
#> 10 4.664379     4