The `test_diff()` function tests coefficients of a `proDAFit`
object. It provides a Wald test for individual
coefficients and a likelihood ratio F-test that compares the
original model with a reduced model. The `result_names()`
method gives a quick overview of which coefficients are
available for testing.

```r
test_diff(fit, contrast, reduced_model = ~1,
  alternative = c("two.sided", "greater", "less"),
  pval_adjust_method = "BH", sort_by = NULL, decreasing = FALSE,
  n_max = Inf, verbose = FALSE)

# S4 method for proDAFit
result_names(fit)
```

| Argument | Description |
|---|---|
| `fit` | an object of class `proDAFit`. Usually, this is produced by calling `proDA()`. |
| `contrast` | an expression or a string specifying which contrast is tested. It can be a single coefficient (to see the available options use `result_names(fit)`) or a linear combination of coefficients such as `Condition_1 - Condition_2`. |
| `reduced_model` | if you don't want to test an individual coefficient, you can specify a reduced model and compare it with the original model using an F-test. This is useful to find out how a set of parameters affects the goodness of the fit. If no `contrast` is given, the original model is compared with the reduced model, which defaults to the intercept-only model `~1`. Default: `~1`. |
| `alternative` | a string that decides how the hypothesis test is done: `"two.sided"`, `"greater"`, or `"less"`. This parameter is only relevant for the Wald test specified with the `contrast` argument. Default: `"two.sided"`. |
| `pval_adjust_method` | a string that indicates the method used to adjust the p-values for multiple testing. It must match the options in …. Default: `"BH"`. |
| `sort_by` | a string that specifies the column used to sort the resulting data.frame. Default: `NULL`. |
| `decreasing` | a boolean that indicates whether the sort order is reversed. Default: `FALSE`. |
| `n_max` | the maximum number of rows returned by the method. Default: `Inf`. |
| `verbose` | a boolean that signals whether the method prints informative messages. Default: `FALSE`. |
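
The last four arguments only post-process the result table. The following is a minimal base R sketch of what they plausibly do, assuming that `pval_adjust_method` is handed to `stats::p.adjust()` and that sorting uses `order()`; `shape_result()` is a hypothetical helper written for illustration, not part of proDA. The p-values are those of the `test_diff(fit, Condition_1 - Condition_2)` example on this page, and the recomputed Benjamini-Hochberg values reproduce its printed `adj_pval` column up to rounding.

```r
# p-values of the ten proteins in the "t-test" example on this page
res <- data.frame(
  name = paste0("protein_", 1:10),
  pval = c(0.63066796, 0.70057164, 0.02960657, 0.15879472, 0.82920433,
           0.81128343, 0.19201270, 0.53561684, 0.72667595, 0.01098839),
  stringsAsFactors = FALSE
)

# Multiple-testing correction with base R; "BH" is Benjamini-Hochberg
res$adj_pval <- p.adjust(res$pval, method = "BH")

# Hypothetical helper mimicking sort_by, decreasing and n_max (an assumption,
# not proDA's code)
shape_result <- function(res, sort_by = NULL, decreasing = FALSE, n_max = Inf) {
  if (!is.null(sort_by)) {
    res <- res[order(res[[sort_by]], decreasing = decreasing), , drop = FALSE]
  }
  # Keep at most n_max rows; n_max = Inf keeps every row
  head(res, n = min(n_max, nrow(res)))
}

# The three proteins with the smallest p-values
shape_result(res, sort_by = "pval", n_max = 3)
```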

The `result_names()` function returns a character vector.

The `test_diff()` function returns a `data.frame` with one row per protein
containing the key parameters of the statistical test. The columns of the
`data.frame` depend on which test (Wald or F-test) is performed.

The Wald test, which can be considered equivalent to a t-test, returns a `data.frame` with the following columns:

- name
the name of the protein, extracted from the rowname of the input matrix

- pval
the p-value of the statistical test

- adj_pval
the multiple testing adjusted p-value

- diff
the difference that the tested coefficient (or contrast) makes. In differential expression analysis, this value is also called the log fold change, which is equivalent to the difference on the log scale.

- t_statistic
the `diff` divided by the standard error `se`

- se
the standard error associated with the `diff`

- df
the degrees of freedom, which describe the amount of available information for estimating the `se`. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

- avg_abundance
the estimate of the average abundance of the protein across all samples.

- n_approx
the approximated information available for estimating the protein features, expressed as a multiple of the information contained in one observed value.

- n_obs
the number of samples a protein was observed in
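
The relationships between `diff`, `se`, `t_statistic`, `df` and `pval` can be checked in base R. A sketch, assuming the p-value comes from a t distribution with `df` degrees of freedom; the numbers are the `protein_1` row of the `t-test` example on this page, and mapping `alternative = "greater"` / `"less"` to the upper / lower tail is an assumption rather than something this page states.

```r
# protein_1 row of the "t-test" example on this page
diff_value <- 0.13868715
se_value   <- 0.2668286
df_value   <- 4

# t statistic: the estimated difference divided by its standard error
t_statistic <- diff_value / se_value

# Two-sided p-value from a t distribution with df_value degrees of freedom
pval <- 2 * pt(abs(t_statistic), df = df_value, lower.tail = FALSE)

# One-sided p-values (assumed meaning of alternative = "greater" / "less")
pval_greater <- pt(t_statistic, df = df_value, lower.tail = FALSE)
pval_less    <- pt(t_statistic, df = df_value, lower.tail = TRUE)

# Should agree with the printed t_statistic (0.5197612) and pval (0.63066796)
round(c(t_statistic = t_statistic, pval = pval), 4)
```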

The F-test returns a `data.frame` with the following columns:

- name
the name of the protein, extracted from the rowname of the input matrix

- pval
the p-value of the statistical test

- adj_pval
the multiple testing adjusted p-value

- f_statistic
the difference between the normalized deviances of the original and the reduced model, divided by the standard deviation.

- df1
the difference of the number of coefficients in the original model and the number of coefficients in the reduced model

- df2
the degrees of freedom, which describe the amount of available information for estimating the `se`. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

- avg_abundance
the estimate of the average abundance of the protein across all samples.

- n_approx
the information available for estimating the protein features, expressed as a multiple of the information contained in one observed value.

- n_obs
the number of samples a protein was observed in
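
Likewise, the F-test p-value is consistent with the upper tail of an F distribution with `df1` and `df2` degrees of freedom. A sketch assuming exactly that, using the `protein_6` and `protein_3` rows of the F-test example on this page; note that `df2` need not be an integer. Because `df1 = 1` here, the same p-values also follow from a two-sided t-test on the square root of `f_statistic`.

```r
# protein_6 and protein_3 rows of the F-test example on this page
f_statistic <- c(protein_6 = 3.670676e-06, protein_3 = 3.693300)
df1 <- c(1, 1)
df2 <- c(9.283242, 8.645761)  # df2 does not have to be an integer

# Upper-tail probability of the F distribution with (df1, df2) degrees of freedom
pval <- pf(f_statistic, df1 = df1, df2 = df2, lower.tail = FALSE)

# With df1 = 1, F equals t^2, so a two-sided t-test gives the same p-value
pval_t <- 2 * pt(sqrt(f_statistic), df = df2, lower.tail = FALSE)

pval
```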

To test whether a coefficient differs from zero with a Wald
test, use the `contrast` argument. To test whether two
models differ with an F-test, use the `reduced_model`
argument. Depending on the test that is conducted, the function
returns slightly different data.frames.

The function is designed to follow the principles of the
base R test functions (e.g. `t.test` and `wilcox.test`) and
of the functions for collecting the results of high-throughput
testing (e.g. `limma::topTable` and `DESeq2::results`).

The contrast argument is inspired by `limma::makeContrasts`.
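
One way to turn a contrast written as an expression or string, such as `Condition_1 - Condition_2`, into a vector of coefficient weights is to evaluate it with every coefficient name bound to its unit vector. `contrast_vector()` is a hypothetical helper in the spirit of `limma::makeContrasts`; proDA's internal implementation may differ, and the trick only gives valid weights for linear expressions. It uses `str2lang()`, which requires R 3.6.0 or later.

```r
# Hypothetical helper: convert a contrast (string or quoted expression) into a
# named vector of weights over the model coefficients
contrast_vector <- function(contrast, coef_names) {
  expr <- if (is.character(contrast)) str2lang(contrast) else contrast
  # Bind every coefficient name to its unit vector, e.g. Condition_1 = c(1, 0)
  unit_vectors <- lapply(seq_along(coef_names), function(i) {
    v <- numeric(length(coef_names))
    v[i] <- 1
    v
  })
  names(unit_vectors) <- coef_names
  # Evaluating a linear expression on unit vectors yields its weights
  weights <- eval(expr, envir = unit_vectors, enclos = baseenv())
  setNames(weights, coef_names)
}

contrast_vector("Condition_1 - Condition_2", c("Condition_1", "Condition_2"))
contrast_vector(quote(groupCondition_2),
                c("Intercept", "groupCondition_2", "groupCondition_3", "age"))
```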

```r
# "t-test"
syn_data <- generate_synthetic_data(n_proteins = 10)
fit <- proDA(syn_data$Y, design = syn_data$groups)
result_names(fit)
#> [1] "Condition_1" "Condition_2"
test_diff(fit, Condition_1 - Condition_2)
#>          name       pval  adj_pval        diff t_statistic        se df
#> 1   protein_1 0.63066796 0.8292043  0.13868715   0.5197612 0.2668286  4
#> 2   protein_2 0.70057164 0.8292043 -0.09230335  -0.4133169 0.2233234  4
#> 3   protein_3 0.02960657 0.1480328  1.17063438   3.3117272 0.3534815  4
#> 4   protein_4 0.15879472 0.4800317  0.31384244   1.7293959 0.1814752  4
#> 5   protein_5 0.82920433 0.8292043  0.05036218   0.2302352 0.2187423  4
#> 6   protein_6 0.81128343 0.8292043  0.05249717   0.2550194 0.2058556  4
#> 7   protein_7 0.19201270 0.4800317  0.23846547   1.5677433 0.1521075  4
#> 8   protein_8 0.53561684 0.8292043 -0.14429122  -0.6768613 0.2131769  4
#> 9   protein_9 0.72667595 0.8292043 -0.12732876  -0.3750301 0.3395161  4
#> 10 protein_10 0.01098839 0.1098839 -0.81158997  -4.4801736 0.1811515  4
#>    avg_abundance n_approx n_obs
#> 1       18.22209 3.003094     3
#> 2       20.00984 3.989574     4
#> 3       17.52142 1.363857     1
#> 4       21.28096 6.000000     6
#> 5       21.21086 4.977908     5
#> 6       19.59506 5.345718     5
#> 7       23.08283 6.000000     6
#> 8       19.06041 4.267228     4
#> 9       20.00061 4.983345     5
#> 10      23.52646 6.000000     6

suppressPackageStartupMessages(library(SummarizedExperiment))
se <- generate_synthetic_data(n_proteins = 10, n_conditions = 3,
                              return_summarized_experiment = TRUE)
colData(se)$age <- rnorm(9, mean = 45, sd = 5)
colData(se)
#> DataFrame with 9 rows and 4 columns
#>                     group true_dropout_curve_position true_dropout_curve_scale
#>                  <factor>                  <numeric>                <numeric>
#> Condition_1-1 Condition_1                       18.5                     -1.2
#> Condition_1-2 Condition_1                       18.5                     -1.2
#> Condition_1-3 Condition_1                       18.5                     -1.2
#> Condition_2-1 Condition_2                       18.5                     -1.2
#> Condition_2-2 Condition_2                       18.5                     -1.2
#> Condition_2-3 Condition_2                       18.5                     -1.2
#> Condition_3-1 Condition_3                       18.5                     -1.2
#> Condition_3-2 Condition_3                       18.5                     -1.2
#> Condition_3-3 Condition_3                       18.5                     -1.2
#>                            age
#>                      <numeric>
#> Condition_1-1 45.4767700483905
#> Condition_1-2 42.6859029002183
#> Condition_1-3 37.6555892195272
#> Condition_2-1 45.7634325276076
#> Condition_2-2 53.8688130565859
#> Condition_2-3 41.7596453324248
#> Condition_3-1  44.000912621659
#> Condition_3-2 48.4462186648859
#> Condition_3-3  45.180727549183
# Model with the group and the age as covariates
fit <- proDA(se, design = ~ group + age)
result_names(fit)
#> [1] "Intercept"        "groupCondition_2" "groupCondition_3" "age"
test_diff(fit, "groupCondition_2", n_max = 3, sort_by = "pval")
#>          name       pval  adj_pval     diff t_statistic        se df
#> 1   protein_1 0.05766252 0.3596771 1.186875    2.453854 0.4836778  5
#> 10 protein_10 0.10541452 0.3596771 1.063993    1.973770 0.5390665  5
#> 3   protein_3 0.10790314 0.3596771 1.296400    1.955533 0.6629396  5
#>    avg_abundance n_approx n_obs
#> 1       18.20726 2.643421     1
#> 10      18.11640 4.664379     4
#> 3       20.20987 8.362519     8

# F-test
test_diff(fit, reduced_model = ~ group)
#>          name       pval  adj_pval  f_statistic df1      df2 avg_abundance
#> 1   protein_1 0.83525553 0.9985119 5.159081e-02   1 2.926664      18.20726
#> 2   protein_2 0.17019359 0.8509679 2.250220e+00   1 8.403543      20.07761
#> 3   protein_3 0.08813637 0.8509679 3.693300e+00   1 8.645761      20.20987
#> 4   protein_4 0.91314840 0.9985119 1.256128e-02   1 9.283242      23.16640
#> 5   protein_5 0.63759269 0.9985119 2.371096e-01   1 9.283242      21.20843
#> 6   protein_6 0.99851187 0.9985119 3.670676e-06   1 9.283242      22.53936
#> 7   protein_7 0.41659423 0.9985119 7.228662e-01   1 9.283242      20.41743
#> 8   protein_8 0.93818225 0.9985119 7.179518e-03   1 2.790716      18.39177
#> 9   protein_9 0.65665475 0.9985119 2.190582e-01   1 5.861225      19.13118
#> 10 protein_10 0.31029173 0.9985119 1.276726e+00   1 4.947621      18.11640
#>    n_approx n_obs
#> 1  2.643421     1
#> 2  8.120301     8
#> 3  8.362519     8
#> 4  9.000000     9
#> 5  9.000000     9
#> 6  9.000000     9
#> 7  9.000000     9
#> 8  2.507474     1
#> 9  5.577983     5
#> 10 4.664379     4
```