Row-wise tests of difference using the probabilistic dropout model

These are helper functions that combine the calls to proDA() and test_diff(). If you need more flexibility, use those functions directly.

pd_row_t_test(X, Y, moderate_location = TRUE, moderate_variance = TRUE,
  alternative = c("two.sided", "greater", "less"),
  pval_adjust_method = "BH", location_prior_df = 3, max_iter = 20,
  epsilon = 0.001, return_fit = FALSE, verbose = FALSE)

pd_row_f_test(X, ..., groups = NULL, moderate_location = TRUE,
  moderate_variance = TRUE, pval_adjust_method = "BH",
  location_prior_df = 3, max_iter = 20, epsilon = 0.001,
  return_fit = FALSE, verbose = FALSE)

Arguments

X, Y, ...	the matrices for condition 1, 2 and so on. They must have the same number of rows.
moderate_location	boolean value that indicates whether the location is moderated. Default: `TRUE`
moderate_variance	boolean value that indicates whether the variance is moderated. Default: `TRUE`
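alternative	a string that decides how the hypothesis test is done; one of `"two.sided"`, `"greater"`, or `"less"`. This parameter is only relevant for the Wald test (in `test_diff()`, the test specified using the `contrast` argument), so it applies to `pd_row_t_test` only. Default: `"two.sided"`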
pval_adjust_method	a string that indicates the method used to adjust the p-values for multiple testing. It must match one of the options in `p.adjust`. Default: `"BH"`
location_prior_df	the number of degrees of freedom used for the location prior. A large number (> 30) means that the prior is approximately Normal. Default: `3`
max_iter	the maximum number of iterations that `proDA()` uses to try to converge to the hyper-parameter estimates. Default: `20`
epsilon	if the remaining error is smaller than `epsilon` the model has converged. Default: `1e-3`
return_fit	boolean that signals whether the fit from `proDA()` is returned in addition to the data.frame with the hypothesis test results. Default: `FALSE`
verbose	boolean that signals if the method prints messages during the fitting. Default: `FALSE`
groups	a factor or character vector that assigns the columns of `X` to different conditions. This parameter is only applicable for the F-test and must be specified if only a single matrix is provided.
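
The `pval_adjust_method` argument is described as matching the options of `p.adjust`. As a minimal sketch using only base R, the accepted method names and the effect of the default `"BH"` adjustment can be inspected as follows. The p-values here are made up purely for illustration and do not come from `proDA()`.

```r
# Hypothetical raw p-values, for illustration only
pvals <- c(0.001, 0.01, 0.02, 0.2, 0.5)

# Method names accepted by p.adjust()
p.adjust.methods

# Benjamini-Hochberg adjustment, the default of pval_adjust_method
adj <- p.adjust(pvals, method = "BH")
adj
```

Any string listed in `p.adjust.methods` should be a valid value for `pval_adjust_method`.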

Value

If return_fit == FALSE, a data.frame is returned with the content that is described in test_diff.

If return_fit == TRUE, a list is returned with two elements: fit, a reference to the object returned from proDA(), and test_result, the data.frame returned from test_diff().

Details

pd_row_t_test does not actually perform a t-test, but rather a Wald test. However, because the two are closely related and the term t-test is more widely understood, we chose to use that name.
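
The close relationship between these tests can be checked on fully observed data with base R alone. The sketch below is not how pd_row_t_test is implemented; it only illustrates, on simulated data, that for a two-group comparison the squared t-statistic of the group coefficient in a linear model equals the F-statistic for that term. The variable names (`dat`, `fit`, `t_stat`, `f_stat`) are chosen for this illustration.

```r
set.seed(1)
x <- rnorm(5)
y <- rnorm(5, mean = 0.3)
dat <- data.frame(value = c(x, y), cond = rep(c("x", "y"), each = 5))

# Ordinary linear model with a single two-level factor
fit <- lm(value ~ cond, data = dat)

# t-statistic of the group coefficient and F-statistic of the same term
t_stat <- summary(fit)$coefficients[2, "t value"]
f_stat <- anova(fit)[1, "F value"]

# For one numerator degree of freedom, t^2 and F coincide
all.equal(t_stat^2, f_stat)
```

This is consistent with the unmoderated examples on this page, where the t-test and F-test variants report the same p-value for a two-group comparison.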

Examples

  data1 <- matrix(rnorm(10 * 3), nrow=10)
  data2 <- matrix(rnorm(10 * 4), nrow=10)
  data3 <- matrix(rnorm(10 * 2), nrow=10)

  # Comparing two datasets
  pd_row_t_test(data1, data2)
#>    name      pval  adj_pval        diff t_statistic        se df avg_abundance
#> 1     1 0.6318223 0.9610461 -0.17403899 -0.50990336 0.3413176  5    0.08441427
#> 2     2 0.8537999 0.9610461  0.06719201  0.19401537 0.3463231  5    0.09955047
#> 3     3 0.7060897 0.9610461  0.14086471  0.39939698 0.3526935  5    0.11559041
#> 4     4 0.2003586 0.9610461 -0.52457621 -1.47448765 0.3557685  5    0.06480774
#> 5     5 0.6111559 0.9610461  0.18858601  0.54189668 0.3480110  5    0.06870236
#> 6     6 0.2694543 0.9610461  0.52366491  1.24158862 0.4217701  5    0.20250791
#> 7     7 0.7403208 0.9610461  0.11750324  0.35039495 0.3353451  5   -0.01569756
#> 8     8 0.5919206 0.9610461  0.20569825  0.57224953 0.3594555  5    0.12727617
#> 9     9 0.8981846 0.9610461  0.04831160  0.13459189 0.3589489  5    0.17334037
#> 10   10 0.9610461 0.9610461  0.01774168  0.05133534 0.3456036  5    0.10621068
#>    n_approx n_obs
#> 1  7.000000     7
#> 2  7.000000     7
#> 3  7.000000     7
#> 4  7.000000     7
#> 5  7.000000     7
#> 6  7.000000     7
#> 7  7.000000     7
#> 8  7.000000     7
#> 9  7.000000     7
#> 10 7.000026     7

  # Comparing multiple datasets
  pd_row_f_test(data1, data2, data3)
#>    name      pval adj_pval f_statistic df1 df2 avg_abundance n_approx n_obs
#> 1     1 0.9164410 0.988259  0.08725759   2 Inf  0.0682202255 9.000001     9
#> 2     2 0.9882590 0.988259  0.01181043   2 Inf  0.0688035749 9.000000     9
#> 3     3 0.9497317 0.988259  0.05157570   2 Inf  0.0761795645 9.000000     9
#> 4     4 0.5218571 0.988259  0.65036152   2 Inf  0.0558458728 9.000000     9
#> 5     5 0.9119464 0.988259  0.09217410   2 Inf  0.0549571863 9.000009     9
#> 6     6 0.5125471 0.988259  0.66836272   2 Inf  0.1003033936 9.000000     9
#> 7     7 0.9333549 0.988259  0.06896978   2 Inf  0.0002559956 9.000000     9
#> 8     8 0.8167696 0.988259  0.20239824   2 Inf  0.1039868599 9.000021     9
#> 9     9 0.7983868 0.988259  0.22516214   2 Inf  0.0712991005 9.000000     9
#> 10   10 0.8886323 0.988259  0.11807171   2 Inf  0.0477848046 9.000000     9

  # Alternative
  data_comb <- cbind(data1, data2, data3)
  pd_row_f_test(data_comb,
     groups = c(rep("A",3), rep("B", 4), rep("C", 2)))
#>    name      pval adj_pval f_statistic df1 df2 avg_abundance n_approx n_obs
#> 1     1 0.9164410 0.988259  0.08725759   2 Inf  0.0682202255 9.000001     9
#> 2     2 0.9882590 0.988259  0.01181043   2 Inf  0.0688035749 9.000000     9
#> 3     3 0.9497317 0.988259  0.05157570   2 Inf  0.0761795645 9.000000     9
#> 4     4 0.5218571 0.988259  0.65036152   2 Inf  0.0558458728 9.000000     9
#> 5     5 0.9119464 0.988259  0.09217410   2 Inf  0.0549571863 9.000009     9
#> 6     6 0.5125471 0.988259  0.66836272   2 Inf  0.1003033936 9.000000     9
#> 7     7 0.9333549 0.988259  0.06896978   2 Inf  0.0002559956 9.000000     9
#> 8     8 0.8167696 0.988259  0.20239824   2 Inf  0.1039868599 9.000021     9
#> 9     9 0.7983868 0.988259  0.22516214   2 Inf  0.0712991005 9.000000     9
#> 10   10 0.8886323 0.988259  0.11807171   2 Inf  0.0477848046 9.000000     9

  # t.test, lm, pd_row_t_test, and pd_row_f_test are
  # approximately equivalent on fully observed data
  set.seed(1)
  x <- rnorm(5)
  y <- rnorm(5, mean=0.3)

  t.test(x, y)
#> 
#> 	Welch Two Sample t-test
#> 
#> data:  x and y
#> t = -0.58413, df = 7.1385, p-value = 0.5771
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.5391981  0.9274665
#> sample estimates:
#> mean of x mean of y 
#> 0.1292699 0.4351357 
#> 
  summary(lm(c(x, y) ~ cond,
             data = data.frame(cond = c(rep("x", 5),
                                        rep("y", 5)))))$coefficients[2,]
#>   Estimate Std. Error    t value   Pr(>|t|) 
#>  0.3058658  0.5236289  0.5841270  0.5752316 
  pd_row_t_test(matrix(x, nrow=1), matrix(y, nrow=1),
                moderate_location = FALSE,
                moderate_variance = FALSE)
#>   name      pval  adj_pval       diff t_statistic        se df avg_abundance
#> 1    1 0.5752316 0.5752316 -0.3058658   -0.584127 0.5236289  8     0.2822028
#>   n_approx n_obs
#> 1       10    10
  pd_row_f_test(matrix(x, nrow=1), matrix(y, nrow=1),
                moderate_location = FALSE,
                moderate_variance = FALSE)
#>   name      pval  adj_pval f_statistic df1 df2 avg_abundance n_approx n_obs
#> 1    1 0.5752316 0.5752316   0.3412044   1   8     0.2822028       10    10
