Generate a dataset according to the probabilistic dropout model

generate_synthetic_data(n_proteins, n_conditions = 2, n_replicates = 3,
  frac_changed = 0.1, dropout_curve_position = 18.5,
  dropout_curve_scale = -1.2, location_prior_mean = 20,
  location_prior_scale = 4, variance_prior_scale = 0.05,
  variance_prior_df = 2, effect_size = 2,
  return_summarized_experiment = FALSE)

Arguments

n_proteins	the number of proteins, i.e. rows, in the dataset
n_conditions	the number of conditions. Default: 2
n_replicates	the number of replicates per condition. Can either be a single number or a vector with `length(n_replicates) == n_conditions`. Default: 3
frac_changed	the fraction of proteins that actually differ between the conditions. Default: 0.1
dropout_curve_position	the intensity at which the probability of observing a value is 50%. Can be a single number or a vector with `length(dropout_curve_position) == n_conditions * n_replicates`. Default: 18.5
dropout_curve_scale	the width of the dropout curve. Negative numbers mean that lower intensities are more likely to be missing. Can be a single number or a vector with `length(dropout_curve_scale) == n_conditions * n_replicates`. Default: -1.2
location_prior_mean, location_prior_scale	the mean and the variance of the prior distribution from which the individual condition means (`t_mu`) are drawn. Default: 20 and 4
variance_prior_scale, variance_prior_df	the scale and the degrees of freedom of the scaled inverse chi-squared distribution used as the prior for the protein variances. Default: 0.05 and 2
effect_size	the standard deviation used to draw the differing condition means of the changed proteins (the `frac_changed` fraction). Default: 2
return_summarized_experiment	a boolean indicating whether the function should return a `SummarizedExperiment` object instead of a list. Default: `FALSE`
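The arguments above describe a hierarchical generative model. The following base-R function is a minimal, hypothetical sketch of how the complete data `Z` could be drawn under that model; it is not the package's implementation. It assumes that `location_prior_scale` is a variance (so the standard deviation is its square root), that the changed proteins are a random subset of size `round(frac_changed * n_proteins)`, that a changed protein keeps its baseline mean in the first condition and gets new means drawn with `sd = effect_size` in the others, and that the variances follow a scaled inverse chi-squared distribution, sampled as `df * scale / rchisq(df)`. The name `simulate_ground_truth` is hypothetical.

```r
# Hypothetical sketch (not the package source): draw the ground truth and the
# complete intensity matrix Z under the priors described by the arguments.
simulate_ground_truth <- function(n_proteins, n_conditions = 2, n_replicates = 3,
                                  frac_changed = 0.1,
                                  location_prior_mean = 20, location_prior_scale = 4,
                                  variance_prior_scale = 0.05, variance_prior_df = 2,
                                  effect_size = 2) {
  # A single n_replicates value is recycled to one entry per condition
  n_replicates <- rep_len(n_replicates, n_conditions)
  conditions <- paste0("Condition_", seq_len(n_conditions))
  groups <- rep(conditions, times = n_replicates)
  sample_names <- paste0(groups, "-", sequence(n_replicates))
  protein_names <- paste0("protein_", seq_len(n_proteins))

  # Assumption: location_prior_scale is a variance, hence sqrt() for the sd
  baseline <- rnorm(n_proteins, location_prior_mean, sqrt(location_prior_scale))
  t_mu <- matrix(baseline, nrow = n_proteins, ncol = n_conditions,
                 dimnames = list(protein_names, conditions))

  # Assumption: a random subset of round(frac_changed * n_proteins) proteins changes
  changed <- rep(FALSE, n_proteins)
  changed[sample.int(n_proteins, round(frac_changed * n_proteins))] <- TRUE
  # Changed proteins get new means in conditions 2..n_conditions
  t_mu[changed, -1] <- rnorm(sum(changed) * (n_conditions - 1),
                             mean = baseline[changed], sd = effect_size)

  # Scaled inverse chi-squared draw: df * scale / chi^2_df
  t_sigma2 <- variance_prior_df * variance_prior_scale /
    rchisq(n_proteins, df = variance_prior_df)

  # Complete data: each sample uses the mean of its own condition
  group_idx <- match(groups, conditions)
  Z <- matrix(rnorm(n_proteins * length(groups),
                    mean = t_mu[, group_idx], sd = sqrt(t_sigma2)),
              nrow = n_proteins, dimnames = list(protein_names, sample_names))

  list(Z = Z, t_mu = t_mu, t_sigma2 = t_sigma2, changed = changed,
       groups = factor(groups, levels = conditions))
}
```

The returned list mirrors the `Z`, `t_mu`, `t_sigma2`, `changed`, and `groups` elements of the documented return value, but stops before any values are dropped out.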

Value

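If `return_summarized_experiment` is `FALSE`, a list with the following elements:

Y: the intensity matrix including the missing values
Z: the intensity matrix before values were dropped out
t_mu: a matrix with `n_proteins` rows and `n_conditions` columns containing the true underlying mean of each protein in each condition
t_sigma2: a vector with the true variance of each protein
changed: a logical vector indicating whether each protein actually differs between the conditions
groups: the group structure mapping samples to conditions

Otherwise, a `SummarizedExperiment` with the same information: `Y` and `Z` are stored as the assays `abundances` and `full_observations`, the true means, variances and change indicators as `rowData` columns, and the sample groups and dropout curve parameters as `colData` columns (see the second example).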
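`Y` differs from `Z` only by the values that dropped out. Below is a minimal sketch of that step, assuming a probit-shaped dropout curve in which the probability that a value `z` is missing is `pnorm((z - dropout_curve_position) / dropout_curve_scale)`. This probability is 50% at `z == dropout_curve_position` and, for a negative scale, rises as the intensity falls, which matches the argument descriptions; the package's exact curve may differ. The functions `dropout_prob` and `drop_out` are hypothetical names.

```r
# Hypothetical sketch: probit-shaped probability that a value z is missing
dropout_prob <- function(z, position = 18.5, scale = -1.2) {
  pnorm((z - position) / scale)
}

# Drop values from a complete matrix Z; position and scale may be scalars
# or have one entry per sample (column)
drop_out <- function(Z, dropout_curve_position = 18.5, dropout_curve_scale = -1.2) {
  position <- rep_len(dropout_curve_position, ncol(Z))
  scale <- rep_len(dropout_curve_scale, ncol(Z))
  # Column-wise (z - position) / scale, then the normal CDF
  p_missing <- pnorm(sweep(sweep(Z, 2, position, "-"), 2, scale, "/"))
  Y <- Z
  # Each value goes missing independently with probability p_missing
  Y[runif(length(Z)) < p_missing] <- NA
  Y
}

dropout_prob(18.5)
#> [1] 0.5
```

Because the curve is evaluated per column, passing a vector for `dropout_curve_position` or `dropout_curve_scale` gives each sample its own dropout behaviour, as the arguments allow.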

Examples

  syn_data <- generate_synthetic_data(n_proteins = 10)
  names(syn_data)
#> [1] "Y"        "Z"        "t_mu"     "t_sigma2" "changed"  "groups"  
  head(syn_data$Y)
#>           Condition_1-1 Condition_1-2 Condition_1-3 Condition_2-1 Condition_2-2
#> protein_1            NA            NA            NA            NA      17.50241
#> protein_2            NA            NA            NA            NA            NA
#> protein_3            NA            NA      18.07870            NA            NA
#> protein_4      20.73936      20.79656      20.71465      20.78360      20.43842
#> protein_5      20.93270      20.41703      20.42440      20.59114      20.46230
#> protein_6            NA            NA      17.75419            NA            NA
#>           Condition_2-3
#> protein_1            NA
#> protein_2            NA
#> protein_3            NA
#> protein_4      20.29244
#> protein_5      20.47756
#> protein_6      17.80422

  # Returning a SummarizedExperiment
  se <- generate_synthetic_data(n_proteins = 10, return_summarized_experiment = TRUE)
  se
#> class: SummarizedExperiment 
#> dim: 10 6 
#> metadata(0):
#> assays(2): abundances full_observations
#> rownames(10): protein_1 protein_2 ... protein_9 protein_10
#> rowData names(4): changed true_s2 true_Condition_1 true_Condition_2
#> colnames(6): Condition_1-1 Condition_1-2 ... Condition_2-2
#>   Condition_2-3
#> colData names(3): group true_dropout_curve_position
#>   true_dropout_curve_scale
  head(SummarizedExperiment::assay(se))
#>           Condition_1-1 Condition_1-2 Condition_1-3 Condition_2-1 Condition_2-2
#> protein_1      23.63188      24.56854      23.95489      23.77380      23.66998
#> protein_2            NA            NA            NA            NA      18.23310
#> protein_3      18.87651            NA            NA      19.06291            NA
#> protein_4      20.45522      20.11678      20.15111      20.71967      20.53295
#> protein_5      19.59775      19.40209      19.47563      19.31781      19.53988
#> protein_6      21.03080      20.46508      20.49290      21.05010      20.65255
#>           Condition_2-3
#> protein_1      23.51429
#> protein_2      18.24830
#> protein_3            NA
#> protein_4      20.54349
#> protein_5      19.95840
#> protein_6      20.94089
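The examples above do not set a seed, so their numbers are one random draw. The first one still shows the intensity-dependent dropout. The snippet below copies the six printed rows of `syn_data$Y` into a matrix (values taken verbatim from that output) and counts the observed values per protein: the two proteins observed around 20.5 are complete, whereas those observed only around 17.5 to 18, below the default `dropout_curve_position` of 18.5, are mostly missing.

```r
# Values copied from the head(syn_data$Y) output above (NA = dropped out)
Y_head <- matrix(c(
  NA,       NA,       NA,       NA,       17.50241, NA,
  NA,       NA,       NA,       NA,       NA,       NA,
  NA,       NA,       18.07870, NA,       NA,       NA,
  20.73936, 20.79656, 20.71465, 20.78360, 20.43842, 20.29244,
  20.93270, 20.41703, 20.42440, 20.59114, 20.46230, 20.47756,
  NA,       NA,       17.75419, NA,       NA,       17.80422),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("protein_", 1:6),
                  c("Condition_1-1", "Condition_1-2", "Condition_1-3",
                    "Condition_2-1", "Condition_2-2", "Condition_2-3")))

# Number of observed (non-missing) values per protein
n_observed <- rowSums(!is.na(Y_head))
# Mean of the observed values; NaN when nothing was observed (protein_2)
mean_observed <- rowMeans(Y_head, na.rm = TRUE)
cbind(n_observed, mean_observed)
```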
