R/generate_synthetic_data.R
generate_synthetic_data.Rd
Generate a dataset according to the probabilistic dropout model
generate_synthetic_data(
  n_proteins,
  n_conditions = 2,
  n_replicates = 3,
  frac_changed = 0.1,
  dropout_curve_position = 18.5,
  dropout_curve_scale = -1.2,
  location_prior_mean = 20,
  location_prior_scale = 4,
  variance_prior_scale = 0.05,
  variance_prior_df = 2,
  effect_size = 2,
  return_summarized_experiment = FALSE
)
Argument | Description |
---|---|
n_proteins | the number of rows in the dataset |
n_conditions | the number of conditions. Default: 2 |
n_replicates | the number of replicates per condition. Can be either a single number or a vector with one entry per condition. Default: 3 |
frac_changed | the fraction of proteins that actually differ between the conditions. Default: 0.1 |
dropout_curve_position | the intensity at which the chance to observe a value is 50%. Can be a single number or a vector with one value per sample. Default: 18.5 |
dropout_curve_scale | the width of the dropout curve. Negative numbers mean that lower intensities are more likely to be missing. Can be a single number or a vector with one value per sample. Default: -1.2 |
location_prior_mean, location_prior_scale | the position and the variance of the distribution around which the individual condition means (t_mu) scatter. Default: 20 and 4 |
variance_prior_scale, variance_prior_df | the scale and the degrees of freedom of the inverse Chi-squared distribution used as a prior for the variances. Default: 0.05 and 2 |
effect_size | the standard deviation that is used to draw different values for the changed proteins (the fraction set by frac_changed). Default: 2 |
return_summarized_experiment | a boolean indicating whether the method should return a SummarizedExperiment instead of a list. Default: FALSE |
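To make the two dropout arguments concrete, here is a minimal base R sketch of a dropout curve. It assumes an inverse-probit (normal CDF) shape, which this page does not state; the page only fixes the 50% point and the meaning of a negative scale. The helper name `dropout_probability` is hypothetical and not part of the package.

```r
# Hypothetical helper: probability that a value with intensity `x` is missing.
# Assumption: an inverse-probit (normal CDF) curve. The documentation only
# states the 50% point and what the sign of the scale means.
dropout_probability <- function(x, position = 18.5, scale = -1.2) {
  pnorm((x - position) / scale)
}

# Low, mid (at the curve position), and high intensities.
intensities <- c(15, 18.5, 22)
p_missing <- dropout_probability(intensities)

# Apply the curve to simulated complete intensities to create missing values.
set.seed(1)
full <- rnorm(1000, mean = 20, sd = 2)
observed <- full
observed[runif(length(full)) < dropout_probability(full)] <- NA
```

Under this assumption, `p_missing[2]` is exactly 0.5 because the intensity equals the curve position, and the negative scale makes the lowest intensity the most likely to be dropped.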
If return_summarized_experiment is FALSE, a list with the following elements:
Y: the intensity matrix including the missing values
Z: the intensity matrix before dropping out values
t_mu: a matrix with n_proteins rows and n_conditions columns that contains the underlying means for each protein
t_sigma2: a vector with the true variances for each protein
changed: a vector of boolean values indicating whether each protein is actually changed
groups: the group structure mapping samples to conditions
Otherwise, a SummarizedExperiment with the same information.
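Because the returned list holds the intensities both with and without missing values, the two matrices can be compared to see what the dropout step removed. The sketch below is one way to do that, using the element names Y and Z as they appear in the names() output of the example. The helper `summarize_dropout` and the `toy` list are hypothetical stand-ins, not package code or real output.

```r
# Hypothetical helper: summarize how much of the complete data was dropped.
# `syn` is assumed to be a list with elements Y (with NAs) and Z (complete).
summarize_dropout <- function(syn) {
  dropped <- is.na(syn$Y)
  list(
    frac_missing  = mean(dropped),            # share of values set to NA
    mean_dropped  = mean(syn$Z[dropped]),     # mean intensity of dropped values
    mean_observed = mean(syn$Z[!dropped])     # mean intensity of kept values
  )
}

# Tiny hand-built stand-in for the returned list (not real output).
toy <- list(
  Z = matrix(c(17, 21, 18, 22), nrow = 2),
  Y = matrix(c(NA, 21, NA, 22), nrow = 2)
)
res <- summarize_dropout(toy)
```

With the real function, the same helper could presumably be applied to the list returned when return_summarized_experiment is FALSE. With a negative dropout_curve_scale, the dropped values would be expected to have the lower mean.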
#> [1] "Y"        "Z"        "t_mu"     "t_sigma2" "changed"  "groups"
#>           Condition_1-1 Condition_1-2 Condition_1-3 Condition_2-1 Condition_2-2
#> protein_1            NA            NA            NA            NA      17.50241
#> protein_2            NA            NA            NA            NA            NA
#> protein_3            NA            NA      18.07870            NA            NA
#> protein_4      20.73936      20.79656      20.71465      20.78360      20.43842
#> protein_5      20.93270      20.41703      20.42440      20.59114      20.46230
#> protein_6            NA            NA      17.75419            NA            NA
#>           Condition_2-3
#> protein_1            NA
#> protein_2            NA
#> protein_3            NA
#> protein_4      20.29244
#> protein_5      20.47756
#> protein_6      17.80422

# Returning a SummarizedExperiment
se <- generate_synthetic_data(n_proteins = 10, return_summarized_experiment = TRUE)
se
#> class: SummarizedExperiment
#> dim: 10 6
#> metadata(0):
#> assays(2): abundances full_observations
#> rownames(10): protein_1 protein_2 ... protein_9 protein_10
#> rowData names(4): changed true_s2 true_Condition_1 true_Condition_2
#> colnames(6): Condition_1-1 Condition_1-2 ... Condition_2-2
#>   Condition_2-3
#> colData names(3): group true_dropout_curve_position
#>   true_dropout_curve_scale

#>           Condition_1-1 Condition_1-2 Condition_1-3 Condition_2-1 Condition_2-2
#> protein_1      23.63188      24.56854      23.95489      23.77380      23.66998
#> protein_2            NA            NA            NA            NA      18.23310
#> protein_3      18.87651            NA            NA      19.06291            NA
#> protein_4      20.45522      20.11678      20.15111      20.71967      20.53295
#> protein_5      19.59775      19.40209      19.47563      19.31781      19.53988
#> protein_6      21.03080      20.46508      20.49290      21.05010      20.65255
#>           Condition_2-3
#> protein_1      23.51429
#> protein_2      18.24830
#> protein_3            NA
#> protein_4      20.54349
#> protein_5      19.95840
#> protein_6      20.94089