The function works similarly to the classical `lm`, but with special handling of `NA`s. Whereas `lm` usually just ignores response values that are missing, `pd_lm` applies a probabilistic dropout model, which assumes that missing values occur because of the dropout curve. The dropout curve describes, for each position, the chance that a value is missed. A negative `dropout_curve_scale` means that the lower the intensity was, the more likely the value is to be missed.
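To make the role of the two dropout curve parameters concrete, here is a minimal sketch in base R. It assumes the curve has a cumulative-Normal (inverse probit) shape; this page only says the curve is sigmoidal, so the exact form is an assumption, and `dropout_prob` is a hypothetical helper that is not part of the package.

```r
# Hypothetical helper (not part of the package): probability that a value
# with the given intensity is missing, ASSUMING a cumulative-Normal curve.
dropout_prob <- function(intensity, position, scale) {
  pnorm((intensity - position) / scale)
}

intensities <- c(15, 17, 19, 21, 23)

# With a negative scale, lower intensities are more likely to be missed;
# at intensity == position the probability is exactly one half.
probs <- dropout_prob(intensities, position = 19, scale = -1)
print(round(probs, 3))

# A smaller absolute scale gives a steeper curve around the position.
steep <- dropout_prob(intensities, position = 19, scale = -0.25)
print(round(steep, 3))
```

Under this assumed shape, `position` shifts the curve along the intensity axis and the magnitude of `scale` controls how quickly the probability moves between 0 and 1.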

    pd_lm(formula, data = NULL, subset = NULL,
      dropout_curve_position, dropout_curve_scale,
      location_prior_mean = NULL, location_prior_scale = NULL,
      variance_prior_scale = NULL, variance_prior_df = NULL,
      location_prior_df = 3,
      method = c("analytic_hessian", "analytic_grad", "numeric"),
      verbose = FALSE)

| Argument | Description |
|---|---|
| formula | a formula that specifies a linear model |
| data | an optional data.frame whose columns can be used to specify the |
| subset | an optional selection vector for data to subset it |
| dropout_curve_position | the value where the chance to observe a value is 50%. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional. |
| dropout_curve_scale | the width of the dropout curve. Smaller values mean that the sigmoidal curve is steeper. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional. |
| location_prior_mean, location_prior_scale | the optional mean and variance of the prior around which the predictions are supposed to scatter. If no value is provided, no location regularization is applied. |
| variance_prior_scale, variance_prior_df | the optional scale and degrees of freedom of the variance prior. If no value is provided, no variance regularization is applied. |
| location_prior_df | the degrees of freedom for the t-distribution of the location prior. If it is large (> 30), the prior is approximately Normal. Default: 3 |
| method | one of 'analytic_hessian', 'analytic_grad', or 'numeric'. If 'analytic_hessian' the |
| verbose | boolean that signals whether the method prints informative messages. Default: `FALSE` |
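The remark that a large `location_prior_df` makes the prior approximately Normal can be checked with base R alone. The sketch below compares the standard t density with the standard Normal density on a grid. It uses the unshifted, unscaled distributions as a simplifying assumption and does not call `pd_lm`; the variable names are illustrative.

```r
# Grid of points on which the two densities are compared
x <- seq(-4, 4, by = 0.01)

# Largest absolute gap between the t density and the Normal density
max_diff_df3  <- max(abs(dt(x, df = 3)  - dnorm(x)))
max_diff_df30 <- max(abs(dt(x, df = 30) - dnorm(x)))

print(c(df3 = max_diff_df3, df30 = max_diff_df30))
```

The gap for 30 degrees of freedom is much smaller than for the default of 3, whose heavier tails make the location prior more tolerant of predictions far from `location_prior_mean`.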

A list with the following entries:

- coefficients: a named vector with the fitted values
- coef_variance_matrix: a `p*p` matrix with the variance associated with each coefficient estimate
- n_approx: the estimated "size" of the data set (n_hat - variance_prior_df)
- df: the estimated degrees of freedom (n_hat - p)
- s2: the estimated unbiased variance
- n_obs: the number of response values that were not `NA`
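The relations given in parentheses can be cross-checked against the numbers printed in the examples on this page. The sketch below copies those printed values by hand, so it does not call `pd_lm`. It assumes p = 1 because the example formulas are `y ~ 1`, and the variable names are illustrative.

```r
# Values copied from the example that sets variance_prior_df = 2
n_approx          <- 0.07104773
variance_prior_df <- 2
p                 <- 1          # intercept-only model (assumption from y ~ 1)

# n_approx = n_hat - variance_prior_df, so n_hat can be recovered
n_hat <- n_approx + variance_prior_df

# df = n_hat - p; compare with the printed $df of 1.071048
df <- n_hat - p
print(df)

# In the example without missing values, the intercept variance equals
# s2 / n_obs, the usual variance of a mean in an intercept-only fit.
s2    <- 1.129657
n_obs <- 5
print(s2 / n_obs)               # compare with the printed 0.2259314
```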

    #>
    #> Call:
    #> lm(formula = y ~ 1)
    #>
    #> Coefficients:
    #> (Intercept)
    #>       20.02
    #>
    pd_lm(y ~ 1, dropout_curve_position = NA, dropout_curve_scale = NA)
    #> $coefficients
    #> Intercept
    #>  20.02302
    #>
    #> $coef_variance_matrix
    #>           Intercept
    #> Intercept 0.2259314
    #>
    #> $n_approx
    #> [1] 5
    #>
    #> $df
    #> [1] 4
    #>
    #> $s2
    #> [1] 1.129657
    #>
    #> $n_obs
    #> [1] 5
    #>

    #>
    #> Call:
    #> lm(formula = y ~ 1)
    #>
    #> Coefficients:
    #> (Intercept)
    #>        22.2
    #>
    pd_lm(y ~ 1, dropout_curve_position = 19, dropout_curve_scale = -1)
    #> $coefficients
    #> Intercept
    #>  20.90007
    #>
    #> $coef_variance_matrix
    #>           Intercept
    #> Intercept  3.400309
    #>
    #> $n_approx
    #> [1] 1.451011
    #>
    #> $df
    #> [1] 0.4510113
    #>
    #> $s2
    #> [1] 13.95306
    #>
    #> $n_obs
    #> [1] 2
    #>

    # With only missing values
    y <- c(NA, NA, NA)
    # lm(y ~ 1) # Fails
    pd_lm(y ~ 1, dropout_curve_position = 19, dropout_curve_scale = -1,
          location_prior_mean = 21, location_prior_scale = 3,
          variance_prior_scale = 0.1, variance_prior_df = 2)
    #> $coefficients
    #> Intercept
    #>  18.77828
    #>
    #> $coef_variance_matrix
    #>           Intercept
    #> Intercept 0.3879792
    #>
    #> $n_approx
    #> [1] 0.07104773
    #>
    #> $df
    #> [1] 1.071048
    #>
    #> $s2
    #> [1] 0.1897697
    #>
    #> $n_obs
    #> [1] 0
    #>
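The comment in the last example notes that `lm` fails when every response is missing. A minimal base R sketch of that failure follows, wrapped in `tryCatch` so a script can continue past it. It uses only `lm` from the stats package, and the variable names are illustrative.

```r
# A response vector in which every value is missing
y <- c(NA_real_, NA_real_, NA_real_)

# lm() drops the NA rows and is then left with no data, so it signals an error
fit <- tryCatch(lm(y ~ 1), error = function(e) e)

if (inherits(fit, "error")) {
  message("lm failed: ", conditionMessage(fit))
}
```

This is the situation in which the priors passed to `pd_lm` above still allow an estimate to be produced.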