ETA444/dreemstat

dreemstat

A compilation of custom R functions related to data science designed to make your work easier.

Available functions:

varselector()

This function lets you quickly run Fast-forward, Lasso, Ridge and Elastic Net regressions, separately or together. It compares the models from each technique using RMSE and MAE and makes a final recommendation for variable selection.

Besides the final conclusion on variable selection, you also get all the information that was used to reach it. Finally, the created models are saved to your environment in case you need them.

You can also call ?dreemstat::varselector to read the full documentation.
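For reference, the two error metrics used to compare the models can be written in a few lines of base R. This is a minimal sketch of the standard definitions; the helper names rmse and mae are illustrative and are not taken from the package, and whether varselector computes the metrics on the held-out test set or on cross-validation resamples is not stated here.

```r
# Root mean squared error: penalizes large errors more heavily.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute error: average size of the errors, in the units of y.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Toy values, for illustration only.
actual    <- c(3, 5, 2)
predicted <- c(2, 5, 4)

rmse(actual, predicted)  # sqrt(5/3), about 1.29
mae(actual, predicted)   # 1
```

Lower values are better for both metrics. RMSE reacts more strongly to a few large misses, which is why looking at both can be informative.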


Brief user guide for varselector():

> Mandatory Input:

df: Data frame that has not yet been split into train/test sets. It should already be subsetted, meaning any columns you don't want in the model have been removed. | e.g., df = select(analysis_df, -id)

y: String giving the name of your dependent variable, exactly as it appears in your data frame. | e.g., y = 'clicks'

cv: trainControl object that specifies the cross-validation (CV) folds. See the caret documentation for more information: ?caret::train and ?caret::trainControl | e.g., cv = trainControl(method = "cv", number = 5)

lambda: Vector of lambda values used to tune the ridge and lasso models through the tuneGrid parameter of caret::train(). | e.g., lambda = c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))

alpha: Vector of alpha values used to tune the elastic net model through the tuneGrid parameter of caret::train(). Alpha is fixed at 0 for Ridge and at 1 for Lasso. For the Elastic Net it is varied systematically to find the best balance between Lasso and Ridge. | e.g., alpha = seq(0.00, 1, 0.1)
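The mandatory inputs above can be put together roughly as follows. This is a sketch under stated assumptions: the built-in mtcars data set and y = "mpg" are stand-ins for your own data, the grid objects only illustrate what a tuneGrid for caret's glmnet models looks like (varselector may build its grids differently), and the call itself runs only if dreemstat and caret are installed. The argument names are the ones listed in this guide.

```r
# Lambda grid from the example above. It contains 2 and 5 twice,
# so duplicates are dropped here (optional tidy-up).
lambda <- c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))
lambda <- unique(round(lambda, 10))

# Alpha grid for the elastic net: 0 is pure Ridge, 1 is pure Lasso.
alpha <- seq(0, 1, 0.1)

# Illustration only: caret tunes glmnet models over a grid with
# columns named alpha and lambda.
ridge_grid <- expand.grid(alpha = 0, lambda = lambda)
lasso_grid <- expand.grid(alpha = 1, lambda = lambda)
enet_grid  <- expand.grid(alpha = alpha, lambda = lambda)
nrow(enet_grid)  # number of alpha/lambda combinations to evaluate

# Run varselector only when the required packages are available.
if (requireNamespace("dreemstat", quietly = TRUE) &&
    requireNamespace("caret", quietly = TRUE)) {
  cv <- caret::trainControl(method = "cv", number = 5)
  dreemstat::varselector(
    df = mtcars,      # stand-in data; use your own subsetted data frame
    y = "mpg",        # stand-in dependent variable
    cv = cv,
    lambda = lambda,
    alpha = alpha
  )
}
```

Note that the elastic net grid grows with the product of the two vectors, so long lambda and alpha vectors increase the run time accordingly.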

> Optional Input:

model_id: String used to name the model objects that varselector generates. This is useful when varselector is called in a loop, because it keeps the generated model object names unique. When using it in a loop, make sure the model_id value changes on every iteration (e.g., 1, 2, 3, 4...).

mode: Specify which regression to run. The default, 'all', runs all 4 methods.

Abbreviations: 'ffr' = fast-forward; 'rr' = ridge regression; 'lr' = lasso regression; 'enr' = elastic net regression

split: A number between 0.01 and 0.99 giving the proportion of your data frame that is used for training when the function splits it into train and test sets. The default is 0.80, meaning an 80% training and 20% testing split. | e.g., split = 0.75
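To make the meaning of split concrete, here is a minimal base R sketch of an 80/20 split. It is an illustration of the idea only: the built-in mtcars data set is a stand-in, and the way varselector performs the split internally (for example, simple random sampling versus a stratified approach) is an assumption not confirmed by this guide.

```r
split <- 0.80        # proportion of rows used for training
df <- mtcars         # built-in data set standing in for your data (32 rows)

set.seed(42)         # make the random split reproducible
n_train   <- floor(split * nrow(df))
train_idx <- sample(seq_len(nrow(df)), size = n_train)

train_df <- df[train_idx, ]   # rows used to fit the models
test_df  <- df[-train_idx, ]  # remaining rows, held out for evaluation

c(train = nrow(train_df), test = nrow(test_df))  # 25 training rows, 7 test rows
```

With small data sets, a larger split leaves very few rows for testing, so the RMSE and MAE estimates become noisier.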


outlier_checker()

Coming soon
