dreemstat

A collection of custom R functions for data science, designed to make your work easier.

Available functions:

varselector()

This function lets you quickly run fast-forward, Lasso, Ridge, and Elastic Net regressions, separately or together. It uses RMSE and MAE to compare the models from each technique and makes a final recommendation for variable selection.
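As a rough illustration of the two comparison metrics named above, the sketch below computes RMSE and MAE in base R for two sets of made-up predictions. The helper names `rmse` and `mae` and all numbers are assumptions for illustration only; this is not necessarily how varselector() computes or reports them internally.

```r
# Hypothetical helpers for illustration; they are not part of dreemstat.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Made-up observed values and predictions from two candidate models.
actual  <- c(10, 12, 15, 20)
model_a <- c(11, 12, 14, 18)
model_b <- c(10, 13, 15, 24)

# One row per model, so the metrics can be compared side by side.
scores <- data.frame(
  model = c("model_a", "model_b"),
  RMSE  = c(rmse(actual, model_a), rmse(actual, model_b)),
  MAE   = c(mae(actual, model_a),  mae(actual, model_b))
)
print(scores)
```

Lower values are better for both metrics. RMSE penalizes large individual errors more heavily than MAE, which is one reason to look at both.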

Besides the final conclusion on variable selection, you also receive all the information that was used to reach it. Finally, the created models are saved to your environment, in case they are needed.

You can also call ?dreemstat::varselector() to read the full documentation.

Brief User Guide of varselector():

> Mandatory Input:

df: Data frame that has not yet been split into train/test sets. It should already be subset, meaning any columns you don't want in the model have been removed. | e.g., df = select(analysis_df, -id)

y: String that represents the name of your dependent variable, as it is called in your dataframe. | e.g., y = 'clicks'

cv: trainControl object that specifies the cross-validation (CV) folds. See documentation of caret package for additional information: ?caret::train() & ?caret::trainControl() | e.g., cv = trainControl(method = "cv", number = 5)

lambda: Specify the lambda values used to tune the ridge and lasso models in the tuneGrid parameter of caret::train(). | e.g., lambda = c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))

alpha: Specify the alpha values used to tune the elastic net model in the tuneGrid parameter of caret::train(). Alpha is fixed at 0 for Ridge and at 1 for Lasso. For the Elastic Net it is varied systematically to find the best balance between Lasso and Ridge. | e.g., alpha = seq(0.00, 1, 0.1)
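The mandatory inputs above can be put together as in the following sketch. It assumes the caret package is installed and uses the built-in `mtcars` data with `mpg` as the dependent variable purely as a stand-in for your own data. The varselector() call only runs if dreemstat is installed, and its behavior on this particular dataset has not been verified here.

```r
library(caret)  # provides trainControl()

# Stand-in data: replace with your own unsplit, already subset data frame.
df <- mtcars
y  <- "mpg"

# 5-fold cross-validation, as in the example above.
cv <- trainControl(method = "cv", number = 5)

# Tuning grids taken from the examples above.
lambda <- c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))
alpha  <- seq(0.00, 1, 0.1)

# Only call varselector() when dreemstat is available.
if (requireNamespace("dreemstat", quietly = TRUE)) {
  dreemstat::varselector(df = df, y = y, cv = cv,
                         lambda = lambda, alpha = alpha)
}
```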

> Optional Input:

model_id: Specify a string used to name the model objects generated by varselector. This is useful when varselector is used in a loop, so that the model object names generated by the function remain unique. When using it in a loop, make sure the model_id value changes on each iteration (e.g., 1, 2, 3, 4...)

mode: Specify which regression to run, default is all 4 methods ('all').

Abbreviations: 'ffr' = run fast-forward; 'rr' = run ridge regression; 'lr' = run lasso regression; 'enr' = run elastic net regression

split: A number between 0.01 and 0.99 specifying the proportion of your dataframe used for training when the function splits it into train/test datasets. The default is 0.80, meaning an 80% training and 20% testing split. | e.g., split = 0.75
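To make the meaning of split concrete, here is a minimal base R sketch of an 80/20 train/test split. It uses the built-in `mtcars` data as a stand-in, and the variable names are assumptions for illustration. varselector() performs the split for you, and its internal method may differ from this one.

```r
set.seed(42)      # make the random split reproducible

df    <- mtcars   # stand-in data with 32 rows
split <- 0.80     # proportion of rows used for training

# Draw the training rows at random; the remaining rows form the test set.
n_train   <- floor(split * nrow(df))
train_idx <- sample(seq_len(nrow(df)), size = n_train)

train_df <- df[train_idx, ]
test_df  <- df[-train_idx, ]

cat("train rows:", nrow(train_df), "| test rows:", nrow(test_df), "\n")
```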

outlier_checker()

Coming soon
