multilabel_train_test_split in multilabel.py does not ensure min_count examples of each label appear in each split #3

Open
gitgithan opened this issue Nov 22, 2018 · 0 comments

In DataCamp's "Machine Learning with the Experts: School Budgets" (2. Creating a simple first model, "Setting up a train-test split in scikit-learn"), the lesson text says:

"Some labels don't occur very often, but we want to make sure that they appear in both the training and the test sets. We provide a function that will make sure at least min_count examples of each label appear in each split: multilabel_train_test_split"

From what I can see in the source, only the test set is guaranteed to contain min_count examples of each label; the training set has no such guarantee, contrary to what the DataCamp lesson text describes. In multilabel_train_test_split, the training set indices are simply the complement of the test set indices, via this line: train_set_mask = ~test_set_mask
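To illustrate the concern, here is a minimal sketch of the failure mode. It is not the code from multilabel.py: `sample_test_mask` is a hypothetical, simplified stand-in that picks min_count rows per label for the test set and then fills the rest at random. The toy label matrix is also made up. It only shows what can happen when the training mask is taken as the complement of the test mask and a label has exactly min_count positive rows.

```python
import numpy as np


def sample_test_mask(Y, size, min_count, seed=None):
    """Hypothetical stand-in for the test-set sampler (NOT the multilabel.py code).

    Picks `min_count` positive rows per label, then adds random rows
    until `size` rows are selected. Returns a boolean row mask.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_labels = Y.shape
    chosen = set()
    for j in range(n_labels):
        positives = np.flatnonzero(Y[:, j] == 1)
        picked = rng.choice(positives, size=min_count, replace=False)
        chosen.update(picked.tolist())
    # Fill the remainder of the test set from rows not yet chosen.
    remaining = np.setdiff1d(np.arange(n_rows), np.array(sorted(chosen), dtype=int))
    n_extra = min(max(size - len(chosen), 0), len(remaining))
    if n_extra:
        chosen.update(rng.choice(remaining, size=n_extra, replace=False).tolist())
    mask = np.zeros(n_rows, dtype=bool)
    mask[sorted(chosen)] = True
    return mask


# Toy binary indicator matrix (made up): 20 rows, 2 labels.
Y = np.zeros((20, 2), dtype=int)
Y[:10, 0] = 1   # label 0 is common: 10 positive rows
Y[17:, 1] = 1   # label 1 is rare: exactly 3 positive rows

min_count = 3
test_set_mask = sample_test_mask(Y, size=8, min_count=min_count, seed=0)
train_set_mask = ~test_set_mask  # the complement, as in the line quoted above

test_counts = Y[test_set_mask].sum(axis=0)
train_counts = Y[train_set_mask].sum(axis=0)

print("test counts per label: ", test_counts)
print("train counts per label:", train_counts)
# All 3 positive rows of label 1 end up in the test set,
# so the training set contains no example of label 1.
```

In this sketch the test set always satisfies min_count, but the training set only does so if each label has enough positive rows left over after the test rows are drawn. A check such as `(train_counts >= min_count).all()` after the split would make the gap visible.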
