multilabel_train_test_split in multilabel.py does not ensure min_count examples of each label appear in each split #3

gitgithan · 2018-11-22T16:53:50Z

In DataCamp's "Machine Learning with the Experts: School Budgets", section 2, "Creating a simple first model", under "Setting up a train-test split in scikit-learn", the lesson text says:

"Some labels don't occur very often, but we want to make sure that they appear in both the training and the test sets. We provide a function that will make sure at least min_count examples of each label appear in each split: multilabel_train_test_split"

From what I can see in the source, only the test set is guaranteed to contain min_count examples of each label; the training set has no such guarantee, contrary to what the DataCamp lesson text describes. The training set indices are simply the complement of the test set indices, set by this line in multilabel_train_test_split: train_set_mask = ~test_set_mask. A label with few positive examples can therefore end up with fewer than min_count rows (possibly none) in the training set.
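To make the concern concrete, here is a minimal sketch, assuming the labels are stored as a binary indicator matrix. The helper label_counts_per_split, the toy matrix, and the hand-picked test mask are all hypothetical and are not part of multilabel.py; only the complement step mirrors the line quoted above.

```python
import numpy as np

def label_counts_per_split(Y, test_set_mask):
    """Hypothetical helper: per-label positive counts in the train and test splits."""
    Y = np.asarray(Y)
    test_set_mask = np.asarray(test_set_mask, dtype=bool)
    # Same complement step as the line quoted in this issue.
    train_set_mask = ~test_set_mask
    return Y[train_set_mask].sum(axis=0), Y[test_set_mask].sum(axis=0)

# Toy data (assumption): 6 rows, 2 labels; label 1 has only 2 positive rows.
Y = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
])
min_count = 2

# A hand-picked test split that satisfies min_count for both labels.
test_set_mask = np.array([False, False, False, True, True, True])

train_counts, test_counts = label_counts_per_split(Y, test_set_mask)

print("test counts: ", test_counts, "all >= min_count:", bool((test_counts >= min_count).all()))
print("train counts:", train_counts, "all >= min_count:", bool((train_counts >= min_count).all()))
```

In this toy case the test split meets min_count for every label, while the training split contains no rows for the rare label. Running a check like this on the real split output would show whether the same thing happens in practice.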
