forked from facebookresearch/vissl
Improve the DATA_LIMIT attribute to be able to handle more use cases (facebookresearch#216)

Summary:
This PR is a draft, pushed for visibility and discussion.

The additional use cases I propose to support are:
- being able to sub-select part of a dataset in a balanced way (each label is included the same number of times)
- being able to sub-select exclusive parts of the same dataset (for instance, to have a validation set that does not intersect with the training set, which is useful for HP searches)
- making sure that this sub-sampling is deterministic (same seed across all distributed workers)

This would avoid having to create sub-sets of datasets such as ImageNet to test on 1% of each label, for instance. It would also make it possible to benchmark SSL algorithms in low-data regimes in a more flexible way.

/!\ This PR introduces a breaking change (DATA_LIMIT is no longer an integer but a structure)

This PR includes:
- unit tests for the sub-sampling strategies
- an update of all configurations using the DATA_LIMIT attribute

Pull Request resolved: facebookresearch#216

Reviewed By: prigoyal

Differential Revision: D26923493

Pulled By: QuentinDuval

fbshipit-source-id: b4ed7c61369587ac9349218933b5eed357c19b06
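To make the three goals in the summary concrete (balanced selection, exclusive windows, determinism), here is a minimal sketch of how such sub-sampling could work. It is not the code from this commit: the function names `sketch_unbalanced_sub_sampling` and `sketch_balanced_sub_sampling`, the `seed` parameter, and the even split of `num_samples` and `skip_samples` across labels are all assumptions made for illustration, and it uses only the standard library rather than NumPy.

```python
import random
from collections import defaultdict


def sketch_unbalanced_sub_sampling(total_num_samples, num_samples, skip_samples=0, seed=0):
    """Hypothetical sketch: seeded shuffle of all indices, then a contiguous window."""
    rng = random.Random(seed)  # same seed on every worker -> same permutation
    permutation = list(range(total_num_samples))
    rng.shuffle(permutation)
    return permutation[skip_samples:skip_samples + num_samples]


def sketch_balanced_sub_sampling(labels, num_samples, skip_samples=0, seed=0):
    """Hypothetical sketch: take the same number of indices for every label."""
    indices_per_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_per_label[label].append(index)

    num_labels = len(indices_per_label)
    # Assumption: both counts are split evenly across labels; remainders are dropped.
    per_label = num_samples // num_labels
    skip_per_label = skip_samples // num_labels

    rng = random.Random(seed)
    selected = []
    for label in sorted(indices_per_label):  # fixed label order keeps the result deterministic
        candidates = indices_per_label[label]
        rng.shuffle(candidates)
        selected.extend(candidates[skip_per_label:skip_per_label + per_label])
    return selected


labels = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 0]
balanced = sketch_balanced_sub_sampling(labels, num_samples=8)
unbalanced = sketch_unbalanced_sub_sampling(len(labels), num_samples=8)
```

Because the permutation depends only on the seed, two calls with different `skip_samples` values read different windows of the same ordering, which is what makes non-intersecting subsets possible.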
1 parent a67277c, commit 930e0c9
Showing 6 changed files with 227 additions and 33 deletions.
@@ -0,0 +1,51 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved

import unittest

import numpy as np
from vissl.data.data_helper import balanced_sub_sampling, unbalanced_sub_sampling


class TestDataLimitSubSampling(unittest.TestCase):
    """
    Testing the DATA_LIMIT underlying sub sampling methods
    """

    def test_unbalanced_sub_sampling(self):
        labels = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 0])

        indices1 = unbalanced_sub_sampling(len(labels), num_samples=8, skip_samples=0)
        self.assertEqual(8, len(indices1))
        self.assertEqual(len(indices1), len(set(indices1)), "indices must be unique")

        indices2 = unbalanced_sub_sampling(len(labels), num_samples=8, skip_samples=2)
        self.assertEqual(8, len(indices2))
        self.assertEqual(len(indices2), len(set(indices2)), "indices must be unique")

        self.assertTrue(
            np.array_equal(indices1[2:], indices2[:-2]),
            "skipping samples should slide the window",
        )

    def test_balanced_sub_sampling(self):
        labels = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 0])
        unique_labels = set(labels)

        indices1 = balanced_sub_sampling(labels, num_samples=8, skip_samples=0)
        values, counts = np.unique(labels[indices1], return_counts=True)
        self.assertEqual(8, len(indices1))
        self.assertEqual(
            set(values),
            set(unique_labels),
            "at least one of each label should be selected",
        )
        self.assertEqual(2, np.min(counts), "at least two of each label is selected")
        self.assertEqual(2, np.max(counts), "at most two of each label is selected")

        indices2 = balanced_sub_sampling(labels, num_samples=8, skip_samples=4)
        self.assertEqual(8, len(indices2))
        self.assertEqual(
            4,
            len(set(indices1) & set(indices2)),
            "skipping samples should slide the window",
        )
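The "sliding window" behaviour checked by these tests is what allows a validation subset that does not intersect the training subset. The sketch below illustrates the idea with a hypothetical standard-library helper, `seeded_window`; it is an assumption-laden illustration and not the `unbalanced_sub_sampling` function from `vissl.data.data_helper`, whose implementation is not shown here.

```python
import random


def seeded_window(total_num_samples, num_samples, skip_samples, seed=0):
    # Hypothetical helper: seeded shuffle of all indices, then a contiguous window.
    permutation = list(range(total_num_samples))
    random.Random(seed).shuffle(permutation)
    return permutation[skip_samples:skip_samples + num_samples]


# Training split: the first 8 entries of the permutation.
train_indices = seeded_window(15, num_samples=8, skip_samples=0)
# Validation split: skip the 8 training entries, then take the next 4.
val_indices = seeded_window(15, num_samples=4, skip_samples=8)

# Recomputing with the same seed (as another worker would) gives the same split.
assert train_indices == seeded_window(15, num_samples=8, skip_samples=0)
# The two windows never overlap because they read disjoint slices of one ordering.
assert set(train_indices).isdisjoint(val_indices)
```

Setting `skip_samples` for the second subset to at least the `num_samples` of the first is what guarantees exclusivity in this sketch; a smaller skip produces overlapping windows, as in the tests above.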