In assert_allclose (https://github.com/numpy/numpy/blob/main/numpy/testing/_private/utils.py#L1581), every element must match within the given tolerance. However, very large arrays often contain a few outliers. With a small tolerance, the unit test only works for small arrays; with a large tolerance, the test becomes too relaxed for the majority of elements.

I'm proposing an additional parameter to assert_allclose() specifying the number of elements that may exceed the tolerance. The issue above can then be addressed by combining multiple assert_allclose() calls with different tolerances and outlier budgets. For example, we can assert that 99.9% of elements are within rtol=0.01 while every element is within rtol=100:
```python
# assume a & b are very large arrays; outlier_elements is the proposed new parameter
assert_allclose(a, b, rtol=0.01, outlier_elements=10)
assert_allclose(a, b, rtol=100, outlier_elements=0)
```
This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there's a lot of variations of the proposed ideas possible (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.
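To illustrate how this could live outside of numpy, here is a minimal sketch of such a helper built on `np.isclose`. The function name `assert_allclose_outliers` and the `max_outliers` parameter are hypothetical, not a NumPy API:

```python
import numpy as np

def assert_allclose_outliers(actual, desired, rtol=1e-7, atol=0, max_outliers=0):
    """Hypothetical helper: like assert_allclose, but tolerates up to
    ``max_outliers`` elements outside the given tolerance."""
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    # np.isclose applies the same |actual - desired| <= atol + rtol * |desired| test
    close = np.isclose(actual, desired, rtol=rtol, atol=atol)
    n_outliers = close.size - np.count_nonzero(close)
    if n_outliers > max_outliers:
        raise AssertionError(
            f"{n_outliers} elements outside tolerance "
            f"(rtol={rtol}, atol={atol}); allowed at most {max_outliers}"
        )
```

The two-level check from the issue would then be two calls: one with a tight tolerance and a small outlier budget, and one with a loose tolerance and `max_outliers=0`.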
> However, there's always outliers especially for very large arrays.
Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.
> This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there's a lot of variations of the proposed ideas possible (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.
Yea, I think it's OK to implement this outside of numpy. I was wondering whether this proposal could still allow outliers to be covered by two assert statements in a single test: one with a smaller tolerance that permits some outliers, and the other with a larger tolerance that allows zero outliers.
> > However, there's always outliers especially for very large arrays.
>
> Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.
The use case we have is testing that a plain, un-optimized implementation is close enough to an optimized one, e.g., that unquantized and quantized results agree. We generate a number of test cases randomly and compare the results. The tests are still deterministic (by fixing the random seeds), but it is hard to predict precisely which elements in each test case will be the outliers. Hence, I think assertions based on the error distribution can be useful.
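For a concrete sense of this use case, here is a small sketch: a deterministic test that round-trips data through a toy 8-bit quantization (the quantization scheme here is illustrative, not from the issue) and applies the two-level tolerance pattern with `np.isclose`:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed keeps the test deterministic
x = rng.normal(size=100_000).astype(np.float32)

# toy "optimized" path: round-trip through symmetric 8-bit quantization
scale = np.abs(x).max() / 127
x_quant = np.clip(np.round(x / scale), -127, 127) * scale

# two-level check in the spirit of the proposal: a tight tolerance that
# the bulk of the elements must satisfy, and a loose tolerance that
# every element must satisfy
close_tight = np.isclose(x_quant, x, rtol=0, atol=2 * scale)
assert close_tight.mean() >= 0.999
np.testing.assert_allclose(x_quant, x, rtol=0, atol=4 * scale)
```

Since the seed is fixed, the fraction of elements inside each tolerance band is stable across runs, which is what makes a distribution-based assertion reproducible.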