In assert_allclose (https://github.com/numpy/numpy/blob/main/numpy/testing/_private/utils.py#L1581), every element must match within the given tolerance. However, very large arrays often contain a few outliers. With a small tolerance, the unit test only works for small arrays; with a large tolerance, the test becomes too relaxed for the majority of elements.

I'm proposing an additional parameter to assert_allclose() specifying the number of elements that may exceed the tolerance. The issue above can then be addressed by combining multiple assert_allclose() calls with different tolerances and outlier budgets. For example, we can assert that 99.9% of elements are within rtol=0.01 while every element is within rtol=100:
```python
# assume a & b are very large arrays; outlier_elements is the proposed new parameter
assert_allclose(a, b, rtol=0.01, outlier_elements=10)
assert_allclose(a, b, rtol=100, outlier_elements=0)
```
This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there's a lot of variations of the proposed ideas possible (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.
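To illustrate how this could live outside of numpy, here is a minimal sketch of such a helper built on `np.isclose`. The function name `assert_allclose_outliers` and the `max_outliers` parameter are hypothetical, not a NumPy API:

```python
import numpy as np

def assert_allclose_outliers(actual, desired, rtol=1e-7, atol=0, max_outliers=0):
    """Hypothetical helper: like assert_allclose, but tolerates up to
    ``max_outliers`` elements outside the given tolerance."""
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    # np.isclose applies the same |actual - desired| <= atol + rtol * |desired| test
    close = np.isclose(actual, desired, rtol=rtol, atol=atol)
    n_outliers = close.size - np.count_nonzero(close)
    if n_outliers > max_outliers:
        raise AssertionError(
            f"{n_outliers} elements outside tolerance "
            f"(rtol={rtol}, atol={atol}); allowed at most {max_outliers}"
        )
```

The two-level check from the issue would then be two calls: one with a tight tolerance and a small outlier budget, and one with a loose tolerance and `max_outliers=0`.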
> However, there's always outliers especially for very large arrays.
Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.
> This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there's a lot of variations of the proposed ideas possible (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.
Yea, I think it's OK to implement this outside of numpy. I was wondering whether this proposal could still allow outliers to be covered by two assert statements in a single test: one with a smaller tolerance that permits some outliers, and the other with a larger tolerance that allows zero outliers.
> > However, there's always outliers especially for very large arrays.
>
> Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.
The use case we have is testing that a plain, un-optimized implementation is close enough to an optimized one, e.g., that unquantized and quantized results agree. We generate a number of test cases randomly and compare the results. The tests are still deterministic (by fixing the random seeds), but it is hard to predict precisely which elements in each test case will be the outliers. Hence, I think assertions based on the error distribution can be useful.
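For a concrete sense of this use case, here is a small sketch: a deterministic test that round-trips data through a toy 8-bit quantization (the quantization scheme here is illustrative, not from the issue) and applies the two-level tolerance pattern with `np.isclose`:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed keeps the test deterministic
x = rng.normal(size=100_000).astype(np.float32)

# toy "optimized" path: round-trip through symmetric 8-bit quantization
scale = np.abs(x).max() / 127
x_quant = np.clip(np.round(x / scale), -127, 127) * scale

# two-level check in the spirit of the proposal: a tight tolerance that
# the bulk of the elements must satisfy, and a loose tolerance that
# every element must satisfy
close_tight = np.isclose(x_quant, x, rtol=0, atol=2 * scale)
assert close_tight.mean() >= 0.999
np.testing.assert_allclose(x_quant, x, rtol=0, atol=4 * scale)
```

Since the seed is fixed, the fraction of elements inside each tolerance band is stable across runs, which is what makes a distribution-based assertion reproducible.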