ENH: assert_allclose to allow element number tolerance #26782

Open
tankbattle opened this issue Jun 22, 2024 · 2 comments

@tankbattle

Proposed new feature or change:

`assert_allclose` (https://github.com/numpy/numpy/blob/main/numpy/testing/_private/utils.py#L1581) requires every element to match within the tolerance. However, there are always outliers, especially for very large arrays. If we use a small tolerance, unit tests only pass for small arrays; if we use a large tolerance, the test may be too relaxed for the majority of the elements.

I'm proposing an additional parameter for `assert_allclose()` that specifies how many elements may fall outside the tolerance. A test can then address the issue above with multiple `assert_allclose()` calls, each with a different tolerance and number of allowed outliers. For example, we can assert that 99.9% of the elements are within rtol=0.01 while all elements are within rtol=100:

```python
from numpy.testing import assert_allclose

# assume a and b are very large arrays; outlier_elements is the proposed
# parameter and does not exist in NumPy's assert_allclose today
assert_allclose(a, b, rtol=0.01, outlier_elements=10)
assert_allclose(a, b, rtol=100, outlier_elements=0)
```
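A minimal sketch of the proposed semantics, written outside NumPy on top of `np.isclose`, which applies the same elementwise test as `assert_allclose` (`|actual - desired| <= atol + rtol * |desired|`) but with different defaults. The helper name `assert_allclose_with_outliers` and its `max_outliers` parameter are hypothetical stand-ins for the proposed `outlier_elements`; unlike `assert_allclose`, this sketch relies on `np.isclose` broadcasting instead of checking that shapes match, and it prints no per-element report.

```python
import numpy as np


def assert_allclose_with_outliers(actual, desired, rtol=1e-7, atol=0.0,
                                  max_outliers=0, equal_nan=True):
    """Hypothetical helper (not part of NumPy): like
    np.testing.assert_allclose, but tolerate up to `max_outliers` elements
    that fall outside the tolerance."""
    # Elementwise test used by assert_allclose:
    #   |actual - desired| <= atol + rtol * |desired|
    # np.isclose applies the same formula; its defaults differ, so pass
    # rtol, atol and equal_nan explicitly.
    close = np.isclose(actual, desired, rtol=rtol, atol=atol,
                       equal_nan=equal_nan)
    n_outliers = close.size - np.count_nonzero(close)
    if n_outliers > max_outliers:
        raise AssertionError(
            f"{n_outliers} of {close.size} elements exceed "
            f"rtol={rtol}, atol={atol} (allowed outliers: {max_outliers})"
        )


# Usage mirroring the proposal: 10 outliers in 10,000 elements (99.9% close).
desired = np.ones(10_000)
actual = desired.copy()
actual[:10] = 1.5  # 10 elements that are 50% off
assert_allclose_with_outliers(actual, desired, rtol=0.01, max_outliers=10)
assert_allclose_with_outliers(actual, desired, rtol=100, max_outliers=0)
```

Counting failures from `np.isclose` keeps the per-element pass/fail rule identical to `assert_allclose`; only the aggregate decision changes, from "no failures" to "at most `max_outliers` failures".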

@rgommers
Member

Thanks for the proposal @tankbattle.

This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there are many possible variations of the proposed idea (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.

> However, there are always outliers, especially for very large arrays.

Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.

@tankbattle
Author

tankbattle commented Jun 23, 2024

Thanks @rgommers !

> This shouldn't be too difficult to do outside of numpy, right? It's a pretty niche feature request, and there are many possible variations of the proposed idea (e.g., I imagine one may want tolerances for how much the outliers may be off), so I'd much prefer to see it implemented outside of numpy.

Yeah, I think it's OK to implement this outside of numpy. My thinking was that the proposal still lets a single test cover outliers with two assert statements: one with a smaller tolerance that allows some outliers, and one with a larger tolerance that allows zero outliers.

> > However, there are always outliers, especially for very large arrays.
>
> Actually no, this is quite rare for testing use cases. It is more common for real-world data, but that is not what numpy.testing is meant to be used for.

Our use case is testing that a plain, unoptimized implementation is close enough to an optimized one, e.g. that quantized results are close enough to unquantized ones. We generate a number of test cases randomly and compare the results. The tests are still deterministic (we fix the random keys), but it's hard to say in advance which elements in each test case will be the outliers. Hence, I think assertions based on the error distribution can be useful.
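One way to express an assertion based on the error distribution is to check several points of the empirical CDF of the relative error in a single call. This is a sketch under stated assumptions: `assert_error_distribution`, its `(max_rel_err, min_fraction)` threshold pairs, and the rounding-based stand-in for quantization are all hypothetical, and the relative error is measured against `desired` only.

```python
import numpy as np


def assert_error_distribution(actual, desired, thresholds, tiny=1e-30):
    """Hypothetical helper (not part of NumPy).

    `thresholds` is a sequence of (max_rel_err, min_fraction) pairs: for each
    pair, at least `min_fraction` of the elements must have a relative error
    of at most `max_rel_err`.
    """
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    # Relative error w.r.t. the reference; `tiny` guards against division
    # by zero where `desired` is exactly 0. NaN errors count as failures
    # because NaN <= x is False.
    rel_err = np.abs(actual - desired) / np.maximum(np.abs(desired), tiny)
    for max_rel_err, min_fraction in thresholds:
        fraction = np.mean(rel_err <= max_rel_err)
        if fraction < min_fraction:
            raise AssertionError(
                f"only {fraction:.4%} of elements have relative error "
                f"<= {max_rel_err}, expected at least {min_fraction:.4%}"
            )


# Deterministic stand-in for "optimized vs. reference" results: round the
# reference to a 1/256 grid (a crude quantization), then corrupt 10 elements.
rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 2.0, size=100_000)
optimized = np.round(reference * 256) / 256
optimized[:10] *= 2.0  # 10 outliers with roughly 100% relative error

assert_error_distribution(
    optimized,
    reference,
    thresholds=[
        (0.002, 0.999),  # at least 99.9% of elements within 0.2%
        (1.5, 1.0),      # every element within 150%
    ],
)
```

Checking fractions directly avoids the interpolation choices that `np.quantile` makes (its `method` argument), which matter when the allowed number of outliers sits exactly on a quantile boundary.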
