Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array Updates #633

Open
12 of 20 tasks
joshessman-llnl opened this issue Aug 9, 2021 · 10 comments
Open
12 of 20 tasks

Array Updates #633

joshessman-llnl opened this issue Aug 9, 2021 · 10 comments
Assignees
Labels
Core Issues related to Axom's 'core' component GPU Issues related to GPU development Reviewed

Comments

@joshessman-llnl
Copy link
Member

joshessman-llnl commented Aug 9, 2021

From #627

I envision making the following changes in each PR (roughly in order)

Other (smaller?) feature requests/tasks:

@joshessman-llnl joshessman-llnl added Core Issues related to Axom's 'core' component GPU Issues related to GPU development labels Aug 9, 2021
@joshessman-llnl joshessman-llnl self-assigned this Aug 9, 2021
@joshessman-llnl joshessman-llnl changed the title Array Updats Array Updates Aug 9, 2021
@joshessman-llnl
Copy link
Member Author

joshessman-llnl commented Sep 16, 2021

Some caveats/gotchas/pitfalls that have come up as I've been working through the MCArray -> Array replacement:

  1. MCArray functions like size()/capacity()/etc returned the number of tuples and not the number of elements. In the n-dimensional case this would probably correspond to the number of rank n-1 subtensors. I've chosen to instead match numpy.ndarray and unconditionally return the number of elements - see https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html

  2. emplace as implemented in MCArray doesn't really correspond to the idea of emplace in the sense that the STL uses it, so I've opted not to add that functionality to Array. If it's needed my preference would be to pick a different name for it. Additionally, the MCArray::emplace interface is a little vague in my opinion - that is, it might not necessarily be clear to someone what it does on first glance. Do any widely used numerical container classes provide similar functionality?

  3. The semantics of "raw" insert() methods are pretty straightforward in the 1D case but significantly less so in the multidimensional case - if someone wants to insert of a single value (or an array of values of arbitrary length) at some arbitrary position in a multidimensional container, how would that affect the extents/dimensions/strides? My sense is we'd want to disallow it in the multidimensional case. I have implemented an insert(IndexType pos, Array other) (and a shortcut append(Array other)) that would allow the insertion of nonscalar data:

int[] values = {1, 2, 3, 4}; // 2x2 {{1,2},{3,4}}
Array<int, 2> arr(1, 2); // create a 2D array with extents 1x2
arr(0,0) = 5;
arr(0,1) = 6;
// TODO: this will be an ArrayView since we're only temporarily wrapping a pointer
arr.insert(0, Array<int, 2>(values, 2, 2)); // temporary 2D array wrapping "values" with extents 2x2
// will print [ 1 2 3 4 5 6 ]
std::cout << arr << "\n";

The above method requires that the array argument be of the same dimension of the calling object and have identical extents in all but the leading dimension (such that you could insert a 2x2 into a 3x2 to get a 5x2). Currently the "units" of the pos argument are number of elements but I think we'd either have to check that it aligns with a "block boundary" e.g. the above insertion would only make sense if pos was a multiple of 2 - or change the "units" to the size of the "rank n-1 subtensor" I describe above.

@samuelpmishLLNL
Copy link
Contributor

MCArray functions like size()/capacity()/etc returned the number of tuples and not the number of elements. In the n-dimensional
case this would probably correspond to the number of rank n-1 subtensors. I've chosen to instead match numpy.ndarray and
unconditionally return the number of elements - see https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html

I'm tempted to side with MCArray's interpretation, here. To me, container::size() is to C++ as len() is to python (i.e. the idiomatic way of getting the number of things in a list), and len(my_numpy_array) does return the "number of rank n-1 subtensors", not the total number of contained elements. So, if we're looking to numpy for inspiration, it might make more sense to copy their conventions rather than their naming directly.

Of course, it's important to be able to get the individual dimensions as well as the total number of items contained in the array.

@samuelpmishLLNL
Copy link
Contributor

Regarding points (2) and (3), I don't expect to do any inserting, appending, or emplacing in the multidimensional case at all. So, if it were up to me, I wouldn't even worry about implementing those except in the 1D case (and then, if users decide they are essential after all, they could be added later).

@white238
Copy link
Member

white238 commented Sep 16, 2021

My thoughts on 1.

I find it odd you would want to know how many tuples are in a multidimensional array. I would expect if i ask the size of a 2x2 array, that I would receive the answer 4 elements not 2 tuples. This just gets more confusing if you do a 3x3x3, is it 3, because that's the top level amount of tuples, or 9 tuple instead of 27.

I agree if this was an array of containers, I would expect the size to be the number of those containers. But the fact that it is tuples is an internal implementation detail, not something the user of the class should care about.

@joshessman-llnl
Copy link
Member Author

@samuelpmishLLNL Your point on len is a good one - I did want to clarify that regardless of what size() does that "number of rank n-1 subtensors" is always accessible via arr.shape()[0] - additionally when Views are introduced we could also have begin()/end() iterate over those subtensors instead of individual elements

As for inserting/appending/emplacing - it looks like at least the inserting/appending gets used fairly frequently by users of MCArray, though it's often tricky to tell if it's actually doing a multidimensional insertion or if it's actually a "trivially multidimensional" array where the trailing dimension is 1. Emplacing for a MCArray is roughly "inserting x tuples at y position where each element of each tuple has the value z" - though this looks like it only gets used a handful of places in mint.

@white238
Copy link
Member

white238 commented Sep 17, 2021

My two cents for what its worth... I personally feel size() and shape() serve different use cases in this class.

Given the fact that this is a multidimensional array, would a user expect to use size() in for loop or ask for the specific dimension via shape()?

With a 2d array:

for(int i = 0; i < arr.size(); ++i) { // size is number of tuples in this case
   std::tuple t = arr[i];
   for(int j = 0; j < t.size(); ++i) {
      std::cout << t[j] << std::endl;
   }
}

or:

for(int i = 0; i < arr.shape()[0]; ++i) {
   for(int j = 0; j < arr.shape()[1]; ++i) {
      std::cout << arr(i,j) << std::endl;
   }
}

@kennyweiss
Copy link
Member

kennyweiss commented Sep 20, 2021

I agree with @white238 here. Size should be the product of the dimensions.

See also: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html

EDIT: And @joshessman-llnl's original link (I should have read closer)
From personal experience, the original MCArray API was clunky to work with -- it was never clear when we were referring to the number of tuples of the full array.

@samuelpmishLLNL
Copy link
Contributor

I don't want to belabor this discussion too much, but it may be worth looking to the mdspan proposal for inspiration on naming, since that is imminently coming to C++ and aims to accomplish a very similar task. size and other members of an mdspan are defined unambiguously in the proposal:
Screen Shot 2021-09-20 at 7 48 40 PM

@joshessman-llnl
Copy link
Member Author

joshessman-llnl commented Sep 30, 2021

The decision to temporarily punt on removing MCArray from mint was made in the interest of not having that task hold up further development.

I am not completely sure of this but I would imagine the vast majority of that effort would be characterized by the following:

  • Changing calculations that would be off by a factor of num_components now that size() et al operate in terms of number of elements
  • Replacing the nonstandard (in terms of the numpy and STL containers that Array is intended to follow where possible) methods like MCArray::emplace that get used in mint

@publixsubfan
Copy link
Contributor

I noticed that ArrayView<T>'s constructor requires a non-const reference to Array<T>, because ArrayView<T> doesn't really protect the underlying data from being modified.

As a workaround, I think we should consider adding the ability to construct ArrayView<const T> from a const Array<T>. This would be akin to how std::span<const T> preserves immutability of the underlying data.

@rhornung67 rhornung67 added this to the FY22 Development milestone Mar 21, 2022
@kennyweiss kennyweiss modified the milestones: FY22 Development, FY23 Development Mar 20, 2023
@rhornung67 rhornung67 modified the milestones: FY23 Development, April 2024 Release Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Core Issues related to Axom's 'core' component GPU Issues related to GPU development Reviewed
Projects
None yet
Development

No branches or pull requests

6 participants