Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a parameter cloneRelations for clone to be able not to clone the collections #609

Merged
merged 8 commits into from
Jun 4, 2024

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented May 27, 2024

BEGINRELEASENOTES

  • Add a parameter cloneRelations for the clone method of the user facing handle classes to be able to clone without the relation information.

ENDRELEASENOTES

See this comment and this comment

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue this is intended behavior. Cloning specifically only copies the data and leaves the relations alone. Hence, it is technically user responsibility to fix up the relations. Mainly because there is no way to solve this in all generality.

I am also not entirely sure if the tests would still be working if they are done with the immutable objects. Because in that case clone default initializes the ObjectID, which is {untracked, untracked}.

coll.create();
coll[0].addparents(coll[0]);
coll[0].adddaughters(coll[1]);
coll[0].adddaughters(coll[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is its own daughter? Does this test still work if this is removed? I don't see a reason why it wouldn't but it looks a bit odd, IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works. Adding the object itself to the relation is also a possible test case maybe different from adding something else

@jmcarcell
Copy link
Member Author

jmcarcell commented May 27, 2024

I would argue this is intended behavior. Cloning specifically only copies the data and leaves the relations alone. Hence, it is technically user responsibility to fix up the relations. Mainly because there is no way to solve this in all generality.

The workflow for keeping the relations as they are in a cloned collection seems too complicated then:

  • Create a new collection without cloning the elements (we will need to update the relations and we can't remove stuff from there if they are cloned), so with new empty elements where we will have to copy manually all the data and all the other relations except the one we're updating
  • Now go through each of the objects in both the original and the cloned collection. Then go to the relation we are updating, we have to find the index from the original to each object so we can set it in the cloned so that the relation also holds in the new collection.

This makes merging collections and wanting to use them as if they were "original" ones quite cumbersome, no?

Allowing the pointers to be different allows an original and cloned object to diverge and still compare to true, even if the data is the same (which is not checked for in the PR but probably should) then the relations could be different.

I am also not entirely sure if the tests would still be working if they are done with the immutable objects. Because in that case clone default initializes the ObjectID, which is {untracked, untracked}.

This I didn't understand but I made some changes in unittest.cpp for untracked objects outside of collections.

@tmadlener
Copy link
Collaborator

  • Create a new collection without cloning the elements (we will need to update the relations and we can't remove stuff from there if they are cloned), so with new empty elements where we will have to copy manually all the data and all the other relations except the one we're updating

That is a good point, actually. I didn't think of that. So we would definitely need the possibility for the Mutable types to change their relations, resp. clear them, so that at leas the data is easily clone-able, and then relation fixing is actually possible. Potentially, we can give clone a parameter that just hands back new objects with empty relations as well to make things a bit more usable if necessary.

I think if the purpose is to clone a complete collection, then the functionality should be implemented on the collection level, no? E.g. if you clone a Track collection, and a TrackerHit collection, to which TrackerHit collection should the cloned tracks point? I am not sure there is a generally true answer here. And in that case, I would prefer to have the option to choose what I actually want to do.

MCParticle might be a special case here because usually there is exactly one of those per event and it is self-contained, so cloning that can be handled accordingly. However, that would then belong to the EDM and not in podio, because podio can't know about these special cases.

Allowing the pointers to be different allows an original and cloned object to diverge and still compare to true, even if the data is the same (which is not checked for in the PR but probably should) then the relations could be different.

This will just make things extremely complicated I think, because you have to keep track of quite a few things. Currently the user facing classes are rather simple handles and as long as the m_obj pointer is the same, you can be sure that they actually handle the same data (and relations). If we open that up comparisons become rather expensive because we have to compare everything, effectively. And since the m_obj pointers use user facing handles for the relation handling this will potentially create a rather large graph that you have to check.

I am also not entirely sure if the tests would still be working if they are done with the immutable objects. Because in that case clone default initializes the ObjectID, which is {untracked, untracked}.

This I didn't understand but I made some changes in unittest.cpp for untracked objects outside of collections.

What I meant was something like this:

const auto& coll = frame.get<ExampleMCCollection>("mcparticles"); // <-- Mainly for getting a const& to a collection
auto par = coll[0]; // <-- immutable
auto dau = coll[1]; 

auto cPar = par.clone();
auto cDau = dau.clone();

REQUIRE(cPar.daughters()[0] == cDau); // <-- does this still work?

@jmcarcell
Copy link
Member Author

jmcarcell commented May 27, 2024

That is a good point, actually. I didn't think of that. So we would definitely need the possibility for the Mutable types to change their relations, resp. clear them, so that at leas the data is easily clone-able, and then relation fixing is actually possible. Potentially, we can give clone a parameter that just hands back new objects with empty relations as well to make things a bit more usable if necessary.

I think if the purpose is to clone a complete collection, then the functionality should be implemented on the collection level, no? E.g. if you clone a Track collection, and a TrackerHit collection, to which TrackerHit collection should the cloned tracks point? I am not sure there is a generally true answer here. And in that case, I would prefer to have the option to choose what I actually want to do.

I have added a parameter to clone without the relations. Having a look at this I think Mutable types being able to change the relations is not needed; the use case we were discussing here would be after cloning the object since you would have relations to "old" objects. However, with the new parameter when cloning this shouldn't be necessary (or at least I don't know which other use case would require this) and it's cleaner to use clone(true) since you only copy what you need. This should make it a bit less cumbersome to clone a collection since all the data is cloned and then one has to deal only with the relations manually. But since there isn't a clear answer on how to clone the relations, this manual handling probably makes sense, or at least we can start from here.

For the future, I have been thinking about a .cloneCollection() that would only update the relations that are pointing to objects in the current collection (since how to update inter-collection relations doesn't have an answer).

tests/unittests/unittest.cpp Outdated Show resolved Hide resolved
python/templates/macros/declarations.jinja2 Outdated Show resolved Hide resolved
python/templates/macros/declarations.jinja2 Show resolved Hide resolved
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR title and the release notes also need an update to match what is actually done here, right?

@jmcarcell jmcarcell changed the title Fix the equal comparison for relations of cloned objects Add a parameter cloneRelations for clone to be able not to clone the collections Jun 3, 2024
@tmadlener
Copy link
Collaborator

I added a small figure that at least schematically shows what is happening plus a few sentences, such that there is at least a tiny bit of documentation together with this feature. I would wait for CI to pass and the merge this.

@hegner hegner self-requested a review June 4, 2024 11:11
@hegner hegner merged commit a4e684f into AIDASoft:master Jun 4, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants