Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and fix for persisting a collection without its associated collections #235

Merged
merged 10 commits into from
Jan 24, 2022

Conversation

fdplacido
Copy link
Contributor

@fdplacido fdplacido commented Oct 13, 2021

BEGINRELEASENOTES

  • Fix crashes that happen when reading collections that have related objects in collections that have not been persisted.
  • Fix similar crashes for subset collections where the original collection has not been persisted.
    • The expected behavior in both cases is that podio does not crash when reading such collections, but only once the user tries to actually access such a missing object. Each object has an isAvailable function to guard against such crashes if need be.
  • Add a test that makes sure that the expected behavior is the one that is observed.
  • Fix a somewhat related bug in setReferences which was mistakenly a no-op for collections of a type without relations. Since this is the mechanism we use for restoring subset collections it obviously has to be present for all types.

ENDRELEASENOTES

Original problem:

  • Create some collection (cluster), then add an associated collection to it (addHit). Finally persist only the original collection but not its associated collections.
  • When reading these collections this will fail (segfault)
  • Expected behaviour: to be able to read the clusters without a crash, marking the associated collections as invalid.

I created a test that writes and reads some cluster collection, which will fails with a segfault.
Only the clusters are registered for write: the hits are not written. If hits are registered for write the test works without issue.
I pushed a fix which checks the result of collectionProvider->get(), and if this fails will just continue to the next referenced collection.

As an example, this would generated something like this:

void ExampleClusterCollectionData::setReferences(const podio::ICollectionProvider* collectionProvider, bool isSubsetColl) {
// ...
  // Normal collections have to resolve all relations
  for (unsigned int i = 0, size = m_refCollections[0]->size(); i != size; ++i) {
    const auto id = (*m_refCollections[0])[i];
    if (id.index != podio::ObjectID::invalid) {
      podio::CollectionBase* coll = nullptr;
      auto collProvResult = collectionProvider->get(id.collectionID, coll);
      if (collProvResult == false) { continue; }
      ExampleHitCollection* tmp_coll = static_cast<ExampleHitCollection*>(coll);
      const auto tmp = (*tmp_coll)[id.index];
      m_rel_Hits->emplace_back(tmp);
    } else {
      m_rel_Hits->emplace_back(nullptr);
    }
  }
// ...
}

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the suspicion that there was a problem with this.

Thanks for confirming that there indeed is one and for fixing it.

python/templates/macros/collections.jinja2 Outdated Show resolved Hide resolved
Comment on lines 84 to 87
// // auto cluster = clusters[0];
// // for (auto j = cluster.Hits_begin(), end = cluster.Hits_end(); j!=end; ++i){
// // std::cout << " Referenced hit has an energy of " << j->energy() << std::endl;
// // }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How far should this work. I am not entirely sure. Should we still be able to loop over all the hits, as long as we don't try to access any of them? something like

for (const auto hit : cluster.Hits()) {
  if (hit.isAvailable()) throw std::runtime_error("Hit is available, although it has not been written");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the example with your point here.
One can access the clusters, and loop over them in the collection. But then you are also able to loop over the hits in each cluster, because some hits were added. The problem would come when checking if the hit isAvailable(), this segfaults as the hit was never read.

This happens even if I do this:

      auto collGetResult = collectionProvider->get(id.collectionID, coll);
      if (collGetResult == false) {
        m_rel_Hits->emplace_back(nullptr);
        continue;
      }
      ExampleHitCollection* tmp_coll = static_cast<ExampleHitCollection*>(coll);

That looks like a more profound change to me.

@andresailer
Copy link
Member

What happens now if there are two Hit Collections and only one of them is written, but a cluster in principle has links to hits in both collections?

@fdplacido
Copy link
Contributor Author

Also, I wanted to follow the template by doing:

{% macro get_collection(type, name) %}
      podio::CollectionBase* coll = nullptr;
      auto collGetResult = collectionProvider->get(id.collectionID, coll);
      if (collGetResult == false) {
        m_rel_{{ name }}->emplace_back(nullptr);
        continue;
      }
      {{ type }}Collection* tmp_coll = static_cast<{{ type }}Collection*>(coll);
{%- endmacro %}

So this would be coherent with the check if (id.index != podio::ObjectID::invalid) {

But get_collection is used in https://github.com/AIDASoft/podio/blob/master/python/templates/CollectionData.cc.jinja2#L167
and there will not compile...

@fdplacido fdplacido changed the title Test and fix for persisting a collection without its associated collections [WIP] Test and fix for persisting a collection without its associated collections Oct 13, 2021
@tmadlener
Copy link
Collaborator

Yeah, that is a left over from #197 in some sense, the get_collection jinja2 macro was used only internally before, but has become somewhat "public" there. The problem is that here the scope of the jinja template and the c++ scope are mixed a bit weirdly, which makes the whole thing pretty brittle and not really re-usable outside of a few specific contexts.

@tmadlener
Copy link
Collaborator

What happens now if there are two Hit Collections and only one of them is written, but a cluster in principle has links to hits in both collections?

Relations are resolved one by one, so that in that case the cluster will still have all hits that have been present before writing. However, only those that have actually been written will also be available. I.e. in principle you would have to query each hit, whether it is available or not, before trying to access it.

@hegner
Copy link
Collaborator

hegner commented Oct 14, 2021

yap. That's correct. No way around explicit checking whenever you want to access things referenced to.

@fdplacido
Copy link
Contributor Author

fdplacido commented Oct 14, 2021

I updated the example so it fails when accessing the hits, as expected. See: #235 (comment)

So this "fix" would leave it up to the user to not persist some referenced collections, and then the user needs to take care of not accessing the collections that were never persisted.

@@ -53,7 +53,7 @@ const std::array<{{ member.full_type }}, arraysize> {{ class.bare_type }}Collect

{% macro get_collection(type) %}
podio::CollectionBase* coll = nullptr;
collectionProvider->get(id.collectionID, coll);
if (!collectionProvider->get(id.collectionID, coll)) { continue; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this does not succeed, then you will still have to do an emplace an empty object into the m_rel_{{ relation.name }} vector, because otherwise the indexing that is used under the hood breaks later. This is the segfault that we currently get. Basically an out of bounds access of a vector.

An attempt at an explanation. The basic layout of things that are involved look like this:

struct ClusterObj { // simplified to the bare minimum
  // ...
  vector<Hit>* m_rel_hits;
}; 

struct ClusterCollectionData {
  // ...
  vector<Hit> m_rel_hits; // this is where the related hits of ALL clusters live
};

struct ClusterData {
  // ...
  unsigned Hits_begin; // these are indices in 
  unsigned Hits_end;   // the vector in the CollectionData!
};

The important part is that all Clusters from a collection point to the same vector<Hit> that lives in the ClusterCollectionData, and each individual cluster only uses the begin and end indices from its ClusterData to identify the hits that belong to it. So it is important that we always create a Hit and put that into the m_rel_hits vector, even if that Hit is entirely useless (but at least we can call isAvailable on it).

@fdplacido
Copy link
Contributor Author

I implemented the changes by not using the get_collections macro, directly into the set_reference_single_relation and set_references_multi_relation macros. So now if getting the hit fails it will m_rel_{{ relation.name }}->emplace_back(nullptr); or entries[i]->m_{{ relation.name }} = nullptr; respectively.

I reverted get_collections to its original form as it is used elsewhere.

It will now segfault on emplacing_back.

@fdplacido
Copy link
Contributor Author

fdplacido commented Oct 19, 2021

The test now also creates a subset collection and adds hits that are not written to it.
Then this subset collection is read with no issue.

But, the behaviour of the subset and non-subset collections differs:

  • The non-subset collection can loop over its elements (the clusterCollection has clusters), and then on checking if isAvailable() will return false.
  • The subset collection will not loop over its elements (the hitCollection should have hits) because its size() is 0, so it doesn't even get to check availability.

I would expect the behaviour to be the same in both cases.

@tmadlener
Copy link
Collaborator

I would naively expect that everything should work the same, i.e. if there is a subset collection, then it should have the same size when writing and when reading. Also because it is possible that a subset collection has elements that live in different (original) collections and some might be written while others might not. Additionally, if you only want to get a number of elements (e.g. the number of muons in an event), I think it makes sense that the size stays unchanged, even if the underlying data cannot be accessed.

@fdplacido
Copy link
Contributor Author

I created an issue to continue and track the inconsistent use of collections and subset collections here: #244

So this PR can be merged.

@fdplacido fdplacido changed the title [WIP] Test and fix for persisting a collection without its associated collections Test and fix for persisting a collection without its associated collections Nov 4, 2021
@tmadlener tmadlener force-pushed the testing_persist_no_associated_coll branch from e81785c to e7941f5 Compare December 3, 2021 13:11
@tmadlener
Copy link
Collaborator

Fixed the usage of Mutable classes now that #205 has been merged, and rebased everything onto the latest master.

Additionally, subset collections now behave the same as normal relations: I.e. if the referred to objects cannot be found they are simply not available, but the subset collection will still have the same size as before.

There was another problem that I had to fix, which seemed to have slipped through our tests until now: the setReferences functionality did only do any actual work for collections of types with at least one relation. The problem is that setReferences is also responsible for recreating the subset collections when reading. Hence, it should always do something.

@tmadlener
Copy link
Collaborator

@hegner also this would be nice to have before the newer developments start.

@tmadlener tmadlener force-pushed the testing_persist_no_associated_coll branch from e7941f5 to bb27ad3 Compare January 20, 2022 09:36
@hegner hegner merged commit 707bbda into AIDASoft:master Jan 24, 2022
@hegner
Copy link
Collaborator

hegner commented Jan 24, 2022

Thanks @fdplacido and @tmadlener !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent collection elements iteration between subset and non-subset collections
4 participants