Test and fix for persisting a collection without its associated collections #235

fdplacido · 2021-10-13T15:23:09Z

BEGINRELEASENOTES

Fix crashes that happen when reading collections that have related objects in collections that have not been persisted.
Fix similar crashes for subset collections where the original collection has not been persisted.
- The expected behavior in both cases is that podio does not crash when reading such collections, but only once the user tries to actually access such a missing object. Each object has an isAvailable function to guard against such crashes if need be.
Add a test that makes sure that the expected behavior is the one that is observed.
Fix a somewhat related bug in setReferences which was mistakenly a no-op for collections of a type without relations. Since this is the mechanism we use for restoring subset collections it obviously has to be present for all types.

ENDRELEASENOTES

Original problem:

Create some collection (cluster), then add an associated collection to it (addHit). Finally persist only the original collection but not its associated collections.
When reading these collections this will fail (segfault)
Expected behaviour: to be able to read the clusters without a crash, marking the associated collections as invalid.

I created a test that writes and reads some cluster collection, which will fails with a segfault.
Only the clusters are registered for write: the hits are not written. If hits are registered for write the test works without issue.
I pushed a fix which checks the result of collectionProvider->get(), and if this fails will just continue to the next referenced collection.

As an example, this would generated something like this:

void ExampleClusterCollectionData::setReferences(const podio::ICollectionProvider* collectionProvider, bool isSubsetColl) {
// ...
  // Normal collections have to resolve all relations
  for (unsigned int i = 0, size = m_refCollections[0]->size(); i != size; ++i) {
    const auto id = (*m_refCollections[0])[i];
    if (id.index != podio::ObjectID::invalid) {
      podio::CollectionBase* coll = nullptr;
      auto collProvResult = collectionProvider->get(id.collectionID, coll);
      if (collProvResult == false) { continue; }
      ExampleHitCollection* tmp_coll = static_cast<ExampleHitCollection*>(coll);
      const auto tmp = (*tmp_coll)[id.index];
      m_rel_Hits->emplace_back(tmp);
    } else {
      m_rel_Hits->emplace_back(nullptr);
    }
  }
// ...
}

tmadlener

I had the suspicion that there was a problem with this.

Thanks for confirming that there indeed is one and for fixing it.

python/templates/macros/collections.jinja2

tmadlener · 2021-10-13T15:40:54Z

tests/read_and_write_associated.cpp

+      // // auto cluster = clusters[0];
+      // // for (auto j = cluster.Hits_begin(), end = cluster.Hits_end(); j!=end; ++i){
+        // // std::cout << "  Referenced hit has an energy of " << j->energy() << std::endl;
+      // // }


How far should this work. I am not entirely sure. Should we still be able to loop over all the hits, as long as we don't try to access any of them? something like

for (const auto hit : cluster.Hits()) { if (hit.isAvailable()) throw std::runtime_error("Hit is available, although it has not been written"); }

I updated the example with your point here.
One can access the clusters, and loop over them in the collection. But then you are also able to loop over the hits in each cluster, because some hits were added. The problem would come when checking if the hit isAvailable(), this segfaults as the hit was never read.

This happens even if I do this:

auto collGetResult = collectionProvider->get(id.collectionID, coll); if (collGetResult == false) { m_rel_Hits->emplace_back(nullptr); continue; } ExampleHitCollection* tmp_coll = static_cast<ExampleHitCollection*>(coll);

That looks like a more profound change to me.

andresailer · 2021-10-13T15:51:25Z

What happens now if there are two Hit Collections and only one of them is written, but a cluster in principle has links to hits in both collections?

fdplacido · 2021-10-13T16:11:27Z

Also, I wanted to follow the template by doing:

{% macro get_collection(type, name) %}
      podio::CollectionBase* coll = nullptr;
      auto collGetResult = collectionProvider->get(id.collectionID, coll);
      if (collGetResult == false) {
        m_rel_{{ name }}->emplace_back(nullptr);
        continue;
      }
      {{ type }}Collection* tmp_coll = static_cast<{{ type }}Collection*>(coll);
{%- endmacro %}

So this would be coherent with the check if (id.index != podio::ObjectID::invalid) {

But get_collection is used in https://github.com/AIDASoft/podio/blob/master/python/templates/CollectionData.cc.jinja2#L167
and there will not compile...

tmadlener · 2021-10-13T16:27:52Z

Yeah, that is a left over from #197 in some sense, the get_collection jinja2 macro was used only internally before, but has become somewhat "public" there. The problem is that here the scope of the jinja template and the c++ scope are mixed a bit weirdly, which makes the whole thing pretty brittle and not really re-usable outside of a few specific contexts.

tmadlener · 2021-10-14T07:43:53Z

What happens now if there are two Hit Collections and only one of them is written, but a cluster in principle has links to hits in both collections?

Relations are resolved one by one, so that in that case the cluster will still have all hits that have been present before writing. However, only those that have actually been written will also be available. I.e. in principle you would have to query each hit, whether it is available or not, before trying to access it.

hegner · 2021-10-14T07:47:58Z

yap. That's correct. No way around explicit checking whenever you want to access things referenced to.

fdplacido · 2021-10-14T09:22:26Z

I updated the example so it fails when accessing the hits, as expected. See: #235 (comment)

So this "fix" would leave it up to the user to not persist some referenced collections, and then the user needs to take care of not accessing the collections that were never persisted.

tmadlener · 2021-10-14T10:50:38Z

python/templates/macros/collections.jinja2

@@ -53,7 +53,7 @@ const std::array<{{ member.full_type }}, arraysize> {{ class.bare_type }}Collect

 {% macro get_collection(type) %}
      podio::CollectionBase* coll = nullptr;
-      collectionProvider->get(id.collectionID, coll);
+      if (!collectionProvider->get(id.collectionID, coll)) { continue; }


If this does not succeed, then you will still have to do an emplace an empty object into the m_rel_{{ relation.name }} vector, because otherwise the indexing that is used under the hood breaks later. This is the segfault that we currently get. Basically an out of bounds access of a vector.

An attempt at an explanation. The basic layout of things that are involved look like this:

struct ClusterObj { // simplified to the bare minimum // ... vector<Hit>* m_rel_hits; }; struct ClusterCollectionData { // ... vector<Hit> m_rel_hits; // this is where the related hits of ALL clusters live }; struct ClusterData { // ... unsigned Hits_begin; // these are indices in unsigned Hits_end; // the vector in the CollectionData! };

The important part is that all Clusters from a collection point to the same vector<Hit> that lives in the ClusterCollectionData, and each individual cluster only uses the begin and end indices from its ClusterData to identify the hits that belong to it. So it is important that we always create a Hit and put that into the m_rel_hits vector, even if that Hit is entirely useless (but at least we can call isAvailable on it).

fdplacido · 2021-10-14T16:43:37Z

I implemented the changes by not using the get_collections macro, directly into the set_reference_single_relation and set_references_multi_relation macros. So now if getting the hit fails it will m_rel_{{ relation.name }}->emplace_back(nullptr); or entries[i]->m_{{ relation.name }} = nullptr; respectively.

I reverted get_collections to its original form as it is used elsewhere.

It will now segfault on emplacing_back.

fdplacido · 2021-10-19T13:59:21Z

The test now also creates a subset collection and adds hits that are not written to it.
Then this subset collection is read with no issue.

But, the behaviour of the subset and non-subset collections differs:

The non-subset collection can loop over its elements (the clusterCollection has clusters), and then on checking if isAvailable() will return false.
The subset collection will not loop over its elements (the hitCollection should have hits) because its size() is 0, so it doesn't even get to check availability.

I would expect the behaviour to be the same in both cases.

tmadlener · 2021-10-19T15:25:37Z

I would naively expect that everything should work the same, i.e. if there is a subset collection, then it should have the same size when writing and when reading. Also because it is possible that a subset collection has elements that live in different (original) collections and some might be written while others might not. Additionally, if you only want to get a number of elements (e.g. the number of muons in an event), I think it makes sense that the size stays unchanged, even if the underlying data cannot be accessed.

fdplacido · 2021-11-02T14:24:31Z

I created an issue to continue and track the inconsistent use of collections and subset collections here: #244

So this PR can be merged.

tmadlener · 2021-12-03T13:16:17Z

Fixed the usage of Mutable classes now that #205 has been merged, and rebased everything onto the latest master.

Additionally, subset collections now behave the same as normal relations: I.e. if the referred to objects cannot be found they are simply not available, but the subset collection will still have the same size as before.

There was another problem that I had to fix, which seemed to have slipped through our tests until now: the setReferences functionality did only do any actual work for collections of types with at least one relation. The problem is that setReferences is also responsible for recreating the subset collections when reading. Hence, it should always do something.

tmadlener · 2021-12-03T13:31:36Z

@hegner also this would be nice to have before the newer developments start.

Co-authored-by: Thomas Madlener <thomas.madlener@desy.de>

…tions; get_collections reverted to original form

…ollection

Includes a fix to the setReferences behavior, which previously was only available for types with at least one relation (this slipped through until now for some reason).

hegner · 2022-01-24T08:31:02Z

Thanks @fdplacido and @tmadlener !

tmadlener reviewed Oct 13, 2021

View reviewed changes

fdplacido changed the title ~~Test and fix for persisting a collection without its associated collections~~ [WIP] Test and fix for persisting a collection without its associated collections Oct 13, 2021

tmadlener reviewed Oct 14, 2021

View reviewed changes

fdplacido mentioned this pull request Nov 2, 2021

Inconsistent collection elements iteration between subset and non-subset collections #244

Closed

fdplacido changed the title ~~[WIP] Test and fix for persisting a collection without its associated collections~~ Test and fix for persisting a collection without its associated collections Nov 4, 2021

tmadlener force-pushed the testing_persist_no_associated_coll branch from e81785c to e7941f5 Compare December 3, 2021 13:11

tmadlener linked an issue Dec 3, 2021 that may be closed by this pull request

Inconsistent collection elements iteration between subset and non-subset collections #244

Closed

andresailer requested a review from hegner January 18, 2022 12:30

fdplacido and others added 10 commits January 20, 2022 10:31

add failing test for persisting associated collections

e90a736

skip relations if getting them returns false

4c1b2a1

Simplify check into one line

8a47285

Co-authored-by: Thomas Madlener <thomas.madlener@desy.de>

loop over clusters and hits in clusters

9196cbf

emplace a nullptr to vector if getting fails, use specific implementa…

35bbea2

…tions; get_collections reverted to original form

Check in constructors if underlying obj is not null

fd17fba

add subset collection hits for test. Formatting

57839aa

Change error message for subsets collection not being elements of a c…

43474b4

…ollection

write subset hits collection and try to read them

3e4f1dc

Make subset collections behave the same as other relations

bb27ad3

Includes a fix to the setReferences behavior, which previously was only available for types with at least one relation (this slipped through until now for some reason).

tmadlener force-pushed the testing_persist_no_associated_coll branch from e7941f5 to bb27ad3 Compare January 20, 2022 09:36

tmadlener approved these changes Jan 24, 2022

View reviewed changes

hegner merged commit 707bbda into AIDASoft:master Jan 24, 2022

tmadlener mentioned this pull request Jan 24, 2022

Ignore new test in sanitizer builds #256

Merged

tmadlener mentioned this pull request Sep 15, 2023

Frame based I/O crashes when reading objects with related object that were not persisted #487

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test and fix for persisting a collection without its associated collections #235

Test and fix for persisting a collection without its associated collections #235

fdplacido commented Oct 13, 2021 •

edited by tmadlener

Loading

tmadlener left a comment

tmadlener Oct 13, 2021

fdplacido Oct 14, 2021

andresailer commented Oct 13, 2021

fdplacido commented Oct 13, 2021

tmadlener commented Oct 13, 2021

tmadlener commented Oct 14, 2021

hegner commented Oct 14, 2021

fdplacido commented Oct 14, 2021 •

edited

Loading

tmadlener Oct 14, 2021

fdplacido commented Oct 14, 2021

fdplacido commented Oct 19, 2021 •

edited

Loading

tmadlener commented Oct 19, 2021

fdplacido commented Nov 2, 2021

tmadlener commented Dec 3, 2021

tmadlener commented Dec 3, 2021

hegner commented Jan 24, 2022

Test and fix for persisting a collection without its associated collections #235

Test and fix for persisting a collection without its associated collections #235

Conversation

fdplacido commented Oct 13, 2021 • edited by tmadlener Loading

tmadlener left a comment

Choose a reason for hiding this comment

tmadlener Oct 13, 2021

Choose a reason for hiding this comment

fdplacido Oct 14, 2021

Choose a reason for hiding this comment

andresailer commented Oct 13, 2021

fdplacido commented Oct 13, 2021

tmadlener commented Oct 13, 2021

tmadlener commented Oct 14, 2021

hegner commented Oct 14, 2021

fdplacido commented Oct 14, 2021 • edited Loading

tmadlener Oct 14, 2021

Choose a reason for hiding this comment

fdplacido commented Oct 14, 2021

fdplacido commented Oct 19, 2021 • edited Loading

tmadlener commented Oct 19, 2021

fdplacido commented Nov 2, 2021

tmadlener commented Dec 3, 2021

tmadlener commented Dec 3, 2021

hegner commented Jan 24, 2022

fdplacido commented Oct 13, 2021 •

edited by tmadlener

Loading

fdplacido commented Oct 14, 2021 •

edited

Loading

fdplacido commented Oct 19, 2021 •

edited

Loading