Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion on current issues #109

Closed
7 tasks done
tmadlener opened this issue Jun 29, 2020 · 21 comments
Closed
7 tasks done

Discussion on current issues #109

tmadlener opened this issue Jun 29, 2020 · 21 comments

Comments

@tmadlener
Copy link
Collaborator

tmadlener commented Jun 29, 2020

Since the past few weeks have revealed some smaller and larger issues in the current implementation of podio, I have tried to compile a list of things that I think should be addressed. I have split it into a major and minor issues part that mainly reflect how much work I estimate would be needed and how easily they could be addressed. This should mainly serve as a starting point for discussion and more things might get discovered along the way, while others might become obsolete. Also some of this points might be considered personal opinion or taste, but these were at least things I stumbled upon when starting to work with podio.

Major issues

Issues that require a considerable (i.e. non-negligible) amount of work or redesign to parts of podio and are also partially related to each other so that they can not be easily addressed in isolation

Creation of a collection returns a reference

EventStore::create() returns a reference to the created collection. Especially with auto this can be dangerous as omitting the & will turn this into a copy which will not have a relation to the EventStore. Hence, no element of the collection created on this copy will actually be known, tracked and stored by the EventStore.
Conversely Collection::create() returns a value type and trying to bind it to a reference will actually fail during compilation. This difference between collections and their elements has the potential for a lot of confusion, especially for less well versed newcomers in c++.
Additionally, since references are not easy to store in STL containers, it would be nice to have some sort of "value-type" wrapper similar to the elements of the collections.

Impossible to get a non-const handle to a collection from the EventStore

EventStore::get() always returns a const reference to the collection. Collections can only be modified in the process in which they are created which is not really a bad thing to have. However, in some cases it should potentially be possible. This is related to the issue that EventStore::create() returns a reference, which makes a const-cast necessary at the moment for the implementation of the DelphesEDM4HepConverter (https://github.com/key4hep/EDM4hep/blob/delphes/plugins/delphes/DelphesEDM4HepConverter.h#L185).

Impossible to re-write collections read from file

It is impossible to read collections from file and write them to another file. Related to this, I don't think it is possible to attach a Reader and a Writer to the same EventStore simultaneously at the moment. This should be possible, since it is a rather common task in reconstruction to read in some collections and produce some higher level collections from them.

API of IReader is incomplete (at least if it is meant to be public)

The current API of IReader seems to be mainly aimed at internal usage for low(er) level I/O operations rather than an actually public facing API.
Should we consider defining a "complete" (whatever complete means in this sense) API for e.g. IReader and IEventStore (or ICollectionProvider and IMetaDataProvider as it currently stands). This would then allow to more easily provide different but consistent solutions (e.g. I/O with ROOT and HDF5)

Canonical usage of ROOTReader leads to code duplication

The current canonical (?) way of reading files is to use ROOTReader::endOfEvent() to read the next event at the end of the current event (Additionally EventStore::clear() has to be called alongside it). Quite commonly some conditions are checked at the beginning of an event and if they are not met the rest of the event is skipped. This means that endOfEvent has to be called at every place such a skip can occur or otherwise no new event will ever be read. This could be potentially be achieved by a wrapper that uses RAII to read the next event once the old one goes out of scope, but currently that would probably very strongly couple the EventStore and the Reader.

Collections can only be valid if they are read from file

Minor issues

Smaller issues / bugs that can be easily addressed without too much work and (mostly) independent of each other

range-base for-loops for OneToManyRelations

Objects with OneToManyRelations should allow to loop over these relations using range-based for-loops. The current implementation only provides relation_begin() and relation_end() iterator ranges.

Const Objects provide invalid iterator ranges for relations

Infinite loop in ostream operator

Objects with OneToManyRelations to the same type that form a cycle (e.g. MCParticle daughter and mother) lead to an infinite loop in operator<<

podio does not emit a warning if a collection is not found

@hegner
Copy link
Collaborator

hegner commented Jun 29, 2020

thanks Thomas for this very nice summary. Let me comment on the first two, which are about design choices:

  • Returning a reference. The semantics of a reference are in principle the right ones. I agree though that it is way too easy to do the wrong thing on user side. So something to work on.
  • The constness of collections on return was a design decision based on experience in the LHC experiments. I know that the LCIO choice was different w/ the isMutable flag. Mutability is very bad in the long run when supporting multithreaded applications, and we should avoid it as much as possible.

The other comments are about yet missing or buggy implementations. I essentially agree with all your assessment. The IReader part though was indeed meant as purely internal. For exactly the use case you mentioned.

I am happy about all the feedback you give. As PODIO's design needs quite some feedback on usability. A few things are just still in sort of prototype state.

@tmadlener
Copy link
Collaborator Author

Hi Benedikt,

  • The constness of collections on return was a design decision based on experience in the LHC experiments. I know that the LCIO choice was different w/ the isMutable flag. Mutability is very bad in the long run when supporting multithreaded applications, and we should avoid it as much as possible.

I agree completely with this. I think the necessity for "non-const" access to the collections from the EventStore are currently also strongly related to the fact that creation returns an actual reference instead of some sort of handle which makes it hard to store in an STL container (even though std::reference_wrapper exists). At least for the DelphesEDM4HepConverter the issue mainly arises because we want to store all collections in one map as podio::CollectionBase*.

Nevertheless, I think we should consider the possible drawbacks of not having mutability. As far as I can see one could always work around non mutable collections by essentially copying an existing collection and change that on the fly. However, the feasibility of that probably depends on how often one actually would have to do that and also on the collections for which this would need to be done.

@joequant
Copy link
Contributor

joequant commented Jul 3, 2020

Would it be possible to deal with the reference issue for EventStore::create() by using boost::noncopyable or having an undefined copy constructor? This will trigger a compile error if some tries to copy a class.

@hegner
Copy link
Collaborator

hegner commented Jul 3, 2020

actually I tried to do a delete on the copy constructor:
https://github.com/AIDASoft/podio/blob/master/python/templates/Collection.h.template#L57

ROOT I/O did not appreciate this...

@vvolkl
Copy link
Contributor

vvolkl commented Jul 3, 2020

registerCollection is indeed an obvious way to avoid the const-casts in usecases like the delphes-conversion: vvolkl/EDM4HEP@98a0523

@joequant
Copy link
Contributor

joequant commented Jul 4, 2020

I can take a look at the copy constructor issue since I need to get myself familiar with both podio and root. Do you have some test code that I can look at to debug?

actually I tried to do a delete on the copy constructor:
https://github.com/AIDASoft/podio/blob/master/python/templates/Collection.h.template#L57

ROOT I/O did not appreciate this...

@hegner
Copy link
Collaborator

hegner commented Jul 4, 2020

Great! Maybe things changed since I tried last time.

@joequant
Copy link
Contributor

joequant commented Jul 5, 2020

What was the thing that broke? When I set the copy constructor for EventStore to delete, it compiles and unit test runs fine. I'm getting segfaults in root read and write, but this may or may not be due to the fact that I have a weird root set up. What did you see when you tried it?

One problem that I have is that I don't have a very stable set up so I'm using new compilers, weird dependencies etc. etc. so having feedback from someone with a less crazy setup will help debug the issue.

On another topic how dependent is podio on root?

@joequant
Copy link
Contributor

joequant commented Jul 5, 2020

I'm able to reproduce the root problem with Collections. Will take a look at this.

@joequant
Copy link
Contributor

joequant commented Jul 5, 2020

I'll need about two or three weeks to figure out how to work through the problem, but I think I can figure out how to make podio and root play nice with each other.

One thing that I'd suggest we do is to try to reduce the number of pieces that require processing outside of the standard compiler toolchain. My experience with other scientific software (IRAF and ratfor) is that people like to create preprocessors and external systems that do clever things, but that these systems become extraordinarily difficult to maintain after a few years.

@joequant
Copy link
Contributor

joequant commented Jul 5, 2020

i'm getting the following error running Read

'''
(base) [joe@big-apple tests (master)]$ ./read
Error in TTree::SetBranchAddress: unknown branch -> CollectionIDs
reading event 0
Error in TTree::SetBranchAddress: unknown branch -> evtMD
read UserEventName: - expected : event_number_0
terminate called after throwing an instance of 'std::runtime_error'
what(): Couldn't read event meta data parameters 'UserEventName'
'''

This might be due to the fact that I'm running a stripped down ROOT and I'm not including some object. While I'm working on podio, I'm also working on getting a minimal install of ROOT. So I might be not including some library, and if someone has ideas on what they might be, let me know.

@tmadlener
Copy link
Collaborator Author

(base) [joe@big-apple tests (master)]$ ./read
Error in TTree::SetBranchAddress: unknown branch -> CollectionIDs
reading event 0
Error in TTree::SetBranchAddress: unknown branch -> evtMD
read UserEventName: - expected : event_number_0
terminate called after throwing an instance of 'std::runtime_error'
what(): Couldn't read event meta data parameters 'UserEventName'

Hi @joequant
that reads like the file (or TTree) you are trying to read does not contain the necessary branches. Can you check whether the branches are in fact present? E.g. using python

import ROOT
f = ROOT.TFile.Open('your_file.root')
f.ls() # to see which TTrees are available

# Get one tree and print its branches
t = f.Get('metadata') # This is where the CollectionIDs should be
print([b.GetName() for b in t.GetListOfBranches()])

It is also possible to do this in the root command prompt using c++, but it is a bit more cumbersome because you will have to do some casts explicitly that are done implicitly in pyROOT


One thing that I'd suggest we do is to try to reduce the number of pieces that require processing outside of the standard compiler toolchain.

Which parts are you referring to here? Are there things outside the root things that are necessary to generate the dictionaries that root uses for I/O? I haven't really checked I must confess, so maybe there are some things that I miss currently.

@joequant
Copy link
Contributor

joequant commented Jul 6, 2020

Okay I think I know what's going on. I'm trying to read in root files that are corrupt when being written by write. When I run write, I get
'''
start processing
Error in TTree::Branch: The pointer specified for evtMD is not of a class known to ROOT and GenericParameters is not a known class
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "mcparticles#0" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "mcparticles#1" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "clusters#0" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
'''

So it looks like something is being misconfigured.

As far the other statement, what I meant was that something to look at is to make sure that as much of the processing is done by
something that's part of the compler rather than by any external script. Basically you want to minimize the amount of code that gets generated by an external script and have the compiler do as much of the processing as possible.

For example rather than have something like

class (stringsubstitution1)Class {
}

class (stringsubstitution2)Class {
}

You want something that looks more like

#define CLASS_TEMPLATE(x) (single code generated)

CLASS_TEMPLATE(substitution1)
CLASS_TEMPLATE(substitution2)

I haven't looked deeply into the code so I don't know if this is an issue, but it's something I've seen in other scientific packages
like IRAF. What happened with IRAF is that they had a lot of external code processing that generated code for f77, which turns out to be painful/impossible to convert to f95. So what I'm thinking about is to try to make things work so that it won't be a pain to port the system over to C++2040 and python9 (and yes, it's the nature of this type of code that it stays around for a long time.)

@andresailer
Copy link
Member

@joequant Looks like the location for the "rootmap" file for the dictionary is not in the LD_LIBRARY_PATH?

Do you just run write or the tests via ctest?

@tmadlener
Copy link
Collaborator Author

tmadlener commented Jul 6, 2020

Moved comment to #112 to try and minimize the cross talk here.

@joequant
Copy link
Contributor

joequant commented Jul 6, 2020

@joequant Looks like the location for the "rootmap" file for the dictionary is not in the LD_LIBRARY_PATH?

Do you just run write or the tests via ctest?

I just ran write. If it's because I ran the executables incorrectly, I can fix the code a bit to change the error messages.

@hegner
Copy link
Collaborator

hegner commented Jul 6, 2020

Hi!

Could we split this issue into multiple ones? We now have parallel threads going on. I created #111 and #112 to move parts of it.

Thanks,
Benedikt

@tmadlener
Copy link
Collaborator Author

Hi!

Could we split this issue into multiple ones? We now have parallel threads going on. I created #111 and #112 to move parts of it.

Thanks,
Benedikt

Yes, very good idea. I will move my comment to #112

@tmadlener
Copy link
Collaborator Author

For example rather than have something like

class (stringsubstitution1)Class {
}

class (stringsubstitution2)Class {
}

You want something that looks more like

#define CLASS_TEMPLATE(x) (single code generated)

CLASS_TEMPLATE(substitution1)
CLASS_TEMPLATE(substitution2)

The question is whether preprocessor macros are in the end easier to debug/update than a python script that generates the c++ code. I think the main problem with preprocessor macros is that it is not easily possible to have different behaviors for the different classes. The main thing here is the possibility to define relations among the different classes (e.g. OneToManyRelations and OneToOneRelations) which might be present or not. All of these will result in specific parts of the code to be included (and possibly repeated for several such relations), that is not necessary if such collections are not present.

In the end what most users see is the yaml file with the definition of the datamodel. It serves as the single entry point and from there the users shouldn't need to care too much how the code is generated from it.

At least me personally, I would rather debug/port a python script and template string files, then having to dig into deeply nested macros. But that is mainly also because I am way more familiar with python. I suspect that this is similar for large parts of the HEP community. However, I agree that the current implementation of the python generator could be made a bit more readable. But in general I think it is not using anything really fancy from the python world and works with what is available without having to pip install anything new. So at least from that point of view this should hopefully be pretty stable (unless python decides to do another 2to3 update).

Maybe @hegner also has some more insights into this topic.

@hegner
Copy link
Collaborator

hegner commented Jul 6, 2020

It was a conscious choice to use Python as code generation, as indeed multiple classes need to be updated in sync. The readability is an issue, as it just grew and grew...
What would be worth doing is adjusting the generated code to newer C++ features. I am sure a few tricks could be removed.

@tmadlener
Copy link
Collaborator Author

Closing this since essentially all of them have been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants