-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype version of Frame I/O #287
Conversation
acb2e41
to
45ccc2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good. I only have minor comments so far.
What we should make sure is to have a decent internal documentation about the concepts there.
Could you work on the TODOs listed in the PR? Namely some docs and release notes? The unpacking we can put into another ticket. Thanks!
include/podio/ROOTFrameReader.h
Outdated
void openFiles(const std::vector<std::string>& filenames); | ||
|
||
/// Read all collections requested | ||
std::unique_ptr<podio::ROOTRawData> readNextFrame(const std::string& category); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what behaviour do you want to have in case of end of iteration? The doc string should reflect that
include/podio/ROOTFrameReader.h
Outdated
std::unique_ptr<podio::ROOTRawData> readNextFrame(const std::string& category); | ||
|
||
/// Returns number of entries for the given category | ||
unsigned getEntries(const std::string& category) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the term category is a bit jargon. You should explain a bit better what that is/
* Helper struct to group together all the necessary state to read / process a | ||
* given category. | ||
*/ | ||
struct CategoryInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again - explain "category"
* The name of the meta data tree in podio ROOT files. This tree mainly stores | ||
* meta data that is necessary for ROOT based I/O. | ||
*/ | ||
constexpr static auto metaTreeName = "podio_metadata"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure such information is ROOT specific. It is part of the file schema and could maybe go elsewhere
podio::version::Version currentFileVersion() const { | ||
return m_fileVersion; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Returns stored collections in this category | |
std::map<std::string, detail::CollectionInfo> storedClasses(const std::string& catName); |
|
||
return brNames; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::map<std::string, detail::CollectionInfo> ROOTFrameReader::storedClasses(const std::string& catName) { | |
std::map<std::string, detail::CollectionInfo> colls_info; | |
if (auto& catInfo = getCategoryInfo(catName); catInfo.table) { | |
for (const auto& [col, col_info] : catInfo.storedClasses) | |
colls_info[col] = col_info; | |
} | |
return colls_info; | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmadlener Very nice work around these frames! May I suggest this addition to ease their integration into consuming workflows? This allows the access to the TClass
objects as retrieved from the TChain
on readout, and thus allows retrieving a human- (and RDataSource
-) readable version of each collection type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @forthommel,
The detail::CollectionInfo
are currently really an implementation detail of the ROOTFrameReader
and I am not entirely sure if they should be exposed in an interface in their current form. Definitely not while they are still in the detail
namespace. What is the information that is actually necessary for RDataSource
to work nicely here? I would prefer a solution that only exposes what is really necessary instead of spilling everything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tmadlener, and thanks for the feedback!
What my scheme would need is the collection/category type with name catName
, as stored in the TChain
. Unless it is already exposed elsewhere in the API and I missed it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing in the API yet that exposes any of those details, as the currently foreseen EDM API does not need anything like this. Additionally, it gives podio the freedom to treat I/O implementations as a detail instead of making e.g. the Branch layout in the root files part of the API.
IIUC, then you would need the XYZData (resp. vectors of them), right?
I would suggest to move that discussion/implementation to another PR entirely, because this one is (too) large already and we would actually like to keep this one in its current form to make it possible to implement Schema evolution on top of this before iterating further.
Using a mutex to guard the internal map. Not yet a policy with possibility to disable locking from the outside
Use a unique_ptr<mutex> to preserve default movability of things
- Make them non-copyable - Fix read_tests to use const references instead of copying
- Make sure that frames can be read in different order than the on they have been written in
- Make all Frame constructors go through that - Makes sure that the EmptyRawData actually needs to complete RawData interface
Would need to be a public type of the ROOTFrameReader otherwise and would thus essentially leak an implementation detail.
- readNextFrame implies it returns a Frame which it does not
Clarify what it refers to in the implementation
Possible now (after AIDASoft#295)
Otherwise might pick up podio from LCG which seems to deadlock
@hegner as discussed briefly today:
As far as a first version goes this is complete and can be reviewed, resp. considered stable for schema evolution purposes. |
BEGINRELEASENOTES
podio::Frame
as a generalized, thread-safe (event) data container.ENDRELEASENOTES
This is a prototype version of Frame based I/O with the necessary changes to the
Frame
and the collections that are necessary to enable this. In its current form all existing I/O functionality remains intact. This current version supports the following:ROOTWriter
., while the writing part mainly has all the preparations for, but not actual support yet.SIOWriter
.From a usage perspective the
Frame
behaves very similar to the currentEventStore
for reading purposes. This can be seen by the few changes that have been necessary toread_test.h
to accommodate for the simultaneous usage with both. The major differences between the two at this point are:EventStore
).Frame
.For writing there are some conceptual differences that make a common implementation of the test code rather hard:
EventStore
hands out a reference to the collection and makes sure to clear it for every eventFrame
starts empty in every event and the collections are newly created and added to theFrame
Some more technicalities
Quite a few changes had to be introduced to make everything work, the most important ones are:
CollectionReadBuffers
that have acreateCollection
method that is dynamically assigned during runtime (and the necessary code is generated by code generation). This is necessary to separate the reading of the buffers (and their potential schema evolution) from the actual creation of the collections later.TTree
for everyFrame
category. Additionally, there is one meta data tree that holds all the information that is necessary for reading everything.TTree
and all the branches.Collections still have to be registered for write in theIn the latest iteration the collections to write have become an argument toROOTFrameReader
. It is not yet entirely clear to me how the best interface looks for this.writeFrame
(with a default to write all available collections).Frame
has quite some additional (public) API surface at the moment for the writer to get access to the necessary data.TODOs