Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an RDataSource for podio files and collections #593

Merged
merged 16 commits into from
Aug 28, 2024

Conversation

kjvbrt
Copy link
Contributor

@kjvbrt kjvbrt commented May 3, 2024

BEGINRELEASENOTES

  • Add a podio::DataSource as an RDataSource for working with podio (generated EDM) files and collections. This RDataSource exposes the podio collections directly as columns for RDataFrames. We consider this fairly stable, but still in early stages, there might still be some (breaking) changes to this

ENDRELEASENOTES

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kjvbrt. I wasn't aware this was possible in this generality. Very nice. I am probably still missing quite a few things, but I tried to summarize my current understanding below and added and a few comments / questions inline

For my general understanding, what this does is basically:

  • Create a reader for each thread / slot
  • Keep a vector of Frames, again one per thread / slot
  • Populate a vector of collections per thread / slot, where
    • each collection always is at the same position, so that we can simply index into it

I am wondering whether we could do with just one reader instead of multiple, but I suppose then we would probably have to do our own locking around that, because the RDataSource doesn't do that for us?

If I understand correctly, we could in the end even replace the current ROOTReader with the Reader that is introduced in #522 and effectively support all backends + RDataFrame?

include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator

The sanitizer workflows are most likely failing because the tests that produce the input files are not running. You can disable them by adding the new tests also to the list in CTestCustom.cmake

Copy link
Contributor

@Zehvogel Zehvogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid the hardcoded strings :)

src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
Comment on lines 88 to 66
// Get collections stored in the files
std::vector<std::string> collNames = frame.getAvailableCollections();
// std::cout << "podio::ROOTDataSource: Found following collections:\n";
for (auto& collName : collNames) {
const podio::CollectionBase* coll = frame.get(collName);
if (coll->isValid()) {
m_columnNames.emplace_back(collName);
m_columnTypes.emplace_back(coll->getValueTypeName());
// std::cout << " - " << collName << "\n";
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also make the parameters of the frame available as columns, e.g. to access the cross section? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this in a follow up PR.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few new minor suggestions for using the reader interface, now that it's available. I think I got all the places, but I might have missed some.

include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
src/ROOTDataSource.cc Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
@kjvbrt
Copy link
Contributor Author

kjvbrt commented Jul 25, 2024

I continue with testing the changes made here in FCCAnalyses

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be an issue with the installation now, because the EDM4hep tests are failing. As far as I can tell it has to do with header files that are no longer installed. Otherwise it's mainly minor comments.

CMakeLists.txt Outdated Show resolved Hide resolved
include/podio/ROOTDataSource.h Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
src/DataSource.cc Outdated Show resolved Hide resolved
src/DataSource.cc Outdated Show resolved Hide resolved
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this is ready, right? From which ROOT version onwards should this work? Then we could add a simple check in CMake (similar to RNTuple).

I will push the minor code changes immediately. Once we have this, we can the add support for parameters in a follow-up PR, I would suggest.

list(APPEND root_components_needed ROOTDataFrame)
endif()
find_package(ROOT REQUIRED COMPONENTS ${root_components_needed})
if((ENABLE_RNTUPLE) AND (${ROOT_VERSION} VERSION_LESS 6.28.02))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a minimal version for RDataSource that we could / should check here for the RDataSource as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RDataSource is in ROOT for a long time (not experimental since ROOT 6.14), I tested this in nightlies stack (6.32)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then I think we can also skip the check here. Thanks for checking.

src/DataSource.cc Outdated Show resolved Hide resolved
src/DataSource.cc Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator

EDM4hep CI workflows are fixed with #658

@tmadlener tmadlener changed the title Moving RDataSource closer to Podio/EDM4hep Add an RDataSource for podio files and collections Aug 28, 2024
@kjvbrt
Copy link
Contributor Author

kjvbrt commented Aug 28, 2024

I tested this in FCCAnalysis, seems to work...

@tmadlener tmadlener merged commit cfe4836 into AIDASoft:master Aug 28, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants