-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New sio backend #130
New sio backend #130
Conversation
@tmadlener: you had commented out reading of the meta data in the first record, which is uncompressed (for whatever reason) and then reading the first event in compressed mode failed... |
029df0a
to
4145b42
Compare
I have added the handling of the event meta data in a first version and also slightly reworked the handling of the collection IDs and the data that is necessary to reconstruct all collections afterwards. I have temporarily disabled the tests that check the collection meta data, as that still needs to be done. Currently the changes in #133 and #134 are also present here |
From a functionality point of view I think everything that is covered by the There are, however, a few things that could be improved in the current implementation:
For these reasons I am currently keeping this as [WIP], but any intermediate comments are very welcome. |
I am currently not entirely happy with the splitting of the different libraries, especially for the tests, where I have tried to have a "core" library, as well as two separate libraries that are only concerned with either root or sio based I/O. For root I have basically followed the same path as in edm4hep and this seems to work nicely out of the box. However, for sio I am currently struggling with getting the self registering factory approach to work nicely. The problem is that all the For the tests here, it doesn't really matter, but for actual usage it might be nice to separate the core libraries from the one that deal with I/O. Is there an easy way to do this? |
python/podio_class_generator.py
Outdated
@@ -205,6 +206,9 @@ def _process_datatype(self, name, definition): | |||
self._fill_templates('Obj', datatype) | |||
self._fill_templates('Collection', datatype) | |||
|
|||
if self.reader.options["createSIOHandlers"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.reader.options["createSIOHandlers"]: | |
if 'SIO' in self.reader.options["IOHandlers"]: |
Or something like this, and making IOHandlers a list of IOHandlers to create (ROOT and SIO for the moment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea. It would definitely scale better than the current approach in the yaml file of the datamodel. I will make this change, and since ROOT doesn't need any special treatment at the moment it will basically be simply ignored. Do you think that is a reasonable approach, or should we do something more explicit for IO systems that do not need additional generated code? E.g. only list the handlers that actually need additional code generation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading more of the PR.
Maybe this should be coupled to the CMake option for enabling SIO, in the base CMakeLists in the block
if(SIO_HANDLERS)
LIST(APPEND PODIO_IO_HANDLERS SIO)
endif()
The PODIO_GENERATE_DATAMODEL function could just set environment variables for each handler,
FOREACH(HANDLER PODIO_IO_HANDLERS)
SET(EXPORT_HANDLER "${EXPORT_HANDLER} ${HANDLER}_HANDLER=1")
ENDFOREACH()
This
Line 141 in d4a0265
COMMAND python ${podio_PYTHON_DIR}/podio_class_generator.py --quiet ${YAML_FILE} ${ARG_OUTPUT_FOLDER} ${datamodel} |
becomes
COMMAND ${EXPORT_HANDLER} python ${podio_PYTHON_DIR}/podio_class_generator.py --quiet ${YAML_FILE} ${ARG_OUTPUT_FOLDER} ${datamodel}
and then the check becomes
if os.environ.get('SIO_HANDLER'):
The variable PODIO_IO_HANDLERS is exported to the podioConfig, so edm4hep can pick this up as well when calling PODIO_GENERATE_DATAMODEL
Which reminds me that the dependency on SIO should also be exported in the podioConfig, if enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ROOT doesn't need any special treatment
Actually, I just realized that this is not true, because the selection.xml
file is only necessary for the ROOT dictionaries.
Regarding the handling of the generation of different IO systems, I have started some kind of discussion in that direction in #131. Maybe instead of setting environment variables, we could pass command line arguments to the generator from cmake? Or would that defeat the purpose of exporting PODIO_IO_HANDLERS
? I, personally, find explicitly passing command line arguments a bit easier to debug and to follow the flow of the whole thing.
For exporting the dependency on SIO, it would be ideal if SIO already exports targets, right?
With the latest changes I have now implemented the following:
|
CMakeLists.txt
Outdated
@@ -68,6 +68,7 @@ endif() | |||
|
|||
#--- Declare options ----------------------------------------------------------- | |||
option(CREATE_DOC "Whether or not to create doxygen doc target." OFF) | |||
option(SIO_HANDLERS "Build the SIO block handlers and readers and writers." OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming a bit too technical. Why not "SIO_SUPPORT" or "ENABLE_SIO" ? One can use equivalent naming for every further backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to ENABLE_SIO
. I realized that this has to be done all over the place, so we should come up with a more "compact" way of adding backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yap. that was my comment in the meeting already. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can broaden the discussion in #131 to cover the whole topic of how to best deal with different I/O backends.
python/podio_config_reader.py
Outdated
@@ -321,6 +321,8 @@ def __init__(self, yamlfile): | |||
"exposePODMembers": True, | |||
# use subfolder when including package header files | |||
"includeSubfolder": False, | |||
# generate the SIO block handlers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
certainly comment and code are out of sync ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, most definitely... Thanks for checking
5c0b5bd
to
a8a23ea
Compare
tests/CMakeLists.txt
Outdated
file(GLOB sources_utils utilities/*.cc) | ||
file(GLOB headers_utils utilities/*.h) | ||
if (ENABLE_SIO) | ||
# Filter out the sio block handlers to avoid polluting the TestDataModel lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue will appear in edm4hep.
Can we find a way to make this automated for users of PODIO_GENERATE_DATAMODEL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we definitely should. I would suggest to discuss this in #131.
tests/read_test.h
Outdated
throw std::runtime_error("Couldn't read event meta data parameters 'UserEventWeight'"); | ||
} | ||
std::stringstream ss ; ss << " event_number_" << eventNum ; | ||
std::string evtName = evtMD.getStringVal( "UserEventName" ) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you would have used const auto, you would have gotten const std::string& instead of a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, this is true. But these changes are currently only here because this PR starts on top of #133. I will fix it there. There are probably also a few other instances, where the usage of auto
would make things a bit nicer in the tests. I haven't really checked, as I have basically just moved the already available code into a common header.
@tmadlener nice work. I have pushed an option to use the boost_filesystem library in SioIO as filesystem is not yet available on all Macs (at least on my mac running Mojave). |
@gaede Have you tried linking against c++fs with llvm, instead of bringing in boost? |
I have neither c++fs nor <experimental/filesystem> on my machine, so boost is the only reasonable option... |
I think we need something like https://github.com/AIDASoft/DD4hep/blob/a2f3c03d74f27e24fa8b4a1b8b7478ae654d8a35/cmake/DD4hepBuild.cmake#L721-L751 to support compilers older than gcc9.1 |
36cbd12
to
2313ed9
Compare
Hi @andresailer, @gaede, I added a simple TOC in c624808 as we have discussed. I am not yet entirely sure whether placing a marker outside of sio blocks at the end of the file really plays nicely with everything, but the tests don't uncover anything unexpected at the moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments for the filesystem checks
I have added some cmake functions to facilitate the downstream usage of the different I/O backends. Additionally I have added # For now unconditionally generate all the code that is supported by the intalled podio
PODIO_GENERATE_DATAMODEL(edm4hep ../edm4hep.yaml headers sources IO_BACKEND_HANDLERS ${podio_IO_HANDLERS})
PODIO_ADD_DATAMODEL_CORE_LIB(edm4hep "${headers}" "${sources}")
PODIO_ADD_ROOT_IO_DICT(edm4hepDict edm4hep "${headers}" src/selection.xml)
IF(TARGET edm4hepDict)
add_library(edm4hep::edm4hepDict ALIAS edm4hepDict )
ENDIF()
PODIO_ADD_SIO_IO_BLOCKS(edm4hep "${headers}" "${sources}")
IF(TARGET edm4hepSioBlocks)
message(STATUS "SioBlocks are here!")
ENDIF()
#
# The rest: Installing targets, etc..
# Depending on whether podio actually provides capabilities to do I/O via SIO edm4hep would also build the corresponding library. It is still necessary to check whether it is actually available, but the boilerplate code to conditionally add it is hidden in the cmake function. It is also still possible to pass different |
I have updated the release notes and I think this PR is now ready for (another) review. I am keeping at as [WIP] at the moment, because it still contains the changes from #133, and I think it would be best if that is merged first. Additionally, also #141 could be merged first to see if everything still works with the udpated CI. After that it might be necessary to rebase. In the course of that also the final review comments can be addressed and then it should be possible to merge this. |
Make podio look for a compatible filesystem library itself during the cmake stage, eventually falling back to boost in case the compiler doesn't support it natively.
Marking the getEntries and endOfEvent methods virtual and adding them to the IReader interface allows to re-use the same code for the read tests independent of the reader that actually provides the data. NOTE: The current implementation of SIOReader::getEntries is just here for feature completeness, and should be replaced with a more performant and const-correct version.
VERBOSE and DEBUG become available as modes to message only in cmake 3.15.
Provide cmake functions to: - Add the core datamodel library without any I/O backend dependencies. - Conditionally add a ROOT dict target (if the corresponding code has been generated) - Conditionally add a SioBlocks library target (if the corresponding code has been generated). Example usage in tests/CMakeLists.txt The usage still requires downstream users to check whether the target has been generated if they absolutely require it, but at least a lot of the boilerplate that comes with building (or trying to do so) is now hidden in these functions.
Co-authored-by: Andre Sailer <andre.philippe.sailer@cern.ch>
beec481
to
643fcba
Compare
Removed [WIP] and rebased on top of master. Barring any further review comments, this is ready to be merged from my side. |
CMakeLists.txt
Outdated
@@ -68,6 +68,8 @@ endif() | |||
|
|||
#--- Declare options ----------------------------------------------------------- | |||
option(CREATE_DOC "Whether or not to create doxygen doc target." OFF) | |||
option(ENABLE_SIO "Build the SIO block handlers and readers and writers." OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less technical: "Build SIO read/write backend"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or like said further below "build SIO I/O support"
Thanks @tmadlener ! I will review it during the day. It is quite a long one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to me.
Hi @hegner, did you find the time to review/approve this PR ? We'd like to merge it soon... |
Closing and opening again, to retrigger CI. |
close + reopen to trigger CI |
BEGINRELEASENOTES
ENABLE_SIO
. If enabled, a separatepodioSioIO
library is built that allows reading and writing sio files. For serializing the different datatypes, additional code is generated to build anSioBlocks
library that is loaded at runtime (if found somewhere onLD_LIBRARY_PATH
). To facilitate the whole process at the cmake level, new cmake functions are provided to generate the core datamodel libraryPODIO_ADD_DATAMODEL_CORE_LIBRARY
, to (conditionally) define the ROOT dictionary targetPODIO_ADD_ROOT_IO_DICT
and to (conditionally) define the Sio Blocks library targetPODIO_ADD_SIO_IO_BLOCKS
. The I/O backends that are supported by podio are exported via thePODIO_IO_HANDLERS
list variable.podio_generate_datamodel.py
now additionally takes the I/O handlers that should be generated as arguments. This is also reflected in an additional argument toPODIO_GENERATE_DATAMODEL
. To have backwards compatibility, this additional argument defaults to ROOT in both cases and downstream packages should work as usual without changes.ENDRELEASENOTES
This is more or less a copy/paste exercise from a previous attempt from Frank with the main changes in the templates and the c++ code generator. At the moment this should be considered as a prototype version and there are still some things to do:
read_sio
test run successfully viactest