Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New sio backend #130

Merged
merged 34 commits into from
Nov 10, 2020
Merged

New sio backend #130

merged 34 commits into from
Nov 10, 2020

Conversation

tmadlener
Copy link
Collaborator

@tmadlener tmadlener commented Sep 1, 2020

BEGINRELEASENOTES

  • Add SIO as a second I/O backend (as alternative to ROOT) that can be enabled with ENABLE_SIO. If enabled, a separate podioSioIO library is built that allows reading and writing sio files. For serializing the different datatypes, additional code is generated to build an SioBlocks library that is loaded at runtime (if found somewhere on LD_LIBRARY_PATH). To facilitate the whole process at the cmake level, new cmake functions are provided to generate the core datamodel library PODIO_ADD_DATAMODEL_CORE_LIBRARY, to (conditionally) define the ROOT dictionary target PODIO_ADD_ROOT_IO_DICT and to (conditionally) define the Sio Blocks library target PODIO_ADD_SIO_IO_BLOCKS. The I/O backends that are supported by podio are exported via the PODIO_IO_HANDLERS list variable.
  • podio_generate_datamodel.py now additionally takes the I/O handlers that should be generated as arguments. This is also reflected in an additional argument to PODIO_GENERATE_DATAMODEL. To have backwards compatibility, this additional argument defaults to ROOT in both cases and downstream packages should work as usual without changes.

ENDRELEASENOTES

This is more or less a copy/paste exercise from a previous attempt from Frank with the main changes in the templates and the c++ code generator. At the moment this should be considered as a prototype version and there are still some things to do:

  • More detailed Release Notes
  • Cleanup code
  • Make read_sio test run successfully via ctest
  • Update CI to also have SIO available
  • I/O of meta data
    • collection IDs (and other information necessary to reconstruct the collections after reading)
    • event meta data
    • run meta data
    • collection meta data
  • separate SIO library part from core library part (similar to what is done for root)
  • export targets properly for downstream packages

@gaede
Copy link
Contributor

gaede commented Sep 2, 2020

@tmadlener: you had commented out reading of the meta data in the first record, which is uncompressed (for whatever reason) and then reading the first event in compressed mode failed...

@tmadlener
Copy link
Collaborator Author

I have added the handling of the event meta data in a first version and also slightly reworked the handling of the collection IDs and the data that is necessary to reconstruct all collections afterwards. I have temporarily disabled the tests that check the collection meta data, as that still needs to be done.

Currently the changes in #133 and #134 are also present here

@tmadlener
Copy link
Collaborator Author

From a functionality point of view I think everything that is covered by the ROOTReader and 'ROOTWriter' is now also covered by the SIOReader and SIOWriter, with the exception of reading multiple files.

There are, however, a few things that could be improved in the current implementation:

  • The GenericParameters now have public methods to access their internal maps, solely for the purpose of easier I/O using sio. I don't particularly like this.
  • The SIOBlockFactory with its static instance, is not really ideal for multi threading. From the looks of it, it could potentially be replaced by a constant map that could be generated from the yaml file, but I am not entirely sure about that.
  • The SIOReader currently owns the collections, which makes the EventStore behave slightly differently, as compared with when it is used with the ROOTReader (i.e. EventStore::clear will destroy resources that are not owned by it). Ideally the two readers would be drop-in replacable.

For these reasons I am currently keeping this as [WIP], but any intermediate comments are very welcome.

@tmadlener
Copy link
Collaborator Author

I am currently not entirely happy with the splitting of the different libraries, especially for the tests, where I have tried to have a "core" library, as well as two separate libraries that are only concerned with either root or sio based I/O. For root I have basically followed the same path as in edm4hep and this seems to work nicely out of the box. However, for sio I am currently struggling with getting the self registering factory approach to work nicely. The problem is that all the SIOBlocks that are necessary for this are placed in a separate library and so the SIOBlockFactory instance in the tests will not know about the one that is created and filled in this library. Including one SIOBlock header in the tests fixes this for now, but this is not really a solution in my opinion.

For the tests here, it doesn't really matter, but for actual usage it might be nice to separate the core libraries from the one that deal with I/O. Is there an easy way to do this?

CMakeLists.txt Outdated Show resolved Hide resolved
@@ -205,6 +206,9 @@ def _process_datatype(self, name, definition):
self._fill_templates('Obj', datatype)
self._fill_templates('Collection', datatype)

if self.reader.options["createSIOHandlers"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.reader.options["createSIOHandlers"]:
if 'SIO' in self.reader.options["IOHandlers"]:

Or something like this, and making IOHandlers a list of IOHandlers to create (ROOT and SIO for the moment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea. It would definitely scale better than the current approach in the yaml file of the datamodel. I will make this change, and since ROOT doesn't need any special treatment at the moment it will basically be simply ignored. Do you think that is a reasonable approach, or should we do something more explicit for IO systems that do not need additional generated code? E.g. only list the handlers that actually need additional code generation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading more of the PR.

Maybe this should be coupled to the CMake option for enabling SIO, in the base CMakeLists in the block

if(SIO_HANDLERS)
  LIST(APPEND PODIO_IO_HANDLERS SIO)
endif()

The PODIO_GENERATE_DATAMODEL function could just set environment variables for each handler,

FOREACH(HANDLER PODIO_IO_HANDLERS)
  SET(EXPORT_HANDLER "${EXPORT_HANDLER} ${HANDLER}_HANDLER=1")
ENDFOREACH()

This

COMMAND python ${podio_PYTHON_DIR}/podio_class_generator.py --quiet ${YAML_FILE} ${ARG_OUTPUT_FOLDER} ${datamodel}

becomes

    COMMAND ${EXPORT_HANDLER} python ${podio_PYTHON_DIR}/podio_class_generator.py --quiet ${YAML_FILE} ${ARG_OUTPUT_FOLDER} ${datamodel}

and then the check becomes

if os.environ.get('SIO_HANDLER'):

The variable PODIO_IO_HANDLERS is exported to the podioConfig, so edm4hep can pick this up as well when calling PODIO_GENERATE_DATAMODEL

Which reminds me that the dependency on SIO should also be exported in the podioConfig, if enabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROOT doesn't need any special treatment

Actually, I just realized that this is not true, because the selection.xml file is only necessary for the ROOT dictionaries.


Regarding the handling of the generation of different IO systems, I have started some kind of discussion in that direction in #131. Maybe instead of setting environment variables, we could pass command line arguments to the generator from cmake? Or would that defeat the purpose of exporting PODIO_IO_HANDLERS? I, personally, find explicitly passing command line arguments a bit easier to debug and to follow the flow of the whole thing.

For exporting the dependency on SIO, it would be ideal if SIO already exports targets, right?

python/templates/SIOBlock.cc.jinja2 Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
src/CMakeLists.txt Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
tests/CMakeLists.txt Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator Author

With the latest changes I have now implemented the following:

  • A very basic runtime library loader in form of the SIOBlockLibraryLoader. It is designed to be used as a singleton that tries to load the necessary libraries during construction. It is then simply instantiated in the constructors of the SIOReader and SIOWriter. To load libraries all directories on LD_LIBRARY_PATH are checked for files containing "SioBlocks" in their name, which are then loaded. Since this also implies a naming convention, this does the job for the moment, but maybe this should be made more stable in the future.
  • SIOReader now follows the same interface as ROOTReader and should now be in principle usable as a drop-in replacement. However, some differences still remain due to the different philosophies of SIO and ROOT, e.g. ROOT offers a getEntries method, while SIO signals the end of the file with an exception and a corresponding error code.
  • Replaced the createSIOHandlers option with an IOHandlers one for configuring which IO handler code should be generated. I would put a potential switch to command line arguments (see discussion above and in Discussion: How to best deal with the different IO backends #131) in a different PR to not pollute this one too much.
  • Addressed Andres cmake comments

CMakeLists.txt Outdated
@@ -68,6 +68,7 @@ endif()

#--- Declare options -----------------------------------------------------------
option(CREATE_DOC "Whether or not to create doxygen doc target." OFF)
option(SIO_HANDLERS "Build the SIO block handlers and readers and writers." OFF)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming a bit too technical. Why not "SIO_SUPPORT" or "ENABLE_SIO" ? One can use equivalent naming for every further backend.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to ENABLE_SIO. I realized that this has to be done all over the place, so we should come up with a more "compact" way of adding backends.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yap. that was my comment in the meeting already. thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can broaden the discussion in #131 to cover the whole topic of how to best deal with different I/O backends.

@@ -321,6 +321,8 @@ def __init__(self, yamlfile):
"exposePODMembers": True,
# use subfolder when including package header files
"includeSubfolder": False,
# generate the SIO block handlers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly comment and code are out of sync ;-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, most definitely... Thanks for checking

file(GLOB sources_utils utilities/*.cc)
file(GLOB headers_utils utilities/*.h)
if (ENABLE_SIO)
# Filter out the sio block handlers to avoid polluting the TestDataModel lib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue will appear in edm4hep.
Can we find a way to make this automated for users of PODIO_GENERATE_DATAMODEL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we definitely should. I would suggest to discuss this in #131.

tests/CMakeLists.txt Outdated Show resolved Hide resolved
throw std::runtime_error("Couldn't read event meta data parameters 'UserEventWeight'");
}
std::stringstream ss ; ss << " event_number_" << eventNum ;
std::string evtName = evtMD.getStringVal( "UserEventName" ) ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would have used const auto, you would have gotten const std::string& instead of a copy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is true. But these changes are currently only here because this PR starts on top of #133. I will fix it there. There are probably also a few other instances, where the usage of auto would make things a bit nicer in the tests. I haven't really checked, as I have basically just moved the already available code into a common header.

@gaede
Copy link
Contributor

gaede commented Sep 16, 2020

@tmadlener nice work. I have pushed an option to use the boost_filesystem library in SioIO as filesystem is not yet available on all Macs (at least on my mac running Mojave).

@andresailer
Copy link
Member

@gaede Have you tried linking against c++fs with llvm, instead of bringing in boost?
See:
https://github.com/AIDASoft/DD4hep/blob/a2f3c03d74f27e24fa8b4a1b8b7478ae654d8a35/cmake/DD4hepBuild.cmake#L721
Or is dd4hep falling back on boost on your machine?

@gaede
Copy link
Contributor

gaede commented Sep 16, 2020

I have neither c++fs nor <experimental/filesystem> on my machine, so boost is the only reasonable option...

@andresailer
Copy link
Member

I think we need something like https://github.com/AIDASoft/DD4hep/blob/a2f3c03d74f27e24fa8b4a1b8b7478ae654d8a35/cmake/DD4hepBuild.cmake#L721-L751 to support compilers older than gcc9.1

@tmadlener
Copy link
Collaborator Author

Hi @andresailer,
can you check whether I got the essential parts right for detecting a suitable filesystem library in 74a9486 ?

@gaede, I added a simple TOC in c624808 as we have discussed. I am not yet entirely sure whether placing a marker outside of sio blocks at the end of the file really plays nicely with everything, but the tests don't uncover anything unexpected at the moment.

Copy link
Member

@andresailer andresailer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments for the filesystem checks

cmake/podioMacros.cmake Outdated Show resolved Hide resolved
cmake/podioMacros.cmake Outdated Show resolved Hide resolved
cmake/podioMacros.cmake Outdated Show resolved Hide resolved
cmake/podioMacros.cmake Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator Author

I have added some cmake functions to facilitate the downstream usage of the different I/O backends. Additionally I have added podio_IO_HANDLERS to the exported variables. The cmake functions do not provide a complete automatization of the downstream process, but should facilitate the usage. Example usage can be seen in the tests directory. I have briefly checked with edm4hep to see if things actually work downstream. With the usage of these new macros the edm4hep/CMakeLists.txt could look something like this:

# For now unconditionally generate all the code that is supported by the intalled podio
PODIO_GENERATE_DATAMODEL(edm4hep ../edm4hep.yaml headers sources IO_BACKEND_HANDLERS ${podio_IO_HANDLERS})

PODIO_ADD_DATAMODEL_CORE_LIB(edm4hep "${headers}" "${sources}")

PODIO_ADD_ROOT_IO_DICT(edm4hepDict edm4hep "${headers}" src/selection.xml)
IF(TARGET edm4hepDict)
  add_library(edm4hep::edm4hepDict ALIAS edm4hepDict )
ENDIF()

PODIO_ADD_SIO_IO_BLOCKS(edm4hep "${headers}" "${sources}")
IF(TARGET edm4hepSioBlocks)
  message(STATUS "SioBlocks are here!")
ENDIF()

# 
# The rest: Installing targets, etc..
# 

Depending on whether podio actually provides capabilities to do I/O via SIO edm4hep would also build the corresponding library. It is still necessary to check whether it is actually available, but the boilerplate code to conditionally add it is hidden in the cmake function.

It is also still possible to pass different IO_BACKEND_HANDLERS to PODIO_GENERATE_DATAMODEL and the functions still work, because they only check for generated files and act depending on whether it is present or not.

@tmadlener
Copy link
Collaborator Author

I have updated the release notes and I think this PR is now ready for (another) review. I am keeping at as [WIP] at the moment, because it still contains the changes from #133, and I think it would be best if that is merged first. Additionally, also #141 could be merged first to see if everything still works with the udpated CI.

After that it might be necessary to rebase. In the course of that also the final review comments can be addressed and then it should be possible to merge this.

tmadlener and others added 7 commits October 6, 2020 22:49
Make podio look for a compatible filesystem library itself during the
cmake stage, eventually falling back to boost in case the compiler
doesn't support it natively.
Marking the getEntries and endOfEvent methods virtual and adding them to
the IReader interface allows to re-use the same code for the read tests
independent of the reader that actually provides the data.

NOTE: The current implementation of SIOReader::getEntries is just here
for feature completeness, and should be replaced with a more performant
and const-correct version.
VERBOSE and DEBUG become available as modes to message only in cmake
3.15.
Provide cmake functions to:
- Add the core datamodel library without any I/O backend dependencies.
- Conditionally add a ROOT dict target (if the corresponding code has
  been generated)
- Conditionally add a SioBlocks library target (if the corresponding
  code has been generated).

Example usage in tests/CMakeLists.txt

The usage still requires downstream users to check whether the target
has been generated if they absolutely require it, but at least a lot of
the boilerplate that comes with building (or trying to do so) is now
hidden in these functions.
Co-authored-by: Andre Sailer <andre.philippe.sailer@cern.ch>
@tmadlener tmadlener changed the title [WIP] New sio backend New sio backend Oct 7, 2020
@tmadlener
Copy link
Collaborator Author

Removed [WIP] and rebased on top of master. Barring any further review comments, this is ready to be merged from my side.

CMakeLists.txt Outdated
@@ -68,6 +68,8 @@ endif()

#--- Declare options -----------------------------------------------------------
option(CREATE_DOC "Whether or not to create doxygen doc target." OFF)
option(ENABLE_SIO "Build the SIO block handlers and readers and writers." OFF)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less technical: "Build SIO read/write backend"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or like said further below "build SIO I/O support"

@hegner
Copy link
Collaborator

hegner commented Oct 7, 2020

Thanks @tmadlener !

I will review it during the day. It is quite a long one.

Copy link
Contributor

@gaede gaede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me.

@gaede
Copy link
Contributor

gaede commented Oct 14, 2020

Thanks @tmadlener !

I will review it during the day. It is quite a long one.

Hi @hegner, did you find the time to review/approve this PR ? We'd like to merge it soon...

@tmadlener
Copy link
Collaborator Author

Closing and opening again, to retrigger CI.

@tmadlener tmadlener closed this Oct 14, 2020
@tmadlener tmadlener reopened this Oct 14, 2020
@tmadlener
Copy link
Collaborator Author

close + reopen to trigger CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants