Generate ROOT dictionary of vector<XXXData> classes #464

Closed
feipengsy opened this issue Jul 22, 2023 · 5 comments

@feipengsy

  • OS version: CentOS7
  • Compiler version: GCC 8.5.0
  • PODIO version: v00-16-06
  • Reproduced by: (ROOT 6.22/08)
  • Input:
  • Output:
  • Goal: Read podio generated ROOT files directly using ROOT macro.

I'm trying to read podio-generated ROOT files directly with a ROOT macro to do some very simple analysis quickly. The script looks something like this:

using namespace edm;

void func(){
     TChain *data = new TChain("events");
     data->AddFile("testdata.root");

     vector<SimTrackerHitData>* tracker_hits=0;
     data->SetBranchAddress("tracker_hits",&tracker_hits);

     TCanvas* c = new TCanvas("c","c",800,600);
     c->SetGrid();

     int nEntries=data->GetEntries();
     std::cout << "nEntries: " << nEntries << std::endl;
     for(int i=0;i<nEntries;i++){
        data->GetEntry(i);
        std::cout << "size: " << tracker_hits->size() << std::endl;
        for(size_t j=0;j<tracker_hits->size();j++){
           // some analysis code
           //...
        }
     }
}

But ROOT complains about a missing dictionary and crashes when it processes the line data->GetEntry(i):

Processing func.C...
Error in <TChain::SetBranchAddress>: The class requested (vector<edm::SimTrackerHitData>) for the branch "tracker_hits" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vector<edm::SimTrackerHitData>) to avoid to write corrupted data.
nEntries: 10000
0

 *** Break *** segmentation violation

However, after generating and loading the dictionary for vector<SimTrackerHitData> manually, the file can be read correctly. I'm not sure whether this is an internal ROOT problem with handling auto-generated dictionaries. But it seems that simply adding entries for vector<XXXData> to selection.xml would avoid such issues.

@tmadlener
Collaborator

Thanks for this report. There are several considerations at play here:

  1. This could be an issue with older ROOT versions. We no longer generate the vector<XYZData> selection rule, because we found that it was no longer necessary. It is quite possible that we just changed ROOT versions and ROOT now does this for us automatically. Could you quickly try a newer ROOT version to see if that already solves the problem?
  2. From the example you provided, it looks like you want to treat the SimTrackerHit as a whole in your analysis. May I ask why you don't want to use the "full" interface of the generated EDM in this case? You would effectively replace the manual opening of the file with the appropriate podio reader (most likely ROOTFrameReader), and the loop would have to change slightly, but you could drop all the handling of branches, etc., because podio does that for you.
  3. If you really want such a simple analysis, something that should work without any dictionary is to go to the "sub-branch level" and directly set branch addresses on the members of the SimTrackerHitData branches. This mechanism is used by e.g. RDataFrame or uproot.

@peremato
Collaborator

peremato commented Feb 2, 2024

@tmadlener the missing dictionaries for vector are also the reason why the tool to convert TTree files into RNTuple files failed. So, if it is not too much work, I would vote for adding these dictionaries.

@tmadlener
Collaborator

@peremato sorry, it took me a bit longer to get back to this. I have the feeling we are potentially running into another (macOS-only?) ROOT issue here. I can use the podio-ttree-to-rntuple tool without issues on Ubuntu locally on EDM4hep files. In principle it doesn't cost us much (or anything really) to generate the vectors for the dictionaries as well (#554). However, I am not sure at this point what the expected behavior of ROOT is. I will try to investigate.

@peremato
Collaborator

peremato commented Feb 5, 2024

Sorry Thomas. I am not referring to the podio-ttree-to-rntuple tool. I am referring to the script provided by Jakob to convert any TTree to RNTuple.

   import ROOT
   ROOT.TFile.Open("rntuple.root", 'RECREATE') # optional, clear output
   inputFile = ROOT.TFile.Open("ttree.root")
   for treeName in ['T1', 'T2', 'T3']:
       # fetch each tree from the opened input file
       tree = inputFile.Get(treeName)
       importer = ROOT.Experimental.RNTupleImporter.Create(tree, "rntuple.root")
       importer.Import()

@tmadlener
Collaborator

Ah, my bad. I can reproduce the failure on my end with that. Let me just point out one thing (which probably doesn't concern you too much): The TTree based file format stores some map<K, V> like things directly into branches. This is currently not supported by RNTuple, so there we actually split them into a vector<K> and a vector<V> and rebuild the map on reading.
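
To illustrate the split-and-rebuild idea, here is a minimal standalone sketch in plain C++17. It is not the actual podio code: the helper names splitMap and rebuildMap are made up for this example, and it only shows the principle of storing a map as two parallel vectors and restoring it after reading.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of podio): split a map into parallel
// key and value vectors, which is the form that can be stored.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>> splitMap(const std::map<K, V>& m) {
  std::vector<K> keys;
  std::vector<V> values;
  keys.reserve(m.size());
  values.reserve(m.size());
  for (const auto& [key, value] : m) {
    keys.push_back(key);
    values.push_back(value);
  }
  return {std::move(keys), std::move(values)};
}

// Hypothetical helper (not part of podio): rebuild the map from the two
// parallel vectors after reading. Both vectors must have the same length.
template <typename K, typename V>
std::map<K, V> rebuildMap(const std::vector<K>& keys, const std::vector<V>& values) {
  assert(keys.size() == values.size());
  std::map<K, V> m;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    m.emplace(keys[i], values[i]);
  }
  return m;
}
```

Calling rebuildMap on the two vectors returned by splitMap gives back a map equal to the original, since the i-th key and the i-th value always belong together.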
