hitindex for pLS is a bit confusing #267

sgnoohc · 2023-04-08T12:42:08Z

Because hit index is already referenced against hitsInGPU.idxs here:

Lines 678 to 681 in 398270a

    
           hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]]; 
        
           hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]]; 
        
           hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]]; 
        
           hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];

Downstream it gets confusing.

TrackLooper/SDL/Event.cu

Lines 678 to 681 in 398270a

    
           hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]]; 
        
           hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]]; 
        
           hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]]; 
        
           hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];

There are two hit indexs (1) within GPU memory, (2) within CMSSW / tracking Ntuple.
(1) idxs is internally used, and then we obtain (1) -> (2) mapping via hitsInGPU.idxs[].
But for pLSs, where the internal indexs should be saved, it already saved the dereferenced version (1) -> (2) into the internal memory.

So far this has not caused a problem, because in the special case of # of outer tracker hits being > N inner tracker hits used in the pLSs, we are fine.
However, in the case tha N outer tracker hits < inner tracker hits from pLSs, we may be in trouble.

VourMa · 2023-04-28T23:23:30Z

I have a couple of questions on this issue and I would like to also bring @tresreid to the discussion, as he may understand a bit better the situation:

I can see that segmentsInGPU.pLSHitsIdxs.w/x/y/z are only used in the checkHitspLS kernel, where the order of pLS hits doesn't seem to matter. If that's true, are we safe to change the order to the proper one (0 1 2 3), instead of (0 2 1 3), as mentioned here?
I am not sure what you mean by "dereferencing". Can you elaborate on what this means and where it is done and how it differs for the IT and OT hits? Will the situation be resolved if we didn't do the "dereferencing" for the pLS hits?
I don't understand the last part of your text. Where/How does the number of the OT hits interfere with the "dereferencing" of the IT hits?

Maybe my confusion comes from the fact that you quoted twice the same piece of code, while you wanted to quote something else in the second case?

VourMa · 2023-04-29T15:42:33Z

After studying a bit more point 2, I think I reached the following understanding:

Situation for OT hits

The hitIdxs going into the TC collection for the OT objects come from the hitIdxs of lower level objects. Following the path for T5s:

TrackLooper/SDL/TrackCandidate.cu

Line 363 in ce2f36b

    
           addTrackCandidateToMemory(trackCandidatesInGPU, 4/*track candidate type T5=4*/, quintupletIndex, quintupletIndex, &quintupletsInGPU.logicalLayers[5 * quintupletIndex], &quintupletsInGPU.lowerModuleIndices[5 * quintupletIndex], &quintupletsInGPU.hitIndices[10 * quintupletIndex], quintupletsInGPU.regressionG[quintupletIndex], quintupletsInGPU.regressionF[quintupletIndex], quintupletsInGPU.regressionRadius[quintupletIndex], trackCandidateIdx, quintupletIndex);

TrackLooper/SDL/Quintuplet.cu

Lines 239 to 248 in ce2f36b

    
           quintupletsInGPU.hitIndices[10 * quintupletIndex] = tripletsInGPU.hitIndices[6 * innerTripletIndex]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 1] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 1]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 2] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 2]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 3] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 3]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 4] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 4]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 5] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 5]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 6] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 2]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 7] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 3]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 8] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 4]; 
        
           quintupletsInGPU.hitIndices[10 * quintupletIndex + 9] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 5];

TrackLooper/SDL/Triplet.cu

Lines 187 to 192 in ce2f36b

    
           tripletsInGPU.hitIndices[tripletIndex * 6] = mdsInGPU.anchorHitIndices[firstMDIndex]; 
        
           tripletsInGPU.hitIndices[tripletIndex * 6 + 1] = mdsInGPU.outerHitIndices[firstMDIndex]; 
        
           tripletsInGPU.hitIndices[tripletIndex * 6 + 2] = mdsInGPU.anchorHitIndices[secondMDIndex]; 
        
           tripletsInGPU.hitIndices[tripletIndex * 6 + 3] = mdsInGPU.outerHitIndices[secondMDIndex]; 
        
           tripletsInGPU.hitIndices[tripletIndex * 6 + 4] = mdsInGPU.anchorHitIndices[thirdMDIndex]; 
        
           tripletsInGPU.hitIndices[tripletIndex * 6 + 5] = mdsInGPU.outerHitIndices[thirdMDIndex];

The path above leads the hitIdxs from MDs. For OT objects, these come from:

TrackLooper/SDL/MiniDoublet.cu

Lines 154 to 170 in ce2f36b

    
           unsigned int anchorHitIndex, outerHitIndex; 
        
           if(modulesInGPU.moduleType[lowerModuleIdx] == PS and modulesInGPU.moduleLayerType[lowerModuleIdx] == Strip) 
        
           { 
        
               mdsInGPU.anchorHitIndices[idx] = upperHitIdx; 
        
               mdsInGPU.outerHitIndices[idx] = lowerHitIdx; 
        
               anchorHitIndex = upperHitIdx; 
        
               outerHitIndex = lowerHitIdx; 
        
           } 
        
           else 
        
           { 
        
               mdsInGPU.anchorHitIndices[idx] = lowerHitIdx; 
        
               mdsInGPU.outerHitIndices[idx] = upperHitIdx; 
        
               anchorHitIndex = lowerHitIdx; 
        
               outerHitIndex = upperHitIdx; 
        
           }

TrackLooper/SDL/MiniDoublet.cu

Line 935 in ce2f36b

    
           addMDToMemory(mdsInGPU,hitsInGPU, modulesInGPU, lowerHitArrayIndex, upperHitArrayIndex, lowerModuleIndex, dz, dphi, dphichange, shiftedX, shiftedY, shiftedZ, noShiftedDz, noShiftedDphi, noShiftedDphiChange, mdIndex);

TrackLooper/SDL/MiniDoublet.cu

Lines 903 to 912 in ce2f36b

    
           unsigned int lowerHitArrayIndex = loHitArrayIndex + lowerHitIndex; 
        
           float xLower = hitsInGPU.xs[lowerHitArrayIndex]; 
        
           float yLower = hitsInGPU.ys[lowerHitArrayIndex]; 
        
           float zLower = hitsInGPU.zs[lowerHitArrayIndex]; 
        
           float rtLower = hitsInGPU.rts[lowerHitArrayIndex]; 
        
           unsigned int upperHitArrayIndex = upHitArrayIndex+upperHitIndex; 
        
           float xUpper = hitsInGPU.xs[upperHitArrayIndex]; 
        
           float yUpper = hitsInGPU.ys[upperHitArrayIndex]; 
        
           float zUpper = hitsInGPU.zs[upperHitArrayIndex]; 
        
           float rtUpper = hitsInGPU.rts[upperHitArrayIndex];

i.e. the hitIdxs correspond to the "internal" hitIdxs, that can be used for derefenrecing the "CMSSW" hitIdxs from here:

TrackLooper/SDL/Event.cu

Line 626 in ce2f36b

    
           cudaMemcpyAsync(hitsInGPU->idxs, &idxInNtuple[0], nHits*sizeof(unsigned int), cudaMemcpyHostToDevice, stream);

Situation for IT hits

The pLSs are added to memory in a different way compared to the OT objects. More specifically, the path followed is:

TrackLooper/SDL/Event.cu

Line 673 in ce2f36b

    
           addPixelSegmentToMemory(segmentsInGPU, mdsInGPU, modulesInGPU, innerMDIndex, outerMDIndex, pixelModuleIndex, hits1, hitIndices0[tid], hitIndices2[tid], dPhiChange[tid], ptIn[tid], ptErr[tid], px[tid], py[tid], pz[tid], etaErr[tid], eta[tid], phi[tid], charge[tid], seedIdx[tid], pixelSegmentIndex, tid, superbin[tid], pixelType[tid],isQuad[tid],score_lsq);

where the hitIdxs are taken already dereferenced to the "CMSSW" version:

TrackLooper/SDL/Event.cu

Lines 669 to 672 in ce2f36b

    
           hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]]; 
        
           hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]]; 
        
           hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]]; 
        
           hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];

At the same time, the MDs for pLSs are handled in the same way as those of the OT:

TrackLooper/SDL/Event.cu

Lines 659 to 660 in ce2f36b

    
           addMDToMemory(mdsInGPU, hitsInGPU, modulesInGPU, hitIndices0[tid], hitIndices1[tid], pixelModuleIndex, 0,0,0,0,0,0,0,0,0,innerMDIndex); 
        
           addMDToMemory(mdsInGPU, hitsInGPU, modulesInGPU, hitIndices2[tid], hitIndices3[tid], pixelModuleIndex, 0,0,0,0,0,0,0,0,0,outerMDIndex);

where the "internal" hitIdxs are added.

When the segmentsInGPU.pLSHitsIdxs are used to get the pLS hitIdxs (as done in PR Added one function compute pt eta phi for TC #269 here), the "CMSSW" version is taken, instead of the "internal" one, as done for the rest of the objects. This can be fixed downstream by doing this trick:

TrackLooper/code/core/write_sdl_ntuple.cc

Lines 822 to 824 in 277cf27

    
           unsigned int hitidx_in_GPU   = trackCandidatesInGPU.hitIndices[14 * idx + i]; 
        
           unsigned int hitidx_original = hitsInGPU.idxs[hitidx_in_GPU];  
        
           unsigned int hitidx = type == pLS ? hitidx_in_GPU : hitidx_original; // trackCandidatesInGPU.hitIndices for pLS's are behaving weirdly (see: issue #267)

Some parting thoughts

Given the above, it seems like the situation for pLSs is actually simpler than the one for OT hits, where the dereferencing is required. But I am still missing the point of the last part of the initial issue text (point 3 above), so maybe that shifts the tide towards the OT hitIdx handling to be better.

One easy fix to homogenize (but probably for the worse otherwise) would be to pass the "internal" hitIdxs here:

TrackLooper/SDL/Event.cu

Line 673 in ce2f36b

    
           addPixelSegmentToMemory(segmentsInGPU, mdsInGPU, modulesInGPU, innerMDIndex, outerMDIndex, pixelModuleIndex, hits1, hitIndices0[tid], hitIndices2[tid], dPhiChange[tid], ptIn[tid], ptErr[tid], px[tid], py[tid], pz[tid], etaErr[tid], eta[tid], phi[tid], charge[tid], seedIdx[tid], pixelSegmentIndex, tid, superbin[tid], pixelType[tid],isQuad[tid],score_lsq);

Still, I am not sure what we would gain from that...

Note that, the way we would have to use for getting the hitIdxs for OT objects now:

TrackLooper/code/core/write_sdl_ntuple.cc

Lines 822 to 824 in 277cf27

    
           unsigned int hitidx_in_GPU   = trackCandidatesInGPU.hitIndices[14 * idx + i]; 
        
           unsigned int hitidx_original = hitsInGPU.idxs[hitidx_in_GPU];  
        
           unsigned int hitidx = type == pLS ? hitidx_in_GPU : hitidx_original; // trackCandidatesInGPU.hitIndices for pLS's are behaving weirdly (see: issue #267)

also requires the hit arrays to be copied from the GPU to the CPU for getting the output in CMSSW, apart from the TC arrays. My last timing measurements showed that the TC arrays copies take ~0.02 ms, while the hit arrays copies take ~0.55 ms. The latter could be reduced down to ~0.1 ms, if we only copy the hitsInCPU->idxs array, which is actually the only array needed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hitindex for pLS is a bit confusing #267

hitindex for pLS is a bit confusing #267

sgnoohc commented Apr 8, 2023

VourMa commented Apr 28, 2023

VourMa commented Apr 29, 2023

hitindex for pLS is a bit confusing #267

hitindex for pLS is a bit confusing #267

Comments

sgnoohc commented Apr 8, 2023

VourMa commented Apr 28, 2023

VourMa commented Apr 29, 2023