Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interning in SQLMetadataSegmentManager may obliterate new segment metadata #6358

Open
leventov opened this issue Sep 20, 2018 · 7 comments
Open

Comments

@leventov
Copy link
Member

DataSegment interning in SQLMetadataSegmentManager.poll() is based on segment ids (if a segment with a matching segment id already exists on the heap, it is used instead of the newly polled segment), that may obliterate some new information about segments, such as different loadSpecs.

@surekhasaharan
Copy link

@leventov checking my understanding for this fix, so the equals in DataSegment should be either removed or modified to include not just the SegmentId equality, but other fields of DataSegment ? And the replaceWithExistingSegmentIfPresent in SQLMetadataSegmentManager might return something like

return alreadyExistingSegment != null
           ? (alreadyExistingSegment.equals(segment) ? alreadyExistingSegment : segment)
           : segment;

@leventov
Copy link
Member Author

leventov commented May 3, 2019

I'm not sure what would be the best way to handle this in terms of both usability and error-proneness. If you just make equals() more expensive and somebody puts it in HashSet as keys, a lot of unnecessary work may be done in the result. On the other hand, there may be cases when we would still want to have DataSegment as keys in Maps to "pack" more data into a Map without creating extra wrapping objects, that is, having ObjToIntMap<DataSegment> instead of Map<SegmentId, SomeWrapperWithDataSegmentAndIntFields>.

So what I would do is:

  • Leave DataSegment's equals() and hashCode() as they are now
  • Add DataSegment.allDataEquals() checking all data.
  • Add a Structural Search inspection in intelliJ which inhibits using DataSegment in Map keys and Sets, but can still be done using @SuppressWarnings("SSBasedInspection") when somebody really needs that (see the example above).

@drcrallen
Copy link
Contributor

I thought a segment ID was supposed to be immutable. Why would a new load spec not cause a new segment ID?

@gianm
Copy link
Contributor

gianm commented May 7, 2019

If you are migrating your segments from one deep storage to another, you might want to adjust things so the same segment objects have new loadSpecs but retain their IDs. It's valuable because changing the IDs implies changing the version and/or partition numbers, which would cause a lot of churn in the cluster.

Segment metadata also evolves naturally over time as segments are pushed from realtime tasks to deep storage. While they're on realtime tasks, they don't yet have loadSpecs or sizes (maybe other stuff too, I don't remember).

@surekhasaharan
Copy link

Add a Structural Search inspection in intelliJ which inhibits using DataSegment in Map keys and Sets, but can still be done using @SuppressWarnings("SSBasedInspection") when somebody really needs that (see the example above).

Did you mean to add this structural search inspection as a warning, if I add it, it's manifested as an error, and I'm not sure how to make it a warning while keeping the existing inspections as error. See the last item in this screenshot, how can we set it's severity to warning ?
Screen Shot 2019-05-06 at 5 19 57 PM

@leventov
Copy link
Member Author

leventov commented May 8, 2019

@surekhasaharan I meant to add it as an error. So every time a developer wants to have Set<DataSegment> in their code, they must put @SuppressWarnings (with justification) to pass CI.

@github-actions
Copy link

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants