[PROPOSAL] Prevent historicals whose segment caches contain unknown segments from corrupting cluster #7180
Comments
Hmm, what happens if the marker file is corrupted or deleted?
You can have incompatible configs which prevent accidental cross-launching without resorting to cluster names. For example, you can make the zookeeper paths they announce under completely separate while still using the same zk quorum. You can give the segment cache locations a special key per environment, so if a node gets deployed to the wrong one it won't work. You can specify which metadata databases or table prefixes you use so you don't accidentally use the wrong one in the wrong environment. Instead of a cluster name, I think the "how to productionize a cluster" docs should be expanded to highlight the ways you can separate out the operating namespaces of these different components to either eliminate or minimize such a scenario.
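For illustration, the kind of environment separation described above might look like this in each cluster's runtime properties. The property names are standard Druid common configuration; the hosts, paths, and table prefixes are made-up values:

```properties
# Production cluster: its own ZK base path and metadata tables.
druid.zk.service.host=zk.example.com:2181
druid.zk.paths.base=/druid/prod
druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com/druid_prod
druid.metadata.storage.tables.base=druid_prod

# Staging cluster: the same ZK quorum is fine, but a different base path
# and different metadata tables, so a node launched with the wrong
# config simply won't find the other environment's state.
# druid.zk.paths.base=/druid/staging
# druid.metadata.storage.tables.base=druid_staging
```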
@jihoonson I think it would be OK to treat that as "don't use the segment cache" (treat as empty or refuse to start up) — if one file in the segment cache is corrupted, why would any other file be more trustworthy? @drcrallen Yes, in retrospect it is blindingly obvious how many things we could have done to avoid this issue. But we failed to do so, and the failure mode was incredibly scary. Our cluster nearly instantly dropped the majority of its data. Even once it was "fixed" it took hours for historicals to reload. And had we not turned off kill tasks that morning we could have permanently lost most of our data. I do think doing something to avoid the severity of the issue would be a good idea. Plus I want the cluster name in the web console :)
Here's a simpler version of the above that removes the requirement to define cluster names (though that does remove my web console benefit too :) ) and solves the exact problem in a pretty targeted way. When historicals (and real-time nodes I guess, though I don't know much about them) run SegmentLoadDropHandler.loadLocalCache, they can check each segment they find against the druid_segments table and skip any that aren't there. This is different from my previous proposal because it's just a single startup-time check, not an ongoing process, and it's done by historicals, not coordinators and brokers. Instead of "filter out announced segments that aren't in the DB", it's "don't announce segments that aren't in the DB". I don't know what the most appropriate way to do this check is. Do historicals have access to MetadataSegmentManager or a similar class? I think not, and they probably shouldn't gain it just for this one check. Or do they have access to something like CoordinatorClient to talk to coordinators? (Do historicals ever talk to anything other than ZooKeeper and deep storage?) (I also know ~nothing about realtime nodes other than that it looks like SegmentLoadDropHandler is part of them too.)
I updated the proposal to match my last comment. |
Motivation
If you are running multiple independent Druid clusters with the same data source names (e.g., a production and a staging cluster) and accidentally start up a historical on a machine with a segment cache from the wrong cluster, the results can be disastrous. The historical will announce the segments it found on disk and immediately begin serving those segments to users. Worse, the coordinator can mark the segments that actually are part of your cluster as unused if they're overshadowed, and can even delete them from deep storage if kill tasks are turned on.
This is not a theoretical problem: my company had a major outage because we accidentally did this. Fortunately we had disabled kill tasks literally that morning, so we had no major data loss, and I understood the innards of the druid_segments SQL table well enough to figure out how to recover, but it was still challenging.
While users should be careful not to reuse segment caches, it would be nice to have some protection against this failure mode.
Proposed changes
When historicals run SegmentLoadDropHandler.loadLocalCache on startup, they can check each segment that they find against the druid_segments table, and ignore any segments that aren't there. (They should ignore them and send an alert rather than crash, like they currently do for segment files whose info_dir filename doesn't match the actual segment id.)

I don't know what the most appropriate way to do this check is. Do historicals have access to MetadataSegmentManager or a similar class? I think not, and they probably shouldn't gain it just for this one check. Or do they have access to something like CoordinatorClient to talk to coordinators? (Do historicals ever talk to anything other than ZooKeeper and deep storage?)
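As a minimal sketch of that startup-time filter: the SegmentCatalog and SegmentCacheValidator names below are hypothetical, standing in for whatever ends up answering "does the druid_segments table know this segment id?" (none of them are real Druid APIs):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the metadata check; in a real implementation
// this would be backed by the druid_segments table, reached via the
// coordinator or the metadata store.
interface SegmentCatalog {
    boolean contains(String segmentId);
}

class SegmentCacheValidator {
    // Keep only the cached segments the metadata store knows about.
    // Unknown segments are skipped (and would trigger an alert) instead
    // of being announced, mirroring how loadLocalCache already skips
    // files whose info_dir name doesn't match the segment id.
    static List<String> filterKnown(List<String> cachedSegmentIds, SegmentCatalog catalog) {
        return cachedSegmentIds.stream()
                .filter(catalog::contains)
                .collect(Collectors.toList());
    }
}
```

The point of the sketch is only that the decision happens once, before announcement, rather than as an ongoing reconciliation process.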
(I see that SegmentLoadDropHandler is also part of realtime nodes, but I know nothing about realtime nodes. I don't know if this suggestion should be applied to them too or if they should be left alone.)

Rationale
Other considered approaches:

Have coordinators and brokers continuously filter out announced segments that aren't in the druid_segments table. This potentially solves other hypothetical failure modes (running a historical on a machine with an existing segment cache that had been turned off for a long time, where some segments on it had been deleted already?) but requires more machines to talk to the metadata db more often, and you have to be careful to only apply this logic to segments announced from historicals, not segments announced by indexing tasks.

Operational impact
Minimal unless you run a historical with a segment cache with unknown segments, in which case it rescues you from the problems described above.
This does mean that historicals won't be able to start up if the coordinator is down (unless they talk directly to the metadata database). The implementation could assume segments are valid if the coordinator can't be reached, to make historicals more independent.
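Sketched as code, the fail-open variant might look like the following (again with hypothetical names; the key point is that an unreachable metadata store keeps the whole cache rather than rejecting it):

```java
import java.util.ArrayList;
import java.util.List;

class FailOpenSegmentFilter {
    // Hypothetical check that may throw if the coordinator / metadata
    // store is unreachable.
    interface KnownSegmentCheck {
        boolean isKnown(String segmentId) throws Exception;
    }

    static List<String> filter(List<String> cachedSegmentIds, KnownSegmentCheck check) {
        List<String> known = new ArrayList<>();
        for (String id : cachedSegmentIds) {
            try {
                if (check.isKnown(id)) {
                    known.add(id);
                }
            } catch (Exception e) {
                // Metadata store unreachable: fail open and keep every
                // cached segment, so historicals can still start when the
                // coordinator is down, at the cost of losing the protection
                // for this particular startup.
                return cachedSegmentIds;
            }
        }
        return known;
    }
}
```

Whether failing open is acceptable is a policy question: it preserves historical independence but reintroduces the original hazard exactly when the cluster is already degraded.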
If you rely on the ability for historicals to serve segments even if the druid_segments SQL table is corrupted, then this could be a problem.

Test plan
Tests can validate that unknown segments are skipped.