[PROPOSAL] Prevent historicals whose segment caches contain unknown segments from corrupting cluster #7180

Open
glasser opened this issue Mar 2, 2019 · 5 comments

@glasser
Contributor

glasser commented Mar 2, 2019

Motivation

If you are running multiple independent Druid clusters with the same data source names (e.g., a production and a staging cluster) and accidentally start up a historical on a machine with a segment cache from the wrong cluster, the results can be disastrous. The historical will announce the segments it found on disk and immediately begin serving those segments to users. Worse, the coordinator can mark the segments that actually are part of your cluster as unused if they're overshadowed, and even delete them from deep storage if kill tasks are turned on.

This is not a theoretical problem: my company had a major outage because we accidentally did this. Fortunately we had disabled kill tasks literally that morning, so we had no major data loss, and I understood the innards of the druid_segments SQL table well enough to figure out how to recover, but it was still challenging.

While users should be careful not to reuse segment caches, it would be nice to have some protection against this failure mode.

Proposed changes

When historicals run SegmentLoadDropHandler.loadLocalCache on startup, they can check each segment that they find against the druid_segments table, and ignore any segments that aren't there. (They should ignore them and send an alert rather than crash, as is currently done for segment files whose info_dir filename doesn't match the actual segment id.)

I don't know what the most appropriate way to do this check is. Do historicals have access to MetadataSegmentManager or a similar class? I think not, and they probably shouldn't gain it just for this one check. Or do they have access to something like CoordinatorClient to talk to coordinators? (Do historicals ever talk to anything other than ZooKeeper and deep storage?)

(I see that SegmentLoadDropHandler is also part of realtime nodes, but I know nothing about realtime nodes. I don't know if this suggestion should be applied to them too or if they should be left alone.)
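To make the shape of the check concrete, here is a rough sketch of what the filtering inside loadLocalCache could look like. The metadataChecker / isSegmentPublished names are placeholders for whichever mechanism (MetadataSegmentManager, a CoordinatorClient call, etc.) turns out to be appropriate; treat this as an illustration of the intended behavior, not a concrete API proposal.

```java
// Sketch only: "metadataChecker" and "isSegmentPublished" are placeholders for
// whichever component ends up answering "is this segment in druid_segments?".
final List<DataSegment> segmentsToAnnounce = new ArrayList<>();
for (DataSegment segment : segmentsFoundInLocalCache) {
  if (metadataChecker.isSegmentPublished(segment.getId())) {
    segmentsToAnnounce.add(segment);
  } else {
    // Skip and alert instead of crashing, mirroring how a mismatched
    // info_dir filename is handled today.
    log.makeAlert("Ignoring cached segment that is not present in druid_segments")
       .addData("segmentId", segment.getId())
       .emit();
  }
}
// Only segmentsToAnnounce would be announced and served from here on.
```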

Rationale

Other considered approaches:

  • Allow you to give a "name" to a Druid cluster. Store the cluster name in a marker file in the segment cache. Ignore segment caches that have the wrong name in them. (This also lets you use the name for other purposes — protecting server-to-server communication against misconfiguration that sets up nodes to talk to nodes in the wrong cluster, displaying the cluster name in the web console, etc. A rough sketch of this marker-file check appears after this list.)
  • Instead of storing the cluster name in a marker file, add it to the DataSegment class directly. This accomplishes a similar goal with a lot more changes, and necessitates adding another configuration to allow clusters to load segments that were created before the cluster got a name.
  • Instead of a single startup-time check in historicals, brokers and coordinators could ignore all segments announced by historicals that aren't in the druid_segments table. This potentially solves other hypothetical failure modes (e.g., running a historical on a machine that had been turned off for a long time, whose segment cache contains segments that have since been deleted), but it requires more machines to talk to the metadata DB more often, and you have to be careful to only apply this logic to segments announced by historicals, not segments announced by indexing tasks.
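As a rough illustration of the marker-file idea from the first bullet above — nothing here exists in Druid today; the file name, the configured cluster name, and the helper are all made up:

```java
// Hypothetical sketch: Druid has no cluster-name config or marker file today.
static void checkClusterMarker(File cacheDir, String configuredClusterName) throws IOException
{
  final File marker = new File(cacheDir, ".cluster_name");
  if (marker.exists()) {
    final String cachedClusterName =
        new String(java.nio.file.Files.readAllBytes(marker.toPath()), StandardCharsets.UTF_8).trim();
    if (!cachedClusterName.equals(configuredClusterName)) {
      // Refuse to use a cache that was stamped by another cluster.
      throw new ISE(
          "Segment cache [%s] belongs to cluster [%s], not [%s]",
          cacheDir, cachedClusterName, configuredClusterName
      );
    }
  } else {
    // First use of this cache directory: stamp it with our cluster name.
    java.nio.file.Files.write(marker.toPath(), configuredClusterName.getBytes(StandardCharsets.UTF_8));
  }
}
```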

Operational impact

Minimal, unless you run a historical against a segment cache containing unknown segments, in which case it rescues you from the problems described above.

This does mean that historicals won't be able to start up if the coordinator is down (unless they talk directly to the metadata database). The implementation could assume segments are valid if the coordinator can't be reached, to make historicals more independent.

If you rely on the ability for historicals to serve segments even if the druid_segments SQL table is corrupted, then this could be a problem.

Test plan

Tests can validate that unknown segments are skipped.

@jihoonson
Contributor

Hmm, what happens if the marker file is corrupted or deleted?

@drcrallen
Contributor

drcrallen commented Mar 2, 2019

You can have incompatible configs which prevent accidental cross-launching without resorting to cluster names. For example, you can make the ZooKeeper paths the nodes announce on completely separate while still using the same ZK quorum. You can give the segment cache locations a special key per environment so that a cache deployed to the wrong one won't work. You can specify different metadata databases or table prefixes so you don't accidentally use the wrong one in the wrong environment.

Instead of a cluster name, I think the "how to productionize a cluster" documentation should be expanded to highlight the ways you can separate out the operating namespaces of these different components, to either eliminate or minimize such a scenario.
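For illustration, here is one environment-specific set of overrides along those lines (the property names are real Druid settings; the specific values are made up for a hypothetical staging cluster):

```properties
# Staging-only overrides: a node accidentally launched with production state
# (or vice versa) finds nothing it recognizes.
druid.zk.paths.base=/druid/staging
druid.metadata.storage.tables.base=druid_staging
druid.segmentCache.locations=[{"path":"/var/druid/segment-cache/staging","maxSize":300000000000}]
```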

@glasser
Contributor Author

glasser commented Mar 3, 2019

@jihoonson I think it would be OK to treat that as "don't use the segment cache" (treat as empty or refuse to start up) — if one file in the segment cache is corrupted, why would any other file be more trustworthy?

@drcrallen Yes, in retrospect it is blindingly obvious how many things we could have done to avoid this issue. But we failed to do so, and the failure mode was incredibly scary. Our cluster nearly instantly dropped the majority of its data. Even once it was "fixed" it took hours for historicals to reload. And had we not turned off kill tasks that morning we could have permanently lost most of our data. I do think doing something to avoid the severity of the issue would be a good idea. Plus I want the cluster name in the web console :)

@glasser
Contributor Author

glasser commented Mar 3, 2019

Brokers and coordinators could ignore all segments from historicals that aren't in the druid_segments table. This potentially solves other hypothetical failure modes (running a historical on a machine with an existing segment cache which had been turned off for a long time, and some segments on it had been deleted already?) but requires more machines to talk to the metadata db more often.

Here's a simpler version of the above that removes the requirement to define cluster names (though that does remove my web console benefit too :) ) and solves the exact problem in a pretty targeted way.

When historicals (and real-time nodes, I guess, though I don't know much about them) run SegmentLoadDropHandler.loadLocalCache on startup, they can check each segment that they find against the druid_segments table, and ignore any segments that aren't there. (I think they should ignore them rather than crash, as is currently done for segment files whose info_dir filename doesn't match the actual segment id.)

This is different from my previous proposal because it's just a single startup-time check, not an ongoing process, and it's done by historicals, not coordinators and brokers. Instead of "filter out announced segments that aren't in the DB", it's "don't announce segments that aren't in the DB".

I don't know what the most appropriate way to do this check is. Do historicals have access to MetadataSegmentManager or a similar class? I think not, and they probably shouldn't gain it just for this one check. Or do they have access to something like CoordinatorClient to talk to coordinators? (Do historicals ever talk to anything other than ZooKeeper and deep storage?)

(I also know ~nothing about realtime nodes other than that it looks like SegmentLoadDropHandler is also part of it. If it's difficult to implement this feature for realtime nodes, I could consider only doing it for historicals.)

@glasser glasser changed the title [PROPOSAL] Cluster names to prevent multiple Druid clusters from interfering with each other [PROPOSAL] Prevent historicals whose segment caches contain unknown segments from corrupting cluster Mar 4, 2019
@glasser
Copy link
Contributor Author

glasser commented Mar 4, 2019

I updated the proposal to match my last comment.
