
Enable querying entirely cold datasources #16676

Open
wants to merge 14 commits into base: master

Conversation

findingrish
Contributor

@findingrish findingrish commented Jul 1, 2024

Problem

Currently, the datasource schema doesn’t include columns from cold segments. This makes it impossible to query an entirely cold datasource.

Approach

  • Add a mechanism to backfill schema for cold segments in the metadata database. Note that this is required only for segments created prior to enabling the CentralizedDatasourceSchema feature.
  • Update the datasource schema building logic on the Coordinator to include schema from cold segments.
  • Make Brokers aware of entirely cold datasources.

Backfill schema for cold segments

Leverage the existing schema backfill flow added as part of the CentralizedDatasourceSchema feature. Users are expected to manually load the cold segments by setting their replication factor to 1; once the schema is backfilled (which can be verified in the metadata database), they can unload the segments.

Handling entirely cold datasource

The problem with a cold datasource is that the Broker simply doesn’t know about the datasource if none of its segments are available, so the datasource wouldn’t even appear on the console for querying.
We need a way for Brokers to be aware of cold datasources, so that they can fetch their schema from the Coordinator.

Currently, Brokers request the schema for available datasources from the Coordinator in each refresh cycle.
With this change, Brokers first poll the set of used datasources from the Coordinator and then request their schema.

Once the Broker has the schema for cold datasources, they show up in the console and become available for querying.
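
A rough sketch of this ordering is shown below. It is illustrative only: fetchUsedDataSources() mirrors the new Coordinator API quoted further down in the review, while every other name and type is a simplified placeholder rather than the actual BrokerSegmentMetadataCache code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the new Broker refresh ordering.
class BrokerRefreshSketch
{
  // Placeholder client; only fetchUsedDataSources() corresponds to an API in this PR.
  interface CoordinatorClientSketch
  {
    Set<String> fetchUsedDataSources();
    Map<String, Map<String, String>> fetchDataSourceSchemas(Set<String> dataSources);
  }

  private final CoordinatorClientSketch coordinatorClient;

  // datasource -> column name -> column type
  private final ConcurrentHashMap<String, Map<String, String>> tables = new ConcurrentHashMap<>();

  BrokerRefreshSketch(CoordinatorClientSketch coordinatorClient)
  {
    this.coordinatorClient = coordinatorClient;
  }

  void refresh()
  {
    // 1. Poll every used datasource, including entirely cold ones with no available segments.
    Set<String> usedDataSources = coordinatorClient.fetchUsedDataSources();

    // 2. Request their schemas from the Coordinator.
    Map<String, Map<String, String>> schemas = coordinatorClient.fetchDataSourceSchemas(usedDataSources);

    // 3. Rebuild the local table map so cold datasources show up in the console and are queryable.
    tables.putAll(schemas);

    // 4. Drop datasources that are no longer used.
    tables.keySet().retainAll(usedDataSources);
  }
}
```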

Key changes

  • CoordinatorSegmentMetadataCache
    • Runs a scheduled thread to fetch used segments and build datasource schema from cold segments, then merges it with the datasource schema built from hot segments (a minimal sketch follows this list).
    • The refresh logic is also updated to merge the hot datasource schema with the cold schema.
  • BrokerSegmentMetadataCache
    • The refresh condition is slightly changed: a refresh is executed in each cycle if the feature is enabled.
    • The refresh logic is also updated to poll used datasources from the Coordinator, so that the Broker can fetch cold datasource schema.
  • DruidCoordinator
    • Added a new class SegmentReplicationStatusManager which manages the segmentReplicationStatus and broadcastSegments state. This was needed to avoid a cyclic dependency between DruidCoordinator and CoordinatorSegmentMetadataCache.
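
A minimal sketch of the cold-schema pass mentioned in the CoordinatorSegmentMetadataCache item above, assuming cold segments are identified by a replication factor of 0 and that their schemas have already been backfilled into the metadata store; all names and types here are simplified placeholders, not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of one iteration of the scheduled cold-schema pass.
class ColdSchemaPassSketch
{
  // datasource -> column name -> column type, built purely from cold segments
  private final ConcurrentHashMap<String, Map<String, String>> coldSchemaTable = new ConcurrentHashMap<>();

  void coldDatasourceSchemaExec(
      Map<String, Set<String>> usedSegmentsByDataSource,  // datasource -> used segment ids
      Map<String, Integer> replicationFactorBySegment,    // segment id -> replication factor
      Map<String, Map<String, String>> backfilledSchemas  // segment id -> backfilled columns
  )
  {
    for (Map.Entry<String, Set<String>> entry : usedSegmentsByDataSource.entrySet()) {
      final Map<String, String> coldColumns = new HashMap<>();
      for (String segmentId : entry.getValue()) {
        // A used segment with replication factor 0 lives only in deep storage, i.e. it is cold.
        if (replicationFactorBySegment.getOrDefault(segmentId, 0) == 0) {
          coldColumns.putAll(backfilledSchemas.getOrDefault(segmentId, Map.of()));
        }
      }
      coldSchemaTable.put(entry.getKey(), coldColumns);
    }

    // Remove stale datasources; the refresh path later merges coldSchemaTable
    // with the schema built from hot (available) segments.
    coldSchemaTable.keySet().retainAll(usedSegmentsByDataSource.keySet());
  }
}
```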

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Contributor

@kfaraz kfaraz left a comment


This PR has some refactors which should be tackled separately, in order to facilitate a smoother review of the core changes here.

return datasourceToUnavailableSegments;
}

public Object2IntMap<String> getDatasourceToDeepStorageQueryOnlySegmentCount()
Contributor

@findingrish , this seems like the only new method that has been added here.
Please remove this new class SegmentReplicationStatusManager and move the code back to DruidCoordinator.

If a refactor is required, please do it in a separate PR.
This PR should focus only on the required changes.

Contributor

Correction: It seems that this method had already existed too.
@findingrish , is there any new code in SegmentReplicationStatusManager?

Contributor Author

@kfaraz There is no new code in SegmentReplicationStatusManager. The reason for refactoring was a cyclic dependency between CoordinatorSegmentMetadataCache and DruidCoordinator while trying to use DruidCoordinator#getSegmentReplicationFactor.

I will raise a separate PR for the refactor.

Contributor

Okay. Can you share some more details on how the cyclic dependency comes into the picture?

Contributor Author

@findingrish findingrish Jul 4, 2024


Currently, DruidCoordinator has a dependency on CoordinatorSegmentMetadataCache. For this patch, I need to use DruidCoordinator#getSegmentReplicationFactor in CoordinatorSegmentMetadataCache, which results in a cyclic dependency.

As a solution, I have refactored DruidCoordinator to separate out the code which updates segmentReplicationStatus and broadcastSegments.

Let me know if this solution makes sense.

Contributor

@findingrish , you could just expose a method updateSegmentReplicationStatus() on CoordinatorSegmentMetadataCache. Call this method from DruidCoordinator.UpdateReplicationStatus.run() where we update broadcastSegments and segmentReplicationStatus.

Let me know if this works for you.
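
For illustration, a rough sketch of this suggestion with heavily simplified placeholder types (not the actual DruidCoordinator code):

```java
// Illustrative sketch: the duty pushes the freshly computed replication status into the
// metadata cache, instead of the cache reaching back into DruidCoordinator.
class SegmentMetadataCacheSketch
{
  private volatile Object segmentReplicationStatus;

  // The suggested new method, called by the UpdateReplicationStatus duty.
  public void updateSegmentReplicationStatus(Object replicationStatus)
  {
    this.segmentReplicationStatus = replicationStatus;
  }
}

class UpdateReplicationStatusSketch
{
  private final SegmentMetadataCacheSketch metadataCache;

  UpdateReplicationStatusSketch(SegmentMetadataCacheSketch metadataCache)
  {
    this.metadataCache = metadataCache;
  }

  void run(Object newReplicationStatus)
  {
    // ... existing logic that updates broadcastSegments and segmentReplicationStatus ...
    metadataCache.updateSegmentReplicationStatus(newReplicationStatus);
  }
}
```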

Contributor Author

Yeah, this approach would work for me.
However, it seems a bit odd that DruidCoordinator.UpdateReplicationStatus has to additionally update state in some other class; ideally, the consumer CoordinatorSegmentMetadataCache should be pulling this information?

Is there a reason to avoid the refactor work?

Contributor

@kfaraz kfaraz Jul 5, 2024


Is there a reason to avoid the refactor work?

Yes, the dependencies are already all over the place which makes the code less readable and also complicates testing. A refactor is needed here but it would have to be thought through a little.

However, it seems a bit odd that DruidCoordinator.UpdateReplicationStatus has to additionally update state in some other class;

Not really, you can think of the DruidCoordinator (or rather the UpdateReplicationStatus duty in this case) as sending a notification to the CoordinatorSegmentMetadataCache saying that the segment replication status has been updated. The DruidCoordinator already sends a notification to the metadata cache about leadership status; this is another notification in the same vein.

Contributor Author

Yes, the dependencies are already all over the place which makes the code less readable and also complicates testing. A refactor is needed here but it would have to be thought through a little.

Yes, this makes sense. DruidCoordinator refactoring would need more thought.

Thanks for the suggestion, I will update the patch.

Contributor

@cryptoe cryptoe left a comment


Left some comments.

/**
* Retrieves list of used datasources.
*/
ListenableFuture<Set<String>> fetchUsedDataSources();
Contributor

Please add the definition of used data sources here.

*/
-  protected final ConcurrentMap<String, T> tables = new ConcurrentHashMap<>();
+  protected final ConcurrentHashMap<String, T> tables = new ConcurrentHashMap<>();
Contributor

Nit: Just wondering which specific hash map methods you are using that required this change.

Contributor Author

I started using the computeIfAbsent method. The explanation is captured here: https://github.com/code-review-checklists/java-concurrency/blob/master/README.md#chm-type.
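
For context, a small self-contained example of the pattern (not the PR code): declaring the field with the concrete ConcurrentHashMap type documents that callers rely on ConcurrentHashMap's computeIfAbsent guarantee, namely that the whole invocation is atomic and the mapping function is applied at most once per key.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class ComputeIfAbsentExample
{
  // Concrete ConcurrentHashMap type, as discussed in the linked checklist item.
  private final ConcurrentHashMap<String, List<String>> columnsByDataSource = new ConcurrentHashMap<>();

  void addColumn(String dataSource, String column)
  {
    // Two threads registering columns for the same datasource never create two lists:
    // the mapping function runs at most once per key and the call is atomic.
    columnsByDataSource
        .computeIfAbsent(dataSource, ds -> new CopyOnWriteArrayList<>())
        .add(column);
  }
}
```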

coldScehmaExec = Executors.newSingleThreadScheduledExecutor(
new ThreadFactoryBuilder()
.setNameFormat("DruidColdSchema-ScheduledExecutor-%d")
.setDaemon(true)
Contributor

Why is this a daemon thread?

Contributor Author

I will update this; we don't need a daemon thread here.

@@ -181,6 +220,12 @@ public void onLeaderStart()
try {
segmentSchemaBackfillQueue.onLeaderStart();
cacheExecFuture = cacheExec.submit(this::cacheExecLoop);
coldSchemaExecFuture = coldScehmaExec.schedule(
this::coldDatasourceSchemaExec,
coldSchemaExecPeriodMillis,
Contributor

Is there a specific reason to leave these properties undocumented?
Do we have any metrics which tell us the performance of this executor service in terms of the number of cold segments backfilled?

Contributor Author

Do we have any metrics which tell us the performance of this executor service in terms of the number of cold segments backfilled

We are not backfilling segments here. It just loops over the segments, identifies cold segments, and builds their schema.
If the datasource schema is updated, it is logged.

coldSchemaTable.keySet().retainAll(dataSources);
}

private RowSignature mergeHotAndColdSchema(RowSignature hot, RowSignature cold)
Contributor

I am very surprised you need a new method here. There should be existing logic which does this, no?

Contributor Author

I can refactor this a bit to have a single method for merging the RowSignature.
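
For reference, a minimal sketch of what such a single merge helper could look like, assuming RowSignature's builder/getColumnNames/getColumnType API and a "first signature wins on type conflicts" policy; the actual precedence between hot and cold columns is up to the implementation:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.column.RowSignature;

public class RowSignatureMergeSketch
{
  public static RowSignature merge(RowSignature first, RowSignature second)
  {
    // Union of column names, preserving the order of the first signature.
    final Set<String> columnNames = new LinkedHashSet<>();
    columnNames.addAll(first.getColumnNames());
    columnNames.addAll(second.getColumnNames());

    final RowSignature.Builder builder = RowSignature.builder();
    for (String column : columnNames) {
      final ColumnType type =
          first.getColumnType(column)
               .orElseGet(() -> second.getColumnType(column).orElse(null));
      builder.add(column, type);
    }
    return builder.build();
  }
}
```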

}

// remove any stale datasource from the map
coldSchemaTable.keySet().retainAll(dataSources);
Contributor

Do we have a test case for this?

Contributor Author

Yes, in CoordinatorSegmentMetadataCacheTest#testColdDatasourceSchema_verifyStaleDatasourceRemoved.
