
Add manifest file for MSQ export #15953

Merged · 14 commits · Apr 15, 2024
Conversation

adarshsanjeev
Contributor

@adarshsanjeev adarshsanjeev commented Feb 23, 2024

This PR adds the capability for MSQ export to create a manifest file at the destination.

Motivation

Currently, export creates files at the provided destination. The manifest file will additionally provide a list of the files created as part of the export. This allows easier consumption of the data exported from Druid, especially by automated data pipelines. There is still a safety check that requires the destination to be empty, but the manifest would be especially helpful if that condition is relaxed in the future. Druid itself does not currently support reading from a manifest file.

Structure

The manifest file created is in the symlink manifest format. The file is created at the
path <export destination>/_symlink_format_manifest/manifest. Normally, this would be <export destination>/_symlink_format_manifest/<partition path>/manifest, but since Druid does not support partitioning, the manifest is always created in the _symlink_format_manifest folder itself. Each line of the file contains an absolute path to a file created by the export.
The path is prefixed by file: if the destination is on a local disk.

Additionally, a file _symlink_format_manifest/druid_export_meta is created. The file contains additional information about the export. Currently, this only contains the manifest file version, to track which version of the manifest file was created by the export.
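The layout above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the class and method names are invented, and the real implementation writes through a `StorageConnector` rather than `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: write the two metadata files described above
// for a list of exported file URIs.
public class ManifestSketch
{
  private static final String MANIFEST_DIR = "_symlink_format_manifest";

  static void writeManifestFiles(Path destination, List<String> exportedFiles) throws IOException
  {
    Path dir = destination.resolve(MANIFEST_DIR);
    Files.createDirectories(dir);
    // manifest: one absolute path per line, e.g. "file:/Users/.../partition0.csv"
    Files.write(dir.resolve("manifest"), exportedFiles);
    // druid_export_meta: currently only the manifest format version
    Files.write(dir.resolve("druid_export_meta"), List.of("version: 1"));
  }

  public static void main(String[] args) throws IOException
  {
    Path dest = Files.createTempDirectory("export");
    writeManifestFiles(dest, List.of("file:" + dest.resolve("query-worker0-partition0.csv")));
    System.out.println(Files.readAllLines(dest.resolve(MANIFEST_DIR).resolve("manifest")).size());
  }
}
```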

Example

Local storage:

└[~/export]> cat _symlink_format_manifest/manifest
file:/Users/adarshsanjeev/export/query-293c1f4c-d5ed-4b04-9690-d7d2d9db4995-worker2-partition23.csv
file:/Users/adarshsanjeev/export/query-293c1f4c-d5ed-4b04-9690-d7d2d9db4995-worker1-partition13.csv
file:/Users/adarshsanjeev/export/query-293c1f4c-d5ed-4b04-9690-d7d2d9db4995-worker0-partition24.csv
...
file:/Users/adarshsanjeev/export/query-293c1f4c-d5ed-4b04-9690-d7d2d9db4995-worker1-partition1.csv

S3 export file:

File created at s3://export-bucket/export/_symlink_format_manifest/manifest

s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker2-partition2.csv
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker1-partition1.csv
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-partition0.csv
...
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-partition24.csv

druid_export_meta:

version: 1

Export is still an experimental feature, and the structure of the file may change in the future.


Upgrade issues

  • During a rolling update, older versions of workers would not return a list of exported files, and an older controller would not create a manifest file. Therefore, export queries run during this time might have incomplete manifests.

Release notes

  • Export queries will also create a manifest file at the destination, which lists the files created by the query.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Feb 23, 2024
@adarshsanjeev
Contributor Author

To verify that the created export file follows the symlink format, I generated manifest files using Apache Spark with Delta Lake. The generated file has a similar format for both local disk and S3; the only difference is that since Delta Lake uses s3a when writing, the paths in the manifest file are s3a absolute paths.

Local:

NBuser@c46fe1cca55f:/tmp/delta-table$ cat _symlink_format_manifest/manifest
file:/tmp/delta-table/part-00003-463556da-8423-41f9-a25a-0b68a51e0fff-c000.snappy.parquet
file:/tmp/delta-table/part-00005-f1682aee-f29b-41ad-8ac5-9b49d8de1394-c000.snappy.parquet
file:/tmp/delta-table/part-00007-486d1cda-5871-43f8-86cc-3a44e4adede7-c000.snappy.parquet
file:/tmp/delta-table/part-00001-50edffa4-06c3-4d70-bc8b-f67da8ef4195-c000.snappy.parquet
file:/tmp/delta-table/part-00009-dfd23cfe-e6c2-4001-ae42-8e964ec8f197-c000.snappy.parquet

S3:

s3a://export-bucket/delta_test_table2/part-00000-2c8c8389-e5a6-47f9-8394-6730c474357f-c000.snappy.parquet
s3a://export-bucket/delta_test_table2/part-00002-f897b11d-692a-427d-a1ca-9b15ef218d83-c000.snappy.parquet
...
s3a://export-bucket/delta_test_table2/part-00004-a4fc5555-6fa3-46da-a4f7-ccb6b3e8b8eb-c000.snappy.parquet

@@ -99,6 +99,17 @@ For more information, see [Read external data with EXTERN](concepts.md#read-exte
This variation of EXTERN requires one argument, the details of the destination as specified below.
This variation additionally requires an `AS` clause to specify the format of the exported rows.

While exporting data, some metadata files will also be created at the destination in addition to the data. These files will be created in a directory `_symlink_format_manifest`.
- `_symlink_format_manifest/manifest`: Lists the files which were created as part of the export. The file is in the symlink manifest format, and consists of a list of absolute paths to the files created.
Contributor

What is the symlink manifest format? I wasn't able to find a definitive answer while searching "symlink manifest format", therefore some clarification would be helpful.

Also, is it for Druid's internal use, or can other systems and operators make use of the manifest file created?

Contributor Author

The format itself does not seem to be well documented. It's not for Druid's own use; other data stores can read the format (for example, Delta Lake can generate it via delta.compatibility.symlinkFormatManifest.enabled), and Athena, Trino, etc. support reading it.

Given this, I think that we can skip documenting the format itself, but mention that it follows the symlink manifest format, wdyt?

...
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-partition24.csv
```
- `_symlink_format_manifest/druid_export_meta`: Used to store additional information about the export metadata, such as the version of the manifest file format.
Contributor

Is this version for internal use, or does it have relevance outside of Druid as well? Also, can you please add the format of this metadata file?

Contributor Author

Removed this part since it is not intended to be user facing.

* <br>
* Currently, this only contains the manifest file version.
*/
private void createDruidMetadataFile(StorageConnector storageConnector) throws IOException
Contributor

Seems like we are writing the results in an ad-hoc format. I think it makes sense to use one of the standard formats like JSON, YAML, etc if this is a user-facing file. Else, we should remove it from the documentation as well, since it is an implementation detail.

Contributor Author

Removed this documentation

log.info("Writing manifest file at [%s]", exportStorageProvider.getBasePath());

if (storageConnector.pathExists(MANIFEST_FILE) || storageConnector.pathExists(META_FILE)) {
throw DruidException.defensive("Found existing manifest file already present at path.");
Contributor

Why is it a defensive check? A user can create a manifest file manually, and the job will fail; then it isn't a defensive check. We should use something relevant to either the users or the operator here. It makes sense that we don't expect to encounter this, given that the files would be namespaced with the task ID, but it still shouldn't be a defensive check.

Contributor Author

Changed

public void writeMetadata(List<String> exportedFiles) throws IOException
{
final StorageConnector storageConnector = exportStorageProvider.get();
log.info("Writing manifest file at [%s]", exportStorageProvider.getBasePath());
Contributor

nit: The sentence should make sense when reading without the interpolation

Suggested change
log.info("Writing manifest file at [%s]", exportStorageProvider.getBasePath());
log.info("Writing manifest file at location[%s]", exportStorageProvider.getBasePath());

Contributor Author

Changed

}

createManifestFile(storageConnector, exportedFiles);
createDruidMetadataFile(storageConnector);
Contributor

What happens if the previous call succeeds and this one fails? Would we end up in a partial state where the manifest is created but the metadata isn't?

Contributor Author

Yes, it would create the manifest file but not the metadata one, and the query should fail. I think this is fine, since the metadata file exists only for Druid to track which version created the export.

@@ -88,7 +93,7 @@ public ExportStorageProvider getExportStorageProvider()
}

@Override
public ProcessorsAndChannels<Object, Long> makeProcessors(
public ProcessorsAndChannels<Object, Object> makeProcessors(
Contributor

Why not

Suggested change
public ProcessorsAndChannels<Object, Object> makeProcessors(
public ProcessorsAndChannels<Object, List<String>> makeProcessors(

Contributor Author

Changing the type here would mean the arguments in the call to mergeResults would also have List, which would cause a cast exception to be thrown instead of a nicer error, which we would like to avoid, right?

Comment on lines 117 to 120
.withAccumulation(new ArrayList<String>(), (acc, file) -> {
((ArrayList<String>) acc).add((String) file);
return acc;
}),
Contributor

Seems confusing, do we need this?

Contributor Author

We need the withAccumulation call so that the types match, but I have changed it to be a no-op instead, more in line with other places.
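For illustration, a no-op accumulation in this spirit might look like the following. This is a purely hypothetical sketch with types simplified from the quoted snippet; it only demonstrates an accumulator that satisfies the functional signature while deliberately ignoring its input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch: the accumulator type-checks but ignores per-row
// input, since the exported-file list is collected elsewhere.
public class NoopAccumulationSketch
{
  public static void main(String[] args)
  {
    BiFunction<Object, Object, Object> noop = (acc, row) -> acc;
    Object acc = new ArrayList<String>();
    acc = noop.apply(acc, "query-worker0-partition0.csv");
    // Nothing was accumulated, so the list is still empty
    System.out.println(((List<?>) acc).size());
  }
}
```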

@Override
public Object mergeAccumulatedResult(Object accumulated, Object otherAccumulated)
{
// Maintain upgrade compatibility, if a worker does not return a list, ignore it.
Contributor

Can we end up in a state where some files are present and some are not? How would a reader know that files are absent? If the manifest file is for cosmetic purposes only, we can get away with partial results, but that should be explicitly called out in the documentation.

Since it's in a widely used format, the expectation seems to be that users would be able to ingest the results in other systems, like:
Druid -> Export destination -> External system

A partial manifest will lead to incorrect results without the users knowing. I think we should fail the job here; if not, then the usefulness of the manifest file is limited by its correctness, and that should be called out explicitly. We could also add a context parameter (which I am not a big fan of) to switch between error-throwing and ignoring behaviour, for those who want correctness.

Contributor Author

Changed to throw an exception
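The stricter merge behaviour the thread settles on could be sketched as follows. This is hypothetical code, not the PR's actual implementation; the idea is simply to fail the query rather than silently write a partial manifest when an older-version worker does not return a file list.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch
{
  @SuppressWarnings("unchecked")
  static Object mergeAccumulatedResult(Object accumulated, Object otherAccumulated)
  {
    if (!(accumulated instanceof List) || !(otherAccumulated instanceof List)) {
      // A partial manifest would be silently incorrect, so fail instead of ignoring it
      throw new IllegalStateException("Worker did not return a list of exported files");
    }
    List<String> merged = new ArrayList<>((List<String>) accumulated);
    merged.addAll((List<String>) otherAccumulated);
    return merged;
  }

  public static void main(String[] args)
  {
    System.out.println(((List<?>) mergeAccumulatedResult(List.of("a.csv"), List.of("b.csv"))).size());
    try {
      mergeAccumulatedResult(List.of("a.csv"), "not-a-list");
    } catch (IllegalStateException e) {
      System.out.println("failed");
    }
  }
}
```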

@cryptoe cryptoe merged commit 3df00ae into apache:master Apr 15, 2024
85 checks passed
Object resultObjectForStage = queryKernel.getResultObjectForStage(finalStageId);
if (!(resultObjectForStage instanceof List)) {
// This might occur if all workers are running on an older version. We are not able to write a manifest file in this case.
log.warn("Was unable to create manifest file due to ");
Contributor

@adarshsanjeev looks like a log message is incomplete here- what should this say?

Contributor Author

Missed this message, added a PR to correct it #16363

@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Labels
Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Design Review