Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Azure Goverment storage #15523

Conversation

nozjkoitop
Copy link
Contributor

@nozjkoitop nozjkoitop commented Dec 8, 2023

Description

This pull request introduces a configurable Azure Storage Endpoint Suffix in the Druid Azure-Extensions.

Release note

Added support for Azure Government storage in Druid Azure-Extensions. This enhancement allows the Azure-Extensions to be compatible with different Azure storage types by updating the endpoint suffix from a hardcoded value to a configurable one.


Key changed/added classes in this PR
  • AzureAccountConfig
  • AzureUtils
  • AzureDataSegmentPuller
  • AzureStorageDruidModule

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@BartMiki BartMiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nozjkoitop nozjkoitop marked this pull request as ready for review December 12, 2023 09:58
Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the PR. I have left some comments.

* <code>"blob.core.chinacloudapi.cn"</code> or
* <code>"blob.core.usgovcloudapi.net</code>"
*/
public AzureUtils(String blobStorageEndpointSuffix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since its a utils class, lets not add a constructor here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -40,5 +40,6 @@ To use this Apache Druid extension, [include](../../configuration/extensions.md#
|`druid.azure.protocol`|the protocol to use|http or https|https|
|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3|
|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024|
|`druid.azure.endpointSuffix`|the endpoint suffix to use.|Examples: `core.windows.net`, `core.usgovcloudapi.net`|`core.windows.net`|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some more colour here? when would one want to override this? Pointing to external documentation is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added usecase with link to docs


// The azure storage hadoop access pattern is:
// wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
// wasb[s]://<containername>@<accountname>.<blobStorageEndpointSuffix>/<path>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// wasb[s]://<containername>@<accountname>.<blobStorageEndpointSuffix>/<path>
// wasb[s]://<containername>@<accountname>.blob.<endpointSuffix>/<path>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

return endpointSuffix;
}

public String getBlobStorageEndpointSuffix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it the whole endpoint itself? why are we calling this method getBlobStorageEndpointSuffix instead of getBlobStorageEndpoint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

@@ -59,7 +64,7 @@ FileUtils.FileCopyResult getSegmentFiles(
"Loading container: [%s], with blobPath: [%s] and outDir: [%s]", containerName, blobPath, outDir
);

final String actualBlobPath = AzureUtils.maybeRemoveAzurePathPrefix(blobPath);
final String actualBlobPath = azureUtils.maybeRemoveAzurePathPrefix(blobPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could pass the endpoint suffix as an argument than passing it in the AzureUtils constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment on the tests. Please do fix the conflicts as well.

Comment on lines 321 to 322
String outputBlob = AzureUtils.maybeRemoveAzurePathPrefix("blob.core.usgovcloudapi.net/container/blob", config.getBlobStorageEndpoint());
Assert.assertEquals("container/blob", outputBlob);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should move to AzureUtilsTest. Here you can test that AzureAccountConfig is correctly initialized with the properties you are setting. And in AzureUtilsTest, you will add these tests but without creating an AzureAccountConfig object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -40,5 +40,6 @@ To use this Apache Druid extension, [include](../../configuration/extensions.md#
|`druid.azure.protocol`|the protocol to use|http or https|https|
|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3|
|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024|
|`druid.azure.endpointSuffix`|The endpoint suffix to use. Could be overriden for connecting with [Azure Goverment](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api).|Examples: `core.windows.net`, `core.usgovcloudapi.net`|`core.windows.net`|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.azure.endpointSuffix`|The endpoint suffix to use. Could be overriden for connecting with [Azure Goverment](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api).|Examples: `core.windows.net`, `core.usgovcloudapi.net`|`core.windows.net`|
|`druid.azure.endpointSuffix`|The endpoint suffix to use. Override the default value to connect to [Azure Goverment](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api).|Examples: `core.windows.net`, `core.usgovcloudapi.net`|`core.windows.net`|

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I've added your variant!

@@ -206,4 +207,18 @@ public void test_azureRetry_RunTimeExceptionWrappedInRunTimeException_returnsFal
boolean retry = AzureUtils.AZURE_RETRY.apply(RUNTIME_EXCEPTION_WRAPPED_IN_RUNTIME_EXCEPTON);
Assert.assertFalse(retry);
}

@Test
public void testGetAzureUtilsWithDefaultProperties()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testGetAzureUtilsWithDefaultProperties()
public void testRemoveAzurePathPrefixDefaultEndpoint()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

}

@Test
public void testGetAzureUtilsWithDefaultCustomBlobPath()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testGetAzureUtilsWithDefaultCustomBlobPath()
public void testRemoveAzurePathPrefixCustomEndpoint()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

@@ -259,6 +259,26 @@ public void testAllCredentialsUnset()
);
}

@Test
public void testGetAzureUtilsWithDefaultProperties()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testGetAzureUtilsWithDefaultProperties()
public void testGetBlobStorageEndpointWithDefaultProperties()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

}

@Test
public void testGetAzureUtilsWithDefaultCustomBlobPath()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testGetAzureUtilsWithDefaultCustomBlobPath()
public void testGetBlobStorageEndpointWithCustomBlobPath()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

|`druid.azure.protocol`|the protocol to use|http or https|https|
|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3|
|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024|
|Property| Description |Possible Values|Default|
Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting change needs to be reverted. LGTM otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changes reverted

@abhishekagarwal87 abhishekagarwal87 merged commit ea6ba40 into apache:master Jan 9, 2024
79 checks passed
@abhishekagarwal87
Copy link
Contributor

thank you for your contribution @nozjkoitop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants