Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support create multiple element ns together for nessie #10630

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

zymap
Copy link
Member

@zymap zymap commented Jul 4, 2024


Motivation

Create multiple level namespace together when creating namespace. For more time, the namespace level is one. But sometimes when we have multiple levels to construct the namespace, the Nessie will throw the parent namespace is not found.
For other catalogs, they may merge the multiple levels to one level so they don’t have such an issue.

We use pulsar build on the iceberg, so map the pulsar tenant/namespace to the namespace name. There are two levels then failed to create the table.

@github-actions github-actions bot added the NESSIE label Jul 4, 2024
@zymap
Copy link
Member Author

zymap commented Jul 4, 2024

@snazy Please take a look when you have time. Thanks! :)

---

### Motivation
Create multiple level namespace together when create namespace.
For more of time, the namespace level is one. But some times
when we have multiple level to construct the namespace, the nessie
will throw the parent namespace is not found.
For other catalog, they may merge the multiple level to one level
so they don’t have such issue.
@zymap zymap force-pushed the support-multiple-element-ns-create branch from 3c9319e to 2cbaede Compare July 4, 2024 10:57
@@ -290,6 +291,14 @@ public void renameTable(TableIdentifier from, TableIdentifier to) {

@Override
public void createNamespace(Namespace namespace, Map<String, String> metadata) {
if (namespace.levels().length > 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail, if a parent namespace already exists.

A better place to do this would be org.apache.iceberg.nessie.NessieIcebergClient#createNamespace:

  • load all required namespaces
  • add the Put operation for all non-existing parent namespaces to the commit operation

OTOH I'm not sure about the actual contract of this function. Some implementation seem to allow creating a namespaces when parent namespaces do not exist - so the contract isn't clear, aka the functionality per se seems okay to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snazy Do you think it makes sense to move this create action to the server side? We also met the same issue when using the REST way to request Nessie.
I am not cleared with the namespace in the Nessie, not sure if it is a normal request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zymap we explicitly removed that functionality from the server side a while ago.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snazy I move the impl to the NessieIcebergClient. PTAL.

we explicitly removed that functionality from the server side a while ago.

You mean the Nessie namespace? Could you please point me to the PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't remove namespaces - we have removed the explicit namespace endpoints in Nessie's REST API

@zymap zymap force-pushed the support-multiple-element-ns-create branch from c4ae465 to dabde44 Compare July 8, 2024 08:52
Copy link
Member

@ajantha-bhat ajantha-bhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the potential access control tie up around namespaces, I wouldn't recommend auto creating missing namespaces. The current user may not have permission to drop parent namespaces and it may need manual clean up again from admin.

It is not very hard to write a script to create namespaces from admin level instead of code level. So, I am not really sure about this change.

Also, is there any catalog doing this already? Can you please point me to it?

org.projectnessie.model.Namespace parentContent =
org.projectnessie.model.Namespace.of(
contentKeyContentEntry.getKey().getElements(), ImmutableMap.of());
putOperations.add(Operation.Put.of(contentKeyContentEntry.getKey(), parentContent));
Copy link
Member

@ajantha-bhat ajantha-bhat Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we auto create the missing parent namespaces during the current namespace, who will be responsible for dropping these auto created parent namespace when we drop the child namespace?

So, we need to handle drop namespace as well.
But what if we the current user don't have permission to drop/create the parent namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we need to handle drop namespace as well.

No

@zymap
Copy link
Member Author

zymap commented Jul 9, 2024

Considering the potential access control tie up around namespaces, I wouldn't recommend auto creating missing namespaces. The current user may not have permission to drop parent namespaces and it may need manual clean up again from admin.

It is not very hard to write a script to create namespaces from admin level instead of code level. So, I am not really sure about this change.

Also, is there any catalog doing this already? Can you please point me to it?

@ajantha-bhat Our case is in the message queue and makes the data sink to the iceberg. We have a lot of namespace so it is not easy to do this with a script.
Yes, it would have permission issue but I think that should be properly configured by the user. If you want it auto-created, you can give a proper role to do this. That should be a part of the RBAC.

As @snazy said, the functionality is not very clear in the iceberg. The namespace levels is different for the different catalogs. I did some tests, for JDBC, and it will create all the levels. But for others, they may treat it as a single string.

For deletions, my understanding is it should never done automatically. You never know what is used for and who uses it. It must be a manual operation because you should be responsible for checking nobody still using it.

@ajantha-bhat
Copy link
Member

ACK and agree that it is a grey area in the spec. Maybe good to discuss in iceberg mailing list about standardising it and updating the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants