-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support create multiple element ns together for nessie #10630
base: main
Are you sure you want to change the base?
Conversation
@snazy Please take a look when you have time. Thanks! :) |
--- ### Motivation Create multiple level namespace together when create namespace. For more of time, the namespace level is one. But some times when we have multiple level to construct the namespace, the nessie will throw the parent namespace is not found. For other catalog, they may merge the multiple level to one level so they don’t have such issue.
3c9319e
to
2cbaede
Compare
@@ -290,6 +291,14 @@ public void renameTable(TableIdentifier from, TableIdentifier to) { | |||
|
|||
@Override | |||
public void createNamespace(Namespace namespace, Map<String, String> metadata) { | |||
if (namespace.levels().length > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fail, if a parent namespace already exists.
A better place to do this would be org.apache.iceberg.nessie.NessieIcebergClient#createNamespace
:
- load all required namespaces
- add the
Put
operation for all non-existing parent namespaces to the commit operation
OTOH I'm not sure about the actual contract of this function. Some implementation seem to allow creating a namespaces when parent namespaces do not exist - so the contract isn't clear, aka the functionality per se seems okay to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy Do you think it makes sense to move this create action to the server side? We also met the same issue when using the REST way to request Nessie.
I am not cleared with the namespace in the Nessie, not sure if it is a normal request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zymap we explicitly removed that functionality from the server side a while ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy I move the impl to the NessieIcebergClient. PTAL.
we explicitly removed that functionality from the server side a while ago.
You mean the Nessie namespace? Could you please point me to the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't remove namespaces - we have removed the explicit namespace endpoints in Nessie's REST API
c4ae465
to
dabde44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the potential access control tie up around namespaces, I wouldn't recommend auto creating missing namespaces. The current user may not have permission to drop parent namespaces and it may need manual clean up again from admin.
It is not very hard to write a script to create namespaces from admin level instead of code level. So, I am not really sure about this change.
Also, is there any catalog doing this already? Can you please point me to it?
org.projectnessie.model.Namespace parentContent = | ||
org.projectnessie.model.Namespace.of( | ||
contentKeyContentEntry.getKey().getElements(), ImmutableMap.of()); | ||
putOperations.add(Operation.Put.of(contentKeyContentEntry.getKey(), parentContent)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we auto create the missing parent namespaces during the current namespace, who will be responsible for dropping these auto created parent namespace when we drop the child namespace?
So, we need to handle drop namespace as well.
But what if we the current user don't have permission to drop/create the parent namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we need to handle drop namespace as well.
No
@ajantha-bhat Our case is in the message queue and makes the data sink to the iceberg. We have a lot of namespace so it is not easy to do this with a script. As @snazy said, the functionality is not very clear in the iceberg. The namespace levels is different for the different catalogs. I did some tests, for JDBC, and it will create all the levels. But for others, they may treat it as a single string. For deletions, my understanding is it should never done automatically. You never know what is used for and who uses it. It must be a manual operation because you should be responsible for checking nobody still using it. |
ACK and agree that it is a grey area in the spec. Maybe good to discuss in iceberg mailing list about standardising it and updating the spec. |
Motivation
Create multiple level namespace together when creating namespace. For more time, the namespace level is one. But sometimes when we have multiple levels to construct the namespace, the Nessie will throw the parent namespace is not found.
For other catalogs, they may merge the multiple levels to one level so they don’t have such an issue.
We use pulsar build on the iceberg, so map the pulsar tenant/namespace to the namespace name. There are two levels then failed to create the table.