REST Spec: Server-side Metadata Tables #10645

Open
2 of 6 tasks
flyrain opened this issue Jul 5, 2024 · 0 comments
Labels
improvement PR that improves existing functionality

flyrain commented Jul 5, 2024

Feature Request / Improvement

Proposed Change

This proposal introduces table metadata APIs to the Iceberg REST catalog (IRC) specification.

One of Iceberg's most useful features is the ability to inspect a table through its metadata tables. For instance, we can query snapshots the same way we query data rows: SELECT * FROM prod.db.table.snapshots;
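To make concrete what such a query returns, here is a minimal Python sketch that derives rows shaped like the snapshots metadata table from a table metadata JSON document. The input field names (`snapshot-id`, `parent-snapshot-id`, `timestamp-ms`, `manifest-list`, `summary`) follow the Iceberg table spec; the function name `snapshot_rows` and the sample metadata are illustrative assumptions, not part of the proposal.

```python
from datetime import datetime, timezone

# Illustrative sample only: a trimmed table metadata document with two snapshots.
SAMPLE_METADATA = {
    "format-version": 2,
    "snapshots": [
        {
            "snapshot-id": 1,
            "timestamp-ms": 1720137600000,
            "manifest-list": "s3://bucket/db/table/metadata/snap-1.avro",
            "summary": {"operation": "append", "added-data-files": "3"},
        },
        {
            "snapshot-id": 2,
            "parent-snapshot-id": 1,
            "timestamp-ms": 1720141200000,
            "manifest-list": "s3://bucket/db/table/metadata/snap-2.avro",
            "summary": {"operation": "overwrite"},
        },
    ],
}


def snapshot_rows(table_metadata: dict) -> list[dict]:
    """Build rows resembling the snapshots metadata table (hypothetical helper)."""
    rows = []
    for snap in table_metadata.get("snapshots", []):
        # Copy the summary so the input is not mutated, then split out the operation.
        summary = dict(snap.get("summary", {}))
        operation = summary.pop("operation", None)
        rows.append(
            {
                "committed_at": datetime.fromtimestamp(
                    snap["timestamp-ms"] / 1000, tz=timezone.utc
                ),
                "snapshot_id": snap["snapshot-id"],
                "parent_id": snap.get("parent-snapshot-id"),
                "operation": operation,
                "manifest_list": snap.get("manifest-list"),
                "summary": summary,
            }
        )
    return rows


if __name__ == "__main__":
    for row in snapshot_rows(SAMPLE_METADATA):
        print(row["snapshot_id"], row["operation"], row["committed_at"].isoformat())
```

Nothing here needs a query engine, which is the point of the proposal: a server that already holds the table metadata could produce these rows itself.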

With the REST catalog, we can simplify this process further by serving metadata table results directly from REST endpoints. This approach has several benefits:

  1. Engine Independence: Metadata tables no longer depend on any particular engine's implementation, because the REST server returns the results directly. For example, the Rust Iceberg implementation does not need its own logic for querying the snapshots table if it connects to a server with this capability. This reduces the complexity and development effort required across clients and engines.
  2. New Use Cases: A catalog UI or Lakehouse UI can present a table's metadata (e.g., its snapshot or partition list) without relying on an engine like Trino. This opens the door to lightweight UIs and tools that call the REST endpoints directly to retrieve and display metadata.
  3. Better Performance: With server-side caching, server-side metadata tables can respond faster than recomputing results on every request. Caching avoids repeatedly computing or retrieving the same metadata, which shortens response times and reduces load on the underlying storage systems.
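The benefits above can be sketched from the client side. The following Python example is only an illustration under stated assumptions: the `/metadata-tables/{name}` path suffix is hypothetical (this issue does not define the endpoint shape; see the proposal document for the actual design), the namespace is assumed to be single-level, and the small TTL cache merely stands in for the kind of caching a server might apply.

```python
import json
import time
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import urlopen


def metadata_table_url(base_url: str, namespace: str, table: str, metadata_table: str) -> str:
    """Build a URL for a metadata table.

    HYPOTHETICAL: the '/metadata-tables/<name>' suffix is an assumption made
    for illustration; it is not defined by the REST catalog spec or this issue.
    """
    return (
        f"{base_url.rstrip('/')}/v1/namespaces/{quote(namespace, safe='')}"
        f"/tables/{quote(table, safe='')}"
        f"/metadata-tables/{quote(metadata_table, safe='')}"
    )


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


class CachedMetadataTables:
    """Tiny TTL cache keyed by URL, standing in for server-side caching."""

    def __init__(
        self,
        fetch: Callable[[str], Any] = http_get_json,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the fetch entirely
        rows = self._fetch(url)
        self._cache[url] = (now, rows)
        return rows
```

A lightweight UI could call `CachedMetadataTables().get(metadata_table_url(base, "db", "events", "snapshots"))` and render the rows without involving a query engine. Injecting `fetch` and `clock` keeps the sketch testable without a running catalog.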

Proposal document

https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit#heading=h.fsbkvox608i

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other

Query engine

None

flyrain added the improvement label (PR that improves existing functionality) Jul 5, 2024