
No. Row group? #2540

Open
mengjie09 opened this issue Jun 27, 2024 · 4 comments

Comments

@mengjie09

mengjie09 commented Jun 27, 2024

According to the Lance file layout, Lance V2 no longer has the concept of row groups. What is the relationship between DataFragment and row groups in the code?
The DataFragment concept seems to describe how different columns can hold different numbers of rows. Is this implemented?


@wjones127
Contributor

DataFragment is a table-level concept with a fixed number of rows. When you first write data, a fragment typically corresponds to a single data file. This is different from a row group: row groups live inside files, so one file can contain several row groups. Lance V2 files don't have row groups at all.

The layout of data fragments is described here: https://lancedb.github.io/lance/format.html#fragments
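To make the distinction concrete, here is a minimal Python model. It is not Lance's actual API: every class, field, and file name below is hypothetical. It assumes, following the fragment layout in the format docs linked above, that a fragment can be backed by more than one data file (for example after a column is added), each storing a subset of the columns for all of the fragment's rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file: named columns that must all
    hold the same number of rows (there are no row groups inside)."""
    path: str
    columns: dict[str, list]

    def num_rows(self) -> int:
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"{self.path}: columns disagree on row count {sorted(lengths)}")
        return lengths.pop() if lengths else 0


@dataclass
class Fragment:
    """Hypothetical table-level fragment: a fixed row count, stored in one or
    more data files that each hold a disjoint subset of the columns."""
    id: int
    num_rows: int
    files: list[DataFile] = field(default_factory=list)

    def add_file(self, data_file: DataFile) -> None:
        # Every data file must cover all of the fragment's rows.
        if data_file.num_rows() != self.num_rows:
            raise ValueError(
                f"{data_file.path} has {data_file.num_rows()} rows, "
                f"fragment {self.id} has {self.num_rows}"
            )
        # In this model a column lives in exactly one data file of the fragment.
        existing = {name for f in self.files for name in f.columns}
        overlap = existing & set(data_file.columns)
        if overlap:
            raise ValueError(f"columns already stored: {sorted(overlap)}")
        self.files.append(data_file)

    def column(self, name: str) -> list:
        # Look the column up in whichever data file stores it.
        for f in self.files:
            if name in f.columns:
                return f.columns[name]
        raise KeyError(name)


# First write: one fragment backed by a single data file.
frag = Fragment(id=0, num_rows=3)
frag.add_file(DataFile("0.lance", {"id": [1, 2, 3], "name": ["a", "b", "c"]}))

# Adding a column later: a second data file covering the same 3 rows.
frag.add_file(DataFile("0-extra.lance", {"score": [0.5, 0.7, 0.9]}))
```

The row count is fixed at the fragment level, and any file in the fragment must match it; a row group, by contrast, would be a slice of rows inside one of those files.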

@mengjie09
Author

Thank you. Here's another question: if Lance supports different numbers of rows for different columns, but a DataFragment must have the same number of rows, how is that represented? Is it expressed in one DataFragment or in several DataFragments?

@wjones127
Contributor

> If lance supports different number of rows for different columns

Each file must have the same number of rows in every column. "No row groups" only means there is no smaller unit inside the file that is required to have the same number of rows per column.
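A rough Python sketch of that constraint, contrasting it with a row-group layout. It assumes, as a simplification, that each column in a row-group-free file is stored as its own sequence of pages whose boundaries need not line up across columns; the function names and data are hypothetical and not part of Lance.

```python
from __future__ import annotations


def rows_in_file(columns: dict[str, list[list]]) -> int:
    """Row-group-free layout: each column is a sequence of pages whose sizes
    can differ from column to column. Only the per-column totals must match."""
    totals = {name: sum(len(page) for page in pages) for name, pages in columns.items()}
    distinct = set(totals.values())
    if len(distinct) > 1:
        raise ValueError(f"columns disagree on row count: {totals}")
    return distinct.pop() if distinct else 0


def rows_in_row_groups(row_groups: list[dict[str, list]]) -> int:
    """Row-group layout, for contrast: every row group must itself be aligned,
    i.e. have the same number of rows in every column."""
    total = 0
    for i, group in enumerate(row_groups):
        lengths = {len(values) for values in group.values()}
        if len(lengths) > 1:
            raise ValueError(f"row group {i} is misaligned: {sorted(lengths)}")
        total += lengths.pop() if lengths else 0
    return total


# Page boundaries differ per column, but both columns hold 6 rows in total.
v2_file = {
    "id": [[1, 2, 3, 4, 5, 6]],
    "text": [["a", "b"], ["c", "d", "e"], ["f"]],
}
print(rows_in_file(v2_file))  # 6

# The same data in a row-group layout needs every group aligned.
grouped = [
    {"id": [1, 2, 3], "text": ["a", "b", "c"]},
    {"id": [4, 5, 6], "text": ["d", "e", "f"]},
]
print(rows_in_row_groups(grouped))  # 6
```

In both layouts the whole file must agree on its row count; the row-group layout additionally forces every group to agree, which is the smaller unit that the row-group-free design drops.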
