[1.0 breaking change] - consolidate compression options #363

Open
dralley opened this issue Apr 27, 2023 · 9 comments

@dralley
Contributor

dralley commented Apr 27, 2023

createrepo_c currently has two different switches for compression types, --compress-type and --general-compress-type.

The latter applies to all metadata, while the former applies only to files other than "primary", "filelists", and "other". There should probably be a single compression type option that is applied uniformly to all metadata produced by the createrepo_c tool. Yum would have choked on this because of its special handling of "comps", but I believe that should no longer be a concern given the other proposed changes?

Possible further suggestions:

More aggressive - remove Bzip2 support, contingent on #338 (comment). It's very rarely used apart from sqlite metadata, and pretty much strictly inferior to other options in both speed and ratios.

@dralley
Contributor Author

dralley commented Jul 31, 2023

@kontura I know you didn't want to go quite this far but I thought there was going to be some functionality deprecated with warnings?

@kontura
Contributor

kontura commented Aug 8, 2023

Unfortunately I didn't find the time to do any additional work on createrepo_c and we needed to get the other changes released at least a bit in advance in case there are some problems.
Therefore this was postponed.

@mattiaverga

Chiming in on the discussion, as the docs are not really clear (at least to me): am I right in understanding that the --xz option is just a shorthand for --general-compress-type=xz? Or is it a shorthand for --compress-type?

@kontura
Contributor

kontura commented Nov 1, 2023

It is a shorthand for just --compress-type=xz (so it doesn't affect "primary", "filelists", and "other" xml metadata).

When we unify the compression options, we should also remove the --xz option.

@ppisar
Contributor

ppisar commented Nov 30, 2023

I guess --general-compress-type and --compress-type were motivated by a micro-optimization for cases where some (group) files are too small to be compressed effectively, i.e. the compressed file is larger than the uncompressed one. Or their decompression takes a disproportionate amount of time. Or it was simply a hack for when a package manager did not understand compression on all files.
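
To illustrate the "compressed file is larger than the uncompressed one" case, here is a minimal sketch using only Python's standard-library codecs. The payload is a made-up stand-in for a very small group file, not real repository metadata, and zstd and zchunk are left out because they are not assumed to be available in the standard library.

```python
import bz2
import gzip
import lzma

# Hypothetical stand-in for a very small group (comps) file.
tiny_payload = b"<comps/>"


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` under each stdlib codec."""
    return {
        "none": len(data),
        "gz": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


sizes = compressed_sizes(tiny_payload)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

For a payload this small, the fixed header and trailer of each container format outweigh any savings, so every compressed variant comes out larger than the input.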

It's basically a workaround for the fact that we cannot specify exactly which files should be compressed and how they should be compressed. I'm not actually sure it's important to implement such a feature. I would simply merge the two options into one and instead allow output files to be written in multiple compression formats, including no compression.

Would you like this interface?:

    --compress-type COMPRESSION_TYPE
       Which compression type to use. Supported compression types are: none, bz2, gz, zck, zstd, xz.
       "none" means no compression. You can use this option multiple times to output in multiple compression types.
       If this option is not used, "--compress-type none" is implied.
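
The proposed option semantics (repeatable, restricted to a fixed set of values, with an implied default) can be sketched with Python's argparse. This is only an illustration of the interface described above, not createrepo_c's actual parser, which is written in C. The names `build_parser` and `parse_compress_types` and the program name are hypothetical, and dropping repeated values is an assumption the proposal does not spell out.

```python
import argparse

# Compression types listed in the proposed help text.
SUPPORTED = ["none", "bz2", "gz", "zck", "zstd", "xz"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a repeatable --compress-type option."""
    parser = argparse.ArgumentParser(prog="createrepo-sketch")
    parser.add_argument(
        "--compress-type",
        dest="compress_types",
        action="append",  # each use of the option appends one value
        choices=SUPPORTED,
        metavar="COMPRESSION_TYPE",
        help="Which compression type to use; may be given multiple times.",
    )
    return parser


def parse_compress_types(argv, default="none"):
    """Return the requested compression types in order, without repeats."""
    args = build_parser().parse_args(argv)
    # With action="append" the attribute stays None when the option is unused,
    # so the implied default from the proposal is applied here.
    requested = args.compress_types or [default]
    # Assumption: a type given twice is only produced once.
    return list(dict.fromkeys(requested))


print(parse_compress_types([]))
print(parse_compress_types(["--compress-type", "gz", "--compress-type", "zstd"]))
```

The implied default is a function parameter rather than a constant so that a different default can be tried without touching the parser.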

@dralley
Contributor Author

dralley commented Nov 30, 2023

> I guess --general-compress-type and --compress-type were motivated by a micro-optimization for cases where some (group) files are too small to be compressed effectively, i.e. the compressed file is larger than the uncompressed one. Or their decompression takes a disproportionate amount of time. Or it was simply a hack for when a package manager did not understand compression on all files.

Probably the last one. comps.xml is so tiny compared to the others that compression is basically irrelevant.

The API is fine, but I disagree strongly about "none" being the default compression type. Without compression, those files are massive. I have a copy of Fedora 38 "release" metadata on my disk: without compression filelists.xml is 747 MB, with compression it's 42 MB (zstd-compressed; gzip is a bit bigger, but not by much). other.xml is 110 MB vs 5.5 MB, and primary.xml is 167 MB vs 15 MB.
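
As a quick check of what those figures imply, the following sketch computes the compression ratio per file and overall. The sizes are the ones quoted above, taken as megabytes; the variable and function names are purely illustrative.

```python
# (uncompressed MB, zstd-compressed MB) as quoted for Fedora 38 "release" metadata.
sizes_mb = {
    "filelists.xml": (747, 42),
    "other.xml": (110, 5.5),
    "primary.xml": (167, 15),
}


def ratio(uncompressed: float, compressed: float) -> float:
    """How many times smaller the compressed file is."""
    return uncompressed / compressed


ratios = {name: ratio(u, c) for name, (u, c) in sizes_mb.items()}

total_uncompressed = sum(u for u, _ in sizes_mb.values())
total_compressed = sum(c for _, c in sizes_mb.values())
total_ratio = ratio(total_uncompressed, total_compressed)

for name, r in ratios.items():
    print(f"{name}: {r:.1f}x smaller")
print(f"total: {total_uncompressed} MB -> {total_compressed} MB ({total_ratio:.1f}x smaller)")
```

Taken together, the three files shrink from roughly 1 GB to about 62.5 MB.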

zstd is a good default. It has good compression ratios, it's fast to compress and decompress, and by this point it is widely supported on {Open}SUSE and Fedora / RH based distros (at least; I haven't checked the others, but anything that uses libsolv either has it enabled already or can enable it with a simple flag).

@ppisar
Contributor

ppisar commented Dec 1, 2023

That's your use case. My use case is many small repositories. A good default is very subjective. Do you think zstd will be the best in 10 years? Is zstd supported on RHEL 7? A good default does not last in time. No compression has the advantage that it avoids the risk of breaking compatibility. Or we can make the default build-time configurable, so it's no longer our (upstream) problem.

@dralley
Contributor Author

dralley commented Dec 1, 2023

> No compression has the advantage that it avoids the risk of breaking compatibility.

"none" is not currently even an option in createrepo_c, and legacy createrepo doesn't seem to have provided that option either. (Adding metadata manually with modifyrepo_c will allow it, but that's different, and still not the default.)

The maximally compatible choice would be to use gzip (the default prior to 1.0) forevermore, which I would still vastly prefer over defaulting to no compression at all.

I do agree that a "none" choice should exist, but it should not be the default. It has no compatibility advantages whatsoever.

> Do you think zstd will be the best in 10 years? Is zstd supported on RHEL 7? A good default does not last in time.

Does it really matter if it's "the best" in 10 years? It's meaningfully better across the relevant metrics and has been widely supported by both Red Hat and SUSE derived distributions for several years now. Support was added to libsolv in 2018 and is inherited by both dnf and zypper.

But no, RHEL 7 and yum generally don't have support, that is true. That leads to a bit of frustration for the limited number of people / projects who are managing repos for the oldest distributions using the very newest versions of Fedora, but it's not a common case and will be only a short-term problem.

@dralley
Contributor Author

dralley commented Dec 2, 2023

Submitted #411 for --compatibility defaulting to gzip
