Files with messages that swap a variable for a variable placeholder aren't compressed correctly #163

Closed
kirkrodrigues opened this issue Sep 18, 2023 · 0 comments
Labels
bug Something isn't working

kirkrodrigues commented Sep 18, 2023

Bug

If a file contains two messages that are identical except that one contains a variable where the other contains the corresponding variable placeholder, the file is not compressed correctly: decompressing it either stops prematurely or crashes.
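One way to see why such a pair of messages is troublesome is a toy model. This is an assumption made purely for illustration and is not CLP's actual encoder: if an encoder replaces each integer token with the placeholder character and does not escape placeholder characters already present in the text, both messages collapse to the same template. The names `INT_PLACEHOLDER` and `toy_encode` are hypothetical.

```python
import re

# ^Q in caret notation is the control character 0x11.
INT_PLACEHOLDER = "\x11"


def toy_encode(message: str) -> tuple[str, list[int]]:
    """Toy encoder (hypothetical, NOT CLP's implementation).

    Replaces each whitespace-delimited integer token with the placeholder
    and returns the resulting template plus the extracted integers.
    Placeholder characters already in the message are left unescaped.
    """
    variables: list[int] = []

    def _swap(match: re.Match) -> str:
        variables.append(int(match.group()))
        return INT_PLACEHOLDER

    template = re.sub(r"(?<!\S)\d+(?!\S)", _swap, message)
    return template, variables


with_variable = "loaded properties 123 from"
with_placeholder = f"loaded properties {INT_PLACEHOLDER} from"

template_a, vars_a = toy_encode(with_variable)
template_b, vars_b = toy_encode(with_placeholder)

# Both messages yield the same template but a different number of
# variables, so a decoder that expects one variable per placeholder
# cannot tell them apart.
print(template_a == template_b)  # True
print(len(vars_a), len(vars_b))  # 1 0
```

In this sketch the ambiguity comes only from the missing escaping step; whether that is the exact mechanism in CLP is not stated in this report.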

For example, these log messages (where ^Q is an integer-variable placeholder)...

2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties ^Q from
2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties 123 from
2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties 123 from

... will cause decompression to fail with the following errors:

2023-09-17 21:36:24,750 [error] FileWriter not closed before being destroyed - may cause data loss
2023-09-17 21:36:24,750 [error] Decompression failed: src/DictionaryReader.hpp:208 DictionaryReader operation failed, error_code=3

If instead we reorder the messages...

2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties 123 from
2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties ^Q from
2015-03-23 07:29:48,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties 123 from

... decompression will stop after the 2nd message, and the 2nd message itself will be decompressed incorrectly.

CLP version

5d0f7b3

Environment

Ubuntu 20.04

Reproduction steps

  • Compress the file described in the bug report: ./clp c archives log.txt
  • Decompress the file described in the bug report: ./clp x archives decomp
  • Observe a crash or incorrect decompressed output: diff log.txt decomp/log.txt
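
The steps above assume `log.txt` already exists. Because the placeholder is a control character (`^Q`, byte 0x11), it is easy to lose when copying the messages from a web page, so a small script may be a more reliable way to create the file. The following is a minimal sketch; the helper name `build_log` and the trailing newline on each line are assumptions, not part of the original report.

```python
from pathlib import Path

INT_PLACEHOLDER = b"\x11"  # ^Q in caret notation
PREFIX = (
    b"2015-03-23 07:29:48,942 INFO [main] "
    b"org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties "
)


def build_log(placeholder_first: bool = True) -> bytes:
    """Return the three-message log from the report as raw bytes.

    placeholder_first=True gives the ordering reported to crash;
    placeholder_first=False gives the reordered variant reported to
    stop after the 2nd message.
    """
    with_placeholder = PREFIX + INT_PLACEHOLDER + b" from\n"
    with_variable = PREFIX + b"123 from\n"
    if placeholder_first:
        lines = [with_placeholder, with_variable, with_variable]
    else:
        lines = [with_variable, with_placeholder, with_variable]
    return b"".join(lines)


if __name__ == "__main__":
    # Write the crashing variant; pass placeholder_first=False for the other.
    Path("log.txt").write_bytes(build_log(placeholder_first=True))
```

After running the script, the `./clp c` and `./clp x` commands from the steps above can be run against the generated `log.txt`.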