-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qualification: fix sorting and add unit-tests script #5869
Conversation
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
build |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to run the script myself still, this will be very nice to have.
log_msg " By default is is set to test/resources/QualificationExpectations" | ||
log_msg " --qual-events-dir=<arg> - Path of the Spark events logs used as input for the unit tests" | ||
log_msg " By default is is set to test/resources/spark-events-qualification" | ||
log_msg " --prof-log-dir=<arg> - Some the unit tests reads files located in another directory." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit grammar "Some the". What happens if we just call this twice instead of having a separate option, once on qualification dir and once on files in profiling dir? We don't use all the profiling once I thought so I assume user picks and chooses which ones to copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably would be the way when the script supports both tools.
For now, having a single directory and calling twice could lead to:
- It will be more complicated to compare the final results (comparing entire folder). That final check is necessary to make sure that all unit tests exist and equal.
- forget to run twice for different output.
Should we consider duplicating the common spark-log files from spark-events-profiling
to spark-events-qualification
? Eventually, it will make things easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of the second directory. Basically, if an input directory is passed to the script, then we should expect that it contains all the required log files.
I also updated the description to emphasize that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove the option then. sorry if I was confusing, what does that mean then for this script? we don't have all our eventlogs in one directory, so this script won't work with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you mean by default it works with our test setup, but if someone was using it for something else, they must all be in the same directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you mean by default it works with our test setup, but if someone was using it for something else, they must all be in the same directory?
Yes, exactly. this is what I meant.
By default, it will work fine for our test setup loading the log files from two directories.
fi | ||
|
||
# cleanup output folder before running tests | ||
if [ -d "$BATCH_OUT_DIR" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be safer to fail if directory exists unless users adds and overwrite flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but the problem with giving the user option not to overwrite is that a csv from previous run that was not generated by the most recent run would cause the final directory comparison to pass successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand, it would fail if any previous run is there and you try to run it again. If you select overwrite then it removes it first.
I'm ok with leaving it this way, but please add comment in top level documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I added one more flag "-f|--force"
qual_log_prefix="file:${QUAL_LOG_DIR}" | ||
prof_log_prefix="file:${PROF_LOG_DIR}" | ||
|
||
qualification_path_map[nds_q86_test_expectation]="${qual_log_prefix}/nds_q86_test ${qual_log_prefix}/malformed_json_eventlog.zstd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nartal1 just fyi, lets make sure we update this when adding new files.
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
I made some changes and added an update to the PR description. |
build |
log_msg " 3- Logs the result of \"diff\" between the two CSV files" | ||
log_msg " diff \${qual_expectations_dir}/\$K.csv \${out-dir}/qualification-output/csv/\$K.csv" | ||
log_msg | ||
log_msg "Finally, the script logs the result of \"diff\" for two directories:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is same statement as 3- above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clarified the difference.
"3-" is diff between two CSV files.
The following step is to compare the entire two directories. I found that this step is necessary in case the Map is missing keys. Comparing the final directory is a second level to validate that the Map is not broken.
fi | ||
|
||
# cleanup output folder before running tests | ||
if [ -d "$BATCH_OUT_DIR" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand, it would fail if any previous run is there and you try to run it again. If you select overwrite then it removes it first.
I'm ok with leaving it this way, but please add comment in top level documentation.
log_msg " By default is is set to test/resources/QualificationExpectations" | ||
log_msg " --qual-events-dir=<arg> - Path of the Spark events logs used as input for the unit tests" | ||
log_msg " By default is is set to test/resources/spark-events-qualification" | ||
log_msg " --prof-log-dir=<arg> - Some the unit tests reads files located in another directory." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove the option then. sorry if I was confusing, what does that mean then for this script? we don't have all our eventlogs in one directory, so this script won't work with it?
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
build |
1 similar comment
build |
@@ -0,0 +1,451 @@ | |||
#!/usr/bin/env bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry can we update the script name to something like generate-qualification-test-results, otherwise someone might think this is to run the tests
so one thing we didn't address in #2949, is the documentation, I'm fine with waiting and see if we really need it. |
build |
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
build |
1 similar comment
build |
Signed-off-by: Ahmed Hussein (amahussein) a@ahussein.me
fixes #5859 and fixes #2949
Changes
tools/src/test/resources/dev/run-qualification-tests.sh
The above command will simply iterate on all the expected csv file and re-run the qualification tools. The generated output is then compared to the expected files.