High sensitivity defaults #268

Merged
merged 9 commits into from
Sep 2, 2024

@ekg (Collaborator) commented Sep 2, 2024

Previously, default settings that slightly improved runtime when aligning pangenomes hurt performance in comparative genomics contexts. Updates related to mashmap3 and alignment have made wfmash much more robust to more sensitive defaults.

In this PR, I'm setting a bunch of defaults which have become standard in my testing:

  • Default minimum mapping identity reduced from 90% to 70%.
  • Set maximum mapping length to 50 kb by default (previously unlimited).
  • Changed block length default from 5x segment length to 3x segment length.
  • Set default chain gap to 30 kb (previously 6x segment length, capped at 30 kb).
  • Reduced default segment length from 5 kb to 1 kb.
  • Changed default kmer size from 19 to 15.
  • Modified wflign to run on all fragments except very small ones (less than 1000 bp).
  • Changed filtering logic to use Euclidean distance as an absolute cutoff instead of axis-weighted Euclidean distance, while still ranking based on axis-weighted distance.
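
The filtering change in the last item can be illustrated with a minimal Python sketch. This is not wfmash's code: the `Candidate` type, the function names, the axis weights (0.9 and 0.1), and the 30,000 bp cutoff are all hypothetical and chosen only for illustration, since the PR text does not say what the two axes measure or how the weights are set. The sketch only shows the general pattern of filtering on plain Euclidean distance while ranking on an axis-weighted distance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate with an offset along each of two axes (in bp)."""
    name: str
    dx: float
    dy: float


def euclidean(c: Candidate) -> float:
    # Unweighted distance: used here as the absolute cutoff.
    return math.hypot(c.dx, c.dy)


def axis_weighted(c: Candidate, wx: float = 0.9, wy: float = 0.1) -> float:
    # Weighted distance: used here only for ranking.
    # The weights are assumptions, not values taken from wfmash.
    return math.hypot(wx * c.dx, wy * c.dy)


def filter_and_rank(cands, max_dist):
    # Step 1: drop anything whose plain Euclidean distance exceeds the cutoff.
    kept = [c for c in cands if euclidean(c) <= max_dist]
    # Step 2: order the survivors by axis-weighted distance, closest first.
    return sorted(kept, key=axis_weighted)


CANDIDATES = [
    Candidate("a", 3_000, 4_000),
    Candidate("b", 0, 20_000),
    Candidate("c", 25_000, 25_000),
]

if __name__ == "__main__":
    for c in filter_and_rank(CANDIDATES, max_dist=30_000):
        print(c.name, round(euclidean(c)), round(axis_weighted(c)))
```

With these made-up numbers, candidate `c` has a weighted distance below 30,000 but a Euclidean distance above it, so it would pass a weighted cutoff and is rejected by the Euclidean one. The remaining candidates are still ordered by weighted distance, so `b` ranks ahead of `a`.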

These should tend to make wfmash more sensitive at the edges of its performance envelope with minimal costs for easy, low-divergence pangenome alignment problems.
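
As a rough summary of the list above, the old and new defaults can be written side by side, with the derived values worked out. This is a sketch in Python, not wfmash's configuration code: the field names are hypothetical, lengths are assumed to be in base pairs, and the old chain gap is read as `min(6 * segment_length, 30000)` based on the wording "6x segment length, up to 30k".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Defaults:
    """Hypothetical container for the defaults named in this PR (lengths in bp)."""
    min_identity_pct: int
    max_mapping_length: Optional[int]  # None means unlimited
    segment_length: int
    block_length_factor: int           # block length = factor * segment length
    kmer_size: int

    @property
    def block_length(self) -> int:
        return self.block_length_factor * self.segment_length


def old_chain_gap(segment_length: int) -> int:
    # Assumed reading of "6x segment length, up to 30k".
    return min(6 * segment_length, 30_000)


NEW_CHAIN_GAP = 30_000  # fixed value after this PR

OLD = Defaults(min_identity_pct=90, max_mapping_length=None,
               segment_length=5_000, block_length_factor=5, kmer_size=19)
NEW = Defaults(min_identity_pct=70, max_mapping_length=50_000,
               segment_length=1_000, block_length_factor=3, kmer_size=15)

if __name__ == "__main__":
    print("block length:", OLD.block_length, "->", NEW.block_length)
    print("chain gap:", old_chain_gap(OLD.segment_length), "->", NEW_CHAIN_GAP)
    print("old rule at new segment length:", old_chain_gap(NEW.segment_length))
```

Under these assumptions the default block length drops from 25,000 bp to 3,000 bp. The chain gap stays at 30,000 bp, whereas the old rule applied to the new 1 kb segment length would have given only 6,000 bp.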

@ekg merged commit 4521c10 into main on Sep 2, 2024
1 check passed