Enable Sem-dedup #130

Merged on Jul 5, 2024 (101 commits)

Commits on Jun 27, 2024

  1. Applying SEO Best Pratices (NVIDIA#104)

    * Rename CPUvsGPU.rst to cpuvsgpu.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DataCuration.rsts to datacuration.rsts
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DistributedDataClassification.rst to distributeddataclassification.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DocumentDataset.rst to documentdataset.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename Download.rst to download.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename GpuDeduplication.rst to gpudeduplication.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename KubernetesCurator.rst to kubernetescurator.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename QualityFiltering.rst to qualityfiltering.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename TaskDecontamination.rst to taskdecontamination.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Update index.rst
    
    Setting all RST files to lowercase names.
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Ignore docs for EOF fixer hook
    
    Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
    
    ---------
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
    Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    2 people authored and VibhuJawa committed Jun 27, 2024
    18a2fc0
  2. Shuffle CC result on group before writing out (NVIDIA#110)

    Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    ayushdg authored and VibhuJawa committed Jun 27, 2024
    f19df32
  3. Update index.rst (NVIDIA#113)

    Added links to tutorials
    
    Signed-off-by: jgerh <163925524+jgerh@users.noreply.github.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    jgerh authored and VibhuJawa committed Jun 27, 2024
    42309e6
  4. first commit

    Signed-off-by: avinashvem <avem@nvidia.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    33332a8
  5. mv under modules dir

    Signed-off-by: avinashvem <avem@nvidia.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    c633677
  6. first commit

    Signed-off-by: avinashvem <avem@nvidia.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    d9b8545
  7. mv under modules dir

    Signed-off-by: avinashvem <avem@nvidia.com>
    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    dc135c4
  8. first commit

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    968a3eb
  9. mv under modules dir

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    f5c51bb
  10. embed by cluster saved

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    f286678
  11. id map script

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    103c366
  12. test commit

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    451fa2d
  13. add id map script

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    avem-nv authored and VibhuJawa committed Jun 27, 2024
    dec4913
  14. Cleanup compute_embeddings_crossfit.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    bbbe400
  15. Cleanup compute_embeddings_crossfit.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    5d56cd0
  16. Pre-commit style fixes

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    9ddf558
  17. clustering_dask_crossfit.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    4ebab04
  18. Minor clean up to sort_clusters_crossfit.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    eeee758
  19. cleanup semdedup_crossfit

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    79beb61
  20. Remove undo changes

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    e11bbd5
  21. Remove rename changes

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    3179e24
  22. Fix rename

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    cbc9960
  23. Readme formatting

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    57469cb
  24. add dask to semdedup_crossfit.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    f60fc01
  25. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    c0e36f2
  26. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    61b21fd
  27. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    94b70f0
  28. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    2ba596e
  29. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    d8cbd42
  30. configure max memory using a cli

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    11fcf9d
  31. Dumb id results to parquet

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    8c0d0ce
  32. Embedding fixes

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    cd3f842
  33. README.md updates

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 27, 2024
    1c28b83

Commits on Jun 28, 2024

  1. Working end to end

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    be5a608
  2. Minor yaml fixes

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    fd6ff60
  3. Undo changes to index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    b307375
  4. Update .pre-commit-config.yaml

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    b30dd52
  5. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    5d5e07c
  6. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    7d32fb4
  7. Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    d6ead05
  8. Update fuzzy_dedup.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    e37a1db
  9. Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    d6cd233
  10. Merge branch 'main' into vjawa/dev_semdedup

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    dfe7db8
  11. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa authored Jun 28, 2024
    79167ec
  12. Add end to end script in readme.md

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    27b5248
  13. Add type hints

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    6c196cf
  14. Use dask for sort_clusters

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    1072d56
  15. Make sort_clusters work on MNMG scales

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    0258923
  16. Cleaned up dask shutdown

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    b896c8b
  17. Decrease noise in E2E scripts

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    2c03601
  18. Clean up scripts

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    cde12c2
  19. Fix scripts/end_to_end_script.sh

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    2e71f65
  20. Some more cleanup

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    e49573d
  21. Add copyright

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    d291e9d
  22. Fix README.md

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    81cc71c
  23. Address reviews

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 28, 2024
    5cd14f1

Commits on Jun 29, 2024

  1. Make work with a SemDedupConfig

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 29, 2024
    e4713b2
  2. Make work with SemDedupConfig

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 29, 2024
    0b5782d
  3. Move to nemo-curator's logger

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jun 29, 2024
    e119880

Commits on Jul 1, 2024

  1. Semdedup-extract_dedup_data.py

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    f961a2b
  2. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    cd4dab9
  3. Applying SEO Best Pratices (NVIDIA#104)

    * Rename CPUvsGPU.rst to cpuvsgpu.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DataCuration.rsts to datacuration.rsts
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DistributedDataClassification.rst to distributeddataclassification.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename DocumentDataset.rst to documentdataset.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename Download.rst to download.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename GpuDeduplication.rst to gpudeduplication.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename KubernetesCurator.rst to kubernetescurator.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename QualityFiltering.rst to qualityfiltering.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Rename TaskDecontamination.rst to taskdecontamination.rst
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Update index.rst
    
    Setting all RST files to lowercase names.
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    
    * Ignore docs for EOF fixer hook
    
    Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
    
    ---------
    
    Signed-off-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
    Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
    Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
    2 people authored and VibhuJawa committed Jul 1, 2024
    c07411c
  4. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    155188e
  5. Fix bad merge

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    d096721
  6. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    a339e59
  7. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    d6f2c98
  8. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    6d6c21c
  9. Update index.rst

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 1, 2024
    b761e77

Commits on Jul 2, 2024

  1. Add Module for embedding+clustering

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 2, 2024
    7d5fbe9
  2. Add sorting to clustering

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 2, 2024
    9419338
  3. Refactor Semdup modules

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 2, 2024
    5fdb3b4

Commits on Jul 3, 2024

  1. Refactor Semdup modules

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    3a6f10c
  2. Refactor Semdup modules

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    9ff7397
  3. Fix Readme.md

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    a5b5f17
  4. Add a environment variable to silence HF warnings

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    993ba92
  5. Merge in main

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    d505d8f
  6. dask-cudf fix

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    835c3a0
  7. dask-cudf fix

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    2eba719
  8. dask-cudf fix

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    05d0e88
  9. Make config a flat file based on reviews

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    0bde039
  10. Add docstrings

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    ae03905
  11. Fix argparse and seed function

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    f957b50
  12. Use argparse to read config

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    eaada91
  13. Move around config files

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    07f8290
  14. Move around config files

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    d5997b5
  15. Move around config files

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    94efa3a
  16. Remove end_to_end_script.sh

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    e9d21e3
  17. Append Readme

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    14faf60
  18. Address Reviews

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    a304629
  19. Change config

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    e7fa30d
  20. Make embedding creation optionally lazy

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    4f46f78
  21. fix docstring

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 3, 2024
    bd43d5d

Commits on Jul 5, 2024

  1. Address Reviews and docstrings

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    52480aa
  2. Address Reviews and make eps_thresholds a list of values

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    16ad760
  3. Minor import fix

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    584340a
  4. Empty Commit

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    01affbb
  5. Add modules to __init__ and README.md

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    eaee1e5
  6. Fix init

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    1c0f706
  7. Move comment

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    12373a7
  8. Empty commit to restart CI (which failed due to a download issue)

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    da909f3
  9. Empty commit to restart CI (which failed due to a download issue)

    Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
    VibhuJawa committed Jul 5, 2024
    c2cd97c