Add TranslationSuggester and friends #49

Merged
merged 1 commit into from
Oct 27, 2023

Conversation

ddaspit
Contributor

@ddaspit ddaspit commented Oct 26, 2023

  • replace Mockito with Decoy for mocking

@codecov-commenter

codecov-commenter commented Oct 26, 2023

Codecov Report

Attention: 65 lines in your changes are missing coverage. Please review.

Files Coverage Δ
machine/jobs/nmt_engine_build_job.py 72.60% <100.00%> (ø)
machine/tokenization/__init__.py 100.00% <100.00%> (ø)
machine/translation/__init__.py 100.00% <100.00%> (ø)
machine/translation/ecm_score_info.py 100.00% <100.00%> (ø)
...hine/translation/interactive_translation_engine.py 76.92% <100.00%> (ø)
...chine/translation/interactive_translation_model.py 75.00% <100.00%> (ø)
machine/translation/translation_constants.py 100.00% <100.00%> (ø)
machine/translation/translation_result_builder.py 90.19% <100.00%> (+51.63%) ⬆️
machine/translation/translation_suggestion.py 100.00% <100.00%> (ø)
machine/translation/word_edit_distance.py 100.00% <100.00%> (ø)
... and 17 more

... and 2 files with indirect coverage changes


@johnml1135
Collaborator

machine/translation/edit_distance.py line 57 at r1 (raw file):

                dist_matrix[i][j], _, _, _ = self._process_dist_matrix_cell(
                    x, y, dist_matrix, use_prefix_del_op, j != y_count or is_last_item_complete, i, j
                )

Is there a way to speed this up more? It looks very slow - list comprehension? Cython? I am assuming that this will be computed many times...

@johnml1135
Copy link
Collaborator

machine/translation/edit_distance.py line 63 at r1 (raw file):

    def _process_dist_matrix_cell(
        self, x: Seq, y: Seq, dist_matrix: List[List[float]], use_prefix_del_op: bool, is_complete: bool, i: int, j: int
    ) -> Tuple[float, int, int, EditOperation]:

Do we need to try to convert this to C or Cython code?

Contributor Author

@ddaspit ddaspit left a comment

Reviewable status: 0 of 29 files reviewed, 2 unresolved discussions (waiting on @johnml1135)


machine/translation/edit_distance.py line 57 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is there a way to speed this up more? It looks very slow - list comprehension? Cython? I am assuming that this will be computed many times...

This is a dynamic programming algorithm. In general, the best time complexity you can achieve for it is O(m*n), where m and n are the lengths of the two sequences being compared. You can make small optimizations, but nothing significant.
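
To make the O(m*n) point concrete, here is a minimal sketch of a plain Levenshtein distance computed with a dynamic programming table. It is not the code in `machine/translation/edit_distance.py`: the function name `edit_distance` and the unit costs are assumptions for illustration, and it leaves out the prefix-deletion and completeness handling that the real method takes as parameters. It only shows why every one of the (m+1) x (n+1) cells has to be filled once.

```python
from typing import List, Sequence


def edit_distance(x: Sequence[str], y: Sequence[str]) -> int:
    """Plain Levenshtein distance with unit costs (illustrative sketch only)."""
    m, n = len(x), len(y)
    # dist[i][j] is the cost of turning the first i items of x
    # into the first j items of y.
    dist: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # delete all i items
    for j in range(1, n + 1):
        dist[0][j] = j  # insert all j items
    # Each cell depends only on its left, upper, and upper-left neighbours,
    # so the table is filled in a single pass: O(m*n) time.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[m][n]
```

Calling `edit_distance("kitten", "sitting")` walks a 7 x 8 table. Keeping only the previous row instead of the whole table would be one of the small optimizations mentioned above: it saves memory but leaves the running time at O(m*n).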


machine/translation/edit_distance.py line 63 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Do we need to try to convert this to C or Cython code?

This method is fairly simple. I don't think that converting it to C or Cython will give us much of a boost in performance.

@ddaspit
Contributor Author

ddaspit commented Oct 27, 2023

I'm going to go ahead and merge this in, so I can start using it in silnlp. We can defer any optimizations until we have known performance issues.

@ddaspit ddaspit merged commit f2015aa into main Oct 27, 2023
13 of 14 checks passed
@ddaspit ddaspit deleted the interactive-translator branch October 27, 2023 18:16
3 participants