Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add endpoint to return pretanslations as USFM #292

Merged
merged 1 commit into from
Feb 1, 2024

Conversation

ddaspit
Copy link
Contributor

@ddaspit ddaspit commented Jan 30, 2024

  • add Scripture data file service
  • cleanup translation controllers
  • cleanup tests
  • update test dependencies
  • customize NSwag client template to support "text/plain" response

This change is Reviewable

@codecov-commenter
Copy link

Codecov Report

Attention: 222 lines in your changes are missing coverage. Please review.

Comparison is base (4c45abf) 71.23% compared to head (ed75d04) 69.48%.

Files Patch % Lines
src/Serval.Client/Client.g.cs 28.04% 109 Missing and 9 partials ⚠️
...hared/Services/ZipParatextProjectSettingsParser.cs 38.77% 26 Missing and 4 partials ⚠️
...Serval.Shared/Services/ScriptureDataFileService.cs 60.86% 17 Missing and 1 partial ⚠️
src/Serval.Shared/Services/ZipContainer.cs 0.00% 17 Missing ⚠️
...lation/Controllers/TranslationEnginesController.cs 87.17% 7 Missing and 3 partials ⚠️
src/Serval.Translation/Services/EngineService.cs 74.07% 6 Missing and 1 partial ⚠️
.../Serval.DataFiles/Consumers/GetDataFileConsumer.cs 40.00% 6 Missing ⚠️
...rval.Translation/Services/PretranslationService.cs 91.30% 2 Missing and 2 partials ⚠️
src/Serval.DataFiles/Services/DataFileService.cs 76.92% 2 Missing and 1 partial ⚠️
src/Serval.Shared/Services/FileSystem.cs 0.00% 3 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #292      +/-   ##
==========================================
- Coverage   71.23%   69.48%   -1.76%     
==========================================
  Files         141      148       +7     
  Lines        6498     6770     +272     
  Branches     1060     1076      +16     
==========================================
+ Hits         4629     4704      +75     
- Misses       1429     1629     +200     
+ Partials      440      437       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnml1135
Copy link
Collaborator

src/Serval.ApiServer/nswag.json line 26 at r1 (raw file):

      "configurationClass": null,
      "generateClientClasses": true,
      "suppressClientClassesOutput": false,

What is the reason for these targeted changes?

@johnml1135
Copy link
Collaborator

@johnml1135
Copy link
Collaborator

src/Serval.Client/Serval.Client.csproj line 8 at r1 (raw file):

		<Description>Client classes for Serval.</Description>
		<RootNamespace>Serval.Client</RootNamespace>
		<NoWarn>8618</NoWarn>

Was this because of the Nswag client generation? Won't this be fixed through RicoSuter/NSwag#4704? Should we make a new issue to track removing this warning?

@johnml1135
Copy link
Collaborator

src/Serval.DataFiles/Consumers/GetDataFileConsumer.cs line 1 at r1 (raw file):

using Serval.Shared.Utils;

This is unneeded.

@johnml1135
Copy link
Collaborator

src/Serval.DataFiles/Controllers/DataFilesController.cs line 60 at r1 (raw file):

    {
        DataFile dataFile = await _dataFileService.GetAsync(id, cancellationToken);
        await AuthorizeAsync(dataFile);

Nice refactoring!

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Controllers/BadRequestExceptionFilter.cs line 7 at r1 (raw file):

    public override void OnException(ExceptionContext context)
    {
        if (context.Exception is InvalidOperationException)

Is this trying to address #288? BadRequestObjectResult will just return a 400. That should be ok (enough).

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Controllers/ErrorResultFilter.cs line 1 at r1 (raw file):

using System.Diagnostics;

Should this using also be stuffed into globals?

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Services/ScriptureDataFileService.cs line 12 at r1 (raw file):

    {
        using IZipContainer container = _fileSystem.OpenZipFile(GetFilePath(filename));
        ZipParatextProjectSettingsParser settingsParser = new(_fileSystem, container);

This is duplicated code - should (likely) be replaced with ParseProjectSettings(container).

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Services/ScriptureDataFileService.cs line 40 at r1 (raw file):

    private static string GetBookFileName(ParatextProjectSettings settings, string bookId)
    {
        string bookPart;

There is some fun and crazy logic here. Is this different versions of Paratext? Is there some link or explanation that can be put here to help with breadcrumbs in the future when someone wants to know why these algorithms are in place?

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Services/ZipParatextProjectSettingsParser.cs line 3 at r1 (raw file):

namespace Serval.Shared.Services;

public class ZipParatextProjectSettingsParser(IFileSystem fileSystem, IZipContainer projectContainer)

How is this different or distinct from the Machine ZipParatextProjectSettingsParser.cs? You appear to be using the IZipContainer and IFileSystem in Serval (as well as the more recent version of C#) which are not available in Machine. Is there anything else I am missing?

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 107 at r1 (raw file):

    /// <response code="403">The authenticated client cannot perform the operation or does not own the translation engine</response>
    /// <response code="404">The engine does not exist</response>
    /// <response code="503">A necessary service is currently unavailable. Check `/health` for more details. </response>

So, 422's got coalesced into 400's?

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 354 at r1 (raw file):

        Engine engine = await _engineService.GetAsync(id, cancellationToken);
        await AuthorizeAsync(engine);
        Corpus corpus = await MapAsync(getDataFileClient, idGenerator.GenerateId(), corpusConfig, cancellationToken);

So, we don't care if the engine and corpus source languages don't match? It may be an unnecessary restriction.

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 613 at r1 (raw file):

    /// <remarks>
    /// If the book exists in the target corpus, then the pretranslated target book will be returned. If the book does
    /// not exist in the target corpus, then the pretranslated source book will be returned.

Potential update: "If the USFM book exists in the target corpus, then the pretranslated text will be inserted into the target book and returned. If the USFM book does not exist in the target corpus, then the pretranslated text will be inserted into the source book and returned."

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 620 at r1 (raw file):

    /// <param name="cancellationToken"></param>
    /// <response code="200">The book in USFM format</response>
    /// <response code="204">The book does not exist in the corpus</response>

Potential update: The USFM book specified does not exist in the source or target corpus.

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEngineTypesController.cs line 6 at r1 (raw file):

[Route("api/v{version:apiVersion}/translation/engine-types")]
[OpenApiTag("Translation Engines")]
public class TranslationEngineTypesController(IAuthorizationService authService, IEngineService engineService)

Nice catch.

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Services/EngineService.cs line 259 at r1 (raw file):

        catch (RpcException re)
        {
            if (re.StatusCode is StatusCode.Aborted)

This is to handle the aborted race condition (if I am correct) so that it will appear to the outside world that the aborted engine is already gone. Still, I don't see you throwing anything on the CancelBuildAsync in Machine that would catch this code...

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Services/PretranslationService.cs line 1 at r1 (raw file):

using SIL.Machine.Corpora;

Put in globals.

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 613 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Potential update: "If the USFM book exists in the target corpus, then the pretranslated text will be inserted into the target book and returned. If the USFM book does not exist in the target corpus, then the pretranslated text will be inserted into the source book and returned."

Further: "If the USFM book exists in the target corpus, then the pretranslated text will be inserted into any empty segments in the the target book and returned. If the USFM book does not exist in the target corpus, then the pretranslated text will be inserted into the source book and returned."

@johnml1135
Copy link
Collaborator

tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1145 at r1 (raw file):

            await _env.Builds.InsertAsync(build);
            _env.NmtClient.CancelBuildAsync(Arg.Any<CancelBuildRequest>(), null, null, Arg.Any<CancellationToken>())
                .Returns(CreateAsyncUnaryCall<Empty>(StatusCode.Aborted));

As per the other comment, when will this exist in real life?

@johnml1135
Copy link
Collaborator

tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1261 at r1 (raw file):

    }

    public async Task GetLanguageInfoAsync()

Why was the [Test] annotation removed?

@johnml1135
Copy link
Collaborator

tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1210 at r1 (raw file):

        await _env.Pretranslations.InsertAsync(pret);

        string usfm = await client.GetPretranslatedUsfmAsync(ECHO_ENGINE1_ID, addedCorpus.Id, "MAT");

You may do this other places, but we should test stripping all text out and also inserting into a target USFM where only some are added.

@johnml1135
Copy link
Collaborator

tests/Serval.DataFiles.Tests/Services/DataFileServiceTests.cs line 2 at r1 (raw file):

using Serval.Shared.Contracts;
using Serval.Shared.Utils;

Shouldn't these be put in the globals?

@johnml1135
Copy link
Collaborator

tests/Serval.E2ETests/ServalApiTests.cs line 104 at r1 (raw file):

    public async Task NmtBatch()
    {
        string engineId = await _helperClient.CreateNewEngine("Nmt", "es", "en", "NMT1");

So, without ClearEngines, then there will be a glob of engines built up on the k8s services. Are you addressing this another way? Is there another concern you are trying to address?

@johnml1135
Copy link
Collaborator

tests/Serval.E2ETests/ServalApiTests.cs line 104 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, without ClearEngines, then there will be a glob of engines built up on the k8s services. Are you addressing this another way? Is there another concern you are trying to address?

I see - the disposal methods.

Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 62 files reviewed, 22 unresolved discussions (waiting on @johnml1135)


src/Serval.ApiServer/nswag.json line 26 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

What is the reason for these targeted changes?

The new version of NSwag made these changes. I'm assuming that added/removed some properties.


src/Serval.ApiServer/Templates/Client.Class.ProcessResponse.liquid line 1 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, you are using https://github.com/RicoSuter/NSwag/blob/master/src/NSwag.CodeGeneration.CSharp/Templates/Client.Class.ProcessResponse.liquid. Why did you include it here? What changes are in here?

I added a check for "text/plain" content type. NSwag was not generating the correct code for this case.


src/Serval.Client/Serval.Client.csproj line 8 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Was this because of the Nswag client generation? Won't this be fixed through RicoSuter/NSwag#4704? Should we make a new issue to track removing this warning?

Yes, the issue you mentioned is tracking this problem. We can make an issue.


src/Serval.Shared/Controllers/BadRequestExceptionFilter.cs line 7 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is this trying to address #288? BadRequestObjectResult will just return a 400. That should be ok (enough).

No, this isn't. This was just a refactoring to simplify the controller code.


src/Serval.Shared/Services/ScriptureDataFileService.cs line 40 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

There is some fun and crazy logic here. Is this different versions of Paratext? Is there some link or explanation that can be put here to help with breadcrumbs in the future when someone wants to know why these algorithms are in place?

This is based on how Paratext interprets the naming settings internally. Unfortunately, there isn't really any documentation on it anywhere.


src/Serval.Shared/Services/ZipParatextProjectSettingsParser.cs line 3 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

How is this different or distinct from the Machine ZipParatextProjectSettingsParser.cs? You appear to be using the IZipContainer and IFileSystem in Serval (as well as the more recent version of C#) which are not available in Machine. Is there anything else I am missing?

You are correct. This is the Serval version of the ZipParatextProjectSettingsParser, which uses IFileSystem instead of the file system APIs directly.


src/Serval.Translation/Controllers/TranslationEnginesController.cs line 354 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, we don't care if the engine and corpus source languages don't match? It may be an unnecessary restriction.

Yes, that is correct. Serval was specifically designed to allow for the case where you can include a training corpus that doesn't match the source and target language of the engine. This is to support including related language data in the training corpus.


src/Serval.Translation/Services/EngineService.cs line 259 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

This is to handle the aborted race condition (if I am correct) so that it will appear to the outside world that the aborted engine is already gone. Still, I don't see you throwing anything on the CancelBuildAsync in Machine that would catch this code...

We should add an issue to Machine to check for this.


tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1210 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

You may do this other places, but we should test stripping all text out and also inserting into a target USFM where only some are added.

I have tests for this in Machine.


tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1261 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Why was the [Test] annotation removed?

I missed this when I was resolving conflicts.


tests/Serval.E2ETests/ServalApiTests.cs line 104 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I see - the disposal methods.

Yes, that is correct. Although now that I think about it a bit more, it probably makes sense to add this call to the SetUp method as well just in case a test crashed and never cleaned up the engines.

@johnml1135
Copy link
Collaborator

ZipParatextProjectSettingsParser.cs is not tested for actual stylesheets and custom.sty - and there are a number of other caes that are not covered as per the coverage reports.

@johnml1135
Copy link
Collaborator

src/Serval.Client/Serval.Client.csproj line 8 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Yes, the issue you mentioned is tracking this problem. We can make an issue.

#294

@johnml1135
Copy link
Collaborator

src/Serval.Shared/Services/ScriptureDataFileService.cs line 40 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This is based on how Paratext interprets the naming settings internally. Unfortunately, there isn't really any documentation on it anywhere.

Is there at least an open source repo that you are pulling from to do this? Such as "duplicating logic found here..."

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Services/EngineService.cs line 259 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should add an issue to Machine to check for this.

sillsdev/machine#162

@johnml1135
Copy link
Collaborator

tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1210 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I have tests for this in Machine.

That "should" be ok...

@johnml1135
Copy link
Collaborator

tests/Serval.E2ETests/ServalApiTests.cs line 104 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Yes, that is correct. Although now that I think about it a bit more, it probably makes sense to add this call to the SetUp method as well just in case a test crashed and never cleaned up the engines.

That sounds like a good path forward.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 62 of 62 files at r1, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ddaspit)


src/Serval.Shared/Controllers/ForbiddenExceptionFilter.cs line 10 at r1 (raw file):

        {
            context.Result = new ForbidResult();
            context.ExceptionHandled = true;

I see - for the refactoring. I like it.

Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some of the Paratext project settings code over to the Machine library.

Reviewable status: 48 of 63 files reviewed, 13 unresolved discussions (waiting on @johnml1135)


src/Serval.DataFiles/Consumers/GetDataFileConsumer.cs line 1 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

This is unneeded.

Done.


src/Serval.Shared/Controllers/ErrorResultFilter.cs line 1 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Should this using also be stuffed into globals?

Done.


src/Serval.Shared/Services/ScriptureDataFileService.cs line 12 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

This is duplicated code - should (likely) be replaced with ParseProjectSettings(container).

Done.


src/Serval.Shared/Services/ScriptureDataFileService.cs line 40 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Is there at least an open source repo that you are pulling from to do this? Such as "duplicating logic found here..."

No, Paratext is not open source. I think I might move this code to Machine.


src/Serval.Translation/Controllers/TranslationEnginesController.cs line 107 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, 422's got coalesced into 400's?

I couldn't find a case where the 422 would occur.


src/Serval.Translation/Controllers/TranslationEnginesController.cs line 613 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Further: "If the USFM book exists in the target corpus, then the pretranslated text will be inserted into any empty segments in the the target book and returned. If the USFM book does not exist in the target corpus, then the pretranslated text will be inserted into the source book and returned."

Done.


src/Serval.Translation/Controllers/TranslationEnginesController.cs line 620 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Potential update: The USFM book specified does not exist in the source or target corpus.

Done.


src/Serval.Translation/Services/PretranslationService.cs line 1 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Put in globals.

That causes naming collisions in other files.


tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1145 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

As per the other comment, when will this exist in real life?

As I said elsewhere, we need to add a check in Machine.


tests/Serval.ApiServer.IntegrationTests/TranslationEngineTests.cs line 1261 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I missed this when I was resolving conflicts.

Done


tests/Serval.DataFiles.Tests/Services/DataFileServiceTests.cs line 2 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Shouldn't these be put in the globals?

Done.


tests/Serval.E2ETests/ServalApiTests.cs line 104 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

That sounds like a good path forward.

Done

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Controllers/TranslationEnginesController.cs line 613 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Done.

Hmm. one more edit... "will be inserted into an empty template created from the source USFM book and returned." What do you think?

@johnml1135
Copy link
Collaborator

src/Serval.Translation/Services/PretranslationService.cs line 1 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

That causes naming collisions in other files.

Good enough.

@johnml1135
Copy link
Collaborator

tests/Serval.E2ETests/ServalClientHelper.cs line 49 at r2 (raw file):

            throw new InvalidOperationException("The environment variable SERVAL_CLIENT_SECRET is not set.");

        string authToken = await GetAuth0AuthenticationAsync(authUrl, _audience, clientId, clientSecret);

The only thing about this is that we are refreshing the token before every single test - I would think that we would want to just get the token and ENV variables once and then clear the engines before/after every test.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 15 of 15 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

Copy link
Contributor Author

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


src/Serval.Translation/Controllers/TranslationEnginesController.cs line 613 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Hmm. one more edit... "will be inserted into an empty template created from the source USFM book and returned." What do you think?

Done.


tests/Serval.E2ETests/ServalClientHelper.cs line 49 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

The only thing about this is that we are refreshing the token before every single test - I would think that we would want to just get the token and ENV variables once and then clear the engines before/after every test.

We were already getting the auth token before every test. The client helper was created in the setup method and the setup method is run before every test.

@johnml1135
Copy link
Collaborator

tests/Serval.E2ETests/ServalClientHelper.cs line 49 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We were already getting the auth token before every test. The client helper was created in the setup method and the setup method is run before every test.

Ok - if it works, it works.

@johnml1135
Copy link
Collaborator

:lgtm:

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@ddaspit ddaspit force-pushed the usfm-pretranslations branch 2 times, most recently from 9c01e06 to 1c0a147 Compare February 1, 2024 19:54
- add Scripture data file service
- cleanup translation controllers
- cleanup tests
- update test dependencies
- customize NSwag client template to support "text/plain" response
Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit 3d70a91 into main Feb 1, 2024
2 checks passed
@ddaspit ddaspit deleted the usfm-pretranslations branch February 1, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants