Issue with Canceling a job #511

Open
johnml1135 opened this issue Oct 15, 2024 · 3 comments
Labels
sf_watching Scripture Forge should be updated when this is resolved or updated

Comments

@johnml1135
Collaborator

Sometimes a job is told to cancel through the API, but it doesn't actually cancel.

  • Slow cancel: /api/v1/translation/engines/66c646da2c3a820e41266b83/builds/6707ebe5f324bff3134d715e
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[100]
      Start processing HTTP request POST https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build/cancel
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[100]
      Sending HTTP request POST https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build/cancel
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[101]
      Received HTTP response headers after 373.1305ms - 204
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[101]
      End processing HTTP request after 373.44ms - 204
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[100]
      Start processing HTTP request GET https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[100]
      Sending HTTP request GET https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[101]
      Received HTTP response headers after 194.7167ms - 200
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[101]
      End processing HTTP request after 195.1597ms - 200
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[100]
      Start processing HTTP request GET https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[100]
      Sending HTTP request GET https://qa.serval-api.org/api/v1/translation/engines/66c646da2c3a820e41266b83/current-build
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.ClientHandler[101]
      Received HTTP response headers after 195.3227ms - 200
[2024-10-10 15:15:59] info: System.Net.Http.HttpClient.machine_api.LogicalHandler[101]
      End processing HTTP request after 195.6328ms - 200
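The log above shows the cancel POST returning 204 immediately, followed by repeated GETs of current-build. A minimal sketch of that polling pattern, assuming the caller supplies a hypothetical `get_build` callable that returns the build JSON as a dict. The set of terminal states is an assumption limited to the two states that appear in this issue, and "Active" in the demo is a placeholder for any non-terminal state.

```python
import time
from typing import Callable, Optional

# Assumption: only the two terminal states mentioned in this issue are listed.
TERMINAL_STATES = {"Canceled", "Completed"}


def wait_for_terminal_state(
    get_build: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll get_build() until the build is terminal or the timeout expires.

    get_build is hypothetical: it stands in for a GET of .../current-build
    and must return the parsed JSON body. Returns the terminal state, or
    None if the build was still running when the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_build().get("state")
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Demo with canned responses instead of real HTTP calls.
_responses = iter([{"state": "Active"}, {"state": "Active"}, {"state": "Canceled"}])
demo_result = wait_for_terminal_state(lambda: next(_responses), sleep=lambda _: None)
print(demo_result)
```

A `None` return would be the signal for a "slow cancel": the cancel request was accepted, but the build did not reach a terminal state within the allowed time.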

@johnml1135 johnml1135 added the sf_watching Scripture Forge should be updated when this is resolved or updated label Oct 15, 2024
@ddaspit
Contributor

ddaspit commented Oct 15, 2024

Is this an NMT or SMT engine?

@johnml1135
Collaborator Author

{
    "id": "6707ebe5f324bff3134d715e",
    "url": "/api/v1/translation/engines/66c646da2c3a820e41266b83/builds/6707ebe5f324bff3134d715e",
    "revision": 108,
    "engine": {
      "id": "66c646da2c3a820e41266b83",
      "url": "/api/v1/translation/engines/66c646da2c3a820e41266b83"
    },
    "trainOn": [
      {
        "corpus": {
          "id": "66f3006ff324bff3134ae5eb",
          "url": "/api/v1/translation/engines/66c646da2c3a820e41266b83/corpora/66f3006ff324bff3134ae5eb"
        },
        "scriptureRange": "MAT;MRK;LUK;JHN;ACT;ROM;1CO;2CO;GAL;EPH;PHP;COL;1TH;2TH;1TI;2TI;TIT;PHM;HEB;JAS;1PE;2PE;1JN;2JN;3JN;JUD;REV"
      }
    ],
    "pretranslate": [
      {
        "corpus": {
          "id": "66c646dd2c3a820e41266b86",
          "url": "/api/v1/translation/engines/66c646da2c3a820e41266b83/corpora/66c646dd2c3a820e41266b86"
        },
        "scriptureRange": "GEN"
      }
    ],
    "step": 5000,
    "percentCompleted": 1,
    "message": "Canceled",
    "queueDepth": 0,
    "state": "Canceled",
    "dateFinished": "2024-10-10T18:03:16.816Z",
    "options": {
      "train_params": {
        "warmup_steps": 1000,
        "learning_rate": 0.0002,
        "lr_scheduler_type": "cosine",
        "max_steps": 5000
      }
    }
  }
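One detail in this payload is that `step` equals `options.train_params.max_steps` even though `state` is "Canceled". A small sketch of a check for that combination follows. It assumes `step` counts the same training steps that `max_steps` limits, which the issue does not confirm, and the helper name is hypothetical.

```python
def reached_max_steps(build: dict) -> bool:
    """Return True if the reported step has reached the configured max_steps.

    Assumption: build["step"] and options.train_params.max_steps are
    measured in the same unit (training steps).
    """
    step = build.get("step")
    max_steps = build.get("options", {}).get("train_params", {}).get("max_steps")
    return step is not None and max_steps is not None and step >= max_steps


# Only the fields relevant to the check, copied from the build above.
build = {
    "id": "6707ebe5f324bff3134d715e",
    "step": 5000,
    "state": "Canceled",
    "options": {"train_params": {"max_steps": 5000}},
}

# A build marked Canceled that still ran every training step is suspicious.
canceled_after_full_training = build["state"] == "Canceled" and reached_max_steps(build)
print(canceled_after_full_training)
```

For this build the check is true, which would be consistent with training having run to the end before the cancel took effect.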

@johnml1135
Collaborator Author

johnml1135 commented Oct 16, 2024

I don't quite know what happened.

  • Completed: 2024-09-27T17:15:30.651Z
  • Completed: 2024-10-08T15:38:17.99Z
  • 6707ebe5f324bff3134d715e - Canceled: 2024-10-10T18:03:16.816Z -> ClearML: completed
  • 67093902f324bff3134d80d8 - Canceled: 2024-10-11T15:18:33.362Z -> ClearML: aborted
  • 670ed5d4694f517c27c639b8 - Completed: 2024-10-16T04:08:01.022Z -> ClearML: completed

Loki logs only go back a week, yet only the 10-16 build shows up in them (the 10-11 and 10-10 builds should as well). The 10-11 build correctly shows in ClearML as canceled (aborted), but the 10-10 build shows as completed. What is going on?
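The comparison above can be sketched as a small consistency check between the state reported by the API and the status shown in ClearML. The expected mapping (Canceled to aborted, Completed to completed) is an assumption taken from the description of the 10-11 build as correct, and the function name is hypothetical.

```python
# Assumed mapping from the API build state to the expected ClearML status.
EXPECTED_CLEARML_STATUS = {"Canceled": "aborted", "Completed": "completed"}

# (build id, API state, ClearML status) for the three builds that have ids above.
builds = [
    ("6707ebe5f324bff3134d715e", "Canceled", "completed"),
    ("67093902f324bff3134d80d8", "Canceled", "aborted"),
    ("670ed5d4694f517c27c639b8", "Completed", "completed"),
]


def find_mismatches(rows):
    """Return the ids of builds whose ClearML status differs from the expected one."""
    return [
        build_id
        for build_id, api_state, clearml_status in rows
        if EXPECTED_CLEARML_STATUS.get(api_state) != clearml_status
    ]


mismatched = find_mismatches(builds)
print(mismatched)
```

Only the 10-10 build (6707ebe5f324bff3134d715e) is flagged, matching the observation that it was marked Canceled while ClearML ran it to completion.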
