[FEATURE]: Get Training Backend to Update User on Training Progress with AWS Appsync #920

dwu359 · 2023-08-21T03:24:20Z

Feature Name

Get Training Backend to Update User on Training Progress with AWS Appsync

Your Name

Daniel Wu

Description

The HTTP protocol that we use to communicate between the frontend and backend is request-response: the frontend has to send a request before the backend can send anything back. To report training progress, the backend needs to send multiple messages to the frontend after the initial training request. Luckily, AWS AppSync handles that for us with its WebSocket pub/sub APIs, which allow bidirectional communication between the frontend and backend. More specifically, both the frontend and backend can listen on AppSync's WebSocket endpoint for messages with a particular channel ID, and both can make GraphQL API requests to AppSync to send messages with that same channel ID.

Use AWS AppSync to update the user on training progress for a particular training request. For now, let's say that training progress means the # of epochs completed.
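As a rough illustration of the backend side, here is a minimal sketch of publishing one progress update through a GraphQL mutation. The endpoint URL, API key, the `publish(name, data)` mutation, and the function names are all assumptions made for illustration; the real schema and auth mode would depend on how the AppSync API is set up.

```python
import json
import urllib.request

# Hypothetical values: replace with the real AppSync endpoint and API key.
APPSYNC_URL = "https://example.appsync-api.us-east-1.amazonaws.com/graphql"
APPSYNC_API_KEY = "da2-EXAMPLEKEY"

# Assumed schema: a `publish(name, data)` mutation that subscribers filter by `name`.
PUBLISH_MUTATION = """
mutation Publish($name: String!, $data: AWSJSON!) {
  publish(name: $name, data: $data) { name data }
}
"""


def build_progress_payload(channel_id: str, epochs_done: int, total_epochs: int) -> dict:
    """Build the GraphQL request body for one progress update."""
    progress = {"epochs_done": epochs_done, "total_epochs": total_epochs}
    return {
        "query": PUBLISH_MUTATION,
        # AWSJSON values are passed as JSON-encoded strings.
        "variables": {"name": channel_id, "data": json.dumps(progress)},
    }


def publish_progress(channel_id: str, epochs_done: int, total_epochs: int) -> dict:
    """POST the mutation to AppSync using API-key auth (needs a real endpoint)."""
    body = json.dumps(build_progress_payload(channel_id, epochs_done, total_epochs)).encode()
    request = urllib.request.Request(
        APPSYNC_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": APPSYNC_API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

The training loop would call something like `publish_progress(channel_id, epoch, total_epochs)` after each epoch, while the frontend subscribes to the same channel ID over AppSync's WebSocket endpoint.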

github-actions · 2023-08-21T03:24:30Z

Hello @dwu359! Thank you for submitting the Feature Request Form. We appreciate your contribution. 👋

We will look into it and provide a response as soon as possible.

To work on this feature request, you can follow these branch setup instructions:

Check out the base branch (`nextjs`):

```
 git checkout nextjs
```

Pull the latest changes from the remote `nextjs` branch:

```
 git pull origin nextjs
```

Create a new branch specific to this feature request using the issue number:

```
 git checkout -b feature-920
```

Feel free to make the necessary changes in this branch and submit a pull request when you're ready.

Best regards,
Deep Learning Playground (DLP) Team

karkir0003 · 2023-08-21T03:24:45Z

@dwu359 can you provide more detail?

karkir0003 · 2023-08-22T00:26:02Z

can you provide more detail here?

andrewpeng02 · 2024-02-12T21:49:58Z

Why AWS AppSync instead of websockets?

karkir0003 · 2024-02-13T02:06:02Z

@dwu359

dwu359 · 2024-02-13T02:12:59Z

AppSync seems to handle the websockets stuff for us, but if you're able to find a way to implement it directly with websockets, then go for it. I will say, though, that I looked into implementing it with websockets before, and the library support for websockets isn't as good as it is for REST APIs.

andrewpeng02 · 2024-02-13T15:08:34Z

I just don't see the need to use another service, and it'll also complicate development (we'd have to deploy to some staging env every time we want to test something?). I'll look into libraries

andrewpeng02 · 2024-02-13T15:47:45Z

What other uses of websockets do you think we'd want to add in the future?

andrewpeng02 · 2024-02-13T21:52:07Z

Django Channels seems to be the accepted library for websockets, and the implementation won't be too bad. The one catch is that we'd probably have to port our training methods to new websocket consumers and also handle authentication a bit differently, in a middleware. So the options seem to be:

1. Port the entire train endpoints into websockets. Ninja schemas and stuff may not be supported?
2. Define a websocket just to check on the current training epoch. This will require 2 separate requests, and figuring out how to connect the two will be annoying, but it'll involve less refactoring (we can likely just create a job UUID on the client side and pass it to both the endpoint and the websocket).
3. Retain the original HTTP training endpoints so we don't have to create new authentication and we keep schema support for the input, but instead of doing the training in the endpoint, create a task via Celery and return the job ID to the user (this is better for long-running tasks too). Then the user opens a websocket with Django Channels, and the Celery task updates the websocket group periodically with the progress and eventually returns the result. Long-term, using Celery tasks would be best if we're planning on having long-running train times, especially with image data.

In terms of effort, 2 < 1 = 3
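To make the third option concrete, here is a minimal in-process sketch of the flow: a job ID is handed out, the client subscribes to that job's group, and the worker sends a progress message after each epoch. This is only a stand-in using `asyncio` queues, not real Celery or Django Channels code; the `groups` registry, `subscribe`, `group_send`, `train_task`, and `demo` names are all made up for illustration. In a real setup, the channel layer's `group_add` and `group_send` would play the roles of `subscribe` and `group_send` here.

```python
import asyncio
import uuid

# In-process stand-in for a channel layer: group name -> subscriber queues.
groups: dict[str, list[asyncio.Queue]] = {}


def subscribe(job_id: str) -> asyncio.Queue:
    """What the websocket consumer would do on connect: join the job's group."""
    queue: asyncio.Queue = asyncio.Queue()
    groups.setdefault(job_id, []).append(queue)
    return queue


async def group_send(job_id: str, message: dict) -> None:
    """Fan a message out to every subscriber of the job's group."""
    for queue in groups.get(job_id, []):
        await queue.put(message)


async def train_task(job_id: str, epochs: int) -> None:
    """Stand-in for the Celery task: report progress after each epoch."""
    for epoch in range(1, epochs + 1):
        await asyncio.sleep(0)  # placeholder for one epoch of training
        await group_send(job_id, {"type": "progress", "epoch": epoch, "total": epochs})
    await group_send(job_id, {"type": "done", "result": "ok"})


async def demo(epochs: int = 3) -> list[dict]:
    job_id = str(uuid.uuid4())  # what the HTTP endpoint would return to the client
    inbox = subscribe(job_id)   # client opens the websocket using the job id
    worker = asyncio.create_task(train_task(job_id, epochs))
    received = []
    while True:
        message = await inbox.get()
        received.append(message)
        if message["type"] == "done":
            break
    await worker
    return received
```

Running `asyncio.run(demo(3))` collects three progress messages followed by a final `done` message. Note that the sketch subscribes before the task starts; in a real deployment a client that connects late would miss earlier updates unless the latest progress is also stored somewhere it can be read back.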

karkir0003 · 2024-02-14T00:38:22Z

@dwu359 ?

dwu359 · 2024-02-14T00:54:23Z

Django Channels seems like a good start. Keep in mind you will need to find some way to host the websocket server (likely through EC2) and access it (likely through API Gateway or something else). I'm sorry I can't help much further; I'm no longer a direct contributor to this project, and it seems like at this point you know more about websockets than I do.

dwu359 added the enhancement (New feature or request) and backend (backend tasks) labels Aug 21, 2023

noah-iversen assigned noah-iversen, andrewpeng02 and MugPand and unassigned noah-iversen and MugPand Feb 10, 2024

andrewpeng02 mentioned this issue Feb 21, 2024

[FEATURE]: Migrate training into Celery #1136

Closed

andrewpeng02 mentioned this issue Apr 9, 2024

Migrate training into celery and upload results to s3 #1157

Merged
