Livestreams fail to progress to convert and move steps #457

Open · Dragonatorul opened this issue Jun 27, 2024 · 8 comments
Dragonatorul commented Jun 27, 2024

I think this is caused by a network connectivity issue. To be specific: a livestream was being recorded when the power to the router cut out, taking down the network and internet connection. The server remained running as it was on a separate circuit. When the network came back, the stream recording appears to have continued on a new queue item.

However, all subsequent streams, even those that started normally 24h later, are stuck. The recording appears to have completed successfully, but it does not progress to the convert or move steps. I don't see any corresponding workflows in the workflow tab.

Additionally, the chat doesn't seem to work either. I have Discord alerts configured and I keep receiving alerts such as this:

 Error: Queue ID 4b0ba51f-2e10-44c4-84b8-ff110d6db7d6 for OniGiri failed at task kill-chat-download.

Edit: I just noticed the difference in the stream IDs (the numeric IDs used by Twitch). The one starting with 4 is the ID of the livestream itself; the one starting with 2 is the ID of the corresponding VOD. What's weird is that both are labeled as LIVE streams and both are stuck. I presume that's why the actual VOD wasn't archived either. I had the same problem with another stream, which I manually deleted from the VODs section and which was then re-downloaded as a VOD, but only AFTER I deleted the "livestream" entry that had the VOD ID and was stuck. It's still downloading now, so I don't know yet if it will succeed.

[three screenshots attached]

Zibbp (Owner) commented Jun 27, 2024

Can you run the command below to save the logs and upload them here?

docker logs ganymede-api >& /tmp/ganymede.log
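If the resulting file is large, compressing it before uploading may help (a small optional step, assuming the log was saved to /tmp/ganymede.log as above):

gzip /tmp/ganymede.log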

Majow commented Jun 27, 2024

I would love to contribute to the issue if you both don't mind; I'm having the same problem.

Attachments: ganymede.log, chrome_2024-06-27_20-05-23, chrome_2024-06-27_20-06-20, chrome_2024-06-27_20-05-45

Zibbp (Owner) commented Jun 27, 2024

@Majow I see the worker is panicking when trying to stop the chat. I've fixed the panic in #458, but I'm not 100% sure it's going to resolve your issue. When you're not archiving anything, can you pull and use the :main image tag for the API? If this happens again while using the :main image tag, please post back with the logs.

If you want to try to recover the 'stuck' streams before restarting the container, follow these steps: #450 (comment)
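For reference, switching the API over to the :main tag would look roughly like this (a sketch assuming a compose service and container named ganymede-api using the ghcr.io/zibbp/ganymede image, as in the compose file posted further down):

# in the compose file, change the API image line to:
#   image: ghcr.io/zibbp/ganymede:main
# then pull and recreate just that service
docker compose pull ganymede-api
docker compose up -d ganymede-api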

Dragonatorul (Author) commented:

Here are my logs. They're 14MB of junk because I'm checking for livestreams every minute. Hope it helps though.

ganymede.zip

Zibbp (Owner) commented Jun 28, 2024

@Dragonatorul Looks like something is failing to kill the chat download. Can you run docker exec ganymede-api ps aux and post the output? When this is run, are you archiving anything?

If you don't have the /tmp directory mounted to your host, I would run the docker cp command in #450 (comment) to make a copy of the archives, then restart the API container once nothing is being archived. You can go through the recovery instructions in the linked issue to get those manually imported.
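The backup-and-restart sequence would look roughly like this (a sketch assuming the container is named ganymede-api; the exact docker cp invocation from #450 is not reproduced here):

# copy the container's /tmp (in-progress downloads) out to the host
docker cp ganymede-api:/tmp ./ganymede-tmp-backup
# restart the API container only once nothing is actively being archived
docker restart ganymede-api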

Dragonatorul (Author) commented Jun 28, 2024

Thanks! Makes sense on the chat, but what about the video convert and move? Now that I think about it, this was a problem before moving to the new queue system, but when it happened before, I'd just stop the download task and start the convert task, which then triggered the move task. That way I'd have at least a partial download/VOD available. Is there a way to do that now? The buttons to do that seem to be gone.

This is the output.

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   3924    24 ?        Ss   Jun22   0:00 /bin/bash /usr/local/bin/entrypoint.sh
abc           23  0.0  0.2 1270324 36488 ?       Sl   Jun22   0:55 /opt/app/ganymede-api
abc           35  0.3  0.4 1520996 58696 ?       Sl   Jun22  27:04 /opt/app/ganymede-worker
abc         1018  0.0  0.0  35084  8592 ?        S    Jun25   2:10 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/makofukasame --output /tmp/44427342619_15c36b19-32d2-11ef-9940-0242c0a81007-live-chat.json -q
abc         1027  0.0  0.0  35096 10552 ?        S    Jun25   2:08 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/vividlyvivi --output /tmp/44427438267_ca6aab4e-32d8-11ef-9940-0242c0a81007-live-chat.json -q
abc         1045  0.0  0.0  35104  9196 ?        S    Jun25   2:08 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/vividlyvivi --output /tmp/44427438267_64949582-32ea-11ef-9940-0242c0a81007-live-chat.json -q
abc         1047  0.0  0.0  35100 11660 ?        S    Jun25   2:08 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/makofukasame --output /tmp/44427342619_648c4701-32ea-11ef-9940-0242c0a81007-live-chat.json -q
abc         1258  0.0  0.1  35084 14640 ?        S    Jun26   1:30 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/makofukasame --output /tmp/44432814891_f8a881c6-339a-11ef-9940-0242c0a81007-live-chat.json -q
abc         1396  0.0  0.0  35116 11392 ?        S    Jun27   0:56 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/vividlyvivi --output /tmp/44436975883_a2121cb4-342e-11ef-9940-0242c0a81007-live-chat.json -q
abc         1500  0.0  0.0  35264 10936 ?        S    Jun27   0:45 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/onigiri --output /tmp/41402912407_59567ecf-347a-11ef-9940-0242c0a81007-live-chat.json -q
abc         1694  0.0  0.0  35096  8676 ?        S    Jun27   0:36 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/onigiri --output /tmp/41402912407_59567ecf-347a-11ef-9940-0242c0a81007-live-chat.json -q
abc         1702  0.0  0.0  35092  8636 ?        S    Jun27   0:36 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/onigiri --output /tmp/41402912407_59567ecf-347a-11ef-9940-0242c0a81007-live-chat.json -q
abc         2175  2.7  0.3 351612 53432 ?        Sl   12:03   2:54 /usr/bin/python3 /usr/local/bin/streamlink --progress=force --force https://twitch.tv/makofukasame best,best --http-header Authorization=OAuth 65gwiiuges3gis9n721ye2ov1e7ast --twitch-low-latency --twitch-disable-hosting -o /tmp/44443664363_29b6ac80-352d-11ef-9940-0242c0a81007-video.mp4
abc         2176  0.0  0.2  35084 30404 ?        S    12:03   0:03 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/makofukasame --output /tmp/44443664363_29b6ac80-352d-11ef-9940-0242c0a81007-live-chat.json -q
abc         2183  3.0  0.3 351612 53636 ?        Sl   13:00   1:25 /usr/bin/python3 /usr/local/bin/streamlink --progress=force --force https://twitch.tv/miyagalactic best,best --http-header Authorization=OAuth 65gwiiuges3gis9n721ye2ov1e7ast --twitch-low-latency --twitch-disable-hosting -o /tmp/44443790555_43f1b294-3535-11ef-9940-0242c0a81007-video.mp4
abc         2184  0.0  0.2  35080 29912 ?        S    13:00   0:01 /usr/bin/python3 /usr/local/bin/chat_downloader https://twitch.tv/miyagalactic --output /tmp/44443790555_43f1b294-3535-11ef-9940-0242c0a81007-live-chat.json -q
root        2191 37.5  0.0   8100  3964 ?        Rs   13:48   0:00 ps aux

And this is the current queue. Only the last two items are actually live. The rest have been stuck there for days now.

[screenshot of the current queue]

Dragonatorul (Author) commented Jun 28, 2024

Also, this is my deployment docker compose file, if it helps. The temp folder is local, but the destination is an SMB share in another VM on the same hardware node. Everything is served through an Nginx Proxy Manager reverse proxy front-end.

version: "3.3"
services:
  ganymede-api:
    container_name: ganymede-api
    image: ghcr.io/zibbp/ganymede:latest
    restart: unless-stopped
    depends_on:
      - ganymede-temporal
    environment:
      - TZ=$TZ
      - DB_HOST=$DB_HOST
      - DB_PORT=$DB_PORT
      - DB_USER=$DB_USER
      - DB_PASS=$DB_PASS
      - DB_NAME=$DB_NAME
      - DB_SSL=$DB_SSL
      - JWT_SECRET=$JWT_SECRET
      - JWT_REFRESH_SECRET=$JWT_REFRESH_SECRET
      - TWITCH_CLIENT_ID=$TWITCH_CLIENT_ID
      - TWITCH_CLIENT_SECRET=$TWITCH_CLIENT_SECRET
      - FRONTEND_HOST=$FRONTEND_HOST
      - COOKIE_DOMAIN=$ROOT_DOMAIN
      # OPTIONAL
      # - OAUTH_PROVIDER_URL=
      # - OAUTH_CLIENT_ID=
      # - OAUTH_CLIENT_SECRET=
      # - OAUTH_REDIRECT_URL=http://IP:PORT/api/v1/auth/oauth/callback # Points to the API service
      - TEMPORAL_URL=ganymede-temporal:7233
      # WORKER
      - MAX_CHAT_DOWNLOAD_EXECUTIONS=5
      - MAX_CHAT_RENDER_EXECUTIONS=3
      - MAX_VIDEO_DOWNLOAD_EXECUTIONS=5
      - MAX_VIDEO_CONVERT_EXECUTIONS=3
    volumes:
      - ganymede_vods:/vods
      - ganymede_logs:/logs
      - ganymede_data:/data
      - ganymede_tmp:/tmp
    # ports:
    #   - 4800:4000
    networks:
      - npm_shared_public
    dns:
      - 192.168.1.50
      - 1.1.1.1
      - 8.8.8.8
  ganymede-frontend:
    container_name: ganymede-frontend
    image: ghcr.io/zibbp/ganymede-frontend:latest
    restart: unless-stopped
    environment:
      - API_URL=$API_URL
      - CDN_URL=$CDN_URL
      - SHOW_SSO_LOGIN_BUTTON=$SHOW_SSO_LOGIN_BUTTON
      - FORCE_SSO_AUTH=$FORCE_SSO_AUTH
      - REQUIRE_LOGIN=$REQUIRE_LOGIN
    # ports:
    #   - 4801:3000
    networks:
      - npm_shared_public
    dns:
      - 192.168.1.50
      - 1.1.1.1
      - 8.8.8.8
  ganymede-temporal:
    image: temporalio/auto-setup:1
    container_name: ganymede-temporal
    depends_on:
      - ganymede-db
    environment:
      - DB=postgresql # this tells temporal to use postgres (not the db name)
      - DB_PORT=5432
      - POSTGRES_USER=$DB_USER
      - POSTGRES_PWD=$DB_PASS
      - POSTGRES_SEEDS=ganymede-db # name of the db service
    networks:
      - npm_shared_public
    ports:
      - 7233:7233
  ganymede-db:
    container_name: ganymede-db
    image: postgres:14
    restart: unless-stopped
    volumes:
      - ganymede_db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=$DB_PASS
      - POSTGRES_USER=$DB_USER
      - POSTGRES_DB=$DB_NAME
    ports:
      - 4803:5432
    networks:
      - npm_shared_public

  ganymede-nginx:
    container_name: ganymede-nginx
    image: nginx
    restart: unless-stopped
    volumes:
      - $GANYMEDE_GIT_PATH/nginx.conf:/etc/nginx/nginx.conf:ro
      - ganymede_vods:/mnt/vods
  #   ports:
  #     - 4802:8080
    networks:
      - npm_shared_public

volumes:
  git:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_GIT_PATH}
  ganymede_data:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_DATA_PATH}
  ganymede_logs:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_LOGS_PATH}
  ganymede_db:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_DB_PATH}
  ganymede_nginx:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_NGINX_PATH}
  ganymede_tmp:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: ${GANYMEDE_TMP_PATH}
  ganymede_vods:
    driver: local
    driver_opts:
      type: "cifs"
      o: "addr=${SHARE_HOST},username=${SHARE_USER},password=${SHARE_PASS},file_mode=0777,dir_mode=0777,iocharset=utf8"
      device: "//${SHARE_HOST}/ganymede_vods"

networks:
  npm_shared_public:
    external: true
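For context on the SMB destination, the ganymede_vods volume definition above amounts to roughly this manual CIFS mount (illustrative only; the mount point is arbitrary and the same variables from the compose file are assumed):

# hypothetical equivalent of the ganymede_vods CIFS volume
mount -t cifs "//${SHARE_HOST}/ganymede_vods" /mnt/vods \
  -o "username=${SHARE_USER},password=${SHARE_PASS},file_mode=0777,dir_mode=0777,iocharset=utf8"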

Majow commented Jul 1, 2024

> @Majow I see the worker is panicking when trying to stop the chat. I've fixed the panic in #458, but I'm not 100% sure it's going to resolve your issue. When you're not archiving anything, can you pull and use the :main image tag for the API? If this happens again while using the :main image tag, please post back with the logs.
>
> If you want to try to recover the 'stuck' streams before restarting the container, follow these steps: #450 (comment)

I believe it did fix my issues; no stream has been stuck since then. I'll update you if anything changes.

Thanks for solving this problem so fast!
