Prevent duplicate subtitle entries in db #23

Closed
NotJoeMartinez opened this issue May 28, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@NotJoeMartinez
Owner

NotJoeMartinez commented May 28, 2023

The way we currently parse VTT files inserts duplicate quote entries whose timestamps are off by a couple of seconds. This happens because the VTT files we get from yt-dlp contain each entry twice, and one of the copies carries a bunch of inline markup that segments the quote. See line 192. Removing these duplicates would probably speed things up.
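For illustration, here is a rough sketch of one way the duplicates could be dropped at parse time. It is not the project's actual code: `clean_text` and `dedupe_cues` are hypothetical names, and it assumes the cues have already been read into `(start_time, raw_text)` pairs and that a duplicate always directly follows the marked-up copy.

```python
import re

# Matches inline markup such as <c>, </c>, or <00:00:01.200>.
TAG_RE = re.compile(r"<[^>]*>")


def clean_text(raw: str) -> str:
    """Strip inline tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub("", raw).split())


def dedupe_cues(cues):
    """Keep only the first cue of each run of cues with identical cleaned text.

    `cues` is an iterable of (start_time, raw_text) pairs (assumed shape).
    """
    kept = []
    for start, raw in cues:
        text = clean_text(raw)
        if not text:
            continue  # cue contained only markup or whitespace
        if kept and kept[-1][1] == text:
            continue  # same quote as the previous cue, skip it
        kept.append((start, text))
    return kept


cues = [
    ("00:00:00.160", "hello<00:00:00.480><c> world</c>"),
    ("00:00:02.389", "hello world"),
    ("00:00:02.399", "next<00:00:02.700><c> line</c>"),
]
deduped = dedupe_cues(cues)
```

Comparing only against the previous kept cue means a quote that is genuinely repeated later in the video is still stored; only back-to-back copies are collapsed.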

NotJoeMartinez added the bug label May 28, 2023
@chapmanjacobd

I wonder if the TTML format has fewer duplicates. I read about that here: https://old.reddit.com/r/youtubedl/comments/yckryy/getting_a_clean_subtitle_form_auto_generated/

@NotJoeMartinez
Owner Author

Fixed with #27

Projects
Status: Done
2 participants