
Complete noob questions - 1) model purpose? 2) pre-trained weights? 3) other languages? #10

Closed
taewookim opened this issue Apr 5, 2018 · 4 comments

@taewookim

Excuse my complete noob-ness

  1. Is the model trying to accurately determine whether the video (i.e., the shape of the lips) and the audio are synced?

  2. Are there any pre-trained weights I can download to run it?

  3. Assuming my Q1 is correct: has anyone tested whether this model can accurately detect audio/video synchronization for non-English languages?

@astorfi
Owner

astorfi commented Apr 7, 2018

@taewookim

  1. Yes, ideally the method should be able to do so (a rough sketch of the decision rule is below).
  2. No. Unfortunately, due to data privacy concerns, the trained weights have not been released. The dataset itself, however, is public: the BBC-Oxford 'Lip Reading in the Wild' (LRW) dataset.
  3. A similar model, without the 3D convolution operations and online pair selection, has been proposed and implemented under the title Out of time: automated lip sync in the wild. We compared our method with that work but did not take our evaluation that far.
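
For concreteness, a minimal sketch of that sync decision, assuming two trained embedding networks that map speech features and mouth-region frames into a shared space; `audio_net` and `visual_net` below are hypothetical stand-ins, not part of this repo's API:

```python
import numpy as np

def sync_score(audio_embedding, visual_embedding):
    """Distance between L2-normalized audio and visual embeddings.

    A smaller distance means the two streams are more likely in sync.
    """
    a = audio_embedding / np.linalg.norm(audio_embedding)
    v = visual_embedding / np.linalg.norm(visual_embedding)
    return float(np.linalg.norm(a - v))

# Hypothetical usage -- audio_net / visual_net stand in for the two trained
# CNN towers described in the paper and are NOT functions from this repo:
#   distance = sync_score(audio_net(speech_features), visual_net(mouth_frames))
#   is_synced = distance < THRESHOLD  # threshold tuned on a validation set
```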

@taewookim
Author

taewookim commented Apr 8, 2018

Thank you @astorfi.
Regarding Q3: have you ever run the model on videos where the speakers are speaking a non-English language? The model doesn't have to be super accurate, but I was wondering if it is 'good enough' to detect audio spoofing in videos of non-English speakers.

Suppose a spoofer was attempting to bypass a system that uses face and speech recognition. He would hold up a video containing the victim's face and voice, recorded on, say, an iPad. He would hide from the detection camera (to defeat facial recognition) and would use his own voice, not the voice from the iPad (to defeat the speech recognition system).

A simple solution might be to look at the time offsets of the spoken words and compare them with the time offsets of the lip movements. Of course, this isn't perfect, but it's at least somewhere to start. Any idea what part of your code I could modify to detect this?
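
One way to prototype that offset check, sketched below assuming you can already extract a per-video-frame audio-energy signal and a mouth-openness signal of the same length (e.g. from lip landmarks); none of the names below come from this codebase:

```python
import numpy as np

def estimate_av_lag(audio_energy, mouth_openness):
    """Estimate the lag (in video frames) between speech and lip motion.

    audio_energy   : 1-D array, per-frame audio energy (assumed precomputed
                     from the waveform)
    mouth_openness : 1-D array, per-frame mouth opening, e.g. the distance
                     between upper- and lower-lip landmarks (assumed input)

    Returns the lag at the cross-correlation peak and the normalized peak
    value. A lag far from zero, or a weak peak, is a crude hint that the
    voice does not belong to the face in the video.
    """
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-8)
    m = (mouth_openness - mouth_openness.mean()) / (mouth_openness.std() + 1e-8)
    xcorr = np.correlate(a, m, mode="full")         # full cross-correlation
    lag = int(np.argmax(xcorr)) - (len(m) - 1)      # 0 = aligned; sign = who leads
    peak = float(xcorr.max()) / len(a)              # normalized peak correlation
    return lag, peak
```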

@astorfi
Owner

astorfi commented Apr 8, 2018

No, I personally did not run it on a non-English dataset, but the paper I mentioned (Out of time: automated lip sync in the wild) did. As for the question you are asking, unfortunately I am not an expert.

@taewookim
Author

Thank you.
