S2 pretrained model of InternVideo2 does not work well for Zero-Shot Video-Text Retrieval #107
Comments
I ran into a similar problem as well. |
The provided example is an extremely hard case. The sentences were generated by GPT from key elements extracted from the video (e.g., dog, human, snow, playing) to describe its content, so many of them are very close in meaning. We intend to use this case study to show our interest in a deeper understanding of subtle distinctions in motion descriptions. For ordinary video retrieval, we encourage you to evaluate our model on mainstream benchmarks, where it ranks among the top performers, or to try some more typical examples to get a feel for how it performs. |
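Adding the code below right after the imports guarantees identical results when the notebook is run straight through. However, there still seems to be some randomization inside the model: if the weights are loaded again with the same seed, the results change. np.random.seed(seed) |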
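The comment above seeds only NumPy. A minimal sketch of seeding the other common random number generators too, assuming the demo may also draw from Python's `random` module and from PyTorch; the helper name `seed_everything` is hypothetical and not part of the InternVideo2 repository:

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # the call quoted in the comment above
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # all GPU generators; harmless without CUDA
    except ImportError:
        pass  # PyTorch not installed: nothing to seed


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Seeding only makes the sampled random numbers repeatable. It does not remove nondeterminism that comes from the order of floating-point operations on the GPU.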
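@SHYuanBest @1093842024 @Wenju-Huang The earlier demo had a problem loading the weights: the pretrained weights were not actually loaded correctly. This has now been fixed. There is no randomization inside the model; the slight differences between runs are probably due to numerical error in PyTorch. |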
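My results above were obtained after fixing the model-loading problem; they just were not normalized with softmax. After adding softmax, my results still differ from yours. Are you testing this model: https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/blob/main/InternVideo2-stage2_1b-224p-f4.pt ? |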
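Yes. Is it the ranking of the scores that differs? My understanding is that as long as the ranking is the same, small differences in the scores are acceptable. We actually noticed this issue quite early on. The code may implicitly include some parallel-acceleration modules or similar, which cause some random fluctuation in each computation, but this does not affect the final ranking. |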
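The point about rankings can be checked without the model. Softmax is monotonic, so normalizing the raw similarity scores never changes their order, and two runs can be compared by ranking rather than by exact value. A minimal sketch; all score values below are made up for illustration and are not outputs of InternVideo2:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(scores):
    """Indices of the candidates, best score first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical raw similarity scores for five text candidates.
run_a = [0.412, 0.398, 0.405, 0.421, 0.390]
# The same candidates with small run-to-run fluctuations (also hypothetical).
run_b = [0.4121, 0.3979, 0.4052, 0.4208, 0.3901]

print(ranking(run_a) == ranking(softmax(run_a)))  # softmax keeps the order
print(ranking(run_a) == ranking(run_b))           # the two runs agree on ranking
```

If the demo multiplies the scores by a positive temperature before softmax, the order is still preserved; only the spread of the probabilities changes.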
Yes, the results depend heavily on the random seed. |
Could you release the 6B model, even though it may not improve the results much? |
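If you try several different random seeds after the model has been loaded, do the results still change? If they do, it really is numerical error; if not, the cause may be that the weights were not loaded correctly, leaving some of them randomly initialized. |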
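One way to test the "some weights are randomly initialized" hypothesis directly is to compare the parameter names the model expects with the names stored in the checkpoint. A minimal sketch on plain lists of names; the helper `compare_keys` and the example key names are hypothetical. With PyTorch, the two inputs would typically be `model.state_dict().keys()` and the keys of the loaded checkpoint dictionary, and `load_state_dict(..., strict=False)` reports the same two lists as `missing_keys` and `unexpected_keys`:

```python
def compare_keys(model_keys, checkpoint_keys):
    """Report parameters the checkpoint does not cover, and vice versa."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint:
        # these keep their random initialization.
        "missing": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unused by the model.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical parameter names, for illustration only.
model_keys = ["vision.proj.weight", "text.proj.weight", "temp"]
checkpoint_keys = ["module.vision.proj.weight", "text.proj.weight", "temp"]

report = compare_keys(model_keys, checkpoint_keys)
print(report["missing"])
print(report["unexpected"])
```

A non-empty `missing` list means part of the model was never overwritten by pretrained weights, which would make the scores depend on the seed used at construction time.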
@shepnerd @Wenju-Huang @1093842024 @SHYuanBest I found that this bug occurs because we forgot to add |
I changed text_candidates to a list of the 700 Kinetics class names, but nothing like "playing with dog" or "playing in snow" is picked for example.mp4. Could you help me? Thanks :P |
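Running demo/demo.ipynb directly with the model https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/blob/main/InternVideo2-stage2_1b-224p-f4.pt gives unsatisfactory results.
First, two changes are needed for the model to load correctly:
1. In demo/demo.ipynb, add config['pretrained_path'] = model_pth before setup_internvideo2(config).
2. In demo/utils.py, change lines 82 and 84 to is_pretrain=True.
After these changes, the similarity scores (without softmax) between the demo video and the 10 sentences are:
The highest-scoring sentence is not the correct description, and all ten sentences receive very similar scores.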