Code to generate Stage 3 dataset from ActivityNet or DiDeMo #32

Open
anilbatra2185 opened this issue Jun 24, 2024 · 0 comments


Hi @huangb23,

Thanks for sharing the code for this great work!

Could you please share the inference code used to generate the Stage 3 dataset from ActivityNet/DiDeMo? Specifically, I am looking for the inference configuration the model used to generate the responses. I tried the following; however, the responses differ from the shared Stage 3 data.

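from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the local Vicuna checkpoint
tokenizer = AutoTokenizer.from_pretrained("checkpoints/vicuna-7b-v1.5/")
model = AutoModelForCausalLM.from_pretrained("checkpoints/vicuna-7b-v1.5/")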
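
Differences like the ones below often come from the decoding settings rather than from the checkpoint. The following is a minimal sketch of how those settings could be made explicit with the Hugging Face `generate` API. The helper names `build_generation_kwargs` and `generate_reply`, the seed, and all numeric values are assumptions for illustration, not the configuration the authors used.

```python
def build_generation_kwargs(deterministic: bool = True, max_new_tokens: int = 512) -> dict:
    """Return keyword arguments for Hugging Face `model.generate`.

    All values here are illustrative assumptions, not the authors' settings.
    """
    if deterministic:
        # Greedy decoding: the same prompt and weights should yield the same tokens.
        return {"do_sample": False, "num_beams": 1, "max_new_tokens": max_new_tokens}
    # Sampling: output varies from run to run unless the random seed is fixed.
    return {
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "max_new_tokens": max_new_tokens,
    }


def generate_reply(prompt: str, checkpoint: str = "checkpoints/vicuna-7b-v1.5/", **overrides) -> str:
    """Generate a reply for `prompt`; `overrides` replace the default decoding settings."""
    # Imported lazily so the helper above can be used without the model stack installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    set_seed(0)  # Assumed seed; only matters when sampling is enabled.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt")
    kwargs = {**build_generation_kwargs(), **overrides}
    with torch.no_grad():
        output_ids = model.generate(**inputs, **kwargs)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

If the shared data was produced with sampling enabled, an exact match is not expected even with identical prompts, so knowing the original `do_sample`, `temperature`, and `top_p` values would help.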


Input Prompt (attached as a screenshot)

ActivityNet VideoID - v_QOlSCBRmfWY

Output in the shared Stage 3 JSON

User: What was the primary purpose of the video?
Assistant: The primary purpose of the video appears to be showcasing the young woman's dance routine, highlighting her movements and expression while executing them.
User: Can you pinpoint when the girl changes her pose or dance style?
Assistant: From to , the girl transitions from one pose or dance style to another as the camera captures her movements.
User: Is there any specific reason for the change in dance style or pose?
Assistant: Based on the video content, there is no evident reason for the change in dance style or pose. It could be a creative decision by the videographer or an expression of the dancer's artistic expression.

Generated Output

User: What is the primary action in the video?
Assistant: The primary action in the video is the young woman’s dance, which spans from to .
User: Can you pinpoint when she starts dancing?
Assistant: The dancing commences from to .
User: What is the woman’s expression during her dance?
Assistant: I’m unable to provide an answer as the video doesn’t contain any explicit expressions. However, the woman’s movements can be observed to gain insights into her emotions.
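
To quantify how far the generated dialogue is from the shared one, the two transcripts above can be compared turn by turn. This is a rough sketch using only the Python standard library; the function names `parse_turns` and `turn_similarity` are hypothetical, and character-level similarity is just one possible measure.

```python
from difflib import SequenceMatcher


def parse_turns(text: str) -> list:
    """Split a 'User: ... / Assistant: ...' transcript into (role, content) pairs."""
    turns = []
    for line in text.strip().splitlines():
        role, sep, content = line.partition(":")
        # Keep only lines that start with a known speaker label.
        if sep and role.strip() in ("User", "Assistant"):
            turns.append((role.strip(), content.strip()))
    return turns


def turn_similarity(reference: str, generated: str) -> list:
    """Return (role, ratio) per aligned turn; ratio is 1.0 for identical text."""
    pairs = zip(parse_turns(reference), parse_turns(generated))
    return [
        (ref_role, round(SequenceMatcher(None, ref_text, gen_text).ratio(), 2))
        for (ref_role, ref_text), (_, gen_text) in pairs
    ]
```

Calling `turn_similarity(shared_text, generated_text)` on the two conversations gives one score per turn, which makes it easier to see whether the questions, the answers, or both diverge.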
