
Combines computer vision and natural language processing to answer text-based questions about images.


Gunnika/Visual-Question-Answering


Visual-Question-Answering

Working:

Main file: VQA_model.ipynb

Requires the annotations file from: https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip

and

Bert.pkl from: https://drive.google.com/file/d/1rStLsUbgAC1uU7Mai5X_-c0qJPxnJk63/view?usp=sharing

Both of these files need to be placed in the Files folder.
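Before opening the notebook, it can help to confirm that both downloads ended up in the Files folder. The following is only a minimal sketch: the helper `missing_files` is hypothetical, the zip name is taken from the download URL above, and whether the notebook reads the zip itself or an extracted JSON file is an assumption to verify against VQA_model.ipynb.

```python
from pathlib import Path

# Assumed file names: "Bert.pkl" is named in this README; the zip name comes
# from the download URL. Adjust if the notebook expects the extracted JSON.
REQUIRED_FILES = ("v2_Annotations_Train_mscoco.zip", "Bert.pkl")


def missing_files(folder, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("Files")
    if missing:
        print("Missing from Files/:", ", ".join(missing))
    else:
        print("All required files found.")
```

Run it from the repository root, so that the relative path `Files` resolves to the folder mentioned above.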

Dataset we are using:

VQA v2.0

(https://github.com/GT-Vision-Lab/VQA/blob/master/README.md) It consists of:

  • Real

    • 82,783 MS COCO training images, 40,504 MS COCO validation images and 81,434 MS COCO testing images (images are obtained from the [MS COCO website](http://mscoco.org/dataset/#download))
    • 443,757 questions for training, 214,354 questions for validation and 447,793 questions for testing
    • 4,437,570 answers for training and 2,143,540 answers for validation (10 per question)
  • There is only one type of task: the open-ended task
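The answer counts listed above follow from the question counts at 10 answers per question. The sketch below checks that arithmetic and shows one way to tally the answers attached to a single question. The record layout (an `answers` list of dicts with an `answer` key) is an assumption based on the public VQA annotation format, and `answer_counts` is a hypothetical helper, not code from VQA_model.ipynb.

```python
from collections import Counter

# Counts copied from the dataset description above.
QUESTIONS = {"train": 443_757, "val": 214_354}
ANSWERS = {"train": 4_437_570, "val": 2_143_540}
ANSWERS_PER_QUESTION = 10


def answers_match(split):
    """Check that the answer count equals 10 answers per question."""
    return QUESTIONS[split] * ANSWERS_PER_QUESTION == ANSWERS[split]


def answer_counts(annotation):
    """Tally the answer strings of one annotation record.

    Assumed layout: {"answers": [{"answer": "yes"}, ...]}.
    """
    return Counter(entry["answer"] for entry in annotation["answers"])


if __name__ == "__main__":
    for split in QUESTIONS:
        print(split, "consistent:", answers_match(split))

    # Hand-written record for illustration only, not real dataset content.
    sample = {"answers": [{"answer": "yes"}, {"answer": "no"}, {"answer": "yes"}]}
    print(answer_counts(sample))
```

A tally like this is a common first step when turning the 10 human answers into training targets, though how the notebook actually encodes answers should be checked in its own code.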

Research paper we are referring to:

https://arxiv.org/abs/1704.03162

PDF : https://arxiv.org/pdf/1704.03162.pdf

Base Code:

https://github.com/iamaaditya/VQA_Demo

https://github.com/iamaaditya/VQA_Keras

Other Datasets:

https://visualqa.org/download.html

https://arxiv.org/abs/1905.13648

https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/

https://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

https://paperswithcode.com/task/visual-question-answering

https://vqa.cloudcv.org/

Useful Resources:

https://keras.io/getting-started/functional-api-guide/

https://github.com/Cyanogenoid/pytorch-vqa

https://github.com/nithinraok/VisualQuestion_VQA

Improvements over this code:

https://github.com/Cyanogenoid/vqa-counting/tree/master/vqa-v2

https://github.com/KaihuaTang/VQA2.0-Recent-Approachs-2018.pytorch
