Add m3exam #93

Open: wants to merge 3 commits into base: main

Jiawei-Guo
Add m3exam

# (lang, method, setting, model, test_question, dev_question):
lang = doc["language"]
method = "default"
setting = "zero-shot"
Contributor

It looks like the setting is hard-coded to zero-shot here, yet the code has many if-else statements that check whether the setting is zero-shot. Do you plan to support more settings in the future?
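One way to make those branches reachable is to pass the setting in as an argument instead of assigning a constant. The following is only a minimal sketch under stated assumptions: `build_prompt`, `SUPPORTED_SETTINGS`, the `"few-shot"` value, and the `question`/`answer` fields are hypothetical names for illustration; only `doc["language"]` and `"zero-shot"` come from the snippet under review.

```python
from typing import Optional, Sequence

# Hypothetical set of settings; only "zero-shot" appears in the PR snippet.
SUPPORTED_SETTINGS = ("zero-shot", "few-shot")


def build_prompt(
    doc: dict,
    setting: str = "zero-shot",
    dev_docs: Optional[Sequence[dict]] = None,
) -> str:
    """Build a prompt for one document; `setting` is an argument, not a constant."""
    if setting not in SUPPORTED_SETTINGS:
        raise ValueError(f"unsupported setting: {setting!r}")

    lang = doc["language"]  # field name taken from the snippet under review
    parts = [f"[{lang}]"]

    if setting == "few-shot":
        # Assumed behaviour: prepend solved dev examples before the test question.
        for example in dev_docs or []:
            parts.append(f"{example['question']}\nAnswer: {example['answer']}")

    # The test question always comes last, with the answer left open.
    parts.append(f"{doc['question']}\nAnswer:")
    return "\n\n".join(parts)
```

With the default argument the zero-shot behaviour is unchanged, and the few-shot branch only runs when a caller asks for it. If no other setting is planned, the simpler fix is to delete the branches.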

Contributor

kcz358 left a comment

Thank you for your contribution!

I think there are too many duplicated utils files. Is it possible to move the different mmmu tasks into the same folder and use a single utils? Or at least remove the repeated code. There is no need to create separate folders for mmmu_English, mmmu_French, etc. Can you try to put them under the same folder, and maybe add a yaml file that allows users to run all the tasks as a group?
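The suggestion above could look roughly like the sketch below: one shared helper that takes the language as a parameter, and one loop that derives a task entry per language under a common group name. Everything here is an assumption for illustration. The `LANGUAGES` list, the `m3exam` group name, the `question` field, and the dictionary keys are hypothetical and are not claimed to match the framework's actual config schema.

```python
from functools import partial

# Hypothetical language list; extend it instead of copying a folder per language.
LANGUAGES = ["english", "french"]


def doc_to_text(doc: dict, lang: str) -> str:
    """Single shared implementation; the language is a parameter, not a file copy."""
    return f"[{lang}] {doc['question']}"


def make_task_configs(languages, group="m3exam"):
    """Return one config dict per language, all tagged with the same group name."""
    return [
        {
            "task": f"{group}_{lang}",
            "group": group,
            # Bind the language once so each task gets its own callable.
            "doc_to_text": partial(doc_to_text, lang=lang),
        }
        for lang in languages
    ]


configs = make_task_configs(LANGUAGES)
```

Because every entry shares the same group value, a runner that selects tasks by group could pick up all languages at once, while the prompt logic lives in a single place.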

Luodian added a commit that referenced this pull request Jun 12, 2024
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5)

* Update author name and email in pyproject.toml

* add mmvet and try to modify llava arch

* Add coco, refcoco support

* Fix doc_to_visual error

* Fix segmentation mask error

* Add refcoco+, refcocog

* Remove debug code

* black lint

* Remove unused code and scripts

* Fix group stderr N/A error between str and int

* Fix letter case issue

* Update lmms_eval tasks and utils

* Fix coco test_split name

* Add llava-bench-in-the-wild support

* Black codestyle, lint

* Add COCO evaluation metric

* Add refcoco, refcocog, refcoco+ evaluation kit

* Add llava bench coco support

---------

Co-authored-by: Bo Li <drluodian@gmail.com>

* VQAv2 eval (#4)

* vqav2

* Add vqav2_process_results function and update vqav2_doc_to_text function

* Implement vqav2_process_results function to return exact match score

* Refactor fewshot_docs() to use config.fewshot_config

* Refactor Task class to handle fewshot_docs when training and validation docs are not available

* Add answer processing logic in vqav2_process_results function

* Refactor vqav2_process_results function and add submission aggregation

* Add vqav2_aggreate_submissions function to utils.py

* textvqa

* Refactor answer processing in textvqa_process_results() function

* textvqa eval

* Update dataset path and modify textvqa_doc_to_text function

* Capitalize the question in textvqa_doc_to_text function

* Update textvqa.yaml and utils.py

* Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py

---------

Co-authored-by: Li Bo <drluodian@gmail.com>

* [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. (#7)

* Update author name and email in pyproject.toml

* add mmvet and try to modify llava arch

* black lint

* Remove unused code and scripts

* Update lmms_eval tasks and utils

* Update LMMS-Eval dependencies and configurations

* Squashed commit of the following:

commit 209f3904f33210bec0b4b146e96fcbd67a4e1541
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Wed Jan 17 20:27:13 2024 +0800

    Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5)

    * Update author name and email in pyproject.toml

    * add mmvet and try to modify llava arch

    * Add coco, refcoco support

    * Fix doc_to_visual error

    * Fix segmentation mask error

    * Add refcoco+, refcocog

    * Remove debug code

    * black lint

    * Remove unused code and scripts

    * Fix group stderr N/A error between str and int

    * Fix letter case issue

    * Update lmms_eval tasks and utils

    * Fix coco test_split name

    * Add llava-bench-in-the-wild support

    * Black codestyle, lint

    * Add COCO evaluation metric

    * Add refcoco, refcocog, refcoco+ evaluation kit

    * Add llava bench coco support

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>

commit f9e48cec5493010a363b446b81a335ef1484e42f
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Wed Jan 17 20:26:58 2024 +0800

    Update utils.py (#6)

* Fix logging issue and remove unnecessary whitespace

* Add openai and pycocoevalcap dependencies

* Fix device mapping issue in Llava constructor

* Add support for truncating context in generation

* Update Llava model and evaluation configuration

* Update YAML configuration files

* Update YAML configuration files

* add otterhd and gemini models

* Add support for custom image aspect ratio in Llava model

* Add dataset_kwargs and max_gen_toks to YAML files

* Fix log_samples suffix typo and use hash for output name

* Refactor LMMS evaluation code and update LLAVA model properties

* matched response for mistral-llava

* Refactor logging in llava_aggregation function

* Print evaluation statistics instead of logging them

* Fix logging information in llava_aggregation function

* Add new models and dataset_kwargs for COCO tasks

* Update truncate_context parameter in Llava class constructor

* Update dataset_kwargs in YAML files

* Remove issue type tags from issue and pull request templates

* add mmvet and try to modify llava arch

* black lint

* Update lmms_eval tasks and utils

* Update LMMS-Eval dependencies and configurations

* Squashed commit of the following:

commit 209f3904f33210bec0b4b146e96fcbd67a4e1541
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Wed Jan 17 20:27:13 2024 +0800

    Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5)

    * Update author name and email in pyproject.toml

    * add mmvet and try to modify llava arch

    * Add coco, refcoco support

    * Fix doc_to_visual error

    * Fix segmentation mask error

    * Add refcoco+, refcocog

    * Remove debug code

    * black lint

    * Remove unused code and scripts

    * Fix group stderr N/A error between str and int

    * Fix letter case issue

    * Update lmms_eval tasks and utils

    * Fix coco test_split name

    * Add llava-bench-in-the-wild support

    * Black codestyle, lint

    * Add COCO evaluation metric

    * Add refcoco, refcocog, refcoco+ evaluation kit

    * Add llava bench coco support

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>

commit f9e48cec5493010a363b446b81a335ef1484e42f
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Wed Jan 17 20:26:58 2024 +0800

    Update utils.py (#6)

* Fix logging issue and remove unnecessary whitespace

* Add openai and pycocoevalcap dependencies

* Fix device mapping issue in Llava constructor

* Add support for truncating context in generation

* Update Llava model and evaluation configuration

* Update YAML configuration files

* Update YAML configuration files

* add otterhd and gemini models

* Add support for custom image aspect ratio in Llava model

* Add dataset_kwargs and max_gen_toks to YAML files

* Fix log_samples suffix typo and use hash for output name

* Refactor LMMS evaluation code and update LLAVA model properties

* matched response for mistral-llava

* Refactor logging in llava_aggregation function

* Print evaluation statistics instead of logging them

* Fix logging information in llava_aggregation function

* Add new models and dataset_kwargs for COCO tasks

* Update truncate_context parameter in Llava class constructor

* Update dataset_kwargs in YAML files

* Remove issue type tags from issue and pull request templates

* Refactor pope utils functions

* Update transformers dependency to version 4.36.2

* Revise llava-in-the-wild prompt for align

* Add default values for gen_kwargs in Llava class

* Fix formatting issues and import pdb for debugging

* Remove pdb.set_trace() and update default value for max_new_tokens

* Add llava loglikelihood

* Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py

* Update function to handle edge cases

This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code.

* Update black version in pre-commit config

* Remove duplicate lines in gqa

* Another way to solve memory issue

* Handle exception in model generation

* Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy"

* Update pope metrics aggregation functions

* Add model_to_prompt in pope.yaml

* Update pope.yaml configuration

* Refactor code to simplify construct_requests call

---------

Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>

* Add datetime to output name in cli_evaluate function

Add get_datetime_str function to utils.py

* Refactor pope_aggregate_f1_score function

* Fix datetime format in get_datetime_str function

* Update JSON dump indentation in cli_evaluate function

* Add datetime to output name in cli_evaluate function (#10)

* Revert "Add datetime to output name in cli_evaluate function"

This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3.

* Add datetime to output name in cli_evaluate function

* [Datasets] Added POPE and Aligned. (#11)

* Update generation_kwargs in pope.yaml

* Update pope_doc_to_text function

* [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12)

* Change coco from print to logger

* Add llava loglikelihood

* Add Nocaps support

* Fix pass through function

* Add textcaps support

* Fix textcaps eval image_id

* Add seedbench support

* Add seedbench ppl evaluation

* black lint

* [Datasets] Add four internal evaluation datasets (#13)

* Update generation_kwargs in pope.yaml

* Update pope_doc_to_text function

* Remove unused variable in mmvet_process_results function

* Remove unused imports in utils.py

* Refactor get_chat_response function to include retries for API requests

* Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

* Update prompt variable in lmms_eval tasks

* Refactor output_name variable in cli_evaluate function

* Fix logging message in mmvet_process_results function

* Update sleep time in get_chat_response function

* Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

* Refactor get_eval function to include retries

* Add token parameter to load_dataset function in gqa_doc_to_visual

* Refactor llava_process_results and llava_aggregation functions

* [Datasets] Add four internal evaluation datasets (#13)

* Update generation_kwargs in pope.yaml

* Update pope_doc_to_text function

* Remove unused variable in mmvet_process_results function

* Remove unused imports in utils.py

* Refactor get_chat_response function to include retries for API requests

* Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

* Update prompt variable in lmms_eval tasks

* Refactor output_name variable in cli_evaluate function

* Fix logging message in mmvet_process_results function

* Update sleep time in get_chat_response function

* Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

* Refactor get_eval function to include retries

* Add token parameter to load_dataset function in gqa_doc_to_visual

* Refactor llava_process_results and llava_aggregation functions

* add mmmu (#15)

* add mmme

* black

* add mmmu (#15)

* add mmme

* black

* [Memory issue] Solve memory issue for building context (#14)

* Update generation_kwargs in pope.yaml

* Update pope_doc_to_text function

* Remove unused variable in mmvet_process_results function

* Remove unused imports in utils.py

* Refactor get_chat_response function to include retries for API requests

* Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

* Update prompt variable in lmms_eval tasks

* Refactor output_name variable in cli_evaluate function

* Fix logging message in mmvet_process_results function

* Update sleep time in get_chat_response function

* Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

* Refactor get_eval function to include retries

* Add token parameter to load_dataset function in gqa_doc_to_visual

* Refactor llava_process_results and llava_aggregation functions

* Remove unused function llava_aggregation

* Refractor llava-bench aggregation code

* Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml

* Update generation parameters in scienceqa.yaml

* Solve memory issue for building context

* Solved gather result error

* Update lmms_eval scienceqa_img config

* Fixed nocaps store results

* Revise seedbench prompt

* Squashed commit of the following:

commit 290126e6a269db4cca9b3544bd017d6c17012793
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Wed Jan 24 14:07:36 2024 +0800

    add mmmu (#15)

    * add mmme

    * black

commit 8b0227cd7b2602d096d773a01b2199d1f4110f22
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 10:00:33 2024 +0800

    [Datasets] Add four internal evaluation datasets (#13)

    * Update generation_kwargs in pope.yaml

    * Update pope_doc_to_text function

    * Remove unused variable in mmvet_process_results function

    * Remove unused imports in utils.py

    * Refactor get_chat_response function to include retries for API requests

    * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

    * Update prompt variable in lmms_eval tasks

    * Refactor output_name variable in cli_evaluate function

    * Fix logging message in mmvet_process_results function

    * Update sleep time in get_chat_response function

    * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

    * Refactor get_eval function to include retries

    * Add token parameter to load_dataset function in gqa_doc_to_visual

    * Refactor llava_process_results and llava_aggregation functions

commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Tue Jan 23 19:17:40 2024 +0800

    [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12)

    * Change coco from print to logger

    * Add llava loglikelihood

    * Add Nocaps support

    * Fix pass through function

    * Add textcaps support

    * Fix textcaps eval image_id

    * Add seedbench support

    * Add seedbench ppl evaluation

    * black lint

commit 4c3c2c63a681f29c537c2467957de1a90568748d
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Jan 23 19:17:12 2024 +0800

    [Datasets] Added POPE and Aligned. (#11)

    * Update generation_kwargs in pope.yaml

    * Update pope_doc_to_text function

---------

Co-authored-by: Bo Li <drluodian@gmail.com>

* [Memory issue] Solve memory issue for building context (#14)

* Update generation_kwargs in pope.yaml

* Update pope_doc_to_text function

* Remove unused variable in mmvet_process_results function

* Remove unused imports in utils.py

* Refactor get_chat_response function to include retries for API requests

* Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

* Update prompt variable in lmms_eval tasks

* Refactor output_name variable in cli_evaluate function

* Fix logging message in mmvet_process_results function

* Update sleep time in get_chat_response function

* Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

* Refactor get_eval function to include retries

* Add token parameter to load_dataset function in gqa_doc_to_visual

* Refactor llava_process_results and llava_aggregation functions

* Remove unused function llava_aggregation

* Refractor llava-bench aggregation code

* Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml

* Update generation parameters in scienceqa.yaml

* Solve memory issue for building context

* Solved gather result error

* Update lmms_eval scienceqa_img config

* Fixed nocaps store results

* Revise seedbench prompt

* Squashed commit of the following:

commit c3cc24a89415aeccad31ccbb10642af677cd6fe5
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Wed Jan 24 14:07:36 2024 +0800

    add mmmu (#15)

    * add mmme

    * black

commit 0dbc5d16c4f45ebea8def5f0bc1a36fcd93f9a05
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 10:00:33 2024 +0800

    [Datasets] Add four internal evaluation datasets (#13)

    * Update generation_kwargs in pope.yaml

    * Update pope_doc_to_text function

    * Remove unused variable in mmvet_process_results function

    * Remove unused imports in utils.py

    * Refactor get_chat_response function to include retries for API requests

    * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function

    * Update prompt variable in lmms_eval tasks

    * Refactor output_name variable in cli_evaluate function

    * Fix logging message in mmvet_process_results function

    * Update sleep time in get_chat_response function

    * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f'

    * Refactor get_eval function to include retries

    * Add token parameter to load_dataset function in gqa_doc_to_visual

    * Refactor llava_process_results and llava_aggregation functions

commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Tue Jan 23 19:17:40 2024 +0800

    [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12)

    * Change coco from print to logger

    * Add llava loglikelihood

    * Add Nocaps support

    * Fix pass through function

    * Add textcaps support

    * Fix textcaps eval image_id

    * Add seedbench support

    * Add seedbench ppl evaluation

    * black lint

commit 4c3c2c63a681f29c537c2467957de1a90568748d
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Jan 23 19:17:12 2024 +0800

    [Datasets] Added POPE and Aligned. (#11)

    * Update generation_kwargs in pope.yaml

    * Update pope_doc_to_text function

---------

Co-authored-by: Bo Li <drluodian@gmail.com>

* Add output path file naming convention (#16)

Update datetime format in get_datetime_str() function

* Add output path file naming convention (#16)

Update datetime format in get_datetime_str() function

* [Datasets] modify NoCaps data path and prompts (#17)

* Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

* Update dataset paths and improve user prompts

* [Datasets] modify NoCaps data path and prompts (#17)

* Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

* Update dataset paths and improve user prompts

* [Dataset] Add flickr30k (#18)

* Add flickr30k support

* Black lint

* Align prompt with NoCaps

* [Dataset] Add flickr30k (#18)

* Add flickr30k support

* Black lint

* Align prompt with NoCaps

* add model specific prompt and gen kwargs in sqa (#19)

* add mmme

* black

* add model specific prompt and gen kwargs

* black

* add yaml config to supprot multi-model eval

* print table at the end

* refactor multi model code

* add model specific prompt and gen kwargs in sqa (#19)

* add mmme

* black

* add model specific prompt and gen kwargs

* black

* add yaml config to supprot multi-model eval

* print table at the end

* refactor multi model code

* Dev/add chartqa and ai2d (#23)

* add mmme

* black

* add model specific prompt and gen kwargs

* black

* add yaml config to supprot multi-model eval

* print table at the end

* refactor multi model code

* add chartqa

* black

* add ai2d

* black

* update chartqa

* blacl

* update ai2d dataset

* black

* Add 'submissions/' directory to .gitignore

* Add Python setup and Black version installation workflow
Refactor ContextSampler class in samplers.py
Remove unnecessary line in DecontaminationFilter class
Update dependencies in pyproject.toml

* Refactor code in ContextSampler class

---------

Co-authored-by: Bo Li <drluodian@gmail.com>

* Dev/add chartqa and ai2d (#23)

* add mmme

* black

* add model specific prompt and gen kwargs

* black

* add yaml config to supprot multi-model eval

* print table at the end

* refactor multi model code

* add chartqa

* black

* add ai2d

* black

* update chartqa

* blacl

* update ai2d dataset

* black

* Add 'submissions/' directory to .gitignore

* Add Python setup and Black version installation workflow
Refactor ContextSampler class in samplers.py
Remove unnecessary line in DecontaminationFilter class
Update dependencies in pyproject.toml

* Refactor code in ContextSampler class

---------

Co-authored-by: Bo Li <drluodian@gmail.com>

* [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20)

* Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

* Update dataset paths and improve user prompts

* Add submission folder and update file paths for storing prediction results

* Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e'

* Update dataset_path in flickr30k.yaml

* Add coco_val and coco_test tasks to coco.yaml

* Squashed commit of the following:

commit 542a34dc5721ecdff6c5c68b0568692ad3a17149
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to supprot multi-model eval

commit 2626383d99b5eac59d531ca0f293df960570c524
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 8349935fe145e33af0007ad4fb0d71fd925be7a0
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit d4e8e2552d40752bfdc5bbf4cd962c1798096258
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit 4bf0504fabc3b62f356c467b2fd1119083d27313
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

    * Update dataset paths and improve user prompts

commit 520c7a2cafe60810aca79df814ce6829d4576032
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mmme

* Squashed commit of the following:

commit 542a34dc5721ecdff6c5c68b0568692ad3a17149
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to supprot multi-model eval

commit 2626383d99b5eac59d531ca0f293df960570c524
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 8349935fe145e33af0007ad4fb0d71fd925be7a0
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit d4e8e2552d40752bfdc5bbf4cd962c1798096258
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit 520c7a2cafe60810aca79df814ce6829d4576032
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mmme

* Squashed commit of the following:

commit b13a805623dfd9d826ddd440e1b5ecde773fbb12
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to supprot multi-model eval

    * print table at the end

    * refactor multi model code

* Fix cli itself can not run with config file

* Fix bug in login functionality

Refactor code for better performance

Add new feature for user authentication

Update UI layout for improved user experience

Fix typo in variable name

Optimize database queries for faster response time

Add error handling for edge cases

Update dependencies to latest versions

Remove unused code

Improve code readability and maintainability

* Refactor get_task_dict function to handle nested groups

* Add submission file for coco, flickr30k, nocaps, and textcaps tasks

* Remove unused files and update task configuration

* Fix tasks issue for nocaps, refcoco/+/g

* Fix file path and raise error if config file does not exist

* Exclude train in refcoco/+/g config

* Solve doc_iterator_for_counting crashing issue

* Black lint

* Refactor code to improve performance and readability

* Squashed commit of the following:

commit a2cc9303dc72e4d53983bb56e54a32e977c3e270
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:03:57 2024 +0800

    change okvqa yaml

commit 35e87e7c7a480d005abf607c2527a35457d92311
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:55:40 2024 +0800

    change yaml

commit 89755323596b85208ed33aa88c296604a39af6eb
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:42:43 2024 +0800

    add okvqa task

commit b13a805623dfd9d826ddd440e1b5ecde773fbb12
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to supprot multi-model eval

    * print table at the end

    * refactor multi model code

* Squashed commit of the following:

commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:06:02 2024 +0800

    change ocr reference

commit e273f9cbd91540df86bdbc652bff88a847bd0d2d
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:05:46 2024 +0800

    revert example_eval

commit e84126aaaf8a07bd371a0571a914ccbcd3697f20
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:17:28 2024 +0800

    edit vizwiz utils

commit 110deab53dc1a2fd349b1872cd261b69074c5fa8
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:49:47 2024 +0800

    reorganize __init__

commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:46:20 2024 +0800

    minor fixes

commit 2aaca579120def99860f90054233f3358950fa66
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 17:41:03 2024 +0800

    add vizwizvqa eval rask

commit b13a805623dfd9d826ddd440e1b5ecde773fbb12
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to supprot multi-model eval

    * print table at the end

    * refactor multi model code

* Refactor mathvista.yaml and utils.py

* Add gpt_eval_score to mathvista_process_results

* Refactor mathvista_aggregate_results to return average accuracy score

* Fix refcoco evaluation error

* Fix evaluation problem for refcoco+/g

* Refactor mathvista.yaml and mathvista_evals.py

* Add dependencies and update YAML files

* Refactor mmbench_en/utils.py to save test results to separate Excel file

* Fix caption task prompt

* Add group field to mmbench_en_test and mmbench_en_val yaml files

* Delete mmbench_en_val.yaml file

* Update mmbench_cn.yaml and mmbench_cn_test.yaml

* Update mmbench_cn_val.yaml and utils.py

* Remove unused fields in mmbench_cn_cc_process_results function

* Update aggregation function for mmbench_en_dev.yaml

* Fix capitalization of L2-category key in utils.py

* Fix variable name in mmbench_process_results function

* Delete mmbench_cn_val.yaml file

* Update mathvista_test.yaml and mathvista_testmini.yaml

* Fix warnings and update mathvista.yaml

* Remove system message from MathVistaEvaluator

* Update GPT model version in MathVistaEvaluator constructor

* Update GQA_RAW_IMAGE_DATASET path in utils.py

* change vizwiz to test set

* Add split flag to mathvista_aggregate_results function

* Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files

* Add download configuration for dataset

* Update GQA_RAW_IMAGE_DATASET path in utils.py

* add datasets

* Update gpt_eval_model_name in mathvista.yaml

* Merge commit '817eb057bcb61226b33d3ac3c8def01c36c90f96'

* Update pyproject.toml with dependencies and URLs

* Squashed commit of the following:

commit f253968ad703f682a29317bdd51ec6c1fd7c5465
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Sat Jan 27 13:56:37 2024 +0800

    Dev/add chartqa and ai2d (#23)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to supprot multi-model eval

    * print table at the end

    * refactor multi model code

    * add chartqa

    * black

    * add ai2d

    * black

    * update chartqa

    * blacl

    * update ai2d dataset

    * black

    * Add 'submissions/' directory to .gitignore

    * Add Python setup and Black version installation workflow
    Refactor ContextSampler class in samplers.py
    Remove unnecessary line in DecontaminationFilter class
    Update dependencies in pyproject.toml

    * Refactor code in ContextSampler class

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>

* Refactor image processing and submission file path

* Refactor directory creation logic in cli_evaluate_single function

* Update dataset path and test split in vqav2.yaml

* Remove "total" column from cap_details_columns DataFrame

* Add retry logic for dataset download

* Add 'tenacity' to dependencies in pyproject.toml

* Refactor code in ContextSampler class

* Update Black version and configuration, and improve code readability in ContextSampler

* Update Black version and line length

---------

Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>

* [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20)

* Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

* Update dataset paths and improve user prompts

* Add submission folder and update file paths for storing prediction results

* Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384'

* Update dataset_path in flickr30k.yaml

* Add coco_val and coco_test tasks to coco.yaml

* Squashed commit of the following:

commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit 63739fc6fa0a462d807ae81de0db0173102de584
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to support multi-model eval

commit edcc752f97ea3845cefad56624e5d2855066f680
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit ecb47d73d6e000b472be6c5c0cdc9413c7734384
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit dc23f4b42b1dd60b41904d7ddbee1412d6851077
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

    * Update dataset paths and improve user prompts

commit 5f55126484a7c9325db586d26cf2052538222804
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit aa6f8853cf82384fb3b15306fec4769212fbc5ab
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mmme

* Squashed commit of the following:

commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit 63739fc6fa0a462d807ae81de0db0173102de584
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to support multi-model eval

commit edcc752f97ea3845cefad56624e5d2855066f680
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit 5f55126484a7c9325db586d26cf2052538222804
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit aa6f8853cf82384fb3b15306fec4769212fbc5ab
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mmme

* Squashed commit of the following:

commit 18e984cfe173390843c73048a931baa17800f918
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Fix cli itself can not run with config file

* Fix bug in login functionality

Refactor code for better performance

Add new feature for user authentication

Update UI layout for improved user experience

Fix typo in variable name

Optimize database queries for faster response time

Add error handling for edge cases

Update dependencies to latest versions

Remove unused code

Improve code readability and maintainability

* Refactor get_task_dict function to handle nested groups

* Add submission file for coco, flickr30k, nocaps, and textcaps tasks

* Remove unused files and update task configuration

* Fix tasks issue for nocaps, refcoco/+/g

* Fix file path and raise error if config file does not exist

* Exclude train in refcoco/+/g config

* Solve doc_iterator_for_counting crashing issue

* Black lint

* Refactor code to improve performance and readability

* Squashed commit of the following:

commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:03:57 2024 +0800

    change okvqa yaml

commit b9d9f9896993033b92346e9f47420c55b866c715
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:55:40 2024 +0800

    change yaml

commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:42:43 2024 +0800

    add okvqa task

commit 18e984cfe173390843c73048a931baa17800f918
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Squashed commit of the following:

commit 0c8a3919885b8fe2880bb2892f7a619d060012d1
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:06:02 2024 +0800

    change ocr reference

commit d2bc7c92ac61179b8c4031e11bc31970355252f6
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:05:46 2024 +0800

    revert example_eval

commit c78fa29cd0d161641ee05db57bd39314b998c8c7
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:17:28 2024 +0800

    edit vizwiz utils

commit 397f0906968fd8ba04b883469b96217737c43e09
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:49:47 2024 +0800

    reorganize __init__

commit 52a7ea6c7599adeec2ac2787f500e215ce47cf79
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:46:20 2024 +0800

    minor fixes

commit f706b2aaf9b288c582611191a1841b58feaeb741
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 17:41:03 2024 +0800

    add vizwizvqa eval task

commit 18e984cfe173390843c73048a931baa17800f918
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Refactor mathvista.yaml and utils.py

* Add gpt_eval_score to mathvista_process_results

* Refactor mathvista_aggregate_results to return average accuracy score

* Fix refcoco evaluation error

* Fix evaluation problem for refcoco+/g

* Refactor mathvista.yaml and mathvista_evals.py

* Add dependencies and update YAML files

* Refactor mmbench_en/utils.py to save test results to separate Excel file

* Fix caption task prompt

* Add group field to mmbench_en_test and mmbench_en_val yaml files

* Delete mmbench_en_val.yaml file

* Update mmbench_cn.yaml and mmbench_cn_test.yaml

* Update mmbench_cn_val.yaml and utils.py

* Remove unused fields in mmbench_cn_cc_process_results function

* Update aggregation function for mmbench_en_dev.yaml

* Fix capitalization of L2-category key in utils.py

* Fix variable name in mmbench_process_results function

* Delete mmbench_cn_val.yaml file

* Update mathvista_test.yaml and mathvista_testmini.yaml

* Fix warnings and update mathvista.yaml

* Remove system message from MathVistaEvaluator

* Update GPT model version in MathVistaEvaluator constructor

* Update GQA_RAW_IMAGE_DATASET path in utils.py

* change vizwiz to test set

* Add split flag to mathvista_aggregate_results function

* Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files

* Add download configuration for dataset

* Update GQA_RAW_IMAGE_DATASET path in utils.py

* add datasets

* Update gpt_eval_model_name in mathvista.yaml

* Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c'

* Update pyproject.toml with dependencies and URLs

* Squashed commit of the following:

commit 8b600f55b6cf5627504c407871539db59f6085a3
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Sat Jan 27 13:56:37 2024 +0800

    Dev/add chartqa and ai2d (#23)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

    * add chartqa

    * black

    * add ai2d

    * black

    * update chartqa

    * black

    * update ai2d dataset

    * black

    * Add 'submissions/' directory to .gitignore

    * Add Python setup and Black version installation workflow
    Refactor ContextSampler class in samplers.py
    Remove unnecessary line in DecontaminationFilter class
    Update dependencies in pyproject.toml

    * Refactor code in ContextSampler class

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>

* Refactor image processing and submission file path

* Refactor directory creation logic in cli_evaluate_single function

* Update dataset path and test split in vqav2.yaml

* Remove "total" column from cap_details_columns DataFrame

* Add retry logic for dataset download

* Add 'tenacity' to dependencies in pyproject.toml

* Refactor code in ContextSampler class

* Update Black version and configuration, and improve code readability in ContextSampler

* Update Black version and line length

---------

Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>

* vqav2 (#25)

* Update tqdm progress bar position

* Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e'

* Squashed commit of the following:

commit b13a805623dfd9d826ddd440e1b5ecde773fbb12
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit 4bf0504fabc3b62f356c467b2fd1119083d27313
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

    * Update dataset paths and improve user prompts

commit 767f7e2cae60cf67ec5878234d84321395a3ed15
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 19:51:34 2024 +0800

    Add output path file naming convention (#16)

    Update datetime format in get_datetime_str() function

* remove useless output file

* Update dataset path in vqav2.yaml

* Squashed commit of the following:

commit eeb2b9827502f044ef67d8440f53124baf219ba3
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:56:45 2024 +0800

    Black lint

commit 1ce9f0b37e4bc5e6ff5fbfcd23fd339eb14974ae
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:55:47 2024 +0800

    Solve doc_iterator_for_counting crashing issue

commit e12b3bb41ed4f51540cfac84e5e96d15777540c4
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:55:13 2024 +0800

    Exclude train in refcoco/+/g config

commit 42c56f82bc4ccae12e19e76d09d7e525ca9ef2f4
Merge: 6a1ae69 697a438
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 17:17:13 2024 +0000

    Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets

commit aed08303fe87808986d206540a0c0ee6d8764988
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 17:17:06 2024 +0000

    Fix file path and raise error if config file does not exist

commit a105386613c443d9e740c89725cbd1281bbdfef6
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 00:47:24 2024 +0800

    Fix tasks issue for nocaps, refcoco/+/g

commit 21c8119e377760f44c769bed2528d863a8f4333b
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 10:09:43 2024 +0000

    Remove unused files and update task configuration

commit 0ccb2629c2aacdb297b7cf0c9c2bcfa386bb7582
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:43:56 2024 +0000

    Add submission file for coco, flickr30k, nocaps, and textcaps tasks

commit 5365e13e93c702a1e0e259ee6a08d6a427d72470
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:32:54 2024 +0000

    Refactor get_task_dict function to handle nested groups

commit 6773348c807bcfa1b09ceffc90c75e15cad908f7
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:13:46 2024 +0000

    Fix bug in login functionality

    Refactor code for better performance

    Add new feature for user authentication

    Update UI layout for improved user experience

    Fix typo in variable name

    Optimize database queries for faster response time

    Add error handling for edge cases

    Update dependencies to latest versions

    Remove unused code

    Improve code readability and maintainability

commit 31140f9c87dea89ca94c94bc850e3a8d43e5f8b4
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 17:07:20 2024 +0800

    Fix cli itself can not run with config file

commit df1bad47f6ed13f94848d2bee29b28e00c2384b2
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:09:04 2024 +0000

    Squashed commit of the following:

    commit b13a805623dfd9d826ddd440e1b5ecde773fbb12
    Author: Zhang Peiyuan <a1286225768@gmail.com>
    Date:   Thu Jan 25 17:08:25 2024 +0800

        add model specific prompt and gen kwargs in sqa (#19)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

commit 06383aa4a5ff59db52fc8d584f3086efd88b7e74
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:02:57 2024 +0000

    Squashed commit of the following:

    commit 542a34dc5721ecdff6c5c68b0568692ad3a17149
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:59:12 2024 +0800

        refactor multi model code

    commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:51:16 2024 +0800

        print table at the end

    commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:20:59 2024 +0800

        add yaml config to support multi-model eval

    commit 2626383d99b5eac59d531ca0f293df960570c524
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:39:42 2024 +0800

        black

    commit 8349935fe145e33af0007ad4fb0d71fd925be7a0
    Merge: 7e8b57d 1d3fdd4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:37:57 2024 +0800

        resolve conflicts in sqa

    commit d4e8e2552d40752bfdc5bbf4cd962c1798096258
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:36:46 2024 +0800

        add model specific prompt and gen kwargs

    commit 520c7a2cafe60810aca79df814ce6829d4576032
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:56:51 2024 +0800

        black

    commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:55:43 2024 +0800

        add mmme

commit 7a71fd6022ee5985100dda38b94956595cec77a5
Merge: 22c3adf 1d3fdd4
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:43:15 2024 +0000

    Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' into dev/bli_add_datasets

commit 6870cba13cb54976480c1d5e8d97602c246f881b
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:38:52 2024 +0000

    Squashed commit of the following:

    commit 542a34dc5721ecdff6c5c68b0568692ad3a17149
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:59:12 2024 +0800

        refactor multi model code

    commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:51:16 2024 +0800

        print table at the end

    commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:20:59 2024 +0800

        add yaml config to support multi-model eval

    commit 2626383d99b5eac59d531ca0f293df960570c524
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:39:42 2024 +0800

        black

    commit 8349935fe145e33af0007ad4fb0d71fd925be7a0
    Merge: 7e8b57d 1d3fdd4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:37:57 2024 +0800

        resolve conflicts in sqa

    commit d4e8e2552d40752bfdc5bbf4cd962c1798096258
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:36:46 2024 +0800

        add model specific prompt and gen kwargs

    commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e
    Author: kcz358 <92624596+kcz358@users.noreply.github.com>
    Date:   Thu Jan 25 09:47:31 2024 +0800

        [Dataset] Add flickr30k (#18)

        * Add flickr30k support

        * Black lint

        * Align prompt with NoCaps

    commit 4bf0504fabc3b62f356c467b2fd1119083d27313
    Author: Li Bo <drluodian@gmail.com>
    Date:   Wed Jan 24 22:10:14 2024 +0800

        [Datasets] modify NoCaps data path and prompts (#17)

        * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

        * Update dataset paths and improve user prompts

    commit 520c7a2cafe60810aca79df814ce6829d4576032
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:56:51 2024 +0800

        black

    commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:55:43 2024 +0800

        add mmme

commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:38:11 2024 +0000

    Add coco_val and coco_test tasks to coco.yaml

commit 5bf643f73d06f1e540897b753450352bb92fd9ec
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 04:58:28 2024 +0000

    Update dataset_path in flickr30k.yaml

commit 95f110f0eef5196205bc501367e3642c57cc7a17
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 02:12:25 2024 +0000

    Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e'

commit c844ae49b18c1334711832208b0359c9439fe1c0
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 02:10:18 2024 +0000

    Add submission folder and update file paths for storing prediction results

commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit 4bf0504fabc3b62f356c467b2fd1119083d27313
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

    * Update dataset paths and improve user prompts

commit f0446227f0dd93651e9d6c06254bbf5212ede2dd
Merge: c6370bf 51f2eaa
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:07 2024 +0800

    Merge branch 'main' into dev/bli_add_datasets

commit 1e1f6cfccba758dc606fa4217102518fab73c936
Author: Bo Li <drluodian@gmail.com>
Date:   Wed Jan 24 14:08:06 2024 +0000

    Update dataset paths and improve user prompts

commit 966933754b9e5179995b3ab41d746603e13e75c6
Author: Bo Li <drluodian@gmail.com>
Date:   Wed Jan 24 11:52:33 2024 +0000

    Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

commit 767f7e2cae60cf67ec5878234d84321395a3ed15
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 19:51:34 2024 +0800

    Add output path file naming convention (#16)

    Update datetime format in get_datetime_str() function

* Fix bug in login functionality

* create vqav2_val

* Update vqav2_test.yaml

* Update vqav2_test.yaml

* Update vqav2_val.yaml

---------

Co-authored-by: Li Bo <drluodian@gmail.com>

* vqav2 (#25)

* Update tqdm progress bar position

* Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384'

* Squashed commit of the following:

commit 18e984cfe173390843c73048a931baa17800f918
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (#19)

    * add mmme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

commit ecb47d73d6e000b472be6c5c0cdc9413c7734384
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit dc23f4b42b1dd60b41904d7ddbee1412d6851077
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

    * Update dataset paths and improve user prompts

commit 95ef3ea519cbd772924f9a6afa5394979eb00432
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 19:51:34 2024 +0800

    Add output path file naming convention (#16)

    Update datetime format in get_datetime_str() function

* remove useless output file

* Update dataset path in vqav2.yaml

* Squashed commit of the following:

commit 75bb7043ea5a533ab6351fc0f5ab055e86106423
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:56:45 2024 +0800

    Black lint

commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:55:47 2024 +0800

    Solve doc_iterator_for_counting crashing issue

commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 09:55:13 2024 +0800

    Exclude train in refcoco/+/g config

commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730
Merge: 6a1ae69 697a438
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 17:17:13 2024 +0000

    Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets

commit 6a1ae69923d79ae32a001edac38206b605274ec3
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 17:17:06 2024 +0000

    Fix file path and raise error if config file does not exist

commit 697a4387827ceeec3e393237dd1baa217c714c88
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Fri Jan 26 00:47:24 2024 +0800

    Fix tasks issue for nocaps, refcoco/+/g

commit 47e40437126d39a5f062c9a33b4de426c1a29804
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 10:09:43 2024 +0000

    Remove unused files and update task configuration

commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:43:56 2024 +0000

    Add submission file for coco, flickr30k, nocaps, and textcaps tasks

commit 95f97a69faa6129676e89eee14960fcfe2076b7c
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:32:54 2024 +0000

    Refactor get_task_dict function to handle nested groups

commit 3b79ee842b2488714baf92ab34528ef77989d392
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:13:46 2024 +0000

    Fix bug in login functionality

    Refactor code for better performance

    Add new feature for user authentication

    Update UI layout for improved user experience

    Fix typo in variable name

    Optimize database queries for faster response time

    Add error handling for edge cases

    Update dependencies to latest versions

    Remove unused code

    Improve code readability and maintainability

commit f5c353f2ce93a2d96add4312b695b57432f68cbb
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 17:07:20 2024 +0800

    Fix cli itself can not run with config file

commit 9a68fec37be74cfe8d4a73390bc83edee147ae24
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:09:04 2024 +0000

    Squashed commit of the following:

    commit 18e984cfe173390843c73048a931baa17800f918
    Author: Zhang Peiyuan <a1286225768@gmail.com>
    Date:   Thu Jan 25 17:08:25 2024 +0800

        add model specific prompt and gen kwargs in sqa (#19)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

commit 93f847c5851fd246716367935d6b807b17d53949
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 09:02:57 2024 +0000

    Squashed commit of the following:

    commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:59:12 2024 +0800

        refactor multi model code

    commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:51:16 2024 +0800

        print table at the end

    commit 63739fc6fa0a462d807ae81de0db0173102de584
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:20:59 2024 +0800

        add yaml config to support multi-model eval

    commit edcc752f97ea3845cefad56624e5d2855066f680
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:39:42 2024 +0800

        black

    commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
    Merge: 7e8b57d 1d3fdd4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:37:57 2024 +0800

        resolve conflicts in sqa

    commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:36:46 2024 +0800

        add model specific prompt and gen kwargs

    commit 5f55126484a7c9325db586d26cf2052538222804
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:56:51 2024 +0800

        black

    commit aa6f8853cf82384fb3b15306fec4769212fbc5ab
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:55:43 2024 +0800

        add mmme

commit fa4ad4404e26d8924f55208746dbb9143b464011
Merge: 22c3adf 1d3fdd4
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:43:15 2024 +0000

    Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets

commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:38:52 2024 +0000

    Squashed commit of the following:

    commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:59:12 2024 +0800

        refactor multi model code

    commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:51:16 2024 +0800

        print table at the end

    commit 63739fc6fa0a462d807ae81de0db0173102de584
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 11:20:59 2024 +0800

        add yaml config to support multi-model eval

    commit edcc752f97ea3845cefad56624e5d2855066f680
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:39:42 2024 +0800

        black

    commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
    Merge: 7e8b57d 1d3fdd4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:37:57 2024 +0800

        resolve conflicts in sqa

    commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Thu Jan 25 10:36:46 2024 +0800

        add model specific prompt and gen kwargs

    commit ecb47d73d6e000b472be6c5c0cdc9413c7734384
    Author: kcz358 <92624596+kcz358@users.noreply.github.com>
    Date:   Thu Jan 25 09:47:31 2024 +0800

        [Dataset] Add flickr30k (#18)

        * Add flickr30k support

        * Black lint

        * Align prompt with NoCaps

    commit dc23f4b42b1dd60b41904d7ddbee1412d6851077
    Author: Li Bo <drluodian@gmail.com>
    Date:   Wed Jan 24 22:10:14 2024 +0800

        [Datasets] modify NoCaps data path and prompts (#17)

        * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

        * Update dataset paths and improve user prompts

    commit 5f55126484a7c9325db586d26cf2052538222804
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:56:51 2024 +0800

        black

    commit aa6f8853cf82384fb3b15306fec4769212fbc5ab
    Author: jzhang38 <a1286225768@gmail.com>
    Date:   Wed Jan 24 13:55:43 2024 +0800

        add mmme

commit 4c712336b6f7438e717a865910bb241e413a4688
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 08:38:11 2024 +0000

    Add coco_val and coco_test tasks to coco.yaml

commit b5547126c855927fd4dc8384211e4aceee40870f
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 04:58:28 2024 +0000

    Update dataset_path in flickr30k.yaml

commit f786f61e2559f082072f21aa9030e2080ddaf809
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 02:12:25 2024 +0000

    Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384'

commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8
Author: Bo Li <drluodian@gmail.com>
Date:   Thu Jan 25 02:10:18 2024 +0000

    Add submission folder and update file paths for storing prediction results

commit ecb47d73d6e000b472be6c5c0cdc9413c7734384
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit dc23f4b42b1dd60b41904d7ddbee1412d6851077
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (#17)

    * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

    * Update dataset paths and improve user prompts

commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59
Merge: c6370bf 51f2eaa
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:07 2024 +0800

    Merge branch 'main' into dev/bli_add_datasets

commit c6370bff65903681f00cf3d07111d8e15a57b619
Author: Bo Li <drluodian@gmail.com>
Date:   Wed Jan 24 14:08:06 2024 +0000

    Update dataset paths and improve user prompts

commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002
Author: Bo Li <drluodian@gmail.com>
Date:   Wed Jan 24 11:52:33 2024 +0000

    Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432'

commit 95ef3ea519cbd772924f9a6afa5394979eb00432
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 19:51:34 2024 +0800

    Add output path file naming convention (#16)

    Update datetime format in get_datetime_str() function

* Fix bug in login functionality

* create vqav2_val

* Update vqav2_test.yaml

* Update vqav2_test.yaml

* Update vqav2_val.yaml

---------

Co-authored-by: Li Bo <drluodian@gmail.com>

* vizwiz dataset (#24)

* Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15'

* Update dataset paths and improve user prompts

* Add submission folder and update file paths for storing prediction results

* Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e'

* Update dataset_path in flickr30k.yaml

* Add coco_val and coco_test tasks to coco.yaml

* Squashed commit of the following:

commit 542a34dc5721ecdff6c5c68b0568692ad3a17149
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to support multi-model eval

commit 2626383d99b5eac59d531ca0f293df960570c524
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 8349935fe145e33af0007ad4fb0d71fd925be7a0
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit d4e8e2552d407…
@Luodian added the "question" label (Further information is requested) on Jun 15, 2024