Add m3exam #93

Jiawei-Guo · 2024-05-27T16:38:29Z

Add m3exam

Add eight multilingual MMMU tasks.

kcz358 · 2024-05-30T05:40:26Z

lmms_eval/tasks/m3exam/utils.py

+# (lang, method, setting, model, test_question, dev_question):
+    lang = doc["language"]
+    method = "default"
+    setting = "zero-shot"


It seems the setting is hard-coded to zero-shot, yet you write many if-else statements that check whether the setting is zero-shot. Are you planning to add more settings in the future?
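If more settings are planned, one way to avoid scattering `setting == "zero-shot"` checks through the file is to branch once, inside a single prompt builder. The sketch below is illustrative only: `format_question`, `build_prompt`, the `question`, `options`, and `answer` fields, and the `"few-shot"` setting name are all hypothetical; only `doc["language"]` and the zero-shot setting come from the diff.

```python
from string import ascii_uppercase
from typing import List, Optional


def format_question(doc: dict) -> str:
    """Render a question followed by lettered options (field names are hypothetical)."""
    lines = [doc["question"]]
    for letter, option in zip(ascii_uppercase, doc["options"]):
        lines.append(f"({letter}) {option}")
    return "\n".join(lines)


def build_prompt(doc: dict, setting: str = "zero-shot", dev_docs: Optional[List[dict]] = None) -> str:
    """Build the full prompt, checking `setting` in exactly one place."""
    parts = [f"The following is a multiple-choice question in {doc['language']}."]
    if setting == "few-shot":
        # Few-shot: prepend solved dev examples, each followed by its gold answer.
        for dev in dev_docs or []:
            parts.append(f"{format_question(dev)}\nAnswer: {dev['answer']}")
    elif setting != "zero-shot":
        raise ValueError(f"Unsupported setting: {setting!r}")
    # Both settings end with the test question and an open answer slot.
    parts.append(f"{format_question(doc)}\nAnswer:")
    return "\n\n".join(parts)
```

With only zero-shot in use today, the few-shot branch could simply be dropped; the point is that adding a setting later would touch one function rather than many.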

kcz358

Thank you for your contribution!

I think there are too many duplicated utils files. Could you move the different MMMU variants into the same folder and share a single utils.py, or at least remove the repeated code? There is no need to create separate folders for mmmu_English, mmmu_French, etc. Please try putting them under one folder, perhaps with a YAML file that defines a group so users can run all the tasks at once.
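Because each document already carries its language (`doc["language"]` in the diff above), one shared `utils.py` can serve every language task without per-folder copies. This is a minimal sketch under assumptions: the `INSTRUCTIONS` table, the `question` field, the `m3exam_` task prefix, and both function names are hypothetical, and only the two languages named in this comment are filled in.

```python
from typing import Dict, List

# Hypothetical per-language instructions. Only the two languages named in the
# review are shown; the remaining six would be added the same way.
INSTRUCTIONS: Dict[str, str] = {
    "english": "Answer with the letter of the correct option.",
    "french": "Répondez avec la lettre de la bonne option.",
}


def m3exam_doc_to_text(doc: dict) -> str:
    """Single doc_to_text shared by all language tasks; no per-language copies."""
    lang = doc["language"].lower()
    # Fall back to the English instruction for languages not yet in the table.
    instruction = INSTRUCTIONS.get(lang, INSTRUCTIONS["english"])
    return f"{doc['question']}\n{instruction}"


def group_task_names(prefix: str = "m3exam") -> List[str]:
    """Task names that a single group config could list to run every language."""
    return [f"{prefix}_{lang}" for lang in sorted(INSTRUCTIONS)]
```

Each per-language YAML would then point at the same function, and one group config would list the names returned by `group_task_names()`; the exact group keys should follow the existing lmms-eval task groups.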

mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 8b600f55b6cf5627504c407871539db59f6085a3 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code 
readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * vqav2 (#25) * Update tqdm progress bar position * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit eeb2b9827502f044ef67d8440f53124baf219ba3 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 1ce9f0b37e4bc5e6ff5fbfcd23fd339eb14974ae Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit e12b3bb41ed4f51540cfac84e5e96d15777540c4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 
09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 42c56f82bc4ccae12e19e76d09d7e525ca9ef2f4 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit aed08303fe87808986d206540a0c0ee6d8764988 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit a105386613c443d9e740c89725cbd1281bbdfef6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 21c8119e377760f44c769bed2528d863a8f4333b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 0ccb2629c2aacdb297b7cf0c9c2bcfa386bb7582 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 5365e13e93c702a1e0e259ee6a08d6a427d72470 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 6773348c807bcfa1b09ceffc90c75e15cad908f7 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit 31140f9c87dea89ca94c94bc850e3a8d43e5f8b4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit df1bad47f6ed13f94848d2bee29b28e00c2384b2 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 
2024 +0000 Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 06383aa4a5ff59db52fc8d584f3086efd88b7e74 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 7a71fd6022ee5985100dda38b94956595cec77a5 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' into dev/bli_add_datasets commit 
6870cba13cb54976480c1d5e8d97602c246f881b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 
Add coco_val and coco_test tasks to coco.yaml commit 5bf643f73d06f1e540897b753450352bb92fd9ec Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 95f110f0eef5196205bc501367e3642c57cc7a17 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' commit c844ae49b18c1334711832208b0359c9439fe1c0 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit f0446227f0dd93651e9d6c06254bbf5212ede2dd Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit 1e1f6cfccba758dc606fa4217102518fab73c936 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 966933754b9e5179995b3ab41d746603e13e75c6 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- 
Co-authored-by: Li Bo <drluodian@gmail.com> * vqav2 (#25) * Update tqdm progress bar position * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li 
<drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> 
Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the 
following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu 
Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and 
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d407…

Junpliu and others added 3 commits May 27, 2024 14:53

add multi-lingual MMMU

390c393

Merge pull request #1 from MM-Pod/multilingual_mmmu

0b3bcb3

add eight multi-lingual MMMU tasks

add_m3exam

f3b6fc2

kcz358 reviewed May 30, 2024


kcz358 reviewed May 31, 2024


Luodian added the "question" label (Further information is requested) on Jun 15, 2024



Jiawei-Guo commented May 27, 2024

kcz358 May 30, 2024
