GitHub - yz1019117968/FSE-24-UniTrans: Source Code for "Exploring and Unleashing the Power of Large Language Models in Automated Code Translation"

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Preparation

JDK 17
javafx-sdk-20 (see https://openjfx.io/openjfx-docs/#introduction)
Stack BackTrace for C++: https://github.com/NEWPLAN/newplan_toolkit/backtrace
EMMA Coverage Tool: https://emma.sourceforge.net/
transformers == 4.30.2
torch == 1.12.1
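The Java and C++ tools above are installed by hand, but the two Python pins can be checked programmatically. A minimal sketch using the standard library's `importlib.metadata`; `installed_versions` and `find_mismatches` are hypothetical helpers, not part of the repository. It ignores local version labels such as `+cu113`, which CUDA builds of torch often carry.

```python
from importlib import metadata

# Pins copied from the Preparation list; the Java and C++ tools are not covered.
PINNED = {"transformers": "4.30.2", "torch": "1.12.1"}


def installed_versions(packages):
    """Return {package: installed version, or None if the package is missing}."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


def find_mismatches(pinned, installed):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    Local version labels such as "+cu113" are ignored, so "1.12.1+cu113"
    satisfies a pin of "1.12.1".
    """
    mismatches = {}
    for package, expected in pinned.items():
        found = installed.get(package)
        if found is None or found.split("+", 1)[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED, installed_versions(PINNED))
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found}")
    if not problems:
        print("transformers and torch match the pinned versions.")
```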

Attachments

Please find the data noise breakdown here:
Please find the tmp.java file here: tmp.pdf.
Please find the statistical test results here: statistical test.pdf.
The OJ experimental results, which are reported in the threats to validity section, are in the folder oj_samples.

Cleaned Dataset

./cleaned_data/testable_samples.jsonl: cleaned dataset used in this work, including parallel functions of Java, Python, and C++.
./cleaned_data/transcoder_evaluation_gfg: test cases associated with the cleaned dataset.
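The field names inside each record of testable_samples.jsonl are not documented here, so the sketch below makes no assumption about them. It reads the file as JSON Lines (one JSON object per line, as the `.jsonl` extension suggests) and reports which top-level keys occur, which is a quick way to inspect the schema. `parse_jsonl_lines`, `load_jsonl`, and `key_summary` are hypothetical helpers.

```python
import json
from pathlib import Path


def parse_jsonl_lines(lines, source="<input>"):
    """Parse JSON Lines: one JSON object per line, blank lines skipped."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"{source}:{line_number}: invalid JSON ({err})") from err
    return records


def load_jsonl(path):
    """Read a .jsonl file from disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_jsonl_lines(handle, source=str(path))


def key_summary(records):
    """Count how often each top-level key occurs, to inspect an undocumented schema."""
    counts = {}
    for record in records:
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    dataset = Path("./cleaned_data/testable_samples.jsonl")
    if dataset.exists():
        samples = load_jsonl(dataset)
        print(f"{len(samples)} samples")
        print(key_summary(samples))
    else:
        print(f"{dataset} not found; run this from the repository root.")
```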

Quick Start

Test Case Generation Phase

generate inputs with LLMs (taking GPT3.5 as an example)

  python gpt3_5.py --dst_lang ${dst_lang} --obj 0 --test_case_num ${test_case_num} --k ${sample_k}

collect test cases

  python process_valid_inputs.py --model ${model_name} --dst_lang ${dst_lang}
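In this phase the LLM only proposes inputs; an input becomes a test case once its expected output is known, presumably by running the source program on it. A minimal sketch of that idea, assuming the source function can be called directly: keep each candidate input on which the reference runs without raising, record its return value as the expected output, and drop duplicates. This is an illustration, not the logic of `process_valid_inputs.py`; `IOCase`, `collect_test_cases`, and the GCD stand-in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOCase:
    """One input/expected-output pair (hypothetical representation)."""
    args: tuple
    expected: object


def collect_test_cases(reference_fn, candidate_inputs, limit):
    """Keep LLM-proposed inputs on which the reference function runs cleanly.

    Each candidate is a tuple of positional arguments. Inputs that make the
    reference raise are discarded as invalid, duplicates (compared by repr)
    are kept once, and at most `limit` cases are returned.
    """
    if limit <= 0:
        return []
    cases = []
    seen = set()
    for args in candidate_inputs:
        args = tuple(args)
        key = repr(args)
        if key in seen:
            continue
        seen.add(key)
        try:
            expected = reference_fn(*args)
        except Exception:
            continue  # not a valid input for the source program
        cases.append(IOCase(args, expected))
        if len(cases) >= limit:
            break
    return cases


def reference_gcd(a, b):
    """Stand-in for a source-language function; rejects non-positive inputs."""
    if a <= 0 or b <= 0:
        raise ValueError("positive integers only")
    while b:
        a, b = b, a % b
    return a


if __name__ == "__main__":
    proposed = [(12, 18), (7, 0), (12, 18), (35, 14)]
    for case in collect_test_cases(reference_gcd, proposed, limit=3):
        print(case)
```

Here only two cases survive: `(7, 0)` is rejected by the reference and the repeated `(12, 18)` is dropped.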

Translation Augmentation Phase

translation augmentation (taking GPT3.5 as an example)

  python gpt3_5.py --src_lang ${src_lang} --dst_lang ${dst_lang} --obj 3 --k ${sample_k} --test_case_num ${test_case_num}
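`--obj 3` runs translation augmentation, and `--test_case_num` presumably controls how many of the collected test cases accompany each translation request. One plausible shape for such a prompt is sketched below; the template, `build_augmented_prompt`, and `format_call` are hypothetical and do not reproduce the prompt used by `gpt3_5.py`.

```python
def format_call(function_name, args):
    """Render a call such as f(1, 2), using Python literal syntax for the arguments."""
    return f"{function_name}({', '.join(repr(a) for a in args)})"


def build_augmented_prompt(src_lang, dst_lang, source_code, function_name,
                           test_cases, test_case_num):
    """Build a translation prompt that lists up to `test_case_num` test cases.

    `test_cases` is a sequence of (args, expected) pairs. The wording is a
    hypothetical template, not the one used by gpt3_5.py.
    """
    lines = [
        f"Translate the following {src_lang} function into {dst_lang}.",
        "",
        source_code.strip(),
        "",
        "The translated function must pass these test cases:",
    ]
    for args, expected in list(test_cases)[:test_case_num]:
        lines.append(f"- {format_call(function_name, args)} returns {expected!r}")
    lines += ["", f"Reply with the {dst_lang} code only."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_augmented_prompt(
        "Python", "Java",
        "def add(a, b):\n    return a + b\n",
        "add",
        [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)],
        test_case_num=2,
    ))
```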

post-process translated programs.

  python process_translation.py --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}
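LLM replies often wrap the program in a Markdown code fence and surround it with explanation, so post-processing typically has to pull the code out before it can be compiled. A minimal sketch, assuming the reply uses triple-backtick fences with an optional language tag; it is not necessarily what `process_translation.py` does, it does not interpret `--suffix`, and `extract_code` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the pattern readable

# A fenced block: opening fence with an optional language tag, body, closing fence.
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"[ \t]*([\w+#-]*)[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_code(reply, preferred_lang=None):
    """Return the code from an LLM reply.

    If the reply contains fenced blocks, return the first one whose language tag
    matches `preferred_lang` (case-insensitive), else the first block. If there
    are no fences, treat the whole reply as code.
    """
    blocks = FENCED_BLOCK.findall(reply)
    if not blocks:
        return reply.strip()
    if preferred_lang is not None:
        for tag, body in blocks:
            if tag.lower() == preferred_lang.lower():
                return body.strip()
    return blocks[0][1].strip()


if __name__ == "__main__":
    reply = (
        "Here is the translation:\n"
        + FENCE + "java\n"
        + "static int add(int a, int b) { return a + b; }\n"
        + FENCE + "\n"
        + "It keeps the original semantics."
    )
    print(extract_code(reply, preferred_lang="java"))
```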

translation evaluation

  python fetch_feedbacks.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --test_case_num ${test_case_num} --round ${round}
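`fetch_feedbacks.py` evaluates the translated programs against the collected test cases and presumably records the outcomes as feedback for the repair phase. The harness below is a minimal sketch of that idea, assuming each translation has been built into a standalone command that reads one test input on stdin and prints its result; the repository's actual harness may work differently, and `run_tests` and `RunResult` are hypothetical. A Python one-liner stands in for a compiled Java or C++ program.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one program run on one test case (hypothetical structure)."""
    status: str  # "pass", "wrong_answer", "runtime_error" or "timeout"
    stdin: str
    expected: str
    actual: str = ""
    stderr: str = ""


def run_tests(command, test_cases, timeout):
    """Run `command` once per (stdin, expected_stdout) pair and classify each run.

    Outputs are compared after stripping leading and trailing whitespace; a
    non-zero exit code counts as a runtime error.
    """
    results = []
    for stdin, expected in test_cases:
        try:
            proc = subprocess.run(
                command, input=stdin, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            results.append(RunResult("timeout", stdin, expected))
            continue
        if proc.returncode != 0:
            status = "runtime_error"
        elif proc.stdout.strip() == expected.strip():
            status = "pass"
        else:
            status = "wrong_answer"
        results.append(RunResult(status, stdin, expected, proc.stdout, proc.stderr))
    return results


if __name__ == "__main__":
    # Stand-in for a compiled translation: reads two integers, prints their sum.
    program = [sys.executable, "-c", "print(sum(map(int, input().split())))"]
    cases = [("1 2\n", "3"), ("2 2\n", "5"), ("x\n", "0")]
    for result in run_tests(program, cases, timeout=10):
        print(result.status, repr(result.stdin))
```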

Translation Repair Phase

error info analysis

  python process_feedbacks.py --src_lang ${src_lang} --dst_lang ${dst_lang} --round ${round} --test_case_num ${test_case_num}
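Error info analysis turns raw test outcomes into something a repair prompt can use. A minimal sketch, assuming each outcome is a dict with the keys `status`, `stdin`, `expected`, `actual`, and `stderr` (a hypothetical format): it summarizes how many tests failed and why, then shows the first failing example. The message layout and `summarize_failures` are illustrative, not the output of `process_feedbacks.py`.

```python
from collections import Counter


def summarize_failures(results, max_examples=1):
    """Turn per-test outcomes into a short feedback message for a repair prompt.

    `results` is a list of dicts with the keys "status", "stdin", "expected",
    "actual" and "stderr" (a hypothetical format). Returns None if all passed.
    """
    failures = [r for r in results if r["status"] != "pass"]
    if not failures:
        return None
    counts = Counter(r["status"] for r in failures)
    breakdown = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
    lines = [f"{len(failures)} of {len(results)} test cases failed: {breakdown}"]
    for r in failures[:max_examples]:
        lines.append(f"Input: {r['stdin'].strip()!r}")
        if r["status"] == "wrong_answer":
            lines.append(f"Expected output: {r['expected'].strip()!r}")
            lines.append(f"Actual output: {r['actual'].strip()!r}")
        elif r["status"] == "timeout":
            lines.append("Error: execution timed out")
        else:
            # Keep only the last stderr line to keep the prompt short; for a
            # Python traceback this is the exception message.
            stderr_lines = r["stderr"].strip().splitlines()
            lines.append(f"Error: {stderr_lines[-1] if stderr_lines else r['status']}")
    return "\n".join(lines)


if __name__ == "__main__":
    outcomes = [
        {"status": "pass", "stdin": "1 2", "expected": "3", "actual": "3", "stderr": ""},
        {"status": "wrong_answer", "stdin": "2 2", "expected": "5", "actual": "4\n", "stderr": ""},
    ]
    print(summarize_failures(outcomes))
```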

program repair

  python gpt3_5.py --src_lang ${src_lang} --dst_lang ${dst_lang} --obj 4 --k ${sample_k} --test_case_num ${test_case_num}
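`--obj 4` switches `gpt3_5.py` to program repair, and both `fetch_feedbacks.py` and `process_feedbacks.py` take a `--round` argument, so the repair phase appears to alternate evaluation and repair over several rounds. The loop below sketches that control flow under this assumption. In the repository each step is a separate command; `repair_loop`, `evaluate`, and `repair` are hypothetical names, with toy stand-ins in place of the LLM and the test harness.

```python
def repair_loop(program, evaluate, repair, max_rounds):
    """Alternate evaluation and repair until the program passes or rounds run out.

    evaluate(program) returns a feedback string, or None when all tests pass.
    repair(program, feedback) returns a new candidate program.
    Returns (final_program, repairs_applied, passed).
    """
    for round_number in range(max_rounds + 1):
        feedback = evaluate(program)
        if feedback is None:
            return program, round_number, True
        if round_number == max_rounds:
            break  # out of repair budget: report the last failing candidate
        program = repair(program, feedback)
    return program, max_rounds, False


if __name__ == "__main__":
    # Toy stand-ins: "v2" is the first version that passes its tests.
    def evaluate(program):
        return None if program == "v2" else f"{program}: wrong output"

    def repair(program, feedback):
        return "v" + str(int(program[1:]) + 1)

    print(repair_loop("v0", evaluate, repair, max_rounds=3))  # ('v2', 2, True)
```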

post-process repaired programs.

  python process_translation.py --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}

Evaluation

evaluation for computational accuracy

  python evaluation_CA.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --k ${CA@k} --timeout ${timeout} --suffix ${suffix}
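Computational accuracy (CA) counts a translation as correct when it produces the same outputs as the reference on all test inputs. The sketch below assumes CA@k treats a source function as solved if any of its first k sampled translations passes, which is one common reading of `--k`; translations exceeding `--timeout` would presumably be recorded as failures before this step. `ca_at_k` is a hypothetical helper.

```python
def ca_at_k(per_sample_results, k):
    """Computational accuracy at k.

    per_sample_results holds one list per source function, with one boolean per
    sampled translation (True if it passed every test case). A function counts
    as solved if any of its first k translations passed (an assumed definition).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not per_sample_results:
        return 0.0
    solved = sum(1 for attempts in per_sample_results if any(attempts[:k]))
    return solved / len(per_sample_results)


if __name__ == "__main__":
    results = [[True, False], [False, True], [False, False], [False, False]]
    print(ca_at_k(results, 1), ca_at_k(results, 2))  # 0.25 0.5
```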

evaluation for exact match accuracy

  python evaluation_EM.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}
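Exact match (EM) accuracy checks whether a translation is textually identical to the reference translation. Whether `evaluation_EM.py` normalizes code before comparing is not stated here; the sketch assumes runs of whitespace are collapsed first, so layout differences alone do not cause a mismatch. `normalize` and `exact_match_accuracy` are hypothetical helpers.

```python
def normalize(code):
    """Collapse every whitespace run to a single space (an assumed normalization)."""
    return " ".join(code.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(references)


if __name__ == "__main__":
    predictions = ["int f() {\n  return 1;\n}", "x = 1"]
    references = ["int f() { return 1; }", "x = 2"]
    print(exact_match_accuracy(predictions, references))  # 0.5
```

Note that this normalization still treats `a+b` and `a + b` as different; token-level comparison would be needed to ignore such spacing.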
