Source Code for "Exploring and Unleashing the Power of Large Language Models in Automated Code Translation"

yz1019117968/FSE-24-UniTrans

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Preparation

JDK 17
javafx-sdk-20 (see https://openjfx.io/openjfx-docs/#introduction)
Stack backtrace for C++: https://github.com/NEWPLAN/newplan_toolkit/backtrace
EMMA coverage tool: https://emma.sourceforge.net/
transformers == 4.30.2
torch == 1.12.1
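
Before running the scripts, it can help to confirm that the two pinned Python packages match the versions listed above. The following is a minimal sketch, not part of the repository: the `check_versions` helper is hypothetical, and treating a local build tag such as `+cu113` as a match is an assumption.

```python
from importlib import metadata

# Pins copied from the Preparation list above.
REQUIRED = {"transformers": "4.30.2", "torch": "1.12.1"}


def check_versions(required, get_version=metadata.version):
    """Return {package: (wanted, found)} for each missing or mismatched package."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Assumption: a local build tag such as "1.12.1+cu113" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(REQUIRED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

If the script prints nothing, both packages are installed at the pinned versions.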

Attachments

  • Data noise breakdown: img.png
  • The tmp.java file: tmp.pdf
  • Statistical test results: statistical test.pdf
  • OJ experimental results, which are reported in the threats to validity section: folder oj_samples

Cleaned Dataset

  • ./cleaned_data/testable_samples.jsonl: the cleaned dataset used in this work, containing parallel functions in Java, Python, and C++.
  • ./cleaned_data/transcoder_evaluation_gfg: the test cases associated with the cleaned dataset.
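
To get a first look at the cleaned dataset, the JSONL file can be read one record per line. This is a minimal sketch under stated assumptions: `load_jsonl` is a hypothetical helper, and because the field names are not documented here, the script lists whatever keys the first record contains rather than assuming any.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line and return them as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    data_file = Path("./cleaned_data/testable_samples.jsonl")
    if data_file.exists():
        samples = load_jsonl(data_file)
        print(f"{len(samples)} samples loaded")
        if samples and isinstance(samples[0], dict):
            # Field names are not documented here, so list them instead of guessing.
            print("fields:", sorted(samples[0]))
    else:
        print(f"{data_file} not found; run this from the repository root")
```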

Quick Start

  • Test Case Generation Phase

    1. Generate inputs with LLMs (taking GPT3.5 as an example)
      python gpt3_5.py --dst_lang ${dst_lang} --obj 0 --test_case_num ${test_case_num} --k ${sample_k}
    
    2. Collect test cases
      python process_valid_inputs.py --model ${model_name} --dst_lang ${dst_lang}
    
  • Translation Augmentation Phase

    1. Translation augmentation (taking GPT3.5 as an example)
      python gpt3_5.py --src_lang ${src_lang} --dst_lang ${dst_lang} --obj 3 --k ${sample_k} --test_case_num ${test_case_num}
    
    2. Post-process translated programs
      python process_translation.py --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}
    
    3. Translation evaluation
      python fetch_feedbacks.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --test_case_num ${test_case_num} --round ${round}
    
  • Translation Repair Phase

    1. Error info analysis
      python process_feedbacks.py --src_lang ${src_lang} --dst_lang ${dst_lang} --round ${round} --test_case_num ${test_case_num}
    
    2. Program repair
      python gpt3_5.py --src_lang ${src_lang} --dst_lang ${dst_lang} --obj 4 --k ${sample_k} --test_case_num ${test_case_num}
    
    3. Post-process repaired programs
      python process_translation.py --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}
    
  • Evaluation

    1. Evaluation for computational accuracy
      python evaluation_CA.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --k ${CA@k} --timeout ${timeout} --suffix ${suffix}
    
    2. Evaluation for exact match accuracy
      python evaluation_EM.py --model ${model_name} --src_lang ${src_lang} --dst_lang ${dst_lang} --suffix ${suffix}
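
Because the commands above share most of their flags, a small helper can assemble them from one set of values. The sketch below is illustrative only: `build_command` is a hypothetical helper, and the language names, sample count, test case count, and suffix are placeholder assumptions rather than values confirmed by the repository. It covers the first two steps of the Translation Augmentation Phase and prints the commands instead of running them.

```python
import shlex


def build_command(script, **flags):
    """Return an argv list such as ['python', script, '--flag', 'value', ...]."""
    argv = ["python", script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Hypothetical values; substitute the ones for your own run.
settings = {"src_lang": "java", "dst_lang": "python"}

steps = [
    # Translation augmentation, then post-processing of the translated programs.
    build_command("gpt3_5.py", **settings, obj=3, k=1, test_case_num=3),
    build_command("process_translation.py", **settings, suffix="example"),
]

if __name__ == "__main__":
    for argv in steps:
        # To execute a step, pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Keeping each command as a list rather than a single string avoids shell quoting issues if it is later passed to `subprocess.run`.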
    
