add some modification to support torchbench-cpu well #795
Conversation
export OMP_NUM_THREADS=1 GOMP_CPU_AFFINITY=4
else
# eg: bash test_torch_bench.sh cpu tiny 32 "0-63:2"
export OMP_NUM_THREADS=$3 GOMP_CPU_AFFINITY=$4
Limiting the OMP threads alone is not enough, since the framework (e.g. torch/TF) may have its own thread pool as well. Maybe use taskset instead?
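The suggestion can be sketched as follows (core 0 and the `echo` command are illustrative, not taken from the PR):

```shell
# taskset pins every thread of the process, including any non-OpenMP
# thread pools, while the OMP_* variables only steer OpenMP itself.
export OMP_NUM_THREADS=1 GOMP_CPU_AFFINITY=0
taskset -c 0 echo "pinned"
```

In the real script the `echo` would be the benchmark command, so framework-internal pools inherit the same CPU mask.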
fixed
.github/workflows/torchbench.yml
Outdated
secrets: inherit
TorchBenchCpuFull8Threads:
Better to encode the CPU type into the name as well.
Right now cpu means x86 only; aarch64 has its own name.
results=(eval-cpu-fp32)
else
results=(eval-cuda-fp32 eval-cuda-fp16)
fi
python3 torchbenchmark/onnxrt_helper.py
Note that the latest ORT may use its own thread pool instead of OMP, so we may need extra thread-related settings for it. This can be left for a future improvement.
config_file=blade_$1_$2.yaml
bench_target=$2
binding_cores=1
Using 0 as the default is better in case only 1 core is used for testing.
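One way to express such defaults (a sketch, assuming the script's positional args: $3 = thread count, $4 = core list) is parameter expansion instead of an if/else branch:

```shell
# Fall back to 1 thread on core 0 when the optional args are not given.
export OMP_NUM_THREADS="${3:-1}"
binding_cores="${4:-0}"
echo "threads=${OMP_NUM_THREADS} cores=${binding_cores}"
```

Run without arguments this prints `threads=1 cores=0`.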
fixed
export GOMP_CPU_AFFINITY="2-5"
if [ ! -n "$3" ]
then
export OMP_NUM_THREADS=1 GOMP_CPU_AFFINITY=4
Why use different default values for GOMP_CPU_AFFINITY and taskset?
The original code used it this way. Now I use the same value.
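A sketch of how the two settings can be kept from drifting apart again (the core list `0` is illustrative; CI might use something like `0-7`):

```shell
# Keep one core-list variable and feed it to both mechanisms.
cores="0"
export GOMP_CPU_AFFINITY="$cores"
taskset -c "$cores" echo "pinned to $cores"
```

With a single source of truth, changing the core set in one place updates both the OpenMP affinity and the process mask.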
if [ $1 == "cpu" ]
then
# 4 cores
export GOMP_CPU_AFFINITY="2-5"
if [ ! -n "$3" ]
nit: ! -n is identical to -z
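The equivalence the nit points out can be seen directly:

```shell
# Both forms test for the empty string and always agree.
x=""
[ ! -n "$x" ] && echo "! -n says empty"
[ -z "$x" ] && echo "-z says empty"
```

`-z` is simply the more idiomatic spelling of the same test.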
results=(eval-cpu-fp32)
taskset -c $bingding_cores python3 torchbenchmark/.github/scripts/run-config.py \
Misspelled: bingding_cores should be binding_cores.
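This class of typo can be caught mechanically; a small sketch using `set -u`:

```shell
#!/bin/bash
# With `set -u`, a reference to an undefined variable (such as the
# misspelled $bingding_cores) aborts the script instead of silently
# expanding to an empty string.
set -u
binding_cores="0"
echo "cores=${binding_cores}"
# echo "${bingding_cores}"   # would abort here: "unbound variable"
```

Without `set -u`, the taskset line would have run with an empty core list.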
Manually trigger this workflow in https://github.com/alibaba/BladeDISC/actions/workflows/torchbench.yml?
.github/workflows/torchbench.yml
Outdated
base_image: bladedisc/bladedisc:latest-runtime-torch1.12.0-cpu
device: cpu_benchmark
dockerfile: docker/cronjobs/Dockerfile.torch.bench
extra_envs: -e RELATED_DIFF_PERCENT=3
The diff threshold has changed to 5%.
fixed
.github/workflows/torchbench.yml
Outdated
dockerfile: docker/cronjobs/Dockerfile.torch.bench
extra_envs: -e RELATED_DIFF_PERCENT=3
exec_command: bash ./pytorch_blade/benchmark/TorchBench/test_torch_bench.sh cpu full 8 "0-7"
Maybe we can loop over the different configs in one job instead of creating a job for each config.
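A sketch of that suggestion, iterating over the configs inside one job (the argument tuples echo the exec_command examples visible in this PR):

```shell
# One job, one loop over the benchmark configurations.
for cfg in "cpu full 8 0-7" "cpu tiny 32 0-63:2"; do
  echo "run: test_torch_bench.sh $cfg"
done
```

Each iteration would invoke the benchmark script with one configuration tuple.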
.github/workflows/torchbench.yml
Outdated
name: torch-offcial-benchmark
base_image: bladedisc/bladedisc:latest-runtime-torch1.12.0-cpu-aarch64
device: cpu_benchmark
Since we no longer use an extra machine for the benchmark, we can directly use aarch64 / cpu for the aarch64 / cpu benchmarks now.
@@ -21,6 +24,9 @@ fi
# for CI git-lfs permission problems
pushd $benchmark_repo_dir
# cache venv in benchmark dir
if [ $1 == "aarch64" ]; then
rm -rf ./venv && cp -r /opt/venv_disc ./venv
Maybe don't remove this folder, so the venv can be cached.
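A sketch of the caching idea, with temp dirs standing in for the real /opt/venv_disc and ./venv paths: seed the venv only when no cached copy exists, instead of always removing it first.

```shell
# src stands in for the prebuilt venv, work/venv for the cache location.
src=$(mktemp -d); work=$(mktemp -d)
touch "$src/marker"
if [ ! -d "$work/venv" ]; then
  cp -r "$src" "$work/venv"    # first run: copy the prebuilt venv
fi
[ -f "$work/venv/marker" ] && echo "venv ready"
```

On later runs the `cp` is skipped and the cached venv (with any installed packages) survives.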
LGTM!
No description provided.