llama : remove MPI backend #7395

Merged
merged 1 commit into master from sl/remove-mpi on May 19, 2024
Conversation

@slaren (Collaborator) commented May 19, 2024

Removes the MPI backend in favor of the RPC backend. The MPI backend has not been functional for a very long time, and its functionality can be obtained with the RPC backend.

For more details about the RPC backend, check #6829 and https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc.
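
For anyone previously building with MPI support, here is a minimal sketch of the replacement workflow, assuming the build flag and commands described in the linked examples/rpc README; the model path and worker addresses below are placeholders:

```sh
# On each worker machine: build llama.cpp with the RPC backend
# and start an rpc-server listening on a TCP port.
mkdir build-rpc && cd build-rpc
cmake .. -DLLAMA_RPC=ON
cmake --build . --config Release
bin/rpc-server -p 50052

# On the main host: pass every worker as host:port via --rpc;
# offloaded layers (-ngl) are then distributed across the workers.
bin/main -m model.gguf -p "Hello, my name is" -n 64 \
    --rpc 192.168.88.10:50052,192.168.88.11:50052 -ngl 99
```

Unlike the MPI backend, this requires no mpirun launcher: each worker is a plain TCP server that the main host connects to.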

@github-actions github-actions bot added labels build (Compilation issues), script (Script related), nix (Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment), devops (improvements to build systems and github actions), and ggml (changes relating to the ggml tensor library for machine learning) May 19, 2024
@slaren slaren merged commit d359f30 into master May 19, 2024
76 checks passed
@slaren slaren deleted the sl/remove-mpi branch May 19, 2024 23:17

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 548 iterations 🚀

Details (for performance-related PRs only)
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8532.78ms p(95)=20422.01ms fails=, finish reason: stop=496 truncated=52
  • Prompt processing (pp): avg=91.33tk/s p(95)=333.81tk/s
  • Token generation (tg): avg=34.4tk/s p(95)=46.13tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sl/remove-mpi commit=78cded5394c2ada8f119ce75d674b937372e7dd1

[Chart: llamacpp:prompt_tokens_seconds, bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 548 iterations]
[Chart: llamacpp:predicted_tokens_seconds, bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 548 iterations]

[Chart: llamacpp:kv_cache_usage_ratio, bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 548 iterations]
[Chart: llamacpp:requests_processing, bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 548 iterations]

@mofosyne (Collaborator) commented May 20, 2024

CI failed here https://github.com/ggml-org/ci/tree/results/llama.cpp/d3/59f30921a9f62a0fd299c412ff3f270286fea6/ggml-4-x86-cuda-v100

Failed on

 20 - test-backend-ops (Failed)

edit: Ah, I see. This particular CI failure is due to the new sanitiser run.

@mofosyne (Collaborator) commented

CI issue fixed by #7409

maxstrid added a commit to maxstrid/nixpkgs that referenced this pull request May 22, 2024
llama-cpp no longer supports mpi and rpc is the recommended alternative.
See: ggerganov/llama.cpp#7395

Signed-off-by: Maxwell Henderson <mxwhenderson@gmail.com>