Tags: legraphista/llama.cpp

b3091

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml : refactor rope norm/neox (ggerganov#7634)

* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
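The "norm" and "neox" RoPE modes unified in this commit differ only in which dimension pairs get rotated together: norm rotates adjacent dimensions (2i, 2i+1), while neox rotates split halves (i, i+n/2). A minimal sketch of that distinction (plain C++, illustrative only — not the actual ggml kernels, and names are hypothetical):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate one (x0, x1) pair by angle theta -- the core RoPE operation.
static void rope_pair(float &x0, float &x1, float theta) {
    const float c = std::cos(theta), s = std::sin(theta);
    const float a = x0, b = x1;
    x0 = a * c - b * s;
    x1 = a * s + b * c;
}

// "norm" mode: pairs are adjacent dimensions (2i, 2i+1).
void rope_norm(std::vector<float> &x, int pos, float freq_base) {
    const size_t n = x.size();
    for (size_t i = 0; i < n / 2; ++i) {
        const float theta = pos * std::pow(freq_base, -2.0f * i / n);
        rope_pair(x[2 * i], x[2 * i + 1], theta);
    }
}

// "neox" mode: pairs are split halves (i, i + n/2); same angles,
// different memory layout -- the only difference between the modes.
void rope_neox(std::vector<float> &x, int pos, float freq_base) {
    const size_t n = x.size();
    for (size_t i = 0; i < n / 2; ++i) {
        const float theta = pos * std::pow(freq_base, -2.0f * i / n);
        rope_pair(x[i], x[i + n / 2], theta);
    }
}
```

Because both modes reduce to the same pairwise rotation, the CPU, Metal, and CUDA backends can share one implementation parameterized by the pairing scheme.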

b3089

Fix per token attributes bits (ggerganov#7749)

b3088

Allow number of nodes in CUDA graph to change (ggerganov#7738)

Previously the code failed to cope when the number of nodes changed in an
existing CUDA graph. This fixes the issue by removing an unnecessary
conditional.
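The underlying decision is whether a previously captured graph can be replayed as-is or must be re-captured. A hedged sketch of that logic (plain C++; the struct and function names are hypothetical, not the actual llama.cpp CUDA-graph code):

```cpp
#include <cstddef>

// Hypothetical cached-graph state mirroring the idea in ggerganov#7738,
// not the real implementation.
struct cuda_graph_cache {
    size_t num_nodes = 0;   // node count at last capture
    bool   captured  = false;
};

// Replay the cached graph only when it exists and the node count matches;
// otherwise re-capture. Before this fix, a changed node count was not
// handled and the code could fail instead of falling back to re-capture.
bool needs_recapture(const cuda_graph_cache &cache, size_t current_nodes) {
    return !cache.captured || cache.num_nodes != current_nodes;
}
```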

b3087

common : refactor cli arg parsing (ggerganov#7675)

* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
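The thrust of the refactor is that every example parses into one shared `gpt_params` struct instead of rolling its own flag handling, and the parser no longer prints usage itself. A heavily trimmed sketch of that pattern (illustrative only — the real `gpt_params` and `gpt_params_parse` in common/ handle far more options, and the struct fields shown are a tiny subset):

```cpp
#include <cstdlib>
#include <string>

// Hypothetical, cut-down illustration of the shared-params idea.
struct gpt_params {
    int         n_ctx  = 0;     // context size (this commit changes defaults)
    bool        escape = true;  // escape processing (default changed here too)
    std::string prompt;
};

// Minimal parse loop. On an unknown flag it returns false and lets the
// caller decide whether to print usage -- matching the "gpt_params_parse
// do not print usage" change above.
bool gpt_params_parse_sketch(int argc, char **argv, gpt_params &params) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "-c" || arg == "--ctx-size") {
            if (++i >= argc) return false;
            params.n_ctx = std::atoi(argv[i]);
        } else if (arg == "--no-escape") {
            params.escape = false;
        } else if (arg == "-p" || arg == "--prompt") {
            if (++i >= argc) return false;
            params.prompt = argv[i];
        } else {
            return false; // unknown flag: caller prints usage
        }
    }
    return true;
}
```

Centralizing the struct is what lets batched-bench, retrieval, passkey, and the server deduplicate their parameter handling.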

b3086

ggml : remove OpenCL (ggerganov#7735)

ggml-ci

b3085

llama : remove beam search (ggerganov#7736)

b3083

llama-bench : allow using a different printer for stderr with -oe (ggerganov#7722)

compare-commits.sh : hide stdout, use -oe to print markdown

b3082

Improve hipBLAS support in CMake (ggerganov#7696)

* Improve hipBLAS support in CMake

This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.

* Set ROCM_PATH correctly

b3080

Per token attributes (ggerganov#7685)

* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
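The new `lstrip`/`rstrip` attributes mark special tokens whose adjacent whitespace should be consumed, as tested here against phi-3 and jina-v2. A sketch of what applying those bits means (plain C++; the enum values and helper are illustrative — the real attribute bitfield lives in llama.h and has more entries):

```cpp
#include <cstdint>
#include <string>

// Hypothetical mirror of the per-token attribute bits added in this commit.
enum llama_token_attr_sketch : uint32_t {
    ATTR_NONE   = 0,
    ATTR_LSTRIP = 1u << 0,  // consume whitespace to the left of the token
    ATTR_RSTRIP = 1u << 1,  // consume whitespace to the right of the token
};

// Strip whitespace from the sides of a text fragment adjacent to a token,
// according to the token's attribute bits.
std::string apply_strip(std::string text, uint32_t attrs) {
    const char *ws = " \t\n";
    if (attrs & ATTR_LSTRIP) {
        const size_t b = text.find_first_not_of(ws);
        text.erase(0, b == std::string::npos ? text.size() : b);
    }
    if (attrs & ATTR_RSTRIP) {
        const size_t e = text.find_last_not_of(ws);
        text.erase(e == std::string::npos ? 0 : e + 1);
    }
    return text;
}
```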

b3079

ggml : prevent builds with -ffinite-math-only (ggerganov#7726)

This enforces a check that -fno-finite-math-only is in effect, i.e. that the
compiler is not operating in finite-math mode. During the CPU rewrite of SiLU
and softmax in ggerganov#7154, @JohannesGaessler found that results became
nondeterministic when more than one slot was in use.

@LostRuins narrowed the problem down to -ffinite-math-only, theorised to cause
SiLU to return NaN or other garbage instead of flushing small values to 0.
@jart proposed a fix, which @ggerganov then implemented here.

ref ggerganov#7154 (comment)
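GCC and Clang define `__FINITE_MATH_ONLY__` to 1 whenever -ffinite-math-only is active (it is implied by -ffast-math and -Ofast), so a build in that mode can be rejected at compile time. A sketch of that style of guard (the exact check ggml uses may differ):

```cpp
// Under -ffinite-math-only the compiler may assume NaN/Inf never occur,
// which is exactly what broke the rewritten SiLU/softmax: exp() overflow
// and NaN propagation must remain well-defined.
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#error "rebuild with -fno-finite-math-only: NaN/Inf must be representable"
#endif

#include <cmath>

// SiLU depends on IEEE semantics: for large-negative x, exp(-x) overflows
// to +Inf and the quotient correctly flushes toward 0.
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}
```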