Insights: triton-inference-server/server
Overview
4 Pull requests merged by 3 people
- [doc] Adjusted formatting of the warning (#7675, merged Oct 3, 2024)
- ci: Reducing flakiness of `L0_python_api` (#7674, merged Oct 2, 2024)
- Build: Update server in master post 24.09 (#7658, merged Oct 2, 2024)
- [docs] Removed vLLM meetup announcement (#7673, merged Oct 1, 2024)
3 Pull requests opened by 2 people
- fix: `tritonfrontend` gRPC Streaming Segmentation Fault (#7671, opened Sep 29, 2024)
- build: Adding `tritonfrontend` to `build.py` (#7681, opened Oct 4, 2024)
- fix: Support sampling parameters of type List for vLLM backend (stop words) (#7682, opened Oct 5, 2024)
4 Issues closed by 4 people
- Running separate DCGM on Kubernetes cluster (#7597, closed Oct 3, 2024)
- Triton ensemble LLM model (Llama 3.1 8B Instruct) returns prompt in the output (#7665, closed Oct 1, 2024)
- Failed to unload model (vLLM Backend) after running inference in streaming mode (#7626, closed Sep 30, 2024)
- Instance_group config behaves strangely in the example jetson/concurrency_and_dynamic_batching (#7657, closed Sep 30, 2024)
6 Issues opened by 6 people
- Ability to do casting between datatypes within backend (#7680, opened Oct 4, 2024)
- Are FP8 models supported in Triton? (#7678, opened Oct 4, 2024)
- Triton ONNX runtime backend slower than onnxruntime python client on CPU (#7677, opened Oct 3, 2024)
- Dynamic batching not working with TRT-LLM backend (#7676, opened Oct 3, 2024)
- Histogram Metric for multi-instance tail latency aggregation (#7672, opened Oct 1, 2024)
- DCGM unable to start: DCGM initialization error, Error: Failed to initialize NVML (#7670, opened Sep 29, 2024)
19 Unresolved conversations
Conversations sometimes continue on older items that are not yet closed. Below is a list of all Issues and Pull Requests with unresolved conversations.
- feat: Add copyright hook (#7666, commented on Oct 4, 2024; 13 new comments)
- feat: `tritonfrontend` support for no/partial endpoint builds (#7605, commented on Oct 4, 2024; 3 new comments)
- build: update build.py to pass vllm versions as input parameter and convert version map to dictionary (#7500, commented on Oct 1, 2024; 3 new comments)
- fix: usage of ReadDataFromJson in array tensors (#7624, commented on Oct 4, 2024; 1 new comment)
- feat: OpenAI Compatible Frontend (#7561, commented on Oct 5, 2024; 0 new comments)
- Direct Streaming of Model Weights from Cloud Storage to GPU Memory (#7660, commented on Oct 3, 2024; 0 new comments)
- [Critical] Triton stops processing requests and crashes (#7649, commented on Oct 3, 2024; 0 new comments)
- Error: ensemble of tensorrt + python_be + tensorrt is supported on jetson? (#7667, commented on Oct 2, 2024; 0 new comments)
- error: creating server: Internal - s3:// file-system not supported. To enable, build with -DTRITON_ENABLE_S3=ON. (#7582, commented on Oct 2, 2024; 0 new comments)
- Support for vLLM and TRT-LLM running in OpenAI compatible mode (#6583, commented on Oct 2, 2024; 0 new comments)
- When there are multiple GPUs, only one GPU is used (#7664, commented on Oct 2, 2024; 0 new comments)
- python_backend pytorch example as_numpy() error (#7647, commented on Oct 1, 2024; 0 new comments)
- Python backend SHM memory leak (#7481, commented on Oct 1, 2024; 0 new comments)
- Ability to make preferred_batch_size mandatory (#7604, commented on Oct 1, 2024; 0 new comments)
- Can't load custom backend shared library from s3 (24.07) (#7550, commented on Sep 30, 2024; 0 new comments)
- Deploy TTS model with Triton and onnx backend, failed: Protobuf parsing failed (#7654, commented on Sep 30, 2024; 0 new comments)
- UNAVAILABLE: Not found: unable to load shared library: %1 is not a valid Win32 application (#7636, commented on Sep 30, 2024; 0 new comments)
- incompatible constructor arguments for c_python_backend_utils.InferenceRequest (#7639, commented on Sep 30, 2024; 0 new comments)
- Triton crashes with SIGSEGV (signal 11) (#7472, commented on Sep 30, 2024; 0 new comments)