[Core] Logprobs support in Multi-step #7652

Merged Aug 30, 2024 · 49 commits (the changes shown below are from 1 commit)
f97e0ae  added example (afeldman-nm, Aug 21, 2024)
f969241  wip: (afeldman-nm, Aug 21, 2024)
642d31b  first working attempt at logprobs (afeldman-nm, Aug 21, 2024)
a0ca262  merge; format (afeldman-nm, Aug 21, 2024)
ed97288  passing test; dataclass (afeldman-nm, Aug 21, 2024)
861e1b9  refactoring (afeldman-nm, Aug 21, 2024)
8bc0765  Merge branch 'main' into logprobs_merge (afeldman-nm, Aug 21, 2024)
a34d1ac  refactoring (afeldman-nm, Aug 21, 2024)
4cda5c0  Merge branch 'logprobs' into logprobs_merge (afeldman-nm, Aug 21, 2024)
ac8a39a  Merge branch 'main' into logprobs_merge (afeldman-nm, Aug 21, 2024)
1284327  removing example (afeldman-nm, Aug 21, 2024)
a6c1207  removed example from build pipeline (afeldman-nm, Aug 21, 2024)
fe42995  fixed one docstring; embedded NUM_LOGPROBS (afeldman-nm, Aug 21, 2024)
9fb5bbe  test refactor (afeldman-nm, Aug 21, 2024)
046a8b1  incremental refactors (afeldman-nm, Aug 21, 2024)
fa86efd  remove unnecessary conftest change (afeldman-nm, Aug 21, 2024)
1c0ffb6  Update vllm/model_executor/layers/sampler.py (afeldman-nm, Aug 21, 2024)
3babadb  refactor (afeldman-nm, Aug 21, 2024)
f502029  Merge branch 'afeldman-nm/logprobs' of https://github.com/neuralmagic… (afeldman-nm, Aug 21, 2024)
1875b37  test_multi_step comment (afeldman-nm, Aug 21, 2024)
3760a95  utils function docstrings (afeldman-nm, Aug 21, 2024)
d43308c  docstring refactors (afeldman-nm, Aug 21, 2024)
54db498  merge (afeldman-nm, Aug 21, 2024)
dfbbaf0  passing tests & formatted (afeldman-nm, Aug 21, 2024)
5eebfca  Merge branch 'main' into logprobs_merge (afeldman-nm, Aug 21, 2024)
5e23d9a  Merge branch 'main' into logprobs_merge (afeldman-nm, Aug 22, 2024)
717efa3  merge; format (afeldman-nm, Aug 22, 2024)
e0d59ce  removed incorrect SamplerOutput imports (afeldman-nm, Aug 22, 2024)
102fd92  formatting (afeldman-nm, Aug 22, 2024)
948f4ef  Update tests/multi_step/test_correctness.py (afeldman-nm, Aug 22, 2024)
6e6711f  fixed comment (afeldman-nm, Aug 22, 2024)
f61163e  merge; format (afeldman-nm, Aug 23, 2024)
1cc93dd  rename (afeldman-nm, Aug 23, 2024)
4995204  Merge branch 'logprobs' into logprobs_merge (afeldman-nm, Aug 23, 2024)
da5826b  test modification (afeldman-nm, Aug 26, 2024)
d4fb430  merge; format (afeldman-nm, Aug 26, 2024)
b6752e0  merge (afeldman-nm, Aug 27, 2024)
1e42656  formatting (afeldman-nm, Aug 27, 2024)
cd0fdf9  disabled logprobs pythonization when logprobs are disabled (afeldman-nm, Aug 27, 2024)
3fecbc4  wip (afeldman-nm, Aug 27, 2024)
67bd035  skip logprobs processing entirely when logprobs are not enabled; form… (afeldman-nm, Aug 27, 2024)
419659d  multi-step output processing; formatting (afeldman-nm, Aug 27, 2024)
55eaab9  wip (afeldman-nm, Aug 27, 2024)
bae1fb9  small fixes (afeldman-nm, Aug 27, 2024)
fbb75b7  reverting to no prompt-logprobs support; merged in main (afeldman-nm, Aug 28, 2024)
63c5582  timeout increase (afeldman-nm, Aug 28, 2024)
8191571  refactoring (afeldman-nm, Aug 28, 2024)
9a708f8  Merge branch 'main' into logprobs_no_prompt (afeldman-nm, Aug 28, 2024)
e54606d  upstream merge (afeldman-nm, Aug 29, 2024)
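
Two of the later commits (cd0fdf9, 67bd035) carry the key performance idea: logprob pythonization is skipped entirely when a request does not ask for logprobs, so multi-step runs without logprobs pay no extra cost. A minimal sketch of that kind of gating, using hypothetical helper names rather than the PR's actual functions:

from typing import Optional

import torch

from vllm.sampling_params import SamplingParams


def wants_logprobs(params: SamplingParams) -> bool:
    # A request participates in logprob processing only if it asked for
    # sample logprobs or prompt logprobs.
    return params.logprobs is not None or params.prompt_logprobs is not None


def maybe_pythonize_logprobs(raw_logprobs: torch.Tensor,
                             params: SamplingParams) -> Optional[list]:
    # Commit 67bd035's idea, simplified: skip the device-to-host transfer
    # and Python object construction entirely when logprobs are disabled.
    if not wants_logprobs(params):
        return None
    return raw_logprobs.cpu().tolist()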
Commit 642d31b: "first working attempt at logprobs" (committed by afeldman-nm, Aug 21, 2024)
Full hash: 642d31b814b3dceab9636dcb3c1367ea5968699d
1 change: 1 addition & 0 deletions examples/offline_inference_multi_step.py
@@ -25,6 +25,7 @@
     gpu_memory_utilization=0.9,
     num_scheduler_steps=8,
     use_v2_block_manager=True,
+    enforce_eager=True,
 )
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
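
The file above is vLLM's multi-step offline-inference example. For orientation, a minimal sketch of how such a script fits together after this change; the model name and prompt are illustrative, not taken from the diff:

from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]  # illustrative prompt

llm = LLM(
    model="facebook/opt-125m",  # illustrative; the example's model is not shown in this hunk
    gpu_memory_utilization=0.9,
    num_scheduler_steps=8,  # multi-step scheduling: 8 steps per scheduler invocation
    use_v2_block_manager=True,
    enforce_eager=True,  # the line this commit adds
)

# Generate texts from the prompts. Each RequestOutput carries the prompt,
# the generated text, and other information.
outputs = llm.generate(prompts, SamplingParams(temperature=0.8, top_p=0.95))
for output in outputs:
    print(output.prompt, output.outputs[0].text)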
3 changes: 2 additions & 1 deletion tests/spec_decode/test_multi_step_worker.py
@@ -5,8 +5,9 @@
 import pytest
 import torch
 
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.model_executor.utils import set_random_seed
-from vllm.sequence import ExecuteModelRequest, Logprob, SamplerOutput
+from vllm.sequence import ExecuteModelRequest, Logprob
 from vllm.spec_decode.draft_model_runner import TP1DraftModelRunner
 from vllm.spec_decode.multi_step_worker import MultiStepWorker
 from vllm.spec_decode.top1_proposer import Top1Proposer
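
The import diffs below all repeat one mechanical change: SamplerOutput moves out of vllm.sequence and into the sampler module itself. For downstream code the migration is a one-line swap:

# Old location, removed by this commit:
#   from vllm.sequence import SamplerOutput

# New location:
from vllm.model_executor.layers.sampler import SamplerOutput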
3 changes: 2 additions & 1 deletion tests/spec_decode/test_spec_decode_worker.py
@@ -7,8 +7,9 @@
 import pytest
 import torch
 
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.model_executor.utils import set_random_seed
-from vllm.sequence import ExecuteModelRequest, SamplerOutput, SequenceOutput
+from vllm.sequence import ExecuteModelRequest, SequenceOutput
 from vllm.spec_decode.interfaces import SpeculativeProposals
 from vllm.spec_decode.metrics import (AsyncMetricsCollector,
                                       SpecDecodeWorkerMetrics)
4 changes: 2 additions & 2 deletions tests/spec_decode/utils.py
@@ -8,12 +8,12 @@
 import torch
 
 from vllm.engine.arg_utils import EngineArgs
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.model_executor.utils import set_random_seed
 from vllm.sampling_params import SamplingParams
 from vllm.sequence import (VLLM_TOKEN_ID_ARRAY_TYPE,
                            CompletionSequenceGroupOutput, Logprob,
-                           SamplerOutput, SequenceData, SequenceGroupMetadata,
-                           SequenceOutput)
+                           SequenceData, SequenceGroupMetadata, SequenceOutput)
 from vllm.utils import get_distributed_init_method, get_ip, get_open_port
 from vllm.worker.cache_engine import CacheEngine
 from vllm.worker.model_runner import ModelRunner
5 changes: 3 additions & 2 deletions tests/test_sequence.py
@@ -2,9 +2,10 @@
 
 import pytest
 
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.sequence import (VLLM_TOKEN_ID_ARRAY_TYPE,
-                           CompletionSequenceGroupOutput, SamplerOutput,
-                           SequenceData, SequenceOutput)
+                           CompletionSequenceGroupOutput, SequenceData,
+                           SequenceOutput)
 
 from .core.utils import create_dummy_prompt
 
4 changes: 2 additions & 2 deletions vllm/engine/async_llm_engine.py
@@ -25,12 +25,12 @@
 from vllm.inputs.parse import is_explicit_encoder_decoder_prompt
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.outputs import EmbeddingRequestOutput, RequestOutput
 from vllm.pooling_params import PoolingParams
 from vllm.prompt_adapter.request import PromptAdapterRequest
 from vllm.sampling_params import SamplingParams
-from vllm.sequence import (ExecuteModelRequest, SamplerOutput,
-                           SequenceGroupMetadata)
+from vllm.sequence import ExecuteModelRequest, SequenceGroupMetadata
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import print_warning_once
6 changes: 3 additions & 3 deletions vllm/engine/llm_engine.py
@@ -29,16 +29,16 @@
 from vllm.inputs.parse import is_explicit_encoder_decoder_prompt
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.multimodal import MultiModalDataDict
 from vllm.outputs import (EmbeddingRequestOutput, RequestOutput,
                           RequestOutputFactory)
 from vllm.pooling_params import PoolingParams
 from vllm.prompt_adapter.request import PromptAdapterRequest
 from vllm.sampling_params import SamplingParams
 from vllm.sequence import (EmbeddingSequenceGroupOutput, ExecuteModelRequest,
-                           PoolerOutput, SamplerOutput, Sequence,
-                           SequenceGroup, SequenceGroupMetadata,
-                           SequenceStatus)
+                           PoolerOutput, Sequence, SequenceGroup,
+                           SequenceGroupMetadata, SequenceStatus)
 from vllm.tracing import (SpanAttributes, SpanKind, extract_trace_context,
                           init_tracer)
 from vllm.transformers_utils.config import try_get_generation_config
3 changes: 2 additions & 1 deletion vllm/engine/output_processor/util.py
@@ -2,7 +2,8 @@
 from typing import Sequence as GenericSequence
 from typing import Union
 
-from vllm.sequence import PoolerOutput, SamplerOutput, SequenceGroupOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import PoolerOutput, SequenceGroupOutput
 
 
 def create_output_by_sequence_group(
2 changes: 1 addition & 1 deletion vllm/engine/protocol.py
@@ -7,11 +7,11 @@
 from vllm.core.scheduler import SchedulerOutputs
 from vllm.inputs.data import PromptInputs
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.outputs import EmbeddingRequestOutput, RequestOutput
 from vllm.pooling_params import PoolingParams
 from vllm.prompt_adapter.request import PromptAdapterRequest
 from vllm.sampling_params import SamplingParams
-from vllm.sequence import SamplerOutput
 
 
 @runtime_checkable
3 changes: 2 additions & 1 deletion vllm/executor/cpu_executor.py
@@ -11,8 +11,9 @@
                                                   ResultHandler, WorkerMonitor)
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.prompt_adapter.request import PromptAdapterRequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import (GiB_bytes, get_distributed_init_method, get_open_port,
                         get_vllm_instance_id, make_async)
 from vllm.worker.worker_base import WorkerWrapperBase
3 changes: 2 additions & 1 deletion vllm/executor/distributed_gpu_executor.py
@@ -6,7 +6,8 @@
 from vllm.executor.gpu_executor import GPUExecutor
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 
 logger = init_logger(__name__)
 
3 changes: 2 additions & 1 deletion vllm/executor/executor_base.py
@@ -6,8 +6,9 @@
                          PromptAdapterConfig, SchedulerConfig,
                          SpeculativeConfig)
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.prompt_adapter.request import PromptAdapterRequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 
 
 class ExecutorBase(ABC):
3 changes: 2 additions & 1 deletion vllm/executor/gpu_executor.py
@@ -3,8 +3,9 @@
 from vllm.executor.executor_base import ExecutorAsyncBase, ExecutorBase
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
+from vllm.model_executor.layers.sampler import SamplerOutput
 from vllm.prompt_adapter.request import PromptAdapterRequest
-from vllm.sequence import ExecuteModelRequest, PoolerOutput, SamplerOutput
+from vllm.sequence import ExecuteModelRequest, PoolerOutput
 from vllm.utils import (get_distributed_init_method, get_ip, get_open_port,
                         make_async)
 from vllm.worker.worker_base import WorkerBase, WorkerWrapperBase
3 changes: 2 additions & 1 deletion vllm/executor/multiproc_gpu_executor.py
@@ -14,7 +14,8 @@
 from vllm.executor.multiproc_worker_utils import (ProcessWorkerWrapper,
                                                   ResultHandler, WorkerMonitor)
 from vllm.logger import init_logger
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.triton_utils import maybe_set_triton_cache_manager
 from vllm.utils import (_run_task_with_lock, cuda_device_count_stateless,
                         get_distributed_init_method, get_open_port,
3 changes: 2 additions & 1 deletion vllm/executor/neuron_executor.py
@@ -3,7 +3,8 @@
 from vllm.executor.executor_base import ExecutorAsyncBase, ExecutorBase
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import make_async
 
 logger = init_logger(__name__)
3 changes: 2 additions & 1 deletion vllm/executor/openvino_executor.py
@@ -9,7 +9,8 @@
 from vllm.executor.executor_base import ExecutorAsyncBase, ExecutorBase
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import (GiB_bytes, get_distributed_init_method, get_ip,
                         get_open_port, make_async)
 
3 changes: 2 additions & 1 deletion vllm/executor/ray_gpu_executor.py
@@ -12,7 +12,8 @@
 from vllm.executor.msgspec_utils import encode_hook
 from vllm.executor.ray_utils import RayWorkerWrapper, ray
 from vllm.logger import init_logger
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import (_run_task_with_lock, get_distributed_init_method,
                         get_ip, get_open_port, get_vllm_instance_id,
                         make_async)
3 changes: 2 additions & 1 deletion vllm/executor/ray_tpu_executor.py
@@ -10,7 +10,8 @@
 from vllm.executor.ray_utils import RayWorkerWrapper, ray
 from vllm.executor.tpu_executor import TPUExecutor
 from vllm.logger import init_logger
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import (get_distributed_init_method, get_ip, get_open_port,
                         get_vllm_instance_id, make_async)
 
3 changes: 2 additions & 1 deletion vllm/executor/ray_xpu_executor.py
@@ -14,7 +14,8 @@
 from vllm.executor.ray_utils import RayWorkerWrapper, ray
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.sequence import ExecuteModelRequest, SamplerOutput
+from vllm.model_executor.layers.sampler import SamplerOutput
+from vllm.sequence import ExecuteModelRequest
 from vllm.utils import (get_distributed_init_method, get_ip, get_open_port,
                         make_async)
 
3 changes: 2 additions & 1 deletion vllm/executor/tpu_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@
from vllm.executor.executor_base import ExecutorAsyncBase, ExecutorBase
from vllm.logger import init_logger
from vllm.lora.request import LoRARequest
from vllm.sequence import ExecuteModelRequest, SamplerOutput
from vllm.model_executor.layers.sampler import SamplerOutput
from vllm.sequence import ExecuteModelRequest
from vllm.utils import (get_distributed_init_method, get_ip, get_open_port,
make_async)

Expand Down
3 changes: 2 additions & 1 deletion vllm/executor/xpu_executor.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@
from vllm.executor.executor_base import ExecutorAsyncBase
from vllm.executor.gpu_executor import GPUExecutor
from vllm.logger import init_logger
from vllm.sequence import ExecuteModelRequest, PoolerOutput, SamplerOutput
from vllm.model_executor.layers.sampler import SamplerOutput
from vllm.sequence import ExecuteModelRequest, PoolerOutput
from vllm.utils import make_async
from vllm.worker.worker_base import WorkerBase

Expand Down
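
Taken together with the PR title, the user-visible effect is that per-token logprobs can be requested while multi-step scheduling is active. A usage sketch against vLLM's public API; the model and prompt are illustrative. Note that commit fbb75b7 reverts prompt-logprobs support, so only sample logprobs are requested here:

from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",  # illustrative
    num_scheduler_steps=8,  # multi-step scheduling
    use_v2_block_manager=True,
)

# Request the top-5 logprobs for each generated token.
params = SamplingParams(max_tokens=16, logprobs=5)
out = llm.generate(["Hello, my name is"], params)[0]

completion = out.outputs[0]
for token_id, logprob_dict in zip(completion.token_ids, completion.logprobs):
    # Each entry maps candidate token ids to Logprob objects; the sampled
    # token is always included in the dict.
    print(token_id, logprob_dict[token_id].logprob)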