
Optimize the Falcon block for inference #500

Merged
18 commits merged on Sep 4, 2023
Run tests on CUDA and CPU,
mryab committed Sep 4, 2023
commit 91f6248535d6d248fbea6891e657908358bd4bf3
7 changes: 5 additions & 2 deletions tests/test_optimized_layers.py
@@ -95,11 +95,14 @@ def _collapse_states(self, state: torch.Tensor) -> torch.Tensor:
 
 
 @pytest.mark.skipif("falcon" not in MODEL_NAME, reason="This test is applicable only to Falcon models")
+@pytest.mark.parametrize("device", ["cpu", "cuda:0"])
 @pytest.mark.forked
-def test_falcon():
+def test_falcon(device):
+    if device == "cuda:0" and not torch.cuda.is_available():
+        pytest.skip("CUDA tests can be run only in CUDA-enabled setups")
+
     config = AutoDistributedConfig.from_pretrained(MODEL_NAME)
 
-    device = "cpu"
     tensor_parallel_devices = (device,)
     dtype = torch.bfloat16
     quant_type = QuantType.NONE
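The pattern introduced by this commit — parametrizing a test over devices and skipping the CUDA case gracefully on machines without a GPU — can be sketched as a standalone example. The test name and the tensor operation below are hypothetical, chosen only to illustrate the structure; the Falcon-specific setup from the PR is omitted.

```python
import pytest
import torch


@pytest.mark.parametrize("device", ["cpu", "cuda:0"])
def test_matmul_runs(device):
    # Skip instead of failing when the CUDA parametrization runs on a CPU-only host,
    # mirroring the guard added to test_falcon in this commit
    if device == "cuda:0" and not torch.cuda.is_available():
        pytest.skip("CUDA tests can be run only in CUDA-enabled setups")

    # Hypothetical workload: a small matmul executed on the parametrized device
    x = torch.ones(2, 2, device=device)
    result = x @ x
    assert torch.allclose(result, torch.full((2, 2), 2.0, device=device))
```

Compared with hardcoding `device = "cpu"` inside the test body (the deleted line), the parametrized version runs the same assertions on every listed device, and `pytest.skip` keeps CPU-only CI green while still exercising the CUDA path wherever a GPU is present.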