Document how to run the unit tests
The tests in `sampler_test.py` are currently skipped since no tokenizer is currently distributed with the `gemma` sources.

I also auto-formatted the `README.md` using `mdformat`.

PiperOrigin-RevId: 608961902
Change-Id: I41845f0005d4337f21093f53e77b4e041881120d
alimuldal committed Feb 21, 2024
1 parent f03bf63 commit e11c26c
Showing 3 changed files with 59 additions and 25 deletions.
73 changes: 48 additions & 25 deletions README.md
@@ -1,28 +1,29 @@
# Gemma

[Gemma](https://ai.google.dev/gemma) is a family of open-weights Large Language
Models (LLMs) by [Google DeepMind](https://deepmind.google/), based on Gemini
research and technology.

This repository contains an inference implementation and examples, based on
[Flax](https://github.com/google/flax) and [JAX](https://github.com/google/jax).

### Learn more about Gemma

- The [Gemma technical report](https://ai.google.dev/gemma/technical-report)
  details the models' capabilities.
- For tutorials, reference implementations in other ML frameworks, and more,
  visit https://ai.google.dev/gemma.

## Quick start

### Installation

1. To install Gemma you need to use Python 3.10 or higher.

2. Install JAX for CPU, GPU or TPU. Follow instructions at
   [the JAX website](https://jax.readthedocs.io/en/latest/installation.html).

3. Run

```
python -m venv gemma-demo
. gemma-demo/bin/activate
pip install git+https://github.com/google-deepmind/gemma.git
```

### Downloading the models

The model checkpoints are available through Kaggle at
http://kaggle.com/models/google/gemma. Please be sure to download the Flax
checkpoints, as well as the tokenizer.

### Running the unit tests

To run the unit tests, install the optional `[test]` dependencies (e.g. using
`pip install -e .[test]` from the root of the source tree), then:

```
pytest .
```

Note that the tests in `sampler_test.py` are skipped by default since no
tokenizer is distributed with the Gemma sources. To run these tests, download a
tokenizer and update the `_VOCAB` constant in `sampler_test.py`.
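
The skip behaviour described above relies on `unittest.skipIf`. A minimal, self-contained sketch of the same guard pattern (the class, test name, and message here are illustrative, not taken from the Gemma sources):

```python
import unittest

# An empty path stands in for the missing tokenizer.
_VOCAB = ''


class GuardedTest(unittest.TestCase):

  @unittest.skipIf(not _VOCAB, 'no tokenizer path configured')
  def test_needs_tokenizer(self):
    # Never reached while _VOCAB is empty.
    self.fail('should have been skipped')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GuardedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
# The guarded test is reported as skipped, not failed.
print(len(result.skipped))
```

Once `_VOCAB` points at a real tokenizer file, the condition is false and the test runs normally.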

## Examples

- [`colabs/sampling_tutorial.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/sampling_tutorial.py)
  contains a [Colab](http://colab.google) notebook with a sampling example.

- [`colabs/fine_tuning_tutorial.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/fine_tuning_tutorial.py)
  contains a [Colab](http://colab.google) with a basic tutorial on how to fine
  tune Gemma for a task, such as English to French translation.

- [`colabs/gsm8k_eval.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/gsm8k_eval.py)
  is a [Colab](http://colab.google) with a reference GSM8K eval
  implementation.

## System Requirements

Gemma can run on a CPU, GPU and TPU. For GPU, we recommend 8GB+ of GPU RAM for
the 2B checkpoint and 24GB+ of GPU RAM for the 7B checkpoint.

## Contributing

We are open to bug reports, pull requests (PR), and other contributions. Please
see [CONTRIBUTING.md](CONTRIBUTING.md) for details on PRs.

## License

Copyright 2024 DeepMind Technologies Limited

This code is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an AS IS BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

## Disclaimer

This is not an official Google product.
7 changes: 7 additions & 0 deletions gemma/sampler_test.py
@@ -15,6 +15,7 @@
"""Minimal test for sampler."""

import os
import unittest

from absl.testing import absltest
from gemma import sampler as sampler_lib
@@ -28,10 +29,15 @@
# TODO: Replace with a mock tokenizer.
# Download the tokenizer and put its path here.
_VOCAB = ''
_NO_TOKENIZER_MESSAGE = (
    'No tokenizer path specified. Please download a tokenizer and update the'
    ' `_VOCAB` constant.'
)


class SamplerTest(absltest.TestCase):

  @unittest.skipIf(not _VOCAB, _NO_TOKENIZER_MESSAGE)
  def test_samples(self):
    vocab = spm.SentencePieceProcessor()
    vocab.Load(_VOCAB)
@@ -68,6 +74,7 @@ def test_samples(self):
    result = sampler(['input string', 'hello world'], total_generation_steps=10)
    self.assertIsNotNone(result)

  @unittest.skipIf(not _VOCAB, _NO_TOKENIZER_MESSAGE)
  def test_forward_equivalence(self):
    vocab = spm.SentencePieceProcessor()
    vocab.Load(_VOCAB)
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -25,6 +25,10 @@ readme = "README.md"
absl-py = "^2.1.0"
sentencepiece = "^0.1.99"
flax = "^0.7.5"
pytest = {version = "^8.0.0", optional = true}

[tool.poetry.extras]
test = ["pytest"]

[build-system]
requires = ["poetry-core"]
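
The `[tool.poetry.extras]` table above is what makes `pip install -e .[test]` pull in `pytest` as an optional dependency. For comparison, the equivalent declaration in PEP 621 (`[project]`) metadata would look roughly like this — a sketch for illustration only, since this repository uses Poetry's own tables:

```toml
# Hypothetical PEP 621 equivalent of the Poetry extras above.
[project.optional-dependencies]
test = ["pytest>=8.0.0,<9"]
```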
