Document how to run the unit tests
The tests in `sampler_test.py` are currently skipped since no tokenizer is currently distributed with the `gemma` sources.

I also auto-formatted the `README.md` using `mdformat`.

PiperOrigin-RevId: 608961902
Change-Id: I41845f0005d4337f21093f53e77b4e041881120d
alimuldal committed Feb 21, 2024
1 parent f03bf63 commit e11c26c
Showing 3 changed files with 59 additions and 25 deletions.
73 changes: 48 additions & 25 deletions README.md
@@ -1,28 +1,29 @@
# Gemma

[Gemma](https://ai.google.dev/gemma) is a family of open-weights Large Language
Models (LLMs) by [Google DeepMind](https://deepmind.google/), based on Gemini
research and technology.

This repository contains an inference implementation and examples, based on
[Flax](https://github.com/google/flax) and [JAX](https://github.com/google/jax).

### Learn more about Gemma

- The [Gemma technical report](https://ai.google.dev/gemma/technical-report)
  details the models' capabilities.
- For tutorials, reference implementations in other ML frameworks, and more,
  visit https://ai.google.dev/gemma.

## Quick start

### Installation

1. To install Gemma you need to use Python 3.10 or higher.

2. Install JAX for CPU, GPU or TPU. Follow instructions at
   [the JAX website](https://jax.readthedocs.io/en/latest/installation.html).

3. Run

```
python -m venv gemma-demo
. gemma-demo/bin/activate
pip install git+https://github.com/google-deepmind/gemma.git
```

### Downloading the models

The model checkpoints are available through Kaggle at
http://kaggle.com/models/google/gemma. Please be sure to download the Flax
checkpoints, as well as the tokenizer.

### Running the unit tests

To run the unit tests, install the optional `[test]` dependencies (e.g. using
`pip install -e .[test]` from the root of the source tree), then:

```
pytest .
```

Note that the tests in `sampler_test.py` are skipped by default since no
tokenizer is distributed with the Gemma sources. To run these tests, download a
tokenizer and update the `_VOCAB` constant in `sampler_test.py`.
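
The skip behaviour described above relies on `unittest.skipIf`. A minimal, self-contained sketch of the same guard pattern (the class, test name, and message here are illustrative, not taken from the Gemma sources):

```python
import unittest

# An empty path stands in for the missing tokenizer.
_VOCAB = ''


class GuardedTest(unittest.TestCase):

  @unittest.skipIf(not _VOCAB, 'no tokenizer path configured')
  def test_needs_tokenizer(self):
    # Never reached while _VOCAB is empty.
    self.fail('should have been skipped')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GuardedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
# The guarded test is reported as skipped, not failed.
print(len(result.skipped))
```

Once `_VOCAB` points at a real tokenizer file, the condition is false and the test runs normally.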

## Examples

- [`colabs/sampling_tutorial.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/sampling_tutorial.py)
  contains a [Colab](http://colab.google) notebook with a sampling example.

- [`colabs/fine_tuning_tutorial.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/fine_tuning_tutorial.py)
  contains a [Colab](http://colab.google) with a basic tutorial on how to fine
  tune Gemma for a task, such as English to French translation.

- [`colabs/gsm8k_eval.py`](https://colab.sandbox.google.com/github/google-deepmind/gemma/blob/main/colabs/gsm8k_eval.py)
  is a [Colab](http://colab.google) with a reference GSM8K eval
  implementation.

## System Requirements

Gemma can run on a CPU, GPU and TPU. For GPU, we recommend 8GB+ of GPU RAM for
the 2B checkpoint and 24GB+ of GPU RAM for the 7B checkpoint.

## Contributing

We are open to bug reports, pull requests (PR), and other contributions. Please
see [CONTRIBUTING.md](CONTRIBUTING.md) for details on PRs.

## License

Copyright 2024 DeepMind Technologies Limited

This code is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an AS IS BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

## Disclaimer

This is not an official Google product.
7 changes: 7 additions & 0 deletions gemma/sampler_test.py
@@ -15,6 +15,7 @@
"""Minimal test for sampler."""

import os
import unittest

from absl.testing import absltest
from gemma import sampler as sampler_lib
@@ -28,10 +29,15 @@
# TODO: Replace with a mock tokenizer.
# Download the tokenizer and put its path here.
_VOCAB = ''
_NO_TOKENIZER_MESSAGE = (
    'No tokenizer path specified. Please download a tokenizer and update the'
    ' `_VOCAB` constant.'
)


class SamplerTest(absltest.TestCase):

  @unittest.skipIf(not _VOCAB, _NO_TOKENIZER_MESSAGE)
  def test_samples(self):
    vocab = spm.SentencePieceProcessor()
    vocab.Load(_VOCAB)
@@ -68,6 +74,7 @@ def test_samples(self):
    result = sampler(['input string', 'hello world'], total_generation_steps=10)
    self.assertIsNotNone(result)

  @unittest.skipIf(not _VOCAB, _NO_TOKENIZER_MESSAGE)
  def test_forward_equivalence(self):
    vocab = spm.SentencePieceProcessor()
    vocab.Load(_VOCAB)
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -25,6 +25,10 @@ readme = "README.md"
absl-py = "^2.1.0"
sentencepiece = "^0.1.99"
flax = "^0.7.5"
pytest = {version = "^8.0.0", optional = true}

[tool.poetry.extras]
test = ["pytest"]

[build-system]
requires = ["poetry-core"]
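
The `[tool.poetry.extras]` table above is what makes `pip install -e .[test]` pull in `pytest` as an optional dependency. For comparison, the equivalent declaration in PEP 621 (`[project]`) metadata would look roughly like this — a sketch for illustration only, since this repository uses Poetry's own tables:

```toml
# Hypothetical PEP 621 equivalent of the Poetry extras above.
[project.optional-dependencies]
test = ["pytest>=8.0.0,<9"]
```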
