Legacy code for Guild's original implementation in Erlang

Guild AI

Guild AI supplements your TensorFlow™ operations by collecting a wide range of information about your model's performance, including GPU usage, CPU usage, memory consumption, and disk IO. You can view all of this information along with your TensorFlow summary output in real time using Guild View.

Guild is used to measure model performance when running on specific systems. The data you collect can be used to optimize your model for specific applications. For example, you can collect metrics such as:

  • GPU memory usage for batch training, single inference, and batch inference
  • Inference latency and throughput
  • Impact of hyperparameter tuning on model accuracy and training time

When your model is trained, you can run it using Guild's Google Cloud Machine Learning compatible inference server. This can be used as a local dev/test environment in preparation for cloud deployment, or run in production within your own environment.
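
A Cloud ML style prediction request is a JSON document containing an "instances" list. The command below is only an illustrative sketch -- the PORT, PATH, and "image" input name are hypothetical placeholders, not Guild's documented endpoint:

$ curl -X POST http://localhost:PORT/PATH \
    -H "Content-Type: application/json" \
    -d '{"instances": [{"image": [0.0, 0.1, 0.2]}]}'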

Project status

Guild is in a pre-release "alpha" state. All command interfaces, programming interfaces, and data structures may be changed without prior notice. We'll do our best to communicate potentially disruptive changes.

Build dependencies

Guild requires the following software for compilation:

  • make (available via your Linux distribution's package manager, or via the Xcode Command Line Tools on OS X)
  • Erlang (18 or later)
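
You can verify that both are installed by checking their versions from a shell (the Erlang one-liner simply prints the installed OTP release):

$ make --version
$ erl -noshell -eval 'io:format("~s~n", [erlang:system_info(otp_release)]), halt().'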

Runtime dependencies

Guild requires the following software at runtime (i.e. when performing the model-related operations prepare, train, and evaluate).

Compiling Guild

Before building Guild, confirm that you have the required build dependencies installed (see above).

Clone the Guild repository:

$ git clone git@github.com:guildai/guild.git

Change to the Guild directory and run make:

$ cd guild
$ make

Please report any compile errors to the Guild issues list on GitHub.

Create a symlink named guild that points to guild/scripts/guild-dev and lives in a directory on your PATH. The most convenient location is /usr/local/bin (requires root access):

$ sudo ln -s GUILD_REPO/scripts/guild-dev /usr/local/bin/guild

where GUILD_REPO is the local Guild repo you cloned above.

Alternatively, create the symlink in a directory under your home directory (e.g. ~/Bin) and include that directory in your PATH environment variable:

$ ln -s GUILD_REPO/scripts/guild-dev ~/Bin/guild
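
If ~/Bin isn't already on your PATH, add it in your shell startup file (e.g. ~/.bashrc or ~/.profile):

$ export PATH="$HOME/Bin:$PATH"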

Future releases of Guild will provide precompiled packages for Linux and OS X to simplify the process of installing Guild.

Verify that Guild is available by running:

$ guild --help

If you get an error message, verify that you've completed the steps above. If you can't resolve the issue, please open an issue.

Using Guild

The easiest way to start using Guild is to run some of the examples. Clone the example repository:

$ git clone git@github.com:guildai/guild-examples.git

Change to the MNIST example and train the intro model. The example downloads the MNIST images, so it requires an initial prepare operation before any of its models can be trained:

$ cd guild-examples/mnist
$ guild prepare

This operation will take some time to download the MNIST images. When it finishes, train the intro model:

$ guild train intro

The intro example corresponds to TensorFlow's MNIST for ML Beginners. It's a very simple model and should train in a few seconds even on a CPU.

Next, run Guild View from the same directory:

$ guild view

Open http://localhost:6333 to view the training results. You should see the results of the intro training, including the model's validation accuracy, training accuracy, steps, and time. The view also includes time series charts that plot training loss, accuracy, and CPU/GPU information during the operation. Note that the training may not have run long enough in this simple case to collect system stats.

Next, train the expert version of MNIST. You can keep Guild View running during any Guild operation -- in that case, open another terminal, change to guild-examples/mnist, and run:

$ guild train expert

This model corresponds to TensorFlow's Deep MNIST for Experts example. Because it trains a multi-layer convolutional neural network, it takes longer to train.

You can view the training progress in real time in Guild View -- select the latest training operation from the dropdown selector in the top left of the View page.

You can compare the performance of multiple runs in Guild View by clicking the Compare tab. When the expert model finishes training, you can compare its validation accuracy to the intro model -- it's significantly more accurate, at the cost of a longer and more computationally expensive training run.

You can train either model using more epochs (rounds of training using the entire MNIST training set) -- this will improve validation accuracy up to a point:

$ guild train expert -F epochs=5

The -F option sets a model flag that is used by the operation. In this case we're asking the model to train for 5 epochs. You should see a slight improvement in validation accuracy -- again, at the cost of more training.
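
The same applies to the intro model (assuming it defines the same epochs flag):

$ guild train intro -F epochs=5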

Finally, evaluate the model performance using the MNIST test data:

$ guild evaluate --latest-run

This will evaluate the model trained in the latest run and print the test accuracy.

For background on why test is different from validation, see this section in TensorFlow's documentation on network retraining.

Next Steps

Documentation for Guild is in progress but not yet available. In the meantime, you may benefit from:

  • Reading the Guild examples source code
  • Using guild --help and guild COMMAND --help (see the example below)
  • Guild-enabling an existing project by running guild init and editing the generated Guild project file
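
For example, to see the options for the train command:

$ guild train --help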