Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft proposal for new kernel providers mechanism and associated changes #45

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
126 changes: 126 additions & 0 deletions kernel-discovery/kernel-discovery.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# New kernel discovery framework

## Problem

It's common for a programmer to have several separate 'environments', with
different packages installed, on one computer. Virtualenv, conda and pyenv are
popular tools to do this for the Python language, while container systems
such as Docker and Singularity can be used for a similar purpose.

The 'kernelspecs' system was designed to tell Jupyter about roughly
one kernel for each programming language. With care, it's possible to use it
to describe kernels running the same language in different environments, but
there are pitfalls which result in user confusion. Symptoms include:

- You have installed a package, but you can't import it, or you load the wrong
version, because the kernel is not running in the environment you think.
- Your kernel won't start at all, because you have a kernelspec created from
an environment you later deleted.

Packages such as [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels)
manage available kernels differently. However, they replace the default
kernelspec system, rather than adding to it. Some also affect only the
Jupyter Notebook server application, so the available kernels are inconsistent
across applications.

## Proposed Enhancement

Three related changes are proposed to improve the situation:

1. A new extension mechanism, *kernel providers*, which allows more than one
installed package to discover and launch kernels, along with a namespacing
mechanism for kernel types.
2. A convention that *kernel type IDs*, unique names identifying specific kernel
types in the provider system, should be exposed in user interfaces where
kernels are listed.
3. Relaxing the relationship between a notebook file and a specific named
kernel, which may not exist when the notebook is sent to another computer.

## Detail Explanation

### Kernel providers mechanism

This has been developed experimentally in two repositories, which this proposal
would move to the Jupyter organisation on GitHub:

- https://github.com/takluyver/jupyter_kernel_mgmt
- https://github.com/takluyver/jupyter_protocol

More details are available at those repositories, and in the
[jupyter_kernel_mgmt docs](https://jupyter-kernel-mgmt.readthedocs.io/en/latest/),
but a summary follows.

Kernel providers are discovered by Python
[entry points](https://entrypoints.readthedocs.io/en/latest/), which can be
installed as part of any Python package. Each provider has an ID, a method to
list kernels that it knows about, and a method to launch a kernel of a named
type.

Kernel type IDs are a combination of the provider's ID, and the provider's name
for that kernel. For instance, a built-in kernel provider called `spec`
uses the existing kernelspec mechanism, and it may advertise a kernel name
`python3`. The kernel type ID would be `spec/python3`. Another kernel type from
another provider may have ID `conda/env-py38`.

Orthogonally to this proposal, the new kernel management APIs are built to be
asynchronous by default (with blocking wrappers), and include a mechanism to
pass parameters to a kernel at launch.

### Exposing kernel type IDs in UI

Kernelspecs include a 'display name' field, e.g. `"Python 3"`, meant to be used
in UI in preference to the unique name, e.g. `"python3"`.
However, experience has shown that it's easy to end up with multiple kernel
types with the same display name, which are then impossible to distinguish in UI.
The kernel provider framework will likely exacerbate this.

This proposal strongly recommends that applications make the unique kernel
type IDs (e.g. `spec/python3`) fully visible in the user interface.
This could be presented alongside display names or other useful information.

Using semi-readable unique names is familiar to programmers - we do this all
the time with variables, filenames, URLs, etc. It was a mistake to hide this
simple concept behind a confusing 'user friendly' veneer.

### Don't name kernels in notebook metadata

When we thought of kernels as roughly one per language, it made sense to
name the kernel in notebook metadata - a Julia notebook should obviously have a
Julia kernel.

As environments proliferate and notebooks are widely shared, this is a problem.
You may create a notebook with a kernel named after one of your environments,
then share it with a coworker who doesn't have an environment with that name.
Or you may run a notebook in a command line tool, and be surprised when it
ignores the environment you have chosen in the shell.

We hope that freeing tools from the constraint of a kernel named in the file
will allow them to experiment with different ways of finding or
creating a suitable kernel to run the notebook. We offer the following
guidelines for this:

1. In most cases, it's still important to match the programming language of the
code in the notebook with the kernel that will run it.
2. The active environment is often a big hint, especially for command-line
tools. If you have activated environment X and you run something which
decides to execute code in environment Y, there's a good chance that's a
nasty surprise.
3. *In the face of ambiguity, refuse the temptation to guess.* (borrowed from
[the Zen of Python](https://www.python.org/dev/peps/pep-0020/))

## Pros and Cons

Pros associated with this implementation include:
* Hopefully, in the long run, more transparency about what kernels are available
and which is used.
* Better foundations for remote kernels, kernels inside containers, etc.

Cons associated with this implementation include:
* Extra complexity - this keeps the kernelspec mechanism (for compatibility)
while building an extra layer of public API above that.

## Interested Contributors

Thomas Kluyver, Kevin Bates

(Feel free to add to this list)