docs: DOC-210: Add information about ML backend data access (#5815)
Added a section entitled "Allow the ML backend to access Label Studio
data" to the docs


Affects:
- [X] Enterprise docs
- [X] Community docs

---------

Co-authored-by: caitlinwheeless <caitlin@humansignal.com>
caitlinwheeless and caitlinwheeless committed May 2, 2024
1 parent 727f1f6 commit a1c2a00
Showing 2 changed files with 60 additions and 0 deletions.
54 changes: 54 additions & 0 deletions docs/source/guide/ml.md
@@ -142,6 +142,60 @@ Some of them work without any additional configuration. Check the **Required par
| [interactive_substring_matching](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/interactive_substring_matching) | Simple keywords search |||| None |
| [langchain_search_agent](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/langchain_search_agent) | RAG pipeline with Google Search and [Langchain](https://langchain.com/) |||| OPENAI_API_KEY, GOOGLE_CSE_ID, GOOGLE_API_KEY |

## Allow the ML backend to access Label Studio data

In most cases, you need to set the `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY` environment variables so that the ML backend can access the data in Label Studio.

Label Studio tasks can have multiple sources of resource files:

* Direct `http` and `https` links.
Example: `task['data'] = {"image": "http://example.com/photo_1.jpg"}`

* Files that have been uploaded to Label Studio using the **Import** action.
Example: `task['data'] = {"image": "https://ls-instance/data/upload/42/photo_1.jpg"}`

* Files added through a [local storage connection](storage#Local-storage).
Example: `task['data'] = {"image": "https://ls-instance/data/local-files/?d=folder/photo_1.jpg"}`

* Files added through a [cloud storage](storage) (S3, GCS, Azure) connection.
Example: `task['data'] = {"image": "s3://bucket/prefix/photo_1.jpg"}`

When Label Studio invokes the `predict(tasks)` method on an ML backend, it sends tasks containing data sub-dictionaries with links to resource files.

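Downloading files from direct `http` and `https` links (the first example above) is straightforward. The other three types (imported files, local storage files, and cloud storage files) are more complex, because the ML backend must either authenticate against Label Studio or resolve a storage URI to a URL before it can download the file.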
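
As a rough illustration, the four source types listed above can be told apart by inspecting the URL in `task['data']`. The sketch below is not part of Label Studio; `classify_resource` is a hypothetical helper, and it assumes the path prefixes shown in the examples (`/data/upload/` and `/data/local-files/`) and treats any scheme other than `http` or `https` as a cloud storage URI.

```python
from urllib.parse import urlparse


def classify_resource(url: str) -> str:
    """Guess which kind of resource a task URL points to (heuristic only)."""
    parsed = urlparse(url)
    # Assumption: anything that is not http(s), such as s3://, is a cloud storage URI.
    if parsed.scheme not in ("http", "https"):
        return "cloud storage URI"
    # Assumption: these path prefixes match the examples in this section.
    if parsed.path.startswith("/data/upload/"):
        return "uploaded file"
    if parsed.path.startswith("/data/local-files/"):
        return "local storage file"
    return "direct link"


if __name__ == "__main__":
    for example in (
        "http://example.com/photo_1.jpg",
        "https://ls-instance/data/upload/42/photo_1.jpg",
        "https://ls-instance/data/local-files/?d=folder/photo_1.jpg",
        "s3://bucket/prefix/photo_1.jpg",
    ):
        print(example, "->", classify_resource(example))
```

Only the `direct link` case can be fetched without credentials or URI resolution; the other three are the cases that `get_local_path()` is meant to handle.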

To address this, the ML backend uses the `get_local_path(url, task_id)` function from the `label_studio_tools` package (this package is pre-installed with `label-studio-ml-backend`):

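```python
from label_studio_ml.model import LabelStudioMLBase
from label_studio_tools.core.utils.io import get_local_path


class MLBackend(LabelStudioMLBase):
    def predict(self, tasks, **kwargs):
        task = tasks[0]
        # Resolve the task's resource link to a locally cached file
        local_path = get_local_path(task['data']['image'], task_id=task['id'])
        # Open in binary mode, since the resource here is an image
        with open(local_path, 'rb') as f:
            f.read()
```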

The `get_local_path()` function resolves URIs to URLs, then downloads and caches the file.

For this to work, you must specify the `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY` environment variables for your ML backend before using `get_local_path`.

Note the following:

* `LABEL_STUDIO_URL` must be accessible from the ML backend instance.

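* If you are running the ML backend in Docker, `LABEL_STUDIO_URL` can’t contain `localhost`, because inside a container `localhost` refers to the container itself. Use the full IP address of the machine running Label Studio instead. You can find it using the `ifconfig` (Unix) or `ipconfig` (Windows) commands.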

* `LABEL_STUDIO_URL` must start either with `http://` or `https://`.
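
The rules above can be checked before the backend starts. The following is a minimal sketch, not something shipped with `label-studio-ml-backend`: `check_label_studio_env` and its `in_docker` flag are hypothetical names, and treating `127.0.0.1` the same as `localhost` is an assumption added here. It does not test whether the URL is actually reachable.

```python
import os
from urllib.parse import urlparse


def check_label_studio_env(env=os.environ, in_docker=False):
    """Return a list of problems found in the Label Studio connection settings."""
    problems = []
    url = env.get("LABEL_STUDIO_URL", "")
    api_key = env.get("LABEL_STUDIO_API_KEY", "")

    if not url:
        problems.append("LABEL_STUDIO_URL is not set")
    elif not url.startswith(("http://", "https://")):
        problems.append("LABEL_STUDIO_URL must start with http:// or https://")
    # Assumption: 127.0.0.1 is rejected as well, since it is the loopback address.
    elif in_docker and urlparse(url).hostname in ("localhost", "127.0.0.1"):
        problems.append("LABEL_STUDIO_URL must not use localhost inside Docker")

    if not api_key:
        problems.append("LABEL_STUDIO_API_KEY is not set")
    return problems


if __name__ == "__main__":
    # Report any problems with the current process environment.
    for problem in check_label_studio_env():
        print("Configuration problem:", problem)
```

An empty list means the variables pass these basic checks. Pass `in_docker=True` when the backend runs in a container.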

To find your `LABEL_STUDIO_API_KEY`, open Label Studio and go to your [user account page](user_account#Access-token).

<div class="enterprise-only">

Note that your user must also have access to the project that you are connecting to the ML backend.

</div>

## Model training

6 changes: 6 additions & 0 deletions docs/source/guide/ml_create.md
@@ -149,6 +149,11 @@ Other methods and parameters are available within the `LabelStudioMLBase` class:
- `self.label_interface` - Returns the Label Studio Label Interface object that contains all information about the labeling task.
- `self.model_version` - Returns the current model version.

## 4. Ensure the ML backend can access Label Studio data

If your data is stored in cloud storage or a local directory, or has been imported into Label Studio, you need to set the `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY` environment variables.

For more information, see [Allow the ML backend to access Label Studio data](ml#Allow-the-ML-backend-to-access-Label-Studio-data).

## 5. Run the ML backend server

@@ -179,6 +184,7 @@ To modify the host and port, use the following command line parameters:

```bash
label-studio-ml start my_ml_backend -p 9091 --host 0.0.0.0
```

### Test your ML backend
