Additional tweaks to task retry docs (PrefectHQ#9575)
abrookins committed May 23, 2023
1 parent 9c6adc8 commit 4c56a9a
Showing 1 changed file with 49 additions and 12 deletions: docs/concepts/tasks.md

## Retries

Prefect can automatically retry tasks on failure. In Prefect, a task _fails_ if
its Python function raises an exception.

To enable retries, pass the `retries` and `retry_delay_seconds` parameters to your
task. If the task fails, Prefect will retry it up to `retries` times, waiting
`retry_delay_seconds` seconds between each attempt. If the task still fails on the
final retry, Prefect marks the task run as _failed_.

!!! note "Retries don't create new task runs"
A new task run is not created when a task is retried. A new state is added to the state history of the original task run.
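
For instance, here is a minimal sketch of a task that fails twice and then succeeds (the `flaky_task` name and module-level counter are illustrative, not part of the Prefect API):

```python
from prefect import flow, task

attempts = {"count": 0}

@task(retries=2, retry_delay_seconds=1)
def flaky_task() -> str:
    # Fail on the first two attempts; succeed on the third. Each failure
    # adds a new state to the same task run rather than creating a new run.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError(f"Attempt {attempts['count']} failed")
    return "success"

@flow
def flaky_flow() -> str:
    return flaky_task()
```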


### A real-world example: making an API request

Consider the real-world problem of making an API request. In this example,
we'll use the [`httpx`](https://www.python-httpx.org/) library to make an HTTP
request.

```python hl_lines="4"
import httpx
from prefect import flow, task

@task(retries=2, retry_delay_seconds=5)
def get_data_task(
    url: str = "https://api.brittle-service.com/endpoint"
) -> dict:
    response = httpx.get(url)

    # If the response status code is anything but a 2xx, httpx will raise
    # an exception. This task doesn't handle the exception, so Prefect will
    # catch the exception and will consider the task run failed.
    response.raise_for_status()

    return response.json()

@flow
def get_data_flow():
    get_data_task()
```

In this task, if the brittle API responds with any status code other than a 2xx
(200, 201, and so on), `httpx` raises an exception, the task fails, and Prefect
will retry the task a maximum of two times, waiting five seconds in between
retries.
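
Note that Prefect only retries the task because `raise_for_status()` lets the exception propagate. As a hypothetical counterexample (the `get_data_no_retry` name is illustrative), a task that catches the error and returns a fallback value never fails, so the retry settings never apply:

```python
import httpx
from prefect import task

@task(retries=2, retry_delay_seconds=5)
def get_data_no_retry(
    url: str = "https://api.brittle-service.com/endpoint"
) -> dict:
    response = httpx.get(url)

    if response.is_error:
        # Swallowing the error means this function never raises an exception,
        # so Prefect marks the task run completed and never retries it.
        return {}

    return response.json()
```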

### Custom retry behavior

The `retry_delay_seconds` option also accepts a list of delays for finer-grained control over retry behavior. The following task will wait successively longer intervals of 1, 10, and 100 seconds before each retry attempt:

```python
from prefect import task

@task(retries=3, retry_delay_seconds=[1, 10, 100])
def some_task_with_manual_backoff_retries():
    ...
```
Additionally, you can pass a callable that accepts the number of retries as an argument and returns a list. Prefect includes an [`exponential_backoff`](/api-ref/prefect/tasks/#prefect.tasks.exponential_backoff) utility that will automatically generate a list of retry delays that correspond to an exponential backoff retry strategy. The following task will wait 10, 20, then 40 seconds before each retry:

```python
from prefect import task
from prefect.tasks import exponential_backoff

@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=10))
def some_task_with_exponential_backoff_retries():
    ...
```
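
Because `retry_delay_seconds` accepts any callable of this shape, you can also define your own schedule. Here is a minimal sketch (the `custom_backoff` helper is hypothetical, not part of Prefect) that produces the same 10, 20, and 40 second delays:

```python
from prefect import task

def custom_backoff(retries: int) -> list[int]:
    # Double a base delay of 10 seconds for each retry: 10, 20, 40, ...
    return [10 * 2**attempt for attempt in range(retries)]

@task(retries=3, retry_delay_seconds=custom_backoff)
def some_task_with_custom_backoff_retries():
    ...
```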

#### Advanced topic: adding "jitter"

While using exponential backoff, you may also want to add _jitter_ to the delay
times. Jitter is a random amount of time added to retry periods that helps prevent
"thundering herd" scenarios, in which many tasks all retry at exactly the same
time, potentially overwhelming systems.

The `retry_jitter_factor` option can be used to add variance to the base delay. For example, a retry delay of 10 seconds with a `retry_jitter_factor` of 0.5 will be allowed to delay up to 15 seconds. Large values of `retry_jitter_factor` provide more protection against "thundering herds," while keeping the average retry delay time constant. For example, the following task adds jitter to its exponential backoff so the retry delays will vary up to maximum delays of 20, 40, and 80 seconds, respectively.

```python
from prefect import task
from prefect.tasks import exponential_backoff

@task(
    retries=3,
    retry_delay_seconds=exponential_backoff(backoff_factor=10),
    # A jitter factor of 1 allows each delay to grow up to double its base.
    retry_jitter_factor=1,
)
def some_task_with_exponential_backoff_retries():
    ...
```
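
To make the jitter arithmetic concrete, the following sketch models a delay calculation that keeps the mean constant while allowing delays up to `base * (1 + factor)`. It illustrates the behavior described above and is not Prefect's exact internal formula:

```python
import random

def jittered_delay(base_delay: float, jitter_factor: float) -> float:
    # Sample uniformly from [base * (1 - f), base * (1 + f)]: the average
    # stays at base_delay, while the maximum grows to base * (1 + f).
    return base_delay * random.uniform(1 - jitter_factor, 1 + jitter_factor)

print(jittered_delay(10, 0.5))  # between 5.0 and 15.0 seconds
```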

### Configuring retry behavior globally with settings

You can also set retries and retry delays by using the following global settings. These settings will not override the `retries` or `retry_delay_seconds` that are set in the flow or task decorator.

```
prefect config set PREFECT_FLOW_DEFAULT_RETRIES=2
prefect config set PREFECT_TASK_DEFAULT_RETRIES=2
prefect config set PREFECT_FLOW_DEFAULT_RETRY_DELAY_SECONDS="[1, 10, 100]"
prefect config set PREFECT_TASK_DEFAULT_RETRY_DELAY_SECONDS="[1, 10, 100]"
```
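
After setting these values, you can confirm the active configuration with the Prefect CLI:

```
prefect config view
```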


## Caching

Caching refers to the ability of a task run to reflect a finished state without actually running the code that defines the task. This allows you to efficiently reuse results of tasks that may be expensive to run with every flow run, or reuse cached results if the inputs to a task have not changed.
