PrefectHTTPStatusError: Client error '429 Too Many Requests' for url #9723

Closed
4 tasks done
BitTheByte opened this issue May 25, 2023 · 11 comments · Fixed by #9724
Labels
bug Something isn't working

Comments

@BitTheByte
Contributor

BitTheByte commented May 25, 2023

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

While using Prefect with prefect-dask, I encountered a rate limit error. This shouldn't happen, since the Prefect base client should retry on 429 responses. I'm not sure why this is happening, but it first appeared in 2.10.10 and did not occur before.

Reproduction

Any Flow with prefect-dask

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/distributed/client.py", line 1697, in _close
    await self.scheduler_comm.close()
asyncio.exceptions.CancelledError
01:00:08.452 | ERROR   | Flow run 'psi5-alastria-x' - Crash detected! Execution was interrupted by an unexpected exception: PrefectHTTPStatusError: Client error '429 Too Many Requests' for url 'https://cloud-url/task_runs/'
Response: {'detail': 'Orchestration API rate limit reached'}
For more information check: https://httpstatuses.com/429

Versions

Version:             2.10.10
API version:         0.8.4
Python version:      3.11.2
Git commit:          8159450b
Built:               Thu, May 18, 2023 3:43 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         server

Additional context

No response

@BitTheByte BitTheByte added bug Something isn't working status:triage labels May 25, 2023
@BitTheByte
Contributor Author

BitTheByte commented May 25, 2023

For some reason, the Prefect Cloud client is not using the Prefect base client?

https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client/cloud.py#L59

@ghost

ghost commented May 25, 2023

We've also encountered this issue today, with many of our flows failing. I realize now, thanks to @BitTheByte's comments, that we inadvertently upgraded to Prefect 2.10.10.

@BitTheByte
Contributor Author

Turns out this was the problem. I've updated our pipelines to use my patched version, and everything seems fine.

@zanieb
Contributor

zanieb commented May 25, 2023

Hm, I'm a bit confused. The Cloud client is only used for authentication. The normal client is used for all orchestration behavior, including the task_runs route you shared logs for. It seems unlikely that the lack of retries on the Cloud client is the issue. It seems more likely that you encountered actual rate limits? We'll only retry so many times on 429, and we have been making adjustments to rate limit behavior in the backend to address some abusive behavior.

@BitTheByte
Contributor Author

BitTheByte commented May 25, 2023

Hey @madkinsz, I tested the code a bit before submitting the PR. For some reason, Prefect 2.10.10 is using the Cloud client during orchestration. I believe this only happens if the flow is run in the prefect.engine -m style.

@zanieb
Contributor

zanieb commented May 25, 2023

@BitTheByte that doesn't make sense to me; a search of Cloud client usage does not show it being used in the engine. Can you demonstrate that the Cloud client is actually being used?

@BitTheByte
Contributor Author

BitTheByte commented May 25, 2023

I fully accept that it doesn't make any sense, as the client logic lives in PrefectClient. But Prefect 2.10.8 works and 2.10.10 doesn't, and this weird change fixes it. Why? I have no idea.

@BitTheByte
Contributor Author

@madkinsz Please reopen this; the issue is still happening. You were right that I was encountering a rate limit, but shouldn't the client sleep until the Retry-After period has elapsed? I also suggest allowing users to change the max retries value https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client/base.py#L167 since the default of 5 doesn't work properly.
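
To illustrate what waiting out a Retry-After header involves, here is a minimal sketch using only the Python standard library. It is not taken from Prefect's client; the function name retry_after_seconds and its now parameter are hypothetical, and it assumes the header carries either a whole number of seconds or an HTTP date, the two forms the HTTP specification allows.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return how many seconds to wait for a Retry-After header value.

    Hypothetical helper, not part of Prefect. Returns None when the
    header is missing or cannot be parsed.
    """
    if not header_value:
        return None
    value = header_value.strip()

    # Form 1: a number of seconds, e.g. "30".
    try:
        return max(0.0, float(int(value)))
    except ValueError:
        pass

    # Form 2: an HTTP date, e.g. "Thu, 25 May 2023 01:00:38 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (retry_at - now).total_seconds())
```

A caller would sleep for the returned number of seconds before retrying, and fall back to its own delay when the result is None.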

@zanieb
Contributor

zanieb commented May 25, 2023

@BitTheByte we will retry up to 5 times by default with an exponential back-off. You can adjust the retry count following #9735.
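
As a rough illustration of the retry policy described above (up to 5 retries with exponential back-off), the following sketch shows the general pattern. It is not Prefect's implementation: send_with_retries, the (status, headers) tuple standing in for a response, and the 2-second base delay are all assumptions made for the example.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical stand-in for a real HTTP response: (status code, headers).
Response = Tuple[int, Mapping[str, str]]


def send_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 2.0,  # assumed value, not Prefect's actual setting
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on HTTP 429 with exponential back-off.

    The delay doubles after every rate-limited attempt. Once max_retries
    retries are used up, the last 429 response is returned so the caller
    can raise or handle it.
    """
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429 or attempt == max_retries:
            return status, headers
        # Waits base_delay, then 2x, 4x, 8x, ... of base_delay.
        sleep(base_delay * 2**attempt)
    raise ValueError("max_retries must be non-negative")
```

With these assumed defaults, a request that keeps being rate limited waits 2, 4, 8, 16 and 32 seconds (about a minute in total) before giving up, which is why a large burst of task runs can still surface a 429 even though retries are in place.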

If you want higher rate limits, please contact our sales team. We need to enforce rate limits to ensure that Cloud is usable by everyone.

@sarahmk125
Contributor

@BitTheByte I encourage you to book a time to chat with our product advocate team if you haven't already. We'd like to understand your use case better:
https://calendly.com/prefect-experts/rubber-duck

@BitTheByte
Contributor Author

Hey @sarahmk125, thanks for your interest. I'm using Prefect at what I would call a large scale (~500 machines), so I'm always hitting the extreme limits. I also believe I may have caused some headaches for the backend folks 😃. I'd love to book a time with the team to discuss this further.
