Flow using ECSTask Infra Block remains in Pending state forever #156
Comments
Can you share how you configured your agent and ECS task infra block? This got a little sparser than I hoped; not even the Prefect version is showing up here 😅 Can you add that? We want to reproduce the issue first before we can fix it. |
Prefect version 2.6.7 prefect-aws 0.1.8 Agent is
Then I run the command |
ECSTask block
def build_and_save_ecs_task(repo_name: str):
    account_number = Sts.get_caller_identity()
    ecs = ECSTask(
        name=repo_name,
        image=f"{account_number}.dkr.ecr.ap-southeast-2.amazonaws.com/prefect-{repo_name}:latest",
        cpu=512,
        memory=256,
        **{
            "stream_output": True,
            "configure_cloudwatch_logs": True,
            "cluster": "prefect-cluster",
            "execution_role_arn": "arn:aws:iam::XXX:role/prefect-execution-role",
            "task_role_arn": "arn:aws:iam::XXX:role/prefect-task-role",
            "launch_type": "FARGATE",
            "vpc_id": "vpc-XXX",
            "env": {"PYTHONPATH": "$PYTHONPATH:.", "AWS_RETRY_MODE": "adaptive", "AWS_MAX_ATTEMPTS": "100"},
            "task_customizations": [
                {"op": "replace", "path": "/networkConfiguration/awsvpcConfiguration/assignPublicIp", "value": "DISABLED"},
                {
                    "op": "add",
                    "path": "/networkConfiguration/awsvpcConfiguration/subnets",
                    "value": ["subnet-XXX", "subnet-XXX", "subnet-XXX"],
                },
                {
                    "op": "add",
                    "path": "/networkConfiguration/awsvpcConfiguration/securityGroups",
                    "value": ["sg-XXX"],
                },
            ],
        },
    )
    ecs.save(name=repo_name, overwrite=True) |
@bennnym Do you have debug level agent logs? |
No I don't. I am about to wait and see on the hour what happens and I will paste the agent logs here. I think they are just info, if that is what it defaults to? |
For the record I have 8 flows scheduled to run at the hour, every hour. |
|
Of these 8 flows, 6 are now permanently in pending and 2 ran successfully |
I also get quite a few of these errors, but I don't think they are related
|
Ah interesting. Why are they in Pending states instead of Scheduled? The work queue should not even be returning these. We allow Scheduled -> Pending transitions but not Pending -> Pending, as that would indicate two agents attempted to submit the same run at once. |
I added an extra agent to see if that was the issue. George told me that I
can have multiple agents and queues with the same name?
So maybe those error logs are unrelated to the issue?
|
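The transition rule described above (Scheduled -> Pending is allowed, Pending -> Pending is rejected) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `StateType` enum, the `ALLOWED` table, and the `propose` and `submit` helpers are hypothetical names made up for illustration, not Prefect's actual orchestration code, which runs server-side and covers many more states.

```python
from enum import Enum


class StateType(Enum):
    SCHEDULED = "Scheduled"
    PENDING = "Pending"
    RUNNING = "Running"


# Hypothetical transition table for illustration only; the real
# orchestration rules live in the Prefect server and cover more states.
ALLOWED = {
    (StateType.SCHEDULED, StateType.PENDING),
    (StateType.PENDING, StateType.RUNNING),
}


def propose(current: StateType, proposed: StateType) -> bool:
    """Return True if the proposed transition is in the allowed table."""
    return (current, proposed) in ALLOWED


def submit(run_states: dict, run_id: str, agent: str) -> str:
    """Simulate an agent claiming a run by proposing a Pending state."""
    if propose(run_states[run_id], StateType.PENDING):
        run_states[run_id] = StateType.PENDING
        return f"{agent}: claimed {run_id}"
    return f"{agent}: rejected, {run_id} is already {run_states[run_id].value}"


# Two agents polling the same queue both try to claim the same run.
runs = {"flow-run-1": StateType.SCHEDULED}
first = submit(runs, "flow-run-1", "agent-a")
second = submit(runs, "flow-run-1", "agent-b")
print(first)
print(second)
```

In this model the second agent's proposal is rejected because the run is already Pending, which matches the kind of error seen when two agents poll the same work queue.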
So I reverted back to one agent, and I can confirm that the error
regarding PENDING was between agents. I am not able to test the primary
issue again because I have had to run my Prefect 1.0 stuff again so that
our business operates over the weekend.
I might be able to get a POC of the issue on Monday morning.
|
I dug a bit deeper to see if this was just a UI issue. I can confirm it is
not a UI issue. It is continuing to happen today.
|
GOOD NEWS!
I upped my agent's resources, as I thought that could be a possible
issue. The agent was running with 512 CPU and 1024 memory.
I changed it to 1024 CPU and 4096 memory.
The orchestration seems to be working again! I just ran 9 flows that were all
scheduled on the hour.
Maybe we can add something to the docs about this?
|
btw Ben, you could leverage work queue concurrency limits here to ensure that those runs triggered simultaneously will be queued up without overwhelming the agent |
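To show why a work queue concurrency limit helps here, the following is a rough toy model, not Prefect code. The `drain_with_limit` function and the flow names are hypothetical; the sketch assumes every run in a batch finishes before the next poll. In Prefect 2 the limit itself is set on the work queue (for example through the UI or the `prefect work-queue set-concurrency-limit` CLI command; check the docs for your version).

```python
from collections import deque


def drain_with_limit(scheduled_runs, concurrency_limit):
    """Yield the batches of runs an agent would pick up on each poll.

    Toy model: every run in a batch finishes before the next poll, so
    each poll hands out at most `concurrency_limit` runs.
    """
    if concurrency_limit < 1:
        raise ValueError("concurrency_limit must be at least 1")
    queue = deque(scheduled_runs)
    while queue:
        batch_size = min(concurrency_limit, len(queue))
        yield [queue.popleft() for _ in range(batch_size)]


# Eight flows all scheduled at the top of the hour, as in this issue.
runs = [f"flow-{i}" for i in range(1, 9)]
batches = list(drain_with_limit(runs, concurrency_limit=3))
for number, batch in enumerate(batches, start=1):
    print(f"poll {number}: {batch}")
```

With a limit of 3, the eight simultaneous runs are handed out over three polls instead of all at once, so the agent never has to submit more than three runs at a time.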
Closing the issue because we clarified the issue and added docs, thanks for all the help here, Ben |
During my migration to 2.0 I have noticed that if multiple flows are scheduled at the same time, e.g. they all have a schedule of 0 * * * *, only one or two of the flows will run, and the rest will remain in the Pending state forever.
I currently have one agent and one queue set up.
I am going to explore and see if setting up multiple agents may solve the problem.