This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Flow using ECSTask Infra Block remains in Pending state forever #156

Closed
bennnym opened this issue Nov 18, 2022 · 16 comments

Comments

@bennnym

bennnym commented Nov 18, 2022

During my migration to 2.0 I have noticed a problem when multiple flows are scheduled at the same time, e.g. when they all have a schedule of 0 * * * *.

Only one or two of the flows will run, and the rest will remain in the Pending state forever.

I currently have one agent and one queue set up.

I am going to explore and see if setting up multiple agents may solve the problem.

@anna-geller
Contributor

Can you share how you configured your agent and ECS task infra block? This issue came through sparser than I hoped, not even the Prefect version is showing up here 😅 Can you add that? We want to reproduce the issue first before we can fix it.

@bennnym
Author

bennnym commented Nov 18, 2022

Prefect version 2.6.7, prefect-aws 0.1.8

Agent is

FROM prefecthq/prefect:2.6.7-python3.9

RUN pip install prefect-aws==0.1.8

Then I run the command prefect agent start -q 'prod'

@bennnym
Author

bennnym commented Nov 18, 2022

ECSTask block

def build_and_save_ecs_task(repo_name: str):
    account_number = Sts.get_caller_identity()

    ecs = ECSTask(
        name=repo_name,
        image=f"{account_number}.dkr.ecr.ap-southeast-2.amazonaws.com/prefect-{repo_name}:latest",
        cpu=512,
        memory=256,
        **{
            "stream_output": True,
            "configure_cloudwatch_logs": True,
            "cluster": "prefect-cluster",
            "execution_role_arn": "arn:aws:iam::XXX:role/prefect-execution-role",
            "task_role_arn": "arn:aws:iam::XXX:role/prefect-task-role",
            "launch_type": "FARGATE",
            "vpc_id": "vpc-XXX",
            "env": {"PYTHONPATH": "$PYTHONPATH:.", "AWS_RETRY_MODE": "adaptive", "AWS_MAX_ATTEMPTS": "100"},
            "task_customizations": [
                {"op": "replace", "path": "/networkConfiguration/awsvpcConfiguration/assignPublicIp", "value": "DISABLED"},
                {
                    "op": "add",
                    "path": "/networkConfiguration/awsvpcConfiguration/subnets",
                    "value": ["subnet-XXX", "subnet-XXX", "subnet-XXX"],
                },
                {
                    "op": "add",
                    "path": "/networkConfiguration/awsvpcConfiguration/securityGroups",
                    "value": ["sg-XXX"],
                },
            ],
        },
    )
    ecs.save(name=repo_name, overwrite=True)

@zanieb
Contributor

zanieb commented Nov 18, 2022

@bennnym Do you have debug level agent logs?

@bennnym
Author

bennnym commented Nov 18, 2022

No I don't.

I am about to wait and see on the hour what happens and I will paste the agent logs here.

I think they are just info, if that is what it defaults to?
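For anyone reproducing this: a minimal sketch of starting the same agent with debug output, assuming the agent honours the `PREFECT_LOGGING_LEVEL` setting in this Prefect version. The helper name `agent_debug_invocation` is made up for illustration; only the `prefect agent start -q 'prod'` command comes from this thread.

```python
import os


def agent_debug_invocation(queue: str = "prod"):
    """Build the command and environment for a debug-level agent.

    Hypothetical helper: it only prepares the invocation, it does not
    start anything.
    """
    env = dict(os.environ)
    # Assumption: PREFECT_LOGGING_LEVEL controls the agent's log level.
    env["PREFECT_LOGGING_LEVEL"] = "DEBUG"
    command = ["prefect", "agent", "start", "-q", queue]
    return command, env


command, env = agent_debug_invocation("prod")
# To actually launch the agent:
#   import subprocess
#   subprocess.run(command, env=env, check=True)
```

Exporting the same variable in the container (for example with an `ENV` line in the Dockerfile) should have the same effect without any Python wrapper.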

@bennnym
Author

bennnym commented Nov 18, 2022

For the record I have 8 flows scheduled to run at the hour, every hour.

@bennnym
Author

bennnym commented Nov 18, 2022



04:59:54.473 | INFO    | prefect.agent - Submitting flow run
'1c168b64-c542-4ecc-be86-fec3bf879a39'
04:59:54.474 | INFO    | prefect.agent - Submitting flow run
'56e37fcc-bf15-4c81-b05f-0955f1f44f58'
04:59:54.475 | INFO    | prefect.agent - Submitting flow run
'8fa259e7-6d51-4e61-9a4f-79f959f93783'
04:59:54.475 | INFO    | prefect.agent - Submitting flow run
'96242353-061f-43b0-927e-fe693b539ac5'
04:59:54.476 | INFO    | prefect.agent - Submitting flow run
'a98eeec6-d794-4084-b7ed-a2f2462d1be2'
04:59:54.476 | INFO    | prefect.agent - Submitting flow run
'ab9d2ee5-6132-417c-8575-48746bac4a7b'
04:59:54.477 | INFO    | prefect.agent - Submitting flow run
'bb535ea6-ec31-474e-b0d2-b0d1d4ea50f6'
04:59:54.478 | INFO    | prefect.agent - Submitting flow run
'c874d6e6-68e6-419a-bd32-14db60def23c'
05:00:00.013 | INFO    | prefect.agent - Aborted submission of flow run
'a98eeec6-d794-4084-b7ed-a2f2462d1be2'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.016 | INFO    | prefect.agent - Aborted submission of flow run
'c874d6e6-68e6-419a-bd32-14db60def23c'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.017 | INFO    | prefect.agent - Aborted submission of flow run
'1c168b64-c542-4ecc-be86-fec3bf879a39'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.027 | INFO    | prefect.agent - Aborted submission of flow run
'ab9d2ee5-6132-417c-8575-48746bac4a7b'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.031 | INFO    | prefect.agent - Aborted submission of flow run
'56e37fcc-bf15-4c81-b05f-0955f1f44f58'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.055 | INFO    | prefect.agent - Aborted submission of flow run
'96242353-061f-43b0-927e-fe693b539ac5'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.090 | INFO    | prefect.agent - Aborted submission of flow run
'bb535ea6-ec31-474e-b0d2-b0d1d4ea50f6'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:00.235 | INFO    | prefect.agent - Aborted submission of flow run
'8fa259e7-6d51-4e61-9a4f-79f959f93783'. Server sent an abort signal: This run
cannot transition to the PENDING state from the PENDING state.
05:00:53.069 | INFO    | prefect.agent - Submitting flow run
'38a2d94b-81f5-4689-9e08-a4025075221a'


@bennnym
Author

bennnym commented Nov 18, 2022

Of these 8 flows, 6 are now permanently in pending and 2 ran successfully

@bennnym
Author

bennnym commented Nov 18, 2022

I also get quite a few of these errors, but I don't think they are related:



05:10:41.627 | ERROR   | prefect.agent - An error occured while monitoring flow
run '5e1eface-72b5-4040-9be3-fe8c2c1b585f'. The flow run will not be marked as
failed, but an issue may have occurred.

@zanieb
Contributor

zanieb commented Nov 18, 2022

Ah, interesting. Why are they in Pending states instead of Scheduled? The work queue should not even be returning these. We allow Scheduled -> Pending transitions but not Pending -> Pending, as that would indicate two agents attempted to submit the same run at once.
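To make the transition rule concrete, here is a toy sketch of the check described above. It is not Prefect's actual orchestration code: `StateType`, `ALLOWED_TRANSITIONS`, `AbortSignal` and `propose_state` are illustrative names, and only the Scheduled -> Pending / Pending -> Pending behaviour and the wording of the abort message are taken from this thread.

```python
from enum import Enum


class StateType(Enum):
    SCHEDULED = "SCHEDULED"
    PENDING = "PENDING"
    RUNNING = "RUNNING"


# Illustrative subset of transitions; the real rule set is larger.
ALLOWED_TRANSITIONS = {
    (StateType.SCHEDULED, StateType.PENDING),
    (StateType.PENDING, StateType.RUNNING),
}


class AbortSignal(Exception):
    """Raised when the (toy) server rejects a proposed state."""


def propose_state(current: StateType, proposed: StateType) -> StateType:
    """Accept the proposed state or abort, mirroring the log message."""
    if (current, proposed) not in ALLOWED_TRANSITIONS:
        raise AbortSignal(
            f"This run cannot transition to the {proposed.name} state "
            f"from the {current.name} state."
        )
    return proposed


# First submission: Scheduled -> Pending is accepted.
state = propose_state(StateType.SCHEDULED, StateType.PENDING)

# A second submission of the same run is rejected.
try:
    propose_state(state, StateType.PENDING)
    second_attempt = "accepted"
except AbortSignal as exc:
    second_attempt = str(exc)
```

In this sketch the second attempt produces the same message the agent logged, which is what one would expect if a run that is already Pending gets handed to an agent again.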

@bennnym
Author

bennnym commented Nov 18, 2022 via email

@bennnym
Author

bennnym commented Nov 18, 2022 via email

@bennnym
Author

bennnym commented Nov 18, 2022 via email

@bennnym
Author

bennnym commented Nov 19, 2022 via email

@anna-geller
Contributor

btw Ben, you could leverage work queue concurrency limits here to ensure that those runs triggered simultaneously will be queued up without overwhelming the agent
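As a rough illustration of what a work queue concurrency limit buys you, the simulation below hands out at most `concurrency_limit` runs at a time and keeps the rest scheduled. `LimitedWorkQueue` and its methods are hypothetical and do not use the Prefect API; in Prefect itself the limit is configured on the work queue rather than written in code like this.

```python
from collections import deque


class LimitedWorkQueue:
    """Toy work queue that never has more than `concurrency_limit` runs in flight."""

    def __init__(self, concurrency_limit: int):
        self.concurrency_limit = concurrency_limit
        self.scheduled = deque()   # runs waiting to be picked up
        self.in_flight = set()     # runs handed to the agent and not yet finished

    def schedule(self, run_id: str) -> None:
        self.scheduled.append(run_id)

    def get_runs(self) -> list:
        """Release only as many runs as there are free slots."""
        free_slots = self.concurrency_limit - len(self.in_flight)
        released = []
        while free_slots > 0 and self.scheduled:
            run_id = self.scheduled.popleft()
            self.in_flight.add(run_id)
            released.append(run_id)
            free_slots -= 1
        return released

    def finish(self, run_id: str) -> None:
        self.in_flight.discard(run_id)


# Eight runs due at the same time, as in this issue, with a limit of 2.
queue = LimitedWorkQueue(concurrency_limit=2)
for i in range(8):
    queue.schedule(f"run-{i}")

first_batch = queue.get_runs()    # two runs are released
second_poll = queue.get_runs()    # nothing more until a slot frees up
queue.finish(first_batch[0])
third_poll = queue.get_runs()     # one more run is released
```

The remaining runs stay scheduled until earlier ones finish, so the agent is never asked to submit all eight at once.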

@anna-geller
Contributor

Closing the issue because we clarified the issue and added docs, thanks for all the help here, Ben
