Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve task dependencies to futures passed as parameters in .map #6701

Merged
merged 2 commits into from
Sep 12, 2022

Conversation

robalar
Copy link
Contributor

@robalar robalar commented Sep 3, 2022

Preserves relationships between a mapped task and its parameters if that parameter is a future, closes #6686.

Example

from prefect import task, flow, get_run_logger


@task
def log_task(x: int):
    logger = get_run_logger()
    logger.info(x)


@task
def flat_numbers() -> list[int]:
    return [0, 1, 2, 3, 4]


@flow
def test_flow():
    x = flat_numbers.submit()
    log_task.map(x)

Will now correctly produce this graph:

Screenshot 2022-09-03 at 3 44 34 pm

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • This pull request includes tests or only affects documentation.
  • This pull request includes a fix, feature, enhancement, docs, or maintenance label categorizing the change.

@netlify
Copy link

netlify bot commented Sep 3, 2022

Deploy Preview for prefect-orion failed.

Built without sensitive environment variables

Name Link
🔨 Latest commit 835fd04
🔍 Latest deploy log https://app.netlify.com/sites/prefect-orion/deploys/631a3bf88806ba0008148a3a

src/prefect/engine.py Outdated Show resolved Hide resolved
@robalar
Copy link
Contributor Author

robalar commented Sep 7, 2022

@madkinsz Have pushed a commit that passes all the test cases. It obviously uses wait_for but feel like it should extend when we start persisting the keys too.

I don't love that I had to add a special case to visit_collection but could not find any other way to recuse into resolved values and avoid infinite recursion.

I also found some tests in the same file that test relationships but only via values returned, I believe my tests would sit better next to them. As a side note, have you considered breaking up this file? It's really quite large and hard to orient/find things in.

@zanieb
Copy link
Contributor

zanieb commented Sep 7, 2022

Hi again! visit_collection actually has a max_depth field. We just need to pass it through.

Here's a commit that:

  • Adds max depth to task input collection and resolution
  • Separates collection from resolution to avoid introducing new resolve future type
  • Uses max depth to avoid resolving futures below the top level
  • Adds extra_task_inputs passing through the engine so we do not have to use wait_for.

See 00d8647

I did not remove any of the functions you introduced and there may be some additional polish here. Are you interested in cherry-picking that commit into here and cleaning things up?

I also found some tests in the same file that test relationships but only via values returned, I believe my tests would sit better next to them.

I'd prefer to keep all the mapping tests separate from those other inputs tests for now since they're a special case. If you want to create a TestTaskMapInputs section that's fine with me.

As a side note, have you considered breaking up this file? It's really quite large and hard to orient/find things in.

We haven't yet! It seems reasonable in the long run but I mostly just navigate by class. We don't have strong norms about using classes vs files. I think I'd prefer files as we grow the test suite though.

@robalar
Copy link
Contributor Author

robalar commented Sep 7, 2022

Ah, that makes so much sense! The nesting was giving me a massive headache, but it didn't need to be! Feels obvious looking at it..... 😄

I did not remove any of the functions you introduced and there may be some additional polish here. Are you interested in cherry-picking that commit into here and cleaning things up?

Yes very happy to tidy up!

I'd prefer to keep all the mapping tests separate from those other inputs tests for now since they're a special case. If you want to create a TestTaskMapInputs section that's fine with me.

Noted, will split into its own section.

@robalar robalar force-pushed the bugfix/map-future-relationship branch from 9f0b164 to 0b4ec68 Compare September 8, 2022 15:11
@robalar robalar requested review from zanieb and removed request for cicdw September 8, 2022 15:12
@zanieb zanieb added the fix A fix for a bug in an existing feature label Sep 8, 2022
@zanieb
Copy link
Contributor

zanieb commented Sep 8, 2022

Excellent work! Since I wrote some of this I'm going to request @bunchesofdonald as a reviewer. This looks good to me though.

@robalar
Copy link
Contributor Author

robalar commented Sep 8, 2022

@madkinsz thank you for your guidance on this! Not sure I can claim to have done too much of it, but glad to have helped it along!

Copy link
Contributor

@bunchesofdonald bunchesofdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great! That test suite is fantastic.

@@ -795,15 +801,14 @@ async def begin_task_map(
wait_for=wait_for,
return_type=return_type,
task_runner=task_runner,
extra_task_inputs=task_inputs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not excited about extra_task_inputs, but I also can't think of a better approach at the moment. It may point at a possible need to refactor the way we validate the parameters to a map call or the way we gather at task inputs at some point.

Copy link
Contributor

@zanieb zanieb Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah me neither. I think ideally we have a way to track lengths on futures and create indexed children of futures. When the future has no known length, we resolve it to determine the length for mapping then access its indices (which creates child futures) that we pass downstream. That way, our existing tracking will work.

That said, users want to create dependencies between tasks in their graph without necessarily waiting for them to finish. extra_task_inputs may be useful for user-defined relationships that do not have runtime implications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @WillRaphaelson for reference

tests/test_tasks.py Outdated Show resolved Hide resolved
@robalar robalar force-pushed the bugfix/map-future-relationship branch from 76b0058 to 835fd04 Compare September 8, 2022 19:01
@robalar
Copy link
Contributor Author

robalar commented Sep 12, 2022

@madkinsz do you require any more action from me to get this merged? all tests are passing, netlify seems to be failing due to authentication issues to a private repo.

@zanieb
Copy link
Contributor

zanieb commented Sep 12, 2022

See #6728 for some notes on that failure, it's expected right now. Thanks a bunch for contributing!

@zanieb zanieb merged commit f8935be into PrefectHQ:main Sep 12, 2022
@robalar robalar deleted the bugfix/map-future-relationship branch September 14, 2022 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix A fix for a bug in an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When a task is mapped over a list of iterables the task hierarchy is not correctly displayed.
3 participants