add lifecycle handlers to allow extensions to hook session lifecycle #10988

tdyas · 2020-10-17T04:20:26Z

Problem

Extensions have no generic way to hook into the lifecycle of a Pants session.

Solution

Add a lifecycle_handlers entry point to extensions to allow extensions to hook session creation and start/end. This will be used in #10889 to allow the metrics reporting to manage mutable state during the lifetime of each session.

Result

Added test for life cycle registration.

tdyas · 2020-10-17T04:21:39Z

Example of how the extension was able to hook a session lifecycle:

$ ./pants test src/python/pants/base::
21:17:02.49 [INFO] initializing pantsd...
21:17:03.11 [INFO] pantsd initialized.
21:17:03.20 [INFO] on_session_create
21:17:03.20 [INFO] on_session_start
21:17:32.49 [INFO] Completed: Building pytest.pex with 6 requirements: ipdb, pygments, pytest-cov>=2.10.1,<2.11, pytest-icdiff, pytest>=6.0.1,<6.1, zipp==2.1.0
21:17:35.07 [INFO] Completed: Run tests - src/python/pants/base/run_info_test.py:tests succeeded.
21:17:35.97 [INFO] Completed: Run tests - src/python/pants/base/build_environment_test.py:tests succeeded.
21:17:36.88 [INFO] Completed: Run tests - src/python/pants/base/specs_test.py:tests succeeded.
21:17:37.65 [INFO] Completed: Run tests - src/python/pants/base/exception_sink_test.py:exception_sink_test succeeded.
21:17:38.27 [INFO] Completed: Run tests - src/python/pants/base/hash_utils_test.py:tests succeeded.
21:17:38.59 [INFO] Completed: Run tests - src/python/pants/base/build_root_test.py:tests succeeded.
21:17:39.63 [INFO] Completed: Run tests - src/python/pants/base/deprecated_test.py:tests succeeded.
21:17:41.31 [INFO] Completed: Run tests - src/python/pants/base/specs_parser_test.py:tests succeeded.
21:17:48.93 [INFO] Completed: Run tests - src/python/pants/base/exiter_integration_test.py:exiter_integration_test succeeded.
21:18:02.65 [INFO] Completed: Run tests - src/python/pants/base/exception_sink_integration_test.py:exception_sink_integration_test succeeded.

✓ src/python/pants/base/build_environment_test.py:tests succeeded.
✓ src/python/pants/base/build_root_test.py:tests succeeded.
✓ src/python/pants/base/deprecated_test.py:tests succeeded.
✓ src/python/pants/base/exception_sink_integration_test.py:exception_sink_integration_test succeeded.
✓ src/python/pants/base/exception_sink_test.py:exception_sink_test succeeded.
✓ src/python/pants/base/exiter_integration_test.py:exiter_integration_test succeeded.
✓ src/python/pants/base/hash_utils_test.py:tests succeeded.
✓ src/python/pants/base/run_info_test.py:tests succeeded.
✓ src/python/pants/base/specs_parser_test.py:tests succeeded.
✓ src/python/pants/base/specs_test.py:tests succeeded.
21:18:02.67 [INFO] on_session_end

tdyas · 2020-10-17T04:24:45Z

src/python/pants/core/register.py

+import logging
+logger = logging.getLogger(__name__)
+
+
+class CoreSessionLifecycleHandler(SessionLifecycleHandler):
+    def on_session_start(self):
+        logger.info("on_session_start")
+
+    def on_session_end(self):
+        logger.info("on_session_end")
+
+
+class CoreLifecycleHandler(ExtensionLifecycleHandler):
+    def on_session_create(self, options: Options) -> Optional[SessionLifecycleHandler]:
+        logger.info("on_session_create")
+        return CoreSessionLifecycleHandler()
+
+
+def lifecycle_handlers():
+    return [CoreLifecycleHandler()]


This change is just to provide a quick example of how a lifecycle handler would work in practice. It would be removed in the final version of the PR.

Removed this.

coveralls · 2020-10-17T04:58:49Z

Coverage remained the same at 0.0% when pulling 9c3d29c on tdyas:add_lifecycle_hooks into bf62b0a on pantsbuild:master.

[ci skip-rust] [ci skip-build-wheels]

tdyas · 2020-10-20T01:33:26Z

@Eric-Arellano: I've updated the PR as per our discussion offline to use keyword parameters on the lifecycle handlers.

Is there any good place to put an integration test of lifecycle_handlers?
Would it be better for lifecycle_handlers to return classes to instantiate instead of an instance? The thought is whether BuildConfiguration can be reused in some way.

tdyas · 2020-10-20T01:34:50Z

@asherf: You may be interested in this PR. It adds a way for Pants plugins to hook into life cycle of a Pants session. The goal in part is to have the Buildsense plugin not need to maintain mutable state on a Subsystem any more.

Eric-Arellano

Would it be better for lifecycle_handlers to return classes to instantiate instead of an instance?

Yes, I think so. Each lifecycle handler class will implement a uniform .create() method (or similar name), and Pants controls the creation of it.

Is there any good place to put an integration test of lifecycle_handlers?

Yes, adding a test to src/python/pants/init would be awesome. I think you can maybe use run_pants(): https://www.pantsbuild.org/docs/rules-api-testing#approach-4-run_pants-integration-tests-for-pants. You would set up a register.py file in the test with your simple plugin, and set --backend-packages and --pythonpath to load it, similar to this:

pants/src/python/pants/base/exception_sink_integration_test.py

Lines 27 to 28 in bf2548b

    
           f"--pythonpath=+['{testproject_backend_src_dir}']", 
        
           f"--backend-packages=+['{testproject_backend_pkg_name}']",

Eric-Arellano · 2020-10-20T20:43:06Z

src/python/pants/bin/local_pants_runner.py

@@ -55,6 +56,7 @@ class LocalPantsRunner:
    union_membership: UnionMembership
    profile_path: Optional[str]
    _run_tracker: RunTracker
+    _session_lifecycle_handlers: List[SessionLifecycleHandler]


This should probably be Tuple[SessionLifecycleHandler, ...] so that it's hashable. Even though the dataclass isn't frozen, we always bias towards immutability.

Eric-Arellano · 2020-10-20T20:46:01Z

src/python/pants/build_graph/lifecycle.py

@@ -0,0 +1,23 @@
+# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).


Should probably rename the file to lifecycle_handler.py.

Eric-Arellano · 2020-10-20T20:49:49Z

src/python/pants/build_graph/lifecycle.py

+from pants.option.options import Options
+
+
+class SessionLifecycleHandler:


If you envision multiple types of LifecycleHandlers, I recommend a LifecycleHandler superclass. It's fine if it's empty for now, perhaps subclass ABC to express intent that it's abstract. We can fill the methods in as we add more use cases and the common interface emerges from usage.

I think you can delete ExtensionLifecycleHandler. BuildConfiguration should store a bunch of LifecycleHandler, where each is a certain concrete subclass. Then, we can call isinstance(SessionLifecycleHandler) etc to determine which type of handler we're dealing with in our local_pants_runner.py code.

Eric-Arellano · 2020-10-20T20:50:28Z

src/python/pants/build_graph/lifecycle.py

+
+
+class SessionLifecycleHandler:
+    def on_session_start(self):


A basic docstring would be helpful for both of these. Even though we're not documenting on the website yet, it will help future contributors to understand what's going on.

It might be fine to call these start() and end(), as the SessionLifecycleHandler already expresses when those events happen.

That will make it easier for us to factor the methods up into the general LifecycleHandler if/when we're ready to generalize. I can imagine the interface being create(), start(), and end(), and having subclasses like SessionLifecycleHandler vs. RuleInvocationLifecycleHandler.

Eric-Arellano · 2020-10-20T20:51:18Z

src/python/pants/build_graph/lifecycle.py

+    def on_session_create(
+        self, *, build_root: str, options: Options, specs: Specs
+    ) -> Optional[SessionLifecycleHandler]:
+        pass


I think you can move this into SessionLifecycleHandler and call it create(). No longer return Optional; we already validated it's a SessionLifecycleHandler, so we know the creation should not fail.

Also it should be a classmethod, as it's a factory method to create a new instance.

Eric-Arellano · 2020-10-20T20:53:18Z

src/python/pants/bin/local_pants_runner.py

+            session_lifecycle_handler = lifecycle_handler.on_session_create(
+                build_root=build_root, options=options, specs=specs
+            )
+            if session_lifecycle_handler:
+                session_lifecycle_handlers.append(session_lifecycle_handler)


See lifecycle.py for how I recommend changing the modeling of this all. Here, you would check if instance(handler, SessionLifecycleHandler), then call handler.create().

stuhood · 2020-10-20T22:06:45Z

It feels like consumers of this API are likely to have pretty heavy overlap with

pants/src/python/pants/reporting/streaming_workunit_handler.py

Lines 36 to 95 in bf2548b

    
           class StreamingWorkunitHandler: 
        
               """StreamingWorkunitHandler's job is to periodically call each registered callback function with 
        
               the following kwargs: 
        
               workunits: Tuple[Dict[str, str], ...] - the workunit data itself 
        
               finished: bool - this will be set to True when the last chunk of workunit data is reported to the callback 
        
               """ 
        
               def __init__( 
        
                   self, 
        
                   scheduler: Any, 
        
                   callbacks: Iterable[Callable], 
        
                   report_interval_seconds: float, 
        
                   max_workunit_verbosity: LogLevel = LogLevel.TRACE, 
        
               ): 
        
                   self.scheduler = scheduler 
        
                   self.report_interval = report_interval_seconds 
        
                   self.callbacks = callbacks 
        
                   self._thread_runner: Optional[_InnerHandler] = None 
        
                   self._context = StreamingWorkunitContext(_scheduler=self.scheduler) 
        
                   # TODO(10092) The max verbosity should be a per-client setting, rather than a global setting. 
        
                   self.max_workunit_verbosity = max_workunit_verbosity 
        
               def start(self) -> None: 
        
                   if self.callbacks: 
        
                       self._thread_runner = _InnerHandler( 
        
                           scheduler=self.scheduler, 
        
                           context=self._context, 
        
                           callbacks=self.callbacks, 
        
                           report_interval=self.report_interval, 
        
                           max_workunit_verbosity=self.max_workunit_verbosity, 
        
                       ) 
        
                       self._thread_runner.start() 
        
               def end(self) -> None: 
        
                   if self._thread_runner: 
        
                       self._thread_runner.join() 
        
                   # After stopping the thread, poll workunits one last time to make sure 
        
                   # we report any workunits that were added after the last time the thread polled. 
        
                   workunits = self.scheduler.poll_workunits(self.max_workunit_verbosity) 
        
                   for callback in self.callbacks: 
        
                       callback( 
        
                           workunits=workunits["completed"], 
        
                           started_workunits=workunits["started"], 
        
                           completed_workunits=workunits["completed"], 
        
                           finished=True, 
        
                           context=self._context, 
        
                       ) 
        
               @contextmanager 
        
               def session(self) -> Iterator[None]: 
        
                   try: 
        
                       self.start() 
        
                       yield 
        
                       self.end() 
        
                   except Exception as e: 
        
                       if self._thread_runner: 
        
                           self._thread_runner.join() 
        
                       raise e

... should metrics access be a new facility of the StreamingWorkunitHandler (possibly with a rename?), or was this a conscious decision to have lots of small interfaces? If lots of small interfaces, would recommend moving them closer together (the reporting package) and aligning their APIs.

tdyas · 2020-10-21T19:30:24Z

It feels like consumers of this API are likely to have pretty heavy overlap with

pants/src/python/pants/reporting/streaming_workunit_handler.py

Lines 36 to 95 in bf2548b

class StreamingWorkunitHandler:

"""StreamingWorkunitHandler's job is to periodically call each registered callback function with

the following kwargs:

workunits: Tuple[Dict[str, str], ...] - the workunit data itself

finished: bool - this will be set to True when the last chunk of workunit data is reported to the callback

"""

def __init__(

self,

scheduler: Any,

callbacks: Iterable[Callable],

report_interval_seconds: float,

max_workunit_verbosity: LogLevel = LogLevel.TRACE,

):

self.scheduler = scheduler

self.report_interval = report_interval_seconds

self.callbacks = callbacks

self._thread_runner: Optional[_InnerHandler] = None

self._context = StreamingWorkunitContext(_scheduler=self.scheduler)

# TODO(10092) The max verbosity should be a per-client setting, rather than a global setting.

self.max_workunit_verbosity = max_workunit_verbosity

def start(self) -> None:

if self.callbacks:

self._thread_runner = _InnerHandler(

scheduler=self.scheduler,

context=self._context,

callbacks=self.callbacks,

report_interval=self.report_interval,

max_workunit_verbosity=self.max_workunit_verbosity,

)

self._thread_runner.start()

def end(self) -> None:

if self._thread_runner:

self._thread_runner.join()

# After stopping the thread, poll workunits one last time to make sure

# we report any workunits that were added after the last time the thread polled.

workunits = self.scheduler.poll_workunits(self.max_workunit_verbosity)

for callback in self.callbacks:

callback(

workunits=workunits["completed"],

started_workunits=workunits["started"],

completed_workunits=workunits["completed"],

finished=True,

context=self._context,

)

@contextmanager

def session(self) -> Iterator[None]:

try:

self.start()

yield

self.end()

except Exception as e:

if self._thread_runner:

self._thread_runner.join()

raise e

... should metrics access be a new facility of the StreamingWorkunitHandler (possibly with a rename?), or was this a conscious decision to have lots of small interfaces? If lots of small interfaces, would recommend moving them closer together (the reporting package) and aligning their APIs.

The motivation for this PR in part is that Eric mentioned that a development goal for Pants was to remove mutable state on Subsystems. StreamingWorkunitHandler currently requires that the handle_workunits hook be implemented on a global singleton. (More specifically, classes referenced as streaming work unit handlers must have a global_instance class method and then implement handle_workunits on the instance returned from that method.

pants/src/python/pants/option/subsystem.py

Line 236 in 92a2087

subsystem = subsystem_class.global_instance()

)

The initial implementation of counters hard-coded the setup of the metrics aggregation handler into LocalPantsRunner. This seemed like a code smell to me. Moreover, the aggregation code was implemented on the Subsystem for the metrics reporting which runs afoul of the goal of no mutable state on Subsystems. And the code smell remains even if that class is made to not be a Subsystem but rather a standalone class because it would still need to create a global singleton via a global_instance method.

Re "smaller interfaces", that was my intent. The better approach in my opinion was to have hooks that would be instantiated for particular phases of a Pants run. These hooks seem to me to be more general than just reporting. I actually would like to modify that workunit handler setup after this PR so that the lifecycle handlers can explicitly register for receiving workunits programmatically. (This is how I envision the counters PR actually operating without having to hard-code them into LocalPantsRunner.)

Thoughts?

stuhood · 2020-10-27T22:21:17Z

StreamingWorkunitHandler currently requires that the handle_workunits hook be implemented on a global singleton.

Got it. I didn't realize that, but we'll need to adjust that anyway to unblock #7654, so would be good to avoid having multiple APIs in the meantime. The Session object represents the run-specific state... mutability on Subsystems (like the RunTracker) is legacy: in v2, Subsystems are no longer accessed via globals in @rules.

To adjust that API, I expect that we'd want to construct an instance of StreamingWorkunitHandler per Session... which is almost what happens here:

pants/src/python/pants/bin/local_pants_runner.py

Lines 243 to 249 in 3760987

    
           streaming_handlers = global_options.streaming_workunits_handlers 
        
           callbacks = Subsystem.get_streaming_workunit_callbacks(streaming_handlers) 
        
           streaming_reporter = StreamingWorkunitHandler( 
        
               self.graph_session.scheduler_session, 
        
               callbacks=callbacks, 
        
               report_interval_seconds=global_options.streaming_workunits_report_interval, 
        
           )

(each call to LocalPantsRunner.run has a new Session created here)

stuhood · 2020-10-28T18:30:41Z

I actually would like to modify that workunit handler setup after this PR so that the lifecycle handlers can explicitly register for receiving workunits programmatically. (This is how I envision the counters PR actually operating without having to hard-code them into LocalPantsRunner.)

So it sounds like you're suggesting roughly three commits that:

adjust the interface away from classes with abstract methods, and toward (possibly) stateful functions that are called as hooks
refactor StreamingWorkunitHandler away into usage of those functions
add a facility to those hooks to collect metrics at the end of a run

? If so, that sounds reasonable. I expect that the first two commits could probably be accomplished in one PR though... this one probably?

tdyas · 2020-10-29T18:10:22Z

Closing since we decided this PR is unnecessary currently.

tdyas requested review from Eric-Arellano and gshuflin October 17, 2020 04:20

tdyas marked this pull request as draft October 17, 2020 04:20

tdyas commented Oct 17, 2020

View reviewed changes

tdyas requested a review from asherf October 17, 2020 04:26

Tom Dyas added 2 commits October 19, 2020 17:40

add lifecycle hooks

add0892

[ci skip-rust] [ci skip-build-wheels]

cleanup for review

9c3d29c

[ci skip-rust] [ci skip-build-wheels]

tdyas force-pushed the add_lifecycle_hooks branch from 2c603a0 to 9c3d29c Compare October 20, 2020 01:12

tdyas changed the title ~~[WIP] add lifecycle handlers to allow extensions to hook session lifecycle~~ add lifecycle handlers to allow extensions to hook session lifecycle Oct 20, 2020

tdyas marked this pull request as ready for review October 20, 2020 01:30

tdyas requested a review from stuhood October 20, 2020 18:20

Eric-Arellano reviewed Oct 20, 2020

View reviewed changes

tdyas mentioned this pull request Oct 27, 2020

add foundation for counter metrics #10889

Merged

tdyas closed this Oct 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add lifecycle handlers to allow extensions to hook session lifecycle #10988

add lifecycle handlers to allow extensions to hook session lifecycle #10988

tdyas commented Oct 17, 2020 •

edited

Loading

tdyas commented Oct 17, 2020

tdyas Oct 17, 2020

tdyas Oct 20, 2020

coveralls commented Oct 17, 2020 •

edited

Loading

tdyas commented Oct 20, 2020

tdyas commented Oct 20, 2020

Eric-Arellano left a comment

Eric-Arellano Oct 20, 2020

Eric-Arellano Oct 20, 2020

Eric-Arellano Oct 20, 2020 •

edited

Loading

Eric-Arellano Oct 20, 2020

Eric-Arellano Oct 20, 2020

Eric-Arellano Oct 20, 2020

Eric-Arellano Oct 20, 2020

stuhood commented Oct 20, 2020

tdyas commented Oct 21, 2020

stuhood commented Oct 27, 2020

stuhood commented Oct 28, 2020

tdyas commented Oct 29, 2020

	f"--pythonpath=+['{testproject_backend_src_dir}']",
	f"--backend-packages=+['{testproject_backend_pkg_name}']",

		@@ -0,0 +1,23 @@
		# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).

		from pants.option.options import Options


		class SessionLifecycleHandler:

add lifecycle handlers to allow extensions to hook session lifecycle #10988

add lifecycle handlers to allow extensions to hook session lifecycle #10988

Conversation

tdyas commented Oct 17, 2020 • edited Loading

Problem

Solution

Result

tdyas commented Oct 17, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Oct 17, 2020 • edited Loading

tdyas commented Oct 20, 2020

tdyas commented Oct 20, 2020

Eric-Arellano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Eric-Arellano Oct 20, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood commented Oct 20, 2020

tdyas commented Oct 21, 2020

stuhood commented Oct 27, 2020

stuhood commented Oct 28, 2020

tdyas commented Oct 29, 2020

tdyas commented Oct 17, 2020 •

edited

Loading

coveralls commented Oct 17, 2020 •

edited

Loading

Eric-Arellano Oct 20, 2020 •

edited

Loading