No facility for orderly stack shutdown #7190

Open
msandstedt opened this issue May 27, 2021 · 0 comments

Problem

The stack does not provide a facility for orderly shutdown. On the POSIX platform, shutdown is approximately:

  • atomically set a flag to tell the event loop to stop
  • pthread_join the event loop thread
  • then shut down objects
  • then delete objects
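
To make the sequence above concrete, here is a minimal sketch of it. This is not the stack's code: `EventLoop`, `RequestStop`, `Join`, and `DemoAbruptShutdown` are hypothetical names, and `std::thread` stands in for the pthread calls. It illustrates how work that is still queued when the stop flag is set never runs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the platform event loop (not the stack's API).
class EventLoop {
public:
    ~EventLoop() { RequestStop(); Join(); }

    void Post(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(mMutex);
        mQueue.push_back(std::move(work));
    }
    void Start() { mThread = std::thread([this] { Run(); }); }

    // Step 1: atomically set the stop flag.
    void RequestStop() { mStop.store(true); }

    // Step 2: join the loop thread. Returns how many queued items never ran.
    std::size_t Join() {
        if (mThread.joinable()) mThread.join();
        std::lock_guard<std::mutex> lock(mMutex);
        return mQueue.size();
    }

private:
    void Run() {
        while (!mStop.load()) {
            std::function<void()> work;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (!mQueue.empty()) {
                    work = std::move(mQueue.front());
                    mQueue.pop_front();
                }
            }
            if (work) work(); else std::this_thread::yield();
        }
    }

    std::atomic<bool> mStop{false};
    std::mutex mMutex;
    std::deque<std::function<void()>> mQueue;
    std::thread mThread;
};

// Returns the number of work items dropped by the flag-then-join shutdown.
std::size_t DemoAbruptShutdown() {
    EventLoop loop;
    std::promise<void> started, release;
    std::future<void> startedFuture = started.get_future();
    std::future<void> releaseFuture = release.get_future();
    bool persisted = false;

    loop.Post([&] { started.set_value(); releaseFuture.wait(); }); // in-flight operation
    loop.Post([&] { persisted = true; }); // e.g. persisting a counter
    loop.Start();

    startedFuture.wait();   // the first item is now executing
    loop.RequestStop();     // step 1
    release.set_value();    // let the executing item return
    std::size_t dropped = loop.Join(); // step 2
    // Steps 3 and 4 (shut down, then delete objects) would follow here.
    assert(!persisted);     // the pending bookkeeping never happened
    return dropped;
}
```

In this sketch the join makes deletion thread-safe, but the second work item is simply abandoned in the queue.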

The pthread_join does give thread safety, but it does not provide an orderly shutdown. Stopping the event loop effectively pauses all in-flight operations, so object deletion can probably proceed without risk of later references to freed objects. However, no in-flight operation is actually completed. The problems with this are twofold:

  • The stack is afforded no opportunity for cleanup or state persistence:
    • Critical async operations pending? These evaporate.
    • Have an exchange that should advance persisted counters? These are gone.
    • Device just commissioned? May be forgotten on next boot.
  • The stack cannot support disjoint lifecycles for stateful non-singleton objects:
    • Instantiating two DeviceController objects and shutting them down at different times would be impossible.
    • All objects can only be Shutdown / deleted after the PlatformMgr() singleton event loop is stopped.
    • This effectively propagates the PlatformMgr() singleton pattern to all stateful objects in the stack.

Proposed Solution

The solution is similar to that proposed in #6931, but more general:

  • All stateful objects executing in the event loop require a Shutdown interface that posts a flag to the object in the event loop.
  • Objects can then drain themselves of in-flight transactions and complete final bookkeeping.
  • Then and only then should it be permissible to delete these objects.
  • This must be possible and safe without first calling pthread_join on the PlatformMgr() singleton event loop thread.
  • The need to pthread_join first means by definition that Shutdown behavior is not correct.
    • The need for this only arises when objects are being deleted without completing their pending operations.
    • Doing that always presents a risk for corrupting state.
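
One way the proposal might look is sketched below, under stated assumptions. `EventLoop`, `Controller`, `BeginTransaction`, and the `Shutdown(onDrained)` callback are hypothetical names for illustration, not the stack's API. The loop is single-threaded and driven manually to keep the example deterministic. Two objects share one loop, and one is drained and deleted while the loop keeps serving the other.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-threaded event loop; work runs only inside RunUntilIdle().
class EventLoop {
public:
    void Post(std::function<void()> work) { mQueue.push_back(std::move(work)); }
    void RunUntilIdle() {
        while (!mQueue.empty()) {
            auto work = std::move(mQueue.front());
            mQueue.pop_front();
            work();
        }
    }
private:
    std::deque<std::function<void()>> mQueue;
};

// Hypothetical stateful, non-singleton object living in the event loop.
class Controller {
public:
    explicit Controller(EventLoop & loop) : mLoop(loop) {}

    // Starts an async transaction that needs two loop iterations to complete.
    void BeginTransaction() {
        ++mInFlight;
        mLoop.Post([this] { mLoop.Post([this] { CompleteTransaction(); }); });
    }

    // Posts the shutdown flag into the loop. onDrained fires once no
    // transaction is in flight; only then may the object be deleted.
    void Shutdown(std::function<void()> onDrained) {
        mLoop.Post([this, cb = std::move(onDrained)]() mutable {
            mShuttingDown = true;
            mOnDrained = std::move(cb);
            MaybeFinish();
        });
    }

    int Persisted() const { return mPersisted; }

private:
    void CompleteTransaction() {
        ++mPersisted; // final bookkeeping, e.g. advancing a persisted counter
        --mInFlight;
        MaybeFinish();
    }
    void MaybeFinish() {
        if (mShuttingDown && mInFlight == 0 && mOnDrained) {
            auto cb = std::move(mOnDrained);
            mOnDrained = nullptr;
            cb(); // may delete this object, so no members are touched afterwards
        }
    }

    EventLoop & mLoop;
    int mInFlight = 0;
    int mPersisted = 0;
    bool mShuttingDown = false;
    std::function<void()> mOnDrained;
};

// Returns how many transactions controller A persisted before it was deleted.
int DemoOrderlyShutdown() {
    EventLoop loop;
    auto a = std::make_unique<Controller>(loop);
    auto b = std::make_unique<Controller>(loop);
    int persistedByA = 0;

    a->BeginTransaction();
    b->BeginTransaction();
    a->Shutdown([&] { persistedByA = a->Persisted(); a.reset(); });

    loop.RunUntilIdle(); // the loop is never stopped or joined
    assert(a == nullptr);                        // A was deleted after draining
    assert(b != nullptr && b->Persisted() == 1); // B was unaffected
    return persistedByA;
}
```

The shutdown flag reaches A while its transaction is still pending, so the drained callback is deferred until that transaction completes. A real implementation would also need to reject new transactions once shutdown has been requested and to handle calls made from other threads, both of which this sketch omits.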