
Ensuring determinism with periodics #34

Open · joshburkart opened this issue Aug 15, 2024 · 9 comments
Labels: enhancement (New feature or request), question (Further information is requested)
Milestone: v0.3

@joshburkart
I'm wondering about determinism in the following simplified situation. Say we have two models, A and B, each with an execute method that we schedule to run periodically with the same period and no phase offset, so that in each cycle A::execute and B::execute run at the same simulated time. Say further that A's execute method immediately sends an event to B which modifies B's state, affecting what B's execute method will do. As a result, without a way of specifying which periodic executes first in each cycle, there appears to be a possible race condition where the outcome of the simulation isn't fully determined/predictable.

Am I understanding this right? Are there ordering guarantees for periodics? Is there a way to ensure determinism, other than something hacky like adding tiny phase offsets? Would appreciate any context that can be provided. Thanks so much!

@sbarral sbarral added the question Further information is requested label Aug 16, 2024
@sbarral (Member) commented Aug 16, 2024

A subtle topic indeed!

TL;DR: it is not possible to ensure determinism with schedule_periodic_event, but if you are willing to live on the bleeding edge this is possible with the development branch (main) using the schedule method and periodic Actions.

In general, the scheduler attempts to strike a balance between efficiency and determinism. The minimum guarantee when it comes to determinism is that the execution order must be preserved within messages addressed to the same model. To achieve this, when events are polled from the scheduler queue (which preserves scheduling order), all events targeting the same model are fused into a single sequential Future and this fused Future is spawned onto the executor. However, events targeting different models will end up in different Futures so even though they are spawned in order, there is no telling in which relative order the multi-threaded executor will run them.
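To make the fusing step concrete, here is a toy sketch in plain Rust (a hypothetical illustration, not the actual Asynchronix scheduler internals): events polled in scheduling order are grouped by target model, so each model sees its own messages in order, while the relative execution order between different models' groups is left to the executor.

```rust
use std::collections::BTreeMap;

// Toy sketch of the fusing step: events are polled in scheduling order and
// grouped by target model. Per-model order is preserved; the relative order
// in which the per-model groups later run is NOT constrained.
fn fuse_by_target(
    events: &[(&'static str, &'static str)],
) -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut fused: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for &(target, msg) in events {
        // Messages to the same model are appended in scheduling order.
        fused.entry(target).or_default().push(msg);
    }
    fused
}

fn main() {
    let events = [("A", "m1"), ("B", "m2"), ("A", "m3")];
    let fused = fuse_by_target(&events);
    // Per-model order is preserved...
    assert_eq!(fused["A"], ["m1", "m3"]);
    assert_eq!(fused["B"], ["m2"]);
    // ...but nothing constrains whether A's or B's fused future runs first
    // once both are spawned onto the multi-threaded executor.
    println!("{:?}", fused);
}
```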

In the development branch, EventSources were introduced, which are basically free-standing Outputs that can schedule Actions broadcasting the same message (possibly periodically) to many models. But since several Actions scheduled for the same timestamp may have one or more target models in common, in order to preserve the execution order within messages to the same model the scheduler needs to be conservative and fuse all Actions into a single sequential future.

You can take advantage of that by scheduling two periodic Actions using two EventSources, one targeting only model A and one targeting only model B, relying on the fact that the scheduler will fuse the Actions into a sequential future. While this is not documented behavior, I don't see it changing in the future. Note that you cannot use a single EventSource that broadcasts to A and B because, for efficiency reasons, all broadcast operations resolve futures in parallel, so ordering is not guaranteed (similarly to futures::stream::FuturesOrdered).

The scope for version 0.3 keeps expanding, so I certainly won't consider this for the next version, but some simulators have a concept of super-dense time, where the timestamp has an additional integer part that makes it possible to decompose a time step into several sub-steps. AFAIK this concept does not exist in the ECSS SMP standard, which suggests it might be a bit niche, but if you feel the need for it at some point, feel free to open a specific issue.

@sbarral (Member) commented Sep 8, 2024

This issue prompted further internal discussions and we came to the conclusion that the current ordering guarantees for scheduling are not fully consistent with the ordering guarantees for non-scheduled messages (those sent from Output or Requestor ports).

What we should have done in retrospect is to guarantee ordered sequential execution for same-time events based on the model that scheduled them rather than on the models targeted by such events. Doing this would provide exactly the same kind of determinism as for regular messages: if model A schedules message M1 to model B and then schedules message M2 to model C, which upon receipt of M2 sends message M3 to model B, it is guaranteed that B will process M1 before M3.
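The M1/M2/M3 guarantee can be illustrated with a toy event loop (a hypothetical sketch, not the actual implementation): because same-time events scheduled by A are delivered sequentially in scheduling order, a message triggered in reaction to M2 can never overtake M1 in B's queue.

```rust
use std::collections::VecDeque;

// Toy sketch: same-time events scheduled by one model are processed
// sequentially in scheduling order, so M3 (sent by C in reaction to M2)
// is necessarily enqueued for B after M1.
fn deliver() -> Vec<&'static str> {
    let mut b_inbox = Vec::new();
    // A schedules M1 (to B) and then M2 (to C) for the same timestamp.
    let mut queue: VecDeque<&'static str> = VecDeque::from(["M1->B", "M2->C"]);
    while let Some(event) = queue.pop_front() {
        match event {
            "M1->B" => b_inbox.push("M1"),
            // C reacts to M2 by sending M3 to B; it is queued after M1.
            "M2->C" => queue.push_back("M3->B"),
            _ => b_inbox.push("M3"),
        }
    }
    b_inbox
}

fn main() {
    // B always processes M1 before M3.
    assert_eq!(deliver(), ["M1", "M3"]);
}
```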

In this context it makes sense to treat the main scheduler as a model, which would give exactly the determinism you expected, whether an event or an action is used.

This is thus added to the v0.3 roadmap, but it shouldn't be too difficult to implement this behavior in the 0.2 branch and it is very unlikely to break existing user code. Let me know if you need it now, we could try to make a point release.

@sbarral sbarral added the enhancement New feature or request label Sep 8, 2024
@sbarral sbarral added this to the v0.3 milestone Sep 8, 2024
@joshburkart (Author)

That makes sense -- thanks so much! I don't need it urgently but definitely a desirable feature in the next 3 months or so for me.

@joshburkart (Author)

I had a follow-up question: after the changes you described are implemented, under what conditions will an Asynchronix simulation be fully deterministic? Will it be sufficient for all constituent models to run deterministically and possess no shared state?

@sbarral (Member) commented Sep 12, 2024

Short answer: the only way to ensure full determinism is to run the simulation on a single thread (SimInit::with_num_threads(1)). This is because it is basically impossible to offer full determinism without losing all the advantages of parallel execution.

Full determinism can be important for debugging, but otherwise, if the correct behavior of a simulation requires more guarantees than we provide, I would be a bit worried that there isn't enough isolation in the specific implementation of the models, resulting in a fragile setup. For instance, if model A broadcasts a message to B and C, both of which immediately send a message to model D, then expecting a specific ordering of arrival at D is IMO relying a bit too much on implementation details.
There are admittedly more nuanced situations, for instance if A sends these messages one after another via different Output ports. In this case, sequential ordering can only be enforced by changing the Outputs to Requestor ports (with return type ()), as these will always wait for the response before proceeding, even if the response has the unit type. But even in such a case, I would worry that this might be a symptom of a fragile bench design.
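The Requestor trick can be sketched with plain std channels standing in for Requestor ports (a hypothetical illustration, not the Asynchronix API): waiting for the () reply before sending the next message serializes processing on the receiving side.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch with std channels standing in for Requestor ports:
// waiting for the `()` reply before proceeding guarantees that "first" is
// fully processed before "second" is even sent.
fn run_requests() -> Vec<&'static str> {
    let (req_tx, req_rx) = mpsc::channel::<(&'static str, mpsc::Sender<()>)>();
    let receiver = thread::spawn(move || {
        let mut log = Vec::new();
        while let Ok((msg, ack)) = req_rx.recv() {
            log.push(msg);         // process the message
            ack.send(()).unwrap(); // reply with the unit type
        }
        log
    });

    for msg in ["first", "second"] {
        let (ack_tx, ack_rx) = mpsc::channel();
        req_tx.send((msg, ack_tx)).unwrap();
        ack_rx.recv().unwrap(); // block until the message is processed
    }
    drop(req_tx); // close the channel so the receiver thread exits
    receiver.join().unwrap()
}

fn main() {
    assert_eq!(run_requests(), ["first", "second"]);
}
```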

If you are mostly worried about debugging, though, then apart from enforcing determinism with single-threaded execution, it will be possible to take advantage of tracing support in the main branch tomorrow or Monday at the latest.

@joshburkart (Author)

Ah, I'm mostly interested in determinism for debugging and reproducibility (e.g. wanting to be able to cache test results in CI sim runs), not to ensure correctness -- agree it would seem quite fragile to rely on particular e.g. execution orderings for correctness.

Hmm, say I have a simulation with models A, B, and C, each of which possesses isolated state and no back-channel communication, and further say B and C each have a non-async replier port. If A calls futures::join! over two requests to B and C, then the replier ports will run in parallel, but the simulation will nonetheless be fully deterministic, right? So I'm not sure using a single thread is necessary for determinism in all cases? Am I misunderstanding something?
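To make the setup above concrete, here is a toy version using plain threads in place of futures::join! (B and C are pure stand-in functions with isolated state): whichever reply arrives first, the combined outcome at A is fixed.

```rust
use std::thread;

// Toy stand-ins for the non-async replier ports of B and C: pure functions
// with no shared state, so the replies are fully determined by the inputs.
fn reply_b(x: u64) -> u64 { x * 2 }
fn reply_c(x: u64) -> u64 { x + 100 }

fn main() {
    // A issues both requests in parallel (threads here, futures::join! in
    // the real bench). The interleaving is unpredictable...
    let b = thread::spawn(|| reply_b(21));
    let c = thread::spawn(|| reply_c(21));
    // ...but the combined result is deterministic regardless of which
    // reply finishes first.
    let result = b.join().unwrap() + c.join().unwrap();
    assert_eq!(result, 163);
    println!("{result}");
}
```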

@sbarral (Member) commented Sep 12, 2024

I don't see anything that could cause a non-deterministic outcome in your example, which makes me worry that I may have in turn misunderstood the situation you are describing... Just to be sure: B and C don't communicate in that example?

But in broad terms, a non-deterministic outcome with the parallel executor would be rare and usually symptomatic of a flawed bench design; I just meant to say that it is theoretically possible.

@joshburkart (Author)

Right, they don't communicate. I was trying to provide a parallel example that is nonetheless guaranteed to be deterministic -- doesn't this run contrary to what you said earlier?

Short answer: the only way to ensure full determinism is to run the simulation on a single thread (SimInit::with_num_threads(1)).

Maybe by "full" you mean "across all possible simulation benches"? What I'm interested in is what characteristics a particular simulation bench must possess in order to ensure that it will always execute deterministically regardless of parallelism. For me it seems useful to publish a sufficient condition for this, as general as possible, since determinism is often a desirable property, and since it is not in general guaranteed (if I understand correctly).

I guess in other words I'm looking for a precise definition of what you mean by "flawed". For example, although not very general, would this be correct?

Simulation benches will execute deterministically under the parallel executor if (but not only if):

  1. All constituent models' methods execute deterministically (e.g. no IO is performed).
  2. There is no shared state or backchannel communication between models.
  3. There are no message chains, i.e. replier ports and input ports that aren't directly scheduled at init time with periodics never in turn send/schedule messages (guaranteed if all replier ports and input ports not scheduled with periodics are not async).

In my use case, I have been planning to construct a limited, tailored wrapper around Asynchronix that strongly encourages simulations to be deterministic (among other things), so publishing guidance like this would be helpful in informing my design.

@sbarral (Member) commented Sep 12, 2024

Maybe by "full" you mean "across all possible simulation benches"?

Yes, this is what I meant, sorry if this was confusing.

In general, even without performing I/O or using shared state, it is always theoretically possible for some bench topologies to get a different outcome with parallel execution because the operating system might randomly preempt the execution of worker threads. Since the executor uses a work-stealing strategy and each worker thread picks any task that is ready to be executed, this can result in different orders of execution.

A very useful thought experiment to determine if there may be non-determinism is to consider that the executor is in fact fully deterministic (!) and message queues are ordered and latency-free, but that the time it takes for models to process an event is unknown. If this unknown time plays a role in the outcome, then this is a source of non-determinism. So the fundamental root of non-determinism is in the bench topology, and things like I/O are simply other elements beside preemption that can trigger the latent non-determinism.

Regarding your point 3) above, note that there is no difference in this respect between async and regular input/replier ports: an async function is of course more likely to be suspended and take an unpredictable time to process, but strictly speaking a regular function also takes a non-zero time to execute and can be preempted by the OS, which would affect the order in which the multi-threaded executor executes tasks.

Example of a deterministic scenario: A sends an event M1 to B and then an event M2 to C, which after processing M2 sends an event M3 to B; notwithstanding how long it takes for C to process M2, message M1 will always end up before M3 in the queue of model B.

Example of a non-deterministic scenario: A sends events (not requests) M1 and M2 to B and C, and B and C subsequently send events M3 and M4 to model D after they have processed M1 and M2; in such a case the relative order of M3 and M4 will depend on the relative processing times of models B and C.
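The non-deterministic scenario can be reproduced with a toy sketch (plain threads, not Asynchronix) where the "unknown processing time" of the thought experiment is made explicit as a sleep: swapping the processing times of B and C flips the arrival order at D.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Toy reproduction of the non-deterministic scenario: B and C forward to D
// in parallel, so the arrival order at D depends only on their (in a real
// bench, unknown) processing times, here modeled as explicit sleeps.
fn run(delay_b_ms: u64, delay_c_ms: u64) -> Vec<&'static str> {
    let d_inbox = Arc::new(Mutex::new(Vec::new()));
    let (db, dc) = (Arc::clone(&d_inbox), Arc::clone(&d_inbox));
    let b = thread::spawn(move || {
        thread::sleep(Duration::from_millis(delay_b_ms)); // B processes M1...
        db.lock().unwrap().push("M3");                    // ...then sends M3 to D
    });
    let c = thread::spawn(move || {
        thread::sleep(Duration::from_millis(delay_c_ms)); // C processes M2...
        dc.lock().unwrap().push("M4");                    // ...then sends M4 to D
    });
    b.join().unwrap();
    c.join().unwrap();
    Arc::try_unwrap(d_inbox).unwrap().into_inner().unwrap()
}

fn main() {
    // Swap the processing times and the arrival order at D flips.
    assert_eq!(run(10, 100), ["M3", "M4"]);
    assert_eq!(run(100, 10), ["M4", "M3"]);
}
```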

In my experience working with simulators from large space integrators, even though they were technically single-threaded and thus deterministic, we would anyway need to make sure that the above rules were upheld because the order in which models would be scheduled was an implementation detail on which we couldn't rely (interestingly, even events sent sequentially within one function were not necessarily scheduled in that sequential order...).
