Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE_ES_ScanAppTable() possible race conditions #295

Closed
skliper opened this issue Sep 30, 2019 · 3 comments · Fixed by #598
Closed

CFE_ES_ScanAppTable() possible race conditions #295

skliper opened this issue Sep 30, 2019 · 3 comments · Fixed by #598
Assignees
Labels
Milestone

Comments

@skliper
Copy link
Contributor

skliper commented Sep 30, 2019

The CFE_ES_ScanAppTable() is called from the ES message processing thread, but it does not lock the global ES data structure when reading/writing from the global data.

ES software bus command messages are safe because this function is called by the same thread that is processing those messages, therefore concurrency is not possible here.

But other functions, like CFE_ES_DeleteApp() and CFE_ES_ReloadApp() are part of the public API and these also update the same fields within the app state data structure that CFE_ES_ScanAppTable() is reading. CFE_ES_RunLoop() can also modify fields within this structure and this is called by pretty much every app in the system.

This issue was noted while reviewing the fix for a similar issue in #279.

@skliper skliper self-assigned this Sep 30, 2019
@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Imported from trac issue 264. Created by jphickey on 2019-03-25T17:01:55, last modified: 2019-06-24T11:44:20

@skliper skliper added the bug label Sep 30, 2019
@skliper skliper removed their assignment Sep 30, 2019
@skliper skliper added this to the 6.8.0 milestone Nov 4, 2019
@ghost
Copy link

ghost commented Mar 27, 2020

CFE_ES_RunLoop, CFE_ES_DeleteApp, and CFE_ES_ReloadApp all lock ES shared data already.
It may be possible just to guard the call to CFE_ES_ScanAppTable in the ES main task.
Before doing that, it is probably worth a more thorough analysis of how this data is locked in these various calls. We don't want to introduce a deadlock by just locking/unlocking data around the CFE_ES_ScanAppTable when ES calls it.

@jphickey
Copy link
Contributor

I've looked into this already and I have a proposal/plan which is mostly implemented but just needs to be tested on all the various platforms.

The proposal is to introduce a more consistent notion of a "background job" in ES. We have several of these scattered about, including the Performance Log dump (separate ticket #456) and they are all implemented differently and inconsistently.

Worth noting that ScanAppTable has other issues besides the locking, in particular the timers are based on "scan intervals" which are not consistent, as they happen after every message (incl. SCH "send tlm") as well as after a period of message inactivity.

At any rate, my proposal is to have ES start a single, low-priority thread for all of these background jobs. The jobs will be implemented as state machines which perform some (limited) work on each invocation. The prototype will be:

bool (*RunFunc)(uint32 ElapsedTime, void *Arg);

The ElapsedTime parameter is the amount of time elapsed (based on OS clock) since the last time the job was executed, Arg is a generic/opaque state structure. The function returns false if the job is idle (no work to do) or true if it is actively doing something.

Both the periodic ScanAppTable and the Performance Log file dump fit nicely into this model, as well as (probably) any other longer-running jobs that occur in ES.

jphickey added a commit to jphickey/cFE that referenced this issue Apr 9, 2020
Create a new background job to handle the maintenance tasks
that had been performed in the ES main task as part of the
CFE_ES_ScanAppTable() routine.

All app state changes, including those invoked by messages,
are now handled by this job.

This also slightly changes the semantics of CFE_ES_RunLoop and
CFE_ES_ExitApp.  Now, the CFE_ES_RunLoop routine no longer requires
a RunStatus buffer.  Instead, the only thing that matters is the
RunStatus value that is eventually passed to CFE_ES_ExitApp after
the shutdown is complete.  This should be mostly backward compatible,
as the recommended app pattern would pass the same value to
both functions.

This commit also fixes nasa#480, as the value passed to CFE_ES_ExitApp
will not override a request that was already pending.
astrogeco added a commit that referenced this issue Apr 21, 2020
Fix #295, Resolve app table scanning race conditions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants