
Dockerization of nos3 #35

Closed
PeteBlanchard opened this issue Jun 3, 2020 · 9 comments

Comments

@PeteBlanchard

Hello,

A couple of colleagues and I were trying to make a version of nos3 runnable from a Docker container... I know, not really following the one-app-per-container guidance, but we thought it would be cool to try to simplify a 'run' environment.

Of course, we started 42 in a non-UI mode... most things build and seem to run okay, except for the core flight software. When we try to start that up in a container, one of a few things happens:

  • if we run as the 'root' user (since that is the default in a container, we started with that), we receive (forgive some additional debugging messages I added):
CFE_PSP: Default CPU Name: linux
CFE_PSP: Starting the cFE with a POWER ON reset.
CFE_PSP: Clearing out CFE CDS Shared memory segment.
CFE_PSP: Clearing out CFE Reset Shared memory segment.
CFE_PSP: Clearing out CFE User Reserved Shared memory segment.
initializing nos engine link...
2050-154-18:22:38.51195 POWER ON RESET due to Power Cycle (Power Cycle).
2050-154-18:22:38.51199 ES Startup: CFE_ES_Main in EARLY_INIT state
CFE_PSP: CFE_PSP_AttachExceptions Called
2050-154-18:22:38.51202 ES Startup: CFE_ES_Main entering CORE_STARTUP state
2050-154-18:22:38.51202 ES Startup: Starting Object Creation calls.
2050-154-18:22:38.51202 ES Startup: Calling CFE_ES_CDSEarlyInit
2050-154-18:22:38.51214 ES Startup: Calling CFE_EVS_EarlyInit
2050-154-18:22:38.51228 Event Log cleared following power-on reset
2050-154-18:22:38.51229 ES Startup: Calling CFE_SB_EarlyInit
2050-154-18:22:38.51234 ES Startup: Calling CFE_TIME_EarlyInit
2000-012-14:03:20.00000 ES Startup: Calling CFE_TBL_EarlyInit
2000-012-14:03:20.00017 ES Startup: Calling CFE_FS_EarlyInit
2000-012-14:03:20.00037 OS_TaskCreate[posix]: Checking if user (0) is root (0)
2000-012-14:03:20.00044 OS_TaskCreate[posix]: Calling pthread_attr_setinheritsched
2000-012-14:03:20.00045 OS_TaskCreate[posix]: Calling pthread_attr_setstacksize
2000-012-14:03:20.00046 OS_TaskCreate[posix]: Calling pthread_attr_setschedpolicy
2000-012-14:03:20.00051 OS_TaskCreate[posix]: Calling memset
2000-012-14:03:20.00051 OS_TaskCreate[posix]: Calling pthread_attr_setschedparam
2000-012-14:03:20.00052 OS_TaskCreate[posix]: Calling pthread_create
2000-012-14:03:20.00061 OS_TaskCreate[posix]: Error: Operation not permitted
2000-012-14:03:20.00062 ES Startup: OS_TaskCreate error creating core App: CFE_EVS: EC = 0xFFFFFFFF
CFE_PSP_Panic Called with error code = 0x00000006. Exiting.
The cFE could not start.
  • if we run as 'nos3' user, we receive:
CFE_PSP: Default CPU Name: linux
CFE_PSP: Starting the cFE with a POWER ON reset.
CFE_PSP: Cannot shmget CDS Shared memory Segment!

Not quite sure why the 'root' user would be denied creating a new thread... there seems to be plenty of memory and CPU available. I googled the error and found some information about setting capabilities on the executable directly (CAP_SYS_NICE), but when I tried that it did not get any further (nor did adding the capability to the container).
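For what it's worth, an EPERM from pthread_create when the thread requests an explicit realtime policy (SCHED_FIFO/SCHED_RR, which the POSIX OSAL does when run as root) usually means the process lacks permission to use realtime scheduling. In Docker that typically requires both the SYS_NICE capability and a nonzero RLIMIT_RTPRIO inside the container. A minimal sketch, assuming a hypothetical image name 'nos3' (not necessarily the project's actual image):

```shell
# Grant realtime-scheduling permission and raise the rtprio limit at run time:
docker run -it --cap-add=SYS_NICE --ulimit rtprio=99 nos3 /bin/bash

# Inside the container, chrt can confirm whether SCHED_FIFO is now permitted:
chrt -f 50 true && echo "SCHED_FIFO permitted" || echo "still EPERM"
```

If chrt still reports EPERM, the default seccomp profile may also be filtering the scheduling syscalls, which is worth checking separately.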

Do you have any ideas as to what may be occurring?

Our initial Docker build does do some things differently (at least in a different order, installing dependent libs in a pre-run container), and everything is owned by/running as 'root'. But I am not sure what would explain the issue(s) we are seeing.

Thanks in advance for any help.

Best Regards,

Peter
@PeteBlanchard

PeteBlanchard commented Jun 3, 2020

Okay, I've refactored how we were building to be more in line with the Vagrant build...
It still doesn't work, but I am getting a different error now:

2000-012-14:03:20.00013 ES Startup: Calling CFE_FS_EarlyInit
2000-012-14:03:20.00027 ES Startup: Core App: CFE_EVS created. App ID: 0
Your queue depth may be too large for the
OS to handle. Please check the msg_max
parameter located in /proc/sys/fs/mqueue/msg_max
on your Linux file system and raise it if you
 need to or run as root
EVS Port1 42/1/CFE_EVS 14: No subscribers for MsgId 0x808,sender CFE_EVS
EVS Port1 42/1/CFE_EVS 4: CreatePipeErr:OS_QueueCreate returned -1,app CFE_EVS
2000-012-14:03:20.00046 EVS:Call to CFE_SB_CreatePipe Failed:RC=0xCA000005
2000-012-14:03:20.00047 EVS:Application Init Failed,RC=0xCA000005
2000-012-14:03:20.00047 CFE_ES_ExitApp: CORE Application CFE_EVS Had an Init Error.
2000-012-14:03:20.00048 PROCESSOR RESET called from CFE_ES_ResetCFE (Commanded).
CFE_PSP: Exiting cFE with PROCESSOR Reset status.
CFE_PSP: Shared Memory segments have been PRESERVED.
CFE_PSP: Restart the cFE with the PR parameter to complete the Processor Reset.

After checking '/proc/sys/fs/mqueue/msg_max' in the container, I see that it is 10, even though the Docker host is set to 100. I cannot seem to modify it in the running container (read-only filesystem).
Also, '/etc/sysctl.conf' is configured with 'fs.mqueue.msg_max=500', which also seems to be ignored.
I am running the container with the '--cap-add=ALL' option, so sys_resource and sys_nice should be allowed.

I had been playing around (between the old build and the new) and thought I had actually gotten it running once, but I have not been able to replicate on a clean build...

Could it be related to "nasa/osal#285" or "nasa/osal#235"?

@cmanderino

Using Docker with cFS can produce some difficulties if the Docker images are not just right or the containers are not correctly spun up. Docker runs in an isolated user space; you'll want to make sure the container gets the correct resources from the OS at initialization, or else you may run into issues like the ones you are seeing. IIRC, there is a flag you need to give Docker containers to make sure they get the correct number of message queues. Once inside the container, you may not be able to set the OS mqueue limits the way you normally would from general userspace with sudo.

Some other general tips: do not run cFS as root. You can, but you ought to be able to do everything you need without root privilege, and root can produce a number of issues, one of which is the unreleased shared memory segment. To run cFS properly you'll need to make sure you have the right number of message queues. You might have old mqueues still locked in your OS from an incorrect termination of a past cFS instance (usually one run as root), in which case you'll need to clear the mqueues manually from your OS before your next run. Additionally, you may need to run cFS with the PR (processor reset) flag.
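The manual cleanup described above can be done like this on a typical Linux host. This is a hedged sketch: the queue name and shmid are placeholders, since the actual names/ids depend on what the failed cFS run left behind:

```shell
# List and remove leftover POSIX message queues via the mqueue filesystem
# (it is usually already mounted at /dev/mqueue):
mkdir -p /dev/mqueue
mount -t mqueue none /dev/mqueue 2>/dev/null || true
ls /dev/mqueue
rm /dev/mqueue/<stale_queue_name>   # placeholder: the queue a dead cFS left behind

# List and remove leftover System V shared memory segments
# (e.g. the preserved CDS/reset segments from a root cFS run):
ipcs -m          # note the shmid of the stale segment
ipcrm -m <shmid> # placeholder shmid
```

Clearing these before a fresh POWER ON start avoids inheriting state from a crashed instance.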

You can definitely run cFS in Docker and have it interact with other pieces. I do recommend, if you are putting many pieces into containers, that you establish a Docker Network for the pieces to use between themselves.
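The Docker Network suggestion might look roughly like the following. The network and image names here are illustrative placeholders, not the project's actual artifacts:

```shell
# Create a user-defined bridge network; containers attached to it can
# resolve each other by container name:
docker network create nos3-net

# Hypothetical split: NOS Engine standalone server and the flight software
# as separate containers on the shared network:
docker run -d --network nos3-net --name nos-engine nos-engine-image
docker run -d --network nos3-net --name fsw fsw-image

# Inside 'fsw', the engine would then be reachable at hostname 'nos-engine'.
```

A user-defined network (rather than the default bridge) is what enables the name-based DNS resolution between containers.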

I hope this helps!

@PeteBlanchard

PeteBlanchard commented Jun 4, 2020

Okay, I figured out a way to update the mqueue size of the container, which got me past the problem above, and it looks like it is running now.
(For anyone interested: when creating an instance of the container, you need to pass '--sysctl fs.mqueue.msg_max=100' on the 'docker run' line.)
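For context: fs.mqueue.* is namespaced per IPC namespace, which is why the host's /etc/sysctl.conf value never propagates into the container and why the in-container /proc entry is read-only; the '--sysctl' flag is the supported way to set it at container creation. A sketch combining the settings discussed in this thread (the 'nos3' image name is a placeholder):

```shell
docker run -it \
  --sysctl fs.mqueue.msg_max=100 \
  --cap-add=SYS_NICE \
  --ulimit rtprio=99 \
  nos3 /bin/bash

# Verify inside the container that the per-namespace limit took effect:
cat /proc/sys/fs/mqueue/msg_max
```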

@PeteBlanchard

PeteBlanchard commented Jun 4, 2020

@cmanderino, thanks, I appreciate the response!

Yeah, you can't change the mqueue size once the container is created, and I had seen the warning about running cFS as root; that was why I stepped back to re-align with what had been done previously.

Breaking it into multiple containers and running in a named network is a next/future task, but I may want to tackle it after I update to the current rc-1.05.00 (I was still using master).

@PeteBlanchard PeteBlanchard reopened this Jun 9, 2020
@PeteBlanchard

Okay, I thought I had this running... but it looks like the flight software gets hung up in the GPS initialization phase:

EVS Port1 42/1/CFE_EVS 1: CAM Lib HW Init Success
NAV_LibInit(): Initializing the GPS
EVS Port1 42/

The GPS simulator is running (as well as the nos_engine_standalone)... any ideas as to what this could be waiting for?

Thanks in advance.

@mgrubb-stf

This looks like code from the master branch, since removed in rc-1.05.00. In that version, though, NAV_LibInit is going to make the UART connection to NOS Engine; it is likely hung trying to connect to the standalone server.
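A quick way to test the "hung connecting to the standalone server" theory from inside the FSW container is a plain TCP reachability check. The hostname and port below are placeholders; the real values come from the NOS Engine URI in the NOS3 configuration:

```shell
# Replace host/port with the values from your NOS Engine connection string:
nc -zv <nos-engine-host> <nos-engine-port> \
  && echo "engine reachable" \
  || echo "engine unreachable -- check container networking"
```

If the check fails, the containers are likely not on a common network (or the FSW config still points at an address that only resolves on the host).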

@PeteBlanchard

Interesting... yes, it is the master branch. I haven't started the transition over to the new rc yet (I have built and run it, but not tried to containerize it).

How are the docs coming for the rc? I have seen a couple of anomalies (since I built it) where some commands sent through COSMOS are not echoed in the core flight software, and I am wondering where the issue might be (e.g. 'cmd("GENERIC_REACTION_WHEEL GENERIC_RW_REQ_DATA_CC")').

@PeteBlanchard

PeteBlanchard commented Jun 9, 2020

Yes, the core flight software (or maybe the engine) is throwing this:

2020-06-09 22:32:29.442240 [WARNING] - SimConfig::get_config_for_simulator:  WARNING, did NOT load plug-in library libtime_sim.so.  Error: libtime_sim.so: cannot open shared object file: No such file or directory
2020-06-09 22:32:29.447861 [WARNING] - SimData42SocketProvider::connect_reader_thread_as_42_socket_client:  Continuing, but could not connect socket: Connection refused
Telemetry socket established.  Number of sockets=1
2020-06-09 22:32:30.700486 [ ERROR ] NosEngine.Uart - close uart port failed: node (fsw) not connected to port (1)

I did a search of the NOS3 Vagrant-built VM and couldn't find 'libtime_sim.so' there either, so that may not be a big deal... not sure about the other. Both the nos_engine_standalone and gps_simulator are running...
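One note on the 'Connection refused' line: that is the sim-side client trying to reach the 42 socket, so if 42 runs on the Docker host rather than in a container, the container needs an explicit route to the host. A hedged sketch; the port is a placeholder taken from whatever your 42 Inp_IPC configuration specifies:

```shell
# From inside the container, check whether the 42 socket on the host is
# reachable (replace the port with the one in your 42 Inp_IPC config):
nc -zv host.docker.internal <42-socket-port> || echo "42 socket not reachable"

# On Linux, host.docker.internal is not defined by default and may need to
# be mapped explicitly when creating the container (Docker 20.10+):
docker run --add-host=host.docker.internal:host-gateway ...
```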

@PeteBlanchard

FYI, since I have rc-1.05.00 working (which has more robust cFS cmd/tlm), I am closing this issue (which was against 1.04.00).
