Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LWIP Update- Function called without core lock #576

Open
andrewclink opened this issue Feb 20, 2018 · 5 comments
Open

LWIP Update- Function called without core lock #576

andrewclink opened this issue Feb 20, 2018 · 5 comments

Comments

@andrewclink
Copy link
Contributor

andrewclink commented Feb 20, 2018

Since the LWIP update LWIP_TCPIP_CORE_LOCKING now defaults to 1. From what I've been able to find, this is to allow the application to avoid a few context changes by providing its own mutex.

The client is supposed to call LOCK_TCPIP_CORE() etc, but I'm not entirely clear on where that's supposed to happen in esp-free-rtos. I see it peppered throughout the project, but not in any examples.

The end result is the message "Function called without core lock" being repeated on the uart.

  1. Is this my responsibility? If so, we need some examples showing use.
  2. Is this esp-open-rtos's responsibility? If so, it looks like several calls are missing. If someone could help clarify I'd be happy to work on this.
@ourairquality
Copy link
Contributor

The option LWIP_TCPIP_CORE_LOCKING was already enabled, it is just that checking has been added to lwip. If it is reporting problems then there are likely thread safety issues in the app code. If it is just at start up then you might get away with it.

We should correct any problems in the esp-open-rtos code. An attempt was made try to guard all the calls into lwipl, but perhaps there are still some calls into lwip from the sdk binary for which we do not have source code. If you could narrow down an example that would help.

Lwip has a range of APIs, callbacks, and sockets. Probably best to study lwip to learn more. The esp-open-rtos fork of lwip is very close to the upstream version.

@jeffsf
Copy link
Contributor

jeffsf commented Feb 22, 2018

Once I finish up my extras/timekeeping for a pull request, I'll probably dive into this deeper. I'm tempted to replace the nagging message with a forced crash so I can get a stack dump. Is there a better way to get a stack dump printed than sdk_system_restart_in_nmi(); ?

I get the messages even when only calling LWIP-supplied code and it doesn't appear to be part of the DHCP negotiation:

void
sntp_task(void *pvParameters) {

    while (sdk_wifi_station_get_connect_status() != STATION_GOT_IP) {
        vTaskDelay(10);
    };

    sntp_setoperatingmode(SNTP_OPMODE_LISTENONLY);
    sntp_init();

    while (1) {
        vTaskDelay(60 * 60 * (1000 / portTICK_PERIOD_MS));
        printf("gettimeofday(NULL, NULL)\n");
        gettimeofday(NULL, NULL);
    }
}

void user_init(void)
{
    uart_set_baud(0, 115200);
    sdk_wifi_set_opmode(STATION_MODE);
    /*
     * Run at a high enough priority so that the initial time-set doesn't get interrupted
     * Later calls and listen-only mode calls run in the high-priority "tcpip_thread" (LWIP)
     */
    xTaskCreate(sntp_task, "SNTP task", 256, NULL, 6, NULL);
}

The "SNTP:" line is printed within the implementation of #define SNTP_SET_SYSTEM_TIME_US(S, F) which is a call-back executed by the LWIP SNTP code running in the tcpip_thread

Opening /dev/tty.usbserial-AH02F0VW at 115200bps...
add 0
aid 1
cnt

connected with ESP, channel 1
dhcp client start...
ip:10.10.10.82,mask:255.255.255.0,gw:10.10.10.1
Function called without core lock
Function called without core lock
Function called without core lock
Function called without core lock
Function called without core lock
SNTP:     1519259689.771761    delta: 1519258788601.887 ms
SNTP:     1519259753.771809    delta:  -0.225 ms
SNTP:     1519259818.771803    delta:  -0.389 ms
SNTP:     1519259885.771817    delta:  -0.415 ms
ip:10.10.10.82,mask:255.255.255.0,gw:10.10.10.1
SNTP:     1519259949.771813    delta:  -0.379 ms

For the curious, yes, I've got a fully working set of settimeofday(), gettimeofday() and adjtime() that work with setenv() and tzset() using POSIX-style timezone strings to provide monotonic time. I'm seeing 5-6 ppm on the specific Adafruit Huzzah on my desk, consistent with the 15 ppm crystal spec from Espressif. Cleaning up the code now for a pull request.

@andrewclink
Copy link
Contributor Author

Ooh, I have been meaning to implement SNTP a little better...

I did just force a crash real quick out of curiosity. The only thing I see in the stack so far is tcpip_timeouts_mbox_fetch, reported as line 91 of lwip/lwip/src/api/tcpip.c. The only place this is called is in a while(1) loop in tcpip_thread.

There's calls to UNLOCK_TCPIP_CORE() and LOCK_TCPIP_CORE() wrapping sys_arch_mbox_fetch. I expected it to be the other way around, but I clearly am not familiar with the lwIP codebase yet.

@ourairquality
Copy link
Contributor

ourairquality commented Feb 22, 2018

Fwiw I went through a similar thought process, but with the mdns code. I started submitting patches to make the mdns api thread safe, and added support to the netif-api code, which can either use locking or message passing. It was patiently explained that lwip has different styles of code, some are thread safe such as the sockets api, and then there is the callback style. The callback style assumes that it all runs in one thread. When mixing multiple threads with the callback style of interface it is necessary to acquire the lwip lock around calls into lwip. There was even mention that the message passing approach might be deprecated as it was a big burden compared with locking, but no decision was made. They added the core lock checks to help developers write safe code. Perhaps that discussion would help to see their perspective: https://savannah.nongnu.org/bugs/index.php?52770

@jeffsf
Copy link
Contributor

jeffsf commented Feb 22, 2018

If I've found the right spot, from sys_arch.c it looks like the lock is recursive and also checks that the the lock can only be "extended" or released by the thread that originally obtained the lock.

Phew!

Other than potentially locking out another thread that is using LWIP, there doesn't look to be any "harm" in taking the lock.

(The TZ_LOCK in newlib is not recursive.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants