Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UBLOX EVK NINA B1: mbed 5.5 regression on Serial class (nrf5) #4686

Closed
NeilMacMullen opened this issue Jul 2, 2017 · 42 comments
Closed

UBLOX EVK NINA B1: mbed 5.5 regression on Serial class (nrf5) #4686

NeilMacMullen opened this issue Jul 2, 2017 · 42 comments

Comments

@NeilMacMullen
Copy link
Contributor

NeilMacMullen commented Jul 2, 2017

This code no longer echoes back received characters on 5.5 (commit c9e63f1) although the initial "LOOPBACK SERIAL" message is shown.

`#include "mbed.h"
 
int main()
{
  Serial *pc = new Serial(USBTX, USBRX);
 
  pc->format(8, Serial::None, 1);
  pc->set_flow_control(Serial::RTSCTS);
  pc->baud(115200);
  pc->printf("LOOPBACK SERIAL %d %d>>>>>>\r\n",USBTX,USBRX);
  while (1)
  {
    while (pc->readable())
    {
      char c = pc->getc();
      pc->putc(c);
    }
    Thread::wait(10);
  }
}
`

Description

  • Type: Bug

Target
UBLOX_EVK_NINA_B1

Toolchain:
GCC_ARM

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 3, 2017

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 3, 2017

It would help if you can git bisect it. At least to find a revision that worked for you. I assume 5.5.0 worked, 5.5.1 broke it or?

Looking at the git log, I can see this commit - 080c5af that is touching serial in the recent history. Is this one that has affected your code base? I can see it is defined in nina target PinNames, is it correctly set ?

cc @nvlsianpu

@0xc0170 0xc0170 changed the title Mbed5.5 regression on Serial class. UBLOX EVK NINA B1: mbed 5.5 regression on Serial class (nrf5) Jul 3, 2017
@andreaslarssonublox
Copy link

Hi, I tested the code above on the latest master i.e. d382d44 and it seems to work. The fix done by @nvlsianpu is actually fixing the issue that is reported. The commit c9e63f1 is earlier in the history before the fix.
@NeilMacMullen can you try with the d382d44 commit instead and see if it works for you?

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Jul 3, 2017

@andreaslarssonublox @0xc0170 commit c9e63f1 is what you get if you run 'mbed new' (at least as of today, 3rd Jul 2017, 1500:Z) so it's not like I've just taken a random build. :-(

Running 'mbed update d382d44 ' inside the mbed-os folder then 'git log -n 1' shows "commit d382d44".

Rebuilding with "mbed compile -c" and reflashing the device, I see exactly the same behaviour unfortunately. (The reason I'm printing out pinnames is because I suspected they might have been accidentally changed but that's not the case for TX and RX. Commenting out the flow-control line makes no difference).

It would help if you can git bisect it.

Unfortunately I don't have time to do this right now but can test a couple of 'major' releases if you can tell me the magic 'mbed update' runes.

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Jul 3, 2017

On further investigation it appears that the problem does NOT appear (with d382d44) on an actual NINA_B1 but does appear with our own design which emulates the pin out. I suspect this is RTS/CTS related so will look at how our schematic might be different. This is definitely new behaviour since OS4.x though - has anything around this area been changed?

Edit

UART pin assignments appear unchanged between working and non-working builds...
TX:6
RX:5
CTS:7
RTS:31

Note, we're using NINA-B112 module. UART_CTS/GPIO_21 is pulled low. UART_RTS/GPIO_20, UART_DSR_PIO_17, UART_DTR/GPIO_16 all NC (floating). Perhaps there has been a change to default configuration of these?

Edit
Using pc->set_flow_control(Serial::Disabled); causes the test application to also fail on the NINA_B1 EVK though I'm not sure this is new behaviour since I recall being surprised that we had to set RTS/CTS flow control to get serial comms. Surely 'Serial::Disabled' should mean "RTS/CTS are ignored" ?

@andreaslarssonublox
Copy link

By the way shouldn't the pins for RTS and CTS be passed as well e.g.:
pc->set_flow_control(Serial::RTSCTS,RTS_PIN_NUMBER,CTS_PIN_NUMBER);

See set_flow_control method in SerialBase.h

@NeilMacMullen
Copy link
Contributor Author

@andreaslarssonublox - yes you are right. These are defaulted to NC which seems.... just a bit counter-intuitive to me (surely they should default to CTS/RTS?). Possibly the reason that bytes are not being received is because the host might be configured to use flow control but since we're pulling CTS low this shouldn't matter. I need to double-check the schematic (can't do this for a few hours) but pretty sure that the CTS pin isn't even connected to our uart header so why would disabling flow control prevent the chip receiving bytes? Perhaps the 'NC' defaults are causing some misconfiguration internally?

@nvlsianpu
Copy link
Contributor

@NeilMacMullen
Copy link
Contributor Author

@nvlsianpu I'll give that a try later but the main issue here is that 1) the behaviour has clearly changed and 2) the set_flow_control method should work. At this stage I can't even seem to get it to work reliably on the NINA_BA at commit d382d44 using any combination of settings and specifying the pins in set_flow_control. ( pc->set_flow_control(Serial::Disabled,RTS_PIN_NUMBER,CTS_PIN_NUMBER);) I've also tried telling TeraTerm to use hardware on 'none' FC on the host. What is interesting is that the echoed characters are flushed when the nina is reprogrammed using jflashlite.

@NeilMacMullen
Copy link
Contributor Author

Ok, I tried the config suggested in the link from @nvlsianpu That doesn't work for me either on NINA_B1 @ commit d382d44 with teraterm set to 'none' and the application 'set_flow_control' line commented out.

Exactly the same source code as above, on exactly the same hardware compiled against mbed @ commit ed4febe works fine

@nvlsianpu
Copy link
Contributor

I can't reproduce this on NRF52_DK @andreaslarssonublox It is possible to reproduce this issue on the NINA_B1x board? (I don't have a such)

@andreaslarssonublox
Copy link

Hi, I have tested the following on the UBLOX_EVK_NINA_B1 and commit d382d44 with the debug profile:

Default configuration with HW flow control enabled:

  • Using pc->set_flow_control(Serial::RTSCTS, RTS_PIN_NUMBER, CTS_PIN_NUMBER) putty with flow control=RTS/CTS - works
  • Using pc->set_flow_control(Serial::Disable) putty with flow control=None - works
  • Using no pc->set_flow_control putty with flow control=RTS/CTS - works

Modified configuration with no HW flow control in mbed_app.json

       "target_overrides": {
       "UBLOX_EVK_NINA_B1": {
           "target.uart_hwfc": 0
       },
  • Using pc->set_flow_control(Serial::RTSCTS, RTS_PIN_NUMBER, CTS_PIN_NUMBER) putty with flow control=RTS/CTS - works
  • Using pc->set_flow_control(Serial::Disable) putty with flow control=None - works
  • Using no pc->set_flow_control, putty with flow control=None - works

One thing I saw is that the board might need a restart when switching between flow control modes. This might have to do with the SEGGER J-Link debug chip sensing what mode is being used or what do you think @nvlsianpu?

@nvlsianpu
Copy link
Contributor

SEGGER J-Link cheat a little - it might doesn't work properly without CTS/RTS flow-control. I have tested it again with FTDI232 adapter (so I was sure how wiring looked like and which signals were connected to devices). It have been worked properly for flow both control OFF and ON modes. Then it must have be problem with an UART adapter you used @NeilMacMullen.

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Jul 6, 2017

The idea about the on-board debug chip getting confused is interesting and might help explain the B1 behaviour (recall that this originally worked for me but then seemed not to!) On our 'real' hardware which also appears to show the problem, we use a standard FTDI-232 adaptor cable with RTS/CTS not connected to the FTDI. J-Link is wired separately to the SWD pins. I will have some more time tonight so will see if I can come closer to isolating the change that is causing this.

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Jul 6, 2017

I have isolated the change that causes this. The code I'm using is...

#include "mbed.h"
#include "id.h"
int main()
{
Serial *pc = new Serial(USBTX, USBRX);
pc->format(8, Serial::None, 1);
pc->set_flow_control(Serial::RTSCTS);
pc->baud(115200);
pc->printf("LOOPBACK SERIAL %s >>>>>>\r\n",ID);
while (1)
{
while (pc->readable())
{
char c = pc->getc();
pc->putc('@');
}
Thread::wait(10);
}
}

where id.h is simply
#define ID "fb8fda3cee"

At commit 5f9f8c5 the code works as expected, echoing back '@' characters. At the subsequent commit fb8fda3, the code stops echoing characters.

Details of the commit are

commit fb8fda3
Merge: 65adf44 c5f0ad5
Author: Jimmy Brisson theotherjimmy@gmail.com
Date: Tue Apr 18 15:05:02 2017 -0500
Merge pull request #4097 from bulislaw/build_debug_macro
Debug build flag + change to sleep behavior in debug mode

Note that commit d382d44 also displays the incorrect behaviour so this does not resolve the issue.

@NeilMacMullen
Copy link
Contributor Author

It appears to be the changes to mbed_sleep.h....

diff --git a/platform/mbed_sleep.h b/platform/mbed_sleep.h
index 9154625..013bef1 100644
--- a/platform/mbed_sleep.h
+++ b/platform/mbed_sleep.h
@@ -28,8 +28,7 @@ extern "C" {
/** Send the microcontroller to sleep
*

  • @note This function can be a noop if not implemented by the platform.
    • @note This function will be a noop in debug mode (debug build profile when MBED_DEBUG is defined).
    • @note This function will be a noop while uVisor is in use.
    • @note This function will only put device to sleep in release mode (small profile or when NDEBUG is defined).
  • The processor is setup ready for sleep, and sent to sleep using __WFI(). In this mode, the
  • system clock to the core is stopped until a reset or an interrupt occurs. This eliminates
    @@ -45,20 +44,17 @@ extern "C" {
    /
    __INLINE static void sleep(void)
    {
    -#if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
    -#ifndef MBED_DEBUG
    +#ifdef NDEBUG
    #if DEVICE_SLEEP
    hal_sleep();
    #endif /
    DEVICE_SLEEP /
    -#endif /
    MBED_DEBUG /
    -#endif /
    !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED)) /
    +#endif /
    NDEBUG */
    }

/** Send the microcontroller to deep sleep
*

  • @note This function can be a noop if not implemented by the platform.
    • @note This function will be a noop in debug mode (debug build profile when MBED_DEBUG is defined)
    • @note This function will be a noop while uVisor is in use.
    • @note This function will only put device to sleep in release mode (small profile or when NDEBUG is defined).
  • This processor is setup ready for deep sleep, and sent to sleep using __WFI(). This mode
  • has the same sleep features as sleep plus it powers down peripherals and clocks. All state
    @@ -73,13 +69,11 @@ __INLINE static void sleep(void)
    /
    __INLINE static void deepsleep(void)
    {
    -#if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
    -#ifndef MBED_DEBUG
    +#ifdef NDEBUG
    #if DEVICE_SLEEP
    hal_deepsleep();
    #endif /
    DEVICE_SLEEP /
    -#endif /
    MBED_DEBUG /
    -#endif /
    !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED)) /
    +#endif /
    NDEBUG */
    }

#ifdef __cplusplus

@MarceloSalazar
Copy link

@bulislaw @0xc0170 I understand you've been working on sleep, could you have a look please?

@NeilMacMullen
Copy link
Contributor Author

I did wonder if perhaps the chip was relying on the RTS line going active to wake it from sleep so tried using pc->set_flow_control(Serial::RTSCTS,RTS_PIN_NUMBER,CTS_PIN_NUMBER); on the NINA_B1 but still see the same behaviour.

@bulislaw
Copy link
Member

bulislaw commented Jul 7, 2017

The board is put to sleep, when RTX calls to idle, between SysTicks (we don't have tickless mode yet), for both develop and release profiles. You can try disabling it by either using mbed compile [...] --profile debug or removing SLEEP from device_has in targets.json and see if that changes the behavior.

@NeilMacMullen
Copy link
Contributor Author

I'll try this later but surely the device would be expected to wake up on data reception and stay away while data is queued for the uart?

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 11, 2017

I'll try this later but surely the device would be expected to wake up on data reception and stay away while data is queued for the uart?

Any new findings?

@NeilMacMullen
Copy link
Contributor Author

Any new findings?

use '--profile debug' causes the application to behave as expected so that at least gives me a workaround for now. However, unless I'm missing something, the uart should be able to work in conjunction with sleep so that still leaves the question of why this isn't working in release mode?

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 12, 2017

This is based on TARGET_NRF5 implementation, so we should be able to reproduce it there.

cc @pan- @nvlsianpu @anangl please look at this, what might be causing uart to stop working ?

@pan-
Copy link
Member

pan- commented Jul 12, 2017

@0xc0170 Overflow of the idle thread stack: should be fixed by #4736 .

@NeilMacMullen
Copy link
Contributor Author

I applied this change to commit fb8fda3

diff --git a/targets/TARGET_NORDIC/mbed_rtx.h b/targets/TARGET_NORDIC/mbed_rtx.h
index 15c1ef6..12b9056 100644
--- a/targets/TARGET_NORDIC/mbed_rtx.h
+++ b/targets/TARGET_NORDIC/mbed_rtx.h
@@ -17,6 +17,9 @@
#ifndef MBED_MBED_RTX_H
#define MBED_MBED_RTX_H

+#define OS_IDLE_THREAD_STACK_SIZE 512
+
+
#if defined(TARGET_MCU_NRF51822)

#ifndef INITIAL_SP

It does not resolve the issue. (Just be be clear, my application does not appear to be crashing, it just appears to be sleeping though I can see how if the idle thread was broken it might not wake up properly...)

@NeilMacMullen
Copy link
Contributor Author

Any progress on this?

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 27, 2017

I'll try this later but surely the device would be expected to wake up on data reception and stay away while data is queued for the uart?

False, I don't think the flow control is using any interrupt (that could be source of wake up). The regular source of wake up in this case would be rtos timer (for this target it is rtc used instead of systick). thus your target should wake up every 1ms. Is that the case?

The question is - does this sleep() has any impact on flow control functionality? As I read above, @nvlsianpu neither @andreaslarssonublox could not reproduce the issue, makes me wonder is this related to the codebase in here or hw setup you have?

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Jul 27, 2017

False, I don't think the flow control is using any interrupt (that could be source of wake up)

Possibly we are talking at cross purposes here but at this stage the example code is NOT using flow-control. My point was that the (apparent) behaviour is that bytes are arriving on the RF52 UART but failing to be echoed. Even if the RF52 were in deep-sleep, the expected behaviour would be that it would wake when a byte was received (either via hardware interrupt or scheduled tick) then stay awake long enough to flush this byte back out of the output buffer. That's not happening.

makes me wonder is this related to the codebase in here or hw setup you have?

I'm fairly sure that the minimal example I gave compiled against the commit I mentioned also fails to behave correctly on the NINA-B1. I will verify that later. Even if the behaviour is different, I think the most plausible explanation is the one that's already been floated which is that the on-board Jlink adaptor may be causing the differences in sleep behaviour.

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Sep 7, 2017

Well, the good news is that the loopback behaviour now seems to be correct (just tested on 98ba8ac without '--profile debug').

The bad news is that trying to use an rx callback seems to crash the os*.. Has the use of this API changed?

*this can be seen with a slightly less minimal example that flashes an led on a background thread

#include "mbed.h"

class CallbackObject
{
	public: void OnRx() {}
};

int main()
{
  Serial *pc = new Serial(USBTX, USBRX);
  pc->format(8, Serial::None, 1);
  pc->set_flow_control(Serial::Disabled);
  pc->baud(115200);
  pc->printf("LOOPBACK SERIAL  with callback>>>>>>\r\n");

  CallbackObject * obj = new CallbackObject();
  bool attached = false;

  while (1)
  {
    while (pc->readable())
    {
      char c = pc->getc();
      pc->putc(c); pc->putc(' ');
      if (!attached && c=='!') 
      {
        pc->putc('@');
	pc->attach(callback(obj,&CallbackObject::OnRx),Serial::RxIrq);
	pc->putc('^');
       }
    }
    Thread::wait(10);
  }
}

@MarceloSalazar
Copy link

@0xc0170 @pan- are you aware of this crash problem in 5.5.6 (98ba8ac) ?

@NeilMacMullen
Copy link
Contributor Author

I may have been too optimistic in thinking that the original (no loopback) issue was resolved. Here are the results from testing on a stock NINA-B1 board using teraterm, no flow-control....

mbed compile

  • No echo at all

mbed compile --profile debug

  • Echoes characters correctly
  • Crashes on receiving the first character after callback attached. ('@' and '^') are seen indicating attach call is fine.

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 8, 2017

@andreaslarssonublox @nvlsianpu @anangl Is this known issue for nrf52 targets?

@SenRamakri
Copy link
Contributor

Hi NeilMacMullen,

I tried the above sample code on a NRF51_DK board and did not see any crash. By the way, what crash you are seeing? Can you post it here or provide more details?
Note that Callbacks are executed in ISR context and doing anything in the Callback which does things like acquiring a mutex ( directly or indirectly ) may cause an error.
One weird thing I'm noticing in my investigation with NRF51_DK is that even if we disable Flowcontrol in CONFIG we still need to have the default pins assigned for PSELRTS, PSELCTS in order to have TX, RX tasks working. Are you seeing this behavior with your target?

Thanks,
Senthil

@sg-
Copy link
Contributor

sg- commented Sep 11, 2017

First thing that looks wrong in the example is that Serial and RX interrupts are not reading data, therefore would endlessly be stuck in the RX IRQ. This is a problem given that Serial inherits from Stream and retargets to putc and getc which use a Mutex for protection and Mutex + IRQ context is not allowed. (The trap for this failure is not obvious)

Secondly there is a problem with nRF devices and flow control being disabled. More on that from @SenRamakri

In fact, I think we may need to deprecate attach in the Serial class and in general revisit the Serial classes given their quite a mess right now IMO.

Note: RawSerial doesn't have this problem. Could you try this example? Seems to work for me although not sure its what you're after. If you comment the flow control in or out the program will work or fail

#include "mbed.h"

DigitalOut led1(LED1);
Ticker t;
static void flip()
{
    led1 = !led1;
}

class CallbackObject : public RawSerial
{
public:
    CallbackObject() : RawSerial(USBTX, USBRX), new_data(false) {}
    CallbackObject(PinName tx, PinName rx) : RawSerial(tx, rx), new_data(false) {}
    
    void OnRx() {
        while (RawSerial::readable()) {
            c = RawSerial::getc();
            new_data = true;
        }
    }
    
    char getc() {
        new_data = false;
        return c;
    }

    bool readable() {
        return new_data;
    }
private:
    char c;
    bool new_data;
} obj;

int main()
{
    t.attach(callback(flip), 0.250f);
    obj.baud(115200);
    //obj.set_flow_control(Serial::Disabled);
    obj.attach(callback(&obj, &CallbackObject::OnRx), Serial::RxIrq);
    obj.printf("LOOPBACK SERIAL  with callback>>>>>>\r\n");

    while (1) {
        if(obj.readable()) {
            obj.putc(obj.getc());
        }
    }
}

@NeilMacMullen
Copy link
Contributor Author

Sam and Senthil. Thanks for your input.. I spent some time with Vincent and Marcelo in Cambridge today, FIndings were....

  1. As you noticed the callback is not actually reading the incoming bytes and clearing the rx interrupt However attempting to call getc inside the callback results in an invalid operation since it's not longer possible to do this inside an IRQ! I believe Vincent was going to initiate some internal discussion about the best way to handle this. I understand the desire to deprecate attach but you should be aware that there is quite a bit of code on github (e.g C027interface) that uses this pattern and would have to updated.

  2. The original problem may not actually be related to UART at all. During out testing, we noticed that even the blinky example fails to wake from sleep once external stimuli to the chip are removed (primarily the DAPLINK debug activity). Marcello has recorded this as a separate issue at Blinky doesn't work when using develop/release profile with external power #5070

I'll take a closer look at your suggestions tomorrow.

@NeilMacMullen
Copy link
Contributor Author

Thanks @sg- I've just created a variation of Serial that stubs out lock and unlock and this appears to work.. It's a little unclear what the mutex is meant to be protecting. I assume it's the low-level hardware access? Looking at the RF52 serial_api, the various access functions I care about (serial_readable/writeable/getc/putc) look like they should be interrupt-safe (the main concern I would have would be an incoming byte arriving as I'm calling putc). Do you foresee any issues with removing the locks entirely?

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 22, 2017

Do you foresee any issues with removing the locks entirely?

Why would you want them to be removed?

It's a little unclear what the mutex is meant to be protecting. I assume it's the low-level hardware access?

From the Serial documentation: * @note Synchronization level: Thread safe. Yes, protecting also serial_t object and the other protected members.

@NeilMacMullen
Copy link
Contributor Author

NeilMacMullen commented Sep 26, 2017

@0xc0170 The reason to remove the locks is to avoid the issue that @sg- raises which is that calling getc in the attach'd rx callback results in an invalid operation. Clearly it makes no sense to selectively remove locks from the getc function (either the design is thread-safe or it's not)!

From the Serial documentation...

This doesn't really address my question. :) Yes, obviously the mutex is there to make it thread-safe. The question is why it is not inherently thread-safe without locks. Normally the reason would be that that interleaving register or data structure accesses between rx/tx function would make "bad things happen" (TM). In the specific case of the RF52 serial driver, it looks to me like the register access sequences are actually already thread-safe but not being an expert on the SoC I'm not entirely sure of this, hence the question :-)

@MarceloSalazar
Copy link

Internal reference: IOTMORF-2319

@ciarmcom
Copy link
Member

ciarmcom commented Jun 1, 2018

ARM Internal Ref: MBOTRIAGE-312

@SenRamakri
Copy link
Contributor

@NeilMacMullen - Can you please let us know if this is still an issue and needs to be investigated as I haven't seen any activity on this issue recently?

@SenRamakri
Copy link
Contributor

@NeilMacMullen - As I have not received a response, I think this is no longer an issue. I'm closing this issue now, but feel free to re-open if you need this to be investigated more from our side, Thanks for your support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants