Request for observations: RMT rmt_driver_install crashing repeatedly after a previous crash

meowsqueak
Posts: 151
Joined: Thu Jun 15, 2017 4:54 am
Location: New Zealand

Postby meowsqueak » Wed Apr 04, 2018 4:29 am

I've seen some issues with RMT on the ESP-IDF v3.0rc1 branch recently. In particular, RMT-heavy operations (such as regularly reading from multiple DS18B20 devices) seem to work well until something unrelated causes the ESP32 to crash and reset. Then, when the device boots up and tries to initialise the RMT driver with rmt_driver_install(), it may crash again. Once this happens, the crash repeats on every boot until a human presses the reset button.

My JTAG debugger showed that it's sometimes a null pointer dereference in the RMT driver, but I haven't worked out why that particular pointer is null yet.

I suspect, but currently can't prove, that the sudden crash leaves the RMT peripheral in an unusual state that the RMT driver init code doesn't expect or handle properly.

Anyway, I'll continue to debug with my own observations, but what I'd like to know in the meantime is if anyone else has seen similar behaviour? I'm looking for patterns.

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Postby WiFive » Wed Apr 04, 2018 4:39 am

Did you try periph_module_disable() followed by periph_module_enable()?

meowsqueak
Posts: 151
Joined: Thu Jun 15, 2017 4:54 am
Location: New Zealand

Postby meowsqueak » Wed Apr 04, 2018 4:49 am

WiFive wrote: Did you try periph_module_disable, periph_module_enable

No, I haven't tried those. I see that rmt_config() calls periph_module_enable() but never calls periph_module_disable() itself, so I'll try calling periph_module_disable() beforehand and see if that helps. Thanks for the suggestion.

meowsqueak
Posts: 151
Joined: Thu Jun 15, 2017 4:54 am
Location: New Zealand

Postby meowsqueak » Wed Apr 04, 2018 7:35 am

In terms of finding a root cause, I've increased my RMT operation rate by 100x and added a "crash on demand" button, and I'm able to easily replicate the RMT crash-on-init error, so I'm using this to explore the cause of this problem.

I've been able to tell with certainty that the crash is happening at line 544 of driver/rmt.c:

https://github.com/espressif/esp-idf/bl ... rmt.c#L544

A little debugging shows that the following code crashes because p_rmt_obj[0] is NULL in this case, which makes p_rmt NULL as well:

Code:

rmt_obj_t* p_rmt = p_rmt_obj[channel];
// ...
xSemaphoreGiveFromISR(p_rmt->tx_sem, &HPTaskAwoken);  // Line 544

My code uses two channels, 0 and 1, and p_rmt_obj[1] is not NULL at this time.

The code initialises RMT channel 1 first, i.e. rmt_driver_install(1, ...), then shortly after calls rmt_driver_install(0, ...). When it crashes, I see that the ISR is handling channel 0 even though it hasn't been "installed" yet. I propose that the RMT hardware, having been configured prior to the first crash, is still running and generates an interrupt before the call to rmt_driver_install(0, ...) actually happens, which leads to the attempt to dereference a null pointer since p_rmt_obj[0] hasn't been malloc'd yet.

So this looks like it might be a consequence of using more than one RMT channel in a program. The interrupt handler installed by the first call to rmt_driver_install() can respond to an interrupt for a channel that is still uninitialised, racing against the later call to rmt_driver_install() that would actually allocate that channel's data structure.
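To make the suspected race concrete, here is a minimal host-side sketch (not ESP-IDF code). It assumes stale per-channel interrupt flags can survive the crash, and it uses hypothetical names (`race_obj_t`, `race_first_unsafe_channel`) that do not exist in the driver. It only answers one question: given the pending channels and the table of installed driver objects, which channel would the shared ISR dereference as NULL?

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RACE_CHANNELS 8  /* the ESP32 RMT peripheral has 8 channels */

/* Hypothetical stand-in for the driver's per-channel object. */
typedef struct {
    int tx_sem;
} race_obj_t;

/*
 * Returns the lowest channel that has a pending interrupt while its
 * driver object is still NULL, or -1 if every pending channel is installed.
 * pending_channels holds one bit per channel (bit 0 = channel 0).
 */
int race_first_unsafe_channel(uint32_t pending_channels, race_obj_t **objs)
{
    assert(objs != NULL);
    for (int ch = 0; ch < RACE_CHANNELS; ch++) {
        if ((pending_channels & (1u << ch)) && objs[ch] == NULL) {
            return ch;
        }
    }
    return -1;
}
```

With channel 1 installed and a leftover interrupt pending on channel 0, the function reports channel 0, which matches the order of events described above.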

Putting a disable/enable just prior to rmt_config() does prevent the problem from occurring:

Code:

    periph_module_disable(PERIPH_RMT_MODULE);
    periph_module_enable(PERIPH_RMT_MODULE);
    rmt_config(&rmt_tx);
    ...

So one simple fix would be to call periph_module_disable() just before periph_module_enable() in rmt_config(), but how friendly is this to code that sleeps?
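A rough way to picture why the disable/enable pair helps is the host-side model below. It is only a sketch under two assumptions that this thread has not confirmed: that disabling the module asserts a reset which returns its registers to their defaults, and that enabling it alone leaves any stale register contents in place. The `sim_periph_*` names are hypothetical and are not the real periph_ctrl functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one peripheral module. */
typedef struct {
    bool     clk_en;    /* clock gate */
    bool     in_reset;  /* reset line held asserted */
    uint32_t int_ena;   /* stands in for any register state that can go stale */
} sim_periph_t;

/* Assumption: gating the clock and asserting reset restores register defaults. */
void sim_periph_disable(sim_periph_t *p)
{
    assert(p != NULL);
    p->clk_en = false;
    p->in_reset = true;
    p->int_ena = 0;
}

/* Assumption: ungating the clock and releasing reset leaves registers untouched. */
void sim_periph_enable(sim_periph_t *p)
{
    assert(p != NULL);
    p->clk_en = true;
    p->in_reset = false;
}
```

In this model, calling only `sim_periph_enable()` on a peripheral that still has interrupts enabled from before the crash changes nothing, while `sim_periph_disable()` followed by `sim_periph_enable()` starts from a clean state.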

I feel this could be better fixed by the RMT ISR avoiding the assumption that p_rmt_obj[channel] is always non-NULL. If I am to submit a patch, the question will be, what should the ISR do in this case? Simply ignore the interrupt?

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Postby WiFive » Wed Apr 04, 2018 8:30 am

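Well, it should probably do something like:

Code:

if(p_rmt==NULL)
{
    // No driver object for this channel: disable this interrupt source
    // and clear its pending flag, then move on to the next bit.
    RMT.int_ena.val &= (~(BIT(i)));
    RMT.int_clr.val = BIT(i);
    continue;
}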
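
For anyone who wants to try the idea off-target, the following is a host-side sketch of an ISR loop with that guard in place. It is not the driver code: the register struct, the `isr_sim_*` names and the counter standing in for xSemaphoreGiveFromISR() are all hypothetical, and the layout of three interrupt bits per channel with TX-end at `i % 3 == 0` is an assumption made for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISR_SIM_BIT(n)   (1u << (n))
#define ISR_SIM_CHANNELS 8

/* Hypothetical per-channel object; the counter replaces the TX semaphore. */
typedef struct {
    int tx_done_count;
} isr_sim_obj_t;

/* Hypothetical register model: pending flags and the interrupt-enable mask. */
typedef struct {
    uint32_t int_st;
    uint32_t int_ena;
} isr_sim_regs_t;

/* Handles all pending bits and returns how many were ignored because
 * no driver object was installed for their channel. */
int isr_sim_handle(isr_sim_regs_t *regs, isr_sim_obj_t **objs)
{
    assert(regs != NULL && objs != NULL);
    int ignored = 0;
    uint32_t intr_st = regs->int_st;

    for (int i = 0; i < ISR_SIM_CHANNELS * 3; i++) {
        if (!(intr_st & ISR_SIM_BIT(i))) {
            continue;
        }
        isr_sim_obj_t *p_rmt = objs[i / 3];
        if (p_rmt == NULL) {
            regs->int_ena &= ~ISR_SIM_BIT(i);  /* disable this source */
            regs->int_st  &= ~ISR_SIM_BIT(i);  /* models a write to int_clr */
            ignored++;
            continue;
        }
        if (i % 3 == 0) {
            p_rmt->tx_done_count++;            /* TX-end event */
        }
        regs->int_st &= ~ISR_SIM_BIT(i);       /* acknowledge handled bit */
    }
    return ignored;
}
```

Disabling the source as well as clearing it means a stale channel cannot keep re-raising the interrupt before its driver is installed; the install path would then be responsible for enabling it again.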

meowsqueak
Posts: 151
Joined: Thu Jun 15, 2017 4:54 am
Location: New Zealand

Postby meowsqueak » Wed Apr 04, 2018 9:15 am

Ah, so if the channel is uninitialised then the interrupt gets disabled? That makes sense, I think. Currently my PR clears the interrupt but doesn't disable it.

BTW, for completeness: https://github.com/espressif/esp-idf/issues/1815
