SPI re-entrancy

Postby vonnieda » Fri Nov 09, 2018 7:45 pm

PeterR wrote:
Fri Nov 09, 2018 10:12 am
Thanks, yes, your report seems very similar, but my application still fails with SPI activity pinned to core 0 (30 min is the longest run so far).
I will go back & ensure that the SPI bus & devices are created from core 0.

I also use Ethernet RMII. There are some broadcasts on my test network and my application also logs via UDP, so there will be regular Ethernet interrupts and DMA during normal testing.
With Ethernet removed & all SPI transactions on core 0 I achieved 18 hrs last night & it is still running.

Did you have any other interrupt and/or DMA activity in your test program?

EDIT:
HSPI on core 0, VSPI on core 1, no Ethernet, main application: seems to work
HSPI on core 0, VSPI on core 0, Ethernet, main application: seems to work
HSPI on core 0, VSPI on core 1, Ethernet, main application: fails quickly
HSPI on core 0, VSPI on core 1, Ethernet, MWE: seems to work

Thinking that the cache may be related, I created a large IRAM_ATTR array and tried to read from it.
I get '0x40089a1c: _xt_nmi at ??:?'
How would I use the cache so as to flush the program out of it & simulate 'real life'?
EDIT: Fixed. Needed to use 32-bit access.
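
The 32-bit access fix in the EDIT can be sketched as follows. On the ESP32, instruction RAM only supports aligned 32-bit loads and stores, so a byte has to be pulled out of a whole word. This is a minimal host-side sketch, not code from the thread; read_byte_via_word is a hypothetical helper name, and the byte order is assumed to be little-endian as on the ESP32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns byte `index` of a buffer while only ever
 * performing aligned 32-bit loads, which is what IRAM requires.
 * Byte order is computed as little-endian (assumption, matches the ESP32). */
uint8_t read_byte_via_word(const volatile uint32_t *words, size_t index)
{
    assert(words != NULL);
    uint32_t word = words[index / 4];            /* one aligned 32-bit load */
    return (uint8_t)(word >> ((index % 4) * 8)); /* select the byte by shifting */
}
```

On the target, the array would carry IRAM_ATTR and be declared as uint32_t so that the compiler never emits a byte or halfword load for it.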
Aside from SPI I also had I2C interrupts (via transaction complete) and a single GPIO interrupt.

Jason

Postby PeterR » Sat Nov 10, 2018 11:01 am

Thanks.
I have a single GPIO interrupt in the main application as well (MCP2515 CAN). I have not yet included the GPIO INT in the MWE.

My application does seem sensitive to Ethernet though.

I skim-read the IDF code. The SPI hosts seem quite separate.
Each driver receives its own interrupt and runs from its own driver context. I have not gone as far as checking how interrupts are actually enabled/disabled, e.g. SMP atomicity.
The error is easy enough to catch though. I guess what I want to do is dump the interrupt source and interrupt mask registers. Is everything enabled or ...
Is anyone able to shortcut me to the applicable registers or API?
I want to find which interrupts were allocated to VSPI, HSPI and GPIO, what their mask and request status is, and what their sources are flagging.
Ta
& I also believe that IDF CAN should be fixed.

Postby PeterR » Mon Nov 12, 2018 2:44 pm

Added some interrupt diagnostics.
When VSPI (the CAN) transaction times out:

Owning CPU (handle->host->intr): 0
Executing CPU (xPortGetCoreID): 1

uxQueueMessagesWaiting(handle->trans_queue): 1
Interrupt Source: 0x1f, Number: 0x12
DPORT_PRO_INTR_STATUS_0_REG: 0xd0000000
DPORT_APP_INTR_STATUS_0_REG: 0xd0000000

DPORT_PRO_SPI_INTR_3_MAP_REG: 0x06
DPORT_APP_SPI_INTR_3_MAP_REG: 0x06

So it would appear that SPI 0, 2, 3 are signalling for interrupts.
SPI 3 (VSPI) interrupt is disabled (0x06) but we have a transaction in the VSPI queue.

Now you have probably heard this before but I am not sure how my application code could create this scenario.
I only use spi_master.h API.
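
The reading of the dump above can be checked with a small decoder. This is a host-side sketch under stated assumptions, not driver code: it assumes the SPI0..SPI3 interrupt sources are numbers 28..31 (consistent with VSPI showing up as source 0x1f) and that a map register value of 0x06 means "disabled", as stated in the post. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: SPI0..SPI3 are interrupt sources 28..31, i.e. the top four
 * bits of the INTR_STATUS_0 register. */
#define SPI0_SOURCE_BIT 28u

/* Assumption taken from the post: a map register value of 0x06 means the
 * source is currently disabled. */
#define MAP_DISABLED 0x06u

/* Writes the indices (0..3) of the SPI controllers flagged in the status
 * word into out[] and returns how many were found. */
size_t pending_spi_hosts(uint32_t intr_status_0, unsigned out[4])
{
    assert(out != NULL);
    size_t n = 0;
    for (unsigned spi = 0; spi < 4; ++spi) {
        if (intr_status_0 & (UINT32_C(1) << (SPI0_SOURCE_BIT + spi))) {
            out[n++] = spi;
        }
    }
    return n;
}

/* A source looks stuck when it is pending but its map register says disabled. */
int is_stuck(uint32_t intr_status_0, unsigned spi, uint32_t map_reg)
{
    assert(spi < 4);
    int pending = (int)((intr_status_0 >> (SPI0_SOURCE_BIT + spi)) & 1u);
    return pending && (map_reg == MAP_DISABLED);
}
```

Fed with 0xd0000000, the decoder reports SPI 0, 2 and 3 as pending, and SPI 3 with a map value of 0x06 counts as stuck, which is the state described in the post.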

Postby vonnieda » Mon Nov 12, 2018 3:51 pm

Interesting find! Might be time to file an issue on this at https://github.com/espressif/esp-idf/issues as that is likely to get looked at by Espressif faster than this thread.

Postby PeterR » Tue Nov 13, 2018 1:49 pm

Thanks.
I think that the issue has been fixed in v3.2-beta1.
There are a lot of merged changes though so I am not sure. Need to soak test.
Would love to know the mechanism.

Postby WiFive » Tue Nov 13, 2018 5:38 pm

You updated from 3.1 to 3.2-beta1 and the issue is (tentatively) resolved?

Postby PeterR » Wed Nov 14, 2018 12:12 pm

I have been doing most of my tests on v3.2-dev-760-ga0d2dd03 as I need ESP32 CAN and the soc changes etc.

The soak has run for 24 hours now and would normally fail within 5 minutes.
If you have thoughts on what the issue might have been then I would love to hear.

Postby WiFive » Wed Nov 14, 2018 1:18 pm


Postby PeterR » Wed Nov 14, 2018 3:04 pm

Quite possibly.
spi_master.c has had quite a few changes & more than a sprinkling of 'atomic'.

The CAN SPI interrupt was allocated on core 0. The CAN SPI transactions were always initiated on core 1.
So TX (core 1) was:
(a) Queue
(b) Enable ISR

RX ISR (core 0) was:
(1) Check each CS Queue
(2) If no item then disable ISR

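So there is a race there. If the RX ISR (core 0) finds our queue empty and then takes longer to search the other 3 (unused) CS queues than the TX path takes to get through (a) and (b), the ISR disables the interrupt after the transaction has been queued, and we die.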
I am fairly sure that I had SPI_MASTER_ISR_IN_IRAM set but you could see how that would make life a lot worse.
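
The interleaving described above can be modelled on a host machine. This is a minimal single-threaded sketch, not the spi_master.c code: host_model_t and all function names are hypothetical, and the two cores are replaced by calling the steps in a fixed order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one SPI host's driver state. */
typedef struct {
    int  queued;        /* transactions waiting in the queue */
    bool intr_enabled;  /* whether the host interrupt is enabled */
} host_model_t;

/* TX path (core 1 in the post). */
void tx_queue(host_model_t *h)  { h->queued++; }             /* step (a) */
void tx_enable(host_model_t *h) { h->intr_enabled = true; }  /* step (b) */

/* ISR path (core 0), split in two so the halves can be interleaved. */
bool isr_saw_empty(const host_model_t *h) { return h->queued == 0; }  /* step (1) */
void isr_disable_if(host_model_t *h, bool saw_empty)                  /* step (2) */
{
    if (saw_empty) h->intr_enabled = false;
}

/* A host is stalled when work is queued but nothing will ever service it. */
bool is_stalled(const host_model_t *h)
{
    assert(h != NULL);
    return h->queued > 0 && !h->intr_enabled;
}

/* Racy order: ISR checks, TX does (a) and (b), then the ISR disables. */
host_model_t run_racy(void)
{
    host_model_t h = { 0, true };
    bool empty = isr_saw_empty(&h);
    tx_queue(&h);
    tx_enable(&h);
    isr_disable_if(&h, empty);
    return h;
}

/* Same steps with the ISR's check and disable kept together. */
host_model_t run_serialized(void)
{
    host_model_t h = { 0, true };
    bool empty = isr_saw_empty(&h);
    isr_disable_if(&h, empty);
    tx_queue(&h);
    tx_enable(&h);
    return h;
}
```

The racy order ends with one queued transaction and the interrupt disabled, which matches the diagnostics earlier in the thread. Keeping check and disable together, for example under a lock shared with the TX path, is one way to avoid that state; whether the later IDF changes do exactly this is not confirmed here.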

I do not understand the apparent sensitivity to ExtFlash though, unless the ExtFlash SPI interrupt priority is higher & it can jump in over the CAN SPI ISR handler.

I am sure that I RTFMed and that the driver allows multiple cores....

EDIT: The recent changes have certainly slowed the SPI driver down though. I think that that should be looked at as the driver was slow to initiate SPI transactions to start with.