Program Flow and timing

Postby catotonic » Fri Aug 17, 2018 12:39 pm

While trying to learn about program flow and task management on the ESP32, I ran into some anomalies that I do not understand.

I was trying to see how long I could run, and what I could do, in a task without being context switched out from under me. So I created the following program, which runs by itself on Core 1. Core 0 is running a web and socket server.

Code:

void RunStateMgrTask(void * parameter){
  uint32_t runCount1 = 0, runCount2 = 0;
  double runCount3 = 0;  //
  while(1){
    Serial.println("Running StateMgr Task");
    runCount1 = runCount2 = 0;
    for(runCount1 = 0; runCount1 < 0xffffff; runCount1++){
      digitalWrite(GPIO_NUM_33,HIGH);
      for(runCount2 = 0; runCount2 < 0x7ffffff; runCount2++){
        runCount3 ++;
      }
      for(runCount2 = 0; runCount2 < 0x7ffffff; runCount2++){
        runCount3 --;
      }
    }
    digitalWrite(GPIO_NUM_33,LOW);
    runCount3 = 0;
    vTaskDelay(2000);
  }
}
When I run this, my LED stays on for ~4 seconds. By my very ill-informed calculations, it should stay on much longer.
Also, if I move the line "digitalWrite(GPIO_NUM_33,HIGH);" outside of the for loop, the LED does not come on. Why?

Code:

void RunStateMgrTask(void * parameter){
  uint32_t runCount1 = 0, runCount2 = 0;
  double runCount3 = 0;  //
  while(1){
    Serial.println("Running StateMgr Task");
    runCount1 = runCount2 = 0;
    digitalWrite(GPIO_NUM_33,HIGH);
    for(runCount1 = 0; runCount1 < 0xffffff; runCount1++){
      //digitalWrite(GPIO_NUM_33,HIGH);
      for(runCount2 = 0; runCount2 < 0x7ffffff; runCount2++){
        runCount3 ++;
      }
      for(runCount2 = 0; runCount2 < 0x7ffffff; runCount2++){
        runCount3 --;
      }
    }
    digitalWrite(GPIO_NUM_33,LOW);
    runCount3 = 0;
    vTaskDelay(2000);
  }
}
Is my confusion with the compiler optimization, FreeRTOS task management, or me?
If anyone asks why I need to know this, I am moving toward bare-metal, high-speed (>= 100K transactions/sec) SPI.

Re: Program Flow and timing

Postby kolban » Fri Aug 17, 2018 4:22 pm

I for one am very interested in the story that setting the GPIO level high before the inner loop doesn't set the output signal high but the same code in the inner loop does.

If it were me, I'd try writing the same app in ESP-IDF and see if the scenario changes. That will tell us if we are looking at an ESP-IDF puzzle or an Arduino puzzle. Do remember that when you use the Arduino APIs you are working at a level of abstraction distinct from ESP-IDF.
Free book on ESP32 available here: https://leanpub.com/kolban-ESP32

Re: Program Flow and timing

Postby fly135 » Fri Aug 17, 2018 7:34 pm

When you move the GPIO set-high call outside the loop, it looks like the delay loops were optimized away. Try declaring runCount3 as volatile. Assuming high turns the LED on, if the loops are optimized out the LED would be turned off again immediately.

John A
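
A minimal sketch of the volatile suggestion, written as plain C++ so it can be tried on a desktop compiler as well as on the ESP32. The names spinPlain and spinVolatile are made up for illustration. With optimization enabled, a compiler is allowed to collapse or remove the first loop because nothing observable depends on it; in the second, every read and write of the volatile counter has to be performed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the loop has no observable side effects, so an
// optimizing compiler may reduce it to "return iterations" or drop it.
inline uint32_t spinPlain(uint32_t iterations) {
  uint32_t counter = 0;
  for (uint32_t i = 0; i < iterations; ++i) {
    counter++;
  }
  return counter;
}

// Hypothetical helper: accesses to a volatile object must really happen,
// so the loop body runs once per iteration even at high optimization.
inline uint32_t spinVolatile(uint32_t iterations) {
  volatile uint32_t counter = 0;
  for (uint32_t i = 0; i < iterations; ++i) {
    counter = counter + 1;  // explicit read, add, write
  }
  const uint32_t result = counter;
  assert(result == iterations);  // sanity check on the final count
  return result;
}
```

Both functions return the same value; only the time they take differs. How long one pass of the volatile loop takes depends on the compiler and clock, so it would still need calibrating before being used as a delay.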

Re: Program Flow and timing

Postby catotonic » Sat Aug 18, 2018 2:47 pm

@kolban and fly135
Well, gentlemen, things have gotten more interesting.
Side note: I am using the Sloeber IDE, which uses both ESP-IDF and Arduino code.
First: the app will not run with:
Volatile double runCount3;
I declared it both inside and outside the task function.
I changed from the Arduino GPIO API to the ESP-IDF API. I have heard that the Arduino code is slower because it is a wrapper for the ESP-IDF, but I wanted to check. I thought maybe there was a mutex or semaphore involved. It seems to make no difference.
I also added another print statement to my RunStateMgrTask, which was enlightening.
Using either:

Code:

for(runCount1 = 0; runCount1 < 0xffffff; runCount1++){
     // digitalWrite(GPIO_NUM_33,HIGH);
      gpio_set_level(GPIO_NUM_33, HIGH);

or:

Code:

for(runCount1 = 0; runCount1 < 0xffffff; runCount1++){
     digitalWrite(GPIO_NUM_33,HIGH);
     // gpio_set_level(GPIO_NUM_33, HIGH);
I get the following report:
Web Server Ran 102 times
****** StateMgr Task Started ******
WebSocket Ran 12 times
Web Server Ran 102 times
WebSocket Ran 12 times
Web Server Ran 102 times
***** StateMgr Finished ****
In UART Task
WebSocket Ran 12 times
Web Server Ran 102 times
In UART Task
In UART Task
Web Server Ran 102 times
WebSocket Ran 12 times
In UART Task
****** StateMgr Task Started ******
Web Server Ran 102 times

This seems to confirm what I want, which is to have the RunStateMgrTask task run on Core 1 without a context switch.
But if I run the following code using either API call:

Code:

gpio_set_level(GPIO_NUM_33, HIGH);  // or digitalWrite(GPIO_NUM_33,HIGH);
for(runCount1 = 0; runCount1 < 0xffffff; runCount1++){
     .....
     } 
I get:
In UART Task
****** StateMgr Task Started ******
***** StateMgr Finished ****
In UART Task
Web Server Ran 102 times
In UART Task
WebSocket Ran 12 times
In UART Task
Web Server Ran 102 times
In UART Task
WebSocket Ran 12 times
****** StateMgr Task Started ******
***** StateMgr Finished ****
In UART Task
Web Server Ran 102 times
In UART Task
In UART Task
In AppMain.cpp

Which looks like the for loop does not even run.

While this is intriguing, the main reason for writing is to ask how the ESP32 can run 1.84467E+19 iterations in 4 seconds.
While the code shown has a simple runCount3++; operation, I tried other and multiple operations and saw the same effect.
I was just wondering how smart the compiler is at reducing and optimizing multiple operations.
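
As a rough check on the numbers, here is the arithmetic for the loop bounds exactly as posted in the first listing (0xffffff outer passes, two inner loops of 0x7ffffff each). The 240 MHz clock and the one-iteration-per-cycle rate are assumptions chosen to be generous, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Total inner-loop iterations for the nested loops as posted.
constexpr uint64_t totalIterations(uint64_t outer, uint64_t inner,
                                   uint64_t innerLoops) {
  return outer * inner * innerLoops;
}

// Whole seconds needed if one iteration completed per CPU cycle
// (an assumed best case, not a measured figure).
constexpr uint64_t secondsAtOnePerCycle(uint64_t iterations, uint64_t cpuHz) {
  return iterations / cpuHz;
}

// Nanoseconds per outer pass if only the outer loop actually ran.
constexpr uint64_t nanosPerOuterPass(uint64_t observedSeconds, uint64_t outer) {
  return observedSeconds * 1000000000ULL / outer;
}

constexpr uint64_t kOuter = 0xffffff;
constexpr uint64_t kInner = 0x7ffffff;
constexpr uint64_t kTotal = totalIterations(kOuter, kInner, 2);            // about 4.5e15
constexpr uint64_t kSeconds = secondsAtOnePerCycle(kTotal, 240000000ULL);  // roughly 217 days
constexpr uint64_t kNsPerPass = nanosPerOuterPass(4, kOuter);              // about 238 ns
```

With these bounds the inner loops cannot really be running in 4 seconds. One plausible reading, not confirmed in this thread, is that the compiler removed the inner loops and the ~4 seconds is simply 0xffffff calls to digitalWrite at a couple of hundred nanoseconds each.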

I guess I should reframe and resubmit my question:
How do I get a calculated sub-10 uSec delay using a for or while loop?
I already have Timer0 involved in another part of my task.
My goal is to acquire over 500KB of analog data at around 100K samples/sec and save it to the SD card.

Re: Program Flow and timing

Postby ESP_igrr » Sun Aug 19, 2018 4:57 am

For sub-microsecond delays you can query the CPU cycle count register, CCOUNT.
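
A minimal sketch of a CCOUNT-based busy wait. The helper names are made up, the 240 MHz value in the usage note is an assumption about the configured CPU clock, and the register read uses the Xtensa rsr instruction, so that part only compiles for the ESP32 target. ESP-IDF and the Arduino core also ship their own cycle-count accessors, which could be used instead of the inline assembly.

```cpp
#include <cstdint>

// Convert a delay in microseconds to CPU cycles for a given clock in MHz.
constexpr uint32_t usToCycles(uint32_t us, uint32_t cpuMhz) {
  return us * cpuMhz;
}

// Cycles elapsed between two CCOUNT readings. Unsigned subtraction keeps
// the result correct across one 32-bit wraparound of the counter.
constexpr uint32_t cyclesElapsed(uint32_t start, uint32_t now) {
  return now - start;
}

#if defined(__XTENSA__)
// Read the per-core cycle counter (Xtensa special register CCOUNT).
static inline uint32_t readCcount() {
  uint32_t count;
  __asm__ __volatile__("rsr %0, ccount" : "=a"(count));
  return count;
}

// Busy-wait until at least `cycles` CPU cycles have passed.
static inline void delayCycles(uint32_t cycles) {
  const uint32_t start = readCcount();
  while (cyclesElapsed(start, readCcount()) < cycles) {
    // spin
  }
}
#endif
```

For example, delayCycles(usToCycles(10, 240)) would wait about 10 microseconds at an assumed 240 MHz. CCOUNT is per core, so the task has to stay on one core, and interrupts can still stretch the delay.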

However I suggest using I2S DMA to fetch ADC samples and store them into memory. This is going to guarantee consistent sampling rate without CPU involvement.

Then store data from DMA buffers to SD card. Make sure you have a lot of buffers though, as SD card write latency can occasionally be as high as 700ms.
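
To put a number on "a lot of buffers", here is a back-of-the-envelope sketch. It assumes 2 bytes per sample at the 100K samples/sec mentioned earlier in the thread, and a 1024-byte buffer size picked purely for illustration; the function names are hypothetical.

```cpp
#include <cstdint>

// Incoming data rate in bytes per second.
constexpr uint64_t bytesPerSecond(uint64_t samplesPerSec, uint64_t bytesPerSample) {
  return samplesPerSec * bytesPerSample;
}

// Bytes that pile up while the SD card refuses data for stallMs milliseconds.
constexpr uint64_t bytesToAbsorb(uint64_t bytesPerSec, uint64_t stallMs) {
  return bytesPerSec * stallMs / 1000;
}

// Number of fixed-size buffers needed to hold that backlog (rounded up).
constexpr uint64_t buffersNeeded(uint64_t backlogBytes, uint64_t bufferBytes) {
  return (backlogBytes + bufferBytes - 1) / bufferBytes;
}

constexpr uint64_t kRate = bytesPerSecond(100000, 2);       // 200,000 bytes/sec
constexpr uint64_t kBacklog = bytesToAbsorb(kRate, 700);    // 140,000 bytes
constexpr uint64_t kBuffers = buffersNeeded(kBacklog, 1024);
```

Under these assumptions a worst-case 700 ms stall needs about 140 kB of buffering, which is a large share of the ESP32's internal RAM.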

Re: Program Flow and timing

Postby catotonic » Mon Aug 20, 2018 12:45 pm

@ESP_igrr
Thank you for the heads up.
Can you please clue me in on what are some of the things that can add up to 700 mSec of delay?
What is the benefit of having DMAs if a DMA or dual cores cannot move data for 700 mSec?
I2S is not trivial. I am building a handheld device and board space is limited. Most I2S interfaces are on an audio IC, have many pins, and need left and right decoding. It all adds up to complexity and engineering costs.

I'm just trying to circumvent the mutex on the SPI for the ADC by allowing Core 1 to do nothing but handle the ADC.
Core 0 will continue running the WiFi and socket data tasks, but nothing will be coming into the ESP32 besides ADC data while acquiring samples. It will be a very controlled environment.
I then hope to find a way to use a timer and the HAL libraries to manually clock data out of the ADC.
But now, even if I can force the ADC to run at 100K smpls/sec, I need to find a buffer somewhere large enough to buffer the SD card.
I guess I don't have to worry about buying a class 10 memory card at least.
If I used an ESP32 WROVER 1 and created a 1MB buffer, do you think I could write to it fast enough to buffer 500K samples of 16 bit data at >= 100 K smpls/sec?

Re: Program Flow and timing

Postby ESP_igrr » Mon Aug 20, 2018 2:52 pm

catotonic wrote: Can you please clue me in on what are some of the things that can add up to 700 mSec of delay?
Write latency of SD cards is mostly determined by their internal operation. NAND memories need some sort of wear levelling algorithms to be usable, so when that algorithm decides that some blocks of data need to be moved, the card will not accept more data until internal housekeeping operations are completed.
catotonic wrote: What is the benefit of having DMA's if a DMA or dual cores cannot move data for 700 mSec?
Consistent sampling rate, mostly. Timing ADC reads using the CPU is going to result in much higher jitter.
catotonic wrote: I2S is not trivial, I am building a handheld device and board space is limited. Most I2S interfaces are on an audio IC have many pins and need left and right decoding. All add up to complexity and engineering costs.
Sorry that I wasn't clear. The I2S peripheral of the ESP32 can move data between: 1) memory and an external I2S codec via the physical I2S interface, in both directions, and 2) memory and the internal ADC/DAC, via an internal bus (which is not I2S...). You may consider this to be simply a DMA between ADC and memory, or between memory and DAC. This DMA just happens to live inside the I2S peripheral. Please check this example: https://github.com/espressif/esp-idf/tr ... 2s_adc_dac
catotonic wrote: If I would use a ESP32 WROVER 1 and created a 1MB buffer, do you think I could write to it fast enough to buffer 500K samples of 16 bit data at >= 100 K smpls/sec?
Sure, you can certainly write into a buffer located in PSRAM at 200 kB/sec.
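
The figures in this exchange can be checked with a few lines of arithmetic. This sketch takes "500K samples" as 500,000 and "1MB" as 1,048,576 bytes, both of which are assumptions about what was meant; the function names are hypothetical.

```cpp
#include <cstdint>

// Bytes needed to hold a capture of sampleCount samples.
constexpr uint64_t captureBytes(uint64_t sampleCount, uint64_t bytesPerSample) {
  return sampleCount * bytesPerSample;
}

// Whole seconds the capture lasts at a given sample rate.
constexpr uint64_t captureSeconds(uint64_t sampleCount, uint64_t samplesPerSec) {
  return sampleCount / samplesPerSec;
}

// True when the capture fits in the buffer.
constexpr bool fitsInBuffer(uint64_t neededBytes, uint64_t bufferBytes) {
  return neededBytes <= bufferBytes;
}

constexpr uint64_t kCapture = captureBytes(500000, 2);        // 1,000,000 bytes
constexpr uint64_t kDuration = captureSeconds(500000, 100000);  // 5 seconds
constexpr bool kFits = fitsInBuffer(kCapture, 1024ULL * 1024ULL);
```

So the whole capture is about 5 seconds of data at 200 kB/sec and, under these assumptions, just fits in a 1 MB buffer with a little room to spare.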

My general advice would be to try both parts (1: reading ADC using DMA into memory, 2: writing buffers of data to SD card) separately, then it will be easier to identify possible bottlenecks while the applications are simple.

Re: Program Flow and timing

Postby catotonic » Mon Aug 20, 2018 4:27 pm

@ESP_igrr
Thank you for your generosity with your time in explaining things to me.
The biggest consumer of time for me is running up what seems to me a logical path of implementation, only to find out it is a dead end or blocked by something I was unaware of.
With regard to the above.
After reading some of your code, again, thank you for the example, I have a couple of questions.
Am I correct in the assumption that, while you have sample bits set to 16, I am limited to <= 12 bits?
No oversampling is being done, correct?
What is the max sampling rate of the I2S? I need > 100K.
I am not sure if 12 bits will give me the fidelity I want; this is an alpha implementation of down-converting ultrasonic vibrations.
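
A small sketch of what the 16-bit setting would mean in practice, under the assumption that each 16-bit word from the I2S/ADC path carries the 12-bit reading in its low bits and something other than sample data in the top four. That layout is my reading of how the linked example masks its samples and should be checked against the example and the reference manual. The function names are made up.

```cpp
#include <cstdint>

// Keep only the low 12 bits of a raw 16-bit word (assumed layout).
constexpr uint16_t adcReading(uint16_t rawWord) {
  return static_cast<uint16_t>(rawWord & 0x0FFF);
}

// Number of distinct output codes for a converter of the given width.
constexpr uint32_t levels(unsigned bits) {
  return 1u << bits;
}

constexpr uint32_t kLevels12 = levels(12);  // 4096 codes
constexpr uint32_t kLevels16 = levels(16);  // 65536 codes
```

If that layout holds, a 16-bit external ADC would resolve steps 16 times finer than the internal 12-bit one over the same input range.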
And last,
If I choose to go up the 16 bit ADC SPI path, is there anything that would stop me from implementing the SPI manually without using the higher level mutex routines?
The reason I am hopeful is because of https://github.com/adafruit/Adafruit_ILI9341/issues/19 that stated: "plz use bitbang SPI, its very fast! the library has to been rewritten specifically to avoid the mutex delay on esp32"
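
For a sense of what a manual SPI read involves, here is a hardware-independent sketch. It assumes a mode-0 style device that shifts out 16 bits MSB first and updates its data line on the falling clock edge; the real ADC's datasheet decides the actual timing, and chip-select handling is left out. FakeAdc and readFromFake are made-up stand-ins so the routine can be exercised on a desktop. On the ESP32 the two callbacks would wrap GPIO writes and reads.

```cpp
#include <cassert>
#include <cstdint>

// Clock in one 16-bit word, MSB first, sampling the data line while the
// clock is high. setClock(bool) drives SCK, readMiso() returns the data bit.
template <typename SetClock, typename ReadMiso>
uint16_t bitBangRead16(SetClock setClock, ReadMiso readMiso) {
  uint16_t word = 0;
  for (int bit = 0; bit < 16; ++bit) {
    setClock(true);   // rising edge: device data is assumed stable
    word = static_cast<uint16_t>((word << 1) | (readMiso() ? 1u : 0u));
    setClock(false);  // falling edge: device moves on to the next bit
  }
  return word;
}

// Hypothetical stand-in for an SPI ADC holding one fixed conversion result.
class FakeAdc {
 public:
  explicit FakeAdc(uint16_t value) : value_(value), bitIndex_(0), clock_(false) {}

  void setClock(bool level) {
    // Advance to the next bit on each high-to-low clock transition.
    if (clock_ && !level && bitIndex_ < 15) {
      ++bitIndex_;
    }
    clock_ = level;
  }

  bool readMiso() const { return ((value_ >> (15 - bitIndex_)) & 1u) != 0; }

 private:
  uint16_t value_;
  int bitIndex_;
  bool clock_;
};

// Read one word back from the fake device through the bit-bang routine.
inline uint16_t readFromFake(uint16_t value) {
  FakeAdc adc(value);
  const uint16_t word = bitBangRead16(
      [&adc](bool level) { adc.setClock(level); },
      [&adc]() { return adc.readMiso(); });
  assert(word == value);  // the routine should return what the device sent
  return word;
}
```

Each bit costs two pin writes and one pin read, so the achievable sample rate depends mostly on how fast those GPIO accesses are.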
I think I will start with your example, if it works it would be much simpler.
Thanks again
