jdl

STAB ERROR alarms when DSHOT protocol is selected
« on: April 30, 2020, 08:51:27 pm »
I've recently completed a new build: GepRC Mark4 7" long-range quad capable of autonomous flight. Flight controller is Revo FC, Oplink for radio-control, BN-200 GPS, I2C external mag, T-motor F55A PROII ESC, BrotherHobby Avenger V2 2507 1500kv motors. Running Next r735.

It flies beautifully, much more smoothly and stably than my other 5" and 6" builds. I cannot tell what really makes the difference: the larger frame (7" vs 6"), the BLHeli_32 ESCs (vs older BLHeli ones, LittleBee 20A), the faster and potentially more precise ESC protocol (DSHOT1200 vs OneShot125) or the lower-kv 2507 motors. Maybe all of these...

However, while initially tuning the quad, I encountered serious problems with occasional STAB errors / ATTI warnings that also caused momentary instabilities during flight. These glitches come at irregular intervals but happen quite often, sometimes once in 5-10 min, sometimes 2-3 times in a minute, so they are a real problem and pose a danger in flight!

I initially suspected a faulty IMU (MPU6000), as I once had such a problem a few years ago. However, the symptoms were not exactly the same...

I've carefully inspected telemetry logs and DVR recordings, and after a lot of testing I concluded that these STAB errors are not caused by bad hardware, I2C issues or anything else hardware-wise. The problem is somehow related to setting the ESC protocol in the FC to DSHOT. More precisely, it appears when the DSHOT speed is set to 600, 300 or 150 kbit/s!

DSHOT1200, however, works absolutely fine. I never had even a single glitch with it (many hours of continuous indoor observation in GCS and several 10+ minute flights), and the quad is currently set up and flying with the DSHOT1200 protocol.

Using older Oneshot125 or Multishot protocols is trouble-free, too.

I've tested with and without peripherals (GPS, Auxmag) connected and set up, and also with the Revo FC powered by USB only or by external +5V on the servo pins. The Attitude Estimation Algorithm (Complementary or INS13) doesn't matter either.

I've performed these tests with three different Revo boards: the one on the GepRC Mark4 quad and two spare ones (from different manufacturers). Same results.

Watching System Health in GCS reveals that every glitch is indicated by black STAB with red X and yellow ATTI alarms. Sometimes (often) this is immediately followed by CPU overload alarm. I've also rarely seen (maybe only twice) I2C alarm and GPS alarm at the same time, immediately following the STAB error.

Let me say again that disabling Main and Flexi ports (no GPS and no I2C) changes nothing with STAB errors emerging occasionally when DSHOT 600 (or 300 or 150) is used.

Seems like FC code sometimes skips / fails to process gyro samples and this causes STAB errors. As this issue happens with DSHOT150, DSHOT300 and DSHOT600 and NOT with DSHOT1200, I guess it is somehow timer related but as I'm absolutely ignorant in this matter I may be entirely wrong.

I hope some of the devs will step in and address the issue.

While I believe DSHOT1200 is fine and I'll continue using it with my build, anyone with older ESCs that support DSHOT600 at most will possibly have problems.

-------------

Here are GCS Scopes screenshots of several indoor tests: one and the same board, USB-to-PC connection only, no external peripherals connected, identical FC settings except for the ESC protocol / speed used.

I've also attached .opl logs for these same tests that can be replayed in GCS.

The UAV settings file is also here.

-------------

Edit: Added a short DVR recording.

The flight was intended to test autonomous modes. DSHOT600. 11 STAB errors in less than 4 minutes, one NO GPS FIX alarm (at 0:27, 0:48, 1:16, 1:24, 1:31, 1:37, 1:38, 1:48, 1:54, 3:27, 3:41)

« Last Edit: April 30, 2020, 10:12:10 pm by jdl »

Re: STAB ERROR alarms when DSHOT protocol is selected
« Reply #1 on: May 01, 2020, 11:31:00 am »
Very interesting.

Oscar Liang says that Dshot1200 has been removed from BF because they can't get it to run at 32K looptime and it's no better than Dshot600 at lower looptimes.  That sounds like the end of pushing for 1200 to me.
https://oscarliang.com/dshot1200-esc-protocol/

I tracked the GPS NoFix alarms seemingly down to uBlox fix jumping out generating a single "no fix" and then drifting back to the correct location.  There are some things (settings and uBlox Neo version numbers) that make it better or worse.  I thought about designing a filter for it.  Strange that I don't recall ever seeing it with a clone DJI GPS even though those use uBlox internally.  Well they are configured differently...

I don't have Dshot hardware, but things that come to mind include:
- Do you see different CPU % utilization for the various Dshot speeds?  I bet 1200 has lower or smoother utilization.
- Anything about the number of glitches being higher or lower with the various "bad" Dshot speeds?
- Have you tried the latest next and the earliest one that supports Dshot to see if there are any differences?

jdl

Re: STAB ERROR alarms when DSHOT protocol is selected
« Reply #2 on: May 01, 2020, 08:10:40 pm »
Yes, I've read about BF mishaps with Dshot1200 and that it's no longer supported in BF.

However, I've not experienced any ESC failures with Dshot in LibrePilot, at any speed. It works fine.

To be honest, I prefer using Dshot600 instead of Dshot1200 as it should be more stable when longer signal wires are used. Luckily, in this particular build the cables between FC and 4-in-1 ESC are only 7-8cm long and I've seen no communication problems using Dshot1200.

I'm aware of the relation between the uBlox dynamic model (position filtering) and the possibility of "No GPS Fix" alarms and jumps in position. However, this is not the case here. I've carefully inspected the telemetry log of that flight and there is no drift/jump in the recorded GPS positions within -30 sec. / +30 sec. of the moment of the "No GPS Fix" alarm.

Even more, I can report that I've started setting the GPS Dynamic Model to '4G' some years ago and since then I've never seen that "No GPS Fix" error or jumps in positions when flying any of my UAVs. All of them use uBlox7/uBlox8 Beitian units. I have one DJI GPS (clone) but I have not used it yet in an aircraft as it's quite heavy...

Btw, it's not necessary to have Dshot hardware connected to Revo board to replicate the issue, it's just enough to set Dshot mode in the FC (or import the .uav file attached)...

I've tested CPU load for different Dshot speeds, see the snapshot attached.

OneShot / Multishot - 44%
Dshot 1200 - 44.5%
Dshot 600 - 45.5%
Dshot 300 - 46.5%
Dshot 150 - 49%

Yes, I was initially under the impression that lower Dshot speeds cause STAB errors more frequently. However, these STAB errors don't come at regular intervals, and the rate of occurrence varies between tests, so I cannot state with absolute confidence that there is a straightforward relation between Dshot speed and STAB errors. Still, I think the situation gets worse when lowering the Dshot speed!

No, I don't have older "nexts". But it's worth testing!

Re: STAB ERROR alarms when DSHOT protocol is selected
« Reply #3 on: May 02, 2020, 05:00:33 am »
Increasing CPU utilization with a lowered data rate is not typical, but there are several ways it could happen.  My best guess is that the buffer isn't large enough to hold a whole packet and something like an interrupt handler or high priority task is trying to put data in it and must wait for buffer room.

If I can re-create the CPU utilization differences with your uav file, I might be able to get you some code to test.  :)

Re: STAB ERROR alarms when DSHOT protocol is selected
« Reply #4 on: May 02, 2020, 06:24:56 am »
I just had a look at the code in flight/pios/common/pios_servo.c  ...  :(

Code:
/* pseudocode summary, not the literal source */
disable_interrupts();
loop 16 times {   /* once per bit of the Dshot frame */
  bit_bang1();
  wait_some_data_rate_dependent_time1();
  bit_bang2();
  wait_some_data_rate_dependent_time2();
  bit_bang3();
  wait_some_data_rate_dependent_time3();
}
enable_interrupts();

So the lower the data rate, the longer it waits.  The fact that it is noticeable in the CPU utilization shows that it isn't a trivial length of time.
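
To put rough numbers on that, here is a back-of-the-envelope sketch. It is not taken from pios_servo.c; it only assumes that the three waits in each loop pass add up to about one nominal bit period, so that interrupts stay off for about one 16-bit frame. The function name is made up for illustration.

```c
#include <stdint.h>

#define DSHOT_BITS_PER_FRAME 16u

/*
 * Nominal duration of one 16-bit Dshot frame in nanoseconds.
 * kbit_per_s is the number in the protocol name (150, 300, 600, 1200).
 * Assumption: this is roughly how long the bit-bang loop keeps
 * interrupts disabled; the real code may add overhead on top.
 */
static uint32_t dshot_frame_time_ns(uint32_t kbit_per_s)
{
    /* 16 bits / (kbit_per_s * 1000 bit/s) = 16e6 / kbit_per_s nanoseconds */
    return (DSHOT_BITS_PER_FRAME * 1000000u) / kbit_per_s;
}
```

With these assumptions, dshot_frame_time_ns(150) comes out at about 106.7 us with interrupts off, against about 13.3 us for dshot_frame_time_ns(1200). That factor of 8 would fit the observation that the slower speeds are the ones that glitch.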

There is no buffering or hardware timer wait or interrupts.  Disabling interrupts and waiting some fixed length of time (or 48 fixed lengths of time like above) in a multi-tasking environment is against first principles.  I can understand doing it for a test or prototype though.

For this to be changed reasonably quickly, this to me needs some TLC from Alessio or someone else who has more experience with DMA than me (forgive me, I know Alessio did the WS2811 DMA and I don't personally know of others).  If possible, DMA strikes me as one correct solution here.  Another correct solution would be to have a supremely high priority interrupt handle this after all the data is set up at a lower priority, but I don't know everything else that may be going on that may get unacceptable jitter from higher priority interrupts.  Adjusting thread / interrupt priorities is something that has caused heartburn here before...
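
For illustration only, this is roughly what the data preparation for a DMA-driven version could look like. It is a sketch, not LibrePilot code: the function names, the buffer layout and the period_ticks parameter are all hypothetical. The frame layout (11-bit throttle, 1 telemetry bit, 4-bit checksum) and the 75% / 37.5% high times for a 1 / 0 bit are the usual Dshot conventions. The idea is that the CPU only fills a small buffer of timer compare values and the DMA plus timer then clock the bits out without any busy wait.

```c
#include <stddef.h>
#include <stdint.h>

#define DSHOT_FRAME_BITS 16u

/* Build a 16-bit Dshot frame: 11-bit throttle, 1 telemetry-request bit,
 * 4-bit checksum (XOR of the three nibbles of the 12-bit payload). */
static uint16_t dshot_encode_frame(uint16_t throttle, int telemetry)
{
    uint16_t payload = (uint16_t)(((throttle & 0x07FFu) << 1) | (telemetry ? 1u : 0u));
    uint16_t crc = (uint16_t)((payload ^ (payload >> 4) ^ (payload >> 8)) & 0x0Fu);
    return (uint16_t)((payload << 4) | crc);
}

/* Fill buf with one PWM compare value per bit, MSB first, followed by a
 * trailing 0 so the line stays low after the frame.
 * period_ticks is the (hypothetical) timer period of one bit. */
static void dshot_fill_dma_buffer(uint16_t frame, uint16_t period_ticks,
                                  uint16_t buf[DSHOT_FRAME_BITS + 1])
{
    uint16_t high_one  = (uint16_t)((period_ticks * 3u) / 4u); /* 75% high   */
    uint16_t high_zero = (uint16_t)((period_ticks * 3u) / 8u); /* 37.5% high */

    for (size_t i = 0; i < DSHOT_FRAME_BITS; i++) {
        buf[i] = (frame & 0x8000u) ? high_one : high_zero;
        frame = (uint16_t)(frame << 1);
    }
    buf[DSHOT_FRAME_BITS] = 0;
}
```

Setting up the actual timer and DMA stream is hardware specific and left out here. The point is just that filling the buffer takes a few microseconds at most and does not depend on the Dshot speed, so nothing would need to run with interrupts disabled for a whole frame.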

I talked to Alessio a week or two ago.  I will at least try to run it by him.
« Last Edit: May 02, 2020, 06:55:09 am by TheOtherCliff »

jdl

Re: STAB ERROR alarms when DSHOT protocol is selected
« Reply #5 on: May 02, 2020, 08:27:52 pm »
Thank you, Cliff, for stepping in!

I understood your comments. 

This is a really sensitive matter, not a place for risky experiments by enthusiasts with little or no RTOS experience, like me.