How to offload precise ADC oversampling with a RISC-V GD32VF103CBT6 development board - c++

I'm hoping to work up a very basic audio effects device using a RISC-V GD32VF103CBT6 development board. I have managed to do some hardware-interrupt-based sampling with another MCU, but I'm a bit confused by the documentation for the RISC-V board. The ADC is covered in Chapter 11 of the user manual, and I haven't the slightest idea how to turn the instructions there into actual C/C++ code. Sadly, their GitHub repo has almost no examples at all, and none appear to deal with high-speed sampling. There's also a datasheet in this GitHub repo, but I haven't been able to find any specific code examples or revealing instructions in there, either.
What I want to do is:
Perform the calibration described in the user manual, which must precede sampling operations.
Collect 12-bit samples of an audio signal voltage from an external pin, using the oversampling capability to sum numerous 12-bit samples into a single 16-bit sample at a high sampling rate. Ultimately I want audio sampled at 16 bits, 48 kHz-96 kHz.
I need help instructing the MCU to collect these samples using its built-in hardware features.
I want to sample continuously, offloading as much as possible to the built-in hardware so that enough processing headroom is left to do a bit of signal processing for simple effects.
Section 11.4.1 clearly says
Calibration should be performed before starting A/D conversion.
The calibration is initiated by software by setting bit CLB=1. CLB bit stays at 1 during all the calibration sequence. It is then cleared by hardware as soon as the calibration is completed.
The internal analog calibration can be reset by setting the RSTCLB bit in ADC_CTL1 register.
Calibration software procedure:
1) Ensure that ADCON=1.
2) Delay 14 ADCCLK to wait for ADC stability
3) Set RSTCLB (optional)
4) Set CLB=1.
5) Wait until CLB=0.
Question 1: How do I set these memory registers as these instructions indicate? I need a code example, and the manufacturer provides none.
Question 2: How do I delay 14 ADCCLK in C/C++? A loop seems like it would be enormously inefficient. Should I call sleep()? Any explanation of ADCCLK would also be helpful.
This also seems important, but I have no idea what it portends:
The ADCCLK clock provided by the clock controller is synchronous APB2 clock. The RCU controller has a dedicated programmable prescaler for the ADC clock.
I am not at all certain but I think this is the conversion mode I want:
Continuous conversion mode
This mode can be run on the regular channel group. The continuous conversion mode will be enabled when CTN bit in the ADC_CTL1 register is set. In this mode, the ADC performs conversion on the channel specified in the RSQ0[4:0]. When the ADCON has been set high, the ADC samples and converts specified channel, once the corresponding software trigger or external trigger is active. The conversion data will be stored in the ADC_RDATA register.
Software procedure for continuous conversion on a regular channel. To get rid of checking, DMA can be used to transfer the converted data:
1. Set the CTN and DMA bit in the ADC_CTL1 register
2. Configure RSQ0 with the analog channel number
3. Configure ADC_SAMPTx register
4. Configure ETERC and ETSRC bits in the ADC_CTL1 register if in need
5. Prepare the DMA module to transfer data from the ADC_RDATA.
6. Set the SWRCST bit, or generate an external trigger for the regular group

ADCCLK refers to the input clock of the ADC. Take a look at your datasheet: most µCs have a block diagram of the clock architecture. There is usually a main system clock, and the different peripherals have a programmable prescaler which divides that clock by some power of 2.
So 14 ADCCLK cycles means 14 ADC input clock cycles, not 14 CPU cycles.
For example, if the ADC prescaler is set to 64, then you have to wait 64*14 CPU clock cycles.
How to wait at all:
Peripherals usually have a busy flag (I do not know whether such a thing is present on your device) that stays set as long as the current operation is ongoing, so you may be able to poll that flag (e.g. while (ADC0_FLAGS & ADC_ISBUSY);).
Another option may be checking whether there is an interrupt that signals the completion of your operation. But at least for the calibration, the simplest thing would be to start the calibration and just use a wait or delay function that wastes a bit of time.
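For questions 1 and 2 together, the register writes can be done through a pointer to the peripheral's address. A minimal sketch: the register and bit names come from the ADC_CTL1 description quoted above, but the base address, offset, and bit positions below are assumptions you must verify against the register map in the user manual.

#include <stdint.h>

/* ASSUMED base address and bit positions -- verify against the
   GD32VF103 user manual's register map before trusting this. */
#define ADC0_BASE    0x40012400UL
#define ADC_CTL1     (*(volatile uint32_t *)(ADC0_BASE + 0x08))
#define CTL1_ADCON   (1U << 0)   /* ADC on                 */
#define CTL1_CLB     (1U << 2)   /* calibration start/busy */
#define CTL1_RSTCLB  (1U << 3)   /* calibration reset      */

static void adc0_calibrate(void)
{
    ADC_CTL1 |= CTL1_ADCON;              /* 1) ensure ADCON = 1 */

    for (volatile int i = 0; i < 1000; ++i)
        ;                                /* 2) crude settling delay >= 14 ADCCLK */

    ADC_CTL1 |= CTL1_RSTCLB;             /* 3) reset calibration (optional) */
    while (ADC_CTL1 & CTL1_RSTCLB)
        ;
    ADC_CTL1 |= CTL1_CLB;                /* 4) start calibration */
    while (ADC_CTL1 & CTL1_CLB)
        ;                                /* 5) hardware clears CLB when done */
}

The busy-wait in step 2 only has to cover 14 ADCCLK cycles, so err on the generous side; it runs once at start-up, so efficiency does not matter here.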
I personally would start the calibration at system start-up and then do the other initialization. Maybe delay a few milliseconds at the end of setup to make sure all components on the board are powered up correctly. By then the ADC calibration should have finished long ago.
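The continuous-conversion procedure quoted in the question translates into the same kind of register writes. Below is a sketch of the six steps, reusing ADC0_BASE and ADC_CTL1 from the calibration sketch above; every offset and bit position here is again an assumption to check against the manual, and the DMA-channel programming is only stubbed out.

/* Assumed offsets and bit positions -- verify against the user manual. */
#define ADC_SAMPT1    (*(volatile uint32_t *)(ADC0_BASE + 0x10))  /* sample times, channels 0..9 */
#define ADC_RSQ2      (*(volatile uint32_t *)(ADC0_BASE + 0x34))  /* RSQ0[4:0] lives here        */
#define ADC_RDATA     (*(volatile uint32_t *)(ADC0_BASE + 0x4C))  /* conversion result           */

#define CTL1_CTN      (1U << 1)    /* continuous conversion mode        */
#define CTL1_DMA      (1U << 8)    /* raise a DMA request per conversion */
#define CTL1_ETSRC_SW (7U << 17)   /* trigger source = software          */
#define CTL1_ETERC    (1U << 20)   /* enable regular-group trigger       */
#define CTL1_SWRCST   (1U << 22)   /* software start                     */

static void adc0_start_continuous(unsigned channel)   /* channel 0..9 here */
{
    ADC_CTL1 |= CTL1_CTN | CTL1_DMA;          /* 1) CTN + DMA                        */
    ADC_RSQ2  = channel & 0x1FU;              /* 2) RSQ0 = channel number            */
    ADC_SAMPT1 |= 7U << (3 * channel);        /* 3) longest sample time (assumed encoding) */
    ADC_CTL1 |= CTL1_ETERC | CTL1_ETSRC_SW;   /* 4) trigger from software            */
    /* 5) point a DMA channel at ADC_RDATA, circular mode, into your
          sample buffer (DMA register names omitted; see the DMA chapter). */
    ADC_CTL1 |= CTL1_SWRCST;                  /* 6) start the regular group          */
}

Whether you then free-run like this and sum samples out of the DMA buffer, or trigger conversions from a timer to hit an exact 48-96 kHz rate, is a separate design decision; the ETERC/ETSRC trigger bits quoted above are where a timer trigger would come in.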

Related

STM32 (using Mbed online) showing delay at higher analog input frequency

I am new to using microcontrollers.
I am setting up an STM32F769 controller (using the Mbed online compiler). My goal is to get a PWM output that changes its frequency according to an analog input. I did some basic coding, but there is a problem. When I check the output on an oscilloscope with a 1 Hz analog input, it works perfectly, but when I check it with a 100 Hz analog input there is a delay in the output and I get wrong values. I do not understand why, because this board is fast (216 MHz) and I should not face such an issue. (If someone could also explain: is it possible to run the board at 216 MHz or some other maximum frequency, and how?)
#include "mbed.h"

AnalogIn analog_value(A0);   // analog input (pin name assumed)
PwmOut   pulse(D9);          // PWM output (pin name assumed)

int main()
{
    float meas_r, meas_v, out_freq;
    while (true) {
        // Average 1024 ADC readings (each read() returns 0.0 .. 1.0)
        meas_r = 0;
        for (int i = 1; i <= 1024; i++) {
            meas_r = meas_r + analog_value.read();
        }
        meas_r = meas_r / 1024;
        meas_v = meas_r * 3300;              // scale to millivolts
        out_freq = 50000 + (meas_v * 50);    // map voltage to output frequency
        pulse.period(1.0f / out_freq);
    }
}
It should work with a 100 Hz analog input just as it does with 1 Hz.
216 MHz might be the maximum clock frequency at which your processor can operate; however, that does not mean it can input/output signals at that frequency on its ports.
The delays are caused by the time it takes to read analog values and compute the required math operations. You are using multiple multiplications and divisions, which are more complex than additions and subtractions on almost any hardware. Obviously you are using libraries as well (pulse.period(), analog_value.read()), so there are hidden computations on top of those multiplications and divisions. Finally, it is possible that your device is doing other work as well (only you know about this). All those computations take time. At lower frequencies you might not notice the delay, but when the input frequency is high enough, the delay becomes visible. Also consider the time required to read the analog value 1024 times per update.
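A quick way to confirm this is to time the averaging loop itself with Mbed's Timer class. A sketch (not tested on the F769; the pin name is a placeholder):

#include "mbed.h"

AnalogIn analog_value(A0);   // placeholder pin
Timer t;

int main()
{
    float meas_r = 0;
    t.start();
    for (int i = 0; i < 1024; i++) {
        meas_r += analog_value.read();
    }
    t.stop();
    // If this prints anything near 10000 us, a single update already
    // spans a full period of a 100 Hz input signal.
    printf("1024 reads took %d us\n", t.read_us());
}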
The wrong signal and period are due to the delays and some other uncertainties. If the processor is working on other tasks as well, it is hard to predict the time it takes to finish them all. Since the processor executes instructions one after another and waits for the previous computation to finish before the next one starts, there is some uncertainty in timing. The data path and the frequency of peripheral devices (getting input from peripherals) play a crucial role in timing uncertainty and delays.
If timing and accuracy are really important for your problem, and you can't solve it with a DSP, MPU, MCU, CPU, GPU, etc., I would suggest using an FPGA.

NTPD synchronization with 1PPS signal

I have an AHRS (attitude heading reference system) that interfaces with my C++ application. I receive a 50Hz stream of messages via Ethernet from the AHRS, and as part of this message, I get UTC time. My system will also have NTPD running as the time server for our embedded network. The AHRS also has a 1PPS output that indicates the second roll-over time for UTC. I would like to synchronize the NTPD time with the UTC. After some research, I have found that there are techniques that utilize a serial port as input for the 1PPS. From what I can find, these techniques use GPSD to read the 1PPS and communicate with NTPD to synchronize the system time. However, GPSD is expecting a NMEA formatted message from a GPS. I don't have that.
The way I see it now, I have a couple of possible approaches:
Don't use GPSD. Write a program that reads the 1PPS and the Ethernet message containing UTC, and then somehow communicates this information to NTPD.
Use GPSD. Write a program that repackages the Ethernet message into
something that can be sent to GPSD, and let it handle the
interaction with NTPD.
Something else?
Any suggestions would be very much appreciated.
EDIT:
I apologize for this poorly constructed question.
My solution to this problem is as follows:
1 - interface 1PPS to RS232 port, which as it turns out is a standard approach that is handled by GPSD.
2 - write a custom C++ application to read the Ethernet messages containing UTC, and from that build an NMEA message containing the UTC.
3 - feed the NMEA message to GPSD, which in turn interfaces with NTPD to synchronize the GPS/1PPS information with system time.
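As an illustration of step 2: an NMEA sentence is just ASCII with an XOR checksum, so building one is a few lines of C++. The example below emits a GPZDA time sentence; whether GPSD should be fed ZDA or RMC for a given setup is something to confirm, and the UTC fields and the write-to-GPSD side are placeholders.

#include <cstdio>
#include <string>

// Build a $GPZDA sentence ("hhmmss.ss,dd,mm,yyyy,00,00") from UTC fields.
// The checksum is the XOR of all characters between '$' and '*'.
std::string make_gpzda(int h, int m, double s, int day, int month, int year)
{
    char body[64];
    std::snprintf(body, sizeof(body), "GPZDA,%02d%02d%05.2f,%02d,%02d,%04d,00,00",
                  h, m, s, day, month, year);
    unsigned char cksum = 0;
    for (const char *p = body; *p; ++p) cksum ^= *p;
    char out[80];
    std::snprintf(out, sizeof(out), "$%s*%02X\r\n", body, cksum);
    return out;
}

// Usage: write the result to the pty/serial device GPSD is watching,
// e.g. make_gpzda(12, 34, 56.00, 24, 5, 2024);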
I don't know why you would want to drive a PPS device with a signal that is delivered via Ethernet frames. Moreover, PPS does not work the way you seem to think it does. There is no timecode in a PPS signal, so you can't sync the time to the PPS signal. The PPS signal is simply used to inform the computer of how long a second is.
There are examples that show how a PPS signal can be read in using a serial port, e.g. by attaching it to an interrupt-capable pin - that might be Ring Indicator (RI) or something else with comparable features. The problem I see there is that any sort of code-driven servicing of an interrupt has its latencies and jitter. This is defined by your system design (and, if you write it yourself, by your own system-tailored interrupt handler routine - on a PC, even the good old ISA-bus NMI handlers could show such effects).
To my best understanding, people doing time sync on a "computer" use a true hardware timer-counter (with e.g. 64 bits) and a latch that is triggered to sample and hold the value of the timer on every incoming 1PPS pulse. Folks are already doing that with PTP over Ethernet, with the small variation that a specific edge of the incoming data is used as the trigger; this way sender and receiver can be synchronized by program logic that grabs the resulting value from the built-in PTP hardware latch.
see here: https://en.wikipedia.org/wiki/Precision_Time_Protocol
along with e.g. 802.1AS: http://www.ieee802.org/1/pages/802.1as.html
described on Wikipedia in the section "Related initiatives" as:
"IEEE 802.1AS-2011 is part of the IEEE Audio Video Bridging (AVB) group of standards, further extended by the IEEE 802.1 Time-Sensitive Networking (TSN) Task Group. It specifies a profile for use of IEEE 1588-2008 for time synchronization over a virtual bridged local area network (as defined by IEEE 802.1Q). In particular, 802.1AS defines how IEEE 802.3 (Ethernet), IEEE 802.11 (Wi-Fi), and MoCA can all be parts of the same PTP timing domain."
some article (in German): https://www.elektronikpraxis.vogel.de/ethernet-fuer-multimediadienste-im-automobil-a-157124/index4.html
and some presentation: http://www.ieee802.org/1/files/public/docs2008/as-kbstanton-8021AS-overview-for-dot11aa-1108.pdf
My answer to your question is:
Yes, it's possible, but it is a precision-limited design due to various internal factors like the latency and jitter of the interrupt handler you are forced to use. The achievable overall precision per pulse, and over a long run, is hard to say, but might range from some 10 ms at startup with a single pulse down to maybe (guessed) 0.1 ms. Doing it means proving it: long-term observations should help you uncover the true practical capabilities with your very specific computer and selected software environment.

Timing short functions with CPU time using the RTEMS operating system

I am looking to profile some code in a real-time operating system, RTEMS. Essentially, RTEMS has a bunch of functions to read the time, the most useful of which is rtems_clock_get_ticks_since_boot.
The problem here is that, for whatever reason, the clock ticks reported are synchronized with our state machine loop rate, 5 kHz, whereas the processor is running at around 200 MHz (embedded system). I know this because I recorded the clock time, waited 1 second, and only 5000 ticks had gone by.
So the question is:
How can I get the actual CPU ticks from RTEMS?
PS.
clock() from GNU C has the same problem.
There is a guide that I have been looking into here, but I get "impossible constraint in asm", which indicates that I would need to use different assembler keywords. Maybe someone can point me to something similar?
Context
I want to profile some code, so essentially:
start = cpu_clock_ticks()
//Some code
time = cpu_clock_ticks() - start;
The code runs in less than 0.125 ms, so the 8 kHz counter that clock() and the other RTEMS functions read won't cut it.
Accurate performance measurements can be made using an oscilloscope, provided that there is a GPIO, test point or pin that the software can write to (and the oscilloscope probe can attach to).
The method here is to send a pulse to the pin. The o'scope can be set up to trigger on the pulse. Some smarter o'scopes can perform statistics on the pulse width, such as mean time and maximum time.
On our embedded system, the H/W team was nice enough to bring out 8 test points for us to use. We initialize the pin to zero. At the start of the code to profile, we write a 1 to the pin. At the end of the profiling code, we write a 0 to the pin. This produces a pulse or square wave.
The o'scope is set up to trigger on the rising edge. The probe is connected to the pin and the program is run. Adjust the o'scope so that the entire pulse is visible on the screen. Re-run the program. When the o'scope triggers, measure the width of the pulse. This will be the actual time of the execution.
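In code, that method is just a pin write on either side of the region being profiled. In the sketch below, gpio_write() and PROFILE_PIN are hypothetical placeholders for whatever pin API your BSP provides:

/* Hypothetical GPIO helper -- substitute your BSP's pin API. */
extern void gpio_write(int pin, int level);
#define PROFILE_PIN 3

void profiled_section(void)
{
    gpio_write(PROFILE_PIN, 1);   /* rising edge: scope trigger    */
    /* ... code to profile ... */
    gpio_write(PROFILE_PIN, 0);   /* falling edge: end of pulse    */
    /* Pulse width on the scope == execution time of the section. */
}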
So a solution to this is to use the following function:
inline unsigned long timer_now() {
    unsigned int time;
    // The internal timer is accessed as special purpose register #268
    // (24.576 MHz => 1 tick = 4.069010416E-8 s, ~0.04 µs)
    asm volatile ("mfspr %0,268; sync" : "=r" (time));
    return time;
}
timer_now() will return ticks that are still not at the processor speed, but much faster than 8 kHz; the time taken can then be calculated as ticks * 0.04 µs.
NOTE: This may only work for the PowerPC MPC5200 BSP for RTEMS, since it uses a processor-specific assembler routine.
In RTEMS 4.11 or newer you can use rtems_counter_read to obtain a high-precision counter that abstracts away the CPU-specific assembly code. Please see: https://docs.rtems.org/doxygen/cpukit/html/group__ClassicCounter.html
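A sketch of how that looks, with function names taken from the linked Classic Counter API documentation (not verified here):

#include <rtems/counter.h>
#include <inttypes.h>
#include <stdio.h>

void profile_section(void)
{
    rtems_counter_ticks start = rtems_counter_read();

    /* ... code to profile ... */

    rtems_counter_ticks delta =
        rtems_counter_difference(rtems_counter_read(), start);
    printf("section took %" PRIu64 " ns\n",
           (uint64_t) rtems_counter_ticks_to_nanoseconds(delta));
}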
RTEMS related questions like this are invariably answered more quickly and accurately when submitted to the subscribe-only users mailing list.

Arduino read pulse-width frequency and duty cycle from a single digital input

I'm new to Arduino and coding, but have done all the tutorials and think I'm getting a grasp on how it all works.
I have a real-world problem that I'd love to solve with the Arduino.
I have a PWM signal from a fuel injector on a gasoline engine, from which I need to derive two separate logical values inside the Arduino.
Determine the delay between each rising edge (to derive engine RPM)
range between 6 ms and 120 ms between rising edges
and
read pulse-width Duty Cycle (to determine the fuel injector's duty cycle)
Pulse widths range from 0.02 ms to over 10 ms.
these need to be represented independently in the logic as "RPM" and "Pulse Width"
I have read this blog about "secrets of Arduino PWM" and find it informative on how to WRITE pulse-width outputs of varying frequency and duty cycle, but I am trying to READ pulse-widths of varying frequency and duty cycle to create a variable byte or int to use for each.
Correct, there is not a lot on timing pulse inputs or the like. The Arduino's ATmega can capture the timing of each side of the duty cycle by the methods below, and it will be up to your code to put the two together and treat them as a PWM signal for your needs.
There are several methods with examples.
Tight-loop polling of the timed events, such as with pulseIn().
A better method is to create a Timer1 overflow interrupt and read the pin during that ISR. This is the original method Ken Shirriff's infrared library used (50 ms poll; Shirriff IR Library), where the resolution is only as good as the overflow period.
Use pin change interrupts in an ISR to get the time. This will be slightly latent. Microtherion's fork of Ken's IR library converted the overflow method to PinChangeInt; MicroTherion's code did this discretely in the library, whereas the PinChangeInt library makes it simpler. (A minimal sketch using this approach follows this list.)
Use the timer input capture. In short, when the corresponding input pin changes, the system clock is captured and an interrupt is issued, so the ISR can (slightly later) read the exact time it occurred. InputCapture.ino
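As a minimal illustration of the interrupt-on-change approach (method 3), using only the standard attachInterrupt() and micros() calls; pin 2 is just an example, and a real injector signal would still need appropriate signal conditioning before it reaches the pin:

const byte PWM_IN_PIN = 2;                // must be an interrupt-capable pin

volatile unsigned long lastRise = 0;
volatile unsigned long period = 0;        // time between rising edges (us) -> RPM
volatile unsigned long pulseWidth = 0;    // high time (us) -> duty cycle

void edgeISR() {
  unsigned long now = micros();
  if (digitalRead(PWM_IN_PIN) == HIGH) {  // rising edge
    period = now - lastRise;
    lastRise = now;
  } else {                                // falling edge
    pulseWidth = now - lastRise;
  }
}

void setup() {
  pinMode(PWM_IN_PIN, INPUT);
  attachInterrupt(digitalPinToInterrupt(PWM_IN_PIN), edgeISR, CHANGE);
  Serial.begin(115200);
}

void loop() {
  noInterrupts();                         // copy the volatiles atomically
  unsigned long p = period, w = pulseWidth;
  interrupts();
  Serial.print("period us: "); Serial.print(p);
  Serial.print("  width us: "); Serial.println(w);
  delay(100);
}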
I just wrote a library with an example that does exactly this. In my Timer2_Counter library, I've written an example currently titled "read_PWM_pulses_on_ANY_pin_via_pin_change_interrupt" which reads in pulses then outputs the pulse width in us, with a resolution of 0.5us, as well as the period between pulses, and the frequency of the pulses.
Download the library and check out the example. To test the example you can connect a wire from a PWM pin outputting a PWM signal to the input pin. The library with example is found here: http://www.electricrcaircraftguy.com/2014/02/Timer2Counter-more-precise-Arduino-micros-function.html
PS. this example code uses pin change interrupts and can be done on ANY Arduino pin, including the analog pins.

Low latency serial communication on Linux

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request/answer scheme, so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly ARM based and run Linux >= 3.0. I'm having trouble reducing the round-trip time below 10 ms (115200 baud, 8 data bits, no parity, 7 bytes per message).
What IO interfaces will give me the lowest latency: select, poll, epoll, or polling by hand with ioctl? Does blocking or non-blocking IO impact latency?
I tried setting the low_latency flag with setserial, but it seemed to have no effect.
Are there any other things I can try to reduce latency? Since I control all the devices, it would even be possible to patch the kernel, but I'd prefer not to.
---- Edit ----
The serial controller used is a 16550A.
Request/answer schemes tend to be inefficient, and it shows up quickly on a serial port. If you are interested in throughput, look at a windowed protocol, like the Kermit file-sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another option is to pad your packets so that you hit the FIFO threshold every time. That will also confirm whether or not it is a FIFO threshold problem.
10 ms at 115200 baud is enough to transmit about 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
    unsigned long flags;

    spin_lock_irqsave(&tty->buf.lock, flags);
    if (tty->buf.tail != NULL)
        tty->buf.tail->commit = tty->buf.tail->used;
    spin_unlock_irqrestore(&tty->buf.lock, flags);

    if (tty->low_latency)
        flush_to_ldisc(&tty->buf.work);
    else
        schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to some more engineers about the topic, I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement a kernel module which talks our protocol and gives us latencies < 1 ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 Hz tick rate added the 10 ms delay. 1000 Hz and a negative nice value for the serial process give me the timing behavior I wanted to reach.
Serial ports on Linux are "wrapped" into Unix-style terminal constructs, which hits you with one tick of lag, i.e. 10 ms. Try whether stty -F /dev/ttySx raw low_latency helps; no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly: issue setserial /dev/ttySx uart none to unbind the Linux driver from the serial port hardware and control the port via inb/outb to the port registers. I've tried that; it works great.
The downside is you don't get interrupts when data arrives and you have to poll the register. Often.
You should be able to do the same on the ARM device side; it may be much harder on exotic serial port hardware.
Here's what setserial does to set low latency on a port's file descriptor (struct serial_struct and ASYNC_LOW_LATENCY come from <linux/serial.h>):
struct serial_struct serial;

ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used an FT232RL-based USB adapter on Modbus at 115.2 kbit/s.
I get about 5 transactions (to 4 devices) in about 20 ms total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 ms response time).
Without ASYNC_LOW_LATENCY the total time is about 60 ms.
With FTDI USB adapters, ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 ms (instead of the default 16 ms).
I'm currently using a home-brewed USB adapter and I can set the latency of the adapter itself to whatever value I want. Setting it to 200 µs shaves another millisecond off that 20 ms.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
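For reference, that configuration in termios looks roughly like this (standard POSIX calls; error handling kept minimal):

#include <termios.h>
#include <unistd.h>

/* Put an already-open serial fd into raw, noncanonical mode
   with VMIN = 1, VTIME = 0 (return as soon as one byte arrives). */
int serial_make_raw(int fd)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0)
        return -1;
    cfmakeraw(&tio);              /* raw mode: no line-discipline processing    */
    tio.c_cc[VMIN]  = 1;          /* block until at least one byte is available */
    tio.c_cc[VTIME] = 0;          /* no inter-character timer                   */
    return tcsetattr(fd, TCSANOW, &tio);
}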
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.