STM32 (using Mbed online) showing delay at higher analog input frequency - c++

I am new to using microcontrollers.
I am setting up an STM32F769 board (using the Mbed online compiler). My goal is to generate a PWM output whose frequency changes according to an analog input. I did some basic coding, but there is a problem. When I check the output on an oscilloscope with a 1 Hz analog input, it works perfectly, but when I check it with a 100 Hz analog input there is a delay in the output and I get wrong values. I do not understand why, because this board is fast (216 MHz) and I should not face such an issue. (Could someone also explain whether it is possible to run the board at 216 MHz, or whatever its maximum frequency is, and how?)
#include "mbed.h"

AnalogIn analog_value(A0);   // analog input (pin name is a placeholder)
PwmOut pulse(D5);            // PWM output (pin name is a placeholder)

int main()
{
    float meas_r, meas_v, out_freq;
    while (1) {
        // Average 1024 ADC readings (read() returns 0.0 .. 1.0)
        meas_r = 0;
        for (int i = 1; i <= 1024; i++) {
            meas_r = meas_r + analog_value.read();
        }
        meas_r = meas_r / 1024;
        meas_v = meas_r * 3300;              // scale to millivolts
        out_freq = 50000 + (meas_v * 50);    // map voltage to output frequency in Hz
        pulse.period(1.0 / out_freq);
    }
}
It should work with a 100 Hz analog input just as it does with 1 Hz.

216 MHz may be the maximum clock frequency at which your processor can operate, but that does not mean it can input/output signals at that frequency on its ports.
The delays are caused by the time it takes to read the analog values and to compute the required math operations. You are using multiple multiplications and divisions, which are more expensive than additions and subtractions on almost any hardware. You are also using libraries (pulse.period(), analog_value.read()), so there are hidden computations on top of those multiplications and divisions. Finally, it is possible that your device is doing other work as well (only you know about this). All of those computations take time. At low input frequencies you may not notice the delay, but when the frequency is high enough the delay becomes visible. Also consider the time required to read the analog value 1024 times per loop iteration.
The wrong signal and period are due to the delays and some other uncertainties. If the processor is working on other tasks as well, it is hard to predict how long it takes to finish them all. Since the processor executes instructions sequentially and waits for the previous computation to finish before the next one starts, this introduces timing uncertainty. The data path and the clock frequency of the peripherals (when reading inputs from them) also play a crucial role in timing uncertainty and delays.
If timing and accuracy are really important for your problem, and you cannot solve it with a DSP, MPU, MCU, CPU, GPU, etc., I would suggest using an FPGA.
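As a quick sanity check of where the time goes, the averaging loop itself can be timed with Mbed's Timer class. A minimal sketch, assuming the same placeholder analog pin as above:

#include "mbed.h"

AnalogIn analog_value(A0);   // placeholder pin

int main()
{
    Timer t;
    while (1) {
        float sum = 0;
        t.reset();
        t.start();
        for (int i = 0; i < 1024; i++) {
            sum += analog_value.read();   // each ADC read takes several microseconds
        }
        t.stop();
        printf("1024 reads took %d us, average = %d mV\r\n",
               t.read_us(), (int)(sum / 1024 * 3300));
    }
}

If the reported time is on the order of milliseconds, a 100 Hz input (10 ms period) changes significantly while the average is being taken, which explains both the delay and the wrong values; reducing the number of samples per average is one way to shrink that window.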

Related

How to offload precise ADC oversampling with a RISC-V GD32VF103CBT6 Development Board

I'm hoping to build a very basic audio effects device using a RISC-V GD32VF103CBT6 development board. I have managed to do some hardware-interrupt-based sampling with another MCU, but I'm a bit confused by the documentation for the RISC-V board (Chapter 11 in the user manual); I haven't the slightest idea how to turn the instructions there into actual C/C++ code. Sadly, their GitHub repo has almost no examples at all, and none appear to deal with high-speed sampling. There's also a datasheet in that GitHub repo, but I haven't been able to find any specific code examples or revealing instruction in there, either.
What I want to do is:
Perform the calibration described in the user manual, which must precede sampling operations.
Collect 12-bit samples of the audio signal voltage from an external pin, using the ADC's oversampling capability to sum numerous 12-bit samples into a single 16-bit sample at a high sampling rate. Ultimately I want audio sampled at 16 bits, 48 kHz to 96 kHz.
I need help instructing the MCU to collect these samples using its built-in hardware features.
I want to sample continuously, offloading as much as possible to built-in hardware so that enough processing overhead is left to do a bit of signal processing for simple effects.
Section 11.4.1 clearly says
Calibration should be performed before starting A/D conversion.
The calibration is initiated by software by setting bit CLB=1. CLB bit stays at 1 during all the calibration sequence. It is then cleared by hardware as soon as the calibration is completed.
The internal analog calibration can be reset by setting the RSTCLB bit in ADC_CTL1 register.
Calibration software procedure:
1) Ensure that ADCON=1.
2) Delay 14 ADCCLK to wait for ADC stability
3) Set RSTCLB (optional)
4) Set CLB=1.
5) Wait until CLB=0.
Question 1: How do I set these memory registers as the instructions indicate? I need a code example, and the manufacturer provides none.
Question 2: How do I delay 14 ADCCLK cycles in C/C++? A loop seems like it would be enormously inefficient. Should I call sleep()? Any explanation of ADCCLK would also be helpful.
This also seems important, but I have no idea what it portends:
The ADCCLK clock provided by the clock controller is synchronous APB2 clock. The RCU controller has a dedicated programmable prescaler for the ADC clock.
I am not at all certain but I think this is the conversion mode I want:
Continuous conversion mode
This mode can be run on the regular channel group. The continuous conversion mode will be enabled when CTN bit in the ADC_CTL1 register is set. In this mode, the ADC performs conversion on the channel specified in the RSQ0[4:0]. When the ADCON has been set high, the ADC samples and converts specified channel, once the corresponding software trigger or external trigger is active. The conversion data will be stored in the ADC_RDATA register.
Software procedure for continuous conversion on a regular channel. To get rid of checking, DMA can be used to transfer the converted data:
1. Set the CTN and DMA bit in the ADC_CTL1 register
2. Configure RSQ0 with the analog channel number
3. Configure ADC_SAMPTx register
4. Configure ETERC and ETSRC bits in the ADC_CTL1 register if in need
5. Prepare the DMA module to transfer data from the ADC_RDATA.
6. Set the SWRCST bit, or generate an external trigger for the regular group
ADCCLK refers to the input clock of the ADC. Maybe take a look at your datasheet: most µCs have a block diagram of the chip's clock architecture. Usually there is a main system clock, and the different peripherals have a programmable prescaler that divides the system clock by some power of 2.
So 14 ADCCLK cycles means not 14 CPU cycles but 14 ADC input clock edges.
For example, if the ADC prescaler is set to 64, then you have to wait 64*14 CPU clock cycles.
How to wait at all:
Most peripherals (I do not know if such a thing is present on your device) have a busy flag that is set as long as the current operation is ongoing. So maybe you can poll this flag (e.g. something like while (ADC0_FLAGS & ADC_ISBUSY); ).
Another option may be checking whether there is an interrupt that signals the completion of your operation. But at least for the calibration, the simplest thing would be to start the calibration and use a wait or delay function that just wastes a bit of time.
I personally would start the calibration at system startup and then do the other initialization work. Maybe delay a few milliseconds at the end of setup to make sure all components on the board are powered up correctly. By then the ADC calibration should have finished long before.
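For illustration, here is a rough sketch of the calibration sequence from section 11.4.1 written against raw registers. The register address and bit positions below are placeholders taken by analogy with similar parts; check them against the GD32VF103 user manual or the vendor's firmware headers before using this.

#include <stdint.h>

// Placeholder register definition and bit masks -- verify against the manual.
#define ADC0_CTL1    (*(volatile uint32_t *)(0x40012400u + 0x08u))
#define CTL1_ADCON   (1u << 0)   // assumed bit positions
#define CTL1_CLB     (1u << 2)
#define CTL1_RSTCLB  (1u << 3)

static void adc0_calibrate(void)
{
    ADC0_CTL1 |= CTL1_ADCON;                       // 1) ensure ADCON = 1

    for (volatile int i = 0; i < 1000; i++) { }    // 2) crude delay, >= 14 ADCCLK

    ADC0_CTL1 |= CTL1_RSTCLB;                      // 3) reset calibration (optional)
    while (ADC0_CTL1 & CTL1_RSTCLB) { }            //    wait for hardware to clear it

    ADC0_CTL1 |= CTL1_CLB;                         // 4) start calibration
    while (ADC0_CTL1 & CTL1_CLB) { }               // 5) wait until CLB reads 0 again
}

The two while loops are exactly the "poll the busy flag" idea: CLB and RSTCLB act as busy flags that the hardware clears when the operation completes.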

How do I measure GPU time on Metal?

I want to see programmatically how much GPU time a part of my application consumes on macOS and iOS. On OpenGL and D3D I can use GPU timer query objects. I searched and couldn't find anything similar for Metal. How do I measure GPU time on Metal without using Instruments, etc.? I'm using Objective-C.
There are a couple of problems with the timestamp-in-handlers method described in another answer below:
1) Most of the time you really want to know the GPU-side latency within a command buffer, not the round trip to the CPU. This is better measured as the time difference between running 20 instances of the shader and 10 instances of the shader. However, that approach can add noise, since the error is the sum of the errors associated with the two measurements.
2) Waiting for completion causes the GPU to clock down when it stops executing. When it starts back up again, the clock is in a low power state and may take quite a while to come up again, skewing your results. This can be a serious problem and may understate your performance in benchmark vs. actual by a factor of two or more.
3) if you start the clock on scheduled and stop on completed, but the GPU is busy running other work, then your elapsed time includes time spent on the other workload. If the GPU is not busy, then you get the clock down problems described in (2).
This problem is considerably harder to do right than most benchmarking cases I've worked with, and I have done a lot of performance measurement.
The best way to measure these things is to use on device performance monitor counters, as it is a direct measure of what is going on, using the machine's own notion of time. I favor ones that report cycles over wall clock time because that tends to weed out clock slewing, but there is not universal agreement about that. (Not all parts of the hardware run at the same frequency, etc.) I would look to the developer tools for methods to measure based on PMCs and if you don't find them, ask for them.
You can add scheduled and completed handler blocks to a command buffer. You can take timestamps in each and compare. There's some latency, since the blocks are executed on the CPU, but it should get you close.
With Metal 2.1, Metal now provides "events", which are more like fences in other APIs. (The name MTLFence was already used for synchronizing shared heap stuff.) In particular, with MTLSharedEvent, you can encode commands to modify the event's value at particular points in the command buffer(s). Then, you can either wait for the event to reach that value or ask for a block to be executed asynchronously when the event reaches a target value.
That still has problems with latency, etc. (as Ian Ollmann described), but is more fine grained than command buffer scheduling and completion. In particular, as Klaas mentions in a comment, a command buffer being scheduled does not indicate that it has started executing. You could put commands to set an event's value at the beginning and (with a different value) at the end of a sequence of commands, and those would only notify at actual execution time.
Finally, on iOS 10.3+ but not macOS, MTLCommandBuffer has two properties, GPUStartTime and GPUEndTime, with which you can determine how much time a command buffer took to execute on the GPU. This should not be subject to latency in the same way as the other techniques.
As an addition to Ken's comment above, GPUStartTime and GPUEndTime are now available on macOS too (10.15+):
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1639926-gpuendtime?language=objc

Time Short Functions with cpu time using RTEMS operating system

I am looking to profile some code on a real-time operating system, RTEMS. Essentially, RTEMS has a bunch of functions to read the time, the most useful of which is rtems_clock_get_ticks_since_boot.
The problem here is that, for whatever reason, the clock ticks reported are synchronized with our state machine loop rate of 5 kHz, whereas the processor is running at around 200 MHz (embedded system). I know this because I recorded the clock time, waited 1 second, and only 5000 ticks had gone by.
So the question is:
How can I get the actual CPU ticks from RTEMS?
PS.
clock() from GNU C has the same problem.
There is a guide that I have been looking into here, but I get "impossible constraint in asm", which indicates that I would need to use some different assembler keywords. Maybe someone can point me to something similar?
Context
I want to profile some code, so essentially:
start = cpu_clock_ticks();
// some code
time = cpu_clock_ticks() - start;
The code runs in less than 0.125 ms, so the 8 kHz counter that clock() and the other RTEMS functions use won't cut it.
Accurate performance measurements can be made using an oscilloscope, provided that there is a GPIO, test point or pin that the software can write to (and the oscilloscope probe can attach to).
The method here is to send a pulse to the pin. The o'scope can be set up to trigger on the pulse. Some smarter o'scopes can perform statistics on the pulse width, such as mean time and maximum time.
On our embedded system, the H/W team was nice enough to bring out 8 test points for us to use. We initialize the pin to zero. At the start of the code to profile, we write a 1 to the pin. At the end of the profiling code, we write a 0 to the pin. This produces a pulse or square wave.
The o'scope is set up to trigger on the rising edge. The probe is connected to the pin and the program is run. Adjust the o'scope so that the entire pulse is visible on the screen. Re-run the program. When the o'scope triggers, measure the width of the pulse. This will be the actual execution time.
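In code, the idea is just a pin write before and after the section being measured. A schematic sketch, where TEST_POINT_0 and gpio_write() stand in for whatever pin and pin-write routine your board support package provides:

// TEST_POINT_0 and gpio_write() are hypothetical; substitute the BSP's own
// GPIO routine and the pin wired to the oscilloscope probe.
#define TEST_POINT_0 0

static void gpio_write(int pin, int level) { /* BSP-specific pin write */ (void)pin; (void)level; }
static void code_to_profile(void)          { /* code under test */ }

void profile_with_scope(void)
{
    gpio_write(TEST_POINT_0, 0);   // idle low
    gpio_write(TEST_POINT_0, 1);   // scope triggers on this rising edge
    code_to_profile();
    gpio_write(TEST_POINT_0, 0);   // pulse width == execution time
}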
So a solution to this is to use the following function:
inline unsigned long timer_now() {
    unsigned int time;
    // The internal timer is accessed as special purpose register #268.
    // (24.576 MHz => 1 tick = 4.069010416e-8 s, i.e. ~0.04 µs)
    asm volatile ("mfspr %0,268; sync" : "=r" (time));
    return time;
}
timer_now() will return ticks that are still not at the processor speed, but much faster than 8 kHz; the time taken can then be calculated as ticks * 0.04 µs.
NOTE: This may only work for the PowerPC MPC5200 BSP for RTEMS, since it uses an assembler routine.
In RTEMS 4.11 or newer you can use rtems_counter_read to obtain a high-precision counter that abstracts away the CPU-specific assembly code. Please see: https://docs.rtems.org/doxygen/cpukit/html/group__ClassicCounter.html
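A minimal profiling sketch using that counter API (the conversion helpers below, rtems_counter_difference and rtems_counter_ticks_to_nanoseconds, are the ones documented for RTEMS 4.11+; verify against your version):

#include <rtems/counter.h>
#include <stdio.h>

static void code_to_profile(void) { /* code under test */ }

void profile_section(void)
{
    rtems_counter_ticks start = rtems_counter_read();

    code_to_profile();

    rtems_counter_ticks elapsed =
        rtems_counter_difference(rtems_counter_read(), start);

    printf("elapsed: %llu ns\n",
           (unsigned long long) rtems_counter_ticks_to_nanoseconds(elapsed));
}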
RTEMS related questions like this are invariably answered more quickly and accurately when submitted to the subscribe-only users mailing list.

Arduino read pulse-width frequency and duty cycle from a single digital input

I'm new to Arduino and coding, but have done all the tutorials and think I'm getting a grasp on how it all works.
I have a real-world problem that I'd love to solve with the Arduino.
I have a PWM signal from a fuel injector on a gasoline engine that I need to derive two separate logical functions from inside the arduino.
Determine the delay between each rising edge (to derive engine RPM)
The range is 6 ms to 120 ms between rising edges.
and
Read the pulse-width duty cycle (to determine the fuel injector's duty cycle)
Pulse widths range from 0.02 ms to over 10 ms.
These need to be represented independently in the logic as "RPM" and "Pulse Width".
I have read this blog about "secrets of Arduino PWM" and find it informative on how to WRITE pulse-width outputs of varying frequency and duty cycle, but I am trying to READ pulse-widths of varying frequency and duty cycle to create a variable byte or int to use for each.
Correct, there is not a lot on timing pulse inputs or the like. The Arduino's ATmega can capture the timing of each side of the duty cycle by the methods below, and it will be up to your code to put them together and treat them as a PWM signal for your needs.
There are several methods with examples.
Tight-loop polling of the timed events, such as with pulseIn().
A better method is to create a Timer1 overflow interrupt and poll the pin during that ISR. This is how Ken Shirriff's infrared library originally works (a fixed 50 µs poll in that library), where the resolution is only as good as the overflow period.
Use a pin change interrupt ISR to get the time, which will be slightly latent. Microtherion's fork of Ken's IR library converted the overflow method to pin change interrupts; Microtherion's code did this discretely in the library, and the PinChangeInt library makes it simpler. (A sketch of this approach follows the list.)
Use the timer input capture. In short, when the corresponding input pin changes, the current timer value is captured and an interrupt is issued, so the ISR can latently read the exact time it occurred (see InputCapture.ino).
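A minimal sketch of the interrupt-driven approach, using the standard attachInterrupt()/micros() calls on an external-interrupt pin (pin 2 on an Uno-class board is an assumption here) rather than the PinChangeInt library, just to show the idea:

// Measures the period between rising edges (for RPM) and the high time of
// each pulse (for injector duty cycle) on one input pin.
const byte INPUT_PIN = 2;                 // external-interrupt-capable pin (assumed)

volatile unsigned long lastRise = 0;
volatile unsigned long period = 0;        // microseconds between rising edges
volatile unsigned long pulseWidth = 0;    // microseconds the pin stayed HIGH

void onChange() {
  unsigned long now = micros();
  if (digitalRead(INPUT_PIN) == HIGH) {   // rising edge
    period = now - lastRise;
    lastRise = now;
  } else {                                // falling edge
    pulseWidth = now - lastRise;
  }
}

void setup() {
  pinMode(INPUT_PIN, INPUT);
  Serial.begin(115200);
  attachInterrupt(digitalPinToInterrupt(INPUT_PIN), onChange, CHANGE);
}

void loop() {
  noInterrupts();                         // copy the volatiles atomically
  unsigned long p = period, w = pulseWidth;
  interrupts();
  if (p > 0) {
    Serial.print("Period (us): "); Serial.print(p);
    Serial.print("  Pulse (us): "); Serial.println(w);
  }
  delay(100);
}

RPM and duty cycle then fall out as 60000000UL / p and 100.0 * w / p respectively.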
I just wrote a library with an example that does exactly this. In my Timer2_Counter library, I've written an example currently titled "read_PWM_pulses_on_ANY_pin_via_pin_change_interrupt" which reads in pulses then outputs the pulse width in us, with a resolution of 0.5us, as well as the period between pulses, and the frequency of the pulses.
Download the library and check out the example. To test the example you can connect a wire from a PWM pin outputting a PWM signal to the input pin. The library with example is found here: http://www.electricrcaircraftguy.com/2014/02/Timer2Counter-more-precise-Arduino-micros-function.html
PS. this example code uses pin change interrupts and can be done on ANY Arduino pin, including the analog pins.

Achieving game engine determinism with threading

I would like to achieve determinism in my game engine, in order to be able to save and replay input sequences and to make networking easier.
My engine currently uses a variable timestep: every frame I calculate the time it took to update/draw the last one and pass it to my entities' update method. This makes 1000 FPS games seem as fast as 30 FPS games, but introduces nondeterministic behavior.
A solution could be fixing the game to 60 FPS, but that would make input more delayed and would forfeit the benefits of higher framerates.
So I've tried using a thread (which constantly calls update(1) then sleeps for 16 ms) and drawing as fast as possible in the game loop. It kind of works, but it crashes often and my games become unplayable.
Is there a way to implement threading in my game loop to achieve determinism without having to rewrite all games that depend on the engine?
You should separate game frames from graphical frames. The graphical frames should only display the graphics, nothing else. For the replay it won't matter how many graphical frames your computer was able to execute, be it 30 per second or 1000 per second; the replaying computer will likely replay it at a different graphical frame rate anyway.
But you should indeed fix the game frames, e.g. to 100 game frames per second. In a game frame the game logic is executed: the stuff that is relevant for your game (and the replay).
Your game loop should execute graphical frames whenever no game frame is due, so if you fix your game to 100 game frames per second, that's 0.01 seconds per game frame. If your computer only needs 0.001 seconds to execute the logic in a game frame, the other 0.009 seconds are left for rendering graphical frames.
This is a small but incomplete and not 100% accurate example:
uint16_t const GAME_FRAMERATE = 100;
uint16_t const SKIP_TICKS = 1000 / GAME_FRAMERATE;   // milliseconds per game frame

Timer sinceLoopStarted = Timer();  // millisecond timer starting at 0
unsigned long next_game_tick = sinceLoopStarted.getMilliseconds();

while (gameIsRunning)
{
    //! Game frames
    while (sinceLoopStarted.getMilliseconds() > next_game_tick)
    {
        executeGamelogic();
        next_game_tick += SKIP_TICKS;
    }

    //! Graphical frames
    render();
}
The following link contains very good and complete information about creating an accurate gameloop:
http://www.koonsolo.com/news/dewitters-gameloop/
To be deterministic across a network, you need a single point of truth, commonly called "the server". There is a saying in the game community that goes "the client is in the hands of the enemy". That's true. You cannot trust anything that is calculated on the client for a fair game.
If, for example, your game gets easier when for some reason your thread only updates 59 times a second instead of 60, people will find out. Maybe at the start they won't even be malicious; they just had their machines under full load at the time and your process didn't get its 60 updates a second.
Once you have a server (maybe even in-process, as a thread, in single player) that does not care about graphics or update cycles and runs at its own speed, it is deterministic enough to at least get the same results for all players. It might still not be 100% deterministic, because the computer is not real-time: even if you tell it to update at a given frequency, it might not, due to other processes on the computer taking too much load.
The server and clients need to communicate, so the server needs to send a copy of its state (for performance, maybe a delta from the last copy) to each client. The client can draw this copy at the best speed available.
If your game is crashing with the thread, maybe it's an option to actually put "the server" out of process and communicate via network, this way you will find out pretty fast, which variables would have needed locks because if you just move them to another project, your client will no longer compile.
Separate game logic and graphics into different threads. The game logic thread should run at a constant speed (say, it updates 60 times per second, or even higher if your logic isn't too complicated, to achieve smoother gameplay). Then your graphics thread should always draw the latest info provided by the logic thread, as fast as possible, to achieve high framerates.
In order to prevent partial data from being drawn, you should probably use some sort of double buffering, where the logic thread writes to one buffer and the graphics thread reads from the other, switching the buffers every time the logic thread has completed one update.
This should make sure you're always using the computer's graphics hardware to its fullest. Of course, this does mean you're putting constraints on the minimum cpu speed.
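A minimal sketch of that double-buffering handoff in C++ (GameState, updateGame() and draw() are placeholders for the engine's own types and routines):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

struct GameState { /* positions, velocities, scores, ... */ };

void updateGame(GameState&, float /*dt*/) { /* fixed-timestep game logic */ }
void draw(const GameState&)               { /* render the given snapshot  */ }

GameState buffers[2];
std::atomic<int> readIndex{0};   // buffer the render thread may read
std::mutex swapMutex;

void logicThread(std::atomic<bool>& running) {
    auto next = std::chrono::steady_clock::now();
    const auto step = std::chrono::milliseconds(16);     // ~60 logic updates/second
    while (running) {
        int writeIndex = 1 - readIndex.load();           // the buffer not being read
        updateGame(buffers[writeIndex], 0.016f);         // always the same dt
        {
            std::lock_guard<std::mutex> lock(swapMutex);
            readIndex.store(writeIndex);                 // publish the finished frame
        }
        next += step;
        std::this_thread::sleep_until(next);
    }
}

void renderThread(std::atomic<bool>& running) {
    while (running) {
        GameState snapshot;
        {
            std::lock_guard<std::mutex> lock(swapMutex);
            snapshot = buffers[readIndex.load()];        // copy the latest complete frame
        }
        draw(snapshot);                                  // draw as fast as possible
    }
}

Because the logic thread only ever writes the buffer the render thread is not reading, and the index swap happens under the lock, the renderer never sees a half-updated state; determinism then depends only on the fixed-timestep logic, not on the rendering rate.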
I don't know if this will help but, if I remember correctly, Doom stored your input sequences and used them to generate the AI behaviour and some other things. A demo lump in Doom would be a series of numbers representing not the state of the game, but your input. From that input the game would be able to reconstruct what happened and, thus, achieve some kind of determinism ... Though I remember it going out of sync sometimes.