C++/Qt: advice on fast timing of asynchronous processes

I'm currently working on a Qt GUI that I have to set up for a measurement device. The device uses a frame grabber card that grabs images from a line camera very fast. My image processing, which is not very complex, takes 0.2 ms to complete, and it takes about 40 ms to display the signal and the processing result with QCustomPlot, which is totally okay.
Besides the GUI output, the processed signal will also be output as an analog signal by an NI DAQ device.
My problem is that I have to update the analog signal at a constant frequency and still update the GUI from time to time.
My current approach, or idea, was to create a data pool thread and two worker threads. One worker thread receives the data from the frame grabber, processes it and updates the data pool. The second worker thread updates the analog channel of the NI DAQ at a frequency of about 2-5 kHz, given by a clock in the NI DAQ device.
The GUI thread would read the data pool from time to time to update the signal display at a rate of about 20-30 Hz.
I wanted to use Qt's thread management and the signal-and-slot mechanism because of their "simplicity" and because I have already worked with threads in combination with Qt and its thread classes.
Is there maybe a better way? Does anybody have an idea or any suggestions? Could I run into problems with the timing of the threads?
Furthermore, is it possible to assign one thread to a single CPU core on a multi-core CPU, so that this core only processes this single thread?
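The "data pool" in the question can be sketched as a small mutex-protected store that always holds the latest processed signal. This is a minimal sketch in standard C++ rather than Qt; the class name `DataPool` and its methods are hypothetical. The processing thread calls `publish()` after each frame, and the GUI (20-30 Hz) and DAQ (2-5 kHz) threads copy a snapshot; the sequence number lets a reader skip work when nothing has changed.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical data pool: holds only the most recent processed signal.
class DataPool {
public:
    // Called by the grab-and-process thread after each frame.
    void publish(const std::vector<double>& signal) {
        std::lock_guard<std::mutex> lock(mutex_);
        signal_ = signal;  // copy under the lock; keep the critical section short
        ++sequence_;       // lets readers tell whether anything changed
    }

    // Called by any reader (GUI or DAQ thread). Copies the latest signal into
    // `out` and returns its sequence number (0 = nothing published yet).
    std::uint64_t snapshot(std::vector<double>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out = signal_;
        return sequence_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<double> signal_;
    std::uint64_t sequence_ = 0;
};
```

A mutex is usually fine for the 20-30 Hz GUI reads. For the DAQ thread, a lock held by another thread can add jitter, which is one reason a lock-free scheme such as a triple buffer is worth considering for that path.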

Is there maybe a better way? Does anybody have an idea or any suggestions? Could I run into problems with the timing of the threads?
The signal/slot mechanism is fine. Try it, and if you run into performance issues you can still look for another approach. I used the signal/slot mechanism for real-time video processing with QAbstractVideoSurface and QMediaPlayer, and it worked for me.
Furthermore, is it possible to assign one thread to a single CPU core on a multi-core CPU, so that this core only processes this single thread?
Why would you do that? The operating system or threading library has a scheduler that takes care of such things. Unless you have a good reason to do this yourself, you should just rely on the existing mechanism.

I would try it with three threads: 1) the UI thread, 2) a grab-and-process thread, 3) an analogue output thread.
The trick is to use a triple buffer to connect the output of the grab-and-process thread to the input of the analogue output thread.
Say that at moment t, thread (2) finishes processing frame[(t+0)%3]. It immediately switches its output destination to frame[(t+1)%3] and notifies thread (3), which is looping through the data in frame[(t+2)%3], to switch to frame[(t+0)%3] when appropriate.
I used this technique on an image processing project with a 10 fps processing frame rate and a 60 fps NTSC output frame rate. To eliminate tearing, a circular buffer of three buffers is the minimum.
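The triple-buffer idea can be sketched in standard C++ with a single atomic index. This is a common lock-free variant, not necessarily the exact rotation scheme the answer describes: instead of an explicit notification, the producer marks the shared "middle" slot as fresh and the consumer picks it up whenever it polls. The class and method names are hypothetical. The producer always owns one slot, the consumer always owns another, and the third is swapped atomically, so neither side ever waits on the other.

```cpp
#include <atomic>

// Lock-free triple buffer for exactly one producer thread and one consumer thread.
template <typename T>
class TripleBuffer {
public:
    // Producer: fill this slot, then call publish().
    T& write_slot() { return slots_[write_]; }

    // Producer: hand over the finished slot and take the old middle slot.
    void publish() {
        unsigned prev = middle_.exchange(write_ | kFresh, std::memory_order_acq_rel);
        write_ = prev & kIndexMask;
    }

    // Consumer: switch to the newest published slot, if there is one.
    // Returns true if it switched.
    bool update() {
        if ((middle_.load(std::memory_order_relaxed) & kFresh) == 0) return false;
        unsigned prev = middle_.exchange(read_, std::memory_order_acq_rel);
        read_ = prev & kIndexMask;
        return true;
    }

    // Consumer: the slot being read; stable until the next update().
    const T& read_slot() const { return slots_[read_]; }

private:
    static constexpr unsigned kIndexMask = 3u;  // bits 0-1: slot index
    static constexpr unsigned kFresh = 4u;      // bit 2: middle slot holds unread data
    T slots_[3]{};
    std::atomic<unsigned> middle_{1};
    unsigned write_ = 0;  // touched only by the producer thread
    unsigned read_ = 2;   // touched only by the consumer thread
};
```

In the setup above, the grab-and-process thread would write each processed frame into `write_slot()` and call `publish()`. The analogue output thread would call `update()` once at the start of each output cycle and keep looping over `read_slot()` until a newer frame appears.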

Related

Multithreading with very low latencies for calls between threads

I'm starting work on new robot control software for our robotic system, which has to read values from multiple independent sensors and control the motors accordingly.
The software will run on an i5 PC with PREEMPT_RT Ubuntu. Each sensor device comes with an SDK, which I want to run in a separate thread. As soon as a sensor thread gets new values from its sensor (up to 50 doubles at once from one sensor), it should update those values in a superior control thread. The update rate depends on the sensor but can be as fast as 1 kHz. As soon as the "main sensor" has new values and has sent them to the control thread, the main sensor's thread should trigger the control loop in the control thread (with a non-blocking call). The control thread should then compute new motor values from the currently stored sensor values and transmit them to the motors. The main sensor that triggers the control loop also gets new data at 1 kHz, so the control loop must run that often as well.
I'm unsure how to approach this. Do you think the thread facilities in C++11 can already solve this, or should I use something like pthreads or Boost?
The main requirements are very low latency (~10 µs) when sending data (up to 50 doubles) from one thread to another, and the ability to trigger a function (non-blocking) in another thread.
As soon as the sensor threads have sent the current data to the control thread, they should continue monitoring the hardware for new sensor values and retrieve them. Some of the sensor threads perform extra computation and filtering on the sensor data, which is why I want them to run in separate threads and take advantage of the quad-core processor.
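One way to pass up to 50 doubles between two threads without locks is a single-producer/single-consumer ring buffer built only on C++11 `std::atomic`. This is a sketch under stated assumptions: the names `SensorSample` and `SpscRing` are hypothetical, each sensor thread gets its own ring to the control thread, and the "non-blocking trigger" is modeled as the control thread polling `try_pop()` on the main sensor's ring. Whether ~10 µs end-to-end is reached depends on the PREEMPT_RT setup and thread priorities, not on this code alone.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical sample type: up to 50 doubles per update, as in the question.
struct SensorSample {
    std::array<double, 50> values{};
    std::size_t count = 0;  // how many entries of `values` are valid
};

// Lock-free ring for exactly one producer thread and one consumer thread.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer: returns false instead of blocking when the ring is full.
    bool try_push(const T& v) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[h & (N - 1)] = v;
        head_.store(h + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer: returns false instead of blocking when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Busy-polling the main sensor's ring avoids the wake-up latency of a condition variable at the cost of keeping one core busy; on a quad-core machine that trade-off may be acceptable for the control thread.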

Adapting program from single to multicore

I am considering a programming project that will run under Ubuntu or another Linux OS on a small board with a quad-core x86 N-series Pentium. The software generates 8 fast signals: square-wave pulse trains for stepper motor motion control of 4 axes. Step signals are 50-100 kHz maximum, but usually slower. I want to avoid jitter in these step signals (call it good fidelity), so around 1-2 µs per thread loop cycle would be a nice target. The program also does other kinds of tasks, like reading/writing the hard drive, Ethernet, continuous updates of the graphics display, and keyboard input. Existing single-core programs simply cannot generate motion signals with this kind of timing and require external hardware or techniques to achieve it.
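The question does not say how the pulse trains are computed. A common software technique is a phase accumulator (also called a DDA): on every loop tick, a per-axis increment proportional to the desired step rate is added to a 32-bit accumulator, and each wrap-around produces one step. This is a sketch, not the author's design; `StepGenerator` is a hypothetical name. With a 1 µs tick (1 MHz), a 50 kHz step rate works out to one step every 20 ticks.

```cpp
#include <cstdint>

// Hypothetical per-axis step generator, called once per fast-loop tick.
class StepGenerator {
public:
    // step_hz: desired step rate; tick_hz: how often tick() is called.
    void set_rate(double step_hz, double tick_hz) {
        double ratio = step_hz / tick_hz;
        if (!(ratio > 0.0)) ratio = 0.0;  // negative or NaN -> stop
        if (ratio > 0.5) ratio = 0.5;     // a pulse needs a high and a low tick
        increment_ = static_cast<std::uint32_t>(ratio * 4294967296.0);  // ratio * 2^32
    }

    // Returns true on ticks where the step output should pulse.
    bool tick() {
        const std::uint32_t before = accumulator_;
        accumulator_ += increment_;    // unsigned wrap-around is well defined
        return accumulator_ < before;  // wrapped => one step period has elapsed
    }

private:
    std::uint32_t accumulator_ = 0;
    std::uint32_t increment_ = 0;
};
```

Because the increment is truncated to an integer, the average rate can be off by a tiny fraction (here, one step in 50,000 over a second), but the spacing between steps never varies by more than one tick, which is what matters for jitter.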
I have been reading other posts here, for example about a thread running continuously on a selected core. The wording in these posts is "loose", and I'm not sure what is really meant. "Continuously" might mean testing every minute, or something else?
I may be wordy, but I hope it will be clear. The program I'm considering includes all the threads, routines, memory and shared memory. I am not considering having this program launch another program or service. The other threads are written in this program and launched when it starts up. I will call the signal-generating thread the FAST THREAD.
The FAST THREAD is to be launched on an otherwise "free" core. It needs to be the only thread that runs on that core. Hopefully the OS thread scheduler on that core can be "turned off", so that it does not even interrupt that core to decide which thread runs next. Looking at the processor manual, each core has a counter/timer chip (CTC). Is it possible to use it to provide a continuous train of interrupts to my "locked-in" FAST THREAD for timing purposes, in the range of about 1-2 µs? If not, I could just read one channel of that CTC for software synchronization. The FAST THREAD would then see no delays from interrupts issued on the other cores and the associated multicore fabric. Once running, the FAST THREAD will continue to run until the program closes, which could be hours.
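A user-space thread cannot switch off the scheduler or claim a per-core timer interrupt, but a common software alternative is to busy-wait on a monotonic clock against absolute deadlines. This sketch uses `std::chrono::steady_clock`; `run_fast_loop` and its parameters are hypothetical. Deadlines are computed as start + k × period, so a late cycle does not push every later cycle back. Any interrupt the kernel still delivers to that core will show up as jitter, so this only helps if the core is otherwise kept quiet.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical fast-loop skeleton: spins until each absolute deadline, then
// calls step(cycle). Stops after max_cycles or when `stop` becomes true.
// Returns the number of cycles executed.
template <typename StepFn>
std::uint64_t run_fast_loop(std::chrono::nanoseconds period,
                            const std::atomic<bool>& stop,
                            std::uint64_t max_cycles,
                            StepFn step) {
    using clock = std::chrono::steady_clock;
    const auto tick = std::chrono::duration_cast<clock::duration>(period);
    auto deadline = clock::now();
    std::uint64_t cycles = 0;
    while (cycles < max_cycles && !stop.load(std::memory_order_relaxed)) {
        deadline += tick;  // absolute schedule: no accumulated drift
        while (clock::now() < deadline) {
            // busy-wait: the thread never sleeps, so it keeps its core
        }
        step(cycles);  // e.g. advance the step generators and write outputs
        ++cycles;
    }
    return cycles;
}
```

Busy-waiting trades a fully occupied core for the lowest wake-up latency; it only makes sense on a core dedicated to this one thread, as the question intends.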
Input data to drive the FAST THREAD will be in common shared memory defined in the program. There are also hardware signals for motion limits (from GPIOs or the SDI port). If any of them goes TRUE, it forces a programmed halt of all motion. This does not need a 1-2 µs response; it could go to a slower motion loop.
Ah, the output:
Some motion data is written back to the shared memory assigned for this purpose, such as the current location and the current loop number.
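Since all threads live in one program, the "shared memory" can simply be an ordinary struct in process memory. Making each field a `std::atomic` means the FAST THREAD never has to take a mutex that a slower thread might be holding. This is a sketch; `MotionShared`, `publish_status` and the field names are hypothetical, and the layout assumes the 4 axes from the question. Note that readers may see positions from slightly different cycles, because the fields are updated one at a time rather than as a single snapshot.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical block shared between the FAST THREAD and the slower threads.
struct MotionShared {
    // Written by the FAST THREAD, read by everyone else.
    std::atomic<std::int64_t> position[4]{};  // current location per axis, in steps
    std::atomic<std::uint64_t> loop_count{0};  // current loop number
    // Written by the slower limit-monitoring thread, polled by the FAST THREAD.
    std::atomic<bool> halt_requested{false};
};

// FAST THREAD: publish the current state once per cycle.
inline void publish_status(MotionShared& s, const std::int64_t (&pos)[4],
                           std::uint64_t loop) {
    for (int axis = 0; axis < 4; ++axis)
        s.position[axis].store(pos[axis], std::memory_order_relaxed);
    // Release: a reader that acquires loop_count also sees the positions above.
    s.loop_count.store(loop, std::memory_order_release);
}
```

On x86-64, 64-bit and boolean atomics are normally lock-free, so these loads and stores compile to plain memory operations (plus an exchange or fence where stronger ordering is required).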
Some signals need to be output (the 8 outputs). There are numerous free GPIOs, but I'm not sure of the path taken to get a GPIO pin to change its output; a system call to Linux initiates the pin change event. There is also an SDI port available, running at up to a 25 MHz clock. These ports (GPIO, UART, USB, SDI) seem to exist in the fabric that is not tied to any specific core. I am not sure of the latency from issuing these signals in the program until the associated external pin actually presents the signal. In the fast thread, even 10 µs would be OK if it were always the same latency! I know that will not be so; there will be jitter. I need to think about this spec.
There will possibly be a second dedicated core (similar to the above) for slower motion planning. That leaves two cores for everything else. Since all those other items (SATA, video screen, keyboard, ...) already work on a single core, the remaining two cores should be plenty.
When the program closes, the FAST THREAD returns the CTC and any other device on its core to their original state and re-enables the OS components on that core for normal operation. End of thread.
To conclude: I have described the overall program so that you understand what I want this FAST THREAD to do, how responsive it needs to be, and that it needs to be undisturbed! The processor runs in the 1.5-2.0 GHz range, so it can certainly do the repeated calculations in the required time frame.
DESIRED: I do not know the system calls that would let me use a selected x86 core in this way. Any pointers would be helpful, as would any manual or document describing these calls and procedures.
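On Linux, the usual call for binding a thread to one core is `pthread_setaffinity_np` (a GNU extension; g++ defines `_GNU_SOURCE` by default, which exposes it and the `CPU_*` macros). This sketch only pins the calling thread. Keeping other work and interrupts off that core is a separate step, typically done with kernel boot parameters such as `isolcpus=` and `nohz_full=` plus IRQ affinity settings, and real-time priority via `pthread_setschedparam` with `SCHED_FIFO` (which needs privileges). The helper names are hypothetical. On Windows, the corresponding call is `SetThreadAffinityMask`.

```cpp
#include <pthread.h>
#include <sched.h>

// Restrict the calling thread to a single CPU.
// Returns 0 on success or an error number, as pthread_setaffinity_np does.
int pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Lowest-numbered CPU the calling thread is currently allowed to run on,
// or -1 on failure (useful when a cgroup or container limits the CPU set).
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```

The FAST THREAD would call `pin_current_thread_to_cpu()` once at startup with the number of the isolated core, before entering its loop.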
Can this use of a core also be done in Windows (7, 10)?
Thanks for reading and any pointers you have.
Stan

How to keep the UI responsive when CPU load is 100% (mainly using C++ and Qt)?

I'm facing a problem where I need to keep my UI (and the whole OS) responsive in a multi-threaded application.
I'm developing an application (C++ and Qt based) that receives and transforms lots of video frames from multiple streams at the same time.
Each stream is retrieved, transformed and rendered in its own separate worker thread (using DirectX). That means I'm not using the default GUI thread to render the frames.
On a powerful computer I have no problem, because the CPU can process all the data and still leave time for the GUI thread to process user requests. But on an old computer it doesn't work: the CPU is at 100% processing my data, the UI lags, and it may take 10 seconds before a button click is processed.
I would like to keep my UI responsive. In fact, I want my worker threads to work only if there is nothing else to do. I tried setting the worker thread priority to low, but it doesn't work. I also tried a sleep(10) in the worker threads, but because I can have lots of threads, they don't sleep at the same time, so that doesn't work either.
What is the best way to keep a UI responsive in that case (whatever the toolkit used)?
I can't add comments to the list above, so I have to add my few cents here:
If you want the OS to be more responsive, make sure you don't consume too much RAM, and start the process at a lower priority. As far as I know, thread priorities are only taken into account when the OS decides which thread of the process should run; when other processes on the system are taken into account, the whole process still runs at 100% CPU.
Make sure not to run too many threads. A good solution is to create as many CPU-bound (100% CPU) threads as there are cores; if you need more, use multitasking techniques.
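The "one busy thread per core" advice can be sketched with standard C++: size a fixed worker pool from `std::thread::hardware_concurrency()` (keeping one core for the GUI thread) and let the workers pull jobs from a shared counter instead of running one thread per stream. This is a sketch; `busy_worker_count` and `run_on_workers` are hypothetical names, and in the video case a "job" might be one frame from one stream.

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// One CPU-bound worker per core, minus one core left for the GUI thread.
// hardware_concurrency() may return 0 when unknown, so fall back to 1.
unsigned busy_worker_count() {
    const unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) return 1;
    return std::max(1u, cores - 1);
}

// Runs job(0) .. job(jobs - 1) on `workers` threads; each worker takes the
// next unclaimed index from a shared atomic counter until none are left.
void run_on_workers(unsigned workers, unsigned jobs,
                    const std::function<void(unsigned)>& job) {
    std::atomic<unsigned> next{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&] {
            for (unsigned i = next++; i < jobs; i = next++) job(i);
        });
    for (auto& t : pool) t.join();
}
```

With a fixed pool, adding more streams increases the queue of work rather than the number of runnable threads, so the GUI thread competes with at most `busy_worker_count()` busy threads.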
One thing to check: how do you display the video? Do you make sure your display rate (data from the streams) matches the display card's refresh rate? When you have data to display, do you notify the main thread that the screen needs updating (the better solution), or do you force a frame display from each thread (the bad solution)?
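A detail of the "notify the main thread" approach: with many worker threads, posting one event per frame can itself flood the GUI event loop. A sketch of one way to coalesce the notifications uses an atomic "pending" flag, so at most one repaint request is outstanding at a time. `RepaintRequest` is a hypothetical name, and `post_to_gui` stands for whatever queues work to the main thread; in Qt that could be, for example, `QMetaObject::invokeMethod` with `Qt::QueuedConnection`.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Coalesces "new frame available" notifications from many worker threads
// into at most one pending request for the GUI thread.
class RepaintRequest {
public:
    explicit RepaintRequest(std::function<void()> post_to_gui)
        : post_(std::move(post_to_gui)) {}

    // Worker threads: call after storing a new frame.
    void notify() {
        // Only the thread that flips the flag from false to true posts.
        if (!pending_.exchange(true, std::memory_order_acq_rel)) post_();
    }

    // GUI thread: call at the start of handling the posted request, before
    // reading the latest frames, so a frame arriving meanwhile triggers a new post.
    void begin_repaint() { pending_.store(false, std::memory_order_release); }

private:
    std::function<void()> post_;
    std::atomic<bool> pending_{false};
};
```

The GUI thread then repaints at most once per posted event, using whatever frames are newest at that moment, which naturally caps the display rate at what the GUI can sustain.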

Multiple processing in WinCE 6.0 or DMA implementation

In my application I want to do tasks in parallel: one thread does the calculation, and another draws the data on screen. But while drawing, the processor gets busy, and during that time it cannot process the data of the other thread. I am running both threads at above-normal priority. Is there any way to do the drawing in parallel, so that the measurement thread can do its calculations at full speed without being affected by the drawing thread? I heard from someone that DMA can solve the problem, but I have no idea how to implement it on the WinCE 6.0 platform.
Please provide any pointers.
Mukesh
I have no idea how DMA would "solve" this issue: you're using a single processor core, which can only execute one set of instructions at a time. DMA won't change that.
The problem you're having sounds like you're using the processor at nearly full capacity, so you're not seeing much time sharing between your threads. There are generally two ways to approach this:
1) Adjust the priority of your more important thread so that it gets more time from the scheduler to do its work.
or
2) Adjust the thread quantum for your threads to force the scheduler to switch between threads more frequently.

Multiple Producers Single Consumer Queue

I am new to multithreading and have designed a program that receives data from two microcontrollers measuring various temperatures (ambient and water) and draws the data to the screen. Right now the program is single-threaded and its performance SUCKS A BIG ONE.
I understand the basic design approaches to multithreading well enough to create a thread to do a task, but what I don't get is how to make threads perform separate tasks and place the data into a shared data pool. I figured that I need a queue with one consumer and multiple producers (I would like to use std::queue). I have seen some code in the gtkmm threading docs showing a single-producer/single-consumer queue: they lock the queue object, produce data, signal the sleeping thread that it is finished, and then the producer sleeps. For what I need, would I have to put a thread to sleep? Would there be data conflicts if I didn't sleep any of the threads? And would sleeping a thread cause a significant data delay (I need real-time data drawn at 30 frames a second)?
How would I go about coding such a queue using the gtkmm/glibmm library?
Here's a suggestion:
1. Have two threads that are responsible for obtaining data and placing it into a buffer. Each thread has its own (circular) buffer.
2. There will be a third thread that is responsible for getting data from the buffers and displaying it on the screen.
3. The screen thread sends messages to the data threads requesting some data, then displays the data. The messages help synchronize execution and avoid deadlocks.
4. None of the threads should "wait on single or multiple objects"; instead they should poll for events.
Think of this scenario in terms of people. One person delivers water temperature readings. Another person delivers ambient temperature readings. A third person receives or asks for the data and displays it (on a whiteboard). The objective is to keep everybody operating at maximum efficiency without any collisions.
If you're looking for a lock-free implementation of this, be aware that lock-free multi-producer queues do exist but are hard to get right. With ordinary data structures, something needs to keep two threads from updating the data structure simultaneously and corrupting it.
Is there any reason you can't have each thread collect data on its own, in its own structure, and then combine the results at the end?
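The multiple-producer, single-consumer queue the question asks for can be sketched with `std::queue` guarded by a `std::mutex`, plus a `std::condition_variable` so the consumer can wait for data with a timeout instead of sleeping blindly. This uses standard C++17 rather than glibmm (glibmm's `Glib::Threads` classes offer equivalent mutex and condition primitives); `Reading` and `ReadingQueue` are hypothetical names. Producers never sleep: they hold the lock only long enough to push one element, so the delay they add is far below a 30 fps frame time.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical reading type: which sensor produced it and the temperature.
struct Reading {
    std::string source;  // e.g. "water" or "ambient"
    double celsius;
};

// Any number of producer threads may call push(); one consumer calls pop_for().
class ReadingQueue {
public:
    void push(Reading r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(r));
        }
        ready_.notify_one();  // wake the consumer after releasing the lock
    }

    // Waits at most `timeout` for a reading; returns std::nullopt if none arrived.
    std::optional<Reading> pop_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return !queue_.empty(); }))
            return std::nullopt;
        Reading r = std::move(queue_.front());
        queue_.pop();
        return r;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Reading> queue_;
};
```

A drawing thread running at 30 fps could drain everything available each frame by calling `pop_for` with a short timeout until it returns `std::nullopt`, then redraw once.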