Timing inconsistent for CAN message transmission - c++

I am attempting to write a program in C++ that does some video processing using OpenCV and then uses the information from the video to send a message onto a CAN bus using PCAN-basic.
When the code for the CAN bus is running by itself, the timing of the messages is pretty good, i.e. the system that I am talking to does not complain. However, when the OpenCV part of the program is introduced, the cycle time intermittently increases to an unacceptable value, which causes issues.
I am using chrono::high_resolution_clock to compare a start time to the time now. If the difference is greater than 10 ms, I send a CAN message and restart the clock.
I have tried the following:
Updated OpenCV to the latest version (in the hope that it would run faster / free up resources).
Set the thread priority of the thread that the CAN message function lives in to be higher. I set it to 0, which I assume is the highest priority.
Lowered the comparison to send out the message at 8 ms; this was intended as a workaround, not a fix.
//Every 10ms send a CAN signal
chrono::duration<double, milli> xyTimeDifference = timeNow - xyTimer;
xyTimerCompare = xyTimeDifference.count();
if (xyTimerCompare > 10)
{
    if (xyTimerCompare > 16)
    {
        cout << "xyTimerCompare went over by: " << xyTimerCompare << endl;
    }
    result = CAN_Write(PCAN_USBBUS1, &joystickXY);
    //Reset the timer
    xyTimer = chrono::high_resolution_clock::now();
    if (result != PCAN_ERROR_OK)
    {
        break;
    }
}
Is there a better method to obtain a reliable signal to within +/- 1ms?

Related

Qt QTcpSocket Reading Data Overlap Causes Invalid TCP Behavior During High Bandwidth Reading and Writing

Summary: Some of the memory within the TCP socket appears to be overwritten by other incoming data.
Application:
A client/server system that utilizes TCP within Qt (QTcpSocket and QTcpServer). The client requests a frame from the server (just a simple string message), and the server responds (Server -> Client) with that frame (614400 bytes for testing purposes). Frame sizes are established in advance and are fixed.
Implementation Details:
From the guarantees of the TCP protocol (Server -> Client), I know that I should be able to read the 614400 bytes from the socket and that they are in order. If either of these two things fails, the connection must have failed.
Important Code:
Assuming the socket is connected.
This code requests a frame from the server. Known as the GetFrame() function.
// Prompt the server to send a frame over
if (socket->isWritable() && !is_receiving) { // Validate that socket is ready
    is_receiving = true; // Forces only one request to go out at a time
    qDebug() << "Getting frame from socket..." << image_no;
    int written = SafeWrite((char*)"ReadyFrame"); // Writes then flushes the write buffer
    if (written == -1) {
        qDebug() << "Failed to write...";
        return temp_frame.data();
    }
    this->SocketRead();
    is_receiving = false;
}
qDebug() << image_no << "- Image Received";
image_no++;
return temp_frame.data();
This code waits for the frame just requested to be read. This is the SocketRead() function
size_t byte_pos = 0;
qint64 bytes_read = 0;
do {
    if (!socket->waitForReadyRead(500)) { // If it timed out, return existing frame
        if (!(socket->bytesAvailable() > 0)) {
            qDebug() << "Timed Out" << byte_pos;
            break;
        }
    }
    bytes_read = socket->read((char*)temp_frame.data() + byte_pos, frame_byte_size - byte_pos);
    if (bytes_read < 0) {
        qDebug() << "Reading Failed" << bytes_read << errno;
        break;
    }
    byte_pos += bytes_read;
} while (byte_pos < frame_byte_size && is_connected); // While we still have more pixels
qDebug() << "Finished Receiving Frame: " << byte_pos;
As shown in the code above, I read until the frame is fully received (where the number of bytes read is equal to the number of bytes in the frame).
The issue that I'm having is that the QTcpSocket read operation is skipping bytes in ways that are not in line with the guarantees of the TCP protocol. Since bytes are skipped, I end up never reaching the end of the while loop and just "Time Out". Why is this happening?
What I have done so far:
The data that the server sends is directly converted into uint16_t (short) integers which are used in other parts of the client. I have changed the server to simply output data that counts up, adding one for each number sent. Since the data type is uint16_t and the number of values exceeds the maximum for that integer type, the uint16_t values wrap around after 65535.
This is data visualization software, so this debugging configuration (on the client side) leads to something like this:
I have determined (and as you can see a little at the bottom of the graphic) that some bytes are being skipped. In the memory of temp_frame it is possible to see the exact point at which the memory skipped:
Under correct circumstances, this should count up sequentially.
From Wireshark, following this specific TCP connection, I have determined that all of the bytes are in fact arriving (all 614400), and that all the numbers are in order (I used a Python script to ensure counting was sequential).
This is work on an open source project so this is the whole code base for the client.
Overall, I don't see how I could be doing something wrong in this solution, all I am doing is reading from the socket in the standard way.
Caveat: This isn't a definitive answer to your problem, but some things to try (it's too large for a comment).
With (e.g.) GigE, your data rate is ~100 MB/s. With a [total] amount of kernel buffer space of 614400 bytes, this will be refilled ~175 times per second. IMO, this is still too small. When I've used SO_RCVBUF [for a commercial product], I've used a minimum of 8 MB. This allows a wider margin for task switch delays.
Try setting something huge like 100MB to eliminate this as a factor [during testing/bringup].
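As a minimal sketch of that, assuming the connected QTcpSocket from the question and a Linux client (the function name enlarge_rcvbuf and the 100 MB figure are just the bring-up values suggested above):
#include <QTcpSocket>
#include <sys/socket.h>
#include <cstdio>

void enlarge_rcvbuf(QTcpSocket *socket) {
    int fd = static_cast<int>(socket->socketDescriptor()); // native descriptor
    int rcvbuf = 100 * 1024 * 1024;                        // 100 MB for testing/bring-up
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) != 0)
        perror("setsockopt(SO_RCVBUF)");
    // Note: the kernel caps this at net.core.rmem_max, so raise that sysctl as well.
}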
First, it's important to verify that the kernel and NIC driver can handle the throughput/latency.
You may be getting too many interrupts/second and the ISR prolog/epilog overhead may be too high. The NIC driver can implement polled vs. interrupt-driven operation with NAPI for Ethernet cards.
See: https://serverfault.com/questions/241421/napi-vs-adaptive-interrupts
See: https://01.org/linux-interrupt-moderation
Your process/thread may not have a high enough priority to be scheduled quickly.
You can use the R/T scheduler with sched_setscheduler, SCHED_RR, and a priority of (e.g.) 8. Note: going higher than 11 kills the system because at 12 and above you're at a higher priority than most internal kernel threads--not a good thing.
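A minimal sketch of that, assuming Linux and sufficient privileges (root or CAP_SYS_NICE) to use the real-time scheduler:
#include <sched.h>
#include <cstdio>

void make_realtime() {
    sched_param sp {};
    sp.sched_priority = 8;                          // stay well below ~12, as noted above
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0)  // 0 = calling process
        perror("sched_setscheduler");
}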
You may need to disable IRQ balancing and set the IRQ affinity to a single CPU core.
You can then set your input process/thread locked to that core [with sched_setaffinity and/or pthread_setaffinity].
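A sketch of the thread-pinning half, assuming Linux/glibc (the core number is illustrative); the IRQ side is done separately, e.g. by stopping irqbalance and writing a CPU mask to /proc/irq/<n>/smp_affinity:
#include <pthread.h>
#include <sched.h>   // CPU_ZERO/CPU_SET are glibc extensions (g++ defines _GNU_SOURCE)
#include <cstdio>

void pin_current_thread(int core) {   // e.g. the core the NIC IRQ is routed to
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
}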
You might need some sort of "zero copy" to bypass the kernel copying from its buffers into your userspace buffers.
You can mmap the kernel socket buffers with PACKET_MMAP. See: https://sites.google.com/site/packetmmap/
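For completeness, the skeleton of a PACKET_MMAP receive ring looks roughly like this. Treat it as a sketch of the mechanism only: raw AF_PACKET sockets need CAP_NET_RAW and bypass the kernel TCP stack entirely, so this is relevant only if you move away from QTcpSocket.
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

int open_rx_ring(void **ring_out) {
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    tpacket_req req {};
    req.tp_block_size = 1 << 22;     // 4 MB per block (example sizing)
    req.tp_block_nr   = 64;
    req.tp_frame_size = 1 << 11;     // 2 KB per frame
    req.tp_frame_nr   = (req.tp_block_size / req.tp_frame_size) * req.tp_block_nr;
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    // The kernel writes packets straight into this mapping; user space walks
    // the frames by checking tp_status, with no per-packet read() copies.
    *ring_out = mmap(nullptr, (size_t)req.tp_block_size * req.tp_block_nr,
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return fd;
}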
I'd be careful about the overhead of your qDebug output. It looks like an iostream type implementation. The overhead may be significant. It could be slowing things down significantly.
That is, you're not measuring the performance of your system. You're measuring the performance of your system plus the debugging code.
When I've had to debug/trace such things, I've used a [custom] "event" log implemented with an in-memory ring queue with a fixed number of elements.
Debug calls such as:
eventadd(EVENT_TYPE_RECEIVE_START,some_event_specific_data);
Here eventadd populates a fixed-size "event" struct with the event type, event data, and a hires timestamp (e.g. struct timespec from clock_gettime(CLOCK_MONOTONIC, ...)).
The overhead of each such call is quite low. The events are just stored in the event ring. Only the last N are remembered.
At some point, your program triggers a dump of this queue to a file and terminates.
This mechanism is similar to [and modeled on] a H/W logic analyzer. It is also similar to dtrace
Here's a sample event element:
struct event {
    long long evt_tstamp;   // timestamp
    int evt_type;           // event type
    int evt_data;           // type specific data
};
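A minimal sketch of the corresponding eventadd, with illustrative names and sizes (no locking shown; a real version would need an atomic index or per-thread rings):
#include <time.h>

#define EVENT_RING_SIZE 4096                     // only the last N events are kept

static struct event event_ring[EVENT_RING_SIZE];
static int event_head = 0;

void eventadd(int type, int data) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);         // hires timestamp

    struct event *e = &event_ring[event_head];
    e->evt_tstamp = (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
    e->evt_type   = type;
    e->evt_data   = data;

    event_head = (event_head + 1) % EVENT_RING_SIZE;   // overwrite the oldest entries
}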

Busy Loop/Spinning sometimes takes too long under Windows

I'm using a Windows 7 PC to output voltages at a rate of 1 kHz. At first I simply ended the thread with sleep_until(nextStartTime); however, this has proven to be unreliable, sometimes working fine and sometimes being off by up to 10 ms.
I found other answers here saying that a busy loop might be more accurate, however mine for some reason also sometimes takes too long.
while (true) {
    doStuff(); // is quick enough
    logDelays();
    nextStartTime = chrono::high_resolution_clock::now() + chrono::milliseconds(1);
    spinStart = chrono::high_resolution_clock::now();
    while (chrono::duration_cast<chrono::microseconds>(nextStartTime -
           chrono::high_resolution_clock::now()).count() > 200) {
        spinCount++; // a volatile int
    }
    int spintime = chrono::duration_cast<chrono::microseconds>
        (chrono::high_resolution_clock::now() - spinStart).count();
    cout << "Spin Time micros :" << spintime << endl;
    if (spinCount > 100000000) {
        cout << "reset spincount" << endl;
        spinCount = 0;
    }
}
I was hoping that this would work to fix my issue, however it produces the output:
Spin Time micros :9999
Spin Time micros :9999
...
I've been stuck on this problem for the last 5 hours and I'd very thankful if somebody knows a solution.
According to the comments this code waits correctly:
auto start = std::chrono::high_resolution_clock::now();
const auto delay = std::chrono::milliseconds(1);
while (true) {
    doStuff(); // is quick enough
    logDelays();
    auto spinStart = std::chrono::high_resolution_clock::now();
    while (start > std::chrono::high_resolution_clock::now() + delay) {}
    int spintime = std::chrono::duration_cast<std::chrono::microseconds>
        (std::chrono::high_resolution_clock::now() - spinStart).count();
    std::cout << "Spin Time micros :" << spintime << std::endl;
    start += delay;
}
The important part is the busy-wait while (start > std::chrono::high_resolution_clock::now() + delay) {} together with start += delay;, which in combination make sure that delay amount of time is waited, even when outside factors (Windows Update keeping the system busy) disturb it. In case the loop takes longer than delay, the loop will be executed without waiting until it catches up (which may be never if doStuff is sufficiently slow).
Note that missing an update (due to the system being busy) and then sending 2 at once to catch up might not be the best way to handle the situation. You may want to check the current time inside doStuff and abort/restart the transmission if the timing is wrong by more than some acceptable amount.
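A sketch of that check, building on the loop above (the 5 ms tolerance is an arbitrary example):
// Inside the loop, before (or inside) doStuff():
auto now = std::chrono::high_resolution_clock::now();
if (now - start > std::chrono::milliseconds(5)) {   // tolerance is an assumption
    // The OS stalled us; resynchronise instead of firing the backlog at once,
    // and abort/restart the transmission if the device requires it.
    start = now;
}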
On Windows I don't think it's possible to ever get such precise timing, because you cannot guarantee your thread is actually running at the time you desire. Even with low CPU usage and your thread set to real-time priority, it can still be interrupted (by hardware interrupts, as I understand it; I never fully investigated, but even a simple while(true) ++i; type loop at real-time priority I've seen get interrupted and then moved between CPU cores). While such interrupts and switching are very quick for a real-time thread, they are still significant if you're trying to directly drive a signal without buffering.
Instead you really want to read and write buffers of digital samples (so at 1 kHz each sample is 1 ms). You need to be sure to queue another buffer before the last one is completed, which will constrain how small they can be, but at 1 kHz at real-time priority, if the code is simple and there is no other CPU contention, a single-sample buffer (1 ms) might even be possible, which is at worst 1 ms extra latency over "immediate", but you would have to test. You then leave it up to the hardware and its drivers to handle the precise timing (e.g. make sure each output sample is "exactly" 1 ms to the accuracy the vendor claims).
This basically means your code only has to be accurate to 1 ms in the worst case, rather than trying to pursue something far smaller than the OS really supports, such as microsecond accuracy.
As long as you are able to queue a new buffer before the hardware has used up the previous buffer, it will be able to run at the desired frequency without issue (to use audio as an example again: while the tolerated latencies are often much higher, and thus the buffers as well, if you overload the CPU you can still sometimes hear audible glitches where an application didn't queue up new raw audio in time).
With careful timing you might even be able to get down to a fraction of a millisecond by waiting as long as possible to process and queue your next sample (e.g. if you need to reduce latency between input and output), but remember that the closer you cut it, the more you risk submitting it too late.

Qt Process Events processing for longer than specified

I've hit a bit of an issue and I'm not sure what to make of it.
I'm running Qt 4.8.6, Qt creator 3.3.2, environment in Ubuntu 12.04 cross compiling to a Beaglebone Black running Debian 7 kernel 3.8.13.
The issue that I'm seeing is that this code:
if (qApp->hasPendingEvents())
{
    qDebug() << "pending events";
}
qApp->processEvents(QEventLoop::AllEvents, 10);
does not function as it should according to (at least my interpretation of) the Qt documentation. I would expect the process events loop to function for AT MOST the 10 milliseconds specified.
What happens is that the qDebug statement is never printed. I would then expect that there are no events to be processed, and that the processEvents statement goes in and out very quickly. Most of the time this is the case.
What happens (not every time, but often enough) is that the qDebug statement is skipped, and the processEvents statement executes for somewhere between 1 and 2 seconds.
Is there some way that I can dig into what is happening in the process events and find out what is causing the delay?
Qt is processing events for longer than specified in the QApplication::processEvents call on a Linux system. Is there some way that I can dig into what is happening in the process events and find out what is causing the delay?
Yes, observing Qt source code may help. The source code is in /home/myname/software/Qt/5.5/Src/qtbase/src/corelib/kernel/qeventdispatcher_unix.cpp or maybe somewhere around that:
bool QEventDispatcherUNIX::processEvents(QEventLoop::ProcessEventsFlags flags)
{
    Q_D(QEventDispatcherUNIX);
    d->interrupt.store(0);

    // we are awake, broadcast it
    emit awake();

    // This statement implies forcing events from system event queue
    // to be processed now with doSelect below
    QCoreApplicationPrivate::sendPostedEvents(0, 0, d->threadData);

    int nevents = 0;
    const bool canWait = (d->threadData->canWaitLocked()
                          && !d->interrupt.load()
                          && (flags & QEventLoop::WaitForMoreEvents));

    if (canWait)
        emit aboutToBlock();

    if (!d->interrupt.load()) {
        // return the maximum time we can wait for an event.
        timespec *tm = 0;
        timespec wait_tm = { 0l, 0l };
        if (!(flags & QEventLoop::X11ExcludeTimers)) {
            if (d->timerList.timerWait(wait_tm))
                tm = &wait_tm;
        }

        if (!canWait) {
            if (!tm)
                tm = &wait_tm;

            // no time to wait
            tm->tv_sec = 0l;
            tm->tv_nsec = 0l;
        }

        // runs actual event loop with POSIX select
        nevents = d->doSelect(flags, tm);
It seems there are system-posted events that are not accounted for by qApp->hasPendingEvents(). And then QCoreApplicationPrivate::sendPostedEvents(0, 0, d->threadData); flushes those events to be processed by d->doSelect. If I were solving this task, I would try to either flush those posted events out or figure out if and why the flags parameter has the QEventLoop::WaitForMoreEvents bit set. I usually either build Qt from source code or provide the debugger with the path to its symbols/source so it is possible to dig in there.
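As a sketch of the first option, flushing already-posted events explicitly before the bounded call (this assumes posted events are indeed what is eating the time):
// Deliver whatever has already been posted to this thread's event queue...
QCoreApplication::sendPostedEvents(0, 0);
// ...then give the remaining (system) events at most the 10 ms budget.
qApp->processEvents(QEventLoop::AllEvents, 10);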
P.S. I glanced at Qt 5.5.1 source event processing code but that should be very similar to what you deal with. Or could that implementation actually be bool QEventDispatcherGlib::processEvents(QEventLoop::ProcessEventsFlags flags)? It is easy to find on an actual system.

running a background process on arduino

I am trying to get my Arduino Mega to run a function in the background while it is also running a bunch of other functions.
The function that I am trying to run in the background is a function to determine wind speed from an anemometer. The way it processes the data is similar to that of an odometer, in that it reads the number of turns that the anemometer makes during a set time period and then takes that number of turns over the time to determine the wind speed. The longer the time period I have it run over, the more accurate the data I receive, as there is more data to average.
The problem that I have is that there is a bunch of other data that I am also reading into the Arduino which I would like to be reading in once a second. This one-second time interval is too short for me to get accurate wind readings, as not enough revolutions are completed by the anemometer to give high-accuracy wind data.
Is there a way to have the wind sensor function run in the background and update a global variable once every 5 seconds or so, while the rest of my program is running simultaneously and updating the other data every second?
Here is the code that I have for reading the data from the wind sensor. Every time the wind sensor makes a revolution there is a portion where the signal reads in as 0; otherwise the sensor reads in as an integer larger than 0.
void windmeterturns() {
    startime = millis();
    endtime = startime + 5000;
    windturncounter = 0;
    turned = false;
    unsigned long terminate = startime;   // millis() returns unsigned long
    while (terminate <= endtime) {
        terminate = millis();
        windreading = analogRead(windvelocityPin);
        if (windreading == 0) {
            if (turned == true) {
                windturncounter = windturncounter + 1;
                turned = false;
            }
        }
        else if (windreading >= 1) {
            turned = true;
        }
        delay(5);
    }
}
The rest of the processing takes place in another function, but this is the one that I am currently struggling with. Posting the whole code would not really be reasonable here, as it is close to 1000 lines.
The rest of the functions run with a 1-second delay in the loop, but as I have found through trial and error, the delay along with the processing of the other functions means the loop actually takes longer than a second, and it varies based on what kind of data I am reading from the other sensors. So I do not think a 5-loop counter for timing will work here.
Let Interrupts do the work for you.
In short, I recommend using a Timer Interrupt to generate a periodic interrupt that measures the analog reading in the background. Subsequently this can update a static volatile variable.
See my answer here, as it is a similar scenario detailing how to use the timer interrupt; you can replace the callback() with your analog read and increment logic above.
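As a rough sketch of that idea, assuming the TimerOne library is available (the pin and tick counts are illustrative): a 5 ms timer interrupt samples the anemometer in the background and publishes a turn count every 5 seconds, leaving loop() free for the once-per-second sensors.
#include <TimerOne.h>

const int windvelocityPin = A0;            // assumption: use your actual pin
volatile unsigned int windturncounter = 0;
volatile unsigned int turnsLastPeriod = 0; // read from loop(); copy it with
                                           // noInterrupts()/interrupts() for a tear-free read
volatile bool turned = false;

void sampleWind() {                        // runs every 5 ms in interrupt context
  static unsigned int ticks = 0;
  int windreading = analogRead(windvelocityPin);  // ~100 us, acceptable at this rate
  if (windreading == 0) {
    if (turned) { windturncounter++; turned = false; }
  } else {
    turned = true;
  }
  if (++ticks >= 1000) {                   // 1000 * 5 ms = 5 s
    turnsLastPeriod = windturncounter;     // snapshot for the main loop
    windturncounter = 0;
    ticks = 0;
  }
}

void setup() {
  Timer1.initialize(5000);                 // timer period in microseconds
  Timer1.attachInterrupt(sampleWind);
}

void loop() {
  // read the other sensors once a second as before;
  // compute wind speed from turnsLastPeriod whenever it is needed
}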
Without seeing how the rest of your code is set up, I would try having windturncounter as a global variable, and add another integer that is incremented each time your main program loops (i.e. once per second). Then:
// in the main loop
if (iteratorVariable >= 5) {
    iteratorVariable = 0;
    // take your windreading and implement logic here
} else {
    iteratorVariable++;
}
I'm not sure how your anemometer stores data or what other challenges you might be facing, so this may not be a 100% solution, but it would allow you to run the logic from your original post every five seconds.

Make select based loop as responsive as possible

This thread will be very responsive to network activity but can be guaranteed to process the message queue only as often as 100 times a second. I can keep reducing the timeout but after a certain point I will be busy-waiting and chewing up CPU. Is it true that this solution is about as good as I'll get without switching to another method?
// semi pseudocode
while (1) {
    process_thread_message_queue(); // function returns near-instantly
    struct timeval t;
    t.tv_sec = 0;
    t.tv_usec = 10 * 1000; // 10ms = 0.01s
    if (select(n, &fdset, 0, 0, &t)) // see if there are incoming packets for next 1/100 sec
    {
        ... // respond with more packets or processing
    }
}
It depends on what your OS provides for you. On Windows you can wait for a thread message and a bunch of handles simultaneously using MsgWaitForMultipleObjectsEx. This solves your problem. On other OSes you should have something similar.
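As a rough Windows sketch (names are illustrative and error handling is omitted), the socket is tied to an event object so that one MsgWaitForMultipleObjectsEx call wakes on either network activity or a queued thread message:
#include <winsock2.h>
#include <windows.h>

void responsive_loop(SOCKET sock) {
    WSAEVENT netEvent = WSACreateEvent();
    WSAEventSelect(sock, netEvent, FD_READ | FD_CLOSE);    // socket becomes waitable

    for (;;) {
        HANDLE handles[1] = { netEvent };
        DWORD r = MsgWaitForMultipleObjectsEx(1, handles, INFINITE,
                                              QS_ALLINPUT, 0);
        if (r == WAIT_OBJECT_0) {                          // network activity
            WSANETWORKEVENTS ev;
            WSAEnumNetworkEvents(sock, netEvent, &ev);     // also resets the event
            if (ev.lNetworkEvents & FD_READ) {
                // ... respond with more packets or processing
            }
        } else if (r == WAIT_OBJECT_0 + 1) {               // thread/window message(s) queued
            // process_thread_message_queue();
        }
    }
}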