What is the most efficient way to call a function every n seconds in C++? - c++

So I'm trying to call a function every n seconds. The code below is a simple representation of what I'm trying to achieve. I wanted to know if this is the only way to achieve it. I would love it if the "if" condition could be avoided.
#include <stdio.h>
#include <time.h>

void print_hello(int i) {
    printf("hello\n");
    printf("%d\n", i);
}

int main() {
    time_t start_t, end_t;
    double diff_t;
    time(&start_t);
    int i = 0;
    while (1) {
        time(&end_t);
        // printf("here in main");
        i = i + 1;
        diff_t = difftime(end_t, start_t);
        if (diff_t == 5) {
            // printf("Execution time = %f\n", diff_t);
            print_hello(i);
            time(&start_t);
        }
    }
    return 0;
}

The usage of time in the OP's program can be reduced to something like:
// get tStart;
// set tEnd = tStart + x;
do {
    // get t;
} while (t < tEnd);
This is what is called busy-wait.
It might be used to write code with the most precise timing, as well as in other special cases. The drawback is that the waiting consumes a full CPU core. (You might even be able to hear this – the fan noise rises.)
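For illustration, here is a minimal, self-contained sketch of such a busy-wait; the 5-second interval and the use of std::chrono::steady_clock are my own assumptions, not part of the original question.
#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono_literals;
    auto tEnd = std::chrono::steady_clock::now() + 5s;
    // Busy-wait: the loop does nothing but re-read the clock, burning a full core.
    while (std::chrono::steady_clock::now() < tEnd) {
        // spin
    }
    std::cout << "5 seconds elapsed\n";
}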
In general, however, spinning is considered an anti-pattern and should be avoided, as processor time that could be used to execute a different task is instead wasted on useless activity.
Another option is to delegate the wake-up to the system, which reduces the load on the process/thread to a minimum while waiting:
#include <chrono>
#include <iostream>
#include <thread>

void print_hello(int i)
{
    std::cout << "hello\n"
              << i << '\n';
}

int main()
{
    using namespace std::chrono_literals; // to support e.g. 5s for 5 seconds
    auto tStart = std::chrono::system_clock::now();
    for (int i = 1; i <= 3; ++i) {
        auto tEnd = tStart + 2s;
        std::this_thread::sleep_until(tEnd);
        print_hello(i);
        tStart = tEnd;
    }
}
Output:
hello
1
hello
2
hello
3
Live Demo on coliru
(I had to reduce the number of iterations and the waiting times to prevent a TLE in the online compiler.)
std::this_thread::sleep_until
Blocks the execution of the current thread until specified sleep_time has been reached.
The clock tied to sleep_time is used, which means that adjustments of the clock are taken into account. Thus, the duration of the block might, but might not, be less or more than sleep_time - Clock::now() at the time of the call, depending on the direction of the adjustment. The function also may block for longer than until after sleep_time has been reached due to scheduling or resource contention delays.
The last sentence mentions the drawback of this solution: the OS may decide to wake up the thread/process later than requested. That may happen e.g. if the OS is under high load. In the “normal” case, the latency shouldn't be more than a few milliseconds, so it might be tolerable.
Please note how tEnd and tStart are updated in the loop. The actual wake-up time is deliberately not used as the new start time, so that latencies do not accumulate.
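For contrast, here is a sketch (my own, not part of the original answer) of the drift-prone variant that the note above warns about: basing each wait on the time of the previous wake-up (here via sleep_for) adds every wake-up latency to the next period, so the schedule slowly drifts.
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    using namespace std::chrono_literals;
    for (int i = 1; i <= 3; ++i) {
        std::this_thread::sleep_for(2s); // each period = 2 s + wake-up latency
        std::cout << "hello\n" << i << '\n';
    }
}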

Related

How to call a function in a certain frequency, C++

I am a beginner to C++, trying to improve my skills by working on a project.
I am trying to have my program call a certain function 100 times a second for 30 seconds.
I thought that this would be a common, well documented problem but so far I did not manage to find a solution.
Could anyone provide me with an implementation example or point me towards one?
Notes: my program is intended to be single-threaded and to use only the standard library.
There are two reasons you couldn't find a trivial answer:
This statement "I am trying to have my program call a certain function 100 times a second for 30 seconds" is not well-defined.
Timing and scheduling is a very complicated problem.
In a practical sense, if you just want something to run approximately 100 times a second for 30 seconds, assuming the function doesn't take long to run, you can say something like:
for (int i = 0; i < 3000; i++) {
    do_something();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
This is an approximate solution.
Problems with this solution:
If do_something() takes a non-negligible amount of time, your timing will eventually be way off: each iteration lasts 10 ms plus the run time of do_something(), and that extra time accumulates over the 3000 iterations.
Most operating systems do not have very accurate sleep timing. There is no guarantee that asking to sleep for 10 milliseconds will wait for exactly 10 milliseconds. It will usually be approximately accurate.
You can use std::this_thread::sleep_until and calculate the end time of the sleep according to desired frequency:
#include <chrono>
#include <iostream>
#include <thread>
#include <type_traits>

void f()
{
    static int counter = 0;
    std::cout << counter << '\n';
    ++counter;
}

int main() {
    using namespace std::chrono_literals;
    using Clock = std::chrono::steady_clock;
    constexpr auto period = std::chrono::duration_cast<std::chrono::milliseconds>(1s) / 100; // conversion to ms needed to prevent truncation in integral division
    constexpr auto repetitions = 30s / period;
    auto const start = Clock::now();
    for (std::remove_const_t<decltype(repetitions)> i = 1; i <= repetitions; ++i)
    {
        f();
        std::this_thread::sleep_until(start + period * i);
    }
}
Note that this code will not work as intended if f() takes more than 10 ms to complete.
Note: The exact wake-up times of the sleep_until calls may be off, but because each target time is computed from the fixed start time and sleep_until measures against the current time, the errors do not accumulate.
You can't time it perfectly, but you can try like this:
using std::chrono::steady_clock;
using namespace std::this_thread;

auto running{ true };
// target period: 1/100 s, converted to the clock's duration type
auto frameTime{ std::chrono::duration_cast<steady_clock::duration>(std::chrono::duration<float>{1.0F / 100.0F}) };
auto delta{ steady_clock::duration::zero() };

while (running) {
    auto t0{ steady_clock::now() };
    // catch up: run the function once for every full period that has accumulated
    while (delta >= frameTime) {
        call_your_function(frameTime);
        delta -= frameTime;
    }
    // sleep away what is left of the current period, then add the time that
    // actually passed (including any sleep overshoot) to delta
    if (const auto dt{ delta + steady_clock::now() - t0 }; dt < frameTime) {
        sleep_for(frameTime - dt);
        delta += steady_clock::now() - t0;
    }
    else {
        delta += dt;
    }
}

Best way to implement a high resolution timer

What is the best way in C++11 to implement a high-resolution timer that continuously checks for time in a loop, and executes some code after it passes a certain point in time? e.g. check what time it is in a loop from 9am onwards and execute some code exactly at 11am. I require the timing to be precise (i.e. no more than 1 microsecond after 9am).
I will be implementing this program on Linux CentOS 7.3, and have no issues with dedicating CPU resources to execute this task.
Instead of implementing this manually, you could use e.g. a systemd.timer. Make sure to specify the desired accuracy which can apparently be as precise as 1us.
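A hedged sketch of what such a timer unit could look like; the unit names, the path, and the calendar expression are placeholders of my own, not taken from the question.
# mytask.timer
[Timer]
OnCalendar=*-*-* 11:00:00
AccuracySec=1us
Unit=mytask.service

[Install]
WantedBy=timers.target

# mytask.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/mytask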
a high-resolution timer that continuously checks for time in a loop,
First of all, you do not want to continuously check the time in a loop; that's extremely inefficient and simply unnecessary.
...executes some code after it passes a certain point in time?
Ok so you want to run some code at a given time in the future, as accurately as possible.
The simplest way is to simply start a background thread, compute how long until the target time (in the desired resolution) and then put the thread to sleep for that time period. When your thread wakes up, it executes the actual task. This should be accurate enough for the vast majority of needs.
The std::chrono library provides calls which make this easy:
System clock in std::chrono
High resolution clock in std::chrono
Here's a snippet of code which does what you want using the system clock (which makes it easier to set a wall clock time):
// c++ --std=c++11 ans.cpp -o ans
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <thread>

// do some busy work
int work(int count)
{
    int sum = 0;
    for (unsigned i = 0; i < count; i++)
    {
        sum += i;
    }
    return sum;
}

std::chrono::system_clock::time_point make_scheduled_time(int yyyy, int mm, int dd, int HH, int MM, int SS)
{
    tm datetime = tm{};
    datetime.tm_year = yyyy - 1900; // Year since 1900
    datetime.tm_mon = mm - 1;       // Month since January
    datetime.tm_mday = dd;          // Day of the month [1-31]
    datetime.tm_hour = HH;          // Hour of the day [00-23]
    datetime.tm_min = MM;
    datetime.tm_sec = SS;
    time_t ttime_t = mktime(&datetime);
    std::chrono::system_clock::time_point scheduled = std::chrono::system_clock::from_time_t(ttime_t);
    return scheduled;
}

void do_work_at_scheduled_time()
{
    using period = std::chrono::system_clock::period;
    auto sched_start = make_scheduled_time(2019, 9, 17, // date
                                           00, 14, 00); // time
    // Wait until the scheduled time to actually do the work
    std::this_thread::sleep_until(sched_start);
    // Figure out how close to the scheduled time we actually awoke
    auto actual_start = std::chrono::system_clock::now();
    auto start_delta = actual_start - sched_start;
    float delta_ms = float(start_delta.count()) * period::num / period::den * 1e3f;
    std::cout << "worker: awoken within " << delta_ms << " ms" << std::endl;
    // Now do some actual work!
    int sum = work(12345);
}

int main()
{
    std::thread worker(do_work_at_scheduled_time);
    worker.join();
    return 0;
}
On my laptop, the typical latency is about 2-3ms. If you use the high_resolution_clock you should be able to get even better results.
There are other APIs you could use too, such as Boost, where you could use ASIO to implement a high-resolution timeout.
I require the timing to be precise (i.e. no more than 1 microsecond after 9am).
Do you really need it to be accurate to the microsecond? Consider that at this resolution, you will also need to take into account all sorts of other factors, including system load, latency, clock jitter, and so on. Your code can start to execute at close to that time, but that's only part of the problem.
My suggestion would be to use timer_create(). This allows you to get notified by a signal at a given time. You can then implement your action in the signal handler.
In any case you should be aware that the accuracy of course depends on the system clock accuracy.
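A minimal sketch of the timer_create() approach (my own, not from the answer): it is Linux-specific, the 2-second period and signal choice are assumptions, and older glibc may require linking with -lrt.
#include <signal.h>
#include <time.h>
#include <unistd.h>

static void on_timer(int /*signum*/)
{
    // Keep the handler minimal and async-signal-safe.
    const char msg[] = "timer fired\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}

int main()
{
    struct sigaction sa{};
    sa.sa_handler = on_timer;
    sigaction(SIGRTMIN, &sa, nullptr);

    struct sigevent sev{};
    sev.sigev_notify = SIGEV_SIGNAL;  // deliver a signal on expiration
    sev.sigev_signo = SIGRTMIN;

    timer_t timer_id;
    timer_create(CLOCK_MONOTONIC, &sev, &timer_id);

    struct itimerspec spec{};
    spec.it_value.tv_sec = 2;    // first expiration after 2 s
    spec.it_interval.tv_sec = 2; // then every 2 s
    timer_settime(timer_id, 0, &spec, nullptr);

    for (int i = 0; i < 3; ++i)
        pause();                 // wait for the next signal

    timer_delete(timer_id);
}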

c++ while loop timer varying wildly in accuracy

I am trying to use a while loop to create a timer that consistently measures out 3000μs (3 ms) and while it works most of the time, other times the timer can be late by as much as 500μs. Why does this happen and is there a more precise way to make a timer like this?
#include <chrono>
#include <iostream>
#include <string>
using namespace std;

int getTime() {
    chrono::microseconds μs = chrono::duration_cast<chrono::microseconds>(
        chrono::system_clock::now().time_since_epoch() // Get time since last epoch in μs
    );
    return μs.count(); // Return as integer
}

int main()
{
    int target = 3000, difference = 0;
    while (true) {
        int start = getTime(), time = start;
        while ((time - start) < target) {
            time = getTime();
        }
        difference = time - start;
        if (difference - target > 1) { // If the timer wasn't accurate to within 1μs
            cout << "Timer missed the mark by " + to_string(difference - target) + " microseconds" << endl; // Log the delay
        }
    }
    return 0;
}
I would expect this code to log delays that are consistently within 5 or so μs, but the console output looks like this.
Edit to clarify: I'm running on Windows 10 Enterprise Build 16299, but the behavior persists on a Debian virtual machine.
You need to also take into account other running processes. The operating system is likely preempting your process to give CPU time to those other processes/threads, and will non-deterministically return control to your process/thread running this timer.
Granted, this is not 100% true when we consider real-time operating systems or flat-schedulers. But this is likely the case in your code if you're running on a general purpose machine.
Since you are running on Windows, the OS is responsible for keeping time (synchronized via NTP), as C++ has no built-in functions for this. Check out the Windows API SetTimer() function: http://msdn.microsoft.com/en-us/library/ms644906(v=vs.85).aspx.
If you want the best and most high-resolution clock through C++, check out the chrono library:
#include <iostream>
#include <chrono>
#include "chrono_io" // non-standard header that provides operator<< for durations

int main()
{
    typedef std::chrono::high_resolution_clock Clock;
    auto t1 = Clock::now();
    auto t2 = Clock::now();
    std::cout << t2 - t1 << '\n';
    // without "chrono_io", print the tick count instead, e.g.
    // std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count() << " ns\n";
}

std::thread does not start immediately as expected (c++11)

I have the following code in my main.cpp
std::thread t1(&AgentsSourcesManager::Run, &sim.GetAgentSrcManager());
doSomething(); // in the main Thread
t1.join();
I was expecting t1 to start immediately and run alongside the main thread.
However, this is not the case. I measure the execution time of my program, repeat this 100 times and make some plots.
See the peak in the following picture.
Now if I wait a bit after the creation of t1
std::this_thread::sleep_for(std::chrono::milliseconds(100));
I get better results. See the following picture.
(Still with a peak there, but well ..)
Obviously my questions are:
Why a peak?
Why don't I have a straight line?
EDIT
Ok, from the comments I understand by now, that there might be some scheduler magic going on.
Here is a working example
#include <thread>
#include <chrono>
#include <iostream>
#include <pthread.h>
#include <functional>

int main() {
    float x = 0; float y = 0;
    std::chrono::time_point<std::chrono::system_clock> start, stop;
    start = std::chrono::system_clock::now();
    auto Thread = std::thread([]() { std::cout << "Executing thread" << std::endl; });
    stop = std::chrono::system_clock::now();
    for (int i = 0; i < 10000; i++)
        y += x * x * x * x * x;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    Thread.join();
    std::chrono::duration<double> elapsed_time = stop - start;
    std::cout << "Taken time: " << std::to_string(elapsed_time.count()) << " " << std::endl;
    return 0;
}
Compiling:
g++-7 -lpthread threads.cpp -o out2.out
For the analysis I use this code:
import subprocess
import matplotlib.pyplot as plt
import numpy as np

RUNS = 1000
factor = 1000
times = []
for i in range(RUNS):
    p = subprocess.run(["./out2.out"], stdout=subprocess.PIPE)
    line = p.stdout
    times.append(float(line.split()[-1]))
    print(i, RUNS)

times = np.array(times) * factor
plt.plot(times, "-")
plt.ylabel("time * %d" % factor)
plt.xlabel("#runs")
plt.title("mean %.3f (+- %.3f), min = %.3f, max = %.3f" %
          (np.mean(times), np.std(times), np.min(times), np.max(times)))
plt.savefig("log2.png")
Result
I think I should rather ask: how can I reduce this latency and tell my OS that this thread is really important to me and should have a higher priority?
You are not measuring what you think you are measuring here:
start= std::chrono::system_clock::now();
auto Thread = std::thread([](){std::cout<<"Excuting thread"<<std::endl;});
stop = std::chrono::system_clock::now();
The stop timestamp only gives you an upper bound on how long it takes main to spawn that thread and it actually tells you nothing about when that thread will start doing any actual work (for that you would need to take a timestamp inside the thread).
Also, system_clock is not the best clock for such measurements on most platforms, you should use steady_clock by default and resort to high_resolution_clock if that one doesn't give you enough precision (but note that you will have to deal with the non-monotonic nature of that clock by yourself then, which can easily mess up the gained precision for you).
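To make both points concrete, here is a small sketch of my own (the variable names and the use of steady_clock are assumptions): it takes the second timestamp inside the thread, so it measures when the thread actually begins running rather than how long the constructor took.
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    using Clock = std::chrono::steady_clock;
    Clock::time_point started_running;

    auto before_spawn = Clock::now();
    std::thread t([&] { started_running = Clock::now(); }); // timestamp taken inside the thread
    t.join();                                               // join synchronizes, so the read below is safe

    auto latency = std::chrono::duration_cast<std::chrono::microseconds>(
        started_running - before_spawn);
    std::cout << "thread began running after " << latency.count() << " us\n";
}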
As was mentioned already in the comments, spawning a new thread (and thus also constructing a new std::thread) is a very complex and time-consuming operation. If you need high responsiveness, what you want to do is spawn a couple of threads during startup of your program and then have them wait on a std::condition_variable that will get signalled as soon as work for them becomes available. That way you can be sure that on an otherwise idle system a thread will start processing the work that was assigned to him very quickly (immediately is not possible on most systems due to how the operating system schedules threads, but the delay should be well under a millisecond).
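A minimal sketch of that pre-spawned-worker idea, assuming a single worker and a one-job-at-a-time handoff (this is not a full thread pool, and it is not code from the answer):
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>

int main()
{
    std::mutex m;
    std::condition_variable cv;
    std::function<void()> job; // empty = no work yet
    bool done = false;

    // The worker is started ahead of time and sleeps until work arrives.
    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [&] { return job || done; });
            if (job) {
                auto task = std::move(job);
                job = nullptr;
                lock.unlock();
                task();        // run the work outside the lock
                lock.lock();
            }
            if (done) break;
        }
    });

    // Later, when work becomes available, the handoff is cheap:
    {
        std::lock_guard<std::mutex> lock(m);
        job = [] { std::cout << "working\n"; };
    }
    cv.notify_one();

    // Shut the worker down.
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    worker.join();
}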

Inconsistent timings when passing data between two threads

I have a piece of code that I use to test various containers (e.g. deque and a circular buffer) when passing data from a producer (thread 1) to a consumer (thread 2). A data item is represented by a struct with a pair of timestamps. The first timestamp is taken before the push in the producer, and the second one is taken when the data is popped by the consumer.
The container is protected with a pthread spinlock.
The machine runs redhat 5.5 with 2.6.18 kernel (old!), it is a 4-core system with hyperthreading disabled. gcc 4.7 with -std=c++11 flag was used in all tests.
The producer acquires the lock, timestamps the data, pushes it into the queue, unlocks, and sleeps in a busy loop for 2 microseconds (the only reliable way I found to sleep for precisely 2 microseconds on that system).
The consumer locks, pops the data, timestamps it and updates some statistics (a running mean delay and standard deviation). The stats are printed every 5 seconds (M is the mean, M2 is the std dev) and then reset. I used gettimeofday() to obtain the timestamps, which means that the mean delay number can be thought of as the percentage of delays that exceed 1 microsecond.
Most of the time the output looks like this:
CNT=2500000 M=0.00935 M2=0.910238
CNT=2500000 M=0.0204112 M2=1.57601
CNT=2500000 M=0.0045016 M2=0.372065
but sometimes (probably 1 trial out of 20) like this:
CNT=2500000 M=0.523413 M2=4.83898
CNT=2500000 M=0.558525 M2=4.98872
CNT=2500000 M=0.581157 M2=5.05889
(note the mean number is much worse than in the first case, and it never recovers as the program runs).
I would appreciate thoughts on why this could happen. Thanks.
#include <iostream>
#include <string.h>
#include <stdexcept>
#include <sys/time.h>
#include <deque>
#include <thread>
#include <cstdint>
#include <cstdlib>     // exit()
#include <cmath>
#include <unistd.h>
#include <pthread.h>   // pthread_spinlock_t
#include <xmmintrin.h> // _mm_pause()

int64_t timestamp() {
    struct timeval tv;
    gettimeofday(&tv, 0);
    return 1000000L * tv.tv_sec + tv.tv_usec;
}

// running mean and a second moment
struct StatsM2 {
    StatsM2() {}
    double m = 0;
    double m2 = 0;
    long count = 0;
    inline void update(long x, long c) {
        count = c;
        double delta = x - m;
        m += delta / count;
        m2 += delta * (x - m);
    }
    inline void reset() {
        m = m2 = 0;
        count = 0;
    }
    inline double getM2() { // running second moment
        return (count > 1) ? m2 / (count - 1) : 0.;
    }
    inline double getDeviation() {
        return std::sqrt(getM2());
    }
    inline double getM() { // running mean
        return m;
    }
};

// pause for usec microseconds using busy loop
int64_t busyloop_microsec_sleep(unsigned long usec) {
    int64_t t, tend;
    tend = t = timestamp();
    tend += usec;
    while (t < tend) {
        t = timestamp();
    }
    return t;
}

struct Data {
    Data() : time_produced(timestamp()) {}
    int64_t time_produced;
    int64_t time_consumed;
};

int64_t sleep_interval = 2;
StatsM2 statsm2;
std::deque<Data> queue;
bool producer_running = true;
bool consumer_running = true;
pthread_spinlock_t spin;

void producer() {
    producer_running = true;
    while (producer_running) {
        pthread_spin_lock(&spin);
        queue.push_back(Data());
        pthread_spin_unlock(&spin);
        busyloop_microsec_sleep(sleep_interval);
    }
}

void consumer() {
    int64_t count = 0;
    int64_t print_at = 1000000 / sleep_interval * 5;
    Data data;
    consumer_running = true;
    while (consumer_running) {
        pthread_spin_lock(&spin);
        if (queue.empty()) {
            pthread_spin_unlock(&spin);
            // _mm_pause();
            continue;
        }
        data = queue.front();
        queue.pop_front();
        pthread_spin_unlock(&spin);
        ++count;
        data.time_consumed = timestamp();
        statsm2.update(data.time_consumed - data.time_produced, count);
        if (count >= print_at) {
            std::cerr << "CNT=" << count << " M=" << statsm2.getM() << " M2=" << statsm2.getDeviation() << "\n";
            statsm2.reset();
            count = 0;
        }
    }
}

int main(void) {
    if (pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE) < 0)
        exit(2);
    std::thread consumer_thread(consumer);
    std::thread producer_thread(producer);
    sleep(40);
    consumer_running = false;
    producer_running = false;
    consumer_thread.join();
    producer_thread.join();
    return 0;
}
EDIT:
I believe that point 5 below is the only thing that can explain a 1/2 second latency. When on the same core, each thread would run for a long time and only then switch to the other.
The rest of the things on the list are too small to cause a 1/2 second delay.
You can use pthread_setaffinity_np to pin your threads to specific cores. You can try different combinations and see how performance changes.
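A hedged sketch of how that could look for a std::thread (Linux/glibc only, non-portable; the core numbers and the helper name are placeholders of my own):
#include <pthread.h>
#include <sched.h>
#include <chrono>
#include <iostream>
#include <thread>

// Pin an already-running std::thread to a single core.
static void pin_to_core(std::thread& t, int core)
{
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core, &cpuset);
    if (pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &cpuset) != 0)
        std::cerr << "failed to set affinity for core " << core << '\n';
}

int main()
{
    std::thread worker([] {
        // ... the producer or consumer loop would run here ...
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
    pin_to_core(worker, 1); // e.g. pin the consumer to core 1 and the producer to core 0
    worker.join();
}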
EDIT #2:
More things you should take care of: (who said testing was simple...)
1. Make sure the consumer is already running when the producer starts producing. Not too important in your case as the producer is not really producing in a tight loop.
2. This is very important: you divide by count every time, which is not the right thing to do for your stats. This means that the first measurement in every stats window weighs a lot more than the last. To measure the median you have to collect all the values. Measuring the average and min/max, without collecting all the numbers, should give you a good enough picture of the latency.
It's not surprising, really.
1. The time is taken in Data(), but then the container spends time calling malloc.
2. Are you running 64 bit or 32? In 32 bit, gettimeofday is a system call, while in 64 bit it goes through the vDSO and doesn't enter the kernel... you may want to time gettimeofday itself and record the variance. Or roll your own using rdtsc.
The best would be to use cycles instead of microseconds, because microseconds are really too coarse for this scenario... the rounding to microseconds alone skews the numbers considerably at such a small scale.
3. Are you guaranteed to not get preempted between producer and consumer? I guess not. But this should not happen very frequently on a box dedicated to testing...
4. Is it 4 cores on a single socket or 2? if it's a 2 socket box, you want to have the 2 threads on the same socket, or you pay (at least) double for data transfer.
5. Make sure the threads are not running on the same core.
6. If the Data you transfer and the additional data (container node) are sharing cache lines (kind of likely) with other Data+node, the producer would be delayed by the consumer when it writes to the consumed timestamp. This is called false sharing. You can eliminate this by padding/aligning to 64 bytes and using an intrusive container.
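As a hedged illustration of point 6 (the 64-byte cache-line size is an assumption, and the struct simply mirrors the Data type used above):
#include <cstdint>

// Each element occupies its own cache line, so the consumer writing
// time_consumed cannot false-share with an element the producer is writing.
struct alignas(64) PaddedData {
    int64_t time_produced;
    int64_t time_consumed;
    // alignas(64) makes sizeof(PaddedData) a multiple of 64, so adjacent
    // elements never share a cache line.
};

static_assert(sizeof(PaddedData) % 64 == 0, "expected cache-line padding");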
gettimeofday is not a good way to profile computation overhead. It is a wall clock, and your computer is multiprocessing. Even if you think you are not running anything else, the OS scheduler always has some other activities to keep the system running. To profile your process overhead, you have to at least raise the priority of the process you are profiling. Also, use a high-resolution timer or CPU ticks to do the timing measurement.
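A hedged sketch of raising the profiled process to a real-time priority on Linux (SCHED_FIFO and the priority value 50 are my own assumptions; this needs root or CAP_SYS_NICE):
#include <sched.h>
#include <cstdio>

int main()
{
    sched_param sp{};
    sp.sched_priority = 50; // mid-range SCHED_FIFO priority (valid range is typically 1-99)
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) // 0 = the calling process
        std::perror("sched_setscheduler");
    // ... run the timing-sensitive measurement here ...
}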