C++ performance degradation (or must the code be improved?)

I am developing a simple program that copies the same string into another one in a loop. I use Visual Studio C++ 2019 Community Edition, and the project type is "Command line".
If I run it for 3.42 seconds, the calculated number of copies per second is 130,601,397, but if I run it for 77.97 seconds, the number of copies per second is 47,469,296...
The longer the program runs, the worse the performance gets...
Here is the code:
#include <iostream>
#include <cstdlib>
#include <signal.h>
#include <chrono>
#include <string>
using namespace std;
unsigned long repeats_counter = 0;
std::chrono::steady_clock::time_point t1;
std::chrono::steady_clock::time_point t2;
// When CTRL+C (SIGINT), this is executed
signal_callback_handler(int signum) {
    t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = t2 - t1;
    std::cout << "Total execution time " << diff.count() << " s\n";
    unsigned long average_repeats_per_sec = (unsigned long)(repeats_counter / diff.count());
    std::cout << "Number of average repeats per second was " <<
        std::to_string(average_repeats_per_sec) << "\n";
    std::cout << "Number of average repeats per minute was " <<
        std::to_string(average_repeats_per_sec * 60) << "\n";
    cout << "Number of effective repeats = " << repeats_counter << endl;
    // Terminate program
    exit(signum);
}
int main()
{
    signal(SIGINT, signal_callback_handler);
    signal(SIGTERM, signal_callback_handler);
    std::string from_str, to_str;
    cout << "Start copying. CTRL+C to stop." << endl;
    t1 = std::chrono::high_resolution_clock::now();
    from_str = "the string to be copied";
    while (true) {
        to_str = from_str;
        repeats_counter++;
    }
    return 0;
}

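This could be caused by integer overflow. unsigned long is only guaranteed to be at least 32 bits, and on some platforms (including Windows, which you are targeting with Visual Studio) it is the same 32-bit width as unsigned int. At roughly 130 million increments per second, a 32-bit counter passes its maximum of 4,294,967,295 after about 33 seconds, so the 3.42-second run cannot be affected, while the 77.97-second run may well have wrapped at least once. unsigned long long largely alleviates the issue, but technically the loop should have some kind of defense against overflow, although that adds to the cost of the loop.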
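There are also two portability problems that your compiler lets through only because of how it happens to be implemented:
t1 and t2 are declared as std::chrono::steady_clock::time_point but assigned from high_resolution_clock::now(), so they should be std::chrono::high_resolution_clock::time_point (this compiles with MSVC only because its high_resolution_clock is an alias for steady_clock)
signal_callback_handler should have return type void, since signal() expects a handler of type void(int)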

Related

Parallel version of the `std::generate` performs worse than the sequential one

I'm trying to parallelize some old code using the execution policies from C++17. My sample code is below:
#include <cstdlib>
#include <chrono>
#include <iostream>
#include <algorithm>
#include <execution>
#include <vector>
using Clock = std::chrono::high_resolution_clock;
using Duration = std::chrono::duration<double>;
constexpr auto NUM = 100'000'000U;
double func()
{
    return rand();
}
int main()
{
    std::vector<double> v(NUM);
    // ------ feature testing
    std::cout << "__cpp_lib_execution : " << __cpp_lib_execution << std::endl;
    std::cout << "__cpp_lib_parallel_algorithm: " << __cpp_lib_parallel_algorithm << std::endl;
    // ------ fill the vector with random numbers sequentially
    auto const startTime1 = Clock::now();
    std::generate(std::execution::seq, v.begin(), v.end(), func);
    Duration const elapsed1 = Clock::now() - startTime1;
    std::cout << "std::execution::seq: " << elapsed1.count() << " sec." << std::endl;
    // ------ fill the vector with random numbers in parallel
    auto const startTime2 = Clock::now();
    std::generate(std::execution::par, v.begin(), v.end(), func);
    Duration const elapsed2 = Clock::now() - startTime2;
    std::cout << "std::execution::par: " << elapsed2.count() << " sec." << std::endl;
}
The program output on my Linux desktop:
__cpp_lib_execution : 201902
__cpp_lib_parallel_algorithm: 201603
std::execution::seq: 0.971162 sec.
std::execution::par: 25.0349 sec.
Why does the parallel version perform 25 times worse than the sequential one?
Compiler: g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
The thread-safety of rand is implementation-defined, which means one of two things:
Your code is wrong in the parallel case, or
rand is effectively serial, protected by a highly contended lock, which dramatically increases the overhead in the parallel case and produces incredibly poor performance.
Based on your results, I'm guessing the second applies, but it could be both.
Either way, the answer is: rand is a terrible test case for parallelism.

Broken timer or something else

I have a program that is supposed to help me, but I can't get the timing right.
Here is the code:
#include <iostream>
using namespace std;
#include <time.h>
#include <Windows.h>
double diffclock(clock_t clock1) {
    clock_t clock2 = clock();
    double diffticks = clock2 - clock1;  // later minus earlier, so the result is positive
    double diffms = diffticks / (CLOCKS_PER_SEC / 1000);
    return diffms;
}
int main()
{
    int wait = 134;
    clock_t fullbetween = clock();
    for (int i = 0; i < 5; i++) {
        Sleep(wait / 5);
        cout << wait / 5 << " ";
    }
    cout << endl << "finish in " << diffclock(fullbetween) << " ms" << endl;
    return 0;
}
A std::chrono version gives the same result:
#include <iostream>
#include <chrono>
#include <ctime>
#include <thread>
#include <Windows.h>
int main()
{
    int wait = 134;
    auto start = std::chrono::system_clock::now();
    for (int i = 0; i < 5; i++) {
        std::this_thread::sleep_for(std::chrono::milliseconds(wait/5));
    }
    auto end = std::chrono::system_clock::now();
    auto int_ms = std::chrono::duration_cast<std::chrono::milliseconds> (end - start);
    std::cout << std::endl << "finish in " << int_ms.count() << " ms" << std::endl;
    return 0;
}
134 / 5 = 26 (integer division), which is fine. But the last cout shows that all the iterations together took about 170 ms, not the expected 5 × 26 = 130 ms. Why is this happening?
Sorry about my English.
The documentation for the Sleep function at https://learn.microsoft.com/en-gb/windows/win32/api/synchapi/nf-synchapi-sleep says
Suspends the execution of the current thread until the time-out interval elapses.
The system clock "ticks" at a constant rate. If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on.
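Ticks are typically 15.6 ms on Windows systems (64 ticks per second, 1000 / 64 = 15.625 ms). A 26 ms request lies between one and two ticks, and in practice it becomes two ticks, about 31.2 ms.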
This is the time after which it is possible for the suspended thread to become active again, there is no guarantee that it will start executing immediately. So your five sleeps become 156ms plus a little overhead.
The documentation continues with mitigations for this behaviour, and warnings that the mitigations will affect system power usage and so on.
To increase the accuracy of the sleep interval, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum.
The std::this_thread::sleep_for documentation states that the function "blocks the execution of the current thread for at least the specified sleep_duration. It may block for longer than sleep_duration due to scheduling or resource contention delays."
So your code will take at least 5 × 26 = 130 ms to execute, and it may take noticeably longer.

Creating a time_point at seconds::max results in a negative value

I'm trying to use a time_point to effectively represent forever by setting it to seconds::max which, I believe, should represent that much time since epoch. When doing this, though, I get -1 as the time since epoch in the resulting time_point. What am I not understanding?
#include <iostream>
#include <chrono>
using namespace std;
using namespace std::chrono;
int main() {
    auto tp1 = system_clock::time_point( seconds::zero() );
    auto tp2 = system_clock::time_point( seconds::max() );
    cout << "tp1: " << duration_cast<seconds>(tp1.time_since_epoch()).count() << endl;
    cout << "tp2: " << duration_cast<seconds>(tp2.time_since_epoch()).count() << endl;
    return 0;
}
The output running that is:
tp1: 0
tp2: -1
Here's a quick-and-dirty little program to explore the limits of system_clock time_points at different precisions:
#include <chrono>
#include <iostream>
using days = std::chrono::duration
    <int, std::ratio_multiply<std::ratio<24>, std::chrono::hours::period>>;
using years = std::chrono::duration
    <double, std::ratio_multiply<std::ratio<146097, 400>, days::period>>;
template <class Rep, class Period>
void
max_limit(std::chrono::duration<Rep, Period> d)
{
    std::cout << "[" << Period::num << '/' << Period::den << "] ";
    std::cout << years{d.max()}.count() + 1970 << '\n';
}
int
main()
{
    using namespace std;
    using namespace std::chrono;
    max_limit(nanoseconds{});
    max_limit(microseconds{});
    max_limit(milliseconds{});
    max_limit(seconds{});
}
This will output the year (in floating point) that time_point<system_clock, D> will max out at for any duration D. This program outputs:
[1/1000000000] 2262.28
[1/1000000] 294247
[1/1000] 2.92279e+08
[1/1] 2.92277e+11
Meaning system_clock based on nanoseconds overflows in the year 2262. If you coarsen that to microseconds, you overflow in the year 294,247. And so on.
Once you coarsen to seconds, the max goes out to a ridiculous range. But when you convert that back to system_clock::time_point, whose duration is at least as fine as microseconds and perhaps as fine as nanoseconds (depending on your platform), multiplying seconds::max() by 1,000,000 or 1,000,000,000 overflows the underlying integer, and you get a garbage value such as the -1 you are seeing.
To solve your problem I recommend:
auto M = system_clock::time_point::max();
Adding a few more diagnostics shows the issue (on my system):
#include <iostream>
#include <chrono>
using namespace std;
using namespace std::chrono;
int main() {
    auto tp1 = system_clock::time_point( seconds::zero() );
    auto tp2 = system_clock::time_point( seconds::max() );
    using type = decltype(system_clock::time_point(seconds::zero()));
    cout << type::duration::max().count() << endl;
    cout << type::duration::period::den << endl;
    cout << type::duration::period::num << endl;
    cout << seconds::max().count() << endl;
    cout << milliseconds::max().count() << endl;
    cout << "tp1: " << duration_cast<seconds>(tp1.time_since_epoch()).count() << endl;
    cout << "tp2: " << duration_cast<seconds>(tp2.time_since_epoch()).count() << endl;
    return 0;
}
For me, the denominator of system_clock's time_point period is 1,000,000 (microseconds). Thus seconds::max() overflows when it is converted up to that finer unit.

Proper method of using std::chrono

While I realize this is probably one of many identical questions, I can't seem to figure out how to properly use std::chrono. This is the solution I cobbled together.
#include <stdlib.h>
#include <iostream>
#include <chrono>
typedef std::chrono::high_resolution_clock Time;
typedef std::chrono::milliseconds ms;
float startTime;
float getCurrentTime();
int main () {
    startTime = getCurrentTime();
    std::cout << "Start Time: " << startTime << "\n";
    while(true) {
        std::cout << getCurrentTime() - startTime << "\n";
    }
    return EXIT_SUCCESS;
}
float getCurrentTime() {
    auto now = Time::now();
    return std::chrono::duration_cast<ms>(now.time_since_epoch()).count() / 1000;
}
For some reason, this only ever returns integer values as the difference, which increments upwards at a rate of 1 per second, but starting from an arbitrary, often negative, value.
What am I doing wrong? Is there a better way of doing this?
Don't escape the chrono type system until you absolutely have to. That means don't use .count() except for I/O or interacting with legacy APIs.
This translates to: don't use float as a time_point.
Don't bother with high_resolution_clock. In the major standard library implementations it is just an alias for either system_clock or steady_clock, and which one varies. For more portable code, choose one of the latter.
#include <iostream>
#include <chrono>
using Time = std::chrono::steady_clock;
using ms = std::chrono::milliseconds;
To start, you're going to need a duration with a representation of float and the units of seconds. This is how you do that:
using float_sec = std::chrono::duration<float>;
Next you need a time_point which uses Time as the clock, and float_sec as its duration:
using float_time_point = std::chrono::time_point<Time, float_sec>;
Now your getCurrentTime() can just return Time::now(). No fuss, no muss:
float_time_point
getCurrentTime() {
    return Time::now();
}
Your main, because it has to do the I/O, is responsible for unpacking the chrono types into scalars so that it can print them:
int main () {
    auto startTime = getCurrentTime();
    std::cout << "Start Time: " << startTime.time_since_epoch().count() << "\n";
    while(true) {
        std::cout << (getCurrentTime() - startTime).count() << "\n";
    }
}
This program does a similar thing. Hopefully it shows some of the capabilities (and methodology) of std::chrono:
#include <iostream>
#include <chrono>
#include <thread>
int main()
{
    using namespace std::literals;
    namespace chrono = std::chrono;
    using clock_type = chrono::high_resolution_clock;
    auto start = clock_type::now();
    for(;;) {
        auto first = clock_type::now();
        // note use of literal - this is c++14
        std::this_thread::sleep_for(500ms);
        // c++11 would be this:
        // std::this_thread::sleep_for(chrono::milliseconds(500));
        auto last = clock_type::now();
        auto interval = last - first;
        auto total = last - start;
        // integer cast
        std::cout << "we just slept for " << chrono::duration_cast<chrono::milliseconds>(interval).count() << "ms\n";
        // another integer cast
        std::cout << "also known as " << chrono::duration_cast<chrono::nanoseconds>(interval).count() << "ns\n";
        // floating point cast
        using seconds_fp = chrono::duration<double, chrono::seconds::period>;
        std::cout << "which is " << chrono::duration_cast<seconds_fp>(interval).count() << " seconds\n";
        std::cout << " total time wasted: " << chrono::duration_cast<chrono::milliseconds>(total).count() << "ms\n";
        std::cout << " in seconds: " << chrono::duration_cast<seconds_fp>(total).count() << "s\n";
        std::cout << std::endl;
    }
    return 0;
}
Example output:
we just slept for 503ms
also known as 503144616ns
which is 0.503145 seconds
total time wasted: 503ms
in seconds: 0.503145s
we just slept for 500ms
also known as 500799185ns
which is 0.500799 seconds
total time wasted: 1004ms
in seconds: 1.00405s
we just slept for 505ms
also known as 505114589ns
which is 0.505115 seconds
total time wasted: 1509ms
in seconds: 1.50923s
we just slept for 502ms
also known as 502478275ns
which is 0.502478 seconds
total time wasted: 2011ms
in seconds: 2.01183s

How to use clock() in C++

How do I call clock() in C++?
For example, I want to test how much time a linear search takes to find a given element in an array.
#include <iostream>
#include <cstdio>
#include <ctime>
int main() {
    std::clock_t start;
    double duration;
    start = std::clock();
    /* Your algorithm here */
    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    std::cout<<"printf: "<< duration <<'\n';
}
An alternative that is portable, offers higher precision, and has been available since C++11 is std::chrono.
Here is an example:
#include <iostream>
#include <chrono>
typedef std::chrono::high_resolution_clock Clock;
int main()
{
    auto t1 = Clock::now();
    auto t2 = Clock::now();
    std::cout << "Delta t2-t1: "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count()
              << " nanoseconds" << std::endl;
}
Running this on ideone.com gave me:
Delta t2-t1: 282 nanoseconds
clock() returns the processor time used by your program so far, measured in clock ticks (on Windows it instead returns the wall-clock time elapsed since the process started). There is a related constant, CLOCKS_PER_SEC, which tells you how many clock ticks occur in one second. Thus, you can time any operation like this:
clock_t startTime = clock();
doSomeOperation();
clock_t endTime = clock();
clock_t clockTicksTaken = endTime - startTime;
double timeInSeconds = clockTicksTaken / (double) CLOCKS_PER_SEC;
On Windows at least, the only practically accurate measurement mechanism is QueryPerformanceCounter (QPC). std::chrono is implemented using it (since VS2015, if you use that), but it is not accurate to the same degree as using QueryPerformanceCounter directly. In particular, its claim to report at 1 nanosecond granularity is absolutely not correct. So if you're measuring something that takes a very short amount of time (and yours might be such a case), you should use QPC, or the equivalent for your OS. I came up against this when measuring cache latencies, and I jotted down some notes that you might find useful here:
https://github.com/jarlostensen/notesandcomments/blob/master/stdchronovsqcp.md
#include <iostream>
#include <ctime>
#include <chrono>
#include <thread> // std::this_thread::sleep_for waits for a given amount of time
using namespace std;
int main()
{
    clock_t cl;     // tick count
    cl = clock();   // starting tick count
    this_thread::sleep_for(chrono::milliseconds(5167)); // insert code here
    cl = clock() - cl;  // ticks elapsed between start and end point
    this_thread::sleep_for(chrono::milliseconds(1000)); // testing that the measurement really stopped at the end point
    cout << cl / (double)CLOCKS_PER_SEC << endl; // elapsed ticks converted to seconds
    return 0;
}
// outputs roughly "5.17" on Windows, where clock() measures elapsed time;
// on POSIX systems clock() measures CPU time, so time spent sleeping barely counts
You can measure how long your program runs. The following functions help measure the CPU time since the start of the program:
C++: (double)clock() / CLOCKS_PER_SEC, with <ctime> included.
Python: time.clock() returns a floating-point value in seconds (it was removed in Python 3.8; time.process_time() is the replacement for CPU time).
Java: System.nanoTime() returns a long value in nanoseconds (note that this is elapsed wall-clock time, not CPU time).
My reference: the Algorithmic Toolbox course (week 1), part of the Data Structures and Algorithms specialization by the University of California San Diego and the National Research University Higher School of Economics.
So you can add this line of code after your algorithm:
cout << (double)clock() / CLOCKS_PER_SEC;
Expected output: the processor time, in seconds, that the program has used since it started.
You might be interested in a timer like this:
H : M : S . Msec
The code, for Linux:
#include <iostream>
#include <unistd.h>
using namespace std;
void newline();
int main() {
    int msec = 0;
    int sec = 0;
    int min = 0;
    int hr = 0;
    //cout << "Press any key to start:";
    //char start = _getch();
    for (;;)
    {
        newline();
        if(msec == 1000)
        {
            ++sec;
            msec = 0;
        }
        if(sec == 60)
        {
            ++min;
            sec = 0;
        }
        if(min == 60)
        {
            ++hr;
            min = 0;
        }
        cout << hr << " : " << min << " : " << sec << " . " << msec << endl;
        ++msec;
        usleep(1000); // usleep takes microseconds: 1000 us = 1 ms per displayed millisecond (loop overhead still adds drift)
    }
    return 0;
}
void newline()
{
    // crude "clear screen": push the previous output off the top of the terminal
    cout << "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
}