+++ See update below +++
This code prints the contents of an array in reverse. I used 3 slightly different methods: indexing directly with the array dimension in the for loop, using an iterator, and using a reverse_iterator. I measured the execution time of each printing loop.
#include <iostream>
#include <vector>
#include <chrono>
using get_time = std::chrono::high_resolution_clock;
int main() {
std::cout << "Enter the array dimension:";
int N;
std::cin >> N;
//Read the array elements
std::cout << "Enter the array elements:" <<'\n';
std::vector <int> v;
int input;
for(size_t i=0; i<N; i++){
std::cin >> input;
v.push_back(input);
}
auto start = get_time::now();
for(int i=N-1; i>=0; i--){
std::cout << v[i] <<" ";
}
auto finish = get_time::now();
auto time_diff=finish-start;
std::cout << "Elapsed time,non-iterator= " << std::chrono::duration<double>
(time_diff).count() << " Seconds" << '\n';
auto start2 = get_time::now();
std::vector <int>::reverse_iterator ri;
for(ri=v.rbegin(); ri!=v.rend(); ri++){
std::cout << *ri <<" ";
}
auto finish2 = get_time::now();
auto time_diff2=finish2-start2;
std::cout << "Elapsed time, reverse iterator= " << std::chrono::duration<double>
(time_diff2).count() << " Seconds" << '\n';
auto start3 = get_time::now();
std::vector <int>::iterator i;
for(i=v.end()-1; i>=v.begin(); i--){
std::cout << *i <<" ";
}
auto finish3 = get_time::now();
auto time_diff3=finish3-start3;
std::cout << "Elapsed time, iterator= " << std::chrono::duration<double>
(time_diff3).count() << " Seconds" << '\n';
return 0;
}
The output is as follows:
Output:
5 4 3 2 1 Elapsed time,non-iterator= 2.7913e-05 Seconds
5 4 3 2 1 Elapsed time, reverse iterator= 5.57e-06 Seconds
5 4 3 2 1 Elapsed time, iterator= 4.56e-06 Seconds
My question is:
Why is the direct method almost 5 times slower than both the iterator and reverse_iterator methods? Also, is the faster execution of the iterator versions machine dependent?
This is a prototype, but I will need to deal with much bigger matrices; that is why I am asking this question. Thank you.
+++ Update +++
I am posting the updated results after incorporating the comments. It was too big for a comment.
I changed the for loop to compute the sum of an array with 100000 elements. I computed the same sum using the three methods above (compiled with -O3 in clang++) and averaged the execution time of each method over 10000 runs. Here are the results:
Average (10000 runs) elapsed time, non-iterator= 2.50183e-05
Average (10000 runs) elapsed time, reverse-iterator= 3.48299e-05
Average (10000 runs) elapsed time, iterator= 7.35307e-05
The results are much more uniform now, and the non-iterator method is now the fastest! Any insights? Or is even this result meaningless, so that I should run some more tests?
The updated code:
#include <iostream>
#include <vector>
#include <chrono>
using get_time = std::chrono::high_resolution_clock;
int main() {
double time1=0,time2=0,time3=0; // accumulators must start at zero
int run=10000;
for(int k=0; k<run; k++){
//Read the array elements
std::vector <int> v;
int input,N=100000;
for(size_t i=0; i<N; i++){
v.push_back(i);
}
long long sum1{0},sum2{0},sum3{0}; // 0+1+...+99999 = 4999950000 does not fit in a 32-bit int
auto start = get_time::now();
for(int i=N-1; i>=0; i--){
sum1+=v[i];
}
auto finish = get_time::now();
auto time_diff=finish-start;
std::cout << "Sum= " << sum1 << " " << "Elapsed time,non-iterator= " << std::chrono::duration<double>
(time_diff).count() << " Seconds" << '\n';
auto start2 = get_time::now();
std::vector <int>::reverse_iterator ri;
for(ri=v.rbegin(); ri!=v.rend(); ri++){
sum2+=*ri;
}
auto finish2 = get_time::now();
auto time_diff2=finish2-start2;
std::cout << "Sum= " << sum2 <<" Elapsed time, reverse iterator= " << std::chrono::duration<double>
(time_diff2).count() << " Seconds" << '\n';
auto start3 = get_time::now();
std::vector <int>::iterator i;
for(i=v.end()-1; i>=v.begin(); i--){
sum3+=*i;
}
auto finish3 = get_time::now();
auto time_diff3=finish3-start3;
std::cout << "Sum= " <<sum3 << " Elapsed time, iterator= " << std::chrono::duration<double>
(time_diff3).count() << " Seconds" << '\n';
time1+=std::chrono::duration<double>(time_diff).count();
time2+=std::chrono::duration<double>(time_diff2).count();
time3+=std::chrono::duration<double>(time_diff3).count();
}
std::cout << "Average (" << run << " runs)" << " elapsed time, non-iterator= " << time1/double(run) <<'\n';
std::cout << "Average (" << run << " runs)" << " elapsed time, reverse-iterator= " << time2/double(run) <<'\n';
std::cout << "Average (" << run << " runs)" << " elapsed time, iterator= " << time3/double(run) <<'\n';
return 0;
}
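As a rough sketch of how the three loops could be compared more fairly, the helpers below time each summation over many runs with std::chrono::steady_clock. The names sumByIndex, sumByReverseIterator, sumByIterator and averageSeconds are hypothetical and not part of the original code. The sums use long long because 0+1+...+99999 is 4999950000, which does not fit in a 32-bit int, and the plain-iterator loop decrements before dereferencing, because decrementing an iterator that equals begin() is undefined behavior. This is only one possible setup; the timings will still depend on the compiler, flags and machine.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum by index, walking from the last element to the first.
long long sumByIndex(const std::vector<int>& v) {
    long long sum = 0;
    for (std::size_t i = v.size(); i-- > 0; ) {
        sum += v[i];
    }
    return sum;
}

// Sum with reverse iterators.
long long sumByReverseIterator(const std::vector<int>& v) {
    long long sum = 0;
    for (auto ri = v.rbegin(); ri != v.rend(); ++ri) {
        sum += *ri;
    }
    return sum;
}

// Sum with a plain iterator moved backwards. The decrement happens before
// the dereference, so the iterator never moves before begin().
long long sumByIterator(const std::vector<int>& v) {
    long long sum = 0;
    for (auto it = v.end(); it != v.begin(); ) {
        --it;
        sum += *it;
    }
    return sum;
}

// Hypothetical helper: average wall-clock seconds per call over `runs` calls.
// Each result is added to `sink` so that the calls have an observable effect.
template <typename F>
double averageSeconds(F f, const std::vector<int>& v, int runs, long long& sink) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int k = 0; k < runs; ++k) {
        sink += f(v);
    }
    const std::chrono::duration<double> total = clock::now() - start;
    return total.count() / runs;
}
```

A call such as averageSeconds(sumByIndex, v, 10000, sink) would then give one average per method; printing sink afterwards keeps the compiler from discarding the loops entirely.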
I'm trying to write a simple single-header benchmarking utility, and I understand that std::clock gives me the CPU time that a process (thread) actually uses.
So, given the following simplified program:
int main() {
using namespace std::literals::chrono_literals;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
std::this_thread::sleep_for(1s);
// clobber();
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
We get the following output:
cpu: 4820 4839 1.9e-05 s
wall: 1.00007 s
I just want to confirm my understanding: the CPU time covers only the code that this process actually executes, and excludes the sleep_for wait, which is handled by the kernel and which std::clock does not track. To check this, I changed what I was timing:
int main() {
using namespace std::literals::chrono_literals;
int value = 0;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
for (int i = 0; i < 1000000; ++i) {
srand(value);
value = rand();
}
// clobber();
std::cout << "value = " << value << std::endl;
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
This gave me an output of:
cpu: 4949 1398224 1.39328 s
wall: 2.39141 s
value = 354531795
So far, so good. I then tried this on my Windows box using MSYS2's g++ compiler. The output for the last program was:
value = 0
cpu: 15 15 0 s
wall: 0.0080039 s
std::clock() is always outputting 15? Is the compiler implementation of std::clock() broken?
It seems I had assumed that CLOCKS_PER_SEC would be the same everywhere. However, with the MSYS2 compiler it is 1000 times smaller than on godbolt.org.
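To illustrate why raw std::clock() values should not be compared across platforms, here is a minimal sketch of a conversion helper. The name cpuSeconds is hypothetical. It only relies on the fact that std::clock() counts in units of 1/CLOCKS_PER_SEC seconds and that this constant is implementation-defined.

```cpp
#include <ctime>

// Convert a difference of std::clock() readings to seconds.
// CLOCKS_PER_SEC differs between implementations, so the raw tick
// values are only meaningful after this division.
double cpuSeconds(std::clock_t start, std::clock_t finish) {
    return static_cast<double>(finish - start) / CLOCKS_PER_SEC;
}
```

With a small CLOCKS_PER_SEC, short intervals can legitimately come out as 0 ticks, so the measured work has to run long enough to span several ticks.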
#include <chrono>
#include <ctime>
#include <future>
#include <iostream>
#include <thread>
auto gClock = clock();
char threadPool(char c) {
std::cout << "enter thread :" << c << " cost time:" << clock() - gClock << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(2));
for (int i = 0; i < 10; i++)
std::cout << c;
std::cout << std::endl;
return c;
}
void fnTestAsync(){
auto begin = clock();
std::future<char> futures[10];
for (int i = 0; i < 10; ++i){
futures[i] = std::async(std::launch::async,threadPool, 'a' + i);
}
for (int i = 0; i < 10; ++i){
std::cout << futures[i].get() << " back ,cost time: " << clock() - begin << std::endl;
}
std::cout << "fnTestAsync: " << clock() - begin << std::endl;
}
int main(){
std::thread testAsync(fnTestAsync);
testAsync.detach();
std::this_thread::sleep_for(std::chrono::seconds(10));
return 0;
}
I'm trying to get these 10 threads to run concurrently and all return right after a two-second delay, but when I print the time spent I find that it takes about 2900 ms, much more than the 2000 ms I expected.
What causes this increase?
How can I fix it?
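One possible explanation, which the post itself does not confirm, is that clock() reports processor time in units of 1/CLOCKS_PER_SEC seconds rather than wall-clock milliseconds, so its differences need not match the sleep duration. The sketch below measures the same kind of workload with std::chrono::steady_clock instead. The function name timeAsyncSleepersMs and its parameters are hypothetical.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Launch `tasks` asynchronous sleepers and return the wall-clock
// milliseconds that pass until all of them have finished.
long long timeAsyncSleepersMs(int tasks, std::chrono::milliseconds sleepTime) {
    using clock = std::chrono::steady_clock;
    const auto begin = clock::now();

    std::vector<std::future<void>> futures;
    for (int i = 0; i < tasks; ++i) {
        // std::launch::async forces each task onto its own thread.
        futures.push_back(std::async(std::launch::async, [sleepTime] {
            std::this_thread::sleep_for(sleepTime);
        }));
    }
    for (auto& f : futures) {
        f.get();  // blocks until that task is done
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        clock::now() - begin).count();
}
```

Calling timeAsyncSleepersMs(10, std::chrono::seconds(2)) should report a value close to 2000 if the tasks really do sleep concurrently; thread start-up and scheduling add some overhead on top.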
I am trying to make a text game with a timer: if the player finishes the game within 60 seconds, they get bonus points. However, I have no idea how to get the elapsed time out of chrono without sending it to cout. I want to use that value to calculate the bonus points. I can print the value through .count(), but I cannot get that value to use in the condition.
Here's my code for the scoring part:
void Game::score(auto start, auto end) {
int bonus = 0;
int total = 0;
string name;
box();
gotoxy(10,8); cout << "C O N G R A T U L A T I O N S";
gotoxy(15,10); cout << "You have successfully accomplished all the levels!";
gotoxy(15,11); cout << "You are now a certified C-O-N-N-E-C-T-o-r-I-s-T" << char(002) << char(001);
gotoxy(20,13); cout << "= = = = = = = = = = GAME STATS = = = = = = = = = =";
gotoxy(25,15); cout << "Time Taken: " << chrono::duration_cast<chrono::seconds>(end - start).count() << " seconds";
gotoxy(25,16); cout << "Points: " << pts << " points";
if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 60) {
bonus == 5000;
} else if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 90) {
bonus == 3000;
} else if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 120) {
bonus == 1000;
}
gotoxy(30,17); cout << "Bonus Points (Time Elapsed): " << bonus;
total = pts + bonus;
gotoxy(25,18); cout << "Total Points: " << total << " points";
gotoxy(20,20); cout << "Enter your name: ";
cin >> name;
scoreB.open("scoreboard.txt",ios::app);
scoreB << name << "\t" << total << "\n";
scoreB.close();
}
You should really use the chrono literals for comparing durations. See example here:
#include <chrono>
#include <iostream>
#include <thread>
using Clock = std::chrono::system_clock;
void compareTimes(std::chrono::time_point<Clock> startTime,
std::chrono::time_point<Clock> finishTime) {
using namespace std::chrono_literals;
std::chrono::duration<float> elapsed = finishTime - startTime;
std::cout << "elapsed = " << elapsed.count() << "\n";
if (elapsed > 10ms) {
std::cout << "over 10ms\n";
}
if (elapsed < 60s) {
std::cout << "under 60s\n";
}
}
int main() {
using namespace std::chrono_literals;
auto startTime = Clock::now();
std::this_thread::sleep_for(20ms);
auto finishTime = Clock::now();
compareTimes(startTime, finishTime);
return 0;
}
Demo: https://godbolt.org/z/hqv58acoY
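Applied to the scoring code in the question, the same idea could look like the sketch below. The helper name bonusFor is hypothetical; the thresholds (60 s, 90 s, 120 s) and bonus values are taken from the question. Note that a statement such as bonus == 5000; only compares and discards the result, so the sketch uses = to assign.

```cpp
#include <chrono>

// Hypothetical helper: map the elapsed play time to bonus points.
int bonusFor(std::chrono::steady_clock::duration elapsed) {
    using namespace std::chrono_literals;
    // Truncate to whole seconds, as the duration_cast in the question does.
    const auto secs = std::chrono::duration_cast<std::chrono::seconds>(elapsed);
    int bonus = 0;
    if (secs <= 60s) {
        bonus = 5000;  // '=' assigns; '==' would only compare
    } else if (secs <= 90s) {
        bonus = 3000;
    } else if (secs <= 120s) {
        bonus = 1000;
    }
    return bonus;
}
```

Inside Game::score this could be used as int bonus = bonusFor(end - start);, assuming start and end are time points of the same clock. If a plain number is needed, secs.count() can be stored in an ordinary integer variable.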
I have implemented a C++ method that calculates the maximum ulp error between an approximation and a reference function on a given interval. Both the approximation and the reference are calculated as single-precision floating-point values. The method starts at the lower bound of the interval and iterates over every representable single-precision value within the range.
Since there are a lot of existing values depending on the range that is chosen, I would like to estimate the total runtime of this method, and print it to the user.
I tried executing the comparison several times to measure the runtime of one iteration, and then multiplying that duration by the total number of floats in the range. But evidently the execution time of one iteration is not constant and depends on the number of iterations, so my estimate is not accurate at all. Maybe one could instead update the total runtime estimate inside the main loop?
My question is: Is there any other way to estimate the total runtime for this particular case?
Here is my code:
void FloatEvaluateMaxUlp(float(*testFunction)(float), float(*referenceFunction)(float), float lowBound, float highBound)
{
/*initialization*/
float x = lowBound, output, output_ref;
int ulp = 0;
long long duration = 0, numberOfFloats=0;
/*calculate number of floats between lowBound and highBound*/
numberOfFloats = *(int*)&highBound - *(int*)&lowBound;
/*measure execution time of iterationsToEstimateTime iterations*/
int iterationsToEstimateTime = 1000;
auto t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterationsToEstimateTime; i++)
{
printProgressInteger(i+1, iterationsToEstimateTime);
output = testFunction(x);
output_ref = referenceFunction(x);
int ulp_local = FloatCompareULP(output, output_ref);
if (abs(ulp_local) > abs(ulp))
ulp = ulp_local;
x= std::nextafter(x, highBound + 0.001f);
}
auto t2 = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
duration /= iterationsToEstimateTime;
x = lowBound;
/*output of estimated time*/
std::cout <<std::endl<<std::endl<< " Number of floats: " << numberOfFloats << " Time per iteration: " << duration << " Estimated total time: " << numberOfFloats * duration << std::endl;
std::cout << " Starting test in range [" << lowBound << "," << highBound << "]." << std::endl;
long long count = 0;
/*record start time*/
t1 = std::chrono::high_resolution_clock::now();
for (count; x < highBound; count++)
{
printProgressInteger(count, numberOfFloats);
output = testFunction(x);
output_ref = referenceFunction(x);
int ulp_local = FloatCompareULP(output, output_ref);
if (abs(ulp_local) > abs(ulp))
ulp = ulp_local;
x = std::nextafter(x, highBound + 0.001f);
}
/*record stop time and compute duration*/
t2 = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
/*result output*/
std::cout <<std::endl<< std::endl << std::endl << std::endl << "*********************************************************" << std::endl;
std::cout << " RESULT " << std::endl;
std::cout << "*********************************************************" << std::endl;
std::cout << " Iterations: " << count << " Total execution time: " << duration << std::endl;
std::cout << " Max ulp: " << ulp <<std::endl;
std::cout << "*********************************************************" << std::endl;
}
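The idea of adapting the estimate inside the main loop could be sketched roughly as follows. The helper estimateRemaining is hypothetical; it assumes that the average cost per iteration so far is representative of the remaining iterations, which may not hold if the cost varies across the range.

```cpp
#include <chrono>

// Hypothetical helper: estimate the remaining time from the progress so far.
// Returns zero when no estimate is available (nothing done yet) or when
// all iterations are finished.
std::chrono::duration<double> estimateRemaining(
        std::chrono::duration<double> elapsed,
        long long done, long long total) {
    if (done <= 0 || done >= total) {
        return std::chrono::duration<double>(0.0);
    }
    // Average seconds per finished iteration, scaled to what is left.
    const double perIteration = elapsed.count() / static_cast<double>(done);
    return std::chrono::duration<double>(
        perIteration * static_cast<double>(total - done));
}
```

In the main loop this could be called every so many iterations with now() - t1, count and numberOfFloats, so that the estimate keeps correcting itself. Refreshing the progress output only occasionally, rather than on every iteration, would also keep the printing from dominating the measured time.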
I am trying to dispose of an STL list quickly, so I have declared a pointer to that list.
I do all the manipulations and then delete the pointer to free up the RAM.
But deleting the pointer to the list is slow, as slow as calling list.clear(). Why does that happen? How can I free the allocated RAM quickly? With vector and deque the deletion is fast. Below is a program that demonstrates this.
//============//
// STL delete //
//============//
#include <iostream>
#include <algorithm>
#include <vector>
#include <list>
#include <deque>
#include <cmath>
#include <iomanip>
#include <ctime>
using std::cout;
using std::cin;
using std::endl;
using std::list;
using std::vector;
using std::deque;
using std::fixed;
using std::setprecision;
using std::showpoint;
using std::sort;
// the main program
int main()
{
// variables and parameters
const long int I_MAX = static_cast<long int>(pow(10.0, 7.5));
const long int K_MAX = static_cast<long int>(pow(10.0, 6.0));
long int i;
long int k;
clock_t t1;
clock_t t2;
double tall;
// set the output
cout << fixed;
cout << setprecision(5);
cout << showpoint;
// main bench loop
for (k = 0; k < K_MAX; k++)
{
list<double> * listA = new list<double> [1];
vector<double> * vecA = new vector<double> [1];
deque<double> * deqA = new deque<double> [1];
cout << endl;
cout << "------------------------------->>> " << k << endl;
cout << endl;
// build the vector
t1 = clock();
cout << " 1 --> build the vector ..." << endl;
for (i = 0; i < I_MAX; i++)
{ vecA->push_back(static_cast<double>(cos(i))); }
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 2 --> done with the vector --> " << tall << endl;
// build the list
t1 = clock();
cout << " 3 --> build the list ..." << endl;
for (i = 0; i < I_MAX; i++)
{ listA->push_back(static_cast<double>(cos(i))); }
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 4 --> done with the list --> " << tall << endl;
// build the deque
t1 = clock();
cout << " 5 --> build the deque ..." << endl;
for (i = 0; i < I_MAX; i++)
{ deqA->push_back(static_cast<double>(cos(i))); }
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 6 --> done with the deque --> " << tall << endl;
// sort the vector
t1 = clock();
cout << " 7 --> sort the vector ..." << endl;
sort(vecA->begin(), vecA->end());
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 8 --> done with the vector --> " << tall << endl;
// sort the list
t1 = clock();
cout << " 9 --> sort the list ..." << endl;
listA->sort();
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 10 --> done with the list --> " << tall << endl;
// sort the deque
t1 = clock();
cout << " 11 --> sort the deque ..." << endl;
sort(deqA->begin(), deqA->end());
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 12 --> done with the deque --> " << tall << endl;
// delete the vector
t1 = clock();
cout << " 13 --> delete the vector ..." << endl;
delete [] vecA;
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 14 --> done with the vector --> " << tall << endl;
// delete the list
t1 = clock();
cout << " 15 --> delete the list ..." << endl;
delete [] listA;
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 16 --> done with the list --> " << tall << endl;
t1 = clock();
// delete the deque
cout << " 17 --> delete the deque ..." << endl;
delete [] deqA;
t2 = clock();
tall = (t2-t1)/static_cast<double>(CLOCKS_PER_SEC);
cout << " 18 --> done with the deque --> " << tall << endl;
}
int sentinel;
cin >> sentinel;
return 0;
}
Every element in the list has its own node, meaning an extra allocation which has to be freed.
If you want to get rid of it all really fast and the elements have trivial destructors (no call needed), use a custom allocator for the list that is optimized for that.
BTW: Allocating the container on the heap is a pessimisation.
Anyway, depending on your use-case another container like std::vector might make sense instead.
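One way to try the custom-allocator suggestion, assuming a C++17 standard library with <memory_resource>, is a std::pmr::list backed by a std::pmr::monotonic_buffer_resource. This is a sketch of that particular technique, not necessarily the allocator the answer had in mind, and the function name sumWithArenaList is hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>

// Build a list of n doubles whose nodes come from a monotonic buffer
// resource, and return the sum of the elements.
double sumWithArenaList(std::size_t n) {
    std::pmr::monotonic_buffer_resource arena;
    std::pmr::list<double> values(&arena);  // nodes are carved out of `arena`
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<double>(i));
    }
    double sum = 0.0;
    for (double x : values) {
        sum += x;
    }
    // On return, `values` is destroyed first: it still walks its nodes, but
    // deallocating from a monotonic resource does nothing per node. `arena`
    // is destroyed afterwards and returns its memory in a few large blocks.
    return sum;
}
```

Whether this is actually faster for ~30M elements would have to be measured; the design trade-off is that memory is not reused until the whole resource is released.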
The problem is not so much deleting the list, as understanding how a list is represented as a chain of heap-allocated nodes.
So basically, when you delete the list, it also has to delete the ~30M nodes, which will absolutely be a noticeably slow operation.
Generally speaking, a list is not a great container for small built-in types anyway, because the per-node overhead can take more space than the data itself.
Can you give us more information about the real problem you're trying to solve?