I'm trying to time my algorithm but the time always shows up as zero. Here is my code:
steady_clock::time_point start = steady_clock::now();
ReverseArray(list1, 1000000);
steady_clock::time_point end = steady_clock::now();
auto totalTime = end - start;
cout << duration_cast<microseconds>(totalTime).count()<< "\n";
Just to elaborate on Jack's answer: on Windows, with Visual Studio up to 2013, there is a bug in the standard library that makes the chrono clocks imprecise. high_resolution_clock is an alias of steady_clock, which is fine, but the bug is that it has a very low resolution (around 8 ms last time I checked).
Therefore, to this day (04/2014), the VS standard library can't be used for measuring time, or even for time-sensitive concurrent data-sharing mechanisms (like anything relying on condition_variable::wait_for() or this_thread::sleep_for()). Personally, I fixed this problem by using the Boost implementation instead until it's fixed. As pointed out by STL (the person in charge of the standard library at Microsoft) in the bug report, this should be fixed in the Visual Studio versions after 2013 (not the CTPs, as they don't change the standard library).
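For reference, a minimal sketch of that Boost-based workaround, assuming Boost.Chrono is installed (link with -lboost_chrono -lboost_system); the API mirrors std::chrono:
#include <iostream>
#include <boost/chrono.hpp>

int main()
{
    using namespace boost::chrono;
    steady_clock::time_point start = steady_clock::now();
    // ... the work you want to measure ...
    steady_clock::time_point end = steady_clock::now();
    std::cout << duration_cast<microseconds>(end - start).count()
              << " microsecs\n";
    return 0;
}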
What I mean is that your code is actually correct and cross-platform (except that I would have used high_resolution_clock instead); that specific implementation is buggy, but the code works well with other implementations (GCC, Clang, and the Boost versions of chrono and thread).
For example, this works perfectly well on GCC 4.8 (Coliru):
#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>
auto my_array = []{
    std::vector<long long> values( 100000 );
    long long last_value = 0;
    for( auto& value : values )
        value = last_value++;
    return values;
}();

void some_work()
{
    for( int i = 0; i < 10000; ++i ) // play with this count
        std::reverse( begin(my_array), end(my_array) );
}

int main()
{
    using namespace std;
    using namespace chrono;
    auto start = high_resolution_clock::now();
    some_work();
    auto end = high_resolution_clock::now();
    auto totalTime = end - start;
    cout << duration_cast<microseconds>(totalTime).count() << " microsecs\n";
    return 0;
}
Coliru returns "95 microsecs" for me when trying with only one cycle in some_work().
Maybe you need more resolution. A floating-point duration avoids the truncation to zero that a duration_cast to an integral microseconds can produce for very short intervals:
auto t1 = std::chrono::high_resolution_clock::now();
// Your code to be measured
auto t2 = std::chrono::high_resolution_clock::now();
auto t = std::chrono::duration<double,std::milli>(t2-t1).count();
std::cout << "Time (ms): " << t << std::endl;
If you're on Windows, use this resource; it is good for sub-microsecond accuracy. Use this:
LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;

// Ticks per second of the performance counter (fixed at boot).
QueryPerformanceFrequency(&Frequency);
QueryPerformanceCounter(&StartingTime);

ReverseArray(list1, 1000000);

QueryPerformanceCounter(&EndingTime);
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;

// Scale to microseconds before dividing by the frequency, to guard
// against loss of precision.
ElapsedMicroseconds.QuadPart *= 1000000;
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
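If you need this in more than one place, the boilerplate wraps naturally into a small helper (a sketch; ElapsedMicros is a name I made up, not a Windows API):
#include <windows.h>

// Hypothetical helper: microseconds between two QPC readings.
// Multiplying before dividing preserves precision, at the cost of
// possible overflow for very long intervals.
long long ElapsedMicros(const LARGE_INTEGER& start, const LARGE_INTEGER& end)
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    return (end.QuadPart - start.QuadPart) * 1000000LL / freq.QuadPart;
}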
I am currently trying to create a way to display the elapsed seconds (not the difference between cycles). Here is my code:
#include <iostream>
#include <vector>
#include <chrono>
#include <Windows.h>
typedef std::chrono::high_resolution_clock::time_point TIME;
#define TIMENOW() std::chrono::high_resolution_clock::now()
#define TIMECAST(x) std::chrono::duration_cast<std::chrono::duration<double>>(x).count()

int main()
{
    std::chrono::duration<double> ms; // note: duration<double> counts seconds
    double t = 0;
    while (1)
    {
        TIME begin = TIMENOW();
        int c = 0;
        for (int i = 0; i < 10000000; i++)
        {
            c += i * 100000;
        }
        TIME end = TIMENOW();
        ms = std::chrono::duration_cast<std::chrono::duration<double>>(end - begin);
        t = t + ms.count();
        std::cout << t << std::endl;
    }
}
I expected that adding the delta time over and over again would roughly give me the elapsed time in seconds, but I noticed that it is only fairly accurate when the inner loop count is large. If it's only 10,000 or so, t seems to accumulate more slowly than real time. Maybe I am missing something, but isn't the difference my delta time (the elapsed time between this cycle and the last), and if I keep adding the delta times up, shouldn't it spit out seconds? Any help is appreciated.
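One way to see what the summed deltas miss (a minimal sketch, not the original code): compare them against a single wall-clock measurement over the same loop. Everything outside the begin/end pair, including the std::cout call itself, is excluded from the sum but included in real elapsed time:
#include <chrono>
#include <iostream>

int main()
{
    using clock = std::chrono::high_resolution_clock;
    auto wall_start = clock::now();
    double summed = 0;
    for (int n = 0; n < 1000; ++n)
    {
        auto begin = clock::now();
        volatile long long c = 0;        // volatile so the loop isn't optimized away
        for (int i = 0; i < 10000; ++i)
            c += i;
        auto end = clock::now();
        summed += std::chrono::duration<double>(end - begin).count();
        std::cout << summed << '\n';     // this I/O is NOT part of "summed"
    }
    double wall = std::chrono::duration<double>(clock::now() - wall_start).count();
    std::cout << "summed: " << summed << "  wall: " << wall << '\n';
    return 0;
}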
Update 2: Actually it's the regex(".{40000}");. That alone already takes that much time. Why?
regex_match("", regex(".{40000}")); takes almost 8 seconds on my PC. Why? Am I doing something wrong? I'm using gcc 4.9.3 from MinGW on Windows 10 on an i7-6700.
Here's a full test program:
#include <iostream>
#include <regex>
#include <ctime>
using namespace std;
int main() {
    clock_t t = clock();
    regex_match("", regex(".{40000}"));
    cout << double(clock() - t) / CLOCKS_PER_SEC << endl;
}
How I compile and run it:
C:\Users\ ... \coding>g++ -std=c++11 test.cpp
C:\Users\ ... \coding>a.exe
7.643
Update: Looks like the time is quadratic in the given number. Doubling it roughly quadruples the time:
10000 0.520 seconds (factor 1.000)
20000 1.922 seconds (factor 3.696)
40000 7.810 seconds (factor 4.063)
80000 31.457 seconds (factor 4.028)
160000 128.904 seconds (factor 4.098)
320000 536.358 seconds (factor 4.161)
The code:
#include <cstdio>
#include <ctime>
#include <regex>
#include <string>
using namespace std;

int main() {
    double prev = 0;
    for (int i = 10000; ; i *= 2) {
        clock_t t0 = clock();
        regex_match("", regex(".{" + to_string(i) + "}"));
        double t = double(clock() - t0) / CLOCKS_PER_SEC;
        printf("%7d %7.3f seconds (factor %.3f)\n", i, t, prev ? t / prev : 1);
        prev = t;
    }
}
Still no idea why. It's a very simple regex and the empty string (though it's the same with short non-empty strings). It should fail instantly. Is the regex engine just weird and bad?
Because it wants to be fast...
It is very possible that the library transforms this regex into another representation (a state machine or something else) that is easier and faster to run. C# even allows generating runtime code that represents a regex.
In your case you probably hit some bug in that transformation that has O(n^2) complexity.
Measuring the construction and matching separately:
clock_t t1 = clock();
regex r(".{40000}");
clock_t t2 = clock();
regex_match("", r);
clock_t t3 = clock();
cout << double(t2 - t1) / CLOCKS_PER_SEC << '\n'
<< double(t3 - t2) / CLOCKS_PER_SEC << endl;
I see:
0.077336
0.000613
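So it's the construction of the regex object that dominates, not the match. The practical consequence: build the regex once and reuse it rather than reconstructing it per call. A minimal sketch (my own illustration; matches_any is a made-up helper name):
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the regex is constructed once, on first use,
// and reused for every match.
bool matches_any(const std::vector<std::string>& inputs)
{
    static const std::regex r(".{40000}");
    for (const auto& s : inputs)
        if (std::regex_match(s, r))
            return true;
    return false;
}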
I am currently trying to benchmark various implementations of a large loop performing arbitrary jobs, and I found myself with a very slow version when using boost transform iterators and boost counting_iterators.
I designed a small program that benchmarks two loops, each summing the product of every integer between 0 and SIZE-1 with an arbitrary integer (which I chose to be 1 in my example in order to avoid overflow).
Here's my code:
//STL
#include <iostream>
#include <algorithm>
#include <functional>
#include <limits>
#include <numeric>
#include <cstdlib>
#include <chrono>
//Boost
#include <boost/iterator/transform_iterator.hpp>
#include <boost/iterator/counting_iterator.hpp>
//Compile using
// g++ ./main.cpp -o test -std=c++11
//Launch using
// ./test 1
#define NRUN 10
#define SIZE 128*1024*1024
struct MultiplyByN
{
    MultiplyByN( size_t N ): m_N(N) {}
    size_t operator()(int i) const { return i*m_N; }
    const size_t m_N;
};

int main(int argc, char* argv[] )
{
    int N = std::stoi( argv[1] );
    size_t sum = 0;

    //Initialize chrono helpers
    auto start = std::chrono::steady_clock::now();
    auto stop = std::chrono::steady_clock::now();
    auto diff = stop - start;
    double msec = std::numeric_limits<double>::max(); //Set min runtime to ridiculously high value

    MultiplyByN op(N);

    //Perform multiple runs in order to get the minimal runtime
    for(int k = 0; k < NRUN; k++)
    {
        sum = 0;
        start = std::chrono::steady_clock::now();
        for(int i = 0; i < SIZE; i++)
        {
            sum += op(i);
        }
        stop = std::chrono::steady_clock::now();
        diff = stop - start;
        //Keep the minimum runtime
        msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
    }
    std::cout << "First version : Sum of values is " << sum << std::endl;
    std::cout << "First version : Minimal Runtime was " << msec << " msec " << std::endl;

    msec = std::numeric_limits<double>::max(); //Reset min runtime to ridiculously high value

    //Perform multiple runs in order to get the minimal runtime
    for(int k = 0; k < NRUN; k++)
    {
        start = std::chrono::steady_clock::now();
        //Functional way to express the summation
        sum = std::accumulate( boost::make_transform_iterator(boost::make_counting_iterator(0), op ),
                               boost::make_transform_iterator(boost::make_counting_iterator(SIZE), op ),
                               (size_t)0, std::plus<size_t>() );
        stop = std::chrono::steady_clock::now();
        diff = stop - start;
        //Keep the minimum runtime
        msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
    }
    std::cout << "Second version : Sum of values is " << sum << std::endl;
    std::cout << "Second version : Minimal Runtime was " << msec << " msec " << std::endl;

    return EXIT_SUCCESS;
}
And the output I get:
./test 1
First version : Sum of values is 9007199187632128
First version : Minimal Runtime was 433.142 msec
Second version : Sum of values is 9007199187632128
Second version : Minimal Runtime was 10910.7 msec
The "functional" version of my loop that uses std::accumulate is 25 times slower than the simple loop version, why so ?
Thank you in advance for your help
Based on your comment in the code, you've compiled this with
g++ ./main.cpp -o test -std=c++11
Since you didn't specify an optimization level, g++ used the default setting, which is -O0, i.e. no optimization.
That means the compiler didn't inline anything. Template libraries like the standard library or Boost depend on inlining for performance. Additionally, the compiler produces a lot of extra code that's far from optimal; it doesn't make any sense to do performance comparisons on such binaries.
Recompile with optimization enabled, and try your test again to get meaningful results.
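For example (my suggestion; -O2 is a typical choice for benchmarking):
g++ ./main.cpp -o test -std=c++11 -O2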
I have an app where I must measure the execution time of part of a C++ function and of an ASM function. The problem is that the times I get are weird: either 0 or about 15600, with 0 occurring more often. Sometimes, after a run, the times look good and the values differ from 0 and ~15600. Does anybody know why this happens, and how to fix it?
Fragment of the code that times the C++ part of my app:
auto start = chrono::system_clock::now();

for (int i = 0; i < nThreads; i++)
    xThread[i]->Start(i);
for (int i = 0; i < nThreads; i++)
    xThread[i]->Join();

auto elapsed = chrono::system_clock::now() - start;
long long milliseconds = chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
cppTimer = milliseconds;
What you're seeing there is the resolution of your timer. Apparently, chrono::system_clock ticks every 1/64th of a second (15,625 microseconds) on your system.
Since you're in C++/CLI and have the .NET library available, I'd switch to the Stopwatch class. It generally has a much higher resolution than 1/64th of a second.
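A minimal sketch of that, assuming this code lives in a /clr translation unit (xThread, nThreads, and cppTimer are taken from your snippet):
using namespace System::Diagnostics;

Stopwatch^ sw = Stopwatch::StartNew();   // starts timing immediately

for (int i = 0; i < nThreads; i++)
    xThread[i]->Start(i);
for (int i = 0; i < nThreads; i++)
    xThread[i]->Join();

sw->Stop();
cppTimer = sw->ElapsedMilliseconds;      // whole milliseconds (long long);
// or use sw->Elapsed.TotalMilliseconds for a fractional value (double)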
Looks good to me, apart from casting to std::chrono::microseconds and then naming the result milliseconds.
The snippet I have used for many months now is:
class benchmark {
private:
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::milliseconds milliseconds;
    clock::time_point start;
public:
    benchmark(bool startCounting = true) {
        if(startCounting)
            start = clock::now();
    }
    void reset() {
        start = clock::now();
    }
    // elapsed time in seconds
    double elapsed() {
        milliseconds ms = std::chrono::duration_cast<milliseconds>(clock::now() - start);
        double elapsed_secs = ms.count() / 1000.0;
        return elapsed_secs;
    }
};
// usage
benchmark b;
...
cout << "took " << b.elapsed() << " ms" << endl;
New to SO. I am test-driving Armadillo+OpenBLAS, and a simple Monte Carlo geometric Brownian motion simulation shows a much longer runtime than MATLAB. I believe something must be wrong.
Environment:
Intel i5 (4 cores),
8 GB RAM,
VS 2012 Express,
Armadillo 4.2,
OpenBLAS (official x64 binary) v0.2.9.rc2
MATLAB takes 2 seconds for the same logic, but Armadillo+OpenBLAS takes 12 seconds. I also noticed that the program runs on a single thread, even though I turned to OpenBLAS because I had heard of its multi-core capability.
Thanks for any advice.
#include <iostream>
#include <armadillo>
#include <ctime>
using namespace std;
using namespace arma;
int main()
{
    clock_t start;
    start = clock();

    unsigned int R = 100000;
    vec Spre = 100*ones<vec>(R);
    vec S = zeros<vec>(R);
    double r = 0.03;
    double Vol = 0.2;
    double TTM = 5;
    unsigned int T = 260*TTM;
    double dt = TTM/T;

    for (unsigned int iT = 0; iT < T; ++iT)
    {
        S = Spre % exp((r - 0.5*Vol*Vol)*dt + Vol*sqrt(dt)*randn(R));
        Spre = S;
    }

    cout << mean(S) << endl;
    cout << (clock() - start) / (double) CLOCKS_PER_SEC << endl;
    system("pause");
    return 0;
}
First, the bottleneck is not exp(), though std::exp is slow. The problem is randn().
On my machine, randn() takes most of the time. When I use the MKL VSL implementation of the Gaussian RNG instead, the time cost drops from 12 s to 4 s, comparable to MATLAB's 3 s or so.
#include <iostream>
#include <armadillo>
#include <ctime>
#include "mkl_vml.h"
#include "mkl_vsl.h"
using namespace std;
using namespace arma;
#define SEED 0
#define BRNG VSL_BRNG_MCG31
#define METHOD 0
int main()
{
    clock_t start;
    VSLStreamStatePtr stream;

    start = clock();
    vslNewStream(&stream, BRNG, SEED);

    unsigned int R = 100000;
    vec Spre = 100*ones<vec>(R);
    vec S = zeros<vec>(R);
    double r = 0.03;
    double Vol = 0.2;
    double TTM = 5;
    unsigned int T = 260*TTM;
    double dt = TTM/T;
    double tmp = sqrt(dt);
    vec tmp2 = 100*zeros<vec>(R);
    vec tmp3 = 100*zeros<vec>(R);

    for (unsigned int iT = 0; iT < T; ++iT)
    {
        vdRngGaussian(METHOD, stream, R, tmp3.memptr(), 0, 1);
        tmp2 = (r - 0.5*Vol*Vol)*dt + Vol*tmp*tmp3;
        vdExp(R, tmp2.memptr(), tmp3.memptr());
        S = Spre % tmp3;
        Spre = S;
    }

    cout << mean(S) << endl;
    cout << (clock() - start) / (double) CLOCKS_PER_SEC << endl;

    vslDeleteStream(&stream);
    //system("pause");
    return 0;
}
The key observation is that Armadillo's exp() function is much slower than MATLAB's.
Similar overhead is observed in log(), pow() and sqrt().
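A quick way to check this on your own machine (a rough sketch of my own; 1300 iterations mimics the 260*5 steps of the original loop):
#include <armadillo>
#include <ctime>
#include <iostream>

int main()
{
    arma::vec x = arma::randn<arma::vec>(100000);
    clock_t t0 = clock();
    for (int k = 0; k < 1300; ++k)
    {
        arma::vec y = arma::exp(x);   // element-wise exp over 100k values
    }
    std::cout << double(clock() - t0) / CLOCKS_PER_SEC << " s\n";
    return 0;
}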
Just a guess, but it looks like you need to set the number of threads OpenBLAS should use, via the OPENBLAS_NUM_THREADS environment variable.
Try something like:
set OPENBLAS_NUM_THREADS=4
...on the command line before you run your program. Substitute the number of cores in your system where I put "4" (some would say to set it to twice the number of cores; YMMV).
Make sure you have Streaming SIMD Extensions enabled when you compile your code. In Visual Studio, check your project's C/C++ code generation options.
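For example (my addition; the exact flags depend on your toolchain and CPU):
cl /O2 /arch:AVX main.cpp
g++ -O2 -march=native main.cpp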