I have some code written in C++ in Visual Studio:
auto start = std::chrono::high_resolution_clock::now();
result = function(-1, 1, 9999999);
auto end = std::chrono::high_resolution_clock::now();
double time_taken = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
time_taken *= 1e-6;
std::cout << "result: " << result << " time: " << std::fixed << std::setprecision(6) << time_taken << " sec" << std::endl;
The problem is: when I run the code in Release mode, time_taken always equals 0. When I switch to Debug mode, time_taken is between 1 and 2 seconds. I have tried different ways of measuring the time, but in Release mode time_taken always equals 0. How can I fix this?
Thanks in advance for your help!
Apparently you are lacking clock resolution, or the function was partly or fully optimized away.
Generally it is not trivial to profile small functions.
One thing you should do is call the function many times, measure the time of the whole run, and then divide that time by the number of calls.
Also make sure the function is actually computed at runtime by storing the result in a volatile variable and taking the input from volatile variables, as sketched below.
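A minimal sketch of that approach; the function body here is only a placeholder with the same call shape as in the question, not your real code:
#include <chrono>
#include <iostream>

// Placeholder workload; substitute your real function.
long long function(int lo, int hi, int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (i % (hi - lo + 1)) + lo;
    return sum;
}

int main() {
    volatile int lo = -1, hi = 1, n = 9999999;   // volatile inputs: the call cannot be precomputed
    volatile long long sink = 0;                 // volatile sink: the result must actually be produced

    const int runs = 100;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        sink = function(lo, hi, n);
    const auto end = std::chrono::steady_clock::now();

    const double total = std::chrono::duration<double>(end - start).count();
    std::cout << "average per call: " << total / runs << " sec\n";
}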
Related
My question is about how the measured elapsed time differs depending on where the measurement is placed.
To find which part of my code takes the largest portion of the total elapsed time, I used the clock facilities below.
Source: calculating time elapsed in C++
First, I put the clock calls at the start and end of the main function.
(Actually, there are some variable declarations, but I deleted them for readability.) This way I expect to measure the total elapsed time.
int main(){
using clock = std::chrono::system_clock;
using sec = std::chrono::duration<double>;
const auto before = clock::now();
...
std::cin >> a >> b;
lgstCommSubStr findingLCSS(a,b,numberofHT,cardi,SubsA);
const sec duration = clock::now() - before;
std::cout << "It took " << duration.count() << "s in main function" << std::endl;
return 0;
}
Second, I put the clock calls inside the class of findingLCSS. This class finds the longest common substring between two strings; it is the class that actually runs my algorithm. The work is done in its constructor, so constructing the object produces the longest-common-substring information. I think this elapsed time will be the actual running time of the algorithm.
public:
lgstCommSubStr(string a, string b, int numHT, int m, vector <int> ** SA):
strA(a), strB(b), hashTsize(numHT), SubstringsA(SA),
primeNs(numHT), xs(numHT),
A_hashValues(numHT), B_hashValues(numHT),
av(numHT), bv(numHT), cardi(m)
{
using clock = std::chrono::system_clock;
using sec = std::chrono::duration<double>;
const auto before = clock::now();
...
answer ans=binarySearch(a,b, numHT);
std::cout << ans.i << " " << ans.j << " " << ans.length << "\n";
const sec duration = clock::now() - before;
std::cout << "It took " << duration.count() << "s in the class" << std::endl;
}
The output is as below.
tool coolbox
1 1 3
It took 0.002992s in inner class
It took 4.13945s in main function
It means 'tool' and 'coolbox' have the common substring 'ool'.
But I am confused that there is such a big difference between the two times.
Since one of the times is the total time and the other is the algorithm's running time, I would have to conclude that the difference is the time spent on the variable declarations.
But that looks weird, because declaring variables should take very little time.
Is there a mistake in measuring the elapsed time?
Please give me a hint for troubleshooting. Thank you for reading!
Taking a snapshot of the time before std::cin >> a >> b; leads to an inaccurate measurement: you start the clock and then sit there typing in the values for a and b, so the time spent typing is included in the total. Generally you want to put your timing as close as possible to the thing you're actually measuring.
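A minimal sketch of that, based on the main function from the question (the algorithm call is elided, as in the original):
#include <chrono>
#include <iostream>
#include <string>

int main() {
    using clock = std::chrono::system_clock;
    using sec = std::chrono::duration<double>;

    std::string a, b;
    std::cin >> a >> b;                      // typing happens before the clock starts

    const auto before = clock::now();
    // ... construct findingLCSS / run the algorithm here ...
    const sec duration = clock::now() - before;

    std::cout << "It took " << duration.count() << "s in main function" << std::endl;
    return 0;
}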
I'm trying to figure out how to time the execution of part of my program, but when I use the following code, all I ever get back is 0. I know that can't be right. The code I'm timing recursively implements mergesort of a large array of ints. How do I get the time it takes to execute the program in milliseconds?
//opening input file and storing contents into array
index = inputFileFunction(inputArray);
clock_t time = clock();//start the clock
//this is what needs to be timed
newRecursive.mergeSort(inputArray, 0, index - 1);
//getting the difference
time = clock() - time;
double ms = double(time) / CLOCKS_PER_SEC * 1000;
std::cout << "\nTime took to execute: " << std::setprecision(9) << ms << std::endl;
You can use the chrono library in C++11. Here's how you can modify your code:
#include <chrono>
//...
auto start = std::chrono::steady_clock::now();
// do whatever you're timing
auto end = std::chrono::steady_clock::now();
auto durationMS = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "\nTime took " << durationMS.count() << " ms" << std::endl;
If you're developing on OSX, this blog post from Apple may be useful. It contains code snippets that should give you the timing resolution you need.
Referring to Obtaining Time in milliseconds
Why does the code below produce zero as output?
#include <chrono>
#include <iostream>
#include <cstdlib>

using namespace std::chrono;

int main()
{
steady_clock::time_point t1 = steady_clock::now();
//std::this_thread::sleep_for(std::chrono::milliseconds(1500));
steady_clock::time_point t2 = steady_clock::now();
auto timeC = t1.time_since_epoch().count();
auto timeD = t2.time_since_epoch().count();
auto timeA = duration_cast<std::chrono::nanoseconds > ( t1.time_since_epoch()).count();
auto timeB = duration_cast<std::chrono::nanoseconds > ( t2.time_since_epoch()).count();
std::cout << timeC << std::endl;
std::cout << timeB << std::endl;
std::cout << timeD << std::endl;
std::cout << timeA << std::endl;
std::cout << timeB - timeA << std::endl;
system("Pause");
return 0;
}
The output:
14374083030139686
1437408303013968600
14374083030139686
1437408303013968600
0
Press any key to continue . . .
I suppose there should be a difference of at least a few nanoseconds, because of the instruction execution time.
Under VS2012, steady_clock (and high_resolution_clock) uses GetSystemTimeAsFileTime, which has a very low resolution (and is non-steady to boot). This is acknowledged as a bug by Microsoft.
Your workarounds are to upgrade to VS2015, use Boost.Chrono, or implement your own clock using QueryPerformanceCounter (see: https://stackoverflow.com/a/16299576/567292).
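A rough sketch of that last option, assuming a Windows build environment (the linked answer contains a more complete implementation):
#include <windows.h>
#include <chrono>

// A chrono-compatible clock built on QueryPerformanceCounter.
struct qpc_clock {
    using rep        = long long;
    using period     = std::nano;
    using duration   = std::chrono::nanoseconds;
    using time_point = std::chrono::time_point<qpc_clock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept {
        static const long long freq = [] {
            LARGE_INTEGER f;
            QueryPerformanceFrequency(&f);   // ticks per second, fixed at boot
            return f.QuadPart;
        }();
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        // Split the conversion to avoid overflowing 64 bits.
        const long long whole = (counter.QuadPart / freq) * 1000000000LL;
        const long long part  = (counter.QuadPart % freq) * 1000000000LL / freq;
        return time_point(duration(whole + part));
    }
};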
Just because you ask it to represent the value in nanoseconds, doesn't mean that the precision of the measurement is in nanoseconds.
When you look at your output you can see that the counts are the nanosecond values divided by 100. That means the count is representing time in units of 100 nanoseconds.
But even that does not tell you the period of the underlying counter on which steady_clock is built. All you know is that it can't be better than 100 nanoseconds.
You can tell the actual period used for the counter by using the period member of steady_clock:
double periodInSeconds = double(steady_clock::period::num)
/ double(steady_clock::period::den);
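As a quick self-contained check (just a sketch; the value you get depends on your standard library implementation):
#include <chrono>
#include <iostream>

int main() {
    using std::chrono::steady_clock;
    // Tick period of steady_clock in seconds, e.g. 1e-09 if it counts nanoseconds.
    double periodInSeconds = double(steady_clock::period::num)
                           / double(steady_clock::period::den);
    std::cout << "steady_clock period: " << periodInSeconds << " s\n";
}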
Back to your question: "Why does below code produce zero as output?"
Since you haven't done any significant work between the two calls to now() it is highly unlikely that you have used up 100 nanoseconds, so the answers are the same -- hence the zero.
Here it says that sleep_for "Blocks the execution of the current thread for at least the specified sleep_duration."
Here it says that sleep_until "Blocks the execution of the current thread until specified sleep_time has been reached."
So with this in mind I was simply doing my thing, until I noticed that my code was sleeping a lot shorter than the specified time. To make sure it was the sleep_ functions behaving oddly instead of me writing dumb code again, I created this online example: https://ideone.com/9a9MrC
(code block below the edit line) When running the online example, it does exactly what it should be doing, but running the exact same code sample on my machine gives me this output: Output on Pastebin
Now I'm truly confused and wondering what the bleep is going wrong on my machine. I'm using Code::Blocks as IDE on a Win7 x64 machine in combination with This toolchain containing GCC 4.8.2.
*I had tried this toolchain before the current one, but that one, with GCC 4.8.0, strangely enough wasn't even able to compile the example code.
What could create this weird behaviour? My machine? Windows? GCC? Something else in the toolchain?
p.s. The example also works as it should on Here, which states that it uses GCC version 4.7.2
p.p.s. using #include <windows.h> and Sleep( 1 ); also sleeps a lot shorter than the specified 1 millisecond on my machine.
EDIT: code from example:
#include <iostream> // std::cout, std::fixed
#include <iomanip> // std::setprecision
//#include <string> // std::string
#include <chrono> // C++11 // std::chrono::steady_clock
#include <thread> // C++11 // std::this_thread
std::chrono::steady_clock timer;
auto startTime = timer.now();
auto endTime = timer.now();
auto sleepUntilTime = timer.now();
int main() {
for( int i = 0; i < 10; ++i ) {
startTime = timer.now();
sleepUntilTime = startTime + std::chrono::nanoseconds( 1000000 );
std::this_thread::sleep_until( sleepUntilTime );
endTime = timer.now();
std::cout << "Start time: " << std::chrono::duration_cast<std::chrono::nanoseconds>( startTime.time_since_epoch() ).count() << "\n";
std::cout << "End time: " << std::chrono::duration_cast<std::chrono::nanoseconds>( endTime.time_since_epoch() ).count() << "\n";
std::cout << "Sleep till: " << std::chrono::duration_cast<std::chrono::nanoseconds>( sleepUntilTime.time_since_epoch() ).count() << "\n";
std::cout << "It took: " << std::chrono::duration_cast<std::chrono::nanoseconds>( endTime - startTime ).count() << " nanoseconds. \n";
std::streamsize prec = std::cout.precision();
std::cout << std::fixed << std::setprecision(9);
std::cout << "It took: " << ( (float) std::chrono::duration_cast<std::chrono::nanoseconds>( endTime - startTime ).count() / 1000000 ) << " milliseconds. \n";
std::cout << std::setprecision( prec );
}
std::cout << "\n\n";
for( int i = 0; i < 10; ++i ) {
startTime = timer.now();
std::this_thread::sleep_for( std::chrono::nanoseconds( 1000000 ) );
endTime = timer.now();
std::cout << "Start time: " << std::chrono::duration_cast<std::chrono::nanoseconds>( startTime.time_since_epoch() ).count() << "\n";
std::cout << "End time: " << std::chrono::duration_cast<std::chrono::nanoseconds>( endTime.time_since_epoch() ).count() << "\n";
std::cout << "It took: " << std::chrono::duration_cast<std::chrono::nanoseconds>( endTime - startTime ).count() << " nanoseconds. \n";
std::streamsize prec = std::cout.precision();
std::cout << std::fixed << std::setprecision(9);
std::cout << "It took: " << ( (float) std::chrono::duration_cast<std::chrono::nanoseconds>( endTime - startTime ).count() / 1000000 ) << " milliseconds. \n";
std::cout << std::setprecision( prec );
}
return 0;
}
Nothing is wrong with your machine, it is your assumptions that are wrong.
Sleeping is a very system-dependent and unreliable thing. Generally, on most operating systems, you have a more-or-less-guarantee that a call to sleep will delay execution for at least the time you ask for. The C++ thread library necessarily uses the facilities provided by the operating system, hence the wording in the C++ standard that you quoted.
You will have noted the wording "more-or-less-guarantee" in the above paragraph. First of all, the way sleeping works is not what you might think. It generally does not block until a timer fires and then resumes execution. Instead, it merely marks the thread as "not ready", and additionally does something so this can be undone later (what exactly this is isn't defined, it might be setting a timer or something else).
When the time is up, the operating system will set the thread to "ready to run" again. This doesn't mean it will run; it only means it is a candidate to run, whenever the OS can be bothered and whenever a CPU core is free (and nobody with higher priority wants it).
On traditional non-tickless operating systems, this will mean that the thread will probably (or more precisely, maybe) run at the next scheduler tick. That is, if CPU is available at all. On more modern operating systems (Linux 3.x or Windows 8) which are "tickless", you're a bit closer to reality, but you still do not have any hard guarantees.
Further, under Unix-like systems, sleep may be interrupted by a signal and may actually wait less than the specified time. Also, under Windows, the interval at which the scheduler runs is configurable, and to make it worse, different Windows versions behave differently [1] [2] with respect to whether they round the sleep time up or down.
Your system (Windows 7) rounds down, so indeed yes, you may actually wait less than what you expected.
tl;dr
Sleep is unreliable and only a very rough "hint" (not a requirement) that you wish to pass control back to the operating system for some time.
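If you need a wait that is never shorter than requested despite the rounding behaviour described above, one common workaround (not part of the original answer, just a sketch) is to re-check a steady clock and keep sleeping until the deadline has actually passed:
#include <chrono>
#include <thread>

// Keep waiting until 'deadline' has really passed, even if an individual
// sleep call wakes up early (e.g. because the OS rounded the interval down).
void wait_until_at_least(std::chrono::steady_clock::time_point deadline) {
    while (std::chrono::steady_clock::now() < deadline)
        std::this_thread::sleep_until(deadline);
}

int main() {
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(1);
    wait_until_at_least(deadline);
}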
I have a program which reads two input files. The first file contains some random words, which are put into a BST and an AVL tree. Then the program looks for the words listed in the second file, reports whether they exist in the trees, and writes an output file with the information gathered. While doing this, the program prints out the time spent finding each item. However, the program does not seem to be measuring the time spent.
BST* b = new BST();
AVLTree* t = new AVLTree();
string s;
ifstream in;
in.open(argv[1]);
while(!in.eof())
{
in >> s;
b->insert(s);
t->insert(s);
}
ifstream q;
q.open(argv[2]);
ofstream out;
out.open(argv[3]);
int bstItem = 0;
int avlItem = 0;
float diff1 = 0;
float diff2 = 0;
clock_t t1, t1e, t2, t2e;
while(!q.eof())
{
q >> s;
t1 = clock();
bstItem = b->findItem(s);
t1e = clock();
diff1 = (float)(t1e - t1)/CLOCKS_PER_SEC;
t2 = clock();
avlItem = t->findItem(s);
t2e = clock();
diff2 = (float)(t2e - t2)/CLOCKS_PER_SEC;
if(avlItem == 0 && bstItem == 0)
cout << "Query " << s << " not found in " << diff1 << " microseconds in BST, " << diff2 << " microseconds in AVL" << endl;
else
cout << "Query " << s << " found in " << diff1 << " microseconds in BST, " << diff2 << " microseconds in AVL" << endl;
out << bstItem << " " << avlItem << " " << s << "\n";
}
The clock() value I get just before entering the while loop and just after finishing it is exactly the same. So it appears as if the program does not even run the while loop at all, and it prints 0. I know that is not the case, since the program takes around 10 seconds to finish, as it should. Also, the output file contains correct results, so bad findItem() functions are not the explanation either.
I did a little bit of research on Stack Overflow and saw that many people experience the same problem as me. However, none of the answers I read solved it.
I solved my problem using a higher-resolution clock, though the clock resolution was not my problem. I used clock_gettime() from time.h. As far as I know, clocks with higher resolution than clock() are platform dependent, and the particular method I used in my code is only available on Linux. I still haven't figured out why I wasn't able to obtain sensible results from clock(), but I suspect platform dependency again.
An important note: using clock_gettime() requires you to link against the POSIX real-time extensions when compiling the code.
So you should do:
g++ a.cpp b.cpp c.cpp -lrt -o myProg
where -lrt is the flag that links the POSIX real-time extensions.
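For reference, a minimal sketch of timing a block with clock_gettime(), assuming a Linux/POSIX system; CLOCK_MONOTONIC is the steady, high-resolution option:
#include <time.h>
#include <iostream>

int main() {
    timespec start{}, end{};
    clock_gettime(CLOCK_MONOTONIC, &start);
    // ... code to be timed, e.g. a call to findItem() ...
    clock_gettime(CLOCK_MONOTONIC, &end);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    std::cout << "Elapsed: " << seconds << " s\n";
}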
If (t1e - t1) is less than CLOCKS_PER_SEC and the division is performed in integer arithmetic, the result is truncated to 0. Make sure at least one operand is a floating-point type, for example by casting CLOCKS_PER_SEC to float:
diff1 = (t1e - t1)/((float)CLOCKS_PER_SEC);