I'm writing an article about GPU speed-up in a cluster environment.
To do that, I'm programming in CUDA, which is basically a C++ extension.
But as I'm a C# developer, I don't know the particularities of C++.
Is there anything I should be careful about when logging elapsed time? Any suggestions or blogs to read?
My initial idea is to make a big loop and run the program several times, 50 to 100, and log every elapsed time so I can later make some graphs of speed.
Depending on your needs, it can be as easy as:
#include <stdio.h>
#include <time.h>

time_t start = time(NULL);
// long-running process goes here
printf("time elapsed: %ld seconds\n", (long)(time(NULL) - start));
I guess you need to say how you plan for this to be logged (file or console) and what precision you need (seconds, ms, us, etc.). time gives it in seconds.
I would recommend using the Boost timer library. It is platform agnostic, and is as simple as:
#include <boost/timer.hpp>  // the legacy boost::timer with restart()/elapsed()
#include <iostream>

boost::timer t;
// do some stuff, up until when you want to start timing
t.restart();
// do the stuff you want to time
std::cout << t.elapsed() << std::endl;
Of course t.elapsed() returns a double that you can save to a variable.
Standard functions such as time often have a very low resolution. And yes, a good way to get around this is to run your test many times and take an average. Note that the first few times may be extra-slow because of hidden start-up costs - especially when using complex resources like GPUs.
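For instance, here is a minimal sketch of that warm-up-then-average pattern; run_once() is just a hypothetical stand-in for whatever you are measuring, and the iteration counts are assumptions:
#include <ctime>
#include <cstdio>

// Hypothetical stand-in for the work being measured (e.g. a kernel launch).
static void run_once() { /* ... */ }

int main() {
    const int warmup = 5, runs = 100;              // assumed counts
    for (int i = 0; i < warmup; ++i) run_once();   // discard hidden start-up costs

    time_t start = time(NULL);
    for (int i = 0; i < runs; ++i) run_once();
    double avg = double(time(NULL) - start) / runs; // coarse: time() only has second resolution
    std::printf("average: %f s per run\n", avg);
    return 0;
}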
For platform-specific calls, take a look at QueryPerformanceCounter on Windows and CFAbsoluteTimeGetCurrent on OS X. (I've not used POSIX call clock_gettime but that might be worth checking out.)
Measuring GPU performance is tricky because GPUs are remote processing units running separate instructions - often on many parallel units. You might want to visit Nvidia's CUDA Zone for a variety of resources and tools to help measure and optimize CUDA code. (Resources related to OpenCL are also highly relevant.)
Ultimately, you want to see how fast your results make it to the screen, right? For that reason, a call to time might well suffice for your needs.
I have looked into several topics to try to get some ideas on how to make a reliable clock with C or C++. However, I also saw that some functions use the processor's ticks and ticks per second to calculate the end result, which I think could be a problem on a CPU with auto-overclock like the one I have. I also saw that one of them resets after a while, so it is not really reliable.
The idea is to make a (preferably cross-platform) clock like an in-game one, with a precision better than a second, in order to be able to add the elapsed time of the "current session" to the time saved at the end of the program. This would be to count the time spent on a console game that does not have an in-game clock, and in the long run perhaps to integrate it into actual PC games.
It should be able to run without taking too much of the CPU's time (or a single core's time on multi-core CPUs), as it would be quite bad to use all these resources just for the clock, and it should also work on systems with auto-overclock (which could otherwise cause inaccurate results).
The program I would like to implement this feature into currently looks like this, but I might re-code it in C (since I have to get back to learning how to code in C++):
#include <iostream>
#include <cstdlib>

using namespace std;

int main()
{
    cout << "In game" << endl;
    system("PAUSE");
    return 0;
}
On a side note, I still need to get rid of the PAUSE feature, which is Windows-specific, but I think that can be taken care of with a simple while (getchar() != '\n') loop.
What I have already skimmed through:
Using clock() to measure execution time
Calculating elapsed time in a C program in milliseconds
Time stamp in the C programming language
Execution time of C program
C: using clock() to measure time in multi-threaded programs
Is gettimeofday() guaranteed to be of microsecond resolution?
How to measure time in milliseconds using ANSI C?
C++ Cross-Platform High-Resolution Timer
Timer function to provide time in nano seconds using C++
How to measure cpu time and wall clock time?
How can I measure CPU time and wall clock time on both Linux/Windows?
how to measure time?
resolution of std::chrono::high_resolution_clock doesn't correspond to measurements
C++ How to make timer accurate in Linux
http://gameprogrammingpatterns.com/game-loop.html
clock() accuracy
std::chrono doesn't seem to be giving accurate clock resolution/frequency
clock function in C++ with threads
(Edit: Extra research, in particular for a C implementation:
Cross platform C++ High Precision Event Timer implementation (no real answer)
Calculating Function time in nanoseconds in C code (Windows)
How to print time difference in accuracy of milliseconds and nanoseconds? (could be the best answer for a C implementation)
How to get duration, as int milli's and float seconds from <chrono>? (C++ again) )
The problem is that it is not clear whether some of the mentioned methods, like Boost or SDL2, behave properly with auto-overclock in particular.
TL;DR: What cross-platform function should I use to make an accurate, sub-second-precision counter in C/C++ that would work on multi-core and/or auto-overclocking processors, please?
Thanks in advance.
std::chrono::high_resolution_clock seems to be what you are looking for. On most modern CPUs it is a steady, monotonically increasing clock that is not affected by overclocking of the CPU.
Just keep in mind that it can't be used to tell the time of day. It is only good for measuring time intervals, which is a big difference. For example:
using clock = std::chrono::high_resolution_clock;
auto start = clock::now();
perform_operation();
auto end = clock::now();
auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Operation took " << us << " microseconds.\n";
If checking the clock is itself a performance-sensitive operation, you will have to resort to platform-specific tricks, of which the most popular is reading the CPU tick counter directly (RDTSC on the Intel family). This is very fast and, on modern CPUs, a very accurate way of measuring time intervals.
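As an illustration, a rough sketch of the RDTSC approach, assuming a GCC/Clang toolchain on x86 (MSVC exposes the same intrinsic via <intrin.h>); converting ticks to seconds needs a per-machine calibration of the TSC frequency, which is not shown:
#include <x86intrin.h>   // __rdtsc (GCC/Clang on x86)
#include <cstdint>
#include <cstdio>

int main() {
    uint64_t t0 = __rdtsc();                    // read the time-stamp counter
    volatile int sink = 0;
    for (int i = 0; i < 1000; ++i) sink += i;   // the work being timed
    uint64_t t1 = __rdtsc();

    // Ticks only: dividing by the (calibrated) TSC frequency gives seconds.
    std::printf("elapsed: %llu ticks\n", (unsigned long long)(t1 - t0));
    return 0;
}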
I need to measure the time difference between allocating normal CPU memory with new and a call to cudaMallocManaged. We are working with unified memory and are trying to figure out the trade-offs of switching things to cudaMallocManaged. (The kernels seem to run a lot slower, likely due to a lack of caching or something.)
Anyway, I am not sure the best way to time these allocations. Would one of boost's process_real_cpu_clock, process_user_cpu_clock, or process_system_cpu_clock give me the best results? Or should I just use the regular system time call in C++11? Or should I use the cudaEvent stuff for timing?
I figure that I shouldn't use the CUDA events, because they are for timing GPU processes and would not be accurate for timing CPU calls (correct me if I am wrong there). If I could use cudaEvents on just the mallocManaged one, what would be most accurate to compare against when timing the new call? I just don't know enough about memory allocation and timing. Everything I read seems to just make me more confused due to Boost's and Nvidia's shoddy documentation.
You can use CUDA events to measure the time of functions executed in the host.
cudaEventElapsedTime computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
Read more at: http://docs.nvidia.com/cuda/cuda-runtime-api/index.html
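For illustration, a minimal sketch of timing a host-side call with CUDA events; the 1 MiB size and the specific call being timed are only placeholders:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    float* p = nullptr;
    cudaMallocManaged(&p, 1 << 20);   // placeholder: the call being timed
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("cudaMallocManaged took %f ms\n", ms);

    cudaFree(p);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}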
In addition, if you are also interested in timing your kernel execution time, you will find that the CUDA event API provides cudaEventSynchronize, which blocks the execution of your code until an asynchronous call (like a kernel launch) has finished.
In any case, you should use the same metrics (always CUDA events, or boost, or your own timing) to ensure the same resolution and overhead.
The profiler nvprof shipped with the CUDA toolkit may help to understand and optimize the performance of your CUDA application.
Read more at: http://docs.nvidia.com/cuda/profiler-users-guide/index.html
I recommend:
auto t0 = std::chrono::high_resolution_clock::now();
// what you want to measure
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration<double>(t1-t0).count() << "s\n";
This will output the difference in seconds represented as a double.
Allocation algorithms usually optimize themselves as they go along. That is, the first allocation is often more expensive than the second because caches of memory are created during the first in anticipation of the second. So you may want to put the thing you're timing in a loop, and average the results.
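A minimal sketch of that loop-and-average idea, timing a plain new as a placeholder for whichever allocation you want to compare; the iteration count and allocation size are assumptions:
#include <chrono>
#include <cstddef>
#include <iostream>

int main() {
    using clock = std::chrono::high_resolution_clock;
    constexpr int iterations = 100;          // assumed repeat count
    constexpr std::size_t n = 1 << 20;       // assumed allocation size
    double total = 0.0;

    for (int i = 0; i < iterations; ++i) {
        auto t0 = clock::now();
        int* p = new int[n];                 // the allocation being timed
        auto t1 = clock::now();
        delete[] p;                          // freeing stays outside the timed span
        total += std::chrono::duration<double>(t1 - t0).count();
    }
    std::cout << "average: " << (total / iterations) << " s per allocation\n";
    return 0;
}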
Some implementations of std::chrono::high_resolution_clock have been less than spectacular, but are improving with time. You can assess your implementation with:
auto t0 = std::chrono::high_resolution_clock::now();
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration<double>(t1-t0).count() << "s\n";
That is, how fast can your implementation get the current time? If it is slow, two successive calls will demonstrate a large time in-between. On my system (at -O3) this outputs on the order of:
1.2e-07s
which means I can time something that takes on the order of 1 microsecond. To get a finer measurement than that I have to loop over many operations, and divide by the number of operations, subtracting out the loop overhead if that would be significant.
If your implementation of std::chrono::high_resolution_clock appears to be unsatisfactory, you may be able to build your own chrono clock on top of a platform-specific counter. The disadvantage is obviously a bit of non-portable work. However, you get the std::chrono duration and time_point infrastructure for free (time arithmetic and unit conversions).
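As an illustration only, a rough sketch of such a hand-rolled clock, assuming QueryPerformanceCounter on Windows as the backend (other platforms would need their own now()):
#include <windows.h>
#include <chrono>

// A hand-rolled chrono clock backed by QueryPerformanceCounter (Windows only).
struct qpc_clock {
    using rep        = long long;
    using period     = std::nano;
    using duration   = std::chrono::nanoseconds;
    using time_point = std::chrono::time_point<qpc_clock>;
    static constexpr bool is_steady = true;

    static time_point now() {
        static const long long freq = [] {
            LARGE_INTEGER f;
            QueryPerformanceFrequency(&f);
            return f.QuadPart;
        }();
        LARGE_INTEGER c;
        QueryPerformanceCounter(&c);
        // Convert ticks to nanoseconds; the multiply can overflow after very long uptimes.
        return time_point(duration(c.QuadPart * 1000000000LL / freq));
    }
};

int main() {
    auto t0 = qpc_clock::now();
    auto t1 = qpc_clock::now();
    // t1 - t0 is a std::chrono::nanoseconds duration, usable with the rest of <chrono>.
}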
I am currently implementing a PID controller for a project, but I realized I don't know how to ensure a fixed interval for each iteration. I want the PID controller to run at a frequency of 10 Hz, but I don't want to use any sleep functions or anything else that would slow down the thread it's running in. I've looked around, but I cannot for the life of me find any good topics/functions that simply give me an accurate measurement of milliseconds. Those that I have found simply use time_t or clock_t, but time_t only seems to give seconds(?) and clock_t will vary greatly depending on different factors.
Is there any clean and good way to simply see if it's been >= 100 milliseconds since a given point in time in C++? I'm using the Qt5 framework and OpenCV library and the program is running on an ODROID X-2, if that's of any helpful information to anyone.
Thank you for reading, Christian.
I don't know much about the ODROID X-2 platform, but if it's at all unixy you may have access to gettimeofday or clock_gettime, either of which would provide a higher-resolution clock if available on your hardware.
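A minimal sketch of that elapsed-time check using clock_gettime with CLOCK_MONOTONIC (POSIX only); update_pid() is a hypothetical stand-in for the controller step:
#include <time.h>

static void update_pid() { /* hypothetical controller step */ }

int main() {
    timespec last{}, now{};
    clock_gettime(CLOCK_MONOTONIC, &last);   // monotonic: immune to wall-clock changes

    for (int i = 0; i < 1000000; ++i) {      // stand-in for the real main loop
        clock_gettime(CLOCK_MONOTONIC, &now);
        long ms = (now.tv_sec - last.tv_sec) * 1000L
                + (now.tv_nsec - last.tv_nsec) / 1000000L;
        if (ms >= 100) {                     // at least 100 ms have passed: 10 Hz tick
            update_pid();
            last = now;
        }
        // the rest of the loop's work goes here
    }
    return 0;
}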
I have run into a problem and I'd love to hear some suggestions from you.
I have a C++ program that uses a precompiled library to make some queries to a PostgreSQL database. Now the problem is that I want to find out the total (combined) CPU time it takes to do all the routines described in the source code of the program, and also the time it spends waiting for database-related activities.
I used the time command in Linux, but it seems that it didn't measure the time the program spent on the database.
And in my situation, it won't be possible for me to recompile the library provided to me, so I don't think things like gprof would work.
Any suggestions?
Thank you.
Try the clock function from <ctime>.
#include <ctime>

clock_t start, end;
double cpu_time_used;

start = clock();
// Do stuff
end = clock();
cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC;  // CPU time in seconds
Use POSIX's times(); it measures the real, user, and system time of a process and its children.
There is an example on the linked Opengroup page: "Timing a Database Lookup"
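A short sketch of what that looks like with times(); the query call in the middle is only a placeholder:
#include <sys/times.h>
#include <unistd.h>
#include <iostream>

int main() {
    tms before{}, after{};
    clock_t wall_before = times(&before);
    // run_queries();  // placeholder for the database work (including child processes)
    clock_t wall_after = times(&after);

    const double tps = sysconf(_SC_CLK_TCK);          // clock ticks per second
    std::cout << "wall:         " << (wall_after - wall_before) / tps << " s\n"
              << "user:         " << (after.tms_utime - before.tms_utime) / tps << " s\n"
              << "system:       " << (after.tms_stime - before.tms_stime) / tps << " s\n"
              << "child user:   " << (after.tms_cutime - before.tms_cutime) / tps << " s\n"
              << "child system: " << (after.tms_cstime - before.tms_cstime) / tps << " s\n";
    return 0;
}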
Of course you'll get the wall-clock time anyway, but presumably you're trying to get the CPU time.
This is nontrivial when you have subprocesses (or unrelated processes) involved. However, you may want to try to have a more holistic approach to benchmarking.
Measuring the latency of an application is easy enough (just watch the wall-clock) but throughput is generally harder.
To get an idea of how an application behaves under load, you need to put it under load (on production-grade hardware), in a reproducible way.
This normally means hitting it with lots of tasks concurrently, as modern hardware tends to be able to do several things at once. Moreover, if anything in your app ever waits for any external data source (including the hard drive of your own machine potentially), you can get better throughput even on a single core by having multiple requests being served at once.
You may want to look at tools like oprofile, which is designed for profiling, not benchmarking.
You can turn on log_statement and log_duration and set log_min_duration_statement = 0 in postgresql.conf, run your program, and then analyze the Postgres logs, for example with PQA.
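For reference, the relevant postgresql.conf lines would look roughly like this; the specific values for log_statement and log_duration are one reasonable choice, not prescribed above:
# postgresql.conf
log_statement = 'all'              # log every statement
log_duration = on                  # log each statement's duration
log_min_duration_statement = 0     # 0 = log duration for all statements, no threshold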
I am curious: is there a built-in function in C++ for measuring execution time?
I am using Windows at the moment. In Linux it's pretty easy...
The best way on Windows, as far as I know, is to use QueryPerformanceCounter and QueryPerformanceFrequency.
QueryPerformanceCounter(LARGE_INTEGER*) places the performance counter's value into the LARGE_INTEGER passed.
QueryPerformanceFrequency(LARGE_INTEGER*) places the frequency the performance counter is incremented into the LARGE_INTEGER passed.
You can then find the execution time by recording the counter as execution starts, and then recording the counter when execution finishes. Subtract the start from the end to get the counter's change, then divide by the frequency to get the time in seconds.
#include <windows.h>
#include <iostream>

LARGE_INTEGER start, finish, freq;
QueryPerformanceFrequency(&freq);   // ticks per second
QueryPerformanceCounter(&start);
// Do something
QueryPerformanceCounter(&finish);
std::cout << "Execution took "
          << ((finish.QuadPart - start.QuadPart) / (double)freq.QuadPart)
          << " seconds" << std::endl;
It's pretty easy under Windows too - in fact it's the same function on both platforms: std::clock, defined in <ctime>.
You can use the Windows API function GetTickCount() and compare the values at start and end. The resolution is in the 16 ms ballpark. If for some reason you need more fine-grained timings, you'll need to look at QueryPerformanceCounter.
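A minimal sketch with GetTickCount(); the work in the middle is a placeholder:
#include <windows.h>
#include <iostream>

int main() {
    DWORD start = GetTickCount();             // milliseconds since system start (~16 ms resolution)
    // work being timed
    DWORD elapsed = GetTickCount() - start;   // unsigned subtraction handles wrap-around
    std::cout << "elapsed: " << elapsed << " ms\n";
    return 0;
}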
C++ has no built-in functions for measuring code execution time at high granularity; you have to resort to platform-specific code. For Windows, try QueryPerformanceCounter: http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
The functions you should use depend on the resolution of the timer you need. Some of them give 10 ms resolution; those functions are easier to use. Others require more work but give much higher resolution (and might cause you some headaches in some environments; your dev machine might work fine, though).
http://www.geisswerks.com/ryan/FAQS/timing.html
This article mentions:
timeGetTime
RDTSC (a processor feature, not an OS feature)
QueryPerformanceCounter
C++ works on many platforms. Why not use something that also works on many platforms, such as the Boost libraries.
Look at the documentation for the Boost Timer Library
I believe that it is a header-only library, which means that it is simple to set up and use...