Measuring new/old code CPU usage on specific functions - C++

Long story short, I'm recoding a very CPU-hungry app: I'm going to restructure it in an entirely different way and change a lot of its inner workings. I was looking for a good way to compare old and new results.
Say I start by changing how function foo() works:
I want to have the program run for, say, 60 seconds and measure the % of CPU that function is using within the program's total CPU usage. If the program sits at a constant 25%, I want to know how much of that 25% is my function. Then I'll run the same test after changing the code, and I'll have two good indicators of whether I actually improved things.
I've tried Very Sleepy, but I can't get access to the functions I wanted; they do not show up. I want to be able to see the % usage of the function I CODED MYSELF that uses the library's functions (SDL), yet it will only show me SDL functions.

There are a few different ways, one of which is to simply add a high-precision timer call at the start and end of the function. Depending on the number of calls to your function, you can either accumulate the time, e.g.:
typedef type_of_time_source tt;

tt total = 0;

void my_func(....)
{
    tt time = gettime();
    ... lots of your code ...
    time = gettime() - time;
    total += time;
}
Or you can store the individual intervals, e.g.
tt array[LARGE_NUMBER];
int index = 0;

... same code as above ...
time = gettime() - time;
if (index >= LARGE_NUMBER) index = 0;   // wrap around and overwrite the oldest entries
array[index++] = time;
Of course, if your calls to SDL are in the middle of your function, you need to discount that time one way or another.
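For illustration, here is a minimal sketch of that subtraction, using std::chrono as the time source and a hypothetical do_sdl_work() wrapper standing in for the SDL calls (both names are placeholders, not part of the question's code):

#include <chrono>

using clk = std::chrono::steady_clock;

void do_sdl_work();        // hypothetical wrapper around the SDL calls

clk::duration my_total{};  // time spent in my own code, SDL excluded

void my_func()
{
    auto start = clk::now();

    // ... my own code ...

    auto sdl_start = clk::now();
    do_sdl_work();                          // library time we don't want to count
    auto sdl_time = clk::now() - sdl_start;

    // ... more of my own code ...

    my_total += (clk::now() - start) - sdl_time;
}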
Another method would be to measure the individual timing of several functions:
enum {
    FUNCA,
    FUNCB,
    ....
    MAX_TIMINGS
};

struct timing_val
{
    tt start, end;
    const char *name;
};

struct timing_val timing_values[MAX_TIMINGS];

#define START(f) do { timing_values[f].name = #f; timing_values[f].start = gettime(); } while (0)
#define END(f)   do { timing_values[f].end = gettime(); } while (0)

void report()
{
    // assumes <iostream> and using namespace std
    for (int i = 0; i < MAX_TIMINGS; i++)
    {
        if (timing_values[i].start != 0 && timing_values[i].end != 0)
            cout << timing_values[i].name << " time = "
                 << timing_values[i].end - timing_values[i].start << endl;
    }
}
void big_function()
{
    START(FUNCA);
    funca();
    END(FUNCA);
    START(FUNCB);
    funcb();
    END(FUNCB);
    ...
    report();
}
I've certainly used all of these methods, and as long as the functions you measure do a reasonable amount of work, they shouldn't add much overhead.
You can also measure several functions at once; e.g. if we want to time the WHOLE function, we could just add BIG_FUNCTION to the enum list above and do this:
void big_function()
{
    START(BIG_FUNCTION);
    START(FUNCA);
    funca();
    END(FUNCA);
    START(FUNCB);
    funcb();
    END(FUNCB);
    ...
    END(BIG_FUNCTION);
    report();
}
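Since the original question asks for a percentage of the total rather than raw times, report() can also be extended to print each sub-function's share of the whole; a sketch along those lines, reusing the hypothetical tt/gettime() machinery from above:

// Hypothetical extension of report(): print each sub-function's share of
// BIG_FUNCTION's interval (assumes the enum, struct and macros defined above,
// plus <iostream> and using namespace std).
void report_percent()
{
    tt whole = timing_values[BIG_FUNCTION].end - timing_values[BIG_FUNCTION].start;
    for (int i = 0; i < MAX_TIMINGS; i++)
    {
        if (i == BIG_FUNCTION || timing_values[i].start == 0)
            continue;
        tt part = timing_values[i].end - timing_values[i].start;
        cout << timing_values[i].name << " = "
             << (100.0 * part) / whole << "% of the whole function" << endl;
    }
}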

Related

Is Increment Speed Affected By Clock Rate?

Consider the loop below. This is a simplified example of a problem I am trying to solve. I want to limit the number of times the doSomething function is called each second. Since the loop runs very fast, I thought I could use a rate limiter. Let's assume that I have found an appropriate value by running it with different values of x.
unsigned int incrementionRate = x;
unsigned int counter = 0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    counter = (counter + 1) % incrementionRate;
    if (counter == 0) {
        doSomething();
    }
}
I wonder if the number of calls to the doSomething function would be lower if I were running at a lower clock rate. In that case, I would like to limit the calls to doSomething to once per second. The second loop I have written is below.
float epsilon = 0.0001;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    if (abs(seconds - floor(seconds)) <= epsilon) {
        doSomething();
    }
}
Would that do the trick for different clock rates, or are there still problems? Also, I would like to know if there is a better way of doing this. I have never worked with clock rates before and am trying to understand how concerns differ when working with limited resources.
Note: Using sleep is not an option.
If I understand the issue properly, you could use a std::chrono::steady_clock time point that you just add a second to every time a second has passed.
Example:
#include <chrono>

auto end_time = std::chrono::steady_clock::now();

while (true) {
    // only call doSomething once a second
    if (end_time < std::chrono::steady_clock::now()) {
        doSomething();
        // set a new end time a second after the previous one
        end_time += std::chrono::seconds(1);
    }
    // do something else
}
Ted's answer is fine if you are really doing something else in the loop; if not, though, this results in a busy wait that just consumes your CPU for nothing.
In such a case you should rather prefer letting your thread sleep:
std::chrono::milliseconds offset(200);
auto next = std::chrono::steady_clock::now();

for (;;)
{
    doSomething();
    next += offset;
    std::this_thread::sleep_until(next);
}
You'll need to include the <chrono> and <thread> headers for this.
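Put together, a self-contained sketch (doSomething() here is just a stand-in that prints something):

#include <chrono>
#include <iostream>
#include <thread>

void doSomething() { std::cout << "tick\n"; }   // stand-in for the real work

int main()
{
    std::chrono::milliseconds offset(200);
    auto next = std::chrono::steady_clock::now();

    for (;;)
    {
        doSomething();
        next += offset;                           // schedule the next run
        std::this_thread::sleep_until(next);      // sleep instead of busy-waiting
    }
}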
I decided to go with a much simpler approach in the end: an adjustable time interval and just storing the latest update time, without introducing any new mechanism. Honestly, I don't know why I couldn't think of it at first. Overthinking is a problem. :)
double lastUpdateTimestamp = 0;
const double updateInterval = 1.0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    if ((seconds - lastUpdateTimestamp) >= updateInterval) {
        doSomething();
        lastUpdateTimestamp = seconds;
    }
}
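For reference, the same pattern can be written with std::chrono instead of the framework-specific getElapsedSeconds(); a sketch, with doSomething() assumed to exist elsewhere:

#include <chrono>

void doSomething();   // assumed to exist elsewhere

void run_loop()
{
    using clock = std::chrono::steady_clock;
    const auto updateInterval = std::chrono::seconds(1);
    auto lastUpdate = clock::now();

    while (true)
    {
        // ... rest of the loop body ...
        if (clock::now() - lastUpdate >= updateInterval)
        {
            doSomething();
            lastUpdate = clock::now();
        }
    }
}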

SetLocalTime causing PC to lag, how can I optimize it?

I want to create a program that will slow down Windows' time. I will be using SetLocalTime() for this. However, when I run the program, my PC starts to micro-stutter and game performance drops even though the process isn't using nearly any CPU.
#include <iostream>
#include "Windows.h"
#include <thread>
#include <chrono>

using namespace std;

SYSTEMTIME st;
WORD hour;
WORD minute;
WORD second = 0;

int main()
{
    GetLocalTime(&st);
    hour = st.wHour;
    minute = st.wMinute;
    second = st.wSecond;

    for (;;)
    {
        for (int i = 0; i < 4; i++)
        {
            this_thread::sleep_for(chrono::milliseconds(500));
            st.wHour = hour;
            st.wMinute = minute;
            st.wSecond = second;
            SetLocalTime(&st);
        }
        second++;
        if (second == 60)
        {
            second = 0;
            minute++;
        }
        if (minute == 60)
        {
            minute = 0;
            hour++;
        }
    }
}
If you change the system clock, all the programs that use it for timing will also slow down.
From your comments, I could gather that you wish to time scale an application that uses time. So far, you didn't get more specific, so I cannot suggest anything more than a general approach.
Create a time manager class that, when you start your application, gets the current system time and store it as the base time of your application. Instead of using GetLocalTime() or GetSystemTime(), create a method in your class that will return the current time based on a time dilatation factor.
class TimeManager
{
private:
    SYSTEMTIME _BaseTime;
    double _TimeDilatation;
public:
    TimeManager();
    void SetTimeDilatation(double timeDilatation);
    void GetTime(LPSYSTEMTIME lpSystemTime);
};
// Constructor will get the current local time.
TimeManager::TimeManager()
{
    GetLocalTime(&_BaseTime);
    _TimeDilatation = 1.0;   // normal flow of time until SetTimeDilatation() is called
}

// Sets the time dilatation factor.
// 0.0 to 0.9 time will slow down
// 1.0 normal flow of time
// 1.1 to max double, time will go faster
void TimeManager::SetTimeDilatation(double timeDilatation)
{
    _TimeDilatation = timeDilatation;
}
// Get the current time taking into account time dilatation
void TimeManager::GetTime(LPSYSTEMTIME lpSystemTime)
{
    SYSTEMTIME resultingTime;
    SYSTEMTIME realTime;
    FILETIME ftime;
    ULARGE_INTEGER uliTime;
    __int64 lowerValue, higherValue, result;

    // Get the current local time
    GetLocalTime(&realTime);

    // Translate the base time into a large integer for subtraction
    SystemTimeToFileTime(&_BaseTime, &ftime);
    uliTime.LowPart = ftime.dwLowDateTime;
    uliTime.HighPart = ftime.dwHighDateTime;
    lowerValue = uliTime.QuadPart;

    // Translate the current time into a large integer for subtraction
    SystemTimeToFileTime(&realTime, &ftime);
    uliTime.LowPart = ftime.dwLowDateTime;
    uliTime.HighPart = ftime.dwHighDateTime;
    higherValue = uliTime.QuadPart;

    // Get the time difference and multiply by the dilatation factor
    result = static_cast<__int64>((higherValue - lowerValue) * _TimeDilatation);

    // Apply the difference to the base time value
    result = lowerValue + result;

    // Convert the new time back into a SYSTEMTIME value
    uliTime.QuadPart = result;
    ftime.dwLowDateTime = uliTime.LowPart;
    ftime.dwHighDateTime = uliTime.HighPart;
    FileTimeToSystemTime(&ftime, &resultingTime);

    // Assign it to the pointer passed in parameter, and feel like a Time Lord.
    *lpSystemTime = resultingTime;
}
int main()
{
    TimeManager TM;
    TM.SetTimeDilatation(0.75); // time will pass at 75% of normal speed
    for (;;)
    {
        SYSTEMTIME before, after;
        TM.GetTime(&before);
        // Do something that should take exactly one minute to process.
        TM.GetTime(&after);
        // Inspect the values of before and after: you'll see
        // that only 45 seconds have passed.
    }
}
Note that this is a general idea to push you in the right direction. I haven't compiled that code, so there may be an error or five. Feel free to point them out and I'll fix my post. I just didn't want to be too specific, since your question is a bit broad; this code may or may not help you depending on your use case. But that's generally how you slow down time without affecting the system time.

I can't make my function calculate how much time has passed and print stuff accordingly

bool IsGameEnded()
{
    static int i = 0;
    i++;
    if (i == 10)
        return true;
    return false;
}

int main()
{
    bool GameEnd = false;
    float ElapsedTime = 0;

    while (!GameEnd)
    {
        chrono::steady_clock::time_point StartingTime = chrono::steady_clock::now();

        if (ElapsedTime > 10)
        {
            ElapsedTime = 0;
            draw();
        }

        GameEnd = IsGameEnded();

        chrono::steady_clock::time_point EndingTime = chrono::steady_clock::now();
        ElapsedTime = ElapsedTime + chrono::duration_cast<chrono::milliseconds>(EndingTime - StartingTime).count();
    }
    return 0;
}
I want to make a snake game. It will be based on time; for example, the screen will update every 5 seconds or so. For that I used the chrono library. I am not used to it and am trying to learn it, so I might have missed something obvious. The problem is that main never gets into the if block, so it draws nothing to the console.
I tried debugging by running it line by line. That isn't quite like a normally running program, because the time intervals get long, but then it enters the if block every time. Also, if I make the if condition 2 nanoseconds it works too, but since cout cannot print that fast I need the interval to be a lot longer than that. While debugging I also realised that the "StartingTime" and "EndingTime" variables don't seem to get updated (unless I stop directly on them). The interesting part is that if I add a cout after the if block, after a while the program starts entering the if block.
When you do:
chrono::duration_cast<chrono::milliseconds>(EndingTime - StartingTime).count();
not enough time has passed, and the count of milliseconds always returns 0. This means you always add 0 to ElapsedTime and it never crosses 10.
One fix is to use a finer unit:
chrono::duration_cast<chrono::nanoseconds>(EndingTime - StartingTime).count();
as you mentioned in the question, and adjust the if condition appropriately.
However, the best fix would be to change ElapsedTime from a float to a chrono::duration (of the appropriate unit) since that is the unit that the variable represents. This would let you avoid having to do .count() on the duration as well.
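For example, a sketch of that change, keeping the rest of the loop as in the question (draw() and IsGameEnded() as defined there):

#include <chrono>

int main()
{
    using namespace std::chrono;

    bool GameEnd = false;
    steady_clock::duration ElapsedTime{};             // a duration instead of a float

    while (!GameEnd)
    {
        steady_clock::time_point StartingTime = steady_clock::now();

        if (ElapsedTime > milliseconds(10))           // compare durations directly
        {
            ElapsedTime = steady_clock::duration{};
            draw();
        }

        GameEnd = IsGameEnded();

        steady_clock::time_point EndingTime = steady_clock::now();
        ElapsedTime += EndingTime - StartingTime;     // no duration_cast or count() needed
    }
    return 0;
}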

Function consuming wall time for specific amount of time

I have a function which has a factor that needs to be adjusted according to the load on the machine to consume exactly the wall time passed to the function. The factor can vary according to the load of the machine.
void execute_for_wallTime(int factor, int wallTime)
{
    double d = 0;
    for (int n = 0; n < factor; ++n)
        for (int m = 0; m < wallTime; ++m)
            d += d * n * m;
}
Is there a way to dynamically check the load on the machine and adjust the factor accordingly in order to consume exactly the wall time passed to the function?
The wall time is read from a file and passed to this function. The values are in microseconds, e.g.:
73
21
44
According to the OP's comment:
#include <sys/time.h>

// Time in microseconds between the two timevals (might require longs anyway).
int deltaTime(struct timeval *tv1, struct timeval *tv2) {
    return ((tv2->tv_sec - tv1->tv_sec) * 1000000) + tv2->tv_usec - tv1->tv_usec;
}

void execute_for_wallTime(int wallTime)
{
    struct timeval tvStart, tvNow;
    gettimeofday(&tvStart, NULL);

    double d = 0;
    for (int m = 0; wallTime; ++m) {
        gettimeofday(&tvNow, NULL);
        if (deltaTime(&tvStart, &tvNow) >= wallTime) { // if wallTime is 1000 microseconds,
                                                       // this function returns after
                                                       // 1000 microseconds (and a
                                                       // little more due to overhead)
            return;
        }
        d += d * m;
    }
}
Now deal with wallTime by increasing or decreasing it in logic outside this function, depending on your performance calculations. This function simply runs for wallTime microseconds.
For C++ style, you can use std::chrono.
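A sketch of the same function written with std::chrono (the wall time is still passed in as microseconds):

#include <chrono>

// Same busy loop as above, but timed with std::chrono instead of gettimeofday().
void execute_for_wallTime_chrono(int wallTimeMicros)
{
    using namespace std::chrono;

    auto start = steady_clock::now();
    double d = 0;

    for (int m = 0; ; ++m) {
        if (steady_clock::now() - start >= microseconds(wallTimeMicros)) {
            return;     // the requested wall time has elapsed
        }
        d += d * m;     // filler work, as in the original
    }
}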
I must comment that I would handle things differently, for example by calling nanosleep(). The operations make no sense unless, in the actual code, you plan to substitute these "fillers" with real operations; in that case you might consider threads and schedulers. Besides, the clock calls add overhead.

Would a pre-calculated variable be faster than calculating it every time in a loop?

In a function that updates all particles I have the following code:
for (int i = 0; i < _maxParticles; i++)
{
    // check if active
    if (_particles[i].lifeTime > 0.0f)
    {
        _particles[i].lifeTime -= _decayRate * deltaTime;
    }
}
This decreases the lifetime of the particle based on the time that passed.
The product gets calculated on every iteration, so with 10000 particles that isn't very efficient, because it doesn't need to be recalculated (it doesn't change anyway).
So I came up with this:
float lifeMin = _decayRate * deltaTime;

for (int i = 0; i < _maxParticles; i++)
{
    // check if active
    if (_particles[i].lifeTime > 0.0f)
    {
        _particles[i].lifeTime -= lifeMin;
    }
}
This calculates it once and stores it in a variable that gets read on every iteration, so the CPU doesn't have to recalculate it each time, which should theoretically increase performance.
Would it run faster than the old code? Or does the release compiler do optimizations like this?
I wrote a program that compares both methods:
#include <time.h>
#include <iostream>

const unsigned int MAX = 1000000000;

int main()
{
    float deltaTime = 20;
    float decayRate = 200;
    float foo = 2041.234f;

    unsigned int start = clock();
    for (unsigned int i = 0; i < MAX; i++)
    {
        foo -= decayRate * deltaTime;
    }
    std::cout << "Method 1 took " << clock() - start << "ms\n";

    start = clock();
    float calced = decayRate * deltaTime;
    for (unsigned int i = 0; i < MAX; i++)
    {
        foo -= calced;
    }
    std::cout << "Method 2 took " << clock() - start << "ms\n";

    int n;
    std::cin >> n;
    return 0;
}
Result in debug mode:
Method 1 took 2470ms
Method 2 took 2410ms
Result in release mode:
Method 1 took 0ms
Method 2 took 0ms
But that comparison doesn't really work. I know the benchmark doesn't do exactly the same thing as the particle loop, but it gives an idea.
In debug mode, they take roughly the same time. Sometimes Method 1 is faster than Method 2 (especially at smaller counts), sometimes Method 2 is faster.
In release mode, both take 0 ms. A little weird.
I tried measuring it in the game itself, but there aren't enough particles to get a clear result.
EDIT
I tried to disable optimizations, and let the variables be user inputs using std::cin.
Here are the results:
Method 1 took 2430ms
Method 2 took 2410ms
It will almost certainly make no difference whatsoever, at least if you compile with optimization (and of course, if you're concerned with performance, you are compiling with optimization). The optimization in question is called loop-invariant code motion, and it is universally implemented (and has been for about 40 years).
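Conceptually, the compiler rewrites the loop roughly like this (a sketch of the transformation, not the actual generated code):

// What you wrote:
for (int i = 0; i < _maxParticles; i++)
    if (_particles[i].lifeTime > 0.0f)
        _particles[i].lifeTime -= _decayRate * deltaTime;

// What loop-invariant code motion effectively turns it into:
const float hoisted = _decayRate * deltaTime;   // computed once, outside the loop
for (int i = 0; i < _maxParticles; i++)
    if (_particles[i].lifeTime > 0.0f)
        _particles[i].lifeTime -= hoisted;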
On the other hand, it may make sense to use the separate variable anyway, to make the code clearer. This depends on the application, but in many cases, giving a name to the result of an expression can make code clearer. (In other cases, of course, throwing in a lot of extra variables can make it less clear. It all depends on the application.)
In any case, for such things, write the code as clearly as possible first, and then, if (and only if) there is a performance problem, profile to see where it is, and fix that.
EDIT:
Just to be perfectly clear: I'm talking about this sort of code optimization in general. In the exact case you show, since you don't use foo, the compiler will probably remove it (and the loops) completely.
In theory, yes. But your loop is extremely simple and thus likely to be heavily optimized.
Try the -O0 option to disable all compiler optimizations.
The release runtime might be caused by the compiler statically computing the result.
I am pretty confident that any decent compiler will replace your loops with the following code:
foo -= MAX * decayRate * deltaTime;
and
foo -= MAX * calced;
You can make the MAX size depending on some kind of input (e.g. command line parameter) to avoid that.
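For instance, a sketch of the benchmark with the iteration count taken from the command line and foo actually used afterwards, so the compiler can neither fold the loops into a constant nor remove them as dead code:

#include <cstdlib>
#include <ctime>
#include <iostream>

int main(int argc, char* argv[])
{
    // The iteration count comes from the command line, so it is not a compile-time constant.
    const unsigned int max = (argc > 1) ? std::strtoul(argv[1], nullptr, 10) : 1000000000u;

    float deltaTime = 20;
    float decayRate = 200;
    float foo = 2041.234f;

    std::clock_t start = std::clock();
    for (unsigned int i = 0; i < max; i++)
        foo -= decayRate * deltaTime;
    std::cout << "Method 1 took " << std::clock() - start << " clock ticks\n";

    start = std::clock();
    const float calced = decayRate * deltaTime;
    for (unsigned int i = 0; i < max; i++)
        foo -= calced;
    std::cout << "Method 2 took " << std::clock() - start << " clock ticks\n";

    // Print foo so the loops have an observable effect and cannot be removed entirely.
    std::cout << foo << "\n";
    return 0;
}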