Help with code optimization - c++

I've written a little particle system for my 2d-application. Here is raining code:
// HPP -----------------------------------
struct Data
{
float x, y, x_speed, y_speed;
int timeout;
Data();
};
std::vector<Data> mData;
bool mFirstTime;
void processDrops(float windPower, int i);
// CPP -----------------------------------
Data::Data()
: x(rand()%ScreenResolutionX), y(0)
, x_speed(0), y_speed(0), timeout(rand()%130)
{ }
void Rain::processDrops(float windPower, int i)
{
int posX = rand() % mWindowWidth;
mData[i].x = posX;
mData[i].x_speed = WindPower*0.1; // WindPower is float
mData[i].y_speed = Gravity*0.1; // Gravity is 9.8 * 19.2
// If that is first time, process drops randomly with window height
if (mFirstTime)
{
mData[i].timeout = 0;
mData[i].y = rand() % mWindowHeight;
}
else
{
mData[i].timeout = rand() % 130;
mData[i].y = 0;
}
}
void update(float windPower, float elapsed)
{
// If this is first time - create array with new Data structure objects
if (mFirstTime)
{
for (int i=0; i < mMaxObjects; ++i)
{
mData.push_back(Data());
processDrops(windPower, i);
}
mFirstTime = false;
}
for (int i=0; i < mMaxObjects; i++)
{
// Sleep until uptime > 0 (To make drops fall with randomly timeout)
if (mData[i].timeout > 0)
{
mData[i].timeout--;
}
else
{
// Find new x/y positions
mData[i].x += mData[i].x_speed * elapsed;
mData[i].y += mData[i].y_speed * elapsed;
// Find new speeds
mData[i].x_speed += windPower * elapsed;
mData[i].y_speed += Gravity * elapsed;
// Drawing here ...
// If drop has been falled out of the screen
if (mData[i].y > mWindowHeight) processDrops(windPower, i);
}
}
}
So the main idea is: I have some structure which consist of drop position, speed. I have a function for processing drops at some index in the vector-array. Now if that's first time of running I'm making array with max size and process it in cycle.
But this code works slower that all another I have. Please, help me to optimize it.
I tried to replace all int with uint16_t but I think it doesn't matter.

Replacing int with uint16_t shouldn't do any difference (it'll take less memory, but shouldn't affect running time on most machines).
The shown code already seems pretty fast (it's doing only what it's needed to do, and there are no particular mistakes), I don't see how you could optimize it further (at most you could remove the check on mFirstTime, but that should make no difference).
If it's slow it's because of something else. Maybe you've got too many drops, or the rest of your code is so slow that update gets called little times per second.
I'd suggest you to profile your program and see where most time is spent.
EDIT:
one thing that could speed up such algorithm, especially if your system hasn't got an FPU (! That's not the case of a personal computer...), would be to replace your floating point values with integers.
Just multiply the elapsed variable (and your constants, like those 0.1) by 1000 so that they will represent milliseconds, and use only integers everywhere.

Few points:
Physics is incorrect: wind power should be changed as speed makes closed to wind speed, also for simplicity I would assume that initial value of x_speed is the speed of the wind.
You don't take care the fraction with the wind at all, so drops getting faster and faster. but that depends on your want to model.
I would simply assume that drop fails in constant speed in constant direction because this is really what happens very fast.
Also you can optimize all this very simply as you don't need to solve motion equation using integration as it can be solved quite simply directly as:
x(t):= x_0 + wind_speed * t
y(t):= y_0 - fall_speed * t
This is the case of stable fall when the gravity force is equal to friction.
x(t):= x_0 + wind_speed * t;
y(t):= y_0 - 0.5 * g * t^2;
If you want to model drops that fall faster and faster.

Few things to consider:
In your processDrops function, you pass in windPower but use some sort of class member or global called WindPower, is that a typo? If the value of Gravity does not change, then save the calculation (i.e. mult by 0.1) and use that directly.
In your update function, rather than calculating windPower * elapsed and Gravity * elapsed for every iteration, calculate and save that before the loop, then add. Also, re-organise the loop, there is no need to do the speed calculation and render if the drop is out of the screen, do the check first, and if the drop is still in the screen, then update the speed and render!
Interestingly, you never check to see if the drop is out of the screen interms of it's x co-ordinate, you check the height, but not the width, you could save yourself some calculations and rendering time if you did this check as well!

In loop introduce reference Data& current = mData[i] and use it instead of mData[i]. And use this reference instead of index also in procesDrops.
BTW I think that consulting mFirstTime in processDrops serves no purpose because it will never be true. Hmm, I missed processDrops in initialization loop. Never mind this.

This looks pretty fast to me already.
You could get some tiny speedup by removing the "firsttime" code and putting it in it's own functions to call once rather that testing every calls.
You are doing the same calculation on lots of similar data so maybe you could look into using SSE intrinsics to process several items at once. You'l likely have to rearrange your data structure for that though to be a structure of vectors rather than a vector od structures like now. I doubt it would help too much though. How many items are in your vector anyway?

It looks like maybe all your time goes into ... Drawing Here.
It's easy enough to find out for sure where the time is going.

Related

SDL/C++ movement: time and acceleration

I want to teach myself some basic physics programming by creating a simple 2d platformer with SDL 2. It seems I'm falling at the first hurdle though, because I can't get movement using both velocity and acceleration per time unit, rather than per frame, to work.
I start by calculating the time per frame in the usual way:
previous_time = current_time;
current_time = SDL_GetTicks();
delta_time = current_time - previous_time;
Then, after the movement flag is set to true by pressing a directional button, this is passed to a function to handle the movement.
//Pass the movement flag and the milliseconds per frame to the right movement function.
if ( player.get_x() <= 740 ) {
player.x_movement_right(delta_time, 1, moving_right);
}
The integer that's passed doesn't do anything yet. Anyway, the function then determines the acceleration based on if the movement flag is set to true, and what the current velocity is:
void Player::x_movement_right(float dt, int direction, bool moving_right) {
dt /= 1000;
if (moving_right == true && _x_velocity <= 200 ) {
_x_acceleration = 50;
}
else if ( moving_right == false && _x_velocity > 0 ) {
_x_acceleration = -50;
}
else {
_x_acceleration = 0;
}
_x_velocity += _x_acceleration * dt;
_x_position += _x_velocity * dt;
}
The same process occurs if the left movement flag is activated, with inverted values of course. Yet after compiling I hit the directional keys and nothing happens.
What I've already tried:
I removed the dt's at the bottom of the movement function. The player avatar moves with incredible speed, since the acceleration is now per frame, rather than per second.
Same thing when I don't divide dt at the beginning of the movement function, since it's now per millisecond rather than per second.
I tried rounding the velocity times dt at the bottom, since I suspected SDL might have trouble calculation positions with floating point numbers rather than integers. Still no movement.
Based on this I suspect it has something to do with the numbers being too small, but I can't quite wrap my head around what the problem is or how to solve it. So, does anyone know what undoubtedly obvious thing I'm missing? Thanks in advance!
There is no way to know with the information shown, but there are several points that may help you:
Are your _x_velocity and the like floating point types? In what units are you measuring distance? It may be that your increment has not enough resolution to be nonzero.
Have you printed the values of each variable or run the program in a debugger?
What do you mean by "SDL might have trouble calculation positions with floating point numbers"? If you are using SDL's basic 2D renderer, you just need to give it the type it needs in whatever units they ask. The conversions are up to you.
Overall, I'd recommend trying to code the simulation outside SDL or graphics in general. Getting acquainted with C++, debugging and floating-point is also a plus.

C++ plotting Mandlebrot set, bad performance

I'm not sure if there is an actual performance increase to achieve, or if my computer is just old and slow, but I'll ask anyway.
So I've tried making a program to plot the Mandelbrot set using the cairo library.
The loop that draws the pixels looks as follows:
vector<point_t*>::iterator it;
for(unsigned int i = 0; i < iterations; i++){
it = points->begin();
//cout << points->size() << endl;
double r,g,b;
r = (double)i+1 / (double)iterations;
g = 0;
b = 0;
while(it != points->end()){
point_t *p = *it;
p->Z = (p->Z * p->Z) + p->C;
if(abs(p->Z) > 2.0){
cairo_set_source_rgba(cr, r, g, b, 1);
cairo_rectangle (cr, p->x, p->y, 1, 1);
cairo_fill (cr);
it = points->erase(it);
} else {
it++;
}
}
}
The idea is to color all points that just escaped the set, and then remove them from list to avoid evaluating them again.
It does render the set correctly, but it seems that the rendering takes a lot longer than needed.
Can someone spot any performance issues with the loop? or is it as good as it gets?
Thanks in advance :)
SOLUTION
Very nice answers, thanks :) - I ended up with a kind of hybrid of the answers. Thinking of what was suggested, i realized that calculating each point, putting them in a vector and then extract them was a huge waste CPU time and memory. So instead, the program now just calculate the Z value of each point witout even using the point_t or vector. It now runs A LOT faster!
Edit: I think the suggestion in the answer of kuroi neko is also a very good idea if you do not care about "incremental" computation, but have a fixed number of iterations.
You should use vector<point_t> instead of vector<point_t*>.
A vector<point_t*> is a list of pointers to point_t. Each point is stored at some random location in the memory. If you iterate over the points, the pattern in which memory is accessed looks completely random. You will get a lot of cache misses.
On the opposite vector<point_t> uses continuous memory to store the points. Thus the next point is stored directly after the current point. This allows efficient caching.
You should not call erase(it); in your inner loop.
Each call to erase has to move all elements after the one you remove. This has O(n) runtime. For example, you could add a flag to point_t to indicate that it should not be processed any longer. It may be even faster to remove all the "inactive" points after each iteration.
It is probably not a good idea to draw individual pixels using cairo_rectangle. I would suggest you create an image and store the color for each pixel. Then draw the whole image with one draw call.
Your code could look like this:
for(unsigned int i = 0; i < iterations; i++){
double r,g,b;
r = (double)i+1 / (double)iterations;
g = 0;
b = 0;
for(vector<point_t>::iterator it=points->begin(); it!=points->end(); ++it) {
point_t& p = *it;
if(!p.active) {
continue;
}
p.Z = (p.Z * p.Z) + p.C;
if(abs(p.Z) > 2.0) {
cairo_set_source_rgba(cr, r, g, b, 1);
cairo_rectangle (cr, p.x, p.y, 1, 1);
cairo_fill (cr);
p.active = false;
}
}
// perhaps remove all points where p.active = false
}
If you can not change point_t, you can use an additional vector<char> to store if a point has become "inactive".
The Zn divergence computation is what makes the algorithm slow (depending on the area you're working on, of course). In comparison, pixel drawing is mere background noise.
Your loop is flawed because it makes the Zn computation slow.
The way to go is to compute divergence for each point in a tight, optimized loop, and then take care of the display.
Besides, it's useless and wasteful to store Z permanently.
You just need C as an input and the number of iterations as an output.
Assuming your points array only holds C values (basically you don't need all this vector crap, but it won't hurt performances either), you could do something like that :
for(vector<point_t>::iterator it=points->begin(); it!=points->end(); ++it)
{
point_t Z = 0;
point_t C = *it;
for(unsigned int i = 0; i < iterations; i++) // <-- this is the CPU burner
{
Z = Z * Z + C;
if(abs(Z) > 2.0) break;
}
cairo_set_source_rgba(cr, (double)i+1 / (double)iterations, g, b, 1);
cairo_rectangle (cr, p->x, p->y, 1, 1);
cairo_fill (cr);
}
Try to run this with and without the cairo thing and you should see no noticeable difference in execution time (unless you're looking at an empty spot of the set).
Now if you want to go faster, try to break down the Z = Z * Z + C computation in real and imaginary parts and optimize it. You could even use mmx or whatever to do parallel computations.
And of course the way to go to gain another significant speed factor is to parallelize your algorithm over the available CPU cores (i.e. split your display area is subsets and have different worker threads compute these parts in parallel).
This is not as obvious at it might seem, though, since each sub-picture will have a different computation time (black areas are very slow to compute while white areas are computed almost instantly).
One way to do it is to split the area is a large number of rectangles, and have all worker threads pick a random rectangle from a common pool until all rectangles have been processed.
This simple load balancing scheme that makes sure no CPU core will be left idle while its buddies are busy on other parts of the display.
The first step to optimizing performance is to find out what is slow. Your code mixes three tasks- iterating to calculate whether a point escapes, manipulating a vector of points to test, and plotting the point.
Separate these three operations and measure their contribution. You can optimise the escape calculation by parallelising it using simd operations. You can optimise the vector operations by not erasing from the vector if you want to remove it but adding it to another vector if you want to keep it ( since erase is O(N) and addition O(1) ) and improve locality by having a vector of points rather than pointers to points, and if the plotting is slow then use an off-screen bitmap and set points by manipulating the backing memory rather than using cairo functions.
(I was going to post this but #Werner Henze already made the same point in a comment, hence community wiki)

random access to buffer optimisation

I have colorBuffer Color[width*height] (most likely 800*600)
and during rasterization I call:
void setPixel(int x, int y, Color & color)
{
colorBuffer[y * width + x] = color;
}
It turns out that this random access to color buffer is really ineffective and slows my application down.
I think that it is caused the way I use it. I calculate some pixel (with rasterization algorithms) and call setPixel.
So I think my buffer is not in cache and this is the main problem. When trying to write into the whole buffer at once, it is much much faster.
Is there any way, how to optimize this?
edit
I do not use it to fill buffer with two for cycles.
I use it to paint "random" pixels.
eg when rasterize line I use it like
setPixel(10,10);
calculate next point
setPixel(10,11);
calculate next point
setPixel(next point)
...
They way I see it, the access-pattern to the buffer depends in the order in which your algorithm processes the pixels. Can you not simply change that order so that it creates a sequential access-scheme to your buffer?
Yes, you should try to be cache-friendly,
but the first thing I would do is find out what's taking time.
It's simple enough. Just pause it several times and see what it's doing.
If it's mostly in calculate next point, you should see what it's doing in there, because that's where the time is going.
(I assume you understand that by "in" I mean "on the stack".)
If it's mostly in SetPixel, when you pause it, look at the disassembly window.
If it's spending much time in the prologue/epilogue of the routine, it should be inlined.
If it's spending much time in the actual move instruction into colorBuffer, then you're hitting the cache issue.
If it's spending much time in the code for the index calculation y * width + x, then you might want to see if you could somehow use an initialized pointer that you step along.
If you fix anything, you should do it all again, because you may have uncovered another opportunity to speed it up further.
The first thing to notice is that the way you process your pixels makes a huge difference to speed. If you do
for (int x = 0; x < width;++x)
{
for (int y = 0; y < height; ++y)
{
setPixel(x,y,Color());
}
}
this will be really bad for performance because you're literally jumping around in memory width-wise (note that you do y*width + x).
If you simply change the order of processing to
for (int y = 0; y < height;++y)
{
for (int x = 0; x < width; ++x)
{
setPixel(x,y,Color());
}
}
you already should notice a performance gain as the processor now gets a chance to cache memory accesses (which it didn't before).
Furthermore you should check if you can determine that entire blocks of pixels will have the same color value before actually setting the memory. Then you can copy those constant color values block-wise to your image array which can save you also a good deal of performance.

C++: Calculating Moving FPS

I would like to calculate the FPS of the last 2-4 seconds of a game. What would be the best way to do this?
Thanks.
Edit: To be more specific, I only have access to a timer with one second increments.
Near miss of a very recent posting. See my response there on using exponential weighted moving averages.
C++: Counting total frames in a game
Here's sample code.
Initially:
avgFps = 1.0; // Initial value should be an estimate, but doesn't matter much.
Every second (assuming the total number of frames in the last second is in framesThisSecond):
// Choose alpha depending on how fast or slow you want old averages to decay.
// 0.9 is usually a good choice.
avgFps = alpha * avgFps + (1.0 - alpha) * framesThisSecond;
Here's a solution that might work for you. I'll write this in pseudo/C, but you can adapt the idea to your game engine.
const int trackedTime = 3000; // 3 seconds
int frameStartTime; // in milliseconds
int queueAggregate = 0;
queue<int> frameLengths;
void onFrameStart()
{
frameStartTime = getCurrentTime();
}
void onFrameEnd()
{
int frameLength = getCurrentTime() - frameStartTime;
frameLengths.enqueue(frameLength);
queueAggregate += frameLength;
while (queueAggregate > trackedTime)
{
int oldFrame = frameLengths.dequeue();
queueAggregate -= oldFrame;
}
setAverageFps(frameLength.count() / 3); // 3 seconds
}
Could keep a circular buffer of the frame time for the last 100 frames, and average them? That'll be "FPS for the last 100 frames". (Or, rather, 99, since you won't diff the newest time and the oldest.)
Call some accurate system time, milliseconds or better.
What you actually want is something like this (in your mainLoop):
frames++;
if(time<secondsTimer()){
time = secondsTimer();
printf("Average FPS from the last 2 seconds: %d",(frames+lastFrames)/2);
lastFrames = frames;
frames = 0;
}
If you know, how to deal with structures/arrays it should be easy for you to extend this example to i.e. 4 seconds instead of 2. But if you want more detailed help, you should really mention WHY you haven't access to an precise timer (which architecture, language) - otherwise everything is like guessing...

Simulated time in a game loop using c++

I am building a 3d game from scratch in C++ using OpenGL and SDL on linux as a hobby and to learn more about this area of programming.
Wondering about the best way to simulate time while the game is running. Obviously I have a loop that looks something like:
void main_loop()
{
while(!quit)
{
handle_events();
DrawScene();
...
SDL_Delay(time_left());
}
}
I am using the SDL_Delay and time_left() to maintain a framerate of about 33 fps.
I had thought that I just need a few global variables like
int current_hour = 0;
int current_min = 0;
int num_days = 0;
Uint32 prev_ticks = 0;
Then a function like :
void handle_time()
{
Uint32 current_ticks;
Uint32 dticks;
current_ticks = SDL_GetTicks();
dticks = current_ticks - prev_ticks; // get difference since last time
// if difference is greater than 30000 (half minute) increment game mins
if(dticks >= 30000) {
prev_ticks = current_ticks;
current_mins++;
if(current_mins >= 60) {
current_mins = 0;
current_hour++;
}
if(current_hour > 23) {
current_hour = 0;
num_days++;
}
}
}
and then call the handle_time() function in the main loop.
It compiles and runs (using printf to write the time to the console at the moment) but I am wondering if this is the best way to do it. Is there easier ways or more efficient ways?
I've mentioned this before in other game related threads. As always, follow the suggestions by Glenn Fiedler in his Game Physics series
What you want to do is to use a constant timestep which you get by accumulating time deltas. If you want 33 updates per second, then your constant timestep should be 1/33. You could also call this the update frequency. You should also decouple the game logic from the rendering as they don't belong together. You want to be able to use a low update frequency while rendering as fast as the machine allows. Here is some sample code:
running = true;
unsigned int t_accum=0,lt=0,ct=0;
while(running){
while(SDL_PollEvent(&event)){
switch(event.type){
...
}
}
ct = SDL_GetTicks();
t_accum += ct - lt;
lt = ct;
while(t_accum >= timestep){
t += timestep; /* this is our actual time, in milliseconds. */
t_accum -= timestep;
for(std::vector<Entity>::iterator en = entities.begin(); en != entities.end(); ++en){
integrate(en, (float)t * 0.001f, timestep);
}
}
/* This should really be in a separate thread, synchronized with a mutex */
std::vector<Entity> tmpEntities(entities.size());
for(int i=0; i<entities.size(); ++i){
float alpha = (float)t_accum / (float)timestep;
tmpEntities[i] = interpolateState(entities[i].lastState, alpha, entities[i].currentState, 1.0f - alpha);
}
Render(tmpEntities);
}
This handles undersampling as well as oversampling. If you use integer arithmetic like done here, your game physics should be close to 100% deterministic, no matter how slow or fast the machine is. This is the advantage of increasing the time in fixed time intervals. The state used for rendering is calculated by interpolating between the previous and current states, where the leftover value inside the time accumulator is used as the interpolation factor. This ensures that the rendering is is smooth, no matter how large the timestep is.
Other than the issues already pointed out (you should use a structure for the times and pass it to handle_time() and your minute will get incremented every half minute) your solution is fine for keeping track of time running in the game.
However, for most game events that need to happen every so often you should probably base them off of the main game loop instead of an actual time so they will happen in the same proportions with a different fps.
One of Glenn's posts you will really want to read is Fix Your Timestep!. After looking up this link I noticed that Mads directed you to the same general place in his answer.
I am not a Linux developer, but you might want to have a look at using Timers instead of polling for the ticks.
http://linux.die.net/man/2/timer_create
EDIT:
SDL Seem to support Timers: SDL_SetTimer