Performance over time problem - C++

Hey, so I'm making a simple text game using the PDCurses library and a few other minor things, kind of like a vertical scroller in which you avoid randomly generated walls....
There are two walls on the left and right made out of 'X' characters, and blank, black space in which you can move around and avoid the 'X's. Your character is an '8', and you're forced to continue forward or get touched by the X's each time a new line of the randomly generated "map" is revealed (for performance tests I made a new line appear as fast as possible).
However, I'm having some performance problems as the "map" (a vector of strings) gets bigger and bigger. I don't understand the problem, though, as I'm not using all of it at any one time; I'm only pulling out parts of it to display (56 lines, usually).
I'll show you what I've got, and hopefully someone can help or suggest a better way to accomplish my game.
Here's the condensed, important code:
Here's the function that is taking about 0.25-0.75 seconds (new_map is also a vector member of the Screen class):
void Insert(const Map& map, int y1, int y2) {
    for (int mc = y1, nm = 0; mc < map.Contents().size() && mc < y2; mc++, nm++)
        new_map[nm] = map.Contents(mc);
}
Here are the Map class's Contents functions:
string Contents(int Y) {return contents[Y];}
char Contents(int Y, int X) {return contents[Y][X];}
vector<string> Save() {return save;}
And finally the main() loop, which I have set so the screen updates as fast as possible... which isn't turning out to be so fast. Oh, and L1 is one of my "maps";
generate() adds new lines onto the map so it never ends:
double refreshes = 0;
for (bool quit = false; quit != true;)
{
    double newTime = myStopwatch.ElapsedTime() - refreshes;
    theScreen.Insert(L1, 0 + refreshes, nrows + refreshes);
    refreshes++;
    if (L1.Contents().size() <= nrows + refreshes + 2)
        L1.generate();
}
Thanks for any help or tips! I know it's pretty terrible, but I just started programming two months ago, haha! =) Ask if you need any more info.

The general issue seems to be that as your data set gets larger, operations like copying it around (which you seem to do all the time; e.g., in your snippet you are copying strings from one vector to the other) take longer.
Things to consider:
Do you need the data that has scrolled off the screen?
Can you use a C-style array? Those would generally be faster, and you'd also be able to see more clearly where the inefficiencies are.
Try thinking of your scrolling as simply moving the point from where you fetch data to the screen. You shouldn't need to make any copies or move any data around.
As an example for point #1 you may prefer to store your screen lines in a list and do something like:
std::list<std::string> screen;
// fill up with your initial screen.
generate(line); // generate your new line
screen.push_back(line); // to add a line
screen.pop_front(); // remove a line from the top
It's not perfect (there's memory management and some copying behind the scenes) but it will outperform copying and accumulating all the screen lines.
As an example for point #2, consider this code:
char screen[25][80];
for (int y = 0; y < 25; y++)
{
    for (int x = 0; x < 79; x++)
    {
        screen[y][x] = screen[y][x+1];
    }
}
This will "scroll" the screen one character to the left. It keeps running in constant, fairly fast time: on any modern CPU you can expect to perform on the order of a million of these scrolling operations per second.

Try unordered_map instead, its performance characteristics are somewhat better as the size grows. oops, you've got your own Map class that has nothing to do with std::map.
Anyway, the function you showed definitely could get slower as the vector gets larger. You say you don't need all the data at once, so some other data structure would probably be better. But without knowing more about your design (and by that I mean an explanation of what you're trying to accomplish, not just a copy of all the code), it's hard to recommend one.
You might think about keeping track of the indexes somehow, instead of copying the strings themselves around.
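The "keep track of the indexes" idea can be sketched as a window into the map vector: keep the whole map in one place and track only the offset of the visible region, so scrolling becomes an increment instead of a copy. This is a minimal sketch with invented names (map_lines, view_top, nrows are not the asker's actual identifiers):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The whole generated map lives here; nothing is copied per frame.
std::vector<std::string> map_lines;
std::size_t view_top = 0;      // index of the first visible line
const std::size_t nrows = 56;  // lines shown on screen at once

// Returns the y-th visible line by reference, without copying.
const std::string& visible_line(std::size_t y) {
    return map_lines[view_top + y];
}

// Scrolling is O(1): bump the offset instead of shifting strings.
void scroll_down() {
    ++view_top;
}
```

The drawing loop would then call visible_line(0) through visible_line(nrows - 1) each refresh, and generate() only ever appends to map_lines.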

Related

C++ speed up method call

I am working on a very time-consuming application and I want to speed it up a little. I analyzed the runtime of individual parts using the clock() function of the ctime library and found something which is not totally clear to me.
I have time prints outside and inside of a method; let's call it Method1. The print inside Method1 covers the whole body of it; only the return of a float is excluded, of course. Well, the thing is that the print outside reports two to three times the time of the print inside Method1. It's obvious that the print outside should report more time, but the difference seems quite big to me.
My method looks as follows. I am using references and pointers as parameters to prevent copying of data. Note that the data vector contains 330,000 pointers to instances.
float ClassA::Method1(vector<DataClass*>& data, TreeClass* node)
{
    //start time measurement
    vector<Mat> offset_vec_1 = vector<Mat>();
    vector<Mat> offset_vec_2 = vector<Mat>();
    for (int i = 0; i < data.size(); i++)
    {
        DataClass* cur_data = data.at(i);
        Mat offset1 = Mat();
        Mat offset2 = Mat();
        getChildParentOffsets(cur_data, node, offset1, offset2);
        offset_vec_1.push_back(offset1);
        offset_vec_2.push_back(offset2);
    }
    float ret = CalculateCovarReturnTrace(offset_vec_1) + CalculateCovarReturnTrace(offset_vec_2);
    //end time measurement
    return ret;
}
Is there any "obvious" way to increase the call speed? I would prefer to keep the method for readability reasons, thus, can I change anything to gain a speed up?
I am appreciating any suggestions!
Based on your updated code, the only code between the end time measurement and the measurement after the function call is the destructors for constructed objects in the function. That being the two vectors of 330,000 Mats each. Which will likely take some time depending on the resources used by each of those Mats.
Without trying to lay claim to any of the comments made by others to the OP ...
(1) The short answer might well be "no." This function appears to be quite clear, and it's doing a lot of work 330,000 times. Then it's doing a calculation over "all that data."
(2) Consider re-using the "offset1" and "offset2" matrices, instead of creating entirely new ones for each iteration. It remains to be seen, of course, whether this would actually be faster. (And in any case, see below, it amounts to "diddling the code.")
(3) Therefore, borrowing from The Elements of Programming Style: "Don't 'diddle' code to make it faster: find a better algorithm." And in this case, there just might not be one. You might need to address the runtime issue by "throwing silicon at it," and I'd suggest that the first thing to do would be to add as much RAM as possible to this computer. A process that "deals with a lot of data" is very exposed to virtual-memory page faults, each of which requires on the order of *milli*-seconds to resolve. (Those milliseconds add up real fast.)
I personally do not see anything categorically wrong with this code, nor anything that would categorically cause it to run faster. Nor would I advocate re-writing ("diddling") the code from the very-clear expression of it that you have right now.
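Point (2) above can be sketched generically. Note the hedges: Matrix, compute_offsets, and method1 below are invented stand-ins (not the OP's cv::Mat, getChildParentOffsets, or CalculateCovarReturnTrace); the sketch only shows reserve() to avoid repeated vector regrowth plus reuse of the two scratch matrices across iterations:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the OP's Mat type, just to keep the sketch self-contained.
struct Matrix { std::vector<double> vals; };

// Hypothetical stand-in for getChildParentOffsets(): fills two
// caller-owned matrices instead of constructing fresh ones.
void compute_offsets(std::size_t i, Matrix& out1, Matrix& out2) {
    out1.vals.assign(4, double(i));
    out2.vals.assign(4, double(i) * 2.0);
}

double method1(std::size_t n) {
    std::vector<Matrix> offset_vec_1, offset_vec_2;
    offset_vec_1.reserve(n);  // one allocation instead of repeated regrowth
    offset_vec_2.reserve(n);
    Matrix off1, off2;        // reused across iterations (point 2 above)
    for (std::size_t i = 0; i < n; ++i) {
        compute_offsets(i, off1, off2);
        offset_vec_1.push_back(off1);  // still copies; move if the type allows
        offset_vec_2.push_back(off2);
    }
    double trace = 0.0;       // stand-in for the covariance-trace calculation
    for (const Matrix& m : offset_vec_1) trace += m.vals[0];
    for (const Matrix& m : offset_vec_2) trace += m.vals[0];
    return trace;
}
```

Whether reuse actually wins depends on what the real matrix type does on copy construction versus assignment, so as point (2) says, it remains to be measured.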

Most efficient way to roll back data. To turn back time

So, I have a 3D platformer. And I want to have a button such that if you hold it, it makes you "go back in time". Thankfully the game is rather simple and only has one entity, so the only thing that would have to be saved for each frame is:
struct Coord {
    float x;
    float y;
    float z;
};
struct Bool6 {
    bool front;
    bool back;
    bool left;
    bool right;
    bool top;
    bool bottom;
};
struct Player {
    Coord Pos;
    Coord Vel;
    Bool6 Col;
};
But I fear that is a lot of data, especially since my game theoretically runs somewhere around 60 fps, and it would be good to have 5 seconds or so (300 frames) of data saved that can be accessed when rolled back. I have considered doing something like this each frame:
Player Data[300];
for (int i = 299; i > 0; i--)
{
    Data[i] = Data[i-1];
}
Data[0] = "THIS FRAME'S DATA";
However, that sounds like an outrageous amount of processing power going just into storing each frame.
Is there a more efficient way to store this data, keeping all of the data in order?
Also, is there a way I can tell an array slot that it holds nothing? That way there wouldn't be problems if the player tries to roll back before all of the array slots are filled, or after rolling back. I believe in C# I would have set it equal to null... but that doesn't work in C++, probably because I'm using structures.
Thanks much!
However that sounds like it means an outrageous amount of processing power
Before making such a statement it can be useful to do the math. It seems the data you are concerned with is about 40 bytes, so 40*300 = 12 kB. This can easily fit in memory and is far from an "outrageous amount of processing power" on modern computers.
Is their a more efficient way to store this data keeping all of the data in order?
Yes. If your game is deterministic, all you have to store is the player's input and one game state 5 seconds ago. When rolling back, reset the game state and replay user inputs to recompute each frame data.
See this question for an interesting discussion on how to design your own replay system on the gamedev Stack Exchange.
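The replay idea can be sketched in a few lines. State, Input, and step() below are invented stand-ins for the real game's types, and the whole approach assumes the simulation step is fully deterministic:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal deterministic "game": one float of state, one float of input.
struct State { float x = 0.0f; };
struct Input { float dx = 0.0f; };

// The simulation step; it must be deterministic for replay to work.
State step(State s, Input in) {
    s.x += in.dx;
    return s;
}

// Instead of saving every frame, keep one keyframe from N frames ago
// plus the inputs since, then re-simulate forward to any frame.
State rewind_to(State keyframe, const std::vector<Input>& inputs,
                std::size_t frame) {
    State s = keyframe;
    for (std::size_t i = 0; i < frame; ++i)
        s = step(s, inputs[i]);
    return s;
}
```

Storage then shrinks from 300 Player snapshots to one snapshot plus 300 small input records, at the cost of re-running the simulation when rolling back.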
I don't think an array of 300 relatively small elements will slow you down at all, have you tried profiling it yet?
That said you could store it in a vector and keep an iterator to the "current" and update that.
If you think storing 300 frames is a lot, store fewer; for example, you can store one frame in five:
....|....|....|....|..*
* is your position, | the frames you will store, and .... the other frames.
And you don't have to copy all the saved data each time: just delete the first and add one at the end. You could use a std::list, so you won't have to copy any data.
Every 5 frames you call myList.pop_front() to drop the oldest frame and myList.push_back() to add the newest.
I don't think the storage requirement is THAT harsh - even with 300 frames this small object will take up much less memory than your average texture.
I suggest you avoid using a raw array and look at using a std::vector which is almost as efficient and will automatically resize as you need more buffer space (that way if you suddenly need 8 second or the fps goes up to 100 you aren't going to suddenly run out of buffer space). This also resolves your difficulty with unfilled 'slots' as the vector has a known size that you can efficiently access.
I also might suggest that you don't need to store every frame - games like Prince of Persia that do this trick if you watch them carefully are much less smooth when time runs backwards suggesting that they perhaps only store every few frames or something like twice a second as opposed to every frame.
Unless you're running on an MCU, data sizes (given the structures you've provided) will be negligible compared to the rest of your game (~10K, if I calculated it correctly, is nothing for modern PCs).
CPU-wise, moving data in a manner you've specified and on every frame, MIGHT be sub-optimal (it will move around 10K 60 times per second, or 600K per second, which MIGHT - though probably won't - be noticeable). IF it becomes a concern, I'd go for a circular buffer (for example, as in boost::circular_buffer), or for a std::deque or std::list (with deque probably being my first choice); all of them have O(1) insertion/deletion time, and insertion/deletion is what you need most of the time. Theoretically, there is also an option to use memmove() to speed things up without changing things much, but it is quite error-prone, and still has O(N) complexity, so I'd rather not do it.
Indeed, you could do less operations per frame to store the state.
You could use a std::vector. Then, at each frame, push_back() the new state and if(vector.size() > 300), then do a pop_front().
If you think that's still too much, just save less frequently (every half second, say). When you roll back, you could interpolate between the saved values.
Edit:
You're right, Othman: vector doesn't have pop_front, so you can use vector.erase(vector.begin()) instead :) So you don't have to use linked lists.
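For completeness, the same bounded-history idea can be written with std::deque, which does have a real pop_front() with O(1) cost (a sketch with a pared-down Player struct standing in for the asker's):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Player { float x, y, z; };

// std::deque gives O(1) push_back and pop_front, so keeping the last
// 300 frames costs one insert and at most one removal per frame.
std::deque<Player> history;
const std::size_t kMaxFrames = 300;

void record(const Player& p) {
    history.push_back(p);
    if (history.size() > kMaxFrames)
        history.pop_front();  // drop the oldest frame
}
```

history.size() also tells you how many frames are actually available, so rolling back before the buffer has filled is naturally bounded.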

Tic Tac Toe Random AI

I am working on building a Tic Tac Toe game with varying AI implementations for a computer opponent for the sake of learning different algorithms and how to implement them. The first I am trying which should be the easiest is just having the computer choose a random space each time.
This is working for me to a certain extent; the issue is run time. Every time the aiRandMove() method is called, it takes longer and longer to pick a move, to the point where after 5 moves have been made on the board (CPU + user combined) the program appears to hang (although this isn't technically the case).
Upon further debugging on my part I realize that this should be expected as the aiRandMove() method is randomly choosing an X and Y coordinate and then the move is tested to see if it is legal. As less and less spaces are open, there are fewer and fewer legal moves, thus many more failed attempts by the randomizer to generate a legal move.
My question is: is there any way I can modify this to at least reduce the time taken by the function? As far as I can tell from googling and running through the problem myself, I cannot think of a way to optimize this without compromising the "randomness" of the function. I thought about keeping an array of moves the computer had attempted, but that would not resolve the problem, because it would not affect the number of times rand() generated duplicate numbers. Here is the code for this function, which is all that is really relevant to this issue:
//Function which handles the AI making a random move requires a board
//object to test moves legality and player object to make a move with
//both are passed by reference because changes to board state and the player's
//evaluation arrays must be saved
char aiRandMove(Player &ai, Board &game){
    int tryX;
    int tryY; //Variables to store computer's attempted moves
    bool moveMade = false;
    char winner;
    while(!moveMade){
        srand(time(NULL)); //Randomizes the seed for rand()
        tryX = rand() % 3;
        tryY = rand() % 3; //coordinates are random numbers between 0 and 2
        cout << "Trying move " << tryX << ", " << tryY << endl;
        if(game.isLegalMove(tryX, tryY)){
            winner = game.makeMove(tryX, tryY, ai);
            moveMade = true;
        }
    }
    return winner;
}
I have also tried moving the seed call out of the while loop (it was put inside the while to "increase randomness", even though that is something of a logical folly), and this has not improved results either.
If all else fails I may just label this method "Easy" and only have random moves until I can tell if I need to block or make the winning move. But perhaps there are other random functions which may assist in this endeavor. Any and all thoughts and comments are more than appreciated!
You need to remove the invalid moves from the equation, such as with the following pseudo-code, using an array to collect valid moves:
possibleMoves = []
for each move in allMoves:
    if move is valid:
        add move to possibleMoves
move = possibleMoves[random(possibleMoves.length)]
That removes the possibility that you will call random more than once per attempted move since all possibilities in the array are valid.
Alternatively, you can start the game with all moves in the possibleMoves array and remove each possibility as it's used.
You also need to learn that it's better to seed a random number generator once and then just use the numbers it generates. Seeding it with time(0) every time you try to get a random number will ensure that you get the same number for an entire second.
Given that there is only at most 9 choices, even using your random picking, this would not cause a long delay. What is causing the long delay is calling srand inside the loop. This is causing your program to get the same random numbers for the duration of a second. The loop is probably being executed millions of times in that second (or would be without the cout call)
Move the srand call outside of the loop (or better yet, just call it once at the start of your program).
That is not to say you shouldn't look at ways of removing the unavailable moves from the random selection, as it may make a difference for other types of games.
You could reduce that to very acceptable levels by creating a list of free coordinates and getting a random index in that collection. Conceptually:
#include <vector>

struct tictactoe_point
{
    int x, y;
};

std::vector<tictactoe_point> legal_points;
tictactoe_point point;
for (point.x = 0; point.x < 3; point.x++)
{
    for (point.y = 0; point.y < 3; point.y++)
    {
        if (game.isLegalMove(point.x, point.y))
        {
            legal_points.push_back(point);
        }
    }
}
point = legal_points[rand() % legal_points.size()];
game.makeMove(point.x, point.y, ai);
moveMade = true;
This solution is not optimal, but it's a significant improvement: now, the time it takes to make a move is fully predictable. This algorithm will complete with one single call to rand.
The fact that you call srand each time you pick a number makes the process even slower, but then again, the major problem is that your current solution has to try over and over again. It's not bounded: it may even never complete. Even if srand is considerably slow, if you know that it'll run just one time, and not an indefinite number of times, it should be viable (though not optimal either).
There are many ways to improve on this:
Keep a list of valid coordinates to play, and remove the coordinates when either the player or the AI plays it. This way you don't have to rebuild the list at every turn. It won't make a big difference for a tic-tac-toe game, but it would make a big difference if you had a larger board.
Use the standard C++ random function. This isn't really an algorithm improvement, but rand() in C is pretty crappy (I know, I know, it's a long video, but this guy really really knows his stuff).
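On that last point, the C++11 &lt;random&gt; header replaces rand()/srand() directly. A minimal sketch (pick_random_index is an invented helper, not part of any library): the engine is seeded once and reused, and the distribution produces unbiased indices without the modulo trick.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Returns a uniformly distributed index in [0, n), given a seeded engine.
// Unlike rand() % n, uniform_int_distribution has no modulo bias.
int pick_random_index(std::size_t n, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    return static_cast<int>(dist(rng));
}
```

Typical use: construct one std::mt19937 at program start (seeded from std::random_device), then index into legal_points with pick_random_index(legal_points.size(), rng).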
The reason it seems slower every move is that the AI keeps picking moves that have already been made, so it randomly re-picks: either another illegal move (possibly repeatedly) or, eventually, a legal square.
To speed this part of your program up, you could have a collection (e.g. a linked list) that contains the open positions, and use your random function over this list. When a move is picked by you or the AI, remove that element from the list.
This removes the recurring process of the AI picking the same squares.

A clean way to render things

I don't really have any problems with the way I'm rendering now, but I don't feel like it's a very good way of handling rendering. I'm using SDL.
It boils down to this I have some abstract class
class Renderable
With two functions.
virtual void update() = 0;
virtual void doRender(SDL_Surface* surface) = 0;
I have another class
class RenderManager
With 1 std::vector
std::vector<Renderable*> _world;
and 2 std::queue
std::queue<Renderable*> _addQueue;
std::queue<Renderable*> _delQueue;
The two queues hold the renderables that need to be added in the next tick and the ones that need to be removed. Doing everything in one shot gave me problems and now that I think about it, it makes sense (at least the way I did it).
Renderables can add and remove themselves from the RenderManager statically.
Here's more or less the function handling everything.
void renderAll() {
    std::vector<Renderable*>::iterator begin, end;
    begin = _world.begin();
    end = _world.end();
    for (; begin != end; ++begin) {
        (*begin)->update();
        (*begin)->doRender(_mainWindow); // _mainWindow is the screen of course
    }
    begin = _world.begin();
    if (_delQueue.size() > 0) {
        for (unsigned int i = 0; i < _delQueue.size(); i++) {
            std::vector<Renderable*>::iterator del;
            del = std::find(begin, end, _delQueue.front());
            if (del != end) {
                delete *del;
                _world.erase(del);
            }
            _delQueue.pop();
        }
    }
    if (_addQueue.size() > 0) {
        for (unsigned int i = 0; i < _addQueue.size(); i++) {
            Renderable* front = _addQueue.front();
            // _placement is a property of Renderable calculated by RenderManager
            // where they can choose the level they want to be rendered on.
            _world.insert(begin + front->_placement, front);
            _addQueue.pop();
        }
    }
}
I'm kinda sorta newish to C++, but I think I know my way around it on an average scale at least. I'm even newer to SDL, but it seems pretty simple and easy to learn. I'm concerned because I have 3 big loops together. I tried one-shotting it, but I was having problems with _world resizing during the loop, causing massive amounts of destruction. But I'm not claiming I did it right! :)
I was thinking maybe something involving threads?
EDIT:
Ahh, sorry for the ambiguity. By "cleaner" I mean more efficient. Also there is no "problem" with my approach, I just feel there's a more efficient way.
Firstly, I'd say don't fix something which isn't broken. Are you experiencing performance issues? Unless you're adding and removing 'renderables' in huge quantities every frame, I can't see a huge problem with what you have. Of course, in terms of an overall application it could be a clumsy design, but you haven't stated what sort of application this is for, so it's hard, if not impossible to judge.
However, I can guess and say that because you're using SDL, there's a chance you're developing a game.
Personally, I've always rendered game objects by having a render method for each active object, and using an object manager to cycle through pointers to each object every tick and call this render method. Because constantly removing an item from the middle of a vector can cause slowdowns due to internal copying of memory (vectors guarantee contiguous memory), you could instead have a flag in every object which is set when it is meant to be removed. Periodically the object manager performs 'garbage collection', removing all objects with this flag set at the same time, hence reducing the amount of copying that needs to be done. In the meantime, before garbage collection occurs, the manager simply ignores the flagged objects, not calling their render methods each tick; it's as if they're gone.
It's actually not too dissimilar to what you have here with your queue system; in fact, if game objects are derived from your 'renderable' class, it could be deemed the same.
By the way is there any reason you're querying the queue sizes before accessing their elements? If size() is 0, the for loops won't operate anyway.
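The flag-and-sweep removal described above boils down to the standard erase-remove idiom: one pass compacts the vector in O(n) instead of erasing from the middle one element at a time. A minimal sketch with an invented Renderable base (not the asker's class):

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Objects mark themselves for removal by setting `dead`.
struct Renderable {
    bool dead = false;
    virtual ~Renderable() = default;
};

// One periodic "garbage collection" pass: remove_if shifts surviving
// objects down once, then erase trims the tail. Destructors of the
// removed objects run via unique_ptr.
void sweep(std::vector<std::unique_ptr<Renderable>>& world) {
    world.erase(
        std::remove_if(world.begin(), world.end(),
                       [](const std::unique_ptr<Renderable>& r) {
                           return r->dead;
                       }),
        world.end());
}
```

Between sweeps, the render loop just skips any object whose dead flag is set, so no iterators are invalidated mid-frame.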

Std::vector fill time goes from 0ms to 16ms after a certain threshold?

Here is what I'm doing. My application takes points from the user while dragging and in real time displays a filled polygon.
It basically adds the mouse position on MouseMove. This point is a USERPOINT and has bezier handles, because eventually I will do beziers, and this is why I must transfer them into a vector.
So basically MousePos -> USERPOINT. The USERPOINT gets added to a std::vector<USERPOINT>. Then in my UpdateShape() function, I do this:
DrawingPoints is defined like this:
std::vector<std::vector<GLdouble>> DrawingPoints;

Contour[i].DrawingPoints.clear();
for (unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);
SetCubicBezier() currently looks like this:
void OGLSHAPE::SetCubicBezier(USERFPOINT &a, USERFPOINT &b, int &currentcontour)
{
    std::vector<GLdouble> temp(2);
    if(a.RightHandle.x == a.UserPoint.x && a.RightHandle.y == a.UserPoint.y
       && b.LeftHandle.x == b.UserPoint.x && b.LeftHandle.y == b.UserPoint.y)
    {
        temp[0] = (GLdouble)a.UserPoint.x;
        temp[1] = (GLdouble)a.UserPoint.y;
        Contour[currentcontour].DrawingPoints.push_back(temp);
        temp[0] = (GLdouble)b.UserPoint.x;
        temp[1] = (GLdouble)b.UserPoint.y;
        Contour[currentcontour].DrawingPoints.push_back(temp);
    }
    else
    {
        //do cubic bezier calculation
    }
}
So for the cubic bezier, I need to turn USERPOINTs into GLdouble[2] (since GLUTesselator takes in a static array of doubles).
So I did some profiling. At ~100 points, the code:
for (unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);
took 0 ms to execute. Then around 120 points, it jumps to 16 ms and never looks back. I'm positive this is due to std::vector. What can I do to make it stay at 0 ms? I don't mind using lots of memory while generating the shape and then removing the excess when the shape is finalized, or something like that.
0ms is no time...nothing executes in no time. This should be your first indicator that you might want to check your timing methods over timing results.
Namely, timers typically don't have good resolution. Your pre-16ms results are probably just actually 1ms - 15ms being incorrectly reported at 0ms. In any case, if we could tell you how to keep it at 0ms, we'd be rich and famous.
Instead, find out which parts of the loop take the longest, and optimize those. Don't work towards an arbitrary time measure. I'd recommend getting a good profiler to get accurate results. Then you don't need to guess what's slow (something in the loop), but can actually see what part is slow.
You could use vector::reserve() to avoid unnecessary reallocations in DrawingPoints:
Contour[i].DrawingPoints.reserve(Contour[i].UserPoints.size());
for (unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x) {
    ...
}
If you actually timed the second code snippet only (as you stated in your post), then you're probably just reading from the vector. This means the cause cannot be the reallocation cost of the vector. In that case, it may be due to CPU cache effects (i.e. small datasets can be read at lightning speed from the CPU cache, but whenever the dataset is larger than the cache, or when alternately reading from different memory locations, the CPU has to access RAM, which is distinctly slower than cache access).
If the part of the code, which you profiled, appends data to the vector, then use std::vector::reserve() with an appropriate capacity (number of expected entries in vector) before filling it.
However, regard two general rules for profiling/benchmarking:
1) Use time measurement methods with high resolution precision (as others stated, the resolution of your timer IS too low)
2) In any case, run the code snippet more than once (e.g. 100 times), get the total time of all runs and divide it by number of runs. This will give you some REAL numbers.
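Rule (2) can be wrapped in a small helper using std::chrono's steady clock: time many runs in one measurement and divide, so timer resolution stops dominating the result. This is a generic sketch, not tied to the OP's code:

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `runs` times inside one clock measurement and returns
// the average duration per run in milliseconds. steady_clock is
// monotonic, so the result can't go negative on clock adjustments.
template <typename F>
double average_ms(F work, int runs) {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    auto t1 = clock::now();
    std::chrono::duration<double, std::milli> total = t1 - t0;
    return total.count() / runs;
}
```

For the snippet in the question, `work` would be a lambda that rebuilds the DrawingPoints for one contour; 100+ runs should give a stable per-call figure even when a single call is well under the timer tick.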
There's a lot of guessing going on here. Good guesses, I imagine, but guesses nevertheless. And when you try to measure the time functions take, that doesn't tell you how they take it. You can see if you try different things that the time will change, and from that you can have some suggestion of what was taking the time, but you can't really be certain.
If you really want to know what's taking the time, you need to catch it when it's taking that time, and find out for certain what it's doing. One way is to single-step it at the instruction level through that code, but I suspect that's out of the question. The next best way is to get stack samples. You can find profilers that are based on stack samples. Personally, I rely on the manual technique, for the reasons given here.
Notice that it's not really about measuring time. It's about finding out why that extra time is being spent, which is a very different question.