std::map performance in C++

I have a problem with std::map performance. In my C++ project I have a list of GUIObjects, which also includes Windows. I draw everything in a for loop, like this:
unsigned int guiObjectListSize = m_guiObjectList.size();
for(unsigned int i = 0; i < guiObjectListSize; i++)
{
    GUIObject* obj = m_guiObjectList[i];
    if(obj->getParentId() < 0)
        obj->draw();
}
In this case, when I run the project, it works smoothly. I have 4 windows and a few other components like buttons.
But I would like to draw windows separately, so after my changes the code looks like this:
// Draw all objects except windows
unsigned int guiObjectListSize = m_guiObjectList.size();
for(unsigned int i = 0; i < guiObjectListSize; i++)
{
GUIObject* obj = m_guiObjectList[i];
if((obj->getParentId() < 0) && (dynamic_cast<Window*>(obj) == nullptr))
obj->draw(); // GUIManager should only draw objects which don't have parents specified
// And those that aren't instances of Window class
// Rest objects will be drawn by their parents
// But only if that parent is able to draw children (i.e. Window or Layout)
}
// Now draw windows
for(int i = 1; i <= m_windowList.size(); i++)
{
m_windowList[i]->draw(); // m_windowList is a map!
}
I created a std::map<int, Window*> because I need the windows' z-indexes as the map keys. The problem is that when I run this code, it's really slow. Even though I have only 4 windows (the map size is 4), I can see that the frame rate is very low. I can't give an exact number, because I haven't implemented an FPS counter yet.
Could anyone tell me why this approach is so slow?

This is what virtual functions are for. Not only do you eliminate the slow dynamic_cast, but you get a more flexible type check.
// Draw all objects except windows
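unsigned int guiObjectListSize = m_guiObjectList.size();
for(unsigned int i = 0; i < guiObjectListSize; i++)
{
    GUIObject* obj = m_guiObjectList[i];
    if(obj->getParentId() < 0)
        obj->drawFirstChance();
}
// Now draw windows in ascending z-index order.
// Iterating the map directly (instead of m_windowList[i] for i = 1..size)
// never inserts null Window* entries for z-indexes that are missing.
for(auto& entry : m_windowList)
{
    entry.second->drawSecondChance();
}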
Here, drawFirstChance is virtual and does nothing for windows and other floating objects; those are drawn in the second pass by drawSecondChance instead.
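A minimal sketch of this idea, assuming a simplified hierarchy: the class bodies, the log-based draw calls (used instead of real rendering so the draw order is visible), and the `drawAll` helper are hypothetical, and the getParentId() check is omitted. The base class draws in the first pass and does nothing in the second; Window overrides both, so no dynamic_cast is needed.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real base class. Instead of rendering,
// each draw call appends the object's name to a log.
class GUIObject {
public:
    explicit GUIObject(std::string name) : m_name(std::move(name)) {}
    virtual ~GUIObject() = default;

    // First pass: ordinary objects (buttons, labels, ...) draw here.
    virtual void drawFirstChance(std::vector<std::string>& log) { log.push_back(m_name); }
    // Second pass: does nothing for ordinary objects.
    virtual void drawSecondChance(std::vector<std::string>&) {}

protected:
    std::string m_name;
};

// Hypothetical window: skips the first pass and draws in the second.
class Window : public GUIObject {
public:
    using GUIObject::GUIObject;
    void drawFirstChance(std::vector<std::string>&) override {}
    void drawSecondChance(std::vector<std::string>& log) override { log.push_back(m_name); }
};

// Pass 1 over every object, then pass 2 over the windows in ascending z-index.
std::vector<std::string> drawAll(const std::vector<GUIObject*>& objects,
                                 const std::map<int, Window*>& windowsByZ) {
    std::vector<std::string> log;
    for (GUIObject* obj : objects)
        obj->drawFirstChance(log);          // virtual call, no dynamic_cast
    for (const auto& entry : windowsByZ)    // std::map iterates keys in ascending order
        entry.second->drawSecondChance(log);
    return log;
}
```

With two windows at z-indexes 2 and 1 and one button in the object list, the log reads button, then the z=1 window, then the z=2 window.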
The next optimization opportunity is to make the window list a vector and perform z-order sorting only when it changes (assuming windows are created/destroyed/reordered much less often than they are drawn).
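A sketch of that optimization, assuming a hypothetical `WindowList` wrapper and a `WindowEntry` record with a `zIndex` field (neither exists in the original code): windows live in a `std::vector`, a dirty flag is set whenever a window is added or its z-index changes, and the vector is re-sorted with `std::stable_sort` only on the next draw after such a change.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical window record; only the fields needed for ordering are shown.
struct WindowEntry {
    std::string name;
    int zIndex;
};

class WindowList {
public:
    void add(WindowEntry w) {
        m_windows.push_back(std::move(w));
        m_dirty = true;  // order may have changed
    }

    void setZIndex(const std::string& name, int z) {
        for (auto& w : m_windows)
            if (w.name == name) { w.zIndex = z; m_dirty = true; }
    }

    // Returns windows back-to-front; sorts only if something changed since the last call.
    const std::vector<WindowEntry>& drawOrder() {
        if (m_dirty) {
            std::stable_sort(m_windows.begin(), m_windows.end(),
                             [](const WindowEntry& a, const WindowEntry& b) {
                                 return a.zIndex < b.zIndex;
                             });
            m_dirty = false;
            ++m_sortCount;
        }
        return m_windows;
    }

    int sortCount() const { return m_sortCount; }

private:
    std::vector<WindowEntry> m_windows;
    bool m_dirty = false;
    int m_sortCount = 0;  // how many times a sort actually ran (for illustration)
};
```

A render loop would then do `for (const auto& w : windows.drawOrder()) ...` every frame, and the sort cost is paid only on frames where the order actually changed.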

The problem with this code doesn't seem to be the use of std::map. Instead, the bottleneck is more likely the use of dynamic_cast, which is an expensive operation because it may need to walk the inheritance tree of the given class at run time.
This tree is most likely rather large for your GUI components, which could explain why doing the cast in every iteration slows down the approach as a whole.

Related

How do I keep objects that meet a specific criteria in a vector, and move them out otherwise?

I am making a 2D game in SDL2 with C++. I have made some simple terrain generation so that you can walk on an endless amount of world tiles. Right now I iterate through all of the tiles that make up the world, check whether the current one is on the screen, and then render it. But since I am iterating through all of them, this might get slow if you explore a lot. And when I add enemies, I don't want to iterate through all of them either.
This is why I want to have one vector containing all the tiles that are visible (on the screen) and one with all the tiles. This might seem useless, since I will probably still need to iterate through all of the tiles once, but I want to do it this way because some functions can then more easily be restricted to the tiles on the screen.
So how do I keep a vector that contains exactly the objects meeting a certain criterion?
Here is what I have right now for this part. It uses pointers to check whether an element is in the list of visible tiles (a vector of pointers), but with this method I have problems checking conditions on them later.
for (int i = 0; i < tiles.size(); ++i) {
    tiles[i].update(camera_x, camera_y);
    if (tiles[i].onScreenX(winW, winH, 100)) {
        if (tiles[i].onScreen == false) {
            tiles[i].onScreen = true;
            tilesOnScreen.push_back(&tiles[i]);
        }
    }
    else if (tiles[i].onScreen == true) {
        for (int o = 0; o < tilesOnScreen.size(); ++o) {
            if (&tiles[i] == tilesOnScreen[o]) {
                tiles[i].onScreen = false;
                tilesOnScreen.erase(tilesOnScreen.begin() + i);
            }
        }
    }
}
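Two things in that loop are worth noting: the erase call uses `i` (the index into `tiles`) where `o` (the index into `tilesOnScreen`) seems intended, and raw pointers into `tiles` become dangling if `tiles` reallocates when new terrain is appended. A simpler sketch rebuilds the visible list every frame and stores indices instead of pointers; `Tile`, `isOnScreen`, and the `tileSize` parameter are hypothetical stand-ins for the real tile class and `onScreenX`. The indices stay valid as long as tiles are only appended, never erased or reordered.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile: world position in pixels; stands in for the real Tile class.
struct Tile {
    int x, y;
};

// True if a tileSize x tileSize tile, drawn relative to the camera, overlaps
// a winW x winH window. Hypothetical replacement for onScreenX().
bool isOnScreen(const Tile& t, int cameraX, int cameraY, int winW, int winH, int tileSize) {
    int sx = t.x - cameraX;
    int sy = t.y - cameraY;
    return sx + tileSize > 0 && sx < winW && sy + tileSize > 0 && sy < winH;
}

// Rebuilds the list of visible tiles from scratch each frame. Indices (not
// pointers) are stored so the list survives reallocation of 'tiles'.
void updateVisible(const std::vector<Tile>& tiles, std::vector<std::size_t>& visible,
                   int cameraX, int cameraY, int winW, int winH, int tileSize) {
    visible.clear();  // keeps capacity, so steady-state frames don't reallocate
    for (std::size_t i = 0; i < tiles.size(); ++i)
        if (isOnScreen(tiles[i], cameraX, cameraY, winW, winH, tileSize))
            visible.push_back(i);
}
```

Because the list is rebuilt every frame, there is no per-tile onScreen flag to keep in sync, and functions that only care about visible tiles can loop over `visible` and access `tiles[index]`.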

C++ improving palette indexing algorithm

I have a game engine that indexes colors of some bitmaps which allows using some of the crazy effects of olde (color strobing etc.). Sadly, the indexing algorithm is neither slow nor fast, but since the spritesheets these days are gigantic it really adds up. Currently, loading a single large spritesheet can take 150+ milliseconds, which is an eternity, relatively speaking.
This is the algorithm:
auto& palette = p->pal; // vector
auto& lookup = p->lookup; // vector_map
palette.reserve(200); // There are on average ~100 unique colors
palette.push_back(0); // Index zero is the blank color
uint32_t lastColor = 0;
uint32_t lastPalette = 0;
for (size_t i = 0; i < pixels; i++)
{
    const auto color = data[i];
    if (color == lastColor)
    {
        data[i] = lastPalette;
        continue;
    }
    else if (color == 0)
    {
        continue;
    }
    uint32_t j = 0;
    const auto& it = lookup.find(color);
    if (it != lookup.end()) {
        j = it->second;
    }
    else
    {
        j = palette.size();
        palette.push_back(color);
        lookup.emplace(color, j);
    }
    lastColor = color;
    // Write the index back to the bitmap:
    // (this is just a GPU texture encoding, don't mind it)
    data[i] = (j & 255) | ((j >> 8) << (8 + 6));
    lastPalette = data[i];
}
The base algorithm is fairly straightforward:
Go through each pixel, find or create an entry for it (the color), write the index back to the image.
Now, can you parallelize this? Probably not. I have tried with OMP and regular threads. It's simply not going to be fast because regardless of how much time you save by going through each portion of the image separately, at the end you have to have a common set of indexes that apply throughout the whole image, and those indexes have to be written back to the image. Sadly, finding the unique colors first and then writing back using parallelization is also slower than doing it once, sequentially. Makes sense, doesn't it?
A bitset doesn't help here. Knowing whether a color exists would be useful, but the colors are 32-bit, which makes for 2^32 bits (512 MiB). In contrast, 24-bit colors need only 2^24 bits (2 MiB), but that would be a micro-optimization, and I'm not really looking for that anyway. I need to cut the time by 10x.
So, any ideas? Would it be possible to process 4 or 8 colors at the same time using SSE/AVX?
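One direction that may be worth measuring (an assumption, not something benchmarked here): because there are only about 100 unique colors, the general-purpose `vector_map` could be swapped for a small fixed-size open-addressing hash table with linear probing, whose 4 KiB of keys and values stays in L1 cache. The sketch uses hypothetical names (`ColorTable`, `indexWithTable`), a 512-slot table, and plain indices instead of the GPU encoding, and it drops the last-color shortcut for brevity; it throws if the table ever fills. Whether this gets anywhere near a 10x speedup would have to be profiled.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical color -> palette-index table using open addressing with
// linear probing. Key 0 marks an empty slot, which is safe because the
// blank color 0 is never inserted.
class ColorTable {
public:
    // capacityPow2 must be a power of two.
    explicit ColorTable(std::size_t capacityPow2)
        : m_keys(capacityPow2, 0), m_values(capacityPow2, 0), m_mask(capacityPow2 - 1) {}

    // Returns the index for 'color', storing 'nextIndex' if the color is new.
    std::uint32_t findOrInsert(std::uint32_t color, std::uint32_t nextIndex, bool& inserted) {
        std::size_t slot = hash(color) & m_mask;
        for (std::size_t probe = 0; probe <= m_mask; ++probe) {
            if (m_keys[slot] == color) {       // existing entry
                inserted = false;
                return m_values[slot];
            }
            if (m_keys[slot] == 0) {           // empty slot: claim it
                m_keys[slot] = color;
                m_values[slot] = nextIndex;
                inserted = true;
                return nextIndex;
            }
            slot = (slot + 1) & m_mask;        // linear probing
        }
        throw std::length_error("ColorTable is full");
    }

private:
    static std::size_t hash(std::uint32_t c) {
        // Knuth-style multiplicative hash; the higher bits of the product mix best.
        return static_cast<std::size_t>((c * 2654435761u) >> 16);
    }
    std::vector<std::uint32_t> m_keys;
    std::vector<std::uint32_t> m_values;
    std::size_t m_mask;
};

// Rewrites 'pixels' with palette indices and returns the palette.
std::vector<std::uint32_t> indexWithTable(std::vector<std::uint32_t>& pixels) {
    std::vector<std::uint32_t> palette{0};  // index 0 is the blank color
    ColorTable table(512);                  // assumes well under 512 unique colors
    for (std::uint32_t& p : pixels) {
        if (p == 0) continue;               // blank pixels stay 0
        bool inserted = false;
        const std::uint32_t index =
            table.findOrInsert(p, static_cast<std::uint32_t>(palette.size()), inserted);
        if (inserted) palette.push_back(p);
        p = index;
    }
    return palette;
}
```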

Issues turning loaded meshes into cloth simulation

I'm having a bit of trouble getting meshes I import into my program to behave as cloth using a particle/spring system. I'm kind of a beginner at graphics programming, so sorry if this is super obvious and I'm just missing something. I'm using C++ with OpenGL, as well as Assimp to import the models. I'm fairly sure my code to calculate the constraints/springs and step each particle is correct, as I tested it with generated meshes (made of quads instead of triangles) and it looked fine, but I'm not sure.
I've been using this link to study up on how to actually do this: https://nccastaff.bournemouth.ac.uk/jmacey/MastersProjects/MSc2010/07LuisPereira/Thesis/LuisPereira_Thesis.pdf
What it looks like in-engine: https://www.youtube.com/watch?v=RyAan27wryU
I'm pretty sure it's an issue with the connections/springs, as the imported model that's just a flat plane seems to work fine, for the most part. The other model, though, seems to just fall apart. I keep looking at papers on this, and from what I understand everything should be working, as I seem to connect the edge/bend springs correctly, and the physics side works for the flat planes. I really can't figure it out for the life of me! Any tips/help would be GREATLY appreciated! :)
Code for processing Mesh into Cloth:
// Container to temporarily hold faces while we process springs
std::vector<Face> faces;
// Go through indices and take the ones making a triangle.
// Indices come from assimp, so i think this is the right thing to do to get each face?
for (int i = 0; i < this->indices.size(); i+=3)
{
    std::vector<unsigned int> faceIds = { this->indices.at(i), this->indices.at(i + 1), this->indices.at(i + 2) };
    Face face;
    face.vertexIDs = faceIds;
    faces.push_back(face);
}
// Iterate through faces and add constraints when needed.
for (int l = 0; l < faces.size(); l++)
{
    // Adding edge springs.
    Face temp = faces[l];
    makeConstraint(particles.at(temp.vertexIDs[0]), particles.at(temp.vertexIDs[1]));
    makeConstraint(particles.at(temp.vertexIDs[0]), particles.at(temp.vertexIDs[2]));
    makeConstraint(particles.at(temp.vertexIDs[1]), particles.at(temp.vertexIDs[2]));
    // We need to get the bending springs as well, and i've just written a function to do that.
    for (int x = 0; x < faces.size(); x++)
    {
        Face temp2 = faces[x];
        if (l != x)
        {
            verticesShared(temp, temp2);
        }
    }
}
And here's the code where I process the bending springs:
// Container for any indices the two faces have in common.
std::vector<glm::vec2> traversed;
// Loop through both faces' indices, to see if they match each other.
for (int i = 0; i < a.vertexIDs.size(); i++)
{
    for (int k = 0; k < b.vertexIDs.size(); k++)
    {
        // If we do get a match, we push a vector into the container containing the two indices of the faces so we know which ones are equal.
        if (a.vertexIDs.at(i) == b.vertexIDs.at(k))
        {
            traversed.push_back(glm::vec2(i, k));
        }
    }
    // If we're here, it means we have an edge in common, aka that we have two vertices shared between the two faces.
    if (traversed.size() == 2)
    {
        // Get the adjacent vertices.
        int face_a_adj_ind = 3 - ((traversed[0].x) + (traversed[1].x));
        int face_b_adj_ind = 3 - ((traversed[0].y) + (traversed[1].y));
        // Turn the stored ones from earlier and just get the ACTUAL indices from the face. Indices of indices, eh.
        unsigned int adj_1 = a.vertexIDs[face_a_adj_ind];
        unsigned int adj_2 = b.vertexIDs[face_b_adj_ind];
        // And finally, make a bending spring between the two adjacent particles.
        makeConstraint(particles.at(adj_1), particles.at(adj_2));
    }
}
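Two things in the posted code may be worth checking, though neither is confirmed as the cause. First, the outer loop calls verticesShared for both (a, b) and (b, a) and adds edge springs once per face, so unless makeConstraint skips duplicates, every interior edge gets two structural springs and every bending spring is created twice. Second, in the bending-spring code, `if (traversed.size() == 2)` sits inside the outer `i` loop, so once two matches are found it can fire again on a later iteration. A sketch of an alternative that visits each triangle once and keys springs by edge, with a hypothetical `Spring` record and `buildSprings` function (each result would then be passed to makeConstraint):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical spring record: a constraint between two particle/vertex IDs.
struct Spring {
    unsigned int a, b;
    bool bending;  // false = structural (edge) spring, true = bending spring
};

// Creates each edge spring once, plus one bending spring per pair of
// triangles sharing an edge, from a triangle index list (3 indices per face).
std::vector<Spring> buildSprings(const std::vector<unsigned int>& indices) {
    std::vector<Spring> springs;
    // Key: edge as (smaller ID, larger ID). Value: the vertex opposite that
    // edge in the first triangle that contained it.
    std::map<std::pair<unsigned int, unsigned int>, unsigned int> firstOpposite;

    for (std::size_t f = 0; f + 2 < indices.size(); f += 3) {
        const unsigned int v[3] = {indices[f], indices[f + 1], indices[f + 2]};
        for (int e = 0; e < 3; ++e) {
            const unsigned int p = v[e];
            const unsigned int q = v[(e + 1) % 3];
            const unsigned int opposite = v[(e + 2) % 3];
            const std::pair<unsigned int, unsigned int> key = std::minmax(p, q);
            auto it = firstOpposite.find(key);
            if (it == firstOpposite.end()) {
                // First triangle on this edge: one structural spring.
                firstOpposite.emplace(key, opposite);
                springs.push_back({p, q, false});
            } else {
                // Another triangle on this edge: bend spring between the opposite vertices.
                springs.push_back({it->second, opposite, true});
            }
        }
    }
    return springs;
}
```

For a quad split into triangles (0, 1, 2) and (0, 2, 3), this yields five structural springs and one bending spring between vertices 1 and 3. It only connects triangles that actually share vertex indices; if an imported mesh duplicates vertices along seams, those would need to be merged first (Assimp's aiProcess_JoinIdenticalVertices post-processing flag exists for this).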

C++ Function not working as expected

So I know this is a very broad topic, but I'm not sure how to describe it and I'm not sure where the bug is. So I'm making a game in the console window, a roguelike-rpg, (I haven't done the random dungeon yet, but I've done it in other languages.) and I'm having problems dealing with walls.
I have a function called placeMeeting(REAL X, REAL Y) that I use to check for collisions, but it appears to be returning bad values and I couldn't tell you why. I have a couple of macros defined: #define AND && and #define REAL double.
Here is the function:
bool GlobalClass::placeMeeting(REAL X, REAL Y)
{
    //The return value -- False until proven otherwise
    bool collision = false;
    //Loop through all walls to check for a collision
    for(int i = 0; i < wallCount; i++)
    {
        //If there was a collision, 'say' so
        if (X == wallX[ i ] AND Y == wallY[ i ])
        {
            //Set 'collision' to true
            collision = true;
        }
    }
    return collision;
}
The strange catch is that it only fails when displaying the screen: the player collides with the walls all the same, even though they're not displayed. Even stranger, only the first wall is displayed.
Here is where the walls are defined:
int wallCount;
//Array of walls
REAL wallX[ 1 ];
REAL wallY[ 1 ];
and
wallCount = 1;
//Basic wall stuff; basically just a placeholder
wallX[ 0 ] = 10;
wallY[ 0 ] = 10;
So I have a function used to render the screen (In the console window of course.) and it looks like this:
for (int y = oGlobal.viewY; y < oGlobal.viewY + oGlobal.viewHeight; y++)
{
    //The inner 'x' loop of the view
    for(int x = oGlobal.viewX; x < oGlobal.viewX + oGlobal.viewWidth; x++)
    {
        //Call the function to check this spot and print what it returns
        screen += oGlobal.checkSpot(x, y);
    }
}
That's not the whole function, just the part that refreshes the screen. After the loop, 'screen' is printed all at once to reduce buffering time. And of course, here is checkSpot:
STRING GlobalClass::checkSpot(REAL x, REAL y)
{
    STRING spriteAtSpot;
    //First check for the player
    if (x == oPlayer.x AND y == oPlayer.y)
    {
        spriteAtSpot = oPlayer.sprite;
    }
    else if (placeMeeting(x, y)) //ITS TEH WALL SUCKAS
    {
        spriteAtSpot = WALL_SPRITE;
    }
    else //Nothing here, return a space
    {
        spriteAtSpot = EMPTY_SPRITE;
    }
    //Return the sprite
    return spriteAtSpot;
}
I know it's a lot of code, but I really don't know where I screwed up.
I really appreciate any help!
P.S. Here is an image to help understand
http://i.imgur.com/8XnaHIt.png
I'm not sure if I'm missing something, but since rogue-like games are tile-based, is it necessary to make the X and Y values doubles? I remember being told that doubles are finicky to compare, since even if you assume they should be equal, they could be very slightly off, causing comparison to return false when you'd think it would return true.
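A small illustration of the comparison problem mentioned here, with hypothetical helper names: adding 0.1 ten times in a `double` does not give exactly 1.0, so an `==` test against a wall at x = 1.0 would fail, while integer tile steps stay exact. Whole numbers such as 10 are represented exactly in a `double`, so this may not be the bug in the posted code, but it is a reason to prefer `int` for tile coordinates.

```cpp
// Hypothetical helper: move ten times by 'step' starting from 'start'.
double stepTenTimes(double start, double step) {
    double x = start;
    for (int i = 0; i < 10; ++i)
        x += step;  // each addition may round in binary floating point
    return x;
}

// The same walk with integer tile coordinates.
int stepTenTimesInt(int start, int step) {
    int x = start;
    for (int i = 0; i < 10; ++i)
        x += step;  // integer addition is exact (barring overflow)
    return x;
}
```

Here `stepTenTimes(0.0, 0.1) == 1.0` is false, while `stepTenTimesInt(0, 1) == 10` always holds.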
I'm not sure we have enough of your code to debug it, but I have developed a Rogue-like console game, and here is my $.02...
Start over. You seem to be doing this in a very non-OO way (GlobalClass?). Consider objects such as Level (aggregates the entire level) and DungeonObject (essentially each space on the level; a base class from which Wall, Player, etc. can inherit). Doing this will make the programming much easier.
Embrace the suck. C++ syntax may suck, but the more you fight against it, the harder it will be to learn. Use && and the built-in datatypes. It won't take long to get used to.
Rogue-like locations are essentially integer-based. Use integers for x, y locations, not doubles. Not only is it more efficient, you'll find debugging much easier.
Start in the small. Start with a 5 x 5 dungeon level to get the basics down. Then, if you've designed it correctly, scaling up to a 10x10 or 25x25 will be much easier.
That's how I developed my game; I hope it helps.
Apart from the use of double instead of int, I see something strange in your definition of walls:
int wallCount;
//Array of walls
REAL wallX[ 1 ];
REAL wallY[ 1 ];
and
wallCount = 1;
//Basic wall stuff; basically just a placeholder
wallX[ 0 ] = 10;
wallY[ 0 ] = 10;
You are defining a variable called wallCount, which you later use to go through the elements of your array in your placeMeeting function:
//Loop through all walls to check for a collision
for(int i = 0; i < wallCount; i++)
Then why don't you use wallCount to define the size of your arrays? You can't do that with the syntax you used, because the size of a built-in array must be known at compile time, so you would need either new or std::vector. Either way, you shouldn't have one variable that defines the length of the array and then use a different value when you actually create the array: it is a source of bugs whenever the two fall out of sync. For example, you could do this:
const int wallCount = 1;
int* wallX = new int[wallCount];
int* wallY = new int[wallCount];
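A std::vector variant, sketched with a hypothetical `Wall` struct and an integer-based `placeMeeting` overload (not the original signature): keeping both coordinates in one struct also avoids the "aligned arrays" problem described further on, and the wall count is simply `walls.size()`.

```cpp
#include <vector>

// Hypothetical wall type: both coordinates live together, so they can't
// get out of step the way two parallel arrays can.
struct Wall {
    int x;
    int y;
};

// The number of walls is walls.size(); there is no separate count to keep in sync.
bool placeMeeting(const std::vector<Wall>& walls, int x, int y) {
    for (const Wall& w : walls)
        if (w.x == x && w.y == y)
            return true;  // stop at the first hit
    return false;
}
```

Adding a wall is then a single `walls.push_back({x, y});`, with no array size or counter to update.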
But there's a bigger problem: why are you creating arrays of size 1? You only have one wall! It doesn't really make sense to have arrays of size 1, unless you intend to use another value and have reduced it to 1 for debugging purposes. But you wrote this:
Even stranger, only the first wall is being displayed.
That's because you only have 1 wall!
By the way, the way you have designed your data isn't the one I would use. From your checkSpot I understand this: oPlayer.x and oPlayer.y are the coordinates of your player, and x and y are the coordinates of the tile you have to draw (and for which you need to choose the appropriate sprite). If in your map you have 3 walls, you have to put 3 values in wallX and 3 in wallY, and you must make sure that you keep the 2 arrays "aligned" (if the coordinates of your second wall are for example x=10 and y=20, you could get confused, or have buggy code, and instead of saving it as
wallX[1] = 10;
wallY[1] = 20;
you might write
wallX[1] = 10;
wallY[2] = 20; // wrong index!
so it's one more source of bugs), and worse, you must check that they are consistent with other arrays of other objects: you could have, for example, doors, and then following your approach you'd have doorX[] and doorY[], and how can you be sure that you don't have a wall and a door at the same place? Like, if you had
doorX[0] = 10;
doorY[0] = 20;
it would be at the same place as the wall, and the error isn't obvious, because you'd have to cross-check all your arrays to find it. So I would suggest having a level[height][width] array instead; since the first index is the row (y) and the second is the column (x), a wall at x=10 and y=20 would be level[20][10] = 'w';. This ensures that you only have ONE object per tile. Besides, checking for collisions is faster: with your approach, if you have 50 walls you need 50 checks; with mine, you always need only one. OK, performance is certainly not an issue in these games, but I still think you should consider my approach (unless there are other reasons to prefer yours, of course).
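A minimal sketch of the grid approach, assuming a hypothetical fixed-size `Level` struct with `char` tiles (`'w'` for wall, `' '` for empty) and the design choice, also an assumption, that positions outside the map count as walls so the player can't walk off the edge:

```cpp
#include <array>

// Hypothetical level size.
constexpr int kWidth = 40;
constexpr int kHeight = 25;

// One char per tile, indexed as tiles[y][x] (row first, then column).
struct Level {
    std::array<std::array<char, kWidth>, kHeight> tiles{};

    Level() {
        for (auto& row : tiles) row.fill(' ');  // start with empty floor
    }

    bool inBounds(int x, int y) const {
        return x >= 0 && x < kWidth && y >= 0 && y < kHeight;
    }

    // One lookup per collision check; out-of-bounds positions are treated as solid.
    bool isWall(int x, int y) const {
        return !inBounds(x, y) || tiles[y][x] == 'w';
    }

    void set(int x, int y, char c) {
        if (inBounds(x, y)) tiles[y][x] = c;
    }
};
```

The same `tiles[y][x]` lookup can drive rendering: checkSpot would return the wall sprite whenever `isWall(x, y)` is true, so walls that block the player are always the ones drawn.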

Memory increase rapidly using gluTess*, display list and frame buffer

I'm programming with OpenGL under MSVC 2010.
One of my goals is to pick objects in the scene. I designed it by assigning each object a unique color, rendering the objects into a framebuffer, then reading the color under the cursor to identify the corresponding object.
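The color-to-object mapping described here can be sketched as a pair of pure functions, assuming 8 bits per channel in the pick framebuffer and a hypothetical scheme in which ID 0 (black) means "no object", so the cleared background never matches. The pixel under the cursor would typically be read with glReadPixels using GL_RGB and GL_UNSIGNED_BYTE; lighting, blending, and dithering are usually disabled for the pick pass so the colors arrive unchanged.

```cpp
#include <cstdint>

// Hypothetical RGB triple, matching what glColor3ub takes and what a
// GL_RGB / GL_UNSIGNED_BYTE read returns.
struct PickColor {
    std::uint8_t r, g, b;
};

// Object index i (0-based) is encoded as ID i + 1, so black (0,0,0) stays
// reserved for the background. Supports up to 2^24 - 1 objects.
PickColor encodeId(std::uint32_t objectIndex) {
    const std::uint32_t id = objectIndex + 1;
    return PickColor{static_cast<std::uint8_t>(id & 0xFF),
                     static_cast<std::uint8_t>((id >> 8) & 0xFF),
                     static_cast<std::uint8_t>((id >> 16) & 0xFF)};
}

// Returns the object index for a picked pixel, or -1 for the background.
long decodeId(const PickColor& c) {
    const std::uint32_t id = c.r | (static_cast<std::uint32_t>(c.g) << 8) |
                             (static_cast<std::uint32_t>(c.b) << 16);
    return id == 0 ? -1 : static_cast<long>(id) - 1;
}
```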
Picking now works well. However, whenever a pick happens, memory usage increases rapidly. In detail, the following code renders objects into a framebuffer:
for (unsigned i = 0; i < objects.size(); ++i)
{
    //some code computing color;
    Color color;
    for (unsigned j = 0; j < objects[i].listOfPrimitives.size(); ++j)
    {
        objects[i].listOfPrimitives[j]->color = color;
    }
    objects[i].Render();
    for (unsigned j = 0; j < objects[i].listOfPrimitives.size(); ++j)
    {
        objects[i].listOfPrimitives[j]->color = colorStorage[i][j];
    }
}
where objects are the objects to be rendered. Since every object has a certain number of primitives (which may be cylinders, spheres, etc.), this code just changes the color of each object's primitives to a unique computed one, renders the object, then changes the colors back (colorStorage stores the original colors). There is some code after this that deals with the object, which I'm sure has nothing to do with this issue.
For most objects, the render method is implemented as follows:
glColor3ub(color[0], color[1], color[2]);
glBegin(GL_TRIANGLES);
for (unsigned i = 0; i < mesh.faces.size(); ++i)
{
    glNormal3d(mesh.faces[i].normal.x, mesh.faces[i].normal.y, mesh.faces[i].normal.z);
    for (unsigned j = 0; j < 3; ++j)
    {
        glVertex3d(mesh.vertices[mesh.faces[i].verts[j]].x,
                   mesh.vertices[mesh.faces[i].verts[j]].y,
                   mesh.vertices[mesh.faces[i].verts[j]].z);
    }
}
glEnd();
But some objects contain concave polygons (some even with holes), so I use GLU's gluTess* functions to render them, and to speed up rendering I put that part in a display list.
As I mentioned, this picking procedure increases memory usage rapidly. There are two more phenomena I can't explain:
If I comment out line 8 in the first piece of code, the memory doesn't change at all when that code runs (of course, the code then no longer works);
After the memory increases, if I refresh the scene (I implemented an interactive trackball), the memory drops again.
So which part could be causing this issue? The display list? The gluTess*() calls? Or even something related to the framebuffer?