I'm implementing a tile engine for games in C++. Currently the game is divided into maps; each map has a 2D grid of sprites, where each sprite represents a tile.
I am coding a system where, if several maps are adjacent, you can walk from one to the other.
At startup, all the maps are instantiated but "unloaded", i.e. their sprite objects are not in memory. When I get close enough to an adjacent map, that map's sprites are "loaded" into memory by basically doing:
for(int i=0; i < sizeX; i++) {
for(int j=0; j < sizeY; j++) {
Tile *tile_ptr = new Tile(tileset, tilesId[i][j], i + offsetX, j + offsetY);
tilesMap[i][j] = tile_ptr;
}
}
And they are unloaded by being destroyed the same way when I am too far away from the map.
For a 50x50 map of 32x32-pixel sprites, loading or unloading takes roughly 0.3 s, all of it within a single frame. My question is: what is a more efficient way to load/unload maps dynamically, even using a totally different mechanism? Thanks.
PS: I'm using SFML as the graphics library, but I'm not sure this changes anything.
A different possibility, which improves latency but increases the overall number of operations needed:
Instead of waiting until you are 'too close' to or 'too far' from a map, keep in memory the tiles for a bigger square around the player [i.e. if the map is 50x50, store 150x150], but show only the 50x50. Then, on every step, update the 150x150 window: this requires 150 destroy ops and 150 build ops per step.
By doing so, you will actually build/destroy elements more often overall! But latency improves, since you no longer wait 0.3 s to build 2,500 elements at once; each step touches only a small portion: 150*2 = 300 elements.
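A minimal sketch of that sliding window, under stated assumptions: TileWindow and recenter are hypothetical names, and a std::set of coordinates stands in for the real Tile objects. For simplicity it scans the whole window on every step; a real implementation would only touch the row or column that enters or leaves.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>
#include <utility>

// Hypothetical sliding window of loaded tiles around the player.
// A std::set of coordinates stands in for the real Tile objects.
struct TileWindow {
    int radius;                            // window side is 2 * radius + 1
    std::set<std::pair<int, int>> loaded;  // coordinates currently "in memory"

    // Re-centre the window on (cx, cy).
    // Returns the number of tiles destroyed plus the number created.
    std::size_t recenter(int cx, int cy) {
        std::size_t ops = 0;
        // Destroy tiles that are now outside the window.
        for (auto it = loaded.begin(); it != loaded.end();) {
            if (std::abs(it->first - cx) > radius ||
                std::abs(it->second - cy) > radius) {
                it = loaded.erase(it);
                ++ops;
            } else {
                ++it;
            }
        }
        // Create tiles that have just entered the window.
        for (int x = cx - radius; x <= cx + radius; ++x)
            for (int y = cy - radius; y <= cy + radius; ++y)
                if (loaded.insert({x, y}).second)
                    ++ops;
        return ops;
    }
};
```

With radius 2 (a 5x5 window), the first recenter(0, 0) creates 25 tiles, and a one-tile step to recenter(1, 0) destroys 5 and creates 5, which is the small per-step cost described above.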
I think it's a perfect occasion to learn multithreading and asynchronous calls.
It can seem complex if you're new to it but it's a very useful skill to have.
It will still take 0.3sec to load (well, a bit more actually), but the game will not freeze.
That's what most games do. You can search SO for the various ways to do it in C++.
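As a rough illustration of the asynchronous approach, here is a sketch using std::async from the standard library. TileData, buildTiles and MapLoader are hypothetical stand-ins, not part of SFML or of the question's code, and the sketch only builds plain data in the background; whether graphics objects themselves can safely be created off the main thread depends on the library, so treat that part as an open assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical plain-data stand-in for a tile.
struct TileData { int id; int x; int y; };

// The expensive part: build every tile of one map.
std::vector<TileData> buildTiles(int sizeX, int sizeY, int offsetX, int offsetY) {
    std::vector<TileData> tiles;
    tiles.reserve(static_cast<std::size_t>(sizeX) * sizeY);
    for (int i = 0; i < sizeX; i++)
        for (int j = 0; j < sizeY; j++)
            tiles.push_back({i * sizeY + j, i + offsetX, j + offsetY});
    return tiles;
}

class MapLoader {
public:
    // Start building the map on another thread and return immediately.
    void startLoading(int sizeX, int sizeY, int offsetX, int offsetY) {
        pending_ = std::async(std::launch::async, buildTiles,
                              sizeX, sizeY, offsetX, offsetY);
    }

    // Call once per frame. Returns true (and fills `out`) once the
    // background job has finished; never blocks the game loop.
    bool poll(std::vector<TileData>& out) {
        if (!pending_.valid())
            return false;
        if (pending_.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return false;
        out = pending_.get();
        return true;
    }

private:
    std::future<std::vector<TileData>> pending_;
};
```

The game loop would call startLoading when the player approaches a map and then poll every frame, swapping the finished tiles in once poll returns true.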
I'm doing some tutorials for OpenFrameworks (I'm kind of a noob when it comes to coding, but I have a bit of experience from following tutorials and learning what's going on over the past few years), and a major part of the code involves grabbing the sound spectrum of an audio sample and putting the values into an array to control a float value. But I can't seem to wrap my head around what's going on here.
This is the relevant code (it's a VJ shaper that rotates and changes the size of shapes according to input from the sound spectrum):
header:
float * fftSmooth;
int bands;
cpp setup:
fftSmooth = new float[8192];
for (int i = 0; i < 8192; i++) {
fftSmooth[i] = 0;
}
bands = 64;
cpp update:
float * value = ofSoundGetSpectrum(bands);
for (int i = 0; i < bands; i++) {
fftSmooth[i] *= release; //"release" is a float
if (fftSmooth[i] < value[i]) {
fftSmooth[i] = value[i];
}
}
If anyone could walk me through the steps of what's going on, that would be great. I understand (sort of) that in the setup, an array called "fftSmooth" is created with 8192 floats in it and filled with zeros in the for loop, after which the int "bands" is assigned a value of 64. Then in the update, another array called "value" is created with 64 floats in it by looking at "bands", which is also the number of bands in ofSoundGetSpectrum, which grabs the frequency levels from a sound file as it plays. I've looked at the OpenFrameworks reference page for ofSoundGetSpectrum and didn't really get any more clues as to what it's doing in this context, and I have no idea what the for loop and if statement in the update section are doing either.
Not knowing what's going on isn't going to stop me from using the code, but I feel that if I want to build on it (grabbing different frequency ranges etc.), I need to know what the for loop and if statement in the update are doing.
ofSoundGetSpectrum(...)
Gets a frequency spectrum sample, taking all current sound players into account.
Each band will be represented as a float between 0 and 1.
This appears to be taking an instantaneous FFT, and returning the "strength" of each of the frequency bands.
I assume the second half of the code is run in a loop. The first time through, it is just going to copy the current band strength into fftSmooth. In subsequent passes, the multiply by release is designed to reduce the value in fftSmooth by some percentage. Then any new band strength greater than the filtered one will overwrite the old value.
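The update step described above is a peak-hold with decay. Here is a minimal standalone sketch of it, assuming plain std::vector buffers instead of the raw arrays in the question; smoothSpectrum is a hypothetical helper name, not an openFrameworks function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Peak-hold smoothing with decay, one call per frame.
// `smooth` keeps its contents between calls; `value` is the fresh spectrum.
// `release` is assumed to be between 0 and 1 (e.g. 0.96).
void smoothSpectrum(std::vector<float>& smooth,
                    const std::vector<float>& value,
                    float release) {
    assert(smooth.size() >= value.size());
    for (std::size_t i = 0; i < value.size(); i++) {
        smooth[i] *= release;          // let the old level fall a little
        if (smooth[i] < value[i]) {
            smooth[i] = value[i];      // a louder band jumps up immediately
        }
    }
}
```

For example, with release = 0.5, a band that was at 0.8 and now reads 0.1 is held at 0.4 for this frame, while a band that was at 0.2 and now reads 0.3 jumps straight to 0.3.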
If you animate plots of fftSmooth, you should get an image like this (minus the color) :
I'm a new user of OpenMP, and I'm trying to parallelize a part of my program using locks for a 2-dimensional array. I won't go over all the details of my real problem, but instead discuss the following simplified example:
Let's say I have a huge group of N>>1 particles, whose x,y locations are stored in some data structure. I want to create a 2D counter array that represents a grid and counts the particles in each cell of the grid:
count[j][k]+=1 (for the corresponding j,k of each particle, according to its x,y values)
Using locks is a natural choice (the 2D array is on the scale of 100x100 and above, so on a 20-core machine the chance of 2 threads updating the same array element simultaneously is still pretty low). The relevant parts of the code:
//define a 2d lock:
omp_lock_t **count_lock;
count_lock = new omp_lock_t* [J+1];
for(j=0;j<=J;j++) {
count_lock[j]=new omp_lock_t[K+1];
memset(count_lock[j],0,(K+1)*sizeof(omp_lock_t));
}
//initializing:
for(j=0;j<=J;j++)
for(k=0;k<=K;k++){
omp_init_lock(&(count_lock[j][k]));
omp_unset_lock(&(count_lock[j][k]));
}
//using the lock (after determining the right j,k):
omp_set_lock(&(count_lock[j][k]));
count[j][k] += 1;
omp_unset_lock(&(count_lock[j][k]));
//destroying the lock:
for(i=0;i<=J;i++)
for(j=0;j<=K;j++)
omp_destroy_lock(&(count_lock[i][j]));
for (j=0; j<=J; j++)
delete[] count_lock[j];
delete[] count_lock;
The particles are divided into groups of 500 particles. Each group is a linked list of particles, and all the particle groups also form a linked list. The parallelization comes from parallelizing the "for" loop that goes over the particle groups.
For some reason I can't seem to get it right... no improvement in performance is obtained, and the simulation gets stuck after a few iterations.
I tried using "atomic", but it gave even worse performance than the serial code. Another option I tried is to create a private 2D array for each thread and then sum them up. I got some improvement that way, but it is pretty costly and I hope there is a better way.
Thanks!
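For reference, the per-thread private array variant mentioned above can be sketched as follows. This is only an illustration under assumptions: the particles sit in a flat std::vector rather than the linked lists of the real program, positions are assumed to lie in [0,1) x [0,1), and Particle and countParticles are hypothetical names. The merge costs J*K additions per thread, which should be small next to N>>1 particle updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { double x, y; };  // hypothetical: positions in [0,1) x [0,1)

// Count particles per cell of a J x K grid, stored row-major as count[j*K + k].
std::vector<int> countParticles(const std::vector<Particle>& particles, int J, int K) {
    assert(J > 0 && K > 0);
    const std::size_t cells = static_cast<std::size_t>(J) * K;
    std::vector<int> count(cells, 0);
    const long total = static_cast<long>(particles.size());

    #pragma omp parallel
    {
        // Each thread fills its own private grid, so no locks are needed here.
        std::vector<int> local(cells, 0);

        #pragma omp for nowait
        for (long n = 0; n < total; ++n) {
            const Particle& p = particles[n];
            if (p.x < 0.0 || p.x >= 1.0 || p.y < 0.0 || p.y >= 1.0)
                continue;  // ignore particles outside the grid
            const int j = static_cast<int>(p.x * J);
            const int k = static_cast<int>(p.y * K);
            ++local[static_cast<std::size_t>(j) * K + k];
        }

        // Merge the private grids one thread at a time.
        #pragma omp critical
        for (std::size_t c = 0; c < cells; ++c)
            count[c] += local[c];
    }
    return count;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function simply runs serially with the same result.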
I am building a vision system which can count boxes moving on a variable speed conveyor belt.
Using OpenCV and C++, I could separate the blobs and extract the respective centroids.
Now I have to increment the count variable, if the centroid crosses the cutoff boundary line.
This is where I am stuck. I tried 2 alternatives.
Fixing a rectangular strip where a centroid would stay for only one single frame
But since the conveyor is multi speed, I could not fix a constant boundary value.
I tried something like
centroid_prev = centroid_now;
centroid_now = posX;
if (centroid_now >= xLimit && centroid_prev < xLimit)
{
count++;
}
This works fine if just a single box is present on the conveyor.
But with 2 or more blobs in the same frame, I do not know how to handle this using arrays of contours.
Can you please suggest a simple counting algorithm which can compare
blob properties between previous frame and current frame even if
multiple blobs are present per frame?
PS. Conveyor speed is around 50 boxes/second, so a lightweight algorithm will be very much appreciated else we may end up with a lower frame rate.
Assuming the images you pasted are representative, you can easily solve this by doing some kind of tracking.
The simplest way that comes to mind is to use goodFeaturesToTrack and calcOpticalFlowPyrLK to track the motion of the conveyor.
You'll probably need to do some filtering on the result, but I don't think that would be difficult, as the motion and images are very low in noise.
Once you have that motion, you can calculate for each centroid when it moved beyond a certain X threshold and count it.
With a low number of corners (<100) such as in this image, it should be fast.
Have you tried matching the centroid coordinates from the previous frame with the centroids from the new frame? You can use OpenCV's descriptor matchers for that. (The code samples all match feature vectors, but there's no reason why you shouldn't use them for coordinate matching.)
If you're worried about performance: matching 5-10 coordinate centers should be orders of magnitude faster than finding blobs in an image.
This is the algorithm for arrays. It's just an extension of what you are doing - you can adjust the specifics
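// Remember each centroid's x position from the first frame.
for(i=0; i<centroid.length; i++)
    centroid_prev[i] = centroid[i].posX;
for(frame j=0 to ...) {
    ... recompute centroids
    // Count every centroid that crossed xLimit since the previous frame.
    for(i=0; i<centroid.length; i++) {
        centroid_now = centroid[i].posX;
        if (centroid_now >= xLimit && centroid_prev[i] < xLimit)
        {
            count++;
        }
    }
    // Save this frame's positions for the next comparison.
    for(i=0; i<centroid.length; i++)
        centroid_prev[i] = centroid[i].posX;
}// end j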
--
If the objects can move about (and they look about the same) you need to add additional info such as color to locate the same objects.
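When the order of the centroids is not guaranteed to stay the same from frame to frame, one simple option is to pair each current centroid with the nearest centroid of the previous frame before testing for a crossing. The sketch below assumes exactly that; Point, countCrossings and maxJump are hypothetical names, not OpenCV API, and the greedy matching is only reasonable when boxes are well separated compared with how far they move between frames.

```cpp
#include <vector>

struct Point { float x, y; };  // hypothetical centroid type

// Counts centroids that crossed xLimit between two consecutive frames.
// Each current centroid is paired with the nearest previous centroid,
// provided it lies within maxJump pixels; unmatched centroids are ignored.
int countCrossings(const std::vector<Point>& prev,
                   const std::vector<Point>& now,
                   float xLimit, float maxJump) {
    int crossings = 0;
    for (const Point& c : now) {
        float best = maxJump * maxJump;  // squared distance threshold
        const Point* match = nullptr;
        for (const Point& p : prev) {
            const float dx = c.x - p.x;
            const float dy = c.y - p.y;
            const float d2 = dx * dx + dy * dy;
            if (d2 <= best) {
                best = d2;
                match = &p;
            }
        }
        // Same test as in the single-box case, applied per matched pair.
        if (match && match->x < xLimit && c.x >= xLimit)
            ++crossings;
    }
    return crossings;
}
```

Per frame you would add the return value to count and then keep the current centroids as prev for the next frame. The cost is proportional to the product of the two centroid counts, which is negligible for a handful of boxes.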
I'm creating a game in Qt in C++, and I store every tile (one per coordinate) in a vector like this:
std::vector<std::unique_ptr<Tile>> all_tiles = createWorld(bgTile);
for(auto & tile : all_tiles) {
tiles.push_back(std::move(tile));
}
Each level also has some health packs, which are stored in a vector as well.
std::vector<std::unique_ptr<Enemy>> all_enemies = getEnemies(nrOfEnemies);
for(auto &healthPackUniquePtr : all_healthpacks) {
std::shared_ptr<Tile> healthPackPtr{std::move(healthPackUniquePtr)};
int x = healthPackPtr->getXPos();
int y = healthPackPtr->getYPos();
int newYpos=checkOverlapPos(healthPackPtr->getXPos(),healthPackPtr->getYPos());
newYpos = checkOverlapEnemy(healthPackPtr->getXPos(),newYpos);
auto healthPack = std::make_shared<HealthPack>(healthPackPtr->getXPos(), newYpos, healthPackPtr->getValue());
healthPacks.push_back(healthPack);
}
But now I'm searching for the fastest way to check if my player position is at a healthpack position. So I have to search on 2 values in a vector: x and y position. Does anyone have a suggestion for how to do this?
Your 'real' question:
I have to search on 2 values in a vector : x and y position. Anyone a
suggestion how to do this?
Is a classic XY question, so I'm ignoring it!
I'm searching for the fastest way to check if my player position is at
an healthpack position.
Now we're talking. The approach you are using now won't scale well as the number of items increases, and you'll need to do something similar for every pair of objects you are interested in. Not good.
Thankfully this problem has been solved (and improved upon) for decades: you need to use a spatial partitioning scheme such as BSP, BVH, quadtree/octree, etc. The beauty of these schemes is that a single data structure can hold the entire world, making arbitrary item intersection queries trivial (and fast).
You can implement a callback system. When a player moves to a tile, fire a callback on the tile the player is now on. Each tile should know its own state and can add health to the player, or do nothing if there is nothing on that tile. Using this technique, you don't need to search at all.
If all_healthpacks has fewer than ~50 elements, I wouldn't bother to improve it. A simple loop is going to be sufficiently fast.
Otherwise you can split the vector into sectors and check only the elements in the same sector as your player (and maybe a few neighbouring sectors if the player is close to an edge).
If you need something more memory-efficient, you can use a k-d tree to index the health packs and search them quickly (O(log N) time).
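If positions are integer tile coordinates, one further option (a swapped-in technique, not one of the schemes named above) is a hash map keyed on the (x, y) pair, which amounts to sectors of a single tile. A minimal sketch, assuming a hypothetical HealthPackIndex class that stores only each pack's value:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical index of health packs by tile coordinate.
class HealthPackIndex {
public:
    void add(int x, int y, float value) { packs_[key(x, y)] = value; }

    // Returns a pointer to the pack's value, or nullptr if the tile is empty.
    const float* find(int x, int y) const {
        auto it = packs_.find(key(x, y));
        return it == packs_.end() ? nullptr : &it->second;
    }

    // Removes the pack at (x, y); returns true if there was one.
    bool remove(int x, int y) { return packs_.erase(key(x, y)) > 0; }

private:
    // Pack both 32-bit coordinates into one 64-bit key.
    static std::uint64_t key(int x, int y) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32) |
               static_cast<std::uint32_t>(y);
    }
    std::unordered_map<std::uint64_t, float> packs_;
};
```

Each time the player moves, a single find(playerX, playerY) call answers the question in average constant time, independent of how many packs the level holds.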
I'm new to C++ and DirectX, I come from XNA.
I have developed a game like Fly The Copter.
What I've done is create a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in an ArrayList, and in C++ I've used a vector.
In XNA the game runs fast, but in C++ it is really slow.
Here's the C++ code:
void GameScreen::Update()
{
//Update Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
walls.at(i).Update();
if (walls.at(i).pos.x <= -40)
wallsPassed += 2;
}
}
void GameScreen::Draw()
{
//Draw Walls
int len = walls.size();
for(int i = wallsPassed; i < len; i++)
{
if (walls.at(i).pos.x < 1280)
walls.at(i).Draw();
else
break;
}
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (Direct3DXSprite).
That's the only code that runs in the game loop.
I know this is bad code; if you have an idea to improve it, please help.
Thanks, and sorry about my English.
Try replacing all occurrences of at() with the [] operator. For example:
walls[i].Draw();
and then turn on all optimisations. Both [] and at() are function calls - to get the maximum performance you need to make sure that they are inlined, which is what upping the optimisation level will do.
You can also do some minimal caching of a wall object - for example:
for(int i = wallsPassed; i < len; i++)
{
Wall & w = walls[i];
w.Update();
if (w.pos.x <= -40)
wallsPassed += 2;
}
Try to narrow down the cause of the performance problem (also termed profiling). I would try drawing only one object while continuing to update all the objects. If it's suddenly faster, then it's a DirectX drawing problem.
Otherwise try drawing all the objects, but updating only one wall. If it's faster, then your Update() function may be too expensive.
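To put numbers on such experiments, a small timing helper built on std::chrono can be wrapped around each phase. This is a minimal sketch; timeMs is a hypothetical helper name, not part of DirectX.

```cpp
#include <chrono>
#include <utility>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Inside the game loop you could then write something like double updateMs = timeMs([&] { Update(); }); and the same for Draw(), and log both values every few hundred frames to see which phase dominates.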
How fast is 'fast'?
How slow is 'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa?
Maybe you have just forgotten to turn on release mode :) I had some problems with it in the past - I thought my code was very slow, but it was because of debug mode. If that's not it, you may have a problem with the rendering part, or with a huge number of objects. The code you provided looks good...
Have you tried multiple buffers (a.k.a. Double Buffering) for the bitmaps?
The typical scenario is to draw in one buffer, then while the first buffer is copied to the screen, draw in a second buffer.
Another technique is to have a huge "logical" screen in memory. The portion drawn on the physical display is a viewport, or view, into a small area of the logical screen. Moving the background (or screen) then just requires a copy on the part of the graphics processor.
You can aid batching of sprite draw calls. Presumably your draw call calls ID3DXSprite::Draw on your only ID3DXSprite instance with the relevant parameters.
You can get much improved performance by doing a call to ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then calling ID3DXSprite::End when you've done all your rendering. ID3DXSprite will then sort all your sprite calls by texture to decrease the number of texture switches and batch the relevant calls together. This will improve performance massively.
It's difficult to say more, however, without seeing the internals of your Update and Draw calls. The above is only a guess ...
To draw every single wall with a different draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and send them into a single draw. That's a more sane idea.
Anyway, to get an idea of WHY it runs slowly, try some CPU and GPU profilers (PerfHud, Intel GPA, etc.) to find out first of all WHAT the bottleneck is (the CPU or the GPU). Then you can work on alleviating the problem.
The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark... check that you initialize the device as HAL (hardware 3d) rather than REF (software 3d).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead. If you make more than a couple hundred per frame, that will be your limiting factor.