Update index variables in threads - c++

I have objects like balls. These objects are dynamically created and stored in a vector. For each ball, a separate thread is created that updates its coordinates. Each of these threads has a reference to the vector of balls and knows the index of its ball. Now, let's say I need to delete several balls and the threads associated with them.
I did it like this:
each ball has a bool killMe variable that becomes true when the ball needs to be removed. The thread that updates the coordinates notices that the ball needs to be removed, removes the ball, and terminates on its own. But when the ball is removed from the vector, the indices of the subsequent balls change, and their threads, trying to access them the next time, cause the program to crash.
How can I keep the ball indices held by the threads up to date?

Rather than each thread having an index into the vector, why not pass a reference to the object being worked on?
Note that this is still problematic if your vector is a vector<Ball>: erasing an element shifts every later element down, which invalidates references and pointers to those elements.
But you could store vector<std::shared_ptr<Ball>> and then you're golden.
Another choice if you really want to use indexes is still to use a vector of shared pointers but then you can nullify the pointers you need to delete -- leaving holes in your vector, but at least you aren't moving things around.
The other choice involves mutexes, and you'll be mutex-locked A LOT. This seems less useful.
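As a rough illustration of the shared_ptr suggestion, here is a minimal sketch. The `Ball` fields, `updateBall`, and `removeBall` are hypothetical names, and it assumes a single owner thread is the only one that modifies the vectors (unlike the question, where the worker removes its own ball).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical ball: killMe is atomic because two threads touch it.
struct Ball {
    std::atomic<bool> killMe{false};
    std::atomic<int> x{0};  // stand-in for the real coordinates
};

// Each worker holds a shared_ptr to its own ball, so it never indexes the vector.
void updateBall(std::shared_ptr<Ball> ball) {
    while (!ball->killMe.load()) {
        ball->x.fetch_add(1);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Called only from the owner thread: stops worker i, then erases ball and worker.
// The remaining workers are unaffected because they do not depend on positions.
void removeBall(std::vector<std::shared_ptr<Ball>>& balls,
                std::vector<std::thread>& workers, std::size_t i) {
    assert(i < balls.size() && balls.size() == workers.size());
    balls[i]->killMe = true;
    workers[i].join();
    balls.erase(balls.begin() + static_cast<std::ptrdiff_t>(i));
    workers.erase(workers.begin() + static_cast<std::ptrdiff_t>(i));
}
```

A worker would be started with something like `workers.emplace_back(updateBall, balls.back())` right after pushing a new `std::make_shared<Ball>()` into `balls`.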


Is writing to elements within children of a multi-dimensional vector thread-safe?

I'm trying to take a (very) large vector, and reassign all of the values in it into a multidimensional (2D) vector<vector<string>>.
The multidimensional vector has both dimensions resized to the correct size prior to value population, to avoid reallocation.
Currently, I am doing it single-threaded, but it is something that needs to happen repeatedly, and is very slow due to the large size (~7 seconds). The question is whether it is thread-safe for me to use, for instance, a thread per 2D element.
Some pseudocode:
vector<string> source{/*assume that it is populated by 8,000,000 strings
of varying length*/};
vector<vector<string>> destination;
destination.resize(8);
for(size_t loop=0;loop<8;loop++)destination[loop].resize(1000000);
//current style
for(size_t loop=0;loop<source.size();loop++)destination[loop/1000000][loop%1000000]=source[loop];
//desired style
void Populate(int index){
for(size_t loop=0;loop<destination[index].size();loop++)destination[index][loop]=source[index*1000000+loop];
}
for(int loop=0;loop<8;loop++)boost::thread populator(Populate,loop);
I would think that the threaded version should work, since they're writing to separate 2nd dimensional elements. However, I'm not sure whether writing the strings would break things, since they are being resized.
When considering only thread-safety, this is fine.
Writing concurrently to distinct objects is allowed. C++ considers objects distinct even if they are neighboring fields in a struct or elements in the same array. The data type of the object does not matter here, so this holds true for string just as well as it does for int. The only important thing is that you must ensure that the ranges that you operate on are really fully distinct. If there is any overlap, you have a data race on your hands.
There is, however, another thing to take into consideration here, and that is performance. This is highly platform dependent, so the language standard does not give you any rules here, but there are some effects to look out for. For instance, neighboring elements in an array might reside on the same cache line. So in order for the hardware to be able to fulfill the thread-safety guarantees of the language, it must synchronize access to such elements. For instance: partitioning array access in a way that one thread works on all the elements with even indices, while another works on the odd indices, is technically thread-safe, but puts a lot of stress on the hardware as both threads are likely to contend for data stored on the same cache line.
Similarly, in your case there is contention on the memory bus. If your threads are able to complete calculation of the data much faster than you are able to write them to memory, you might not actually gain anything by using multiple threads, because all the threads will end up waiting for the memory in the end.
Keep these things in mind when deciding whether parallelism is really the right solution to your problem.
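To make the "distinct ranges" point concrete, here is a small sketch using std::thread rather than boost::thread. The function name `populate` and the assumption that the source size divides evenly by the number of parts are mine, not the asker's.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Splits `source` into `parts` rows of equal length, one thread per row.
// Every thread writes only to its own row, so no two threads touch the same element.
std::vector<std::vector<std::string>> populate(const std::vector<std::string>& source,
                                               std::size_t parts) {
    assert(parts > 0 && source.size() % parts == 0);  // assumption: even split
    const std::size_t chunk = source.size() / parts;

    // Sized up front, so the outer and inner vectors never reallocate while threads run.
    std::vector<std::vector<std::string>> destination(parts,
                                                      std::vector<std::string>(chunk));
    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < parts; ++p) {
        workers.emplace_back([&destination, &source, chunk, p] {
            for (std::size_t i = 0; i < chunk; ++i)
                destination[p][i] = source[p * chunk + i];
        });
    }
    for (auto& w : workers) w.join();  // wait before anyone reads the result
    return destination;
}
```

Whether this is actually faster than the single-threaded loop depends on the memory effects described above, so it is worth measuring both.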

Rapidly instantiate objects - good or bad?

Note: Although this question doesn't directly correlate to games, I've molded the context around game development in order to better visualize a scenario where this question is relevant.
tl;dr: Is rapidly creating objects of the same class memory intensive, inefficient, or is it common practice?
Say we have a "bullet" class - and an instance of said class is created every time the player 'shoots' - anywhere between 1 and 10 times every second. These instances may be destroyed (obviously) upon collision.
Would this be a terrible idea? Is general OOP okay here (i.e.: class Bullet { short x; short y; }, etc.) or is there a better way to do this? Is new and delete preferred?
Any input much appreciated. Thank you!
This sounds like a good use-case for techniques like memory-pools or Free-Lists. The idea in both is that you have memory for a certain number of elements pre-allocated. You can override the new operator of your class to use the pool/list or use placement new to instantiate your class in a retrieved address.
The advantages:
no memory fragmentation
pretty quick
The disadvantages:
you must know the maximum number of elements beforehand
Don't just constantly create and delete objects. Instead, keep a persistent, resizable array or list of object instances that you can reuse. For example, create an array of 100 bullets. They don't all have to be drawn; give each one a boolean that states whether it is "active" or not.
Then whenever you need a new bullet, "activate" an inactive bullet and set its position where you need it. Then whenever it is off screen, you can mark it inactive again and not have to delete it.
If you ever need more than 100 bullets, just expand the array.
Consider reading this article to learn more: Object Pool. It also has several other game pattern related topics.
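The activate/deactivate scheme described above could look roughly like this. `Bullet`, `BulletPool`, `spawn`, and `release` are illustrative names, and the linear search for a free slot is just the simplest possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Bullet {
    float x;
    float y;
    bool active;
};

class BulletPool {
public:
    // All bullets start zeroed and inactive.
    explicit BulletPool(std::size_t count) : bullets_(count) {}

    // Reuses the first inactive bullet; grows the pool only if every slot is in use.
    std::size_t spawn(float x, float y) {
        for (std::size_t i = 0; i < bullets_.size(); ++i) {
            if (!bullets_[i].active) {
                bullets_[i] = Bullet{x, y, true};
                return i;
            }
        }
        bullets_.push_back(Bullet{x, y, true});
        return bullets_.size() - 1;
    }

    // Marks the slot as free again; nothing is deallocated.
    void release(std::size_t i) {
        assert(i < bullets_.size());
        bullets_[i].active = false;
    }

    std::size_t capacity() const { return bullets_.size(); }

private:
    std::vector<Bullet> bullets_;
};
```

Returning an index rather than a pointer keeps handles valid even when the pool grows and the vector reallocates.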
The very very least that happens when you allocate an object is a function call (its constructor). If that allocation is dynamic, there is also the cost of memory management which at some point could get drastic due to fragmentation.
Would calling some function 10 times a second be really bad? No. Would creating and destroying many small objects dynamically 10 times a second be bad? Possibly. Should you be doing this? Absolutely not.
Even if the performance penalty is not "felt", it's not ok to have a suboptimal solution while an optimal one is immediately available.
So, instead of for example a std::list of objects that are dynamically added and removed, you can simply have a std::vector of bullets where addition of bullets means appending to the vector (which after it has reached a large enough size, shouldn't require any memory allocation anymore) and deleting means swapping the element being deleted with the last element and popping it from the vector (effectively just reducing the vector size variable).
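The swap-with-last removal can be sketched as below. `swapAndPop` is a hypothetical helper name, and the trick only applies when the order of elements in the vector does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes element i in O(1) by overwriting it with the last element and
// shrinking the vector by one. The relative order of elements is not preserved.
template <typename T>
void swapAndPop(std::vector<T>& v, std::size_t i) {
    assert(i < v.size());
    if (i + 1 != v.size()) {
        v[i] = std::move(v.back());
    }
    v.pop_back();
}
```

When removing during iteration, the index should not be advanced after a removal, since slot i now holds a different element.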
Keep in mind that every instantiation has to find a place on the heap, allocate memory there, and create a new instance. This may affect performance. Try using a collection instead, created up front with as many bullets as you want.

MPI synchronize matrix of vectors

Excuse me if this question is common or trivial, I am not very familiar with MPI so bear with me.
I have a matrix of vectors. Each vector is empty or has a few items in it.
std::vector<someStruct*> partitions[matrix_size][matrix_size];
When I start the program each process will have the same data in this matrix, but as the code progresses each process might remove several items from some vectors and put them in other vectors.
So when I reach a barrier I somehow have to make sure each process has the latest version of this matrix. The big problem is that each process might manipulate any or all vectors.
How would I go about to make sure that every process has the correct updated matrix after the barrier?
EDIT:
I am sorry I was not clear. Each process may move one or more objects to another vector but only one process may move each object. In other words each process has a list of objects it may move, but the matrix may be altered by everyone. And two processes can't move the same object ever.
In that case you'll need to send messages using MPI_Bcast that inform the other processors about this and instruct them to do the same. Alternatively, if the ordering doesn't matter until you hit the barrier, you can only send the messages to the root process which performs the permutations and then after the barrier sends it to all the others using MPI_Bcast.
One more thing: vectors of pointers are usually quite a bad idea, as you'll need to manage the memory manually in there. If you can use C++11, use std::unique_ptr or std::shared_ptr instead (depending on what your semantics are), or use Boost which provides very similar facilities.
And lastly, representing a matrix as a fixed-size array of fixed-size arrays is really bad. First: the matrix size is fixed. Second: adjacent rows are not necessarily stored in contiguous memory, slowing your program down like crazy (it literally can be orders of magnitude). Instead represent the matrix as a linear array of size Nrows*Ncols, and then index the elements as Ncols*i + j, where Ncols is the number of columns and i and j are the row and column indices, respectively. If you want column-major storage instead, address the elements by i + Nrows*j. You can wrap this index-juggling in inline functions that have virtually zero overhead.
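A minimal sketch of such a wrapper, assuming row-major storage, where element (i, j) lives at offset (number of columns) * i + j. The class name `Matrix` and its members are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in one contiguous buffer.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Linear offset of element (i, j) in row-major order.
    std::size_t offset(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return cols_ * i + j;
    }

    T& operator()(std::size_t i, std::size_t j) { return data_[offset(i, j)]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[offset(i, j)]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

For the question's case the element type would be something like `std::vector<someStruct*>`, for example `Matrix<std::vector<someStruct*>> partitions(n, n);`.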
I would suggest to lay out the data differently:
Each process has a map of its objects and their positions in the matrix. How that is implemented depends on how you identify objects. If all local objects are numbered, you could just use a vector<pair<int,int>>.
Treat that as the primary structure you manipulate, and communicate that structure with MPI_Allgather (each process sends its data to all other processes; at the end everyone has all data). If you need fast lookup by coordinates, then you can build up a cache.
That may or may not be performing well. Other optimizations (like sharing 'transactions') totally depend on your objects and the operations you perform on them.

Threading access to various buffers

I'm trying to figure out the best way to do this, but I'm getting a bit stuck in figuring out exactly what it is that I'm trying to do, so I'm going to explain what it is, what I'm thinking I want to do, and where I'm getting stuck.
I am working on a program that has a single array (Image really), which per frame can have a large number of objects placed on an image array. Each object is completely independent of all other objects. The only dependency is the output, in theory possible to have 2 of these objects placed on the same location on the array. I'm trying to increase the efficiency of placing the objects on the image, so that I can place more objects. In order to do that, I'm wanting to thread the problem.
The first step that I have taken towards threading it involves simply mutex protecting the array. All operations which place an object on the array will call the same function, so I only have to put the mutex lock in one place. So far, it is working, but it is not seeing the improvements that I would hope to have. I am hypothesizing that this is because most of the time, the limiting factor is the image write statement.
What I'm thinking I need to do next is to have multiple image buffers that I'm writing to, and to combine them when all of the operations are done. I should say that obscuration is not a problem, all that needs to be done is to simply add the pixel counts together. However, I'm struggling to figure out what mechanism I need to use in order to do this. I have looked at semaphores, but while I can see that they would limit a number of buffers, I can envision a situation in which two or more programs would be trying to write to the same buffer at the same time, potentially leading to inaccuracies.
I need a solution that does not involve any new non-standard libraries. I am more than willing to build the solution, but I would very much appreciate a few pointers in the right direction, as I'm currently just wandering around in the dark...
To help visualize this, imagine that I am told to place, say, balls at various locations on the image array. I am told to place the balls each frame, with a given brightness, location, and size. The exact location of the balls is dependent on the physics from the previous frame. All of the balls must be placed on a final image array, as quickly as they possibly can be. For the purpose of this example, if two balls are on top of each other, the brightness can simply be added together, thus there is no need to figure out if one is blocking the other. Also, no using GPU cards;-)
Pseudo-code would look like this (assuming that some logical object is given for location, brightness, and size). Also, assume that isValidPoint simply determines whether the point lies on the circle, given the location and radius of said circle.
global output_array[x_arrLimit*y_arrLimit]
void update_ball(int ball_num)
{
    calc_ball_location(ball_num, *location, *brightness, *size); // location, brightness, size all set inside function
    place_ball(location,brightness,size);
}
void place_ball(location,brightness,size)
{
    get_bounds(location,size,*xlims,*ylims);
    for (int x=xlims.min;x<xlims.max;x++)
    {
        for (int y=ylims.min;y<ylims.max;y++)
        {
            if (isValidPoint(location,size,x,y))
            {
                output_array(x,y)+=brightness;
            }
        }
    }
}
The reason you're not seeing any speed up with the current design is that, with a single mutex for the entire buffer, you might as well not bother with threading, as all the objects have to be added serially anyway (unless there's significant processing being done to determine what to add, but it doesn't sound like that's the case). Depending on what it takes to "add an object to the buffer" (do you use scan-line algorithms, flood fill, or something else), you might consider having one mutex per row or a range of rows, or divide the image into rectangular tiles with one mutex per region or something. That would allow multiple threads to add to the image at the same time as long as they're not trying to update the same regions.
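One way the "one mutex per range of rows" idea might be sketched with only the standard library is shown below. `StripedImage`, `addRow`, and `addFromThreads` are hypothetical names, and the stripe height is a tuning parameter that would need measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class StripedImage {
public:
    StripedImage(std::size_t width, std::size_t height, std::size_t rowsPerStripe)
        : width_(width),
          rowsPerStripe_(rowsPerStripe),
          pixels_(width * height, 0),
          locks_((height + rowsPerStripe - 1) / rowsPerStripe) {}

    // Adds `brightness` to pixels [x0, x1) of row y, locking that row's stripe once.
    void addRow(std::size_t y, std::size_t x0, std::size_t x1, int brightness) {
        assert(x0 <= x1 && x1 <= width_ && y * width_ < pixels_.size());
        std::lock_guard<std::mutex> guard(locks_[y / rowsPerStripe_]);
        for (std::size_t x = x0; x < x1; ++x) pixels_[y * width_ + x] += brightness;
    }

    // Reads a pixel; only meaningful once all writer threads have been joined.
    int at(std::size_t x, std::size_t y) const { return pixels_[y * width_ + x]; }

private:
    std::size_t width_;
    std::size_t rowsPerStripe_;
    std::vector<int> pixels_;
    std::vector<std::mutex> locks_;  // one mutex per band of rows
};

// Hypothetical driver: `threadCount` threads each add 1 to every pixel.
void addFromThreads(StripedImage& img, std::size_t width, std::size_t height,
                    unsigned threadCount) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
        workers.emplace_back([&img, width, height] {
            for (std::size_t y = 0; y < height; ++y) img.addRow(y, 0, width, 1);
        });
    for (auto& w : workers) w.join();
}
```

Threads that draw balls in different stripes no longer wait on each other; threads that hit the same stripe still serialize, which keeps the additive result correct.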
OK, you have an image member in some object. Add the, no doubt complex, code to add other images/objects to it, manipulate it, whatever. Aggregate in all the other objects that may be involved, add some command enum to tell threads what op to do and an 'OnCompletion' event to call when done.
Queue it to a pool of threads hanging on the end of a producer-consumer queue. Some thread will get the *object, perform the operation on the image/set and then call the event, (pass the completed *object as a parameter). In the event, you can do what you like, according to the needs of your app. Maybe you will add the processed images into a (thread-safe!!), vector or other container or queue them off to some other thread - whatever.
If the order of processing the images must be preserved (e.g. a video stream), you could add an incrementing sequence number to each object that is submitted to the pool, enabling your 'OnCompletion' handler to queue up 'later' images until all earlier ones have come in.
Since no two threads ever work on the same image, you need no locking while processing. The only locks you should (may) need are those internal to the queues, and they only lock for the time taken to push/pop object pointers to/from the queue, so contention will be very rare.
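A bare-bones version of a thread pool hanging off a producer-consumer queue might look like this. `WorkerPool` and `submit` are illustrative names, and a job here is just a `std::function<void()>` rather than the command/event object described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(unsigned threadCount) {
        for (unsigned i = 0; i < threadCount; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Lets the workers drain any queued jobs, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Producer side: the lock is held only while pushing onto the queue.
    void submit(std::function<void()> job) {
        assert(job && "job must be callable");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    // Consumer side: pop under the lock, run the job outside it.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::vector<std::thread> threads_;
};
```

A completion callback can simply be invoked at the end of each submitted job; if it touches shared containers, those need their own protection.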

How should I store and use gun fire data in a game?

On this game I have 3 defense towers (the number is configurable) which fire a "bullet" every 3 seconds at 30km/h. These defense towers have a radar and they only start firing when the player is under the tower radar. That's not the issue.
My question is how to store the data for the gun fire. I'm not sure exactly what data I need for each bullet, but one that comes to mind is the position of the bullet, of course. Let's assume that I only need to store that (I already have a struct defined for a 3D point) for now.
Should I try to figure out the maximum number of bullets the game can have at any given point and declare an array of that size? Should I use a linked list? Or maybe something else?
I really have no idea how to do this. I don't need anything fancy or complex. Something basic that just works and it's easy to use and implement is more than enough.
P.S: I didn't post this question on the game development website (despite the tag) because I think it fits better here.
Generally, fixed length arrays aren't a good idea.
Given your game model, I wouldn't go for any data structure that doesn't allow O(1) removal. That rules out plain arrays anyway, and might suggest a linked list. However the underlying details should be abstracted out by using a generic container class with the right attributes.
As for what you should store:
Position (as you mentioned)
Velocity
Damage factor (your guns are upgradeable, aren't they?)
Maximum range (ditto)
EDIT: To slightly complicate matters, the STL containers always take copies of the elements put in them, so in practice if any of the attributes might change over the object's lifetime you'll need to allocate your structures on the heap and store (smart?) pointers to them in the collection.
I'd probably use a std::vector or std::list. Whatever's easiest.
Caveat: If you are coding for a very constrained platform (slow CPU, little memory), then it might make sense to use a plain-old fixed-size C array. But that's very unlikely these days. Start with whatever is easiest to code, and change it later if and only if it turns out you can't afford the abstractions.
I guess you can start off with std::vector<BulletInfo> and see how it works from there. It provides the array like interface but is dynamically re-sizable.
In instances like this I prefer a slightly more complex method to managing bullets. Since the number of bullets possible on screen is directly related to the number of towers I would keep a small fixed length array of bullets inside each tower class. Whenever a tower goes to fire a bullet it would search through its array, find an un-used bullet, setup the bullet with a new position/velocity and mark it active.
The slightly more complex part is I like to keep a second list of bullets in an outside manager, say a BulletManager. When each tower is created the tower would add all its bullets to the central manager. Then the central manager can be in charge of updating the bullets.
I like this method because it easily allows me to manage memory constraints related to bullets: just tweak the 'number of active towers' number and all of the bullets are created for you. You don't need to allocate bullets on the fly because they are all pooled, and you don't have just one central pool that you constantly need to resize as you add/remove towers.
It does involve slightly more overhead because there is a central manager with a list of pointers. And you need to be careful to always remove any bullets from a destroyed tower from the central manager. But for me the benefits are worth it.
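A rough sketch of the per-tower pool with a central manager follows. All names (`Bullet`, `BulletManager`, `Tower`, `fire`) and the pool size of 4 are made up for illustration, and towers are made non-copyable so the registered pointers stay valid.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Bullet {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    bool active;
};

// Central list of non-owning pointers to every tower's bullets.
class BulletManager {
public:
    void add(Bullet* b) { bullets_.push_back(b); }
    void remove(Bullet* b) {
        bullets_.erase(std::remove(bullets_.begin(), bullets_.end(), b), bullets_.end());
    }
    // Advances every active bullet by dt seconds.
    void update(float dt) {
        assert(dt >= 0.0f);
        for (Bullet* b : bullets_) {
            if (!b->active) continue;
            b->x += b->vx * dt;
            b->y += b->vy * dt;
            b->z += b->vz * dt;
        }
    }
    std::size_t size() const { return bullets_.size(); }

private:
    std::vector<Bullet*> bullets_;
};

class Tower {
public:
    explicit Tower(BulletManager& manager) : manager_(manager), bullets_{} {
        for (Bullet& b : bullets_) manager_.add(&b);  // register the whole pool
    }
    ~Tower() {
        for (Bullet& b : bullets_) manager_.remove(&b);  // never leave dangling pointers
    }
    Tower(const Tower&) = delete;
    Tower& operator=(const Tower&) = delete;

    // Activates an unused bullet, or returns nullptr if the pool is exhausted.
    Bullet* fire(float x, float y, float z, float vx, float vy, float vz) {
        for (Bullet& b : bullets_) {
            if (!b.active) {
                b = Bullet{x, y, z, vx, vy, vz, true};
                return &b;
            }
        }
        return nullptr;
    }

private:
    BulletManager& manager_;
    std::array<Bullet, 4> bullets_;  // small fixed pool per tower
};
```

The manager must outlive every tower registered with it, since each tower's destructor unregisters its bullets.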