In my game, an ObjectManager class manages all of the game's objects and holds a list of them.
I use two nested for loops to check every object against every other object for collision:
for(int i = 0; i < objectnum; ++i)
{
    for(int j = 0; j < objectnum; ++j)
    {
        AABB_CollisionCheck();
    }
}
However, when there are many objects in the game, the FPS drops (80 objects brings it down to about 40 frames per second).
I suspect this is because my collision check method is inefficient
(with n objects, my method performs n^2 checks).
Can you give me some tips on optimizing this collision check?
I want to reduce the work done by the nested loops that check every object pair.
Also, what difference does using a callback function make for collision checks?
Is there any speed advantage to using a callback?
P.S.
Thanks a lot for reading my question, and please excuse my English.
As is often the case, knowing an extra bit of vocabulary does wonders for finding a wealth of information. What you are looking for is called the broad phase of collision detection: any method that keeps you from having to look at every pair of objects and thus hopefully avoids the n^2 complexity.
One popular method is spatial hashing, where you subdivide your space into a grid of cells and assign each object to the cells that contain it. Then you only check objects in each cell against other objects from that cell and the neighboring cells.
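For illustration, here is a minimal 2D sketch of that idea. It is not code from the question: the AABB struct, the overlaps() test and the cell size are stand-ins, and binning each object by its centre assumes the cell size is at least as large as the largest object (otherwise insert the object into every cell its box touches).

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct AABB { float minX, minY, maxX, maxY; };

bool overlaps(const AABB& a, const AABB& b) {
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

void broadPhase(const std::vector<AABB>& objects, float cellSize) {
    auto cellOf = [cellSize](const AABB& o, int& cx, int& cy) {
        cx = static_cast<int>(std::floor(0.5f * (o.minX + o.maxX) / cellSize));
        cy = static_cast<int>(std::floor(0.5f * (o.minY + o.maxY) / cellSize));
    };
    auto key = [](int cx, int cy) {
        return (static_cast<std::int64_t>(cx) << 32) ^ static_cast<std::uint32_t>(cy);
    };

    // 1. Bin object indices by the grid cell containing their centre.
    std::unordered_map<std::int64_t, std::vector<int>> grid;
    for (int i = 0; i < static_cast<int>(objects.size()); ++i) {
        int cx, cy; cellOf(objects[i], cx, cy);
        grid[key(cx, cy)].push_back(i);
    }

    // 2. Each object is only tested against objects in its own and the 8 neighboring cells.
    for (int i = 0; i < static_cast<int>(objects.size()); ++i) {
        int cx, cy; cellOf(objects[i], cx, cy);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(key(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (int j : it->second)
                    if (j > i && overlaps(objects[i], objects[j])) {
                        // narrow-phase check / collision response goes here
                    }
            }
    }
}

Using an unordered_map keyed by cell coordinates keeps memory proportional to the number of occupied cells rather than the size of the whole world.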
Another method is called Sweep and Prune, which uses the fact that objects usually don't move much from one frame to the next.
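A bare-bones, sort-every-frame version along the x-axis might look like the following sketch. A real sweep-and-prune implementation keeps the sorted endpoint lists between frames and updates them incrementally, which is exactly where the frame-to-frame coherence pays off; the AABB type here is again only a stand-in.

#include <algorithm>
#include <cstddef>
#include <vector>

struct AABB { float minX, minY, maxX, maxY; };

void sweepAndPrune(const std::vector<AABB>& objects) {
    // Sort object indices by the left edge of each box.
    std::vector<int> order(objects.size());
    for (int i = 0; i < static_cast<int>(order.size()); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return objects[a].minX < objects[b].minX; });

    // Only boxes whose x-intervals overlap can possibly collide.
    for (std::size_t a = 0; a < order.size(); ++a) {
        const AABB& A = objects[order[a]];
        for (std::size_t b = a + 1; b < order.size(); ++b) {
            const AABB& B = objects[order[b]];
            if (B.minX > A.maxX) break;            // no later box can overlap A on x
            if (A.minY <= B.maxY && B.minY <= A.maxY) {
                // x and y intervals both overlap: run the narrow-phase check here
            }
        }
    }
}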
You can find more information in this question: Broad-phase collision detection methods?
Can you give me some tips on optimizing this collision check?
Well, first you could try optimizing your inner loop like this:
for (int i = 0; i < objectnum; i++)
{
    for (int j = i+1; j < objectnum; j++)
    {
        AABB_collisioncheck();
    }
}
This checks each pair only one way: if A vs. B is checked, you won't also trigger B vs. A.
Sorry for the bad title, but I actually cannot think of a better one (open to suggestions).
I have a big grid (1000*1000*1000).
for (int k = 0; k < dims.nz; k++)
{
    for (int i = 0; i < dims.nx; i++)
    {
        for (int j = 0; j < dims.ny; j++)
        {
            if (inputLabel->evalReg(i, j, k) == 0)
            {
                sum = sum + anotherField->evalReg(i, j, k);
            }
        }
    }
}
I go through all grid points to find which ones have the value 0 in my label field, and I sum up the corresponding values of another field.
After this I want to set all the points that I detected above to a certain value.
Would it be faster to basically run the same for loop again (this time setting values instead of reading them), or should I write all the positions I found into separate vectors (which would have to grow at every step of the loop in which something is detected) and simply run a loop like
for(int p = 0; p < size_vec_1; p++)
{
    anotherField->set(vec_1[p], vec_2[p], vec_3[p], random_value);
}
The point is that I do not know how much of the grid will be affected by my routine, since it depends on the data; it might be half of the grid or something completely different. Can I make a general estimate of the speed of the two methods, or does it depend solely on the distribution of my values?
The point is that I do not know how much of the grid will be affected by my routine due to different data.
Here's a trick which may work: sample inputLabel randomly to estimate how many entries are 0 (a sketch is shown below). If only a few, go the "put the indices into a vector" way; if a lot, go the "scan the array again" way.
It needs fine tuning for a specific computer: what the threshold between the two cases should be, how many samples to take (not too many, or the approximation itself takes too much time, but not too few, or the approximation is poor), etc.
Bonus trick: take cache-line-aligned and cache-line-sized samples. This way the approximation takes a similar amount of time (because it is memory bound) but is more accurate.
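A small sketch of the sampling step, assuming inputLabel->evalReg(i, j, k) from the question is how a label is read; the helper name, sample count and threshold are illustrative and need tuning.

#include <cstddef>
#include <random>

// Estimate the fraction of grid points whose label is 0 from 'samples' random probes.
// 'label' stands in for a call like inputLabel->evalReg(i, j, k); nx/ny/nz for dims.
template <typename LabelFn>
double estimateZeroFraction(LabelFn&& label, int nx, int ny, int nz, int samples = 4096) {
    std::mt19937 rng(12345);
    std::uniform_int_distribution<int> dx(0, nx - 1), dy(0, ny - 1), dz(0, nz - 1);
    int zeros = 0;
    for (int s = 0; s < samples; ++s)
        if (label(dx(rng), dy(rng), dz(rng)) == 0)
            ++zeros;
    return static_cast<double>(zeros) / samples;
}

// Usage sketch: pick the strategy based on the estimate.
// double frac = estimateZeroFraction(
//     [&](int i, int j, int k) { return inputLabel->evalReg(i, j, k); },
//     dims.nx, dims.ny, dims.nz);
// if (frac < 0.05) { /* collect indices into vectors */ } else { /* rescan the grid */ }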
I have a contiguous array of particles in 3D space for a fluid simulation and I need to do a lot of neighbor searches on it. I've found that partitioning the search space into cubic cells and sorting the particles in-place by the cell they are in works well for my problem. That way, for any given cell, its particles lie in a contiguous span, so you can iterate over them all easily if you know the begin and end indices; e.g., cell N might occupy [N_begin, N_end) of the array used to store the particles.
However, no matter how you divide the space, a particle might have neighbors not only in its own cell but also in every neighboring cell (imagine a particle that is almost touching the boundary between cells to understand why). Because of this, a neighbor search needs to return all particles in the same cell as well as all of its neighbors in 3D space, a total of up to 27 cells when not on the edge of the simulation space. There is no ordering of the cells into the array (which is by its nature 1D) that can get all 27 of those spans to be adjacent for any requested cell. To work around this, I keep track of where each cell begins and ends in the particle array and I have a function that determines which cells hold potential neighbors. To represent multiple ranges of indices, it has to return up to 27 pairs of indices signifying the begin and end of those ranges.
std::vector<std::pair<int, int>> get_neighbor_indices(const Vec3f &position);
The index is actually required at some point so it works better this way than a pair of iterators or some other abstraction. The problem is that it forces me to use a loop structure that is tied to the implementation. I would like something like the following, using some pseudocode and omissions to simplify.
for(int i = 0; i < num_of_particles; ++i) {
    auto neighbor_indices = get_neighbor_indices(particle[i].position);
    for (int j : neighbor_indices) {
        // do stuff with particle[i] and particle[j]
    }
}
That would only work if neighbor_indices were a complete list of all the indices, but that is a significant number of indices that are trivial to calculate on the fly, so materializing them all would be a huge waste of memory. So the best I can get without compromising the performance of the code is the following.
for(int i = 0; i < num_of_particles; ++i) {
    auto neighbor_indices = get_neighbor_indices(particle[i].position);
    for (const auto& indices_pair : neighbor_indices) {
        for (int j = indices_pair.first; j < indices_pair.second; ++j) {
            // do stuff with particle[i] and particle[j]
        }
    }
}
The loss of genericity is a setback for my project because I have to test and measure performance a lot and make adjustments when I come across a performance problem. Implementation-specific code significantly slows down this process.
I'm looking for something like an iterator, except that it would return an index instead of a reference. It would allow me to use it as follows.
for(int i = 0; i < num_of_particles; ++i) {
    auto neighbor_indices = get_neighbor_indices(particle[i].position);
    for (int j : neighbor_indices) {
        // do stuff with particle[i] and particle[j]
    }
}
The only issue with this iterator-like approach is that incrementing it is cumbersome. I have to manually keep track of the range I'm in, as well as continuously check when I'm at its end to switch to the next. It's a lot of code to get rid of just that one line that breaks the genericity of the iteration loop. So I'm looking for either a cleaner way to implement the "iterator" or just a better way to iterate over a number of ranges as if they were one.
Keep in mind that this is in a bottleneck computation loop so abstractions have to be zero or negligible cost.
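For concreteness, one possible shape for such a helper, sketched with the names from the question (Vec3f, get_neighbor_indices, particle) and a callback instead of a hand-written iterator; with the callback inlined by the compiler, the abstraction should be essentially free.

#include <utility>
#include <vector>

struct Vec3f { float x, y, z; };                                      // placeholder
std::vector<std::pair<int, int>> get_neighbor_indices(const Vec3f&);  // declared in the question

// Hide the "up to 27 ranges" double loop behind a helper that hands each index to a callback.
template <typename Func>
void for_each_neighbor_index(const Vec3f& position, Func&& f) {
    for (const auto& range : get_neighbor_indices(position)) {
        for (int j = range.first; j < range.second; ++j)
            f(j);
    }
}

// The particle loop then keeps its generic shape:
// for (int i = 0; i < num_of_particles; ++i)
//     for_each_neighbor_index(particle[i].position, [&](int j) {
//         // do stuff with particle[i] and particle[j]
//     });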
Why is my spatial hash so slow? I am working on code that uses smoothed particle hydrodynamics (SPH) to model the movement of landslides. In SPH, each particle influences the particles within a distance of 3 "smoothing lengths". I am trying to implement a spatial hash function in order to have a fast lookup of the neighboring particles.
For my implementation I used the set container from the STL. At each time step the particles are hashed into their buckets using the function below. "buckets" is a vector of sets, with one set for each grid cell (the spatial domain is limited). Each particle is identified by an integer.
To look for collisions, the function below, getSurroundingParticles, takes an integer (identifying a particle) and returns a set containing all the particles found in the grid cells within 3 support lengths of that particle.
The problem is that this implementation is really slow, even slower than just checking each particle against every other particle, when the number of particles is 2000. I was hoping that someone could spot a glaring problem in my implementation that I'm not seeing.
//put each particle into its bucket(s)
void hashParticles()
{
    int grid_cell0;
    cellsWithParticles.clear();
    for(int i = 0; i<N; i++)
    {
        //determine the four grid cells that surround the particle, as well as the grid cell that the particle occupies
        //using the hash function int grid_cell = ( floor(x/cell size) ) + ( floor(y/cell size) )*width
        grid_cell0 = ( floor( (Xnew[i])/cell_spacing) ) + ( floor(Ynew[i]/cell_spacing) )*cell_width;
        //keep track of cells with particles, cellsWithParticles is an unordered set so duplicates will automatically be deleted
        cellsWithParticles.insert(grid_cell0);
        //since each of the hash buckets is an unordered set any duplicates will be deleted
        buckets[grid_cell0].insert(i);
    }
}
set<int> getSurroundingParticles(int particleOfInterest)
{
    set<int> surroundingParticles;
    int numSurrounding;
    float divisor = (support_length/cell_spacing);
    numSurrounding = ceil( divisor );
    int grid_cell;
    for(int i = -numSurrounding; i <= numSurrounding; i++)
    {
        for(int j = -numSurrounding; j <= numSurrounding; j++)
        {
            grid_cell = (int)( floor( ((Xnew[particleOfInterest])+j*cell_spacing)/cell_spacing) ) + ( floor((Ynew[particleOfInterest]+i*cell_spacing)/cell_spacing) )*cell_width;
            surroundingParticles.insert(buckets[grid_cell].begin(),buckets[grid_cell].end());
        }
    }
    return surroundingParticles;
}
The code that calls getSurroundingParticles:
set<int> nearbyParticles;
//for each particle
for ( int i = 0; i < N; i++ )
{
    nearbyParticles = getSurroundingParticles(i);
    //for each particle in the surrounding cells
    for ( std::set<int>::iterator ipoint = nearbyParticles.begin(); ipoint != nearbyParticles.end(); ++ipoint )
    {
        //do stuff to check if the smaller subset of particles collide
    }
}
Thanks a lot!
Your performance is getting eaten alive by the sheer number of STL heap allocations caused by repeatedly creating and populating all those sets. If you profiled the code (say with a quick and easy non-instrumenting tool like Sleepy), I'm certain you'd find that to be the case. You're using sets to avoid having a given particle added to a bucket more than once - I get that.

If Duck's suggestion doesn't give you what you need, I think you could dramatically improve performance by using preallocated arrays or vectors, and getting uniqueness in those containers by adding an "added" flag to the particle that gets set when the item is added. Then just check that flag before adding, and be sure to clear the flags before the next cycle. (If the number of particles is constant, you can do this extremely efficiently with a preallocated array dedicated to storing the flags, then memsetting it to 0 at the end of the frame.)
That's the basic idea. If you decide to go this route and get stuck somewhere, I'll help you work out the details.
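A rough sketch of what that could look like, with illustrative names rather than the original code, and assuming the buckets hold plain vectors of particle indices instead of sets:

#include <vector>

// Preallocated buffers reused every query; no per-query heap allocations after warm-up.
struct NeighborCollector {
    std::vector<char> added;     // one "already added" flag per particle
    std::vector<int>  result;    // reused output buffer

    explicit NeighborCollector(int particleCount)
        : added(particleCount, 0) { result.reserve(particleCount); }

    // Gather the unique particle indices from the candidate buckets (the surrounding cells).
    const std::vector<int>& collect(const std::vector<const std::vector<int>*>& candidateBuckets) {
        result.clear();                         // keeps capacity, no reallocation
        for (const auto* bucket : candidateBuckets)
            for (int p : *bucket)
                if (!added[p]) {                // uniqueness check replaces std::set
                    added[p] = 1;
                    result.push_back(p);
                }
        for (int p : result) added[p] = 0;      // reset only the flags we touched
        return result;
    }
};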
I'm trying to write a program that handles collision detection between various objects. The objects have an origin, width, height, and velocity. Is there a way to set up a data structure/algorithm so that every object isn't checked against every other object?
Some sample code of the problem I'm trying to avoid:
for (int i = 0; i < ballCount; i++)
{
    for (int j = i + 1; j < ballCount; j++)
    {
        if (balls[i].colliding(balls[j]))
        {
            balls[i].resolveCollision(balls[j]);
        }
    }
}
You can use a quadtree to quickly find all rectangles that intersect with another rectangle. If you need to handle non-rectangular shapes, you can first find objects whose bounding boxes intersect.
Some common uses of quadtrees
...
Efficient collision detection in two dimensions
...
As mentioned by other answer(s), you can use a quadtree structure to make your collision detection faster.
I would recommend the GEOS open-source C++ library, which has a good quadtree implementation. Here are the docs for their quadtree class.
So your pseudo code would look like this:
Quadtree quadtree;
// Create and populate the quadtree.
// Change it whenever the balls move.
// Here's the intersection loop:
for (int i=0; i<ballCount; ++i) {
    Envelope envelope = ...; // Get the bounds (envelope) of ball i
    std::vector<void*> possiblyIntersectingBalls;
    quadtree.query(envelope, possiblyIntersectingBalls);
    // Now loop over the members of possiblyIntersectingBalls to check
    // if they really intersect, since quadtree only checks bounding
    // box intersection.
}
I am writing a program to generate a graph and check whether or not it is connected. Below is the code. Here is some explanation: I generate a number of points at random locations on the plane. I then connect the nodes, NOT based on proximity only. By that I mean that a node is more likely to be connected to nodes that are closer, and this is determined by a random variable I use in the code (h_sq) together with the distance. I then generate all the links (symmetric, i.e., if i can talk to j then vice versa is also true) and check with a BFS whether the graph is connected.
The code seems to be working properly; however, when the number of nodes becomes greater than ~2000 it is terribly slow, and I need to run this function many times for simulation purposes. I have even tried other graph libraries, but the performance is the same.
Does anybody know how could I possibly speed everything up?
Thanks,
int Graph::gen_links() {
    if( save == true ) { // in case I want to store the structure of the graph
        links.clear();
        links.resize(xy.size());
    }
    double h_sq, d;
    vector< vector<luint> > neighbors(xy.size());
    // generate links
    double tmp = snr_lin / gamma_0_lin;
    // xy is a std vector of pairs containing the nodes' locations
    for(luint i = 0; i < xy.size(); i++) {
        for(luint j = i+1; j < xy.size(); j++) {
            // generate |h|^2
            d = distance(i, j);
            if( d < d_crit ) // for sim purposes
                d = 1.0;
            h_sq = pow(mrand.randNorm(0, 1), 2.0) + pow(mrand.randNorm(0, 1), 2.0);
            if( h_sq * tmp >= pow(d, alpha) ) {
                // there exists a link between i and j
                neighbors[i].push_back(j);
                neighbors[j].push_back(i);
                // options
                if( save == true )
                    links.push_back( make_pair(i, j) );
            }
        }
        if( neighbors[i].empty() && save == false ) {
            // graph not connected. since save=false i dont need to store the structure,
            // hence I exit
            connected = 0;
            return 1;
        }
    }
    // here I do BFS to check whether the graph is connected or not, using neighbors
    // BFS code...
    return 1;
}
UPDATE:
The main problem seems to be the push_back calls within the inner for loops; that is the part that takes most of the time in this case. Should I use reserve() to increase efficiency?
Are you sure the slowness is caused by the generation and not by your search algorithm?
The graph generation is O(n^2) and you can't do much about that. However, you can apparently trade memory for some of the time if the point locations are fixed for at least some of the experiments.
First, the distances of all node pairs, and pow(d, alpha), can be precomputed and saved in memory so that you don't need to compute them again and again (a minimal sketch follows below). The extra memory cost for 10000 nodes will be about 800 MB for double and 400 MB for float.
In addition, the sum of squares of two standard normal variables follows a chi-square distribution (with two degrees of freedom), if I remember correctly. Perhaps you could use a precomputed lookup table if the accuracy allows?
Finally, if the probability that two nodes are connected becomes negligible once the distance exceeds some value, then you don't need O(n^2): you could restrict the calculation to node pairs whose distance is below that limit.
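As an illustration of the first point, a minimal precomputation sketch might look like this; dist stands in for however the pairwise distance is obtained (e.g. the distance(i, j) call in gen_links()), and the other names are hypothetical.

#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Cache pow(d, alpha) for every node pair once, assuming node positions are fixed
// across experiments. Only the upper triangle (j > i) is stored.
std::vector<std::vector<double>> precompute_path_loss(
        std::size_t n, double alpha, double d_crit,
        const std::function<double(std::size_t, std::size_t)>& dist) {
    std::vector<std::vector<double>> d_alpha(n);
    for (std::size_t i = 0; i < n; ++i) {
        d_alpha[i].reserve(n - i - 1);
        for (std::size_t j = i + 1; j < n; ++j) {
            double d = dist(i, j);
            if (d < d_crit)                     // same special case as in gen_links()
                d = 1.0;
            d_alpha[i].push_back(std::pow(d, alpha));
        }
    }
    return d_alpha;   // gen_links() can then compare h_sq * tmp against d_alpha[i][j - i - 1]
}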
As a first step you should try to use reserve() for both the inner and outer vectors.
If this does not bring performance up to your expectations, I believe it is because of memory allocations that are still happening.
There is a handy class I've used in similar situations: llvm::SmallVector (search for it on Google). It provides a vector with a few pre-allocated items, so you can decrease the number of allocations by one per vector.
It can still grow when it runs out of pre-allocated space.
So:
1) Examine how many items your vectors hold on average during runs (I'm talking about both the inner and outer vectors).
2) Put in llvm::SmallVector with a pre-allocation of that size (since the pre-allocated items live inside the vector object itself, e.g. on the stack, you might need to increase the stack size, or reduce the pre-allocation if you are restricted in available stack memory).
Another good thing about SmallVector is that it has almost the same interface as std::vector, so it can easily be dropped in as a replacement. A small sketch follows below.
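For example, a sketch of what the switch could look like for the neighbors structure from the question; the inline capacity of 8 is a guess to be tuned, and luint and xy mirror the question's names.

#include <utility>
#include <vector>
#include "llvm/ADT/SmallVector.h"

using luint = unsigned long;                       // assumed typedef from the question
std::vector<std::pair<double, double>> xy;         // node locations, as in the question

// Keep the outer std::vector, but give each inner neighbor list 8 inline slots
// so small neighbor lists never touch the heap.
using NeighborList = llvm::SmallVector<luint, 8>;
std::vector<NeighborList> neighbors(xy.size());

// The rest of gen_links() stays the same, e.g.:
// neighbors[i].push_back(j);
// neighbors[j].push_back(i);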