So I have some particles (ellipses) bouncing around the screen. I'm trying to get them to collide rather than pass over each other. To do this I must cycle through every particle and compare its distance to every other particle with a for loop nested within another for loop, then change their velocities when their points are within a certain distance of each other, like so:
//p.size() returns the size of the particle system (yes it works)
//ofDist() is an open frameworks function that calculates the dist between 2 points
for( int i = 0; i < p.size(); i++){
// cout << i << endl;
for(int j = 0; j < p.size(); j++){
// cout << j << endl;
pDist[i] = ofDist(p[i].pos.x, p[i].pos.y, p[j].pos.x, p[j].pos.y);
// cout << pDist[i] << endl;
if(pDist[i] <= 300){
p[i].vel.x *= -1;
p[i].vel.y *= -1;
p[j].vel.x *= -1;
p[j].vel.y *= -1;
}
}
}
But for some mysterious reason they still pass right over each other like they don't even exist. It does work if I apply this to just 2 particles without the for loops:
pDist[0] = ofDist(p[0].pos.x, p[0].pos.y, p[1].pos.x, p[1].pos.y);
if(pDist[0] <= 300){
cout << "It's colliding" << endl;
p[0].vel.x *= -1;
p[0].vel.y *= -1;
p[1].vel.x *= -1;
p[1].vel.y *= -1;
}
The particles are stored in a vector by the way.
Any ideas how I can get this to work with the for loops?
update
The size of my vector is 3, so p.size() = 3 ( or 2, doesn't really make a difference right now). I substituted p.size() for 2 and 3 in my code and it didn't change anything, so that's not the source of the issue.
update 2
If someone could let me know what I need to do to not get downvoted that would be helpful. :/
A pretty large issue is that by saying:
for( int i = 0; i < p.size(); i++){
for(int j = 0; j < p.size(); j++){
You are actually checking each particle against itself. You are also checking every pair of particles twice. By detecting a single collision twice, and inverting the velocity each time, you are essentially doing nothing (a * -1 * -1 = a).
A better way to do this would be to use a loop where each pair of particles is checked only once, and a particle is never checked against itself. You can do this by starting the nested loop after the current particle (essentially offsetting the index by the indexes that have already been checked), like so:
for( int i = 0; i < p.size()-1; i++){
for(int j = i+1; j < p.size(); j++){
This also has the benefit of being significantly faster for a larger number of particles.
There is also no reason to store the calculated distance in an array (unless your code makes use of this somewhere else). Simply using a double would work fine here.
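Putting the pieces above together, a minimal self-contained sketch might look like the following. The `Vec2` and `Particle` structs and the function name `bounceCollidingPairs` are stand-ins invented for this example, and `std::hypot` is used in place of `ofDist()` so that it compiles without openFrameworks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the particle type used in the question.
struct Vec2 { float x, y; };
struct Particle { Vec2 pos; Vec2 vel; };

// Reverses the velocity of both particles in every pair that is closer
// than minDist. Each unordered pair (i, j) with i < j is visited once.
// Returns the number of colliding pairs found.
std::size_t bounceCollidingPairs(std::vector<Particle>& p, float minDist) {
    assert(minDist >= 0.0f);
    std::size_t collisions = 0;
    // "i + 1 < p.size()" avoids unsigned underflow when the vector is empty.
    for (std::size_t i = 0; i + 1 < p.size(); ++i) {
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            // A plain local variable is enough; no distance array is needed.
            const float dist = std::hypot(p[i].pos.x - p[j].pos.x,
                                          p[i].pos.y - p[j].pos.y);
            if (dist <= minDist) {
                p[i].vel.x *= -1;
                p[i].vel.y *= -1;
                p[j].vel.x *= -1;
                p[j].vel.y *= -1;
                ++collisions;
            }
        }
    }
    return collisions;
}
```

Note that a particle that is close to two others in the same frame still gets its velocity reversed twice, so this sketch only addresses the double-counting of a single pair.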
Edit:
Just to be a bit clearer, I have logged the output of the two loops to demonstrate. I have used 3 particles in the array.
Original loop
1 compared to 1 (This is a problem. Checking a particle against itself)
1 compared to 2
1 compared to 3
2 compared to 1 (This is a problem. This has already been checked for)
2 compared to 2 (This is a problem. Checking a particle against itself)
2 compared to 3
3 compared to 1 (This is a problem. This has already been checked for)
3 compared to 2 (This is a problem. This has already been checked for)
3 compared to 3 (This is a problem. Checking a particle against itself)
Modified loop
1 compared to 2
1 compared to 3
2 compared to 3
As you can see, there are only three collisions checked for in the modified loop, and there are no double ups.
I am implementing pitch tracking using an autocorrelation method in C++ but I am struggling to write the actual line of code which performs the autocorrelation.
I have an array containing a certain number ('values') of amplitude values of a pre-recorded signal, and I am performing the autocorrelation function on a set number (N) of these values.
In order to perform the autocorrelation I have taken the original array and reversed it so that point 0 = point N, point 1 = point N-1 etc, this array is called revarray
Here is what I want to do mathematically:
(array[0] * revarray[0])
(array[0] * revarray[1]) + (array[1] * revarray[0])
(array[0] * revarray[2]) + (array[1] * revarray[1]) + (array[2] * revarray[0])
(array[0] * revarray[3]) + (array[1] * revarray[2]) + (array[2] * revarray[1]) + (array[3] * revarray[0])
...and so on. This will be repeated for array[900]->array[1799] etc until autocorrelation has been performed on all of the samples in the array.
The number of times the autocorrelation is carried out is:
values / N = measurements
Here is the relevant section of my code so far:
for (k = 0; k = measurements; ++k){
for (i = k*(N - 1), j = k*N; i >= 0; i--, j++){
revarray[j] = array[i];
for (a = k*N; a = k*(N - 1); ++a){
autocor[a]=0;
for (b = k*N; b = k*(N - 1); ++b){
autocor[a] += //**Here is where I'm confused**//
}
}
}
}
I know that I want to keep iteratively adding new values to autocor[a], but my problem is that the value that needs to be added to will keep changing. I've tried using an increasing count like so:
for (i = (k*N); i = k*(N-1); ++i){
autocor[i] += array[i] * revarray[i-1]
}
But I clearly know this won't work as when the new value is added to the previous autocor[i] this previous value will be incorrect, and when i=0 it will be impossible to calculate using revarray[i-1]
Any suggestions? Been struggling with this for a while now. I managed to get it working on just a single array (not taking N samples at a time) as seen here but I think using the inverted array is a much more efficient approach, I'm just struggling to implement the autocorrelation by taking sections of the entire signal.
It is not very clear to me, but I'll assume that you need to perform your iterations as many times as there are elements in that array (if it is indeed only half that much - adjust the code accordingly).
Also the N is assumed to mean the size of the array, so the index of the last element is N-1.
The loops would look like this:
for(size_t i = 0; i < N; ++i){
autocorr[i] = 0;
for(size_t j = 0; j <= i; ++j){
const size_t idxA = j
, idxR = i - j; // direct and reverse indices in the array
autocorr[i] += array[idxA] * array[idxR];
}
}
Basically you run the outer loop as many times as there are elements in your array, and for each of those iterations you run a shorter inner loop up to the current index of the outer loop.
All that is left to be done now is to properly calculate the indices into array and revarray to perform the calculations and accumulate a running sum at the current outer loop's index.
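As a rough sketch of how those indices could be worked out when the signal is processed in blocks of N samples: instead of filling a separate revarray, the reversed element can be read straight from the block, since revarray[k] is just block[N - 1 - k]. The function name `blockSums` and the choice to skip an incomplete trailing block are assumptions made for this example, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For every complete block of N samples, computes
//   out[i] = sum over j = 0..i of block[j] * rev[i - j]
// where rev[k] = block[N - 1 - k] (the block read backwards).
// A trailing block with fewer than N samples is ignored (assumption).
std::vector<std::vector<double>> blockSums(const std::vector<double>& signal,
                                           std::size_t N) {
    assert(N > 0);
    std::vector<std::vector<double>> result;
    for (std::size_t start = 0; start + N <= signal.size(); start += N) {
        std::vector<double> out(N, 0.0);
        for (std::size_t i = 0; i < N; ++i) {
            for (std::size_t j = 0; j <= i; ++j) {
                const std::size_t idxA = start + j;                    // forward index
                const std::size_t idxR = start + (N - 1 - (i - j));    // reversed index
                out[i] += signal[idxA] * signal[idxR];
            }
        }
        result.push_back(out);
    }
    return result;
}
```

For a block {1, 2, 3} the reversed block is {3, 2, 1}, so the three sums are 1*3, 1*2 + 2*3 and 1*1 + 2*2 + 3*3.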
I'm attempting to learn C++ by creating a small maze generator. To facilitate this, I store a list of Cell(), along with x and y values inside a vector in Maze() (some other pieces of information are included below for completeness, but aren't relevant).
The cells are stored in a vector, with X and Y value of the cells being determined and passed in to each cell as it is created.
The problem I'm having is that each cell appears to have the same x and y values populated.
Here is the relevant code:
vector<Cell*> Maze::cells;
int Maze::width;
int Maze::height;
Maze::Maze(int w, int h)
{
/* Set width and height */
width = w;
height = h;
/* These variables keep track of our position in the maze as we generate it */
int scan_w = 0;
int scan_h = 0;
/* Continue looping until we've visited all cells */
/* Offset by one because the width starts at 1 while the scan is zero-based */
for (int i = 0; i <= (width * height); i++)
{
cells.push_back(new Cell(scan_w,scan_h));
cout << scan_w << "/" << scan_h << endl;
scan_w = (i % w);
if (scan_w == 0)
{
scan_h++;
}
}
for(int i = 0; i <= cells.size(); i++)
{
cout << "[" << cells[i]->x << ", " << cells[i]->y << "] " << &cells[i] << endl;
}
}
Edit: Here are the relevant parts of the Cell class
int Cell::x;
int Cell::y;
Cell::Cell(int location_x, int location_y)
{
x = location_x;
y = location_y;
}
The output of this code (truncated for brevity) is:
Maze maze = Maze(50, 25);
0/0
0/1
1/1
2/1
3/1
4/1
5/1
6/1
7/1
8/1
9/1
...
40/25
41/25
42/25
43/25
44/25
45/25
46/25
47/25
48/25
49/25
[49, 25] 0x632f30
[49, 25] 0x632f38
[49, 25] 0x632f40
[49, 25] 0x632f48
[49, 25] 0x632f50
[49, 25] 0x632f58
[49, 25] 0x632f60
[49, 25] 0x632f68
...
Here are my assumptions:
Based on the output, scan_w and scan_h are being incremented as intended (as if reading a table from left to right, top to bottom).
Based on the flow control documentation/tutorial, my understanding is that the first for loop is properly moving from one element of cells to the next.
Based on the documentation for vector's push_back member, I'm assuming that it is properly inserting the reference to each newly created cell into the vector.
Based on the documentation for the [] operator for vectors, my understanding is that if I access cell[0] and cell[1], I will be accessing different objects (this is confirmed by printing the object's address, as above).
So I'm having trouble understanding why the value of x and y for each cell is 49 and 25, respectively, when any given cell is supposed to be incremented in alignment with scan_w and scan_h.
Lastly, here are a couple of things I considered:
The for loop may be reading the same object (disproved by printing the address of the object).
This may be an issue of scope. I'm used to Python, so my presumption is that scope works the same way, but I'm not well enough versed in C++ to know if that's accurate.
This may have something to do with the way Vectors operate (more or less disproven by the documentation on the [] operator).
The output may be deceptive/the scan_ incrementation code may be buggy. This is the most likely scenario, but I haven't been able to spot anything off yet. Possibly a short break and fresh eyes may reveal something here.
Just a shot in the dark, considering how you have formatted the code for the Cell class. Could it be that x and y are declared static (otherwise you would not write "int Cell::x;" anywhere...)? Then the case is clear, because a static member is shared by every instance of the class, so each constructor call overwrites the same x and y. The solution would be to just remove the static keyword.
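To illustrate the difference, here is a small sketch with two hypothetical classes: `SharedCell` declares its coordinates static (which is what the out-of-class definitions in the question suggest), while `PlainCell` uses ordinary per-object members. Both names are made up for this example.

```cpp
#include <cassert>

// Static members: a single x and y exist for the whole class.
struct SharedCell {
    static int x;
    static int y;
    SharedCell(int lx, int ly) { x = lx; y = ly; }
};
int SharedCell::x;  // out-of-class definitions are only needed for statics
int SharedCell::y;

// Non-static members: every object carries its own x and y.
struct PlainCell {
    int x;
    int y;
    PlainCell(int lx, int ly) : x(lx), y(ly) {}
};

// Small self-check showing the observable difference.
void demoStaticVsInstance() {
    SharedCell a(1, 2);
    SharedCell b(49, 25);
    assert(a.x == 49 && a.y == 25);  // a "sees" the values written by b
    (void)b;

    PlainCell c(1, 2);
    PlainCell d(49, 25);
    assert(c.x == 1 && c.y == 2);    // c keeps its own values
    assert(d.x == 49 && d.y == 25);
}
```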
I see multiple bugs in the shown code.
for (int i = 0; i <= (width * height); i++)
This is iterating one too many times. If, for example, both width and height are 10, this will iterate with i set to the range of 0 to 100 inclusively, or 101 cells, instead of 100.
The next problem is that computation of scan_w and scan_h is unnecessarily complex. This should be a trivial calculation, using simple math, and also fixing the iteration bug at the same time:
for (int i = 0; i < (width * height); i++)
{
int scan_w = i % width;
int scan_h = i / width;
cells.push_back(new Cell(scan_w,scan_h));
}
Another bug is here:
for(int i = 0; i <= cells.size(); i++)
Same problem as the first bug. On the last iteration, i will be equal to cells.size(), and cells[i] will not exist, resulting in undefined behavior.
Again, the iteration should be corrected to:
for(int i = 0; i < cells.size(); i++)
You can start by fixing these problems yourself, then checking to see if the results match your expectations, or if there are still other problems.
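For reference, a compact sketch of the corrected generation loop is shown here. It deliberately stores cells by value in a `std::vector<Cell>` rather than as `new`-allocated pointers, which is a simplification chosen for the example and not what the question's code does; `makeCells` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal cell with per-object (non-static) coordinates.
struct Cell {
    int x;
    int y;
};

// Builds width * height cells in row-major order:
// x runs 0..width-1 within a row, y is the row number.
std::vector<Cell> makeCells(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<Cell> cells;
    cells.reserve(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (int i = 0; i < width * height; ++i) {   // strictly less than: no extra cell
        cells.push_back(Cell{i % width, i / width});
    }
    return cells;
}
```

With a 50 by 25 maze this yields 1250 cells, the last of which has coordinates (49, 24).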
Suppose I am given a maximum weight, say W = 20, and a set of weights, say m = [5, 7, 12, 18]. How can I calculate the largest total weight, not exceeding the maximum, that can be formed from the weights in m? In this case the answer is 19 (12 + 7 = 19), but my code gives me 18. Please help me with this.
int weight(int W, vector<int> &m) {
int current_weight = 0;
int temp;
for (size_t i = 0; i < m.size(); i++) {
for (int j = i + 1; j < m.size(); j++) {
if (m[i] < m[j]) {
temp = m[j];
m[j] = m[i];
m[i] = temp;
}
}
}
for (size_t i = 0; i < m.size(); ++i) {
if (current_weight + m[i] <= W) {
current_weight += m[i];
}
}
return current_weight;
}
The problem you describe looks more like a version of the subset sum problem. Basically, there is nothing wrong with your implementation as such; you have correctly implemented a greedy algorithm for the problem. That being said, this algorithm does not produce an optimal solution for every input. The instance you have found is such an example.
However, the problem can be solved using a different approach termed dynamic programming, which can be seen as form of organization of a recursive formulation of the solution.
Let m = { m_1, ..., m_n } be the set of positive item sizes and W a capacity constraint, where n is a positive integer. Organize an array A[n][W] as a state space where
A[i][j] = the maximum weight at most j attainable for the set of items
with indices from 1 to i if such a solution exists and
minus infinity otherwise
for each i in {1,...,n} and j in {1,...,W}; for ease of presentation, suppose that A has a value of minus infinity everywhere else. Note that for each such i and j the recurrence relation
A[i][j] = max { A[i-1][j-m_i] + m_i, A[i-1][j] }
holds, where the first case corresponds to selecting item i into the solution and the second case corresponds to not selecting item i into the solution.
Next, organize a loop which fills this table in order of increasing values of i and j, where the initialization for i = 1 has to be done beforehand. After filling the state space, the maximum feasible value in the last row
max{ A[n][j] : j in {1,...,W}, A[n][j] is not minus infinity }
yields the optimal solution. If the associated set of items is also desired, either some backtracking or suitable auxiliary data structures have to be used.
So it feels like this solution can be a trivial change to the commonly existing 0-1 knapsack problem, by passing the copy of the weight array as the value array.
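A minimal sketch of the dynamic programming idea is given below. It is a space-saving variant rather than a literal transcription of the table A described above: a single boolean array records which total weights are reachable, and the answer is the largest reachable total not exceeding W. The function name `maxWeight` and the decision to ignore non-positive weights are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the largest sum of a subset of m that does not exceed W.
// reachable[j] is true when some subset of the items seen so far sums to j.
int maxWeight(int W, const std::vector<int>& m) {
    assert(W >= 0);
    std::vector<bool> reachable(static_cast<std::size_t>(W) + 1, false);
    reachable[0] = true;  // the empty subset
    for (int item : m) {
        if (item <= 0 || item > W) continue;  // cannot contribute (assumption: positive weights)
        // Iterate downwards so each item is used at most once.
        for (int j = W; j >= item; --j) {
            if (reachable[j - item]) reachable[j] = true;
        }
    }
    for (int j = W; j >= 0; --j) {
        if (reachable[j]) return j;
    }
    return 0;  // not reached: reachable[0] is always true
}
```

For W = 20 and m = {5, 7, 12, 18} this returns 19, whereas the greedy approach stops at 18. The running time is proportional to n * W.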
for (int i = 0; i < 5000; i++)
for (int j = 0; j < 5000; j++)
{
for (int ii = 0; ii < 20; ii++)
for (int jj = 0; jj < 20; jj++)
{
int num = matBigger[i+ii][j+jj];
// Extract range from this.
int low = num & 0xff;
int high = num >> 8;
if (low < matSmaller[ii][jj] && matSmaller[ii][jj] > high) {
// match found
}
}
}
The machine is x86_64, with 32 KB of L1 cache and 256 KB of L2 cache.
Any pointers on how can I possibly optimize this code?
EDIT Some background to the original problem : Fastest way to Find a m x n submatrix in M X N matrix
First thing I'd try is to move the ii and jj loops outside the i and j loops. That way you're using the same elements of matSmaller for 25 million iterations of the i and j loops, meaning that you (or the compiler if you're lucky) can hoist the access to them outside those loops:
for (int ii = 0; ii < 20; ii++)
for (int jj = 0; jj < 20; jj++) {
    // Hoisted: the same matSmaller element is reused for all i and j.
    int smaller = matSmaller[ii][jj];
    for (int i = 0; i < 5000; i++)
    for (int j = 0; j < 5000; j++) {
        int num = matBigger[i+ii][j+jj];
        int low = num & 0xff;
        if (low < smaller && smaller > (num >> 8)) {
            // match found
        }
    }
}
This might be faster (thanks to less access to the matSmaller array), or it might be slower (because I've changed the pattern of access to the matBigger array, and it's possible that I've made it less cache-friendly). A similar alternative would be to move the ii loop outside i and j and hoist matSmaller[ii], but leave the jj loop inside. The rule of thumb is that it's more cache-friendly to increment the last index of a multi-dimensional array in your inner loops, than earlier indexes. So we're "happier" to modify jj and j than we are to modify ii and i.
Second thing I'd try - what's the type of matBigger? Looks like the values in it are only 16 bits, so try it both as int and as (u)int16_t. The former might be faster because aligned int access is fast. The latter might be faster because more of the array fits in cache at any one time.
There are some higher-level things you could consider with some early analysis of smaller: for example, if it's 0 then you needn't examine matBigger for that value of ii and jj, because (num & 0xff) < 0 is always false.
To do better than "guess things and see whether they're faster or not" you need to know for starters which line is hottest, which means you need a profiler.
Some basic advice:
Profile it, so you can learn where the hot-spots are.
Think about cache locality, and the addresses resulting from your loop order.
Use more const in the innermost scope, to hint more to the compiler.
Try breaking it up so you don't compute high if the low test is failing.
Try maintaining the offsets into matBigger and matSmaller explicitly, to turn the innermost stepping into a simple increment.
The best thing is to understand what the code is supposed to do, then check whether another algorithm exists for this problem.
Apart from that:
if you are just interested if a matching entry exists, make sure to break out of all 3 loops at the position of // match found.
make sure the data is stored in an optimal way. It all depends on your problem, but e.g. it could be more efficient to have just one array of size 5000*5000*20 and overload operator()(int,int,int) for accessing elements.
What are matSmaller and matBigger?
Try changing them to flat arrays indexed as matBigger[(i+ii) * COL_COUNT + (j+jj)]
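As a small sketch of what such flat, row-major storage could look like: the `FlatMatrix` type and its `at` accessor are hypothetical names for this example, and `cols` plays the role of COL_COUNT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 2D matrix stored in one contiguous block, row after row.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;

    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}

    // Element (row, col) lives at offset row * cols + col.
    int& at(std::size_t row, std::size_t col) {
        assert(row < rows && col < cols);
        return data[row * cols + col];
    }
    int at(std::size_t row, std::size_t col) const {
        assert(row < rows && col < cols);
        return data[row * cols + col];
    }
};
```

With this layout, an access such as matBigger[i+ii][j+jj] becomes `big.at(i + ii, j + jj)`, and stepping along a row is a simple increment of the offset.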
I agree with Steve about rearranging your loops to have the higher count as the inner loop. Since your code is only doing loads and compares, I believe a significant portion of the time is used for pointer arithmetic. Try an experiment to change Steve's answer into this:
for (int ii = 0; ii < 20; ii++)
{
for (int jj = 0; jj < 20; jj++)
{
int smaller = matSmaller[ii][jj];
for (int i = 0; i < 5000; i++)
{
int *pI = &matBigger[i+ii][jj];
for (int j = 0; j < 5000; j++)
{
int num = *pI++;
int low = num & 0xff;
if (low < smaller && smaller > (num >> 8)) {
// match found
}
} // for j
} // for i
} // for jj
} // for ii
Even in 64-bit mode, the C compiler doesn't necessarily do a great job of keeping everything in registers. By changing the array access to a simple pointer increment, you'll make it easier for the compiler to produce efficient code.
Edit: I just noticed @unwind suggested basically the same thing. Another issue to consider is the statistics of your comparison. Is the low or high comparison more probable? Arrange the conditional statement so that the less probable test is first.
Looks like there is a lot of repetition here. One optimization is to reduce the amount of duplicate effort. Using pen and paper, I'm showing the matBigger "i" index iterating as:
[0 + 0], [0 + 1], [0 + 2], ..., [0 + 19],
[1 + 0], [1 + 1], ..., [1 + 18], [1 + 19]
[2 + 0], ..., [2 + 17], [2 + 18], [2 + 19]
As you can see there are locations that are accessed many times.
Also, multiplying the iteration counts indicate that the inner content is accessed: 20 * 20 * 5000 * 5000, or 10000000000 (10E+9) times. That's a lot!
So rather than trying to speed up the execution of 10E9 instructions (such as execution (pipeline) cache or data cache optimization), try reducing the number of iterations.
The code is searching the matrix for a number that is within a range: larger than a minimal value and less than the maximum range value.
Based on this, try a different approach:
Find and remember all coordinates where the search value is greater than the low value. Let us call these anchor points.
For each anchor point, find the coordinates of the first value after the anchor point that is outside the range.
The objective is to reduce the number of duplicate accesses. Anchor points allow for a one pass scan and allow other decisions such as finding a range or determining an MxN matrix that contains the anchor value.
Another idea is to create new data structures containing the matBigger and matSmaller that are more optimized for searching.
For example, create a {value, coordinate list} entry for each unique value in matSmaller:
Value coordinate list
26 -> (2,3), (6,5), ..., (1007, 75)
31 -> (4,7), (2634, 5), ...
Now you can use this data structure to find values in matSmaller and immediately know their locations. So you could search matBigger for each unique value in this data structure. This again reduces the number of accesses to the matrices.
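One possible shape for such a {value, coordinate list} structure is sketched below, using `std::unordered_map`. The names `Coord`, `CoordIndex` and `buildIndex` are invented for this example, and the matrix is assumed to be a rectangular vector of vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;                             // (row, column)
using CoordIndex = std::unordered_map<int, std::vector<Coord>>;

// Maps every distinct value in mat to the list of positions where it occurs,
// in row-major scan order.
CoordIndex buildIndex(const std::vector<std::vector<int>>& mat) {
    CoordIndex index;
    for (std::size_t r = 0; r < mat.size(); ++r) {
        assert(mat[r].size() == mat[0].size());  // rectangular input assumed
        for (std::size_t c = 0; c < mat[r].size(); ++c) {
            index[mat[r][c]].emplace_back(static_cast<int>(r), static_cast<int>(c));
        }
    }
    return index;
}
```

Looking up a value is then a single hash lookup followed by a walk over only the positions where that value actually occurs.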
I have a problem. I'm working on a task that tries to find a matrix (vector) inside another matrix(vector) and the size of the matrices are:
Massive Matrix: 1024x768
Small Matrix: 36x49
Basically, my theory was to split the massive matrix into blocks that were the size of the small matrix thus meaning I was able to just see whether the small matrix exists in which block and then output the block. However, it just will not split equally but I need a way to determine if the small matrix does actually exist in the massive matrix.
As an example, I'll use test data:
M1 =
0 1 0 0
1 1 1 1
0 0 0 0
1 0 1 1
M2 =
0 1
1 1
And then I would split the matrices into blocks of 2x2 and then check that way. This is simple as I'm only working with a small matrix AND the matrix can be split equally, whereas the problem above is a lot more complex to understand and figure out.
In essence, I need to be able to split the (1024x768) into block sizes of (36x49) so then I can do the check to determine where that particular matrix is. I have been working with this algorithm:
// Assume:
// matrix1ColSize = 768
// matrix2ColSize = 49
const int ROW_BOUNDS = matrix1.size() - matrix2.size();
const int COL_BOUNDS = matrix1ColSize - matrix2ColSize;
bool found = false;
for(int i=0; (i < ROW_BOUNDS); i++)
{
bool matchFound = false;
for(int j=0; (j < COL_BOUNDS); j++) {
// logic here
}
cout << endl;
}
Could anyone offer any advice please? This is really annoying me now :(!
Two matrices are the same if all their elements are the same. So the following pseudo-code compares the small matrix with a block in the large matrix:
Initialize result to "true"
For each position in the small matrix
Read the value from the large matrix; call it x1
Read the value from the small matrix; call it x2
If x1 is not equal to x2, set result to "false"
(Optional) If x1 is not equal to x2, stop looking at other positions
Here, use the result
This logic is going to be inside your 2 nested loops, so you will have 4 nested loops there! If you are afraid of getting confused, put the implementation inside a function. If you want to use 4 nested loops, good luck.
In C++:
bool is_equal = true;
for (int y = 0; y < 36; ++y) // rows of the small matrix
{
for (int x = 0; x < 49; ++x) // columns of the small matrix
{
if (matrix1.at(j + x, i + y) != matrix2.at(x, y))
{
is_equal = false;
goto DONE; // optional
}
}
}
DONE:;
Edit: this code assumes using a custom class for matrices; after looking again at your code i realize that you probably use a vector of vectors (std::vector<std::vector<int>>), so use matrix2[y][x] instead of matrix2.at(x, y).