I'm attempting to learn C++ by creating a small maze generator. To do this, I store pointers to Cell objects, each holding its own x and y values, in a vector inside Maze (some other pieces of information are included below for completeness, but aren't relevant).
Each cell's x and y values are computed as the cell is created and passed in to its constructor.
The problem I'm having is that each cell appears to have the same x and y values populated.
Here is the relevant code:
vector<Cell*> Maze::cells;
int Maze::width;
int Maze::height;
Maze::Maze(int w, int h)
{
/* Set width and height */
width = w;
height = h;
/* These variables keep track of our position in the maze as we generate it */
int scan_w = 0;
int scan_h = 0;
/* Continue looping until we've visited all cells */
/* Offset by one because the width starts at 1 while the scan is zero-based */
for (int i = 0; i <= (width * height); i++)
{
cells.push_back(new Cell(scan_w,scan_h));
cout << scan_w << "/" << scan_h << endl;
scan_w = (i % w);
if (scan_w == 0)
{
scan_h++;
}
}
for(int i = 0; i <= cells.size(); i++)
{
cout << "[" << cells[i]->x << ", " << cells[i]->y << "] " << &cells[i] << endl;
}
}
Edit: Here are the relevant parts of the Cell class:
int Cell::x;
int Cell::y;
Cell::Cell(int location_x, int location_y)
{
x = location_x;
y = location_y;
}
Creating the maze with the following line produces this output (truncated for brevity):
Maze maze = Maze(50, 25);
0/0
0/1
1/1
2/1
3/1
4/1
5/1
6/1
7/1
8/1
9/1
...
40/25
41/25
42/25
43/25
44/25
45/25
46/25
47/25
48/25
49/25
[49, 25] 0x632f30
[49, 25] 0x632f38
[49, 25] 0x632f40
[49, 25] 0x632f48
[49, 25] 0x632f50
[49, 25] 0x632f58
[49, 25] 0x632f60
[49, 25] 0x632f68
...
Here are my assumptions:
Based on the output, scan_w and scan_h are being incremented as intended (as if reading a table from left to right, top to bottom).
Based on the flow control documentation/tutorial, my understanding is that the first for loop is properly moving from one element of cells to the next.
Based on the documentation for vector's push_back member, I'm assuming that it is properly inserting the pointer to each newly created cell into the vector.
Based on the documentation for the [] operator for vectors, my understanding is that if I access cells[0] and cells[1], I will be accessing different objects (this is confirmed by printing the object's address, as above).
So I'm having trouble understanding why the value of x and y for each cell is 49 and 25, respectively, when any given cell is supposed to be incremented in alignment with scan_w and scan_h.
Lastly, here are a couple of things I considered:
The for loop may be reading the same object (disproved by printing the address of the object).
This may be an issue of scope. I'm used to Python, so my presumption is that scope works the same way, but I'm not well enough versed in C++ to know if that's accurate.
This may have something to do with the way vectors operate (more or less disproved by the documentation on the [] operator).
The output may be deceptive, or the scan_ increment code may be buggy. This is the most likely scenario, but I haven't been able to spot anything off yet. A short break and fresh eyes may reveal something here.
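Just a shot in the dark, based on how you have formatted the code for the Cell class: could it be that x and y are declared static (otherwise you would not write "int Cell::x;" anywhere)? Then the case is clear. A static data member belongs to the class, not to each object, so every Cell shares a single x and a single y. Each constructor call overwrites them, and every cell reports the values written last, [49, 25]. The solution is to remove the static keyword from the member declarations and delete the out-of-class definitions int Cell::x; and int Cell::y;.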
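A minimal sketch of the difference, using two hypothetical types that are not from the question: SharedCell declares its coordinates static (as the question's Cell appears to), while Cell uses ordinary per-object members.

```cpp
#include <cassert>

// Hypothetical type: x and y are static, so all SharedCell objects share them.
struct SharedCell {
    static int x;
    static int y;
    SharedCell(int px, int py) { x = px; y = py; }  // overwrites the shared values
};
int SharedCell::x = 0;  // out-of-class definitions, like "int Cell::x;" in the question
int SharedCell::y = 0;

// Hypothetical type: ordinary members, so every Cell stores its own x and y.
struct Cell {
    int x;
    int y;
    Cell(int px, int py) : x(px), y(py) {}
};
```

Constructing SharedCell a(1, 2) and then SharedCell b(49, 25) leaves a.x == 49, exactly the symptom in the question, while two Cell objects keep their own coordinates.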
I see multiple bugs in the shown code.
for (int i = 0; i <= (width * height); i++)
This iterates one time too many. If, for example, both width and height are 10, i runs from 0 to 100 inclusive, creating 101 cells instead of 100.
The next problem is that computation of scan_w and scan_h is unnecessarily complex. This should be a trivial calculation, using simple math, and also fixing the iteration bug at the same time:
for (int i = 0; i < (width * height); i++)
{
int scan_w = i % width;
int scan_h = i / width;
cells.push_back(new Cell(scan_w,scan_h));
}
Another bug is here:
for(int i = 0; i <= cells.size(); i++)
Same problem as the first bug: on the last iteration, i equals cells.size(), so cells[i] is out of range and accessing it is undefined behavior.
Again, the iteration should be corrected to:
for(int i = 0; i < cells.size(); i++)
You can start by fixing these problems yourself, then checking to see if the results match your expectations, or if there are still other problems.
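Combining the fixes from both answers, a minimal sketch might look like the following. It assumes Cell has ordinary (non-static) x and y members and, as a departure from the question, stores the cells by value in a std::vector<Cell> instead of as raw pointers. makeCells is a hypothetical free function standing in for the loop in the Maze constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cell with per-object coordinates (no static members).
struct Cell {
    int x;
    int y;
    Cell(int px, int py) : x(px), y(py) {}
};

// Hypothetical stand-in for the Maze constructor's loop.
// Storing Cell by value means no new/delete bookkeeping.
std::vector<Cell> makeCells(int width, int height) {
    std::vector<Cell> cells;
    cells.reserve(static_cast<std::size_t>(width) * height);
    for (int i = 0; i < width * height; i++) {      // exactly width*height cells
        cells.emplace_back(i % width, i / width);   // column, then row
    }
    return cells;
}
```

For makeCells(50, 25) this yields 1250 cells, running from [0, 0] to [49, 24] in row-major order.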
Related
I managed to reduce the problem to the following code, which uses almost 500MB of memory when it runs on my laptop - which in turn causes a std::bad_alloc in the full program. What is the problem here? As far as I can see, the unordered map only uses something like (32+32)*4096*4096 bits = 134.2MB, which is not even close to what the program uses.
#include<iostream>
#include<unordered_map>
using namespace std;
int main()
{
unordered_map<int,int> a;
long long z = 0;
for (int x = 0; x < 4096; x++)
{
for (int y = 0; y < 4096; y++)
{
z = 0;
for (int j = 0; j < 4; j++)
{
z ^= ((x>>(3*j))%8)<<(3*j);
z ^= ((y>>(3*j))%8)<<(3*j + 12);
}
a[z]++;
}
}
return 0;
}
EDIT: I'm aware that some of the bit shifting here can cause undefined behaviour, but I'm 99% sure that's not the problem.
EDIT2: What I need is essentially to count the number of x in a given set that some function maps to each y in a second set (of size 4096*4096). Would it be better to perhaps store these numbers in an array? I.e., I have a function f: A to B, and I need to know the size of the set {x in A : f(x) = y} for each y in B. In this case A and B are both the set of non-negative integers less than 2^12=4096. (Ideally I would like to extend this to 2^32.)
... which uses almost 500MB of memory ... What is the problem here?
There isn't really a problem, per se, with the memory usage you are observing. std::unordered_map is built to run fast for a large number of elements, so memory isn't a top priority. For example, to keep lookups fast it maintains a bucket array that, with the default maximum load factor of 1.0, has at least as many buckets as there are elements, and each bucket costs a pointer. Also, your estimate of the element count multiplied by the element size does not take into account the actual per-node footprint of the data structure: each node is a separate heap allocation that holds at least a pointer to the next element in its bucket's chain, plus allocator overhead.
Having said that, it isn't clear you even need std::unordered_map in this scenario. Instead, given that the mapping you're trying to store is defined as
{x in A : f(x) = y} for each y in B
you could use one fixed-size array that simply holds, for each index i representing an element of set B, the number of elements of set A that meet the criterion. A std::array works for small sizes; for 4096*4096 counters, a std::vector avoids putting 64 MB on the stack.
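A sketch of the array-based counting, assuming a hypothetical countPreimages helper that receives the sizes of A and B and the function f as a callable. It assumes f always returns a value smaller than the size of B.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// For every y in [0, codomainSize), counts how many x in [0, domainSize)
// satisfy f(x) == y. Assumes f(x) < codomainSize for every x.
std::vector<std::uint32_t> countPreimages(
        std::uint32_t domainSize, std::uint32_t codomainSize,
        const std::function<std::uint32_t(std::uint32_t)>& f) {
    // The counters live on the heap; one 32-bit counter per element of B.
    std::vector<std::uint32_t> counts(codomainSize, 0);
    for (std::uint32_t x = 0; x < domainSize; x++) {
        counts[f(x)]++;
    }
    return counts;
}
```

Memory here is exactly 4 bytes per element of B, with no per-node overhead. Note that the 32-bit sizes cap this sketch below 2^32 elements; a full 2^32 range would need 64-bit loop variables and about 16 GB of 32-bit counters.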
So I have some particles (ellipses) bouncing around the screen. I'm trying to get them to collide rather than pass over each other. To do this, I cycle through every particle and compare its distance to every other particle with a for loop nested within another for loop, then change their velocities when they are within a certain distance of each other, like so:
//p.size() returns the size of the particle system (yes it works)
//ofDist() is an open frameworks function that calculates the dist between 2 points
for( int i = 0; i < p.size(); i++){
// cout << i << endl;
for(int j = 0; j < p.size(); j++){
// cout << j << endl;
pDist[i] = ofDist(p[i].pos.x, p[i].pos.y, p[j].pos.x, p[j].pos.y);
// cout << pDist[i] << endl;
if(pDist[i] <= 300){
p[i].vel.x *= -1;
p[i].vel.y *= -1;
p[j].vel.x *= -1;
p[j].vel.y *= -1;
}
}
}
But for some mysterious reason they still pass right over each other like they don't even exist. It does work if I apply this to just 2 particles without the for loops:
pDist[0] = ofDist(p[0].pos.x, p[0].pos.y, p[1].pos.x, p[1].pos.y);
if(pDist[0] <= 300){
cout << "It's colliding" << endl;
p[0].vel.x *= -1;
p[0].vel.y *= -1;
p[1].vel.x *= -1;
p[1].vel.y *= -1;
}
The particles are stored in a vector by the way.
Any ideas how I can get this to work with the for loops?
update
The size of my vector is 3, so p.size() = 3 (or 2, doesn't really make a difference right now). I replaced p.size() with 2 and 3 in my code and it didn't change anything, so that's not the source of the issue.
update 2
If someone could let me know what I need to do to not get downvoted that would be helpful. :/
A pretty large issue is that by saying:
for( int i = 0; i < p.size(); i++){
for(int j = 0; j < p.size(); j++){
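You are actually checking each particle against itself, and you are checking each pair of particles twice. A self-comparison has distance 0, so it always counts as a collision and flips the same particle's velocity twice. A real collision is detected twice, as (i, j) and (j, i), and each detection flips both velocities. Either way the flips cancel out, so you are essentially doing nothing (a * -1 * -1 = a).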
A better way to do this would be to use a loop where particles collisions are only checked once, and a particle is not checked against itself. You can do this by starting the nested loop after the current particle (essentially offsetting the index by the indexes that have already been checked), like so:
for (size_t i = 0; i + 1 < p.size(); i++){ // i + 1 < p.size() avoids p.size()-1 wrapping around when p is empty
for (size_t j = i+1; j < p.size(); j++){
This also roughly halves the number of distance checks (n(n-1)/2 instead of n*n), which matters more as the number of particles grows.
There is also no reason to store the calculated distance in an array (unless your code uses it somewhere else). A local double works fine here.
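A self-contained sketch of the corrected loop. Vec2, Particle and bounceCollisions are hypothetical stand-ins for the openFrameworks types in the question, and std::hypot replaces ofDist() so the example has no openFrameworks dependency.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal particle: only the fields the loop touches.
struct Vec2 { float x, y; };
struct Particle { Vec2 pos; Vec2 vel; };

// Visits each unordered pair (i, j) with i < j exactly once, so no particle
// is compared with itself and no collision is reversed twice.
void bounceCollisions(std::vector<Particle>& p, float minDist) {
    for (std::size_t i = 0; i + 1 < p.size(); i++) {   // safe when p is empty
        for (std::size_t j = i + 1; j < p.size(); j++) {
            // A local distance value; no array needed.
            float d = std::hypot(p[i].pos.x - p[j].pos.x, p[i].pos.y - p[j].pos.y);
            if (d <= minDist) {
                p[i].vel.x *= -1; p[i].vel.y *= -1;
                p[j].vel.x *= -1; p[j].vel.y *= -1;
            }
        }
    }
}
```

With a threshold of 300, two particles 100 units apart each have their velocity reversed once, while a distant third particle is left alone.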
Edit:
Just to be a bit clearer, I have logged which pairs each of the two loops compares, using 3 particles.
Original loop
1 compared to 1 (This is a problem. Checking a particle against itself)
1 compared to 2
1 compared to 3
2 compared to 1 (This is a problem. This has already been checked for)
2 compared to 2 (This is a problem. Checking a particle against itself)
2 compared to 3
3 compared to 1 (This is a problem. This has already been checked for)
3 compared to 2 (This is a problem. This has already been checked for)
3 compared to 3 (This is a problem. Checking a particle against itself)
Modified loop
1 compared to 2
1 compared to 3
2 compared to 3
As you can see, the modified loop checks only three pairs, with no self-comparisons and no duplicates.
I'm trying to make a sudoku solver in C++. I want to keep a 9-by-9 array (obviously). I'm now figuring out a way to keep track of the possible values. I thought about a list for every entry in the array: each list initially holds the numbers 1 to 9, and every iteration I would be able to get rid of some values.
Now my question is: can I assign one list to every entry in the 2D array, and if so, how? If not, is there another/better option?
I'm a beginner programmer and this is basically my first project in C++.
Thanks in advance!
One simple solution is to use a set of one bit flags for each square, e.g.
uint16_t board[9][9]; // 16 x 1 bit flags for each square where 9 bits are used
// to represent possible values for the square
Then you can use bitwise operators to set/clear/test each bit, e.g.
board[i][j] |= (1 << n); // set bit n at board position i, j
board[i][j] &= ~(1 << n); // clear bit n at board position i, j
test = (board[i][j] & (1 << n)) != 0; // test bit n at board position i, j
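A runnable sketch of the bit-flag idea. The convention that bits 1 to 9 stand for the values 1 to 9 (bit 0 unused) is an assumption, as are the helper names Candidates, kAllCandidates, removeCandidate, isCandidate and countCandidates.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Assumed convention: bit n (1..9) set means "n is still possible here".
using Candidates = std::uint16_t;

constexpr Candidates kAllCandidates = 0x3FE;  // bits 1..9 set, bit 0 clear

inline void removeCandidate(Candidates& c, int n) {
    c &= static_cast<Candidates>(~(1u << n));  // clear bit n
}
inline bool isCandidate(Candidates c, int n) {
    return (c & (1u << n)) != 0;               // test bit n
}
inline int countCandidates(Candidates c) {
    return static_cast<int>(std::bitset<16>(c).count());  // number of set bits
}
```

A square whose count drops to 1 has only one remaining value, which is a convenient check for a solver loop.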
Well you can create an array of sets by doing
std::array<std::set<int>,81> possibleValues;
for example. You can fill this array with all possibilities by writing
const auto allPossible = std::set<int>{ 0, 1, 2, 3, 4, 5, 6, 7, 8 };
std::fill( std::begin(possibleValues), std::end(possibleValues),
allPossible );
if you are using a modern C++11 compiler. This is how you can set/clear and test each entry:
possibleValues[x+9*y].insert( n ); // sets that n is possible at (x,y).
possibleValues[x+9*y].erase( n ); // clears the possibility of having n at (x,y).
possibleValues[x+9*y].count( n ) != 0 // tells, if n is possible at (x,y).
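Put together as a compilable unit, the set-based approach could look like this. PossibleValues and makeAllPossible are hypothetical names, and the values run from 0 to 8 as in the answer above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <iterator>
#include <set>

// One candidate set per square, indexed as x + 9*y.
using PossibleValues = std::array<std::set<int>, 81>;

PossibleValues makeAllPossible() {
    PossibleValues possibleValues;
    const auto allPossible = std::set<int>{0, 1, 2, 3, 4, 5, 6, 7, 8};
    // Copy the full candidate set into every square.
    std::fill(std::begin(possibleValues), std::end(possibleValues), allPossible);
    return possibleValues;
}
```

Erasing a value from one square leaves the other 80 sets untouched, since std::fill stored independent copies.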
If performance is an issue, you might want to use bit operations rather than (relatively) heavyweight std::set operations. In this case use
std::array<short, 81> possibleValues;
std::fill( begin(possibleValues), end(possibleValues), (1<<9)-1 ); // bits 0..8 set, one per value 0..8
The value n is possible for the field (x,y) if and only if (possibleValues[x+9*y] & (1<<n)) != 0, where all indices start at 0 in this case. The parentheses matter, because != binds more tightly than &.
You can also think of your sudoku as a 3D array, using the third dimension to store the possible values:
int arr[9][9][10] = {}; // assumed declaration: index 0 holds the cell's value, 1..9 the candidates
// set arr[x][y][i] to 1 for every value i that is still possible in cell (x, y)
for (int x = 0; x < 9; x++)
for (int y = 0; y < 9; y++)
for (int i = 1; i < 10; i++)
arr[x][y][i] = 1;
and arr[x][y][0] contains the value of your Sudoku cell.
To remove, for example, the value 5 as a possibility for cell [x][y], just set arr[x][y][5] = 0.
for (int i = 0; i < 5000; i++)
for (int j = 0; j < 5000; j++)
{
for (int ii = 0; ii < 20; ii++)
for (int jj = 0; jj < 20; jj++)
{
int num = matBigger[i+ii][j+jj];
// Extract range from this.
int low = num & 0xff;
int high = num >> 8;
if (low < matSmaller[ii][jj] && matSmaller[ii][jj] > high)
{ /* match found */ }
}
}
The machine is x86_64, with a 32 KB L1 cache and a 256 KB L2 cache.
Any pointers on how can I possibly optimize this code?
EDIT: Some background to the original problem: Fastest way to Find a m x n submatrix in M X N matrix
First thing I'd try is to move the ii and jj loops outside the i and j loops. That way you're using the same elements of matSmaller for 25 million iterations of the i and j loops, meaning that you (or the compiler if you're lucky) can hoist the access to them outside those loops:
for (int ii = 0; ii < 20; ii++) {
    for (int jj = 0; jj < 20; jj++) {
        int smaller = matSmaller[ii][jj];  // hoisted out of the i/j loops
        for (int i = 0; i < 5000; i++) {
            for (int j = 0; j < 5000; j++) {
                int num = matBigger[i+ii][j+jj];
                int low = num & 0xff;
                if (low < smaller && smaller > (num >> 8)) {
                    // match found
                }
            }
        }
    }
}
This might be faster (thanks to fewer accesses to the matSmaller array), or it might be slower (because I've changed the pattern of access to the matBigger array, and it's possible that I've made it less cache-friendly). A similar alternative would be to move the ii loop outside i and j and hoist matSmaller[ii], but leave the jj loop inside. The rule of thumb is that it's more cache-friendly to increment the last index of a multi-dimensional array in your inner loops than earlier indexes. So we're "happier" to modify jj and j than we are to modify ii and i.
Second thing I'd try - what's the type of matBigger? Looks like the values in it are only 16 bits, so try it both as int and as (u)int16_t. The former might be faster because aligned int access is fast. The latter might be faster because more of the array fits in cache at any one time.
There are some higher-level things you could consider with some early analysis of smaller: for example, if smaller is 0, then you needn't examine matBigger for that value of ii and jj, because (num & 0xff) < 0 is always false.
To do better than "guess things and see whether they're faster or not" you need to know for starters which line is hottest, which means you need a profiler.
Some basic advice:
Profile it, so you can learn where the hot-spots are.
Think about cache locality, and the addresses resulting from your loop order.
Use more const in the innermost scope, to hint more to the compiler.
Try breaking it up so you don't compute high if the low test is failing.
Try maintaining the offsets into matBigger and matSmaller explicitly, so that the innermost step becomes a simple increment.
The best thing is to understand what the code is supposed to do, then check whether another algorithm exists for this problem.
Apart from that:
If you are only interested in whether a matching entry exists, make sure to break out of all 4 loops at the position of // match found.
Make sure the data is stored in an optimal way. It all depends on your problem, but, e.g., it could be more efficient to have just one array of size 5000*5000*20 and overload operator()(int,int,int) for accessing elements.
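What are matSmaller and matBigger?
If they are arrays of arrays (or vectors of vectors), try storing each one in a single contiguous array and indexing it as matBigger[(i+ii) * COL_COUNT + (j+jj)], where COL_COUNT is the number of columns.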
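A minimal sketch of such a contiguous, row-major matrix with an overloaded operator(), along the lines suggested above. FlatMatrix is a hypothetical class, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix in one contiguous buffer:
// element (row, col) lives at index row * cols + col.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    int& operator()(std::size_t row, std::size_t col) { return data_[row * cols_ + col]; }
    int operator()(std::size_t row, std::size_t col) const { return data_[row * cols_ + col]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;
};
```

The inner j loop then walks consecutive addresses, since m(i + ii, j + jj) and m(i + ii, j + jj + 1) are adjacent in memory.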
I agree with Steve about rearranging your loops to have the higher count as the inner loop. Since your code is only doing loads and compares, I believe a significant portion of the time is spent on pointer arithmetic. As an experiment, try changing Steve's answer into this:
for (int ii = 0; ii < 20; ii++)
{
for (int jj = 0; jj < 20; jj++)
{
int smaller = matSmaller[ii][jj];
for (int i = 0; i < 5000; i++)
{
int *pI = &matBigger[i+ii][jj];
for (int j = 0; j < 5000; j++)
{
int num = *pI++;
int low = num & 0xff;
if (low < smaller && smaller > (num >> 8)) {
// match found
} // if
} // for j
} // for i
} // for jj
} // for ii
Even in 64-bit mode, the compiler doesn't necessarily do a great job of keeping everything in registers. By changing the array access into a simple pointer increment, you make it easier for the compiler to produce efficient code.
Edit: I just noticed unwind suggested basically the same thing. Another issue to consider is the statistics of your comparison. Is the low or the high comparison more likely to fail? Because && stops at the first false operand, put the test that is more likely to fail first.
Looks like there is a lot of repetition here. One optimization is to reduce the amount of duplicate effort. Using pen and paper, I'm showing the matBigger "i" index iterating as:
[0 + 0], [0 + 1], [0 + 2], ..., [0 + 19],
[1 + 0], [1 + 1], ..., [1 + 18], [1 + 19]
[2 + 0], ..., [2 + 17], [2 + 18], [2 + 19]
As you can see there are locations that are accessed many times.
Also, multiplying the iteration counts shows that the inner body executes 20 * 20 * 5000 * 5000, or 10,000,000,000 (1e10), times. That's a lot!
So rather than trying to speed up the execution of 1e10 iterations (through pipeline or data cache optimization, for example), try reducing the number of iterations.
The code is searching the matrix for a number that is within a range: larger than a minimum value and less than the maximum range value.
Based on this, try a different approach:
Find and remember all coordinates where the search value is greater than the low value. Let us call these anchor points.
For each anchor point, find the coordinates of the first value after the anchor point that is outside the range.
The objective is to reduce the number of duplicate accesses. Anchor points allow for a one pass scan and allow other decisions such as finding a range or determining an MxN matrix that contains the anchor value.
Another idea is to create new data structures containing the matBigger and matSmaller that are more optimized for searching.
For example, create a {value, coordinate list} entry for each unique value in matSmaller:
Value coordinate list
26 -> (2,3), (6,5), ..., (1007, 75)
31 -> (4,7), (2634, 5), ...
Now you can use this data structure to find values in matSmaller and immediately know their locations. So you could search matBigger for each unique value in this data structure. This again reduces the number of accesses to the matrices.
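One way to build such a {value, coordinate list} index in a single pass is sketched below. It assumes the matrix is a std::vector<std::vector<int>>; Coord, ValueIndex and buildValueIndex are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical index: each distinct value maps to every (row, col) holding it.
using Coord = std::pair<int, int>;
using ValueIndex = std::unordered_map<int, std::vector<Coord>>;

ValueIndex buildValueIndex(const std::vector<std::vector<int>>& mat) {
    ValueIndex index;
    for (int r = 0; r < static_cast<int>(mat.size()); r++) {
        for (int c = 0; c < static_cast<int>(mat[r].size()); c++) {
            index[mat[r][c]].emplace_back(r, c);  // coordinates stay in scan order
        }
    }
    return index;
}
```

After this one pass, looking up all positions of a given value is a single hash lookup instead of a scan over the whole matrix.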
I have a problem. I'm working on a task that tries to find a matrix (vector) inside another matrix (vector), and the sizes of the matrices are:
Massive Matrix: 1024x768
Small Matrix: 36x49
Basically, my theory was to split the massive matrix into blocks the size of the small matrix, so that I could check which block, if any, contains the small matrix and then output that block. However, the massive matrix does not split evenly into blocks of that size, and I still need a way to determine whether the small matrix actually exists in the massive matrix.
As an example, I'll use test data:
M1 =
0 1 0 0
1 1 1 1
0 0 0 0
1 0 1 1
M2 =
0 1
1 1
And then I would split the matrices into blocks of 2x2 and then check that way. This is simple as I'm only working with a small matrix AND the matrix can be split equally, whereas the problem above is a lot more complex to understand and figure out.
In essence, I need to be able to split the (1024x768) into block sizes of (36x49) so then I can do the check to determine where that particular matrix is. I have been working with this algorithm:
// Assume:
// matrix1ColSize = 768
// matrix2ColSize = 49
const int ROW_BOUNDS = matrix1.size() - matrix2.size();
const int COL_BOUNDS = matrix1ColSize - matrix2ColSize;
bool found = false;
// <= because the last valid top-left position is ROW_BOUNDS / COL_BOUNDS itself
for(int i=0; (i <= ROW_BOUNDS); i++)
{
bool matchFound = false;
for(int j=0; (j <= COL_BOUNDS); j++) {
// logic here
}
cout << endl;
}
Could anyone offer any advice please? This is really annoying me now :(!
Two matrices are the same if all their elements are the same. So the following pseudo-code compares the small matrix with a block in the large matrix:
Initialize result to "true"
For each position in the small matrix
Read the value from the large matrix; call it x1
Read the value from the small matrix; call it x2
If x1 is not equal to x2, set result to "false"
(Optional) If x1 is not equal to x2, stop looking at other positions
Here, use the result
This logic is going to be inside your 2 nested loops, so you will have 4 nested loops there! If you're afraid of getting confused, put the implementation inside a function. If you want to use 4 nested loops, good luck.
In C++:
bool is_equal = true;
for (int y = 0; y < 49; ++y)
{
for (int x = 0; x < 36; ++x)
{
if (matrix1.at(j + x, i + y) != matrix2.at(x, y))
{
is_equal = false;
goto DONE; // optional
}
}
}
DONE:;
Edit: this code assumes a custom matrix class. After looking again at your code, I realize that you probably use a vector of vectors (std::vector<std::vector<int>>), so use matrix1[i + y][j + x] and matrix2[y][x] instead of matrix1.at(j + x, i + y) and matrix2.at(x, y).
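Following the advice to move the comparison into a function, here is a sketch for the vector-of-vectors case. matchesAt and findSubmatrix are hypothetical names, the matrices are assumed rectangular, and a return value of {-1, -1} meaning "not found" is an arbitrary choice. Every top-left position is tried, not only block boundaries, so 1024x768 does not need to divide evenly into 36x49.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// True if `small` equals the block of `big` whose top-left corner is (row, col).
// Assumes the block fits inside `big`.
bool matchesAt(const Matrix& big, const Matrix& small, std::size_t row, std::size_t col) {
    for (std::size_t y = 0; y < small.size(); ++y)
        for (std::size_t x = 0; x < small[y].size(); ++x)
            if (big[row + y][col + x] != small[y][x])
                return false;  // early exit replaces the goto
    return true;
}

// Returns the first matching top-left (row, col) in row-major order, or {-1, -1}.
std::pair<int, int> findSubmatrix(const Matrix& big, const Matrix& small) {
    if (small.empty() || big.size() < small.size() || big[0].size() < small[0].size())
        return {-1, -1};
    // Inclusive bounds: the last valid start is big.size() - small.size().
    for (std::size_t row = 0; row + small.size() <= big.size(); ++row)
        for (std::size_t col = 0; col + small[0].size() <= big[0].size(); ++col)
            if (matchesAt(big, small, row, col))
                return {static_cast<int>(row), static_cast<int>(col)};
    return {-1, -1};
}
```

With the M1 and M2 test data from the question, findSubmatrix(M1, M2) returns {0, 0}.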