How to efficiently change a contiguous portion of a matrix? - c++

Given a matrix of M rows and N columns, allocated as a byte array of M*N elements (initially all set to zero), I would like to modify it according to the following rule: the elements in the neighborhood of a certain element must be set to a given value. In other words, I need to set a rectangular region of the matrix, which means accessing non-contiguous portions of the array.
In order to perform the above operation, I have access to the following information:
- the pointer to the element located at the center of the neighborhood (this pointer must not be changed during the operation); the position (row and column) of this element is also provided;
- the size L*L of the neighborhood (L is always an odd number).
The code that implements this operation should be executed as fast as possible in C++: for this reason I thought of using the above pointer to access different pieces of the array. Instead, the position (row and column) of the central element of the neighborhood could allow me to check whether the specified region exceeds the dimensions of the matrix (for example, the center of the region may be located on the edge of the matrix): in this case I should set only that part of the region that is located in the matrix.
int M = ... // number of matrix rows
int N = ... // number of matrix columns
char* centerPtr = ... // pointer to the center of the region
int i = ... // position of the central element
int j = ... // of the region to be modified
char* tempPtr = centerPtr - (N+1)*(L/2); // top-left corner of the region
for(int k=0; k < L; k++)
{
    memset(tempPtr,value,L); // one row of the region is L elements wide
    tempPtr += N;
}
How can I improve the code?
How do I handle the fact that a region may exceed the dimensions of the matrix?
How can I make the code more efficient with respect to execution time?

Your code is probably optimal for the general case where the region does not overlap the outside of the matrix. The main efficiency problem you can cause with this kind of code is to make the outer loop over columns instead of rows. This destroys cache and paging performance. You haven't done that.
Using pointers has little or no speed advantage with most modern compilers. Optimizers will come up with very good pointer code from normal array indices. In some cases I've seen array index code run substantially faster than hand-tweaked pointer code for the same thing. So don't use pointer arithmetic if index arithmetic is clearer.
There are 8 boundary cases: north, northwest, west, ..., northeast. Each of these will need a custom version of your loop to touch the right elements. I'll show the northwest case and let you work out the rest.
The fastest possible way to handle the cases is a nested "if" tree:
if (j < L/2) { // northwest, west, or southwest
if (i < L/2) {
// northwest: the region is clipped at row 0 and column 0
char* tempPtr = centerPtr - i * N - j; // element (0, 0)
for(int k = 0; k < i + L/2 + 1; k++) { // rows 0 .. i + L/2
memset(tempPtr, value, j + L/2 + 1); // columns 0 .. j + L/2
tempPtr += N;
}
} else if (i >= M - L/2) {
// southwest
} else {
// west
}
} else if (j >= N - L/2) { // symmetrical cases for east.
if (i < L/2) {
// northeast
} else if (i >= M - L/2) {
// southeast
} else {
// east
}
} else {
if (i < L/2) {
// north
} else if (i >= M - L/2) {
// south
} else {
// no overlap
}
}
It's tedious to do it like this, but you'll have no more than 4 comparisons per region.
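As an alternative to writing every branch by hand, the region can be clipped to the matrix bounds before the loop. The following is a minimal sketch of that idea, not the method described above: fillRegion is a hypothetical name, and it assumes that (i, j) lies inside the matrix and that L is odd, as stated in the question.

```cpp
#include <algorithm>
#include <cstring>

// Sets the L*L neighbourhood centred at (i, j) to `value`, clipped to the
// M*N matrix. `centerPtr` points at element (i, j) and is not modified.
// fillRegion is a hypothetical name; (i, j) is assumed to be inside the matrix.
void fillRegion(char* centerPtr, int M, int N, int i, int j, int L, char value)
{
    const int h = L / 2;
    const int rowFirst = std::max(i - h, 0);
    const int rowLast  = std::min(i + h, M - 1);
    const int colFirst = std::max(j - h, 0);
    const int colLast  = std::min(j + h, N - 1);
    const int width = colLast - colFirst + 1;

    // Step back from the centre to the first in-bounds element of the region.
    char* row = centerPtr - (i - rowFirst) * N - (j - colFirst);
    for (int r = rowFirst; r <= rowLast; ++r) {
        std::memset(row, value, width);
        row += N;
    }
}
```

This trades the comparison tree for two min and two max operations per region, so whether it is faster in practice would have to be measured.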

Related

Tallest tower with stacked boxes in the given order

Given N boxes, how can I find the tallest tower that can be built with them in the given order? (Given order means that the first box must be at the base of the tower, and so on.) All boxes must be used to make a valid tower.
A box may be rotated about any axis so that any of its 6 faces is parallel to the ground; however, the perimeter of that face must be completely contained within the perimeter of the top face of the box below it. For the first box any face may be chosen, because the ground is big enough.
To solve this problem I've tried the following:
- Firstly the code generates the rotations for each rectangle (just a permutation of the dimensions)
- secondly constructing a dynamic programming solution for each box and each possible rotation
- finally search for the highest tower made (in the dp table)
But my algorithm gets a wrong answer on unknown test cases. What is wrong with it? Is dynamic programming the best approach to this problem?
Here is my code:
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cstring>
struct rectangle{
int coords[3];
rectangle(){ coords[0] = coords[1] = coords[2] = 0; }
rectangle(int a, int b, int c){coords[0] = a; coords[1] = b; coords[2] = c; }
};
bool canStack(rectangle &current_rectangle, rectangle &last_rectangle){
for (int i = 0; i < 2; ++i)
if(current_rectangle.coords[i] > last_rectangle.coords[i])
return false;
return true;
}
//six is the number of rotations for each rectangle
int dp(std::vector< std::vector<rectangle> > &v){
int memoization[6][v.size()];
memset(memoization, -1, sizeof(memoization));
//all rotations of the first rectangle can be used
for (int i = 0; i < 6; ++i) {
memoization[i][0] = v[0][i].coords[2];
}
//for each rectangle
for (int i = 1; i < v.size(); ++i) {
//for each possible permutation of the current rectangle
for (int j = 0; j < 6; ++j) {
//for each permutation of the previous rectangle
for (int k = 0; k < 6; ++k) {
rectangle &prev = v[i - 1][k];
rectangle &curr = v[i][j];
//is possible to put the current rectangle with the previous rectangle ?
if( canStack(curr, prev) ) {
memoization[j][i] = std::max(memoization[j][i], curr.coords[2] + memoization[k][i-1]);
}
}
}
}
//what is the best solution ?
int ret = -1;
for (int i = 0; i < 6; ++i) {
ret = std::max(memoization[i][v.size()-1], ret);
}
return ret;
}
int main ( void ) {
int n;
scanf("%d", &n);
std::vector< std::vector<rectangle> > v(n);
for (int i = 0; i < n; ++i) {
rectangle r;
scanf("%d %d %d", &r.coords[0], &r.coords[1], &r.coords[2]);
//generate all rotations with the given rectangle (all combinations of the coordinates)
for (int j = 0; j < 3; ++j)
for (int k = 0; k < 3; ++k)
if(j != k) //micro optimization disease
for (int l = 0; l < 3; ++l)
if(l != j && l != k)
v[i].push_back( rectangle(r.coords[j], r.coords[k], r.coords[l]) );
}
printf("%d\n", dp(v));
}
Input Description
A test case starts with an integer N, representing the number of boxes (1 ≤ N ≤ 10^5).
Following there will be N rows, each containing three integers, A, B and C, representing the dimensions of the boxes (1 ≤ A, B, C ≤ 10^4).
Output Description
Print one row containing one integer, representing the maximum height of the stack if it’s possible to pile all the N boxes, or -1 otherwise.
Sample Input
2
5 2 2
1 3 4
Sample Output
6
Sample image for the given input and output.
Usually you're given the test case that made you fail. Otherwise, finding the problem is a lot harder.
You can always approach it from a different angle! I'm going to leave out the boring parts that are easily replicated.
struct Box { unsigned int dim[3]; };
Box will store the dimensions of each... box. When it comes time to read the dimensions, it needs to be sorted so that dim[0] >= dim[1] >= dim[2].
The idea is to loop and read the next box each iteration. It then compares the second largest dimension of the new box with the second largest dimension of the last box, and same with the third largest. If in either case the newer box is larger, it adjusts the older box to compare the first largest and third largest dimension. If that fails too, then the first and second largest. This way, it always prefers using a larger dimension as the vertical one.
If it had to rotate a box, it goes to the next box down and checks that the rotation doesn't need to be adjusted there too. It continues until there are no more boxes or it didn't need to rotate the next box. If at any time, all three rotations for a box failed to make it large enough, it stops because there is no solution.
Once all the boxes are in place, it just sums up each one's vertical dimension.
int main()
{
unsigned int size; //num boxes
std::cin >> size;
std::vector<Box> boxes(size); //all boxes
std::vector<unsigned char> pos(size, 0); //index of vertical dimension
//gets the index of dimension that isn't vertical
//largest indicates if it should pick the larger or smaller one
auto get = [](unsigned char x, bool largest) { if (largest) return x == 0 ? 1 : 0; return x == 2 ? 1 : 2; };
//check will compare the dimensions of two boxes and return true if the smaller one is under the larger one
auto check = [&boxes, &pos, &get](unsigned int x, bool largest) { return boxes[x - 1].dim[get(pos[x - 1], largest)] < boxes[x].dim[get(pos[x], largest)]; };
unsigned int x = 0, y; //indexing variables
unsigned char change; //detects box rotation change
bool fail = false; //if it cannot be solved
for (x = 0; x < size && !fail; ++x)
{
//read in the next three dimensions
//make sure dim[0] >= dim[1] >= dim[2]
//simple enough to write
//mine was too ugly and I didn't want to be embarrassed
y = x;
while (y && !fail) //when y == 0, no more boxes to check
{
change = pos[y - 1];
while (check(y, true) || check(y, false)) //while invalid rotation
{
if (++pos[y - 1] == 3) //rotate, when pos == 3, no solution
{
fail = true;
break;
}
}
if (change != pos[y - 1]) //if rotated box
--y;
else
break;
}
}
if (fail)
{
std::cout << -1;
}
else
{
unsigned long long max = 0;
for (x = 0; x < size; ++x)
max += boxes[x].dim[pos[x]];
std::cout << max;
}
return 0;
}
It works for the test cases I've written, but given that I don't know what caused yours to fail, I can't tell you what mine does differently (assuming it also doesn't fail your test conditions).
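One candidate cause of the wrong answers in the question's DP, offered as a guess rather than a confirmed diagnosis: memoization[j][i] is updated from memoization[k][i-1] even when that entry is still -1, so a state that was never reachable can be extended, and an input that should print -1 can print a small number instead. A minimal sketch of the same DP that skips unreachable states follows. The name tallestTower, the std::array box representation, and the rolling six-entry table are assumptions of this sketch, not code from the question.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Each box is {a, b, c}. Returns the maximum height, or -1 if the boxes
// cannot all be stacked in the given order. (Hypothetical helper.)
long long tallestTower(const std::vector<std::array<int, 3>>& boxes)
{
    // The six orientations as index triples: {footprint x, footprint y, height}.
    static const int perm[6][3] = {{0,1,2},{1,0,2},{0,2,1},{2,0,1},{1,2,0},{2,1,0}};
    if (boxes.empty()) return 0;

    std::array<long long, 6> prev;
    for (int k = 0; k < 6; ++k)
        prev[k] = boxes[0][perm[k][2]];      // any face of the first box may touch the ground

    for (std::size_t i = 1; i < boxes.size(); ++i) {
        std::array<long long, 6> curr;
        curr.fill(-1);                        // -1 marks an unreachable state
        for (int j = 0; j < 6; ++j) {
            for (int k = 0; k < 6; ++k) {
                if (prev[k] < 0) continue;    // never build on an unreachable state
                const bool fits =
                    boxes[i][perm[j][0]] <= boxes[i - 1][perm[k][0]] &&
                    boxes[i][perm[j][1]] <= boxes[i - 1][perm[k][1]];
                if (fits)
                    curr[j] = std::max(curr[j], prev[k] + boxes[i][perm[j][2]]);
            }
        }
        prev = curr;
    }
    return *std::max_element(prev.begin(), prev.end());
}
```

For the sample input (5 2 2 followed by 1 3 4) this returns 6. Keeping only the previous row of the table also avoids the variable-length array on the stack, which is a compiler extension in C++.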
If you are allowed, this problem might benefit from a tree data structure.
First, define the three possible cases of block:
1) Cube - there is only one possible option for orientation, since every orientation results in the same height (applied toward total height) and the same footprint (applied to the restriction that the footprint of each block is completely contained by the block below it).
2) Square Rectangle - there are three possible orientations for this rectangle with two equal dimensions (for example, a 4x4x1 or a 4x4x7 would both fit this).
3) All Different Dimensions - there are six possible orientations for this shape, where each side is different from the rest.
For the first box, choose how many orientations its shape allows, and create corresponding nodes at the first level (a root node with zero height will allow using simple binary trees, rather than requiring a more complicated type of tree that allows multiple elements within each node). Then, for each orientation, choose how many orientations the next box allows but only create nodes for those that are valid for the given orientation of the current box. If no orientations are possible given the orientation of the current box, remove that entire unique branch of orientations (the first parent node with multiple valid orientations will have one orientation removed by this pruning, but that parent node and all of its ancestors will be preserved otherwise).
By doing this, you can check for sets of boxes that have no solution by checking whether there are any elements below the root node, since an empty tree indicates that all possible orientations have been pruned away by invalid combinations.
If the tree is not empty, then just walk the tree to find the highest sum of heights along any branch, recursing back up to the root; that sum is your maximum height, as in the following pseudocode:
std::size_t maximum_height() const{
    // A missing child contributes nothing; a leaf returns its own height.
    std::size_t leftheight = leftnode ? leftnode->maximum_height() : 0;
    std::size_t rightheight = rightnode ? rightnode->maximum_height() : 0;
    return this_node_box_height + std::max(leftheight, rightheight);
}
The benefits of using a tree data structure are
1) You will greatly reduce the number of possible combinations you have to store and check, because in a tree, the invalid orientations will be eliminated at the earliest possible point - for example, using your 2x2x5 first box, with three possible orientations (as a Square Rectangle), only two orientations are possible because there is no possible way to orient it on its 2x2 end and still fit the 4x3x1 block on it. If on average only two orientations are possible for each block, you will need a much smaller number of nodes than if you compute every possible orientation and then filter them as a second step.
2) Detecting sets of blocks where there is no solution is much easier, because the data structure will only contain valid combinations.
3) Working with the finished tree will be much easier - for example, to find the sequence of orientations of the highest tower, rather than just its height, you could pass an empty std::vector to a modified maximum_height() implementation, and let it append the actual orientation of each highest node as it walks the tree, in addition to returning the height.

How can I optimize this function which handles large c++ vectors?

According to Visual Studio's performance analyzer, the following function is consuming what seems to me to be an abnormally large amount of processor power, seeing as all it does is add between 1 and 3 numbers from several vectors and store the result in one of those vectors.
//Relevant class members:
//vector<double> cache (~80,000);
//int inputSize;
//Notes:
//RealFFT::real is a typedef for POD double.
//RealFFT::RealSet is a wrapper class for a c-style array of RealFFT::real.
//This is because of the FFT library I'm using (FFTW).
//Its bracket operator is overloaded to return a const reference to the appropriate array element
vector<RealFFT::real> Convolver::store(vector<RealFFT::RealSet>& data)
{
int cr = inputSize; //'cache' read position
int cw = 0; //'cache' write position
int di = 0; //index within 'data' vector (ex. data[di])
int bi = 0; //index within 'data' element (ex. data[di][bi])
int blockSize = irBlockSize();
int dataSize = data.size();
int cacheSize = cache.size();
//Basically, this takes the existing values in 'cache', sums them with the
//values in 'data' at the appropriate positions, and stores them back in
//the cache at a new position.
while (cw < cacheSize)
{
int n = 0;
if (di < dataSize)
n = data[di][bi];
if (di > 0 && bi < inputSize)
n += data[di - 1][blockSize + bi];
if (++bi == blockSize)
{
di++;
bi = 0;
}
if (cr < cacheSize)
n += cache[cr++];
cache[cw++] = n;
}
//Take the first 'inputSize' number of values and return them to a new vector.
return Common::vecTake<RealFFT::real>(inputSize, cache, 0);
}
Granted, the vectors in question have sizes of around 80,000 items, but by comparison, a function which multiplies similar vectors of complex numbers (complex multiplication requires 4 real multiplications and 2 additions each) consumes about 1/3 the processor power.
Perhaps it has something to do with the fact that it has to jump around within the vectors rather than just accessing them linearly? I really have no idea though. Any thoughts on how this could be optimized?
Edit: I should mention I also tried writing the function to access each vector linearly, but this requires more total iterations and actually the performance was worse that way.
Turn on compiler optimization as appropriate. A guide for MSVC is here:
http://msdn.microsoft.com/en-us/library/k1ack8f1.aspx
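Besides compiler flags, one detail of the posted loop may be worth checking: the accumulator n is declared int while the values in data and cache are double, so every iteration converts between double and int and the stored sums are truncated. Whether this explains the profile is not certain. Below is a sketch of the same loop with a double accumulator; storeSketch and its parameters are hypothetical, std::vector<std::vector<double>> stands in for vector<RealFFT::RealSet>, and each block is assumed to hold at least blockSize + inputSize values, as the indexing in the question implies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Convolver::store: the loop from the question,
// with the accumulator declared as double instead of int.
std::vector<double> storeSketch(std::vector<double>& cache,
                                const std::vector<std::vector<double>>& data,
                                std::size_t blockSize, std::size_t inputSize)
{
    std::size_t cr = inputSize;   // 'cache' read position
    std::size_t di = 0;           // index within 'data'
    std::size_t bi = 0;           // index within data[di]

    for (std::size_t cw = 0; cw < cache.size(); ++cw) {
        double n = 0.0;
        if (di < data.size())
            n = data[di][bi];
        // The upper bound on di is added here so the sketch stays in bounds.
        if (di > 0 && di <= data.size() && bi < inputSize)
            n += data[di - 1][blockSize + bi];
        if (++bi == blockSize) {
            ++di;
            bi = 0;
        }
        if (cr < cache.size())
            n += cache[cr++];
        cache[cw] = n;
    }
    // Return the first inputSize values, as the original does via Common::vecTake.
    return std::vector<double>(cache.begin(), cache.begin() + inputSize);
}
```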

Algorithm for slicing planes (in place) out of an array of RGB values

I've got a flat array of byte RGB values that goes R1 G1 B1 R2 G2 B2 R3 G3 B3 ... Rn Gn Bn. So my data looks like:
char imageData[WIDTH * HEIGHT * 3];
But I want to pass a WIDTH*HEIGHT array to an existing C library that expects a single plane of this data. That would be a sequence of just the R values (or just the G, or just the B).
It's easy enough to allocate a new array and copy the data (duh). But the images are very large. If it weren't a C library but took some kind of iteration interface to finesse the "slicing" traversal, that would be great. But I can't edit the code I'm calling...it wants a plain old pointer to a block of sequential memory.
HOWEVER I have write access to this array. It is viable to create a routine that would sort it into color planes. I'd also need a reverse transformation that would put it back, but by definition the same method that sorted it into planes could be applied to unsort it.
How efficiently can I (in place) turn this array into R1 R2 R3 ... Rn G1 G2 G3 ... Gn B1 B2 B3 ... Bn and then back again? Any non-naive algorithms?
If you only need one plane, this seems pretty easy. If you need all 3 you will probably have better luck with a more sophisticated algorithm.
void PlanarizeR(char * imageData, int width, int height)
{
    char *in = imageData;
    int pixelCount = width * height;
    // Swap the R value of pixel i (at offset 3*i) into slot i.
    for (int i = 0; i < pixelCount; ++i, in+=3)
        std::swap(*in, imageData[i]);
}
It shouldn't be too hard to run the loop backwards from high to low to reverse the process.
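A sketch of that reverse pass, under the assumption that nothing else has rearranged the buffer in between: because every step is a swap, applying the same swaps in the opposite order restores the interleaved layout. UnplanarizeR is a hypothetical name, and PlanarizeR is repeated in index form so the block is self-contained.

```cpp
#include <utility>

// Moves the R values to the front of an interleaved RGB buffer (same loop as above).
void PlanarizeR(char* imageData, int width, int height)
{
    const int pixelCount = width * height;
    for (int i = 0; i < pixelCount; ++i)
        std::swap(imageData[3 * i], imageData[i]);
}

// Undoes PlanarizeR by applying the same swaps in reverse order.
void UnplanarizeR(char* imageData, int width, int height)
{
    const int pixelCount = width * height;
    for (int i = pixelCount - 1; i >= 0; --i)
        std::swap(imageData[3 * i], imageData[i]);
}
```

After PlanarizeR the first width*height bytes hold the R plane and the remaining bytes hold the G and B values in a scrambled order, so only the R plane should be handed to the library.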
The paper "A Simple In-Place Algorithm for In-Shuffle" describes how to transpose a 2*N matrix and gives a hint at how to do it for other cases, so 3*N may also be possible. An answer to another question shows that it is indeed possible.
Or use an algorithm which writes each value to its transposed place, then does the same for the value displaced from that place, and so on until the cycle is closed. Flag processed values in a bit vector, and continue until this vector is all 1s.
Neither algorithm is cache-friendly. Some clever use of the PREFETCH instruction can probably improve this.
Edit:
C++, RGB to single planes, not optimized:
#include <iostream>
#include <bitset>
#include <vector>
enum {N = 8};
void transpose(std::vector<char>& a)
{
std::bitset<3*N> b;
for (int i = 1; i < 3*N; ++i)
{
if (b[i])
continue;
int ptr = i;
int next;
char nextVal = a[i];
do {
next = ptr/3 + N*(ptr%3);
char prevVal = nextVal;
nextVal = a[next];
a[next] = prevVal;
ptr = next;
b[ptr] = true;
}
while (ptr != i);
}
}
int main()
{
std::vector<char> a(3*N);
for (int i = 0; i != 3*N; ++i)
a[i] = i;
transpose(a);
for (int i = 0; i != 3*N; ++i)
std::cout << (int)a[i] << std::endl;
return 0;
}
My original intent was to use a bit vector of size WIDTH*HEIGHT, which gives an overhead of WIDTH*HEIGHT/8 bytes. But it is always possible to sacrifice speed for space. The bit vector may be of size WIDTH or HEIGHT or any desirable value, even 0. The trick is to maintain a pointer to the cell before which all values are already transposed. The bit vector covers the cells starting from this pointer. Once it is all 1s, it is moved to the next position, and all the algorithm steps are performed again except the actual data movement, after which the bit vector is ready to continue transposing. This variant is O(N^2) instead of O(N).
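The zero-size extreme of this trade-off can be sketched with the common "cycle leader" test: before moving anything, walk the cycle that starts at i without touching the data, and rotate it only if i turns out to be its smallest index. This may differ in detail from the windowed variant described above; planarIndex and planarizeNoBitset are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Destination of index p when 3*n interleaved values are regrouped into planes.
std::size_t planarIndex(std::size_t p, std::size_t n)
{
    return p / 3 + n * (p % 3);
}

// In-place RGB -> planes with no auxiliary bit vector (hypothetical name).
void planarizeNoBitset(std::vector<char>& a)
{
    const std::size_t n = a.size() / 3;   // number of pixels
    for (std::size_t i = 0; i < 3 * n; ++i) {
        // Walk the cycle without moving data; stop at the first index <= i.
        std::size_t j = planarIndex(i, n);
        while (j > i)
            j = planarIndex(j, n);
        if (j < i)
            continue;                     // this cycle was already rotated from a smaller index

        // i is the smallest index of its cycle: rotate the cycle.
        char carried = a[i];
        std::size_t p = i;
        do {
            p = planarIndex(p, n);
            std::swap(carried, a[p]);
        } while (p != i);
    }
}
```

The extra walk is what pushes the worst-case running time above linear, in exchange for using no additional memory.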
Edit2:
PREFETCH optimization is not difficult to implement: just calculate the indexes, invoke PREFETCH, and put the indexes into a queue (ring buffer), then take the indexes from the queue and move the data.
Edit3:
The idea of another algorithm, which uses O(1) space and O(N*log(N)) time, is cache-friendly, and may be faster than the "cycle" algorithm (for image sizes < 1Gb):
1) Split the N*3 matrix into several 3*3 matrices of char and transpose them.
2) Split the result into 3*3 matrices of char[3] and transpose them.
3) Continue while the matrix size is less than the array size.
4) Now we have up to 3*2*log3(N) ordered pieces. Join them.
5) First join pieces of equal size. Very simple "cycles" of length 4 may be used.
6) Join unequal-sized pieces with reverse(reverse(a), reverse(b)).
char *imageData = (char *)malloc(WIDTH * HEIGHT * 3 * sizeof(char));
This function copies the data into a new buffer laid out as R1 R2 R3 ... Rn G1 G2 G3 ... Gn B1 B2 B3 ... Bn (it is not in place; the caller frees the result):
char *toRGB(const char *imageData, int WIDTH, int HEIGHT){
    int len = WIDTH * HEIGHT;              /* number of pixels */
    char *RGB = (char *)malloc(3 * len);   /* room for all three planes */
    int c, i;
    if (RGB == NULL)
        return NULL;
    for (c = 0; c < 3; c++)                /* channel: 0 = R, 1 = G, 2 = B */
        for (i = 0; i < len; i++)          /* pixel index */
            RGB[c * len + i] = imageData[3 * i + c];
    return RGB;
}

C++ Mark for contiguous sections in a 3D array of objects

If we have a 3x3x3 array of objects, each containing two members, a boolean and an integer, can anyone suggest an efficient way of marking this array into contiguous chunks based on the boolean value?
For example, if we picture it as a Rubik's cube, and a middle slice was missing (everything on 1,x,x == false), could we mark the two outer slices as separate groups, by way of a unique group identifier on the int member?
The same needs to apply if the "slice" goes through 90 degrees, leaving an L shape and a strip.
Could it be done with very large 3D arrays using recursion? Could it be threaded?
I've hit the ground typing a few times so far but have ended up in a few dead ends and stack overflows.
Very grateful for any help, thanks.
It could be done this way:
struct A {int m_i; bool m_b;};
enum {ELimit = 3};
int neighbour_offsets_positive[3] = {1, ELimit, ELimit*ELimit};
A cube[ELimit][ELimit][ELimit];
A * first = &cube[0][0][0];
A * last = &cube[ELimit-1][ELimit-1][ELimit-1];
// Init 'cube'.
for(A * it = first; it <= last; ++it)
it->m_i = 0, it->m_b = true;
// Slice.
for(int i = 0; i != ELimit; ++i)
for(int j = 0; j != ELimit; ++j)
cube[1][i][j].m_b = false;
// Assign unique ids to coherent parts.
int id = 0;
for(A * it = first; it <= last; ++it)
{
if (it->m_b == false)
continue;
if (it->m_i == 0)
it->m_i = ++id;
for (int k = 0; k != 3; ++k)
{
A * neighbour = it + neighbour_offsets_positive[k];
if (neighbour <= last)
if (neighbour->m_b == true)
neighbour->m_i = it->m_i;
}
}
If I understand the term "contiguous chunk" correctly, i.e. a maximal set of array elements that all share the same boolean value and in which there is a path from each element to every other, then this is the problem of finding connected components in a graph, which can be done with a simple DFS. Imagine that each array element is a vertex, and two vertices are connected if and only if 1) they share the same boolean value and 2) they differ in only one coordinate, and that difference is 1 in absolute value (i.e. they are adjacent).
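A minimal sketch of that connected-components search, written with an explicit stack instead of recursion so that large arrays do not overflow the call stack. It labels only the cells whose boolean is true, which is what the question asks for. Cell, labelGroups, and the flat storage layout are assumptions of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Cell { bool solid; int group; };   // hypothetical names for the two members

// Labels every 6-connected group of solid cells with a unique id (1, 2, ...)
// in an nx*ny*nz grid stored flat as cells[(x*ny + y)*nz + z].
// Non-solid cells get group 0. Returns the number of groups found.
int labelGroups(std::vector<Cell>& cells, int nx, int ny, int nz)
{
    static const int step[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    for (Cell& c : cells) c.group = 0;

    int groups = 0;
    std::vector<std::array<int, 3>> pending;   // explicit stack instead of recursion
    for (int x = 0; x < nx; ++x) {
        for (int y = 0; y < ny; ++y) {
            for (int z = 0; z < nz; ++z) {
                Cell& start = cells[(static_cast<std::size_t>(x) * ny + y) * nz + z];
                if (!start.solid || start.group != 0) continue;
                start.group = ++groups;        // first cell of a new group
                pending.push_back({{x, y, z}});
                while (!pending.empty()) {
                    const std::array<int, 3> p = pending.back();
                    pending.pop_back();
                    for (const auto& s : step) {
                        const int qx = p[0] + s[0], qy = p[1] + s[1], qz = p[2] + s[2];
                        if (qx < 0 || qy < 0 || qz < 0 || qx >= nx || qy >= ny || qz >= nz)
                            continue;          // neighbour lies outside the grid
                        Cell& q = cells[(static_cast<std::size_t>(qx) * ny + qy) * nz + qz];
                        if (q.solid && q.group == 0) {
                            q.group = groups;
                            pending.push_back({{qx, qy, qz}});
                        }
                    }
                }
            }
        }
    }
    return groups;
}
```

Each cell is pushed at most once, so the work is linear in the number of cells. For the 3x3x3 cube with the middle slice cleared, this yields two groups.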

On-the-fly terrain chunk generation

I'm writing an engine that can generate landscapes using noise functions, and load in new chunks as the player moves around the terrain. I spent the best part of two days figuring out how to place these chunks in the right position, so they don't overlap or get placed on top of existing chunks. It works well functionally, but there is a massive performance hit the further away you generate the chunks from the player (e.g. if you generate in a 3 chunk radius around the player, it's lightning fast, but if you increase that to a radius of 20 chunks it slows down very fast).
I know exactly why that is, but I can't think of any other way to do this. Before I go any further, here's the code I'm currently using, hopefully it's commented well enough to understand:
// Get the player's position rounded to the nearest chunk on the grid.
D3DXVECTOR3 roundedPlayerPos(SnapToMultiple(m_Dx->m_Camera->GetPosition().x, CHUNK_X), 0, SnapToMultiple(m_Dx->m_Camera->GetPosition().z, CHUNK_Z));
// Iterate through every point on an invisible grid. At each point, check if it is
// inside a circle the size of the grid (so we generate chunks in a circle around
// the player, not a square). At each point that is inside the circle, add a chunk to
// the ChunksToAdd vector.
for (int x = -CHUNK_RANGE-1; x <= CHUNK_RANGE; x++)
{
for (int z = -CHUNK_RANGE-1; z <= CHUNK_RANGE; z++)
{
if (IsInside(roundedPlayerPos, CHUNK_X*CHUNK_RANGE, D3DXVECTOR3(roundedPlayerPos.x+x*CHUNK_X, 0, roundedPlayerPos.z+z*CHUNK_Z)))
{
Chunk chunkToAdd;
chunkToAdd.chunk = 0;
chunkToAdd.position = D3DXVECTOR3((roundedPlayerPos.x + x*CHUNK_X), 0, (roundedPlayerPos.z + z*CHUNK_Z));
chunkToAdd.chunkExists = false;
m_ChunksToAdd.push_back(chunkToAdd);
}
}
}
// Iterate through the ChunksToAdd vector. For each chunk in this vector, compare its
// position to every chunk in the Chunks vector (which stores each generated chunk).
// If the statement returns true, then there is already a chunk at that location, and
// we don't need to generate another.
for (i = 0; i < m_ChunksToAdd.size(); i++)
{
for (int j = 0; j < m_Chunks.size(); j++)
{
// Check the chunk in the ChunksToAdd vector with the chunk in the Chunks vector (chunks which are already generated).
if (m_ChunksToAdd[i].position.x == m_Chunks[j].position.x && m_ChunksToAdd[i].position.z == m_Chunks[j].position.z)
{
m_ChunksToAdd[i].chunkExists = true;
}
}
}
// Determine the closest chunk to the player, so we can generate that first.
// Iterate through the ChunksToAdd vector, and if the chunk doesn't exist (if it
// does exist, we're not going to generate it so ignore it), compare the current (j)
// chunk against the current closest chunk. If it is farther away, move on, and if it is
// closer, store its position as the new closest chunk.
int closest = 0;
for (j = 0; j < m_ChunksToAdd.size(); j++)
{
if (!m_ChunksToAdd[j].chunkExists)
{
// Get the distance from the player to the chunk for the current closest chunk, and
// the chunk being tested.
float x1 = ABS(DistanceFrom(roundedPlayerPos, m_ChunksToAdd[j].position));
float x2 = ABS(DistanceFrom(roundedPlayerPos, m_ChunksToAdd[closest].position));
// If the chunk being tested is closer to the player, make it the new closest chunk.
if (x1 <= x2)
closest = j;
}
}
// After determining the position of the closest chunk, generate the volume and mesh, and add it
// to the Chunks vector for rendering.
if (!m_ChunksToAdd[closest].chunkExists) // Only add it if the chunk doesn't already exist in the Chunks vector.
{
Chunk chunk;
chunk.chunk = new chunkClass;
chunk.chunk->m_Position = m_ChunksToAdd[closest].position;
chunk.chunk->GenerateVolume(m_Simplex);
chunk.chunk->GenerateMesh(m_Dx->GetDevice());
chunk.position = m_ChunksToAdd[closest].position;
chunk.chunkExists = true;
m_Chunks.push_back(chunk);
}
// Clear the ChunksToAdd vector ready for another frame.
m_ChunksToAdd.clear();
(if it wasn't already obvious, this is run every frame.)
The problem area is to do with the CHUNK_RANGE variable. The larger this value, the more the first two loops are iterated through each frame, slowing the whole thing down tremendously. I need some advice or suggestions on how to do this more efficiently, thanks.
EDIT: Here's some improved code:
// Get the player's position rounded to the nearest chunk on the grid.
D3DXVECTOR3 roundedPlayerPos(SnapToMultiple(m_Dx->m_Camera->GetPosition().x, CHUNK_X), 0, SnapToMultiple(m_Dx->m_Camera->GetPosition().z, CHUNK_Z));
// Find if the player has changed into another chunk, if they have, we will scan
// to see if more chunks need to be generated.
static D3DXVECTOR3 roundedPlayerPosOld = roundedPlayerPos;
static bool playerPosChanged = true;
if (roundedPlayerPosOld != roundedPlayerPos)
{
roundedPlayerPosOld = roundedPlayerPos;
playerPosChanged = true;
}
// Iterate through every point on an invisible grid. At each point, check if it is
// inside a circle the size of the grid (so we generate chunks in a circle around
// the player, not a square). At each point that is inside the circle, add a chunk to
// the ChunksToAdd vector.
if (playerPosChanged)
{
m_ChunksToAdd.clear();
for (int x = -CHUNK_CREATE_RANGE-1; x <= CHUNK_CREATE_RANGE; x++)
{
for (int z = -CHUNK_CREATE_RANGE-1; z <= CHUNK_CREATE_RANGE; z++)
{
if (IsInside(roundedPlayerPos, CHUNK_X*CHUNK_CREATE_RANGE, D3DXVECTOR3(roundedPlayerPos.x+x*CHUNK_X, 0, roundedPlayerPos.z+z*CHUNK_Z)))
{
bool chunkExists = false;
for (int j = 0; j < m_Chunks.size(); j++)
{
// Check the chunk in the ChunksToAdd vector with the chunk in the Chunks vector (chunks which are already generated).
if ((roundedPlayerPos.x + x*CHUNK_X) == m_Chunks[j].position.x && (roundedPlayerPos.z + z*CHUNK_Z) == m_Chunks[j].position.z)
{
chunkExists = true;
break;
}
}
if (!chunkExists)
{
Chunk chunkToAdd;
chunkToAdd.chunk = 0;
chunkToAdd.position = D3DXVECTOR3((roundedPlayerPos.x + x*CHUNK_X), 0, (roundedPlayerPos.z + z*CHUNK_Z));
m_ChunksToAdd.push_back(chunkToAdd);
}
}
}
}
}
playerPosChanged = false;
// If there are chunks to render.
if (m_ChunksToAdd.size() > 0)
{
// Determine the closest chunk to the player, so we can generate that first.
// Iterate through the ChunksToAdd vector, and if the chunk doesn't exist (if it
// does exist, we're not going to generate it so ignore it), compare the current (j)
// chunk against the current closest chunk. If it is farther away, move on, and if it is
// closer, store its position as the new closest chunk.
int closest = 0;
for (j = 0; j < m_ChunksToAdd.size(); j++)
{
// Get the distance from the player to the chunk for the current closest chunk, and
// the chunk being tested.
float x1 = ABS(DistanceFrom(roundedPlayerPos, m_ChunksToAdd[j].position));
float x2 = ABS(DistanceFrom(roundedPlayerPos, m_ChunksToAdd[closest].position));
// If the chunk being tested is closer to the player, make it the new closest chunk.
if (x1 <= x2)
closest = j;
}
// After determining the position of the closest chunk, generate the volume and mesh, and add it
// to the Chunks vector for rendering.
Chunk chunk;
chunk.chunk = new chunkClass;
chunk.chunk->m_Position = m_ChunksToAdd[closest].position;
chunk.chunk->GenerateVolume(m_Simplex);
chunk.chunk->GenerateMesh(m_Dx->GetDevice());
chunk.position = m_ChunksToAdd[closest].position;
m_Chunks.push_back(chunk);
m_ChunksToAdd.erase(m_ChunksToAdd.begin()+closest);
}
// Remove chunks that are far away from the player.
for (i = 0; i < m_Chunks.size(); )
{
if (DistanceFrom(roundedPlayerPos, m_Chunks[i].position) > (CHUNK_REMOVE_RANGE*CHUNK_X)*(CHUNK_REMOVE_RANGE*CHUNK_X))
{
m_Chunks[i].chunk->Shutdown();
delete m_Chunks[i].chunk;
m_Chunks[i].chunk = 0;
// erase() shifts the next chunk into slot i, so don't advance i here.
m_Chunks.erase(m_Chunks.begin()+i);
}
else
{
i++;
}
}
Have you tried profiling it to work out exactly where the bottleneck is?
Do you need to check all of those chunks or could you get away with checking the direction the player is looking and only generate the ones in view?
Is there any reason why you draw the chunk closest to the player first if you're generating it all once per frame before displaying it? Skipping the stage where you sort them may free up a bit of processing power.
Is there any reason you couldn't combine the first two loops to just create a vector of chunks which need generating?
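A sketch of what such a combined loop might look like, under some assumptions not in the question: chunk positions are kept as integer grid coordinates rather than D3DXVECTOR3 values, and the already generated chunks are tracked in a std::set so that the existence check no longer scans the whole m_Chunks vector. ChunkCoord and chunksToGenerate are hypothetical names, and the loop bounds are symmetric here, unlike the -CHUNK_RANGE-1 lower bound in the question.

```cpp
#include <set>
#include <utility>
#include <vector>

typedef std::pair<int, int> ChunkCoord;   // (x, z) in whole chunks; hypothetical type

// Returns the coordinates inside a circle of `range` chunks around `center`
// that are not yet in `existing`, in a single pass over the grid.
std::vector<ChunkCoord> chunksToGenerate(const ChunkCoord& center, int range,
                                         const std::set<ChunkCoord>& existing)
{
    std::vector<ChunkCoord> missing;
    for (int dx = -range; dx <= range; ++dx) {
        for (int dz = -range; dz <= range; ++dz) {
            if (dx * dx + dz * dz > range * range)
                continue;                              // outside the circle
            const ChunkCoord c(center.first + dx, center.second + dz);
            if (existing.count(c) == 0)                // O(log n) lookup, no inner loop
                missing.push_back(c);
        }
    }
    return missing;
}
```

Integer coordinates also sidestep comparing floating-point positions with ==, which the posted code relies on.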
It sounds like you're trying to do too much work (i.e. building chunks) on the render thread. If you can do the work of a three chunk radius really fast you should limit it to that per frame. How many chunks are you trying to generate, in each situation, per frame?
I'm going to assume that generating each chunk is independent, therefore, you can probably move the work to another thread - then show the chunk when it is ready.
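A minimal sketch of handing chunk generation to another thread with std::async and polling for completion from the render loop. ChunkData, generateChunk, startChunk, and isReady are hypothetical stand-ins for the engine's own types, and the sketch assumes the generation work touches no shared state. Creating GPU resources (the GenerateMesh step) may still need to happen on the render thread, depending on how the device was created.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical stand-in for chunkClass: just the data produced by generation.
struct ChunkData { int x, z; std::vector<float> heights; };

// Stand-in for GenerateVolume: any expensive, independent computation.
ChunkData generateChunk(int x, int z)
{
    ChunkData c;
    c.x = x;
    c.z = z;
    c.heights.assign(16 * 16, 0.0f);
    return c;
}

// Starts generation on another thread and returns immediately.
std::future<ChunkData> startChunk(int x, int z)
{
    return std::async(std::launch::async, generateChunk, x, z);
}

// Non-blocking check that the render loop can make once per frame.
bool isReady(const std::future<ChunkData>& f)
{
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The render loop would keep the pending futures in a list, call isReady on each one per frame, and move a finished chunk into the chunk list with get() once it reports ready.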