How Can I Make My Array Rotation More Efficient? - c++

How can I make my circular array rotation more efficient? I read in this thread about an excellent sorting algorithm, but it won't work for what I need because there are spaces at the end of the array that get sorted into the middle.
The rotation function needs to work for both left and right rotation. Not every space of the array will be filled.
void Quack::rotate(int r)
{
    if (r > 0) // if r is positive, rotate left
    {
        for (int i = 0; i < r; i++)
            items[(qBack + i) % qCapacity] = items[(qFront + i) % qCapacity]; // move items in array
    }
    else if (r < 0) // if r is negative, rotate right
    {
        for (int i = 0; i < -r; i++)
            items[(qFront - i - 1) % qCapacity] =
                items[(qBack - i - 1) % qCapacity]; // move items in array
    }
    // if r == 0, nothing happens

    // rotate front and back by r
    qFront = (qFront + r) % qCapacity;
    qBack = (qBack + r) % qCapacity;
}

I haven't used it, so I can't promise it will do everything you need. But you might want to look into simply replacing this function body with the std::rotate function.
It should already be well optimized, and will be much less likely to introduce bugs into your application.
http://www.sgi.com/tech/stl/rotate.html
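As a sketch of what that replacement could look like (untested against your class, and assuming the occupied region of `items` is contiguous, i.e. does not wrap around the end of the buffer), the whole left rotation collapses to one call:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: left-rotate the occupied region [first, last) of the
// buffer by r slots. std::rotate moves the element at first + r to the front
// of the range; r must not exceed last - first.
void rotateLeft(std::vector<int>& items, std::size_t first, std::size_t last, std::size_t r)
{
    std::rotate(items.begin() + first, items.begin() + first + r, items.begin() + last);
}
```

A right rotation by r is then just a left rotation by (last - first) - r, so one primitive covers both directions.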
If you want suggestions for optimization though, I recommend avoiding all modulo operations. They may require a divide, which is one of the most expensive operations you can perform on your processor. They are a convenient way to think about how to accomplish your goal, but could be very costly for your CPU to execute.
You can remove your modulo operators if you use two loops: one from the middle to the end, and the other from the beginning to the middle.
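To illustrate the two-loop idea (the names here are made up; this just sums n live items starting at qFront, but the same split applies to the copies inside rotate):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Visit n items of a circular buffer starting at index qFront without any %
// in the loop: one run from qFront to the physical end of the array, then a
// second run from index 0 for whatever is left.
long sumCircular(const std::vector<int>& items, std::size_t qFront, std::size_t n)
{
    long sum = 0;
    std::size_t firstRun = std::min(n, items.size() - qFront); // middle to end
    for (std::size_t k = 0; k < firstRun; ++k)
        sum += items[qFront + k];
    for (std::size_t k = 0; k < n - firstRun; ++k)             // beginning to middle
        sum += items[k];
    return sum;
}
```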
Better yet, see if you can avoid doing the rotation altogether. If you are careful, you might be able to eliminate pointless full-array traversal/copy operations. See my comment on the OP for how to accomplish this.

Related

Chunks loading and sort

I'm working on a Minecraft clone, and I have two problems with chunk loading.
First: determining which chunks need to be loaded.
I found one way; it's ugly, but it works fast enough for me:
Define a 3D array of size MAX_CHUNKS_X x MAX_CHUNKS_Y x MAX_CHUNKS_Z.
Fill the array with false.
While iterating over the list of chunks, check whether each chunk is inside the vision range;
if it is, set array[chunk_x][chunk_y][chunk_z] = true.
After that pass, walk the array:
for every array[chunk_x][chunk_y][chunk_z] == false, add the chunk at (chunk_x, chunk_y, chunk_z) to LoadingList.
Is there a less ugly way that is still fast?
Code:
ChunksRenderList.clear();
CChunk* Chunk = NULL;

s32 RootChunk_X_Location = (floor(RenderCenter.x) / CHUNK_SIZE);
s32 RootChunk_Y_Location = (floor(RenderCenter.y) / CHUNK_SIZE);
s32 RootChunk_Z_Location = (floor(RenderCenter.z) / CHUNK_SIZE);

if(RenderCenter.x < 0)
    RootChunk_X_Location--;
if(RenderCenter.y < 0)
    RootChunk_Y_Location--;
if(RenderCenter.z < 0)
    RootChunk_Z_Location--;

core::vector3s RootChunkLocation(RootChunk_X_Location, RootChunk_Y_Location, RootChunk_Z_Location);

u32 XZ_ArraySide = (RenderDistance_XZ*2)+1;
u32 Y_ArraySide = (RenderDistance_Y*2)+1;

char array[XZ_ArraySide][Y_ArraySide][XZ_ArraySide];
memset(array, 0, (XZ_ArraySide*XZ_ArraySide*Y_ArraySide));

for(auto it = Chunks.begin(); it != Chunks.end(); it++)
{
    Chunk = (it->second);

    if(Chunk->Locked)
        continue;

    if(Chunk->KeepAliveCounter <= 0)
    {
        ChunksUnloadList.push_back(Chunk);
        continue;
    }
    else
    {
        Chunk->KeepAliveCounter -= WORLD_UPDATE_PERIOD;
        Chunk->DistanceToCamera = RenderCenter.distance_to(Chunk->ChunkAbsolutePosition);
    }

    if(Chunk->ChunkPosition.x >= (RootChunk_X_Location - (s32)RenderDistance_XZ) && Chunk->ChunkPosition.x <= (RootChunk_X_Location + (s32)RenderDistance_XZ))
    if(Chunk->ChunkPosition.y >= (RootChunk_Y_Location - (s32)RenderDistance_Y) && Chunk->ChunkPosition.y <= (RootChunk_Y_Location + (s32)RenderDistance_Y))
    if(Chunk->ChunkPosition.z >= (RootChunk_Z_Location - (s32)RenderDistance_XZ) && Chunk->ChunkPosition.z <= (RootChunk_Z_Location + (s32)RenderDistance_XZ))
    {
        s32 PositionInMatrix_X = Chunk->ChunkPosition.x - (RootChunk_X_Location - (s32)RenderDistance_XZ);
        s32 PositionInMatrix_Y = Chunk->ChunkPosition.y - (RootChunk_Y_Location - (s32)RenderDistance_Y);
        s32 PositionInMatrix_Z = Chunk->ChunkPosition.z - (RootChunk_Z_Location - (s32)RenderDistance_XZ);

        array[PositionInMatrix_X][PositionInMatrix_Y][PositionInMatrix_Z] = true;
        Chunk->KeepAliveCounter = CHUNK_LIVE_TIME;
    }

    if(not Chunk->NeightboarsUpdated)
    {
        ChunksNeightboarUpdateList.push_back(Chunk);
    }
    if(not Chunk->ChunkUpdated)
    {
        ChunksRebuildList.push_back(Chunk);
    }
    if(not Chunk->Locked and Chunk->VisibleBlocks > 0)
    {
        ChunksRenderList.push_back(Chunk);
    }
}

for(u32 y = 0; y < Y_ArraySide; y++)
for(u32 x = 0; x < XZ_ArraySide; x++)
for(u32 z = 0; z < XZ_ArraySide; z++)
{
    s32 ChunkPosition_X = (s32)x + (RootChunk_X_Location - (s32)RenderDistance_XZ);
    s32 ChunkPosition_Y = (s32)y + (RootChunk_Y_Location - (s32)RenderDistance_Y);
    s32 ChunkPosition_Z = (s32)z + (RootChunk_Z_Location - (s32)RenderDistance_XZ);

    if(array[x][y][z] == 0)
    {
        SPendingToLoad ToLoad;
        ToLoad.Position.set(ChunkPosition_X, ChunkPosition_Y, ChunkPosition_Z);
        ToLoad.DistanceToCamera = ToLoad.Position.distance_to_sqr(RootChunkLocation);
        ChunksLoadList.push_back(ToLoad);
    }
}
Second: how do I sort ChunksLoadList to get the wave-loading effect shown on the left of this picture?
https://www.dropbox.com/s/owjfaaekcj2m23w/58f2e4c8.png?dl=0
Red = nearest to ChunksLoadList.begin()
Blue = farthest from ChunksLoadList.begin()
I tried to use
ChunksLoadList.sort([&RootChunkLocation](SPendingToLoad& i, SPendingToLoad& j)
{
    return i.DistanceToCamera < j.DistanceToCamera;
});
but this method is too slow for big vision ranges...
How should I rewrite the code to get a fast wave-loading effect?
Sorry for my horrible English; I hope you understand me...
Let's first look at the distance-sorting problem: if your ChunksLoadList is a std::list and not a std::vector or std::array (C++11), you have already lost the performance race! See Bjarne Stroustrup: Why you should avoid Linked Lists, and pay close attention to the graph!
If it's still too slow after you've changed it to a std::vector, you can try "this method I just invented (TM)"!
The best general sorting algorithms run in something like
O(C + K*N log log N)
with a horrible constant prep time C, a horrible per-element cost K, and a very nice N log log N factor; as N goes to infinity this approaches O(N log log N).
BUT for this problem there is an even better algorithm!
A flood fill followed by an insertion sort: the flood fill produces a nearly sorted list in O(N), and the insertion sort produces the totally ordered list from the partially ordered one in O(N), for a total of O(N), i.e.
O(C + K*N)
with a horrible constant prep time and an awful per-element cost, but only N of them.
A variant of the Wikipedia flood fill:
Flood-fill (node, target-color, replacement-color):
  If target-color is equal to replacement-color, return.
  Set Q to the empty queue.
  Add the camera node to the end of Q.
  While Q is not empty:
    Set n equal to the first element of Q.
    Remove the first element from Q.
    If the color of n is equal to target-color:
      Add n to the distance list as the next closest (this will be nearly correct).
      Set the color of n to replacement-color and mark n as processed.
      Add the adjacent nodes (x/y/z +1/-1) to the end of Q if they have not been processed yet.
  Return.
Queue elements are (x, y, z); use a std::deque.
The distance list must be a random-access container (std::vector or std::array), fully allocated from the start with size (viewdistance*2+1)^3, which is potentially big.
A view distance of 100 gives 201^3, roughly 8,000,000 voxels; did you really want that? If you need some info from each one, you need at least a pointer or index, at least 4 bytes each, and that blows the cache away on most systems.
As a flood fill this is not very effective, but as an approximation of distance ordering it is.
You can stop here if that fulfils your requirements.
IF you need a total order, run an insertion sort on the nearly sorted list, O(N), but then you also need to calculate the actual camera distance.
Potential further optimizations:
Opaque voxels don't add neighbours that are also opaque.
Air (totally transparent) doesn't get added to the camera list, but it must take part in the fill, in case a flying island is present.
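An untested sketch of the flood-fill ordering above, shrunk to the parts that matter (no chunk data, just coordinates inside the local (2r+1)^3 box, with the camera in the center cell; the BFS pops cells in non-decreasing step distance from the camera, so the output list is already nearly sorted):

```cpp
#include <deque>
#include <vector>

struct Coord { int x, y, z; };

// BFS "flood fill" outward from the camera chunk. The order in which cells
// are popped from the queue approximates nearest-first ordering.
std::vector<Coord> floodFillOrder(int radius)
{
    const int side = 2 * radius + 1;
    std::vector<char> visited(side * side * side, 0);
    auto idx = [&](int x, int y, int z) { return (x * side + y) * side + z; };

    std::deque<Coord> q;                       // std::deque: O(1) pop_front
    q.push_back({radius, radius, radius});     // camera chunk at the center
    visited[idx(radius, radius, radius)] = 1;

    std::vector<Coord> order;
    order.reserve(side * side * side);
    const int dx[6] = {1, -1, 0, 0, 0, 0};
    const int dy[6] = {0, 0, 1, -1, 0, 0};
    const int dz[6] = {0, 0, 0, 0, 1, -1};

    while (!q.empty()) {
        Coord n = q.front();
        q.pop_front();
        order.push_back(n);                    // next closest cell (approximately)
        for (int d = 0; d < 6; ++d) {
            int nx = n.x + dx[d], ny = n.y + dy[d], nz = n.z + dz[d];
            if (nx < 0 || nx >= side || ny < 0 || ny >= side || nz < 0 || nz >= side)
                continue;                      // outside the vision box
            if (!visited[idx(nx, ny, nz)]) {
                visited[idx(nx, ny, nz)] = 1;  // mark as processed on enqueue
                q.push_back({nx, ny, nz});
            }
        }
    }
    return order;
}
```

In a real world-loader you would emit only the cells whose chunks are missing, and you could stop the fill early at opaque chunks as noted above.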

Is there a more efficient way to do this algorithm?

To the best of my knowledge, this algorithm searches correctly and returns true when it should. In class we are discussing big-O analysis, so this assignment is meant to show how a recursive search is faster than an iterative one. The task is to search for a number such that A[i] = i (find an index equal to the number stored at that index). This algorithm and an iterative one differ by only about 100 nanoseconds, and sometimes the iterative one is faster. I set up the vector in main using rand(), run the two algorithms a million times, and record the times. My question is: is this algorithm as efficient as possible, or is there a better way to do it?
bool recursiveSearch(vector<int> &myList, int beginning, int end)
{
    int mid = (beginning + end) / 2;
    if (myList[beginning] == beginning) // check if the vector at "beginning" is
    {                                   // equal to the value of "beginning"
        return true;
    }
    else if (beginning == end) // when this is true, the recursive loop ends.
    {                          // when passed into the method: end = size - 1
        return false;
    }
    else
    {
        return (recursiveSearch(myList, beginning, mid) || recursiveSearch(myList, mid + 1, end));
    }
}
Edit: The list is pre-ordered before being passed in and a check is done in main to make sure that beginning and the end both exist
One possible "improvement" would be to not copy the vector in each recursion by passing a reference:
bool recursiveSearch(const vector<int>& myList, int beginning, int end)
Unless you know something special about the ordering of the data, there is absolutely no advantage to performing a partitioned search like this.
Indeed, your code is [trying] to do a linear search, so it is actually implementing a simple for loop with a lot of stack and call overhead on top.
Note that there is a weirdness in your code: If the first element doesn't match, you will call recursiveSearch(myList, beginning /*=0*/, mid). Since we already know that element 0 doesn't match, you're going to subdivide again, but only after re-testing the element.
So given a vector of 6 elements that has no matches, you're going to call:
recursiveSearch(myList, 0, 6);
-> < recursiveSearch(myList, 0, 3) || recursiveSearch(myList, 4, 6); >
-> < recursiveSearch(myList, 0, 1) || recursiveSearch(myList, 2, 3) > < recursiveSearch(myList, 4, 5) || recursiveSearch(myList, 6, 6) >
-> < recursiveSearch(myList, 0, 0) || recursiveSearch(myList, 1, 1) > < recursiveSearch(myList, 2, 2) || recursiveSearch(myList, 3, 3) > ...
In the end, you fail on a given index only once begin and end have both reached that value. That is an expensive way of eliminating each element, and the end result is not a partitioned search but a simple linear search; you just use a lot of stack depth to get there.
So, a simpler and faster way to do this would be:
for (size_t i = beginning; i < end; ++i) {
    if (myList[i] != (int)i)
        continue;
    return true;
}
return false;
Since we're trying to optimize here, it's worth pointing out that MSVC, GCC, and Clang all assume that the if expresses the likely case, so I'm optimizing for the degenerate case where we have a large vector with no or late matches. In the case where we get lucky and find a result early, we're willing to pay the cost of a potential branch miss because we're leaving anyway. I realize that the branch predictor will soon figure this out for us, but again: optimizing ;-P
As others have pointed out, you could also benefit from not passing the vector by value (forcing a copy)
const std::vector<int>& myList
An obvious "improvement" would be to run threads on all the remaining cores. Simply divvy up the vector into number of cores - 1 pieces and use a condition variable to signal the main thread when found.
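A sketch of that idea using std::async instead of manual threads and a condition variable (simpler, but without early cancellation when one part finds a match; parallelFind and its parameters are made-up names for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Split the vector into `parts` contiguous chunks and search each on its own
// task; the result is true if any chunk contains an index i with v[i] == i.
bool parallelFind(const std::vector<int>& v, unsigned parts)
{
    std::vector<std::future<bool>> futures;
    std::size_t chunk = v.size() / parts + 1;
    for (unsigned p = 0; p < parts; ++p) {
        std::size_t begin = p * chunk;
        std::size_t end = std::min(v.size(), begin + chunk);
        futures.push_back(std::async(std::launch::async, [&v, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                if (v[i] == static_cast<int>(i))
                    return true;
            return false;
        }));
    }
    bool found = false;
    for (auto& f : futures)
        found = f.get() || found; // must get() every future before returning
    return found;
}
```

Note that for this O(n) scan the threading overhead only pays off for fairly large vectors; benchmark before committing to it.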
If you need to find an element in an unsorted array such that A[i] == i, then the only way to do it is to go through every element until you find one.
The simplest way to do this is like so:
bool find_index_matching_value(const std::vector<int>& v)
{
for (int i=0; i < v.size(); i++) {
if (v[i] == i)
return true;
}
return false; // no such element
}
This is O(n), and you're not going to be able to do any better than that algorithmically. So we have to turn our attention to micro-optimisations.
I would be quite astonished if, on modern machines, your recursive solution were faster than the simple solution above. While the compiler may be able to remove the extra function-call overhead (effectively turning your recursive solution into an iterative one), running through the array in order (as above) makes optimal use of the cache, whereas, for large arrays, your partitioned search will not.

Game of life continues bounds

So I've been trying my hand at the Game of Life, and I noticed that the cells stay confined within the grid that I've created. I want to make the grid continuous, so that if a cell reaches one side it continues from the other side, similar to Pac-Man, where leaving the screen on the left brings you back in on the right. Here is an image of how it would look as the cell moves out of bounds: http://i.stack.imgur.com/dofv6.png
Here is the code I have, which confines everything. How would I make it wrap back around?
int NeighborhoodSum(int i, int j) {
    int sum = 0;
    int k, l;
    for (k = i - 1; k <= i + 1; k++) {
        for (l = j - 1; l <= j + 1; l++) {
            if (k >= 0 && k < gn && l >= 0 && l < gm && !(k == i && l == j)) {
                sum += current[k][l];
            }
        }
    }
    return sum;
}
Based on dshepherd's suggestion, this is what I have come up with:
if (!(k == i && l == j)) {
    sum += current[k][l];
} else if (k == 1 || k == -1) { // rows
    sum += current[k+1][l];
} else if (l == 1 || l == -1) { // columns
    sum += current[k][l+1];
}
Start by considering a one-dimensional array of size ARRAY_SIZE.
What do you want that array to return when you ask for a cell at a negative index? What about an index >= ARRAY_SIZE? What operators does that make you think of (hint: < 0, % ARRAY_SIZE, ...)?
This will lead you to a more generic solution than dshepherd's, for example if in the future you want to specify life/death rules that reach more than one cell around the current one.
Assuming a grid of 0 to n-1 for x and 0 to m-1 for y (where n is the size of the x dimension and m is the size of the y dimension), what you want to do is check whether the coordinates are in range and move accordingly. So (pseudocode):
// normal move calculation code here
if (x < 0) { x = n-1; }
if (x >= n) { x = 0; }
if (y < 0) { y = m-1; }
if (y >= m) { y = 0; }
// carry out actual move here
With the start position marked in red, when you calculate a movement into, or a breeding into, a new square, you need to check whether it would fall out of bounds. If it does, the new cell is born in one of the orange positions; if not, it can be born in any of the blue positions.
Hope that helps :) Let me know if you need more information though :)
It looks like you are taking a summation over nearest neighbours, so all you need to do to make it wrap around is to extend the summation to include the cells on the other side whenever (i,j) is an edge cell.
You could do this fairly easily by adding else clauses to the central if to check for the cases where l or k is -1 or gn/gm (i.e. just past an edge) and add the appropriate term from the cell on the opposite side.
Update:
What you've added in your edit is not what I meant, and I'm pretty sure it won't work. I think you need to carefully think through exactly what it is that the initial code does before you go any further. Maybe get some paper and do some example cases by hand?
More specific advice (but do what I said above before you try to use this):
You can't directly read current[k][l] if k or l is negative or greater than gn/gm respectively, because there is no array entry with that index (the program may well crash). What you actually want is to use the 0th entry anywhere it would normally use the gn-th entry, and so on for all the other boundaries.
You will probably need to split the if statement into five cases, not three, because the k == -1 and k == gn cases are different (similarly for l).
You are comparing against completely the wrong values with (k == 1 || k == -1), and similarly for l.
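For reference, here is a hedged sketch of what a wrapped neighbour sum can look like if you prefer index arithmetic to extra if branches (this replaces the bounds check entirely; gn, gm, and current stand in for your existing globals, declared here only so the sketch is self-contained):

```cpp
// Small stand-in grid; in the question's code gn, gm and current already exist.
const int gn = 4, gm = 4;
int current[gn][gm]; // zero-initialized at global scope

// Wrapping neighbour sum: ((k % gn) + gn) % gn maps -1 to gn-1 and gn to 0,
// so every neighbour index lands back inside the grid (toroidal topology).
int NeighborhoodSumWrapped(int i, int j)
{
    int sum = 0;
    for (int k = i - 1; k <= i + 1; k++) {
        for (int l = j - 1; l <= j + 1; l++) {
            if (k == i && l == j)
                continue; // skip the cell itself
            int wk = ((k % gn) + gn) % gn;
            int wl = ((l % gm) + gm) % gm;
            sum += current[wk][wl];
        }
    }
    return sum;
}
```

The double-modulo form is needed because in C++ the % of a negative number can itself be negative.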

C++ Adding big numbers together with operator overload

I am new to C++ and attempting to create a "BigInt" class. I decided to base most of the implementation on storing the digits in vectors.
So far I have only written the constructor that takes an input string.
Largenum::Largenum(std::string input)
{
for (std::string::const_iterator it = input.begin(); it!=input.end(); ++it)
{
number.push_back(*it- '0');
}
}
The problem I am having is with the addition function. I have written a function that seems to work after a few tests, but as you can see it's highly inefficient. Say I have two vectors such as:
std::vector<int> x = {1,3,4,5,9,1};
std::vector<int> y = {2,4,5,6};
The way I thought to solve this was to prepend 0s to the shorter vector (y in this case) so that both vectors have the same size:
x = {1,3,4,5,9,1};
y = {0,0,2,4,5,6};
and then add them with elementary-school addition.
I don't want to insert the 0s at the front of y directly, as that would be slow for a large number. My current solution is to reverse the vector, push_back the appropriate number of 0s, then reverse it back; that may even be slower than simply inserting at the front, I haven't tested it yet.
The problem is that after I do all of the addition and push_back the result, I am left with a backwards vector and I need to reverse it yet again! There has got to be a much better way than my method, but I am stuck on finding it. Ideally I would make A const as well. Here is the code of the function:
Largenum Largenum::operator+(Largenum &A)
{
    bool carry = 0;
    Largenum sum;
    std::vector<int>::size_type max = std::max(A.number.size(), this->number.size());
    std::vector<int>::size_type diff = max - std::min(A.number.size(), this->number.size());

    if (A.number.size() > this->number.size())
    {
        std::reverse(this->number.begin(), this->number.end());
        for (std::vector<int>::size_type i = 0; i < diff; ++i) this->number.push_back(0);
        std::reverse(this->number.begin(), this->number.end());
    }
    else if (this->number.size() > A.number.size())
    {
        std::reverse(A.number.begin(), A.number.end());
        for (std::vector<int>::size_type i = 0; i < diff; ++i) A.number.push_back(0);
        std::reverse(A.number.begin(), A.number.end());
    }

    for (std::vector<int>::size_type i = max; i != 0; --i)
    {
        int num = (A.number[i-1] + this->number[i-1] + carry) % 10;
        sum.number.push_back(num);
        carry = (A.number[i-1] + this->number[i-1] + carry >= 10);
    }

    if (carry) sum.number.push_back(1);
    std::reverse(sum.number.begin(), sum.number.end());
    return sum;
}
If anyone has any input that would be great, this is my first program using classes in C++ and its fairly overwhelming.
I think your function is quite close to the most optimal one I have seen. Still, here are a few suggestions for improving it:
The decimal numeral system is quite inefficient: you end up with a lot of digits for big numbers. Better to use a higher base to reduce the number of digits you have to add. Reading and writing such numbers in a human-readable representation becomes a bit harder, but the operations get several times faster because there are fewer digits to process.
When implementing big integers, I store the digits in reverse order, with the least significant digit at index 0 and the most significant one at the end of the array. That way, when a carry forces you to add a new digit, you only perform a push_back, not a whole reverse.
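A sketch of that reversed-order idea (least significant digit at index 0; addDigitsLSD is a made-up free function rather than your operator+, and base 10 is kept for readability even though a larger base would be faster):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Add two digit vectors stored least-significant-first. No padding, no
// reversing: shorter inputs simply contribute nothing past their length,
// and a final carry is a plain push_back.
std::vector<int> addDigitsLSD(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> sum;
    sum.reserve(std::max(a.size(), b.size()) + 1);
    int carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
        int digit = carry;
        if (i < a.size()) digit += a[i];
        if (i < b.size()) digit += b[i];
        if (digit >= 10) { carry = 1; digit -= 10; } // subtract instead of % and /
        else             { carry = 0; }
        sum.push_back(digit);
    }
    if (carry) sum.push_back(1);
    return sum;
}
```

Your string constructor would then push the digits in reverse (or iterate the string backwards) so the stored order matches.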
One issue: integer modulus is pretty slow on modern processors, even compared to branch misprediction. Rather than doing an explicit %10, try this for your third for-loop:
int num = A.number[i-1] + this->number[i-1] + carry;
if (num >= 10)
{
    carry = 1;
    num -= 10;
}
else
{
    carry = 0;
}
sum.number.push_back(num);

binary search and eps in comparisons

I have two possible comparisons inside a binary search, and I can't decide which one to prefer. I'm oscillating between the two samples below:
for (int step = 0; step < 100; ++step) {
    double middle = (left + right) / 2;
    if (f(middle) > 0) right = middle; else left = middle;
}
and
for (int step = 0; step < 100; ++step) {
    double middle = (left + right) / 2;
    if (f(middle) > eps) right = middle; else left = middle;
}
f is a monotonically increasing function. My worry with eps is that even a small eps can translate into a much bigger error in the binary-search parameter. On the other hand, even if the comparison is wrong for values very close to zero due to rounding errors, the binary search should still converge, since f crosses zero at only one point and the comparison is correct at points even slightly away from it. I would like a second opinion on this.
Judging from your code, you are trying to find where the function crosses zero. The first method is already good enough, as it is consistent with that intention; there seems to be no need for the second.
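For illustration, the first form on a concrete monotonically increasing f (here x^2 - 2 on [0, 2], whose zero is sqrt(2)); 100 halvings already shrink the bracket to about 2/2^100, far below any eps you would pick:

```cpp
#include <cmath>

// Plain bisection with the f(middle) > 0 comparison; each step halves the
// bracket [left, right] that contains the zero of the increasing function f.
double bisect(double (*f)(double), double left, double right)
{
    for (int step = 0; step < 100; ++step) {
        double middle = (left + right) / 2;
        if (f(middle) > 0) right = middle; else left = middle;
    }
    return (left + right) / 2;
}

double fExample(double x) { return x * x - 2; } // increasing on [0, 2]
```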