Fastest way to propagate through a 2D array in C++

I have two large 2D arrays, each hundreds by hundreds of elements, and one big loop that repeats the operation several times. Inside it there are three loops: the first stores into arr1 the sum of each cell's neighbours in arr2 multiplied by a number, the second streams the two arrays to a file, and the third stores into arr2 the sum of the two arrays divided by a number.
The code explains better:
for(int i=1;i<x+1;i++) {          //initialize
    for(int j=1;j<y+1;j++) {
        arr1[i][j]=i*j*5.5;
        arr2[i][j]=0.;
    }
}
for (int i=0;i<x+2;i++) {         //padding
    vi[i][0]=5;
    vi[i][y+1]=-5;
}
for (int j=0;j<y+2;j++) {         //padding
    vi[0][j]=10.;
    vi[x+1][j]=-10.;
}
for(int t=0;t<times;++t) {
    for(int i=1;i<x+1;++i) {
        for(int j=1;j<y+1;j++) {
            arr2[i][j]=(arr1[i+1][j]+arr1[i-1][j]+arr1[i][j-1]+arr1[i][j+1])*1.5;
        }
    }
    arr2[1][1]=arr2[1][y]=arr2[x][1]=arr2[x][y]=0.;
    for(int i=1;i<x+1;++i) {
        for(int j=1;j<y+1;j++) {
            arr1[i][j]=(arr1[i][j]+arr2[i][j])*0.5;
            if(arr2[i][j]+arr1[i][j]>5.)
                cout<<"\n"<<t<<" "<<i-1<<" "<<j-1<<" "<<arr1[i][j]<<" "<<arr2[i][j];
        }
    }
}
The whole program takes more than 14 s to run. How should I optimize the code so it runs as fast as possible?

You could use a third array to temporarily store the array values of arr2 for the next run.
After the first loop is done, you overwrite arr2 with the temporary array; this way you don't need the second loop, and you will save about half of the time.
for (n=0;n<x;n++)
{
    for (i=0;i<maxi;i++)
    {
        for (j=0;j<maxj;j++)
        {
            arr1[i][j]=(arr2[i+1][j]+arr2[i-1][j]+arr2[i][j+1]+arr2[i][j-1])*1.5;
            arr_tmp[i][j] = (arr1[i][j]+arr2[i][j])*0.5;
        }
    }
    arr2 = arr_tmp;
}

Note: The OP's code has changed dramatically in response to comments about padding and such. There wasn't really anything wrong with the original code -- which is what I have based this answer on.
Assuming that your 2D arrays are indexed row-major (the first index is the row, and the second index is the column), your memory accesses are already in the correct order for best cache utilization (you are accessing nearby elements as you progress). Your latest code calls this assumption into question, since you seem to have renamed 'maxi' to 'x', which would suggest that you are indexing a column-major 2D array (which is very non-standard for C/C++).
It wasn't specified how you were declaring your 2D arrays, and that could make a difference, but I got a big improvement by converting your implementation to use raw pointers. I also eliminated the second loop (from your original post) by combining the operations and alternating the direction for each iteration. I changed the weighting coefficients so that they added up to 1.0 so that I could test this more easily (by generating an image output).
typedef std::vector< std::vector<double> > Array2D;

void run( int x, Array2D & arr2 )
{
    Array2D temp = arr2; // easy way to create a temporary array of the correct size
    int maxi=arr2.size(), maxj=arr2[0].size();
    for (int n=0;n<x;n++)
    {
        Array2D const & src = (n&1)?temp:arr2; // alternate direction
        Array2D & dst = (n&1)?arr2:temp;
        for (int i=1;i<maxi-1;i++)
        {
            double const * sp0=&src[i-1][1], * sp1=&src[i][1], * sp2=&src[i+1][1];
            double * dp=&dst[i][1];
            for (int j=1;j<maxj-1;j++)
            {
                dp[0]=(sp0[0]+sp1[-1]+4*sp1[0]+sp1[+1]+sp2[0])*0.125;
                dp++, sp0++, sp1++, sp2++;
            }
        }
    }
    if ( (x&1) ) arr2=temp; // copy the result back if the iteration count was odd
}
Other things you could look into (somewhat platform-dependent):
restrict keyword for pointers (not standard C++)
prefetch requests -- a compiler/processor specific way of reducing memory access latency
make sure you have enabled optimizations when you compile
depending on the size of the array, you might find it advantageous to columnize your algorithm to make better use of available cache
Take advantage of available compute resources (very platform-dependent):
Create a SIMD-based implementation
Take advantage of your multi-core CPU -- OpenMP
Take advantage of your GPU -- OpenCL
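As a hedged sketch of the OpenMP option: in a Jacobi-style sweep, every output row depends only on the previous grid, so the rows are independent and the outer loop can be split across cores with a single pragma. The names (`sweep`, `grid`, `next`) and the 4-point averaging weights are illustrative, not taken from the question's code:

```cpp
#include <vector>

// One Jacobi-style sweep: 'next' is computed purely from 'grid', so
// each row is independent and the row loop parallelizes trivially.
// Compile with -fopenmp; without it the pragma is simply ignored.
void sweep(const std::vector<std::vector<double>>& grid,
           std::vector<std::vector<double>>& next)
{
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    #pragma omp parallel for
    for (int i = 1; i < rows - 1; ++i)
        for (int j = 1; j < cols - 1; ++j)
            next[i][j] = (grid[i-1][j] + grid[i+1][j] +
                          grid[i][j-1] + grid[i][j+1]) * 0.25; // illustrative weights
}
```

Because each pass writes only to `next`, no locking is needed; the alternating src/dst trick from the answer above combines cleanly with this.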

Related

Increasing the size of an array during runtime

I want to dynamically allocate an array in a for loop using pointers. As the for loop proceeds, the size of the array should increase by one and a new element should then be added. The usual method involves using the new operator, but that always allocates a fixed amount of memory at the time of declaration. Is there any way to do this?
I tried to do so using the following code (simplified to explain the problem):
sameCombsCount = 0;
int **matchedIndicesArray;
for(int i = 0; i<1000; i++) //loop condition a variable
{
    sameCombsCount++;
    matchedIndicesArray = new int*[sameCombsCount]; // ??
    // Now add an element in the new block created...
}
The thing is, I do not know the size of the for loop during execution time. It can vary depending upon execution conditions and inputs given. I don't think this is the correct way to do so. Can someone suggest a way to do so?
std::vector handles the resizing for you:
sameCombsCount = 0;
std::vector<int> matchedIndicesArray;
for(int i = 0; i<1000; i++) //loop condition a variable
{
    sameCombsCount++;
#if 0
    matchedIndicesArray.resize(sameCombsCount);
    matchedIndicesArray.back() = someValue;
#else
    matchedIndicesArray.push_back(someValue);
#endif
}
The first version does what you wanted and resizes the vector then sets the value. The second version just adds the element directly at the end of the array and should be marginally more efficient.
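For completeness, a self-contained sketch of the push_back approach; `collectMatches` and the even-number condition are placeholders for whatever logic the real loop performs:

```cpp
#include <vector>

// Collect every index that satisfies a (placeholder) match condition.
// The vector grows by one element per push_back; no manual size
// bookkeeping and no 'new' is needed.
std::vector<int> collectMatches(int limit)
{
    std::vector<int> matched;
    for (int i = 0; i < limit; ++i)
        if (i % 2 == 0)              // placeholder for the real condition
            matched.push_back(i);
    return matched;
}
```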

modifying values in pointers is very slow?

I'm working with a huge amount of data stored in an array, and am trying to optimize the amount of time it takes to access and modify it. I'm using Windows, C++ and VS2015 (Release mode).
I ran some tests and don't really understand the results I'm getting, so I would love some help optimizing my code.
First, let's say I have the following class:
class foo
{
public:
    int x;
    foo()
    {
        x = 0;
    }
    void inc()
    {
        x++;
    }
    int X()
    {
        return x;
    }
    void addX(int &_x)
    {
        _x++;
    }
};
I start by initializing 10 million pointers to instances of that class into a std::vector of the same size.
#include <vector>

int count = 10000000;
std::vector<foo*> fooArr;
fooArr.resize(count);
for (int i = 0; i < count; i++)
{
    fooArr[i] = new foo();
}
When I run the following code, and profile the amount of time it takes to complete, it takes approximately 350ms (which, for my purposes, is far too slow):
for (int i = 0; i < count; i++)
{
    fooArr[i]->inc(); //increment all elements
}
To test how long it takes to increment an integer that many times, I tried:
int x = 0;
for (int i = 0; i < count; i++)
{
    x++;
}
Which returns in <1ms.
I thought maybe the number of integers being changed was the problem, but the following code still takes 250ms, so I don't think it's that:
for (int i = 0; i < count; i++)
{
    fooArr[0]->inc(); //only increment first element
}
I thought maybe the array index access itself was the bottleneck, but the following code takes <1ms to complete:
int x;
for (int i = 0; i < count; i++)
{
    x = fooArr[i]->X(); //set x
}
I thought maybe the compiler was doing some hidden optimizations on the loop itself for the last example (since the value of x will be the same during each iteration of the loop, so maybe the compiler skips unnecessary iterations?). So I tried the following, and it takes 350ms to complete:
int x;
for (int i = 0; i < count; i++)
{
    fooArr[i]->addX(x); //increment x inside foo function
}
So that one was slow again, but maybe only because I'm incrementing an integer with a pointer again.
I tried the following too, and it returns in 350ms as well:
for (int i = 0; i < count; i++)
{
    fooArr[i]->x++;
}
So am I stuck here? Is ~350ms the absolute fastest that I can increment an integer, inside of 10million pointers in a vector? Or am I missing some obvious thing? I experimented with multithreading (giving each thread a different chunk of the array to increment) and that actually took longer once I started using enough threads. Maybe that was due to some other obvious thing I'm missing, so for now I'd like to stay away from multithreading to keep things simple.
I'm open to trying containers other than a vector too, if it speeds things up, but whatever container I end up using, I need to be able to easily resize it, remove elements, etc.
I'm fairly new to c++ so any help would be appreciated!
Let's look from the CPU point of view.
Incrementing an integer means: I have it in a CPU register and just increment it. This is the fastest option.
Incrementing through a pointer means: I'm given an address (vector -> member), and I must copy the value at that address into a register, increment it, and copy the result back. Worse: my CPU cache is filled with the pointers stored in the vector, not with the objects they point to. Too few hits, too much cache "refueling".
If I could manage to have all those members directly in a vector, CPU cache hits would be much more frequent.
Try the following:
int count = 10000000;
std::vector<foo> fooArr;
fooArr.resize(count, foo());
for (auto it = fooArr.begin(); it != fooArr.end(); ++it) {
    it->inc();
}
The new is killing you, and you don't actually need it, because resize inserts elements at the end if the new size is greater (check the docs: std::vector::resize).
The other issue is the use of pointers, which IMHO should be avoided until the last moment and is unnecessary in this case. Performance should be a bit better here since you get better locality of your references (see cache locality). If the objects were polymorphic or something more complicated, it might be different.
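A minimal sketch of the contiguous layout this answer recommends; `Foo` and `incrementAll` are stand-ins for the question's class and loop:

```cpp
#include <vector>

struct Foo {
    int x = 0;
    void inc() { ++x; }
};

// With std::vector<Foo> the objects sit side by side in memory, so this
// loop walks a single contiguous block and stays cache-friendly.
long long incrementAll(std::vector<Foo>& arr)
{
    long long sum = 0;
    for (Foo& f : arr) {
        f.inc();
        sum += f.x;
    }
    return sum;
}
```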

Big O(N) difference while using 2 different approaches

My question is as follows:
I have a function to manipulate content of an array of MAX elements. This function will simply look like the following:
//Global Variables
uint8_t my_array[ MAX ];
#define EMPTY 0xFF
...

void initArray( void )
{
    for( uint8_t i=0; i<MAX; i++ )
    {
        my_array[ i ] = EMPTY;
    }
}

void manipulateArray( uint8_t value )
{
    for( uint8_t i=0; i<MAX; i++ )
    {
        if( EMPTY == my_array[ i ] )
        {
            my_array[ i ] = value;
            break;
        }
    }
}
...

int main( void )
{
    ...
    initArray();
    ...
    while( false == exit_flag )
    {
        manipulateArray( value );
        //get new value from user
        //update exit_flag based on new value
    }
    ...
    return 0;
}
But then I thought that if I end up doing a lot of insertions/deletions, I would be using for loops like crazy, which is bound to affect the speed of the program, i.e. its big O(N). So I thought: what if I use another global variable to keep track of where the next empty spot in the array is, instead of looping through it every time?
//Global Variables
uint8_t my_array[ MAX ];
uint8_t idx = 0;
...

void manipulateArray( uint8_t value )
{
    my_array[ idx++ ] = value;
}
Is my assumption here correct? Also, is it true that it would be better to use another data structure in this particular case, one more suitable to the nature of the operations (a lot of insertions and somewhat fewer deletions): vectors, linked lists...?
Thanks in advance,
Interpreting you generally, I take you to be asking about the problem of "inserting" values by overwriting the EMPTY value, and of "deleting" values by replacing them with EMPTY. In that context, you propose to maintain a global variable that tracks the next "empty" position, so as to avoid having to search the array for that position.
Indeed, if you know the location of the next position for insertion, then you can perform the insertion in O(1) steps, whereas if you need to perform a linear search, the best possible bound is O(n). Maintaining metadata such as you propose is a perfectly good strategy if you will always be inserting at or deleting from the end of (the non-empty portion of) the array, for then you can maintain the auxiliary variable in O(1) steps, too.
But suppose you need to support deletions from arbitrary positions, without moving the other array elements, and you also want to be able to re-fill those positions with your insertion function. In that case you have to solve a problem of maintaining information about where multiple empty positions are. A single scalar variable is not enough, and relying on the array itself for that requires you to search the array for empty positions, which is back to where you started.
The alternative is to use a more complex data structure -- an array or a linked list, for example -- to track the openings in the main array. In this way you could achieve O(1) complexity for any number of insertions and deletions at any positions in any sequence, at the cost of using O(n) memory to maintain the metadata about open array positions. This is a classic space vs. speed tradeoff: implementing a faster algorithm requires using more memory, but you can conserve memory by using a slower algorithm.
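One possible sketch of that metadata idea, assuming a stack of free indices is acceptable: both insertion and arbitrary-position deletion become O(1). `SlotArray` and its layout are illustrative, not a standard container:

```cpp
#include <cstdint>
#include <vector>

// Free-list sketch: 'freeSlots' records every empty index, so insert
// pops a slot in O(1) and erase pushes the slot back in O(1).
struct SlotArray {
    std::vector<uint8_t> data;           // 0xFF plays the role of EMPTY
    std::vector<std::size_t> freeSlots;  // indices that are currently empty

    explicit SlotArray(std::size_t n) : data(n, 0xFF) {
        for (std::size_t i = n; i-- > 0; )
            freeSlots.push_back(i);      // lowest index ends up on top
    }
    bool insert(uint8_t value) {
        if (freeSlots.empty()) return false;  // array full
        data[freeSlots.back()] = value;
        freeSlots.pop_back();
        return true;
    }
    void erase(std::size_t idx) {
        data[idx] = 0xFF;
        freeSlots.push_back(idx);        // slot becomes reusable
    }
};
```

This is the space-for-speed tradeoff described above: the free list costs O(n) extra memory in exchange for O(1) operations at any position.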

How to remove elements from a vector based on a condition in another vector?

I have two equal length vectors from which I want to remove elements based on a condition in one of the vectors. The same removal operation should be applied to both so that the indices match.
I have come up with a solution using vector::erase, but it is extremely slow:
vector<myClass> a = ...;
vector<otherClass> b = ...;
assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( !a[i].alive() )
    {
        a.erase(a.begin() + i);
        b.erase(b.begin() + i);
        i--;
    }
}
Is there a way that I can do this more efficiently and preferably using stl algorithms?
If order doesn't matter you could swap the elements to the back of the vector and pop them.
for(size_t i=0; i<a.size();)
{
    if( !a[i].alive() )
    {
        std::swap(a[i], a.back());
        a.pop_back();
        std::swap(b[i], b.back());
        b.pop_back();
    }
    else
        ++i;
}
If you have to maintain the order you could use std::remove_if. See this answer for how to get the index of the dereferenced element inside the remove predicate. Note that b must be filtered first, because its predicate indexes into the still-unmodified a, and that the predicate returns true for the elements to remove:
b.erase(remove_if(begin(b), end(b),
        [&](const otherClass& d) { return !a[&d - &*begin(b)].alive(); }),
    end(b));
a.erase(remove_if(begin(a), end(a),
        [](const myClass& d) { return !d.alive(); }),
    end(a));
The reason it's slow is probably the O(n^2) complexity. Why not use a list instead? Making a pair of a and b is a good idea too.
A quick win would be to run the loop backwards: i.e. start at the end of the vector. This tends to minimise the number of backward shifts due to element removal.
Another approach would be to consider std::vector<std::unique_ptr<myClass>> etc.: then you'll be essentially moving pointers rather than values.
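A sketch of that unique_ptr variant, assuming C++14; `Item` stands in for `myClass`, and the erase-remove pass moves pointers rather than whole objects:

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Illustrative stand-in for myClass; only the alive() flag matters here.
struct Item {
    bool alive_;
    explicit Item(bool a) : alive_(a) {}
    bool alive() const { return alive_; }
};

// Erase-remove over unique_ptr elements: compaction shuffles pointers,
// so large payloads are never copied.
std::size_t compact(std::vector<std::unique_ptr<Item>>& v)
{
    v.erase(std::remove_if(v.begin(), v.end(),
                [](const std::unique_ptr<Item>& p) { return !p->alive(); }),
            v.end());
    return v.size();
}
```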
I propose you create two new vectors, reserve memory, and swap the vectors' contents at the end.
vector<myClass> a = ...;
vector<otherClass> b = ...;

vector<myClass> new_a;
vector<otherClass> new_b;
new_a.reserve(a.size());
new_b.reserve(b.size());

assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( a[i].alive() )
    {
        new_a.push_back(a[i]);
        new_b.push_back(b[i]);
    }
}
swap(a, new_a);
swap(b, new_b);
It consumes extra memory, but should work fast.
Erasing from the middle of a vector is slow because everything after the deletion point needs to be shuffled down. Consider using another container that makes erasing quicker. It depends on your use case: will you be iterating often? Does the data need to stay in order? If you aren't iterating often, consider a list. If you need to maintain order, consider a set. If you are iterating often and need to maintain order, then depending on the number of elements it may be quicker to push back all alive elements into a new vector and point a/b at that instead.
Also, since the data is intrinsically linked, it seems to make sense to have just one vector containing data a and b in a pair or small struct.
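A sketch of that single-vector idea; the pair's bool member stands in for `myClass::alive()`, and one erase-remove pass keeps both halves in sync automatically:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Keeping the linked data together in one vector means a single
// erase-remove pass can never let the two halves drift out of sync.
std::size_t purgeDead(std::vector<std::pair<bool, int>>& items)
{
    items.erase(std::remove_if(items.begin(), items.end(),
                    [](const std::pair<bool, int>& p) { return !p.first; }),
                items.end());
    return items.size();
}
```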
For performance reasons, use a
vector<pair<myClass, otherClass>>
as @Basheba says, combined with std::sort. Use the form of std::sort that takes a comparison predicate. And do not enumerate from 0 to n: use std::lower_bound instead, because the vector will be sorted. Insert elements the way CashCow describes in this question: "how do you insert the value in a sorted vector?"
I had a similar problem where I had two vectors:
std::vector<Eigen::Vector3d> points;
std::vector<Eigen::Vector3d> colors;
for 3D point clouds in Open3D. After removing the floor, I wanted to delete all points and colors whose z coordinate is greater than 0.05. I ended up overwriting the points based on the index and resizing the vectors afterward.
bool invert = true;
std::vector<bool> mask = std::vector<bool>(points.size(), invert);
size_t pos = 0;
for (auto & point : points) {
    if (point(2) < CONSTANTS::FLOOR_HEIGHT) {
        mask.at(pos) = false;
    }
    ++pos;
}

size_t counter = 0;
for (size_t i = 0; i < points.size(); i++) {
    if (mask[i]) {
        points.at(counter) = points.at(i);
        colors.at(counter) = colors.at(i);
        ++counter;
    }
}
points.resize(counter);
colors.resize(counter);
This maintains order and, at least in my case, ran almost twice as fast as the remove_if method from the accepted answer:
for 921600 points the runtimes were:
33 ms for the accepted answer
17 ms for this approach.

Insert into a desired element of an array and push all other elements one spot over in c++

Having some issues with one small function I'm working on for a homework assignment.
I have a static array size of 20 (shelfSize), however, I only need to use a max of 10 elements. So I don't have to worry about out of bounds etc (the entire array of 20 is initialized to 0).
What I am looking to do is insert an integer, booknum, into an element of an array it receives as input.
This is my current logic:
void insert_at(int booknum, int element){
    for(int i=element+1; i < shelfSize; i++)
        bookshelf[i+1]=bookshelf[i];
    bookshelf[element]=booknum;
}
So let's say I have the following array:
[5,4,3,1,7]
I want to insert an 8 at element 1 and have the array turn to:
[5,8,4,3,1,7]
Technically, everything after the final element 7 is a 0, however, I have a separate print function that only prints up to a certain element.
No matter how many times I take some pencil and paper and manually write out my logic, I can't get this to work.
Any help would be appreciated, thanks.
You should start from the end of the array; this should work for you:
void insert_at(int booknum, int element)
{
    for (int i = shelfSize-1; i > element; i--)
        bookshelf[i] = bookshelf[i-1];
    bookshelf[element] = booknum;
}
Also I recommend that you get used to handling illegal values, for example, what if a user entered 21?
The optimized code would be:
bool insert_at(int booknum, int element)
{
    if (element >= shelfSize-1)
        return false;
    for (int i = shelfSize-2; i > element; i--)
        bookshelf[i] = bookshelf[i-1];
    bookshelf[element] = booknum;
    return true;
}
If your example is correct, then you're assuming 1-based indices instead of 0-based. Use the following instead (it shifts from the top down, so no value is overwritten before it has been copied):
void insert_at(int booknum, int element){
    for(int i = shelfSize-1; i >= element; i--)
        bookshelf[i] = bookshelf[i-1];
    bookshelf[element-1] = booknum;
}
However, I would prefer you just use the same code, and change "at element 2" in your example to "at element 1". Always remember C++ arrays are 0-based.
That being said, please tell your professor that this is why vectors (and other standard containers) were made, and that C++ arrays are evil.
http://www.parashift.com/c++-faq-lite/containers.html#faq-34.1
Just noticed: you are copying upwards, which means your function does this:
[5,4,3,1,7]
--^
[5,4,4,1,7]
--^
[5,4,4,4,7]
--^
[5,4,4,4,4]
--^
[5,4,4,4,4,4]
For moving values in an array, you always want to copy in the opposite direction to the move. So to shift values up, copy each item upward starting from the top:
[5,4,3,1,7]
--^
[5,4,3,1,7,7]
--^
[5,4,3,1,1,7]
--^
[5,4,3,3,1,7]
--^
[5,4,4,3,1,7]
And then you can overwrite the index you freed up.
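Putting the copy-from-the-top rule together, a runnable sketch; it uses a vector-backed shelf so the example is self-contained, but the question's fixed array works the same way:

```cpp
#include <vector>

// Shift everything above 'element' up by one, copying from the top
// down so no value is overwritten before it has been copied, then
// place the new book in the freed slot. 0-based indexing.
void insert_at(std::vector<int>& bookshelf, int booknum, int element)
{
    for (int i = static_cast<int>(bookshelf.size()) - 1; i > element; --i)
        bookshelf[i] = bookshelf[i - 1];
    bookshelf[element] = booknum;
}
```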