Efficient sub-array (2D) access - c++

NB: I'm open to suggestions for a better title.
Imagine an nxn square, stored as an integer array.
What is the most efficient method of generating an n-length array of the integers in each of the n non-overlapping sqrt(n)xsqrt(n) sub-squares?
A special case (n=9) of this is Sudoku, if we wanted the numbers in the smaller squares.
The only method I can think of is something like:
int square[n][n], subsq[n], len;
int s = sqrt(n);
for(int j=0; j<n; j+=s){
    for(int i=0; i<n; i+=s){
        //square[i][j] is the top-left of each sub-square
        len = 0;
        for(int y=j; y<j+s; y++){
            for(int x=i; x<i+s; x++){
                subsq[len] = square[x][y];
                len++;
            }
        }
    }
}
But this seems loopy, if you'll forgive me the pun.
Does anyone have a more efficient suggestion?

Despite the four level loop, you are only accessing each array element at most one time, so the complexity of your approach is O(n^2), and not O(n^4) as the four loop levels suggest. And, since you actually want to look at all elements, this is close to optimal.
There is only one possible suboptimality: incomplete use of cache lines. If a row of s integers does not cover a whole number of cache lines, your subsquares will end in the middle of a cache line, so parts of the data are fetched from memory twice. However, this only matters once your subsquares no longer fit into the cache, so you need a very large problem size to trigger it. For a sudoku square, there is no faster way than the one you've given.
To work around this cache line issue (once you have determined that this is really worth it!), you can go through your matrix one line at a time, aggregating data for ceil(n/sqrt(n)) subsquares in an output array. This would exchange the loops in the following way:
for(int j=0; j<n; j+=s){
    for(int y=j; y<j+s; y++){
        for(int i=0; i<n; i+=s){
            for(int x=i; x<i+s; x++){
However, this will only work out if the intermediate data you need to hold while traversing a single subsquare is small. If you need to copy the entire data into a temporary array like you do, you won't gain anything.
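As a rough illustration of that row-at-a-time traversal, here is a minimal sketch that keeps only one small accumulator per subsquare (a sum). The name subsquare_sums and the flat row-major std::vector layout are my own assumptions for the example, not something from the question's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the entries of each s x s subsquare of an n x n
// matrix (n = s * s) stored row-major in a flat vector of size n * n.
std::vector<long> subsquare_sums(const std::vector<int>& square, std::size_t s) {
    const std::size_t n = s * s;      // side length of the full square
    std::vector<long> sums(n, 0);     // one accumulator per subsquare
    for (std::size_t y = 0; y < n; ++y) {        // one full matrix row at a time
        const std::size_t band = (y / s) * s;    // first subsquare index in this band of rows
        for (std::size_t x = 0; x < n; ++x) {    // contiguous read along the row
            sums[band + x / s] += square[y * n + x];
        }
    }
    return sums;
}
```

Subsquares are numbered left to right, then top to bottom, so entry (y / s) * s + x / s of the result belongs to the subsquare containing element (x, y).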
If you really want to optimize, try to get away from storing the data in the temporary subsq array. Try to interpret the data directly where you read it from the matrix. If you are indeed checking sudoku squares, it is possible to avoid this temporary array.
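For the sudoku case, one way this could look is sketched below: each 3x3 box is checked in place with a bitmask instead of being copied out first. The Grid alias, the function name boxes_valid, and the convention that values outside 1..9 mean "empty" are assumptions made for the example.

```cpp
#include <array>

using Grid = std::array<std::array<int, 9>, 9>;

// Returns true if no 3x3 box contains a repeated digit in 1..9.
// Assumption: entries outside 1..9 (e.g. 0 for an empty cell) are ignored.
bool boxes_valid(const Grid& g) {
    for (int by = 0; by < 9; by += 3) {
        for (int bx = 0; bx < 9; bx += 3) {
            unsigned seen = 0;  // bit d is set once digit d has been found in this box
            for (int y = by; y < by + 3; ++y) {
                for (int x = bx; x < bx + 3; ++x) {
                    const int d = g[y][x];
                    if (d < 1 || d > 9) continue;
                    const unsigned bit = 1u << d;
                    if (seen & bit) return false;  // digit already present in this box
                    seen |= bit;
                }
            }
        }
    }
    return true;
}
```

The only per-box state is a single unsigned integer, so nothing needs to be copied into a temporary array.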
From the way you pose the question, I presume that your goal is to pass the data in each subsquare to an analysis function in turn. If that is the case, you can simply pass a pointer to the 2D subarray to the function like this:
// n must be a compile-time constant for this parameter type
void analyse(int width, int height, int (*subsquare)[n]) {
    for(int y = 0; y < height; y++) {
        for(int x = 0; x < width; x++) {
            subsquare[y][x]; //do anything you like with this value
        }
    }
}
int main() {
    int square[n][n];
    int s = sqrt(n);
    for(int j=0; j<n; j+=s){
        for(int i=0; i<n; i+=s){
            //&square[i][j] is the top-left element of each sub-square
            analyse(s, s, (int (*)[n])&square[i][j]);
        }
    }
}
Now you can just pass any 2D subarray shape to your analysis function by varying the first two parameters, and completely avoid a copy.

Related

Allocate a (fixed size) vector<vector<double>> for efficient access and add them?

I am implementing an algorithm that uses rather large vector<vector<double>> types for storage; no elements will be added or removed after preallocation. I would like to make sure that element access is as fast as possible. I also need to add and scale several of them (elementwise). What is the best way to do this?
Here are relevant parts of my (naive) code, that I doubt is efficient:
vector<vector<double>> z;
vector<vector<double>> mu;
vector<vector<double>> temp_NNZ;
..
for(int i = 0; i < init.valsA.size(); ++i){
z.push_back({});
mu.push_back({});
temp_NNZ.push_back({});
for(int j = 0; j < init.valsA[i].size(); ++j){
z[i].push_back(0);
mu[i].push_back(0);
temp_NNZ[i].push_back(0);
}
}
..
for(int i = 0; i < z.size(); ++i){
for(int j = 0; j < z[i].size(); ++j){
z[i][j] = temp_NNZ[i][j] - mu[i][j]/rho - z[i][j];
}
}
There are two ways to do this: vector::resize will create all the elements and value-initialize them (or copy-initialize them if you give it an initial value), while vector::reserve will let you allocate the amount of memory you need in advance without creating any elements (which may be more efficient). In the first case, you'll have to assign the final value to the already existing elements (z[i] = x), whereas in the second you'll have to create the elements as you do in your current code (z.push_back(x)).
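To make this concrete, here is a minimal sketch that combines the two: reserve for the outer vector, and sized construction for each row so that every row is allocated exactly once. The Matrix alias and the helper names zeros_like and update are hypothetical, and the sketch assumes the row lengths come from an existing vector of vectors, as init.valsA does in the question.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: builds a zero-filled matrix with the same (possibly
// ragged) row lengths as `shape`.
Matrix zeros_like(const Matrix& shape) {
    Matrix out;
    out.reserve(shape.size());              // one allocation for the outer vector
    for (const auto& row : shape) {
        out.emplace_back(row.size(), 0.0);  // each row is created already sized
    }
    return out;
}

// Element-wise update from the question: z = temp - mu / rho - z.
// Assumes all three matrices have identical shapes.
void update(Matrix& z, const Matrix& temp, const Matrix& mu, double rho) {
    for (std::size_t i = 0; i < z.size(); ++i) {
        for (std::size_t j = 0; j < z[i].size(); ++j) {
            z[i][j] = temp[i][j] - mu[i][j] / rho - z[i][j];
        }
    }
}
```

With this, z, mu and temp_NNZ could each be created with a single zeros_like(init.valsA) call rather than through repeated push_back.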

outputting a row of a vector

I am learning c++ and I am currently at halt.
I am trying to write a function such that:
It takes in input a one dimensional vector and an integer which specifies a row.
The numbers on that row are put into an output vector for later use.
The only issue is that this online course states that I must use another function that I have made before that allows a 1d vector with one index be able to have two indexes.
it is:
int twod_to_oned(int row, int col, int rowlen){
return row*rowlen+col;
}
logically what I am trying to do:
I use this function to store the input vector into a temporary vector as a 2D matrix with i as the x axis and y as the y axis.
from there I have a loop which reads out the numbers on the row needed and stores it in the output vector.
so far I have:
void get_row(int r, const std::vector<int>& in, std::vector<int>& out){
int rowlength = std::sqrt(in.size());
std::vector <int> temp;
for(int i = 0; i < rowlength; i++){ // i is vertical and j is horizontal
for(int j = 0; j < rowlength; j++){
temp[in[twod_to_oned(i,j,side)]]; // now stored as a 2D array(matrix?)
}
}
for(int i=r; i=r; i++){
for(int j=0; j< rowlength; j++){
out[temp[i][j]];
}
}
I'm pretty sure there is something wrong in the first and last loops, which turn the input into a 2D matrix and then store the row.
I starred the parts that are incomplete due to my lack of knowledge.
How could I overcome this issue? I would appreciate any help, Many thanks.
takes in input a one dimensional vector
but
int rowlength = std::sqrt(in.size());
The line of code appears to assume that the input is actually a square two-dimensional matrix (i.e. same number of rows and columns). So which is it? What if the number of elements in the in vector is not a perfect square?
This confusion about the input is likely to cause your problem and should be sorted out before doing anything else.
I think what you wanted to do is the following:
void get_row(int r, const std::vector<int>& in, std::vector<int>& out) {
    int rowlength = std::sqrt(in.size());
    std::vector<std::vector<int>> temp; // now a 2D vector (vector of vectors)
    for(int i = 0; i < rowlength; i++) {
        temp.emplace_back(); // for each row, we need to emplace it
        for(int j = 0; j < rowlength; j++) {
            // we need to copy every value to the i-th row
            temp[i].push_back(in[twod_to_oned(i, j, rowlength)]);
        }
    }
    for(int j = 0; j < rowlength; j++) {
        // we copy the r-th row to out
        out.push_back(temp[r][j]);
    }
}
Your solution used std::vector<int> instead of std::vector<std::vector<int>>. The former does not support accessing elements by [][] syntax.
You were also assigning to that vector out of its bounds. That leads to undefined behaviour. Always use push_back or emplace_back to add elements. Use operator [] only to access elements that are already present.
Lastly, the same holds true for inserting the row to out vector. Only you can know if the out vector holds enough elements. My solution assumes that out is empty, thus we need to push_back the entire row to it.
In addition: you might want to use std::vector::insert instead of manual for() loop. Consider replacing the third loop with:
out.insert(out.end(), temp[r].begin(), temp[r].end());
which may prove more efficient and readable. The inner for() loop of the first loop could also be replaced in such a way (or even better, one could emplace the vector using iterators obtained from the in vector). I highly advise you to try to implement that.
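Taking the iterator idea one step further, the temporary 2D vector is not strictly needed if only one row is wanted. The following is a sketch under the same assumptions as the answer (in holds a square matrix in row-major order, and out is appended to); the rounding via std::lround is my own addition to guard against floating-point error in the square root.

```cpp
#include <cmath>
#include <vector>

int twod_to_oned(int row, int col, int rowlen) {
    return row * rowlen + col;
}

// Appends row r of a square matrix, stored row-major in `in`, to `out`.
// Assumes in.size() is a perfect square and 0 <= r < sqrt(in.size()).
void get_row(int r, const std::vector<int>& in, std::vector<int>& out) {
    const int rowlength =
        static_cast<int>(std::lround(std::sqrt(static_cast<double>(in.size()))));
    // Iterator to the first element of row r in the flat vector.
    const auto first = in.begin() + twod_to_oned(r, 0, rowlength);
    out.insert(out.end(), first, first + rowlength);
}
```

This still uses twod_to_oned, as the course requires, but only once to locate the start of the row.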

How do I generate a vector during each iteration of loop, store data, and then delete that vector?

I am trying to create a vector of vectors in my program. I wish to have a double loop; the inner loop checks for a certain condition, and if that condition is met, a value is stored in my vector. Once the inner loop runs its course, that "temp" vector is stored in the main vector. My idea was to clear my "temp" (inner) vector, but vector.clear() deletes everything in my main vector as well. This is my vector code:
vector <int> vectortestInner;
vector <vector<int> > vectortestOuter(10);
I populate my vectors here:
void vectorTest()
{
for (int i=0; i<vectortestOuter.size(); i++)
{
for (int j=0; j<vectortestOuter.size(); j++)
{
vectortestInner.push_back(j);
}
vectortestOuter[i]=vectortestInner;
//vectortestInner.clear();
}
}
and attempt printing the contents like this:
for(int i=0; i<vectortestOuter.size(); i++)
{
for (int j=0; j<vectortestInner.size(); i++)
{
cout<<vectortestInner[j]<<endl;
}
}
So far, it seems to be printing 0s, (when I want it to print 1-10), and if I call clear();, it just outputs empty lines.
What am I doing wrong, and how can I achieve what I am trying to do? Thanks!
Populating (or repopulating, since that's what your first function does) can be done with
// remove any global declaration of vectortestInner since we won't use it
void vectorTest()
{
    for (int i=0; i<vectortestOuter.size(); i++)
    {
        std::vector<int> vectortestInner;
        vectortestInner.reserve(vectortestOuter.size());
        for (int j=0; j<vectortestOuter.size(); j++)
        {
            vectortestInner.push_back(j);
        }
        vectortestOuter[i]=vectortestInner;
        // vectortestInner ceases to exist here
    }
}
This locally constructs vectortestInner in every iteration of the outer loop, so it will be destructed at the end of the iteration as well. The reserve() call avoids multiple reallocations (but is specific to the fact that your inner loop is, in total, going to append vectortestOuter.size() elements).
Yes, this reconstructs vectortestInner every time. But that is not actually any worse than clearing and repopulating every time (since those are the most significant operations done in construction and destruction).
To print the elements of your vector of vectors, you actually need to refer to them. Your code has a flaw in that (somehow) you are assuming vectortestInner magically provides a means of accessing elements of vectortestOuter. That is not so.
for(int i=0; i<vectortestOuter.size(); i++)
{
for (int j=0; j<vectortestOuter[i].size(); j++) // also using j++ here, not i++
{
cout<<vectortestOuter[i][j]<<endl;
}
}
There are other inefficiencies in your code that I haven't addressed. Rather than using [] consider using iterators as well. I'll leave that as an exercise.
You are printing the temporary vector (vectortestInner) in your inner loop, not a vector contained in the main vector (vectortestOuter[i]).
for(int i=0; i<vectortestOuter.size(); i++)
{
for (int j=0; j<vectortestOuter[i].size(); j++)
{
cout<<vectortestOuter[i][j]<<endl;
}
}
With the printing function changed, the clear of vectortestInner should work as expected.

Improving O(n) while looping through a 2d array in C++

A goal of mine is to reduce my O(n^2) algorithms into O(n), as it's a common algorithm in my Array2D class. Array2D holds a multidimensional array of type T. A common issue I see is using doubly-nested for loops to traverse through an array, which is slow depending on the size.
As you can see, I reduced my doubly-nested for loops into a single for loop here. It's running fine when I execute it. Speed has surely improved. Is there any other way to improve the speed of this member function? I'm hoping to use this algorithm as a model for my other member functions that have similar operations on multidimensional arrays.
/// <summary>
/// Fills all items within the array with a value.
/// </summary>
/// <param name="ob">The object to insert.</param>
void fill(const T &ob)
{
if (m_array == NULL)
return;
//for (int y = 0; y < m_height; y++)
//{
// for (int x = 0; x < m_width; x++)
// {
// get(x, y) = ob;
// }
//}
int size = m_width * m_height;
int y = 0;
int x = 0;
for (int i = 0; i < size; i++)
{
get(x, y) = ob;
x++;
if (x >= m_width)
{
x = 0;
y++;
}
}
}
Make sure things are contiguous in memory as cache behavior is likely to dominate the run-time of any code which performs only simple operations.
For instance, don't use this:
int* a[10];
for(int i=0;i<10;i++)
a[i] = new int[10];
//Also not this
std::vector< std::vector<int> > a(10, std::vector<int>(10));
Use this:
int a[100];
//or
std::vector<int> a(100);
Now, if you need 2D access use:
for(int y=0;y<HEIGHT;y++)
for(int x=0;x<WIDTH;x++)
a[y*WIDTH+x];
Use 1D accesses for tight loops, whole-array operations which don't rely on knowledge of neighbours, or for situations where you need to store indices:
for(int i=0;i<HEIGHT*WIDTH;i++)
a[i];
Note that in the above two loops the number of items touched is HEIGHT*WIDTH in both cases. Though it may appear that one has a time complexity of O(n^2) and the other O(n), the net amount of work done is HEIGHT*WIDTH in both cases. It is better to think of n as the total number of items touched by an operation, rather than a property of the way in which they are touched.
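Applied to the fill function from the question, the contiguous layout could look roughly like the sketch below. Flat2D is a hypothetical stand-in for the asker's Array2D (whose internals are not shown), and using std::fill for the whole-array case is my suggestion rather than something from the original class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Array2D: one contiguous buffer, indexed as
// y * width + x.
template <typename T>
class Flat2D {
public:
    Flat2D(std::size_t width, std::size_t height)
        : m_width(width), m_height(height), m_data(width * height) {}

    // 2D access for code that needs coordinates.
    T& get(std::size_t x, std::size_t y) { return m_data[y * m_width + x]; }
    const T& get(std::size_t x, std::size_t y) const { return m_data[y * m_width + x]; }

    // Whole-array operation: no x/y bookkeeping needed at all.
    void fill(const T& ob) { std::fill(m_data.begin(), m_data.end(), ob); }

    std::size_t width() const { return m_width; }
    std::size_t height() const { return m_height; }

private:
    std::size_t m_width;
    std::size_t m_height;
    std::vector<T> m_data;
};
```

The amount of work in fill is still width * height assignments; the gain, if any, comes from the simpler loop and the contiguous memory, not from a lower complexity.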
Sometimes you can compute Big O by counting loops, but not always.
for (int m = 0; m < M; m++)
{
for (int n = 0; n < N; n++)
{
doStuff();
}
}
Big O is a measure of "How many times is doStuff executed?" With the nested loops above it is executed MxN times.
If we flatten it to 1 dimension
for (int i = 0; i < M * N; i++)
{
doStuff();
}
We now have one loop that executes MxN times. One loop. No improvement.
If we unroll the loop or play games with something like Duff's device
for (int i = 0; i < M * N; i += N)
{
doStuff(); // 0
doStuff(); // 1
....
doStuff(); // N-1
}
We still have MxN calls to doStuff. Some days you just can't win with Big O. If you must call doStuff on every element in an array, no matter how many dimensions, you cannot reduce Big O. But if you can find a smarter algorithm that allows you to avoid calls to doStuff... That's what you are looking for.
For Big O, anyway. Sometimes you'll find stuff that has an as-bad-or-worse Big O yet it outperforms. One of the classic examples of this is std::vector vs std::list. Due to caching and prediction in a modern CPU, std::vector scores a victory that slavish obedience to Big O would miss.
Side note (because I regularly smurf this up myself): O(n) means if you double n, you double the work. This is why O(n) is the same as O(1,000,000 n). O(n^2) means if you double n you do 2^2 times the work. If you are ever puzzled by an algorithm, drop a counter into the operation you're concerned with and do a batch of test runs with various Ns. Then check the relationship between the counters at those Ns.
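The counter experiment can be sketched like this. The function names count_nested and count_flat are made up for the example, and the increment stands in for doStuff().

```cpp
#include <cstddef>

// Counts how often the innermost operation runs for nested loops over n x n.
std::size_t count_nested(std::size_t n) {
    std::size_t counter = 0;
    for (std::size_t y = 0; y < n; ++y) {
        for (std::size_t x = 0; x < n; ++x) {
            ++counter;  // stands in for doStuff()
        }
    }
    return counter;
}

// The same work expressed as a single flattened loop over n * n items.
std::size_t count_flat(std::size_t n) {
    std::size_t counter = 0;
    for (std::size_t i = 0; i < n * n; ++i) {
        ++counter;  // stands in for doStuff()
    }
    return counter;
}
```

Both functions return the same count for the same n, and doubling n multiplies either count by four, which is the signature of quadratic growth in n regardless of how many loops are written.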

C++ Checking for identical values in 2 arrays

I have 2 arrays called xVal, and yVal.
I'm using these arrays as coords. What I want to do is to make sure that the array doesn't contain 2 identical sets of coords.
Lets say my arrays looks like this:
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
Here I want to find the match between xVal[0] yVal[0] and xVal[1] yVal[1] as 2 identical sets of coords called 1,1.
I have tried some different things with a forLoop, but I cant make it work as intended.
You can write an explicit loop using an O(n^2) approach (see answer from x77aBs) or you can trade in some memory for performance. For example using std::set
bool unique(std::vector<int>& x, std::vector<int>& y)
{
    std::set< std::pair<int, int> > seen;
    for (int i=0,n=x.size(); i<n; i++)
    {
        if (seen.insert(std::make_pair(x[i], y[i])).second == false)
            return false;
    }
    return true;
}
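If you also need to know which entries collide (as in the question, where index 0 and index 1 match), a variation on the same idea is to remember the first index at which each coordinate was seen. This is only a sketch: find_duplicates is a hypothetical name, and it swaps std::set for std::map so that the index can be stored alongside the coordinate.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Returns (first index, duplicate index) for every coordinate that repeats an
// earlier one. Assumes x and y have the same length.
std::vector<std::pair<std::size_t, std::size_t>>
find_duplicates(const std::vector<int>& x, const std::vector<int>& y) {
    std::map<std::pair<int, int>, std::size_t> first_seen;
    std::vector<std::pair<std::size_t, std::size_t>> dups;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // emplace returns {iterator, inserted}; inserted is false for a repeat.
        const auto result = first_seen.emplace(std::make_pair(x[i], y[i]), i);
        if (!result.second) {
            dups.emplace_back(result.first->second, i);
        }
    }
    return dups;
}
```

For the arrays in the question this reports a single pair, (0, 1).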
You can do it with two for loops:
int MAX=4; //number of elements in array
for (int i=0; i<MAX; i++)
{
    for (int j=i+1; j<MAX; j++)
    {
        if (xVal[i]==xVal[j] && yVal[i]==yVal[j])
        {
            //DUPLICATE ELEMENT at xVal[j], yVal[j]. Here you implement what
            //you want (maybe just set them to -1, or delete them and move everything
            //one position back)
        }
    }
}
Small explanation: first, variable i gets the value 0. Then you loop j over all possible numbers. That way you compare xVal[0] and yVal[0] with all other values. j starts at i+1 because you don't need to compare values before i (they have already been compared).
Edit - you should consider writing small class that will represent a point, or at least structure, and using std::vector instead of arrays (it's easier to delete an element in the middle). That should make your life easier :)
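A minimal sketch of what that could look like follows. The Point struct and remove_duplicates are illustrative names, and keeping the first occurrence of each point is an assumption about what "delete them" should mean.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    int x;
    int y;
};

bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

// Removes later duplicates in place, keeping the first occurrence of each
// point. Uses the same pairwise comparison as the two for loops.
void remove_duplicates(std::vector<Point>& pts) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ) {
            if (pts[i] == pts[j]) {
                // Do not advance j: the next element has moved into slot j.
                pts.erase(pts.begin() + static_cast<std::ptrdiff_t>(j));
            } else {
                ++j;
            }
        }
    }
}
```

Note that j is only incremented when nothing was erased, otherwise an element would be skipped after each removal.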
int identicalValueNum = 0;
int identicalIndices[4]; // 4 is the max. possible number of identical values
for (int i = 0; i < 4; i++)
{
if (xVal[i] == yVal[i])
{
identicalIndices[identicalValueNum++] = i;
}
}
for (int i = 0; i < identicalValueNum; i++)
{
printf(
"The %ith value in both arrays is the same and is: %i.\n",
identicalIndices[i], xVal[identicalIndices[i]]);
}
For
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
the output of printf would be:
The 0th value in both arrays is the same and is: 1.
The 1th value in both arrays is the same and is: 1.
The 3th value in both arrays is the same and is: 4.