Two- and one-dimensional arrays equivalence in C++

It is known that two- and one-dimensional arrays can be used equivalently, by simple coordinate conversion. Is such equivalence guaranteed by the C++ standard, or is it merely the most convenient way of organizing data, which an implementation does not have to follow?
For example, is the following code compiler-independent?
std::ofstream ofbStream;
ofbStream.open("File", std::ios::binary);
char Data[3][5];
for(int i=0; i<3; ++i)
for(int j=0; j<5; ++j)
{
Data[i][j] = (char)(5*i+j); // cast the whole expression, not just the literal 5
}
ofbStream.write(&Data[0][0], 15);
ofbStream.close();
The program is expected to write the numbers: 0, 1, 2, ..., 14 to a file.

In practice, this is just fine. Any compiler that doesn't do that would have countless problems with existing code.
Very strictly speaking, though, the pointer arithmetic needed is Undefined Behavior.
char Data[3][5];
char* p = &Data[0][0];
p + 7; // UB!
5.7/5 (emphasis mine):
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
The Standard does guarantee that all the array elements are adjacent in memory and are in a specific order, and that dereferencing a pointer with the correct address (no matter how you got it) refers to the object at that address, but it doesn't guarantee that p + 7 does anything predictable, since p and p + 7 don't point at elements of the same array or past-the-end. (Instead they point at elements of elements of the same array.)
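One way to sidestep the question entirely is to make the storage a genuine one-dimensional array and do the coordinate conversion by hand, so that all pointer arithmetic stays inside a single array object. A minimal sketch, assuming the hypothetical helper names flat_index and serialize_grid (neither comes from the question) and writing to an in-memory stream instead of a file:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

constexpr std::size_t kRows = 3;  // assumed dimensions, matching Data[3][5]
constexpr std::size_t kCols = 5;

// Row-major conversion from (row, column) to a position in the flat buffer.
constexpr std::size_t flat_index(std::size_t row, std::size_t col) {
    return row * kCols + col;
}

// Fills a truly one-dimensional buffer and writes it with a single call.
// Every index used here refers to an element of the same array object.
std::string serialize_grid() {
    std::array<char, kRows * kCols> data{};
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            data[flat_index(i, j)] = static_cast<char>(5 * i + j);

    std::ostringstream out(std::ios::out | std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.str();
}
```

Calling serialize_grid() should yield the 15 bytes 0 through 14 in order, which is the output the question expects from the two-dimensional version.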

In his book The C++ Programming Language, Bjarne Stroustrup mentions (C.7.2; p. 838 of the Special Edition, 2000):
... We can initialize ma like this:
void int_ma() {
for(int i=0; i<3; i++)
for(int j=0; j<5; j++) ma[i][j] = 10 * i + j; }
...
The array ma is simply 15 ints that we access as if it were 3
arrays of 5 ints. In particular, there is no single object in memory
that is the matrix ma - only the elements are stored. The dimensions 3
and 5 exist in the compiler source only.
(emphasis mine).
In other words, the notation [][]...[] is a compiler construction; syntactical sugar if you will.
For entertainment purposes, I wrote the following code:
#include<cstdlib>
#include<iostream>
#include<iterator>
#include<algorithm>
int main() {
double ma[5][3]; double *beg = &ma[0][0]; // case 1
//double ma[3][5]; double *beg = &ma[0][0]; // case 2
//double ma[15]; double *beg = &ma[0]; // case 3
double *end = beg + 15;
// fill array with random numbers
std::generate(beg, end, std::rand);
// display array contents
std::copy(beg, end, std::ostream_iterator<double>(std::cout, " "));
std::cout<<std::endl;
return 0;
}
And compared the assembly generated for the three cases using the compilation command (GCC 4.7.2):
g++ test.cpp -O3 -S -oc1.s
The cases are called c1.s, c2.s, and c3.s. The output of the command shasum *.s is:
5360e2438aebea682d88277da69c88a3f4af10f3 c1.s
5360e2438aebea682d88277da69c88a3f4af10f3 c2.s
5360e2438aebea682d88277da69c88a3f4af10f3 c3.s
Now, I must mention that the most natural construction seems to be the one-dimensional declaration of ma, that is: double ma[N], because then the initial position is simply ma, and the final position is simply ma + N (this is as opposed to taking the address of the first element of the array).
I find that the algorithms provided by the <algorithm> C++ Standard Library header fit much more snugly in this case.
Finally, I must encourage you to consider using std::array or std::vector if at all possible.
Cheers.

C++ stores multi-dimensional arrays in row major order as a one-dimensional array extending through memory.
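The row-major rule can be checked empirically by comparing the address of each element with the offset the rule predicts. This is only a sketch: the helper names are hypothetical, and it assumes that std::uintptr_t is available and that converting pointers to integers preserves byte distances, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRows = 3;  // assumed example dimensions
constexpr std::size_t kCols = 5;

// Byte offset of element (i, j) predicted by the row-major rule.
constexpr std::size_t predicted_offset(std::size_t i, std::size_t j) {
    return (i * kCols + j) * sizeof(int);
}

// Byte offset of element (i, j) actually observed, relative to grid[0][0].
std::size_t observed_offset(const int (&grid)[kRows][kCols],
                            std::size_t i, std::size_t j) {
    const auto base = reinterpret_cast<std::uintptr_t>(&grid[0][0]);
    const auto elem = reinterpret_cast<std::uintptr_t>(&grid[i][j]);
    return static_cast<std::size_t>(elem - base);
}

// Returns true when every element sits where the row-major rule says.
bool layout_is_row_major() {
    int grid[kRows][kCols] = {};
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            if (observed_offset(grid, i, j) != predicted_offset(i, j))
                return false;
    return true;
}
```

Only addresses are compared here; no element is accessed through a pointer computed across rows.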

As other commenters have indicated, the 2-dimensional array will be mapped to 1-dimensional memory. Is your assumption platform independent? I would expect so, but you should always test it to be sure.
#include <iostream>
#include <iterator>
#include <algorithm>
int main() {
char Data[3][5];
int count = 0;
for (int i = 0; i < 3; ++i)
for (int j = 0; j < 5; ++j)
Data[i][j] = count++;
std::copy(&Data[0][0], &Data[0][0] + 15, std::ostream_iterator<int>(std::cout,", "));
}
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
http://www.fredosaurus.com/notes-cpp/arrayptr/23two-dim-array-memory-layout.html
How are multi-dimensional arrays formatted in memory?
Memory map for a 2D array in C

Quote
It follows from all this that arrays in C++ are stored row-wise (last subscript varies fastest) and that
the first subscript in the declaration helps determine the amount of storage consumed by an array but plays
no other part in subscript calculations.
C++ ISO standard

Related

C++: storing a matrix in a 1D array

I am very new to C++, but I have the task of translating a section of C++ code into python.
Going through the file, I found this section of code, which confuses me:
int n_a = 10; // example value
int n_b=n_a-1;
int l_b[2*n_b];
int l_c[3*n_b];
int l_d[4*n_b];
for (int i=0; i<n_b; i++){
for (int j=0; j<2; j++) l_b[i*2+j]=0;
for (int j=0; j<3; j++) l_c[i*3+j]=0;
for (int j=0; j<4; j++) l_d[i*4+j]=0;
}
I know that it creates 3 arrays, the length of each defined by the action on the n_b variable, and sets all the elements to zero, but I do not understand what exactly this matrix is supposed to look like, e.g. if written on paper.
A common way to store a matrix with R rows and C columns is to store all elements in a vector of size R * C. Then when you need element (i, j) you just index the vector with i*C + j. This is not the only way your "matrix" could be stored in memory, but it is a common one.
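The i*C + j scheme can be wrapped in a small class so that callers write m(i, j) instead of repeating the arithmetic. The sketch below is illustrative only; the class name FlatMatrix and its interface are assumptions, not something from the code being translated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix: R rows and C columns stored in one vector
// of size R * C, with element (i, j) at position i * C + j.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    int& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    int operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Read-only access to the underlying flat storage.
    const std::vector<int>& storage() const { return data_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

For instance, FlatMatrix m(9, 3) would play the role of l_c for n_a = 10, and m(2, 1) = 7 writes the same slot as l_c[2*3+1] = 7.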
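In this code, three C arrays are declared and filled with zeros. Since the row index i runs up to n_b, the l_b array seems to be storage for an n_b x 2 matrix, the l_c array for an n_b x 3 matrix and the l_d array for an n_b x 4 matrix, where n_b = n_a - 1. The examples that follow use n_a as the row count; substitute n_b to match the original sizes exactly.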
Of course, this is only an impression since to be sure we would need to see how these arrays are used later.
As mentioned in the comments, if you are going to convert this to Python then you should probably use numpy for the matrices. In fact, numpy arrays store their elements in memory exactly like the indexing I mentioned (by default; you can choose an alternative layout by passing an extra argument). You could do the same as this C++ code in Python with just
import numpy as np
n_a = 10  # example value
l_b = np.zeros(shape=(n_a, 2))
l_c = np.zeros(shape=(n_a, 3))
l_d = np.zeros(shape=(n_a, 4))
These variables in numpy are 2D arrays and you can index them as usual.
Ex:
l_d[2, 1] = 15.5
We can also have a nice syntax for working with vectors, matrices and linear algebra in C++ by using one of the available libraries. One such library is Armadillo. We can create the three previous matrices of zeros using Armadillo as
#include <armadillo>
int main(int argc, char *argv[]) {
unsigned int n_a = 10;
// A 10 x 2 matrix of doubles with all elements being zero
// The 'arma::fill::zeros' argument is optional and without it the matrix
// elements will not be initialized
arma::mat l_b(n_a, 2, arma::fill::zeros);
arma::mat l_c(n_a, 3, arma::fill::zeros);
arma::mat l_d(n_a, 4, arma::fill::zeros);
// We use parentheses for indexing, since "[]" can only receive one argument in C/C++
l_b(2, 1) = 15.5;
// A nice function for printing, but it also works with operator<<
l_b.print("The 'l_b' matrix is");
return 0;
}
If you inspect Armadillo types in gdb you will see that they have a mem attribute, which is a pointer. This is in fact a C array holding the elements of the matrix, and when you index the matrix, Armadillo translates the indexes into the proper index in this internal 1D array. Note that Armadillo stores elements in column-major order, so element (i, j) sits at mem[i + j*n_rows] rather than at i*C + j.
You can print the elements in this internal array in gdb. For instance, print l_b.mem[0] will print the first element, print l_b.mem[1] will print the second element, and so on.

Initializing large 2-dimensional array to all one value in C++

I want to initialize a large 2-dimensional array (say 1000x1000, though I'd like to go even larger) to all -1 in C++.
If my array were 1-dimensional, I know I could do:
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
However, memset does not allow for initializing all the elements of an array to be another array. I know I could just make a 1-dimensional array of length 1000000, but for readability's sake I would prefer a 2-dimensional array. I could also just loop through the 2-dimensional array to set the values after initializing it to all 0, but this bit of code will be run many times in my program and I'm not sure how fast that would be. What's the best way of achieving this?
Edited to add minimal reproducible example:
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j
I am a little bit surprised.
And I know, it is C++. And, I would never use plain C-Style arrays.
And therefore the accepted answer is maybe the best.
But, if we come back to the question
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
and
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j
Then the easiest and fastest solution is the same as the original assumption:
int my_array[1000][1000];
memset(my_array, -1, sizeof(my_array));
There is no difference. The compiler will even optimize this away and use fast assembler loop instructions.
sizeof is smart enough: applied to the whole 2D array it yields the total size in bytes. And the memory is contiguous, so it will work. Fast. Easy.
(A good compiler will do the same optimizations for the other solutions).
Please consider.
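One caveat worth spelling out: memset writes bytes, not ints. It produces -1 only because an int whose bytes are all 0xFF equals -1 in two's complement; for most other values (1, say) it would not give the intended result, and std::fill is the general tool. A hedged sketch, with hypothetical helper names and a heap-backed flat std::vector standing in for the array so that a multi-megabyte object does not have to live on the stack:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kSide = 1000;  // assumed size, taken from the question

// Creates a kSide x kSide grid stored flat, with every element set to value.
std::vector<int> make_grid(int value) {
    return std::vector<int>(kSide * kSide, value);
}

// memset sets every *byte* to 0xFF; the resulting ints read as -1.
void reset_with_memset(std::vector<int>& grid) {
    std::memset(grid.data(), -1, grid.size() * sizeof(int));
}

// std::fill assigns whole elements, so it works for any value.
void reset_with_fill(std::vector<int>& grid, int value) {
    std::fill(grid.begin(), grid.end(), value);
}
```

Element (i, j) of such a grid lives at index i * kSide + j, so the readability cost compared with my_array[i][j] is small once the indexing is hidden behind a helper.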
With GNU GCC you can:
int my_array[1000][1000] = { [0 ... 999] = { [0 ... 999] = -1, }, };
With any other compiler you need to:
int my_array[1000][1000] = { { -1, -1, -1, .. repeat -1 1000 times }, ... repeat { } 1000 times ... };
Side note: The following is doing assignment, not initialization:
int my_array[1000][1000];
for (auto&& i : my_array)
for (auto&& j : i)
j = -1;
Is there any real difference between doing what you wrote and doing for(int i=0; i<1000; i++){ for(int j=0; j<1000; j++){ my_array[i][j]=-1; } }?
It depends. If you have a bad compiler, you compile without optimization, etc., then yes. Most probably, no. Anyway, don't use indexes. I believe the range based for loop in this case roughly translates to something like this:
for (int (*i)[1000] = my_array; i < my_array + 1000; ++i)
for (int *j = *i; j < *i + 1000; ++j)
*j = -1;
Side note: Ach! It hurts to calculate my_array + 1000 and *i + 1000 each loop. That's like 3 operations done each loop. This is CPU time wasted! It can be easily optimized to:
for (int (*i)[1000] = my_array, (*max_i)[1000] = my_array + 1000; i < max_i; ++i)
for (int *j = *i, *max_j = *i + 1000; j < max_j; ++j)
*j = -1;
The my_array[i][j] used in your loop translates into *(*(my_array + i) + j) (see array subscript operator). In terms of raw address arithmetic, that is equal to *(int*)((uintptr_t)my_array + i * sizeof(int[1000]) + j * sizeof(int)). Counting operations, my_array[i][j] is behind the scenes doing multiplication, addition, multiplication, addition and a dereference for every element (the inner * only selects a row of the array and does not read memory). With a bad or non-optimizing compiler, your version could be way slower.
That said, a good compiler should optimize each version to the same code, as shown here.
And are either of these significantly slower than just initializing it explicitly by typing a million -1's?
I believe assigning each array element (in this particular case of elements having the easy-to-optimize type int) will be as fast as or slower than initialization. It really depends on your particular compiler and on your architecture. A bad compiler can produce a very slow version of iterating over array elements, so it would take forever. On the other hand, a static initialization can embed the values in your program, so your program size will increase by sizeof(int) * 1000 * 1000, and during program startup it will do a plain memcpy when initializing static regions for your program. So, when compared to a properly optimized loop with assignment, you will not gain anything in terms of speed and will lose tons of read-only memory.
If the array is static, it's placed in sequential memory (check this question). So char [1000][1000] is equal to char [1000000] (if your stack can hold that much).
If the array has been created with multidimensional new (say char(*x)[5] = new char[5][5]) then it's also contiguous.
If it's not (if you create it with dynamic allocations), then you can use the solutions found in my question to map an n-dimensional array to a single one after you have memset it.

How to set every element in an array to 0

I am learning C++ and one of my practice exercises is to use pointers to set all the elements in an array to 0. I have no idea how to do this by incrementing the pointer to the next position in the array since my IDE log said that comparison between int and * is forbidden. I only need a small snippet as an example to help me better understand where i'm going wrong. The array I have created is of type int and has a single dimension with 5 elements consisting of 1,2,3,4 and 5.
int array[5] = {1, 2, 3, 4, 5};
for(int *i = &array[0], *end = &array[5]; i != end; i++)
*i = 0;
The code creates a pointer to the start &array[0] and a pointer to one position past the end &array[5]
Then it steps the pointer through the array, setting each element to zero.
A more advanced concept that is very similar is iterators.
You could use std::fill, http://en.cppreference.com/w/cpp/algorithm/fill, as follows.
const size_t dataSize = 10;
int data[dataSize];
std::fill(data, data + dataSize, 0);

How to get intersection of two Arrays

I have two integer arrays
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
And I have to get the intersection of these arrays,
i.e. the output will be 2, 6, 7.
I am thinking of solving it by saving array A in a data structure and then comparing all the elements till size A or B to get the intersection.
My problem is that I first need to store the elements of array A in a container.
Should I do it like this?
int size = sizeof(A)/sizeof(int);
This gives me the size, but after that I also want to access all the elements and store them in a container.
Here is the code I am using to find the intersection:
#include <iostream>
using namespace std;
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
int main()
{
int sizeA = sizeof(A)/sizeof(int);
int sizeB = sizeof(B)/sizeof(int);
// Loop over A with A's size and over B with B's size; indexing A up to the
// larger size and B up to the smaller one reads out of bounds when B is longer
for (int i = 0; i < sizeA; ++i)
{
for (int j = 0; j < sizeB; ++j)
{
if(A[i] == B[j])
{
cout<<"Element is -->"<<A[i]<<endl;
}
}
}
return 0;
}
Just use a hash table:
#include <unordered_set> // needs C++11 or TR1
// ...
unordered_set<int> setOfA(A, A + sizeA);
Then you can just check for every element in B, whether it's also in A:
for (int i = 0; i < sizeB; ++i) {
if (setOfA.find(B[i]) != setOfA.end()) {
cout << B[i] << endl;
}
}
Runtime is expected O(sizeA + sizeB).
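Put together as a function, the hash-table approach might look like the sketch below. The function name intersect_hashed and its pointer-plus-size interface are assumptions made for illustration; erasing each match from the set is an added detail so that a value repeated in B is reported only once.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the values of b that also occur in a, in the order they appear
// in b. Each common value is reported once, even if b repeats it.
std::vector<int> intersect_hashed(const int* a, std::size_t sizeA,
                                  const int* b, std::size_t sizeB) {
    std::unordered_set<int> setOfA(a, a + sizeA);
    std::vector<int> common;
    for (std::size_t i = 0; i < sizeB; ++i) {
        // erase() returns the number of elements removed (0 or 1 for a set)
        if (setOfA.erase(b[i]) > 0)
            common.push_back(b[i]);
    }
    return common;
}
```

With the arrays from the question the result holds 2, 7 and 6, in B's order.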
You can sort the two arrays
sort(A, A+sizeA);
sort(B, B+sizeB);
and use a merge-like algorithm to find their intersection:
#include <vector>
...
std::vector<int> intersection;
int idA=0, idB=0;
while(idA < sizeA && idB < sizeB) {
if (A[idA] < B[idB]) idA ++;
else if (B[idB] < A[idA]) idB ++;
else { // => A[idA] == B[idB], we have a common element
intersection.push_back(A[idA]);
idA ++;
idB ++;
}
}
The time complexity of this part of the code is linear. However, due to the sorting of the arrays, the overall complexity becomes O(n * log n), where n = max(sizeA, sizeB).
The additional memory required for this algorithm is optimal (equal to the size of the intersection).
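The standard library already ships this merge-like walk as std::set_intersection, so the hand-written loop can be replaced by a call to it. A minimal sketch, assuming a hypothetical wrapper intersect_sorted that takes its inputs by value so the caller's data keeps its original order:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sorts private copies of both inputs, then collects the common elements.
// The result is in ascending order.
std::vector<int> intersect_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

Unlike the hash-table version, a value that occurs several times in both inputs appears in the result as many times as in the input where it is rarer.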
saving array A in a data structure
Arrays are data structures; there's no need to save A into one.
i want to compare all the element till size A or B and then i will get intersection
This is extremely vague but isn't likely to yield the intersection; notice that you must examine every element in both A and B but "till size A or B" will ignore elements.
What approach should I follow to get the size of an unknown-size array and store it in a container?
It isn't possible to deal with arrays of unknown size in C unless they have some end-of-array sentinel that allows counting the number of elements (as is the case with NUL-terminated character arrays, commonly referred to in C as "strings"). However, the sizes of your arrays are known because their compile-time sizes are known. You can calculate the number of elements in such arrays with a macro:
#define ARRAY_ELEMENT_COUNT(a) (sizeof(a)/sizeof *(a))
...
int *ptr = new sizeof(A);
[Your question was originally tagged [C], and my comments below refer to that]
This isn't valid C -- new is a C++ keyword.
If you wanted to make copies of your arrays, you could simply do it with, e.g.,
int Acopy[ARRAY_ELEMENT_COUNT(A)];
memcpy(Acopy, A, sizeof A);
or, if for some reason you want to put the copy on the heap,
int* pa = malloc(sizeof A);
if (!pa) {
/* handle out-of-memory */
} else {
memcpy(pa, A, sizeof A);
/* After you're done using pa: */
free(pa);
}
[In C++ you would use new and delete]
However, there's no need to make copies of your arrays in order to find the intersection, unless you need to sort them (see below) but also need to preserve the original order.
There are a few ways to find the intersection of two arrays. If the values fall within the range of 0-63, you can use two 64-bit unsigned integers (for example uint64_t, since unsigned long is only 32 bits on some platforms) and set the bits corresponding to the values in each array, then use & (bitwise "and") to find the intersection. If the values aren't in that range but the difference between the largest and smallest is < 64, you can use the same method but subtract the smallest value from each value to get the bit number. If the range is not that small but the number of distinct values is <= 64, you can maintain a lookup table (array, binary tree, hash table, etc.) that maps the values to bit numbers and a 64-element array that maps bit numbers back to values.
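The simplest of these variants, where every value lies in 0-63, can be sketched as follows. The function names are hypothetical, and the code assumes the caller has already verified the value range; a value outside 0-63 would make the shift undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 64-bit mask with bit v set for every value v in the array.
// Assumes 0 <= v <= 63 for all elements.
std::uint64_t to_mask(const int* values, std::size_t count) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < count; ++i)
        mask |= std::uint64_t{1} << values[i];
    return mask;
}

// Intersects two small-valued arrays with a single bitwise AND, then
// decodes the surviving bits back into values in ascending order.
std::vector<int> intersect_bits(const int* a, std::size_t sizeA,
                                const int* b, std::size_t sizeB) {
    const std::uint64_t both = to_mask(a, sizeA) & to_mask(b, sizeB);
    std::vector<int> common;
    for (int bit = 0; bit < 64; ++bit)
        if (both & (std::uint64_t{1} << bit))
            common.push_back(bit);
    return common;
}
```

For the arrays in the question this yields 2, 6 and 7; duplicates in the inputs collapse naturally because a bit can only be set once.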
If your arrays may contain more than 64 distinct values, there are two effective approaches:
1) Sort each array and then compare them element by element to find the common values -- this algorithm resembles a merge sort.
2) Insert the elements of one array into a fast lookup table (hash table, balanced binary tree, etc.), and then look up each element of the other array in the lookup table.
Sort both arrays (e.g., qsort()) and then walk through both arrays one element at a time.
Where there is a match, add it to a third array, which is sized to match the larger of the two input arrays (your result array can be no larger than the largest of the two arrays). Use a negative or other "dummy" value as your terminator.
When walking through input arrays, where one value in the first array is larger than the other, move the index of the second array, and vice versa.
When you're done walking through both arrays, your third array has your answer, up to the terminator value.

What is half open range and off the end value

What do these terminologies mean in C++?
1. off the end value
2. half open range - [begin, off_the_end)
I came across them while reading about for loops.
A half-open range is one which includes the first element, but excludes the last one.
The range [1,5) is half-open, and consists of the values 1, 2, 3 and 4.
"off the end" or "past the end" refers to the element just after the end of a sequence, and is special in that iterators are allowed to point to it (but you may not look at the actual value, because it doesn't exist)
For example, in the following code:
char arr[] = {'a', 'b', 'c', 'd'};
char* first = arr;
char* last = arr + 4;
first now points to the first element of the array, while last points one past the end of the array. We are allowed to point one past the end of the array (but not two past), but we're not allowed to try to access the element at that position:
// legal, because first points to a member of the array
char firstChar = *first;
// illegal because last points *past* the end of the array
char lastChar = *last;
Our two pointers, first and last together define a range, of all the elements between them.
If it is a half open range, then it contains the element pointed to by first, and all the elements in between, but not the element pointed to by last (which is good, because it doesn't actually point to a valid element)
In C++, all the standard library algorithms operate on such half open ranges. For example, if I want to copy the entire array to some other location dest, I do this:
std::copy(first, last, dest);
A simple for-loop typically follows a similar pattern:
for (int i = 0; i < 4; ++i) {
// do something with arr[i]
}
This loop goes from 0 to 4, but it excludes the end value, so the range of indices covered is half-open, specifically [0, 4)
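A few properties make half-open ranges convenient: the element count is simply last - first, an empty range is written first == last, and a range splits cleanly at any midpoint without overlap. The sketch below illustrates these with two hypothetical helpers operating on int pointers.

```cpp
#include <cassert>
#include <cstddef>

// Sums the elements of the half-open range [first, last).
// The element at last is never read.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p)
        total += *p;
    return total;
}

// The number of elements in [first, last) is just the pointer difference.
std::ptrdiff_t range_length(const int* first, const int* last) {
    return last - first;
}
```

For an array int arr[] = {1, 2, 3, 4}, the ranges [arr, arr + 2) and [arr + 2, arr + 4) together cover exactly [arr, arr + 4), and [arr, arr) is a valid empty range whose sum is 0.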
These aren't C++ specific terms, they are general maths terms.
[] and () denote whether the range is inclusive/exclusive of the endpoint:
[ includes the endpoint
( excludes the endpoint
[] = 'Closed', includes both endpoints
() = 'Open', excludes both endpoints
[) and (] are both 'half-open', and include only one endpoint
Most C++ for-loops cover a half-open range (you include the first element: e.g for int i=0;, but exclude the final element: i < foo, not i ≤ foo)
As explained in other answers, half-open range is also a mathematical term; when it is used in a programming context, it is implied that the starting point is included and the end point is excluded.
What does it actually mean in the context of programming in C/C++? Let's say you are going to print the elements of an integer array. In C, a function that receives a pointer has no run-time knowledge of the size of the array, so you have two choices. Either you provide the size of the array, in which case the function signature will be as below:
void printArray(int * array, int size);
or you use a half-open range, which means you provide both a begin and an end pointer (the function processes everything from begin, inclusive, up to end, exclusive) in addition to the array itself. The function signature will then be as below:
void printArray(int * array, int * begin, int * end);
To illustrate, here is an example for providing the size of the array;
#include <stdio.h>
void printArray(int * array, int size)
{
printf("Array: ");
for(int i = 0; i < size; i++)
printf("%2d ", array[i]);
printf("\n");
}
int main()
{
int array[5] = { 1, 2, 3, 4, 5 };
printArray(array, 5);
return 0;
}
In the example above, we passed two parameters to the printArray function, as the function signature shows: the pointer to the first element of the array (or the array itself), and the size of the array.
However, as I have written above, we can also use a half-open range in the function signature, as shown below:
#include <stdio.h>
void printArray(int * array, int * begin, int * end)
{
printf("Array: ");
for(int * index = begin; index != end; index++)
printf("%2d ", *index);
printf("\n");
}
int main()
{
int array[5] = { 1, 2, 3, 4, 5 };
printArray(array, array, array+5);
return 0;
}
Both programs produce the same output, as can be seen below:
Array: 1 2 3 4 5
As you can see, the printArray function prints the elements in the range [begin, end). The index, which is actually a pointer to the elements of the integer array, starts from begin and includes it; the for-loop ends when index equals the end pointer, so end itself is not processed. This is called a half-open range.
Half-open range is the C++ convention.
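In C++ the same begin/end idea generalizes from pointers to any iterator type, which is why one function can serve built-in arrays and containers alike. A minimal sketch, assuming a hypothetical format_range template that returns the text instead of printing it:

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Formats every element of the half-open range [begin, end).
// Works with raw pointers and with container iterators.
template <typename Iterator>
std::string format_range(Iterator begin, Iterator end) {
    std::ostringstream out;
    out << "Array:";
    for (Iterator it = begin; it != end; ++it)
        out << ' ' << *it;
    return out.str();
}
```

std::begin(array) and std::end(array) supply the two ends for a built-in array, and v.begin() and v.end() do the same for a std::vector; an empty container simply yields begin == end and no elements are formatted.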