I started learning C++ and wanted to implement a simple 2D array and get its size without using std::vector. However, I ran into weird errors with the second dimension:
int **data = new int*[2];
for (int i = 0; i < 2; i++) {
    data[i] = new int[3];
}
data[0][0] = 1;
data[0][1] = 2;
data[0][2] = 3;
data[1][0] = 4;
data[1][1] = 5;
data[1][2] = 6;
data[1][25] = 20; //Should segfault? AAAAA
cout << "Data[1][25] = " << data[1][25] << endl; //Should segfault, no?
int n = sizeof(data[0]) / sizeof(int);
int m = sizeof(data) / sizeof(int);
cout << "M is " << m << " N is " << n << endl;// Reports m = 2, n =2?!?!? BBBB
At AAAA I should be getting a segfault, no? Instead I am able to assign a value and later read it back. The value of data[1][any] is zero, as if it had been initialized. This only happens in the second dimension; the first dimension behaves as expected.
Later, at BBBB, I am not getting an accurate size for n. Am I doing something wrong?
C++ does not do bounds checking on arrays. Accessing data outside the bounds of an array is undefined behavior, and anything can happen: it may cause a segfault, or it may not. If there are valid memory regions before or after the array, you can end up accessing or modifying that memory instead, which can corrupt other data used by your program.
Also, your use of sizeof is incorrect. sizeof is a compile-time construct; it cannot determine, at runtime, the size of an array reached through a pointer. If you need that kind of functionality, use std::array or std::vector.
char somearray[10];
int size = sizeof(somearray);       // result is 10. Yay, it works.

char *somearrayptr = new char[10];
int ptrsize = sizeof(somearrayptr); // the size of char*, not of char[10].
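For comparison, a minimal sketch (my own illustration, not from the original answer) of the standard-library containers that do know their size at runtime:

#include <array>
#include <iostream>
#include <vector>

int main() {
    std::array<char, 10> arr{};      // fixed size, known to the type itself
    std::vector<char> vec(10);       // dynamic size, tracked by the object

    std::cout << arr.size() << '\n'; // 10
    std::cout << vec.size() << '\n'; // 10
    return 0;
}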
At AAAA you have undefined behavior. Anything at all can happen from that point on -- and, more interestingly, even before it.
In standard C++ there is no such behavior as a 'segfault'. An implementation could define some operations to do that, but I'm not aware of any that ever bothered. It just happens by chance in some cases.
Accessing an array outside its boundaries is undefined behavior. So there is no reason to expect anything in particular will happen: it could crash, return the right answer, return the wrong answer, silently corrupt data in another part of the program, or a whole host of other possibilities.
data[1][25] = 20; //Should segfault? AAAAA
It would segfault if you were not allowed to access that location. There is no checking in C++ to see whether the location you are accessing is valid from the code's point of view.
You got an output because something happened to be stored at that location. It could have been anything. This is undefined behaviour, and you may not get the same result every time.
See this answer; though it talks about local variables, it gives nice examples of how such out-of-bounds accesses can be undefined behaviour.
data and data[0] are both pointers (single or double indirection doesn't matter here). Every implementation defines a size for its pointers; on your machine a pointer is twice the size of an int, hence the output. sizeof applied to a pointer that points at an array (as opposed to an object actually declared as an array, e.g. char a[10]) gives the size of the pointer.
Both data[0] and data are pointers. Pointers are typically 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, so m and n come out equal. With a typical 4-byte int, 8 / 4 gives the 2 you saw for both.
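A quick sketch of what sizeof is actually measuring here, assuming a typical 64-bit platform with 8-byte pointers and 4-byte int (the exact numbers vary by platform), which is why both m and n came out as 2:

#include <iostream>

int main() {
    int **data = new int*[2];
    data[0] = new int[3];

    std::cout << sizeof(data)     << '\n'; // size of an int** pointer, e.g. 8
    std::cout << sizeof(data[0])  << '\n'; // size of an int* pointer,  e.g. 8
    std::cout << sizeof(*data[0]) << '\n'; // size of one int,          e.g. 4

    delete[] data[0];
    delete[] data;
    return 0;
}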
Related
Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>

int main()
{
    std::cout << "Enter a positive integer: ";
    int length;
    std::cin >> length;

    int *array = new int[length]; // use array new. Note that length does not need to be constant!
    //int *array;
    std::cout << "I just allocated an array of integers of length " << length << '\n';

    for (int n = 0; n <= length + 1; n++)
    {
        array[n] = 1; // set element n to value 1
    }

    std::cout << "array[0] " << array[0] << '\n';
    std::cout << "array[length-1] " << array[length-1] << '\n';
    std::cout << "array[length] " << array[length] << '\n';
    std::cout << "array[length+1] " << array[length+1] << '\n';

    delete[] array; // use array delete to deallocate array
    array = 0;      // use nullptr instead of 0 in C++11

    return 0;
}
We dynamically create an array of length "length", but we are able to assign a value at index length+1. If we try length+2, we get an error.
Why is this? Why does C++ behave as if the array's length were length + 1?
It doesn’t. You’re allowed to calculate the address array + n, for the purpose of checking that another address is less than it. Trying to access the element array[n] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + n + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe the compiler stored the array at the top of your memory space, the address &array[n+1] is a null pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.
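For instance, a minimal sketch (my own illustration, not from the original answer) of an explicit bounds check on a raw array:

#include <iostream>

int main() {
    const int length = 3;
    int *array = new int[length];
    int i = 25; // imagine an index computed elsewhere

    if (i >= 0 && i < length)
        array[i] = 1; // safe: inside the buffer
    else
        std::cout << "refusing out-of-bounds write at index " << i << '\n';

    delete[] array;
    return 0;
}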
If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is that you are corrupting memory in the heap. In a small program this "seems to work" because the section of the heap you use isn't being used by any other code; in a larger one you will crash or behave randomly elsewhere, in a seemingly unrelated bit of code.
But arbitrary things could happen. The compiler could prove that a branch leads to an access beyond the end of an array and dead-code-eliminate the paths that lead to it (UB that travels back in time), or it could hit a protected memory region and crash, or it could corrupt heap-management data and cause a future new/delete to crash, or nasal demons, or whatever else.
In the for loop you are assigning to elements beyond the bounds of the array, and remember that C++ does not do bounds checking.
So when you initialize the array you are writing beyond its bounds. Say the user enters 3 for length: you assign 1 to array[0] through array[4], because the condition is n <= length + 1.
The behavior is unpredictable when you go beyond the array's bounds, but most likely your program will crash. In this case you go 2 elements past the end because you used <= in the condition together with length + 1.
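For reference, the corrected loop keeps the index strictly below length, touching exactly the elements that exist:

for (int n = 0; n < length; n++) // n runs 0 .. length-1: all in bounds
{
    array[n] = 1;
}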
There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.
I am running the following code, where I declare a dynamic 2D array and then assign values at column indexes higher than the number of columns actually allocated for the array. However, the code runs perfectly and I don't get an error, which I believe I should.
void main() {
    unsigned char **bitarray = NULL;
    bitarray = new unsigned char*[96];
    for (int j = 0; j < 96; j++)
    {
        bitarray[j] = new unsigned char[56];
        if (bitarray[j] == NULL)
        {
            cout << "Memory could not be allocated for 2D Array.";
            return; // return if memory not allocated
        }
    }
    bitarray[0][64] = '1';
    bitarray[10][64] = '1';
    cout << bitarray[0][64] << " " << bitarray[10][64];
    getch();
    return;
}
The link to the output I get is here. (The values are actually assigned accurately; I don't know why, though.)
In C++, accessing a buffer out of its bounds invokes undefined behavior (not a trapped error, as you expected).
The C++ specification defines the term undefined behavior as:
behavior for which this International Standard imposes no requirements.
In your code, both
bitarray[0][64] = '1';
bitarray[10][64] = '1';
are accessing memory out of bounds, i.e., those memory locations are "invalid". Accessing invalid memory invokes undefined behaviour.
The access violation error or segmentation fault is one of the many possible outcomes of UB. Nothing is guaranteed.
From the wiki page for segmentation fault,
On systems using hardware memory segmentation to provide virtual memory, a segmentation fault occurs when the hardware detects an attempt to refer to a non-existent segment, or to refer to a location outside the bounds of a segment, .....
so, maybe, just maybe, the memory for bitarray[0][64] lies inside an allocated page (segment) that is accessible (but still invalid) to the program, in this very particular case. That does not mean it always will be.
That said, void main() is not a correct signature for the main() function. The standard (C++11 §3.6.1) signature is int main().
C++11 introduced std::array, whose at() member function performs out-of-bounds checking.
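A minimal sketch of that checked access, sized like the question's rows; at() throws std::out_of_range instead of silently scribbling past the end:

#include <array>
#include <iostream>
#include <stdexcept>

int main() {
    std::array<unsigned char, 56> row{};
    try {
        row.at(64) = '1'; // index 64 in a 56-element array: throws
    } catch (const std::out_of_range &e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    return 0;
}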
I have been struggling to find an explanation for an error I get in the following code:
#include <stdlib.h>

int main() {
    int m = 65536;
    int n = 65536;
    float *a;
    a = (float *)malloc(m * n * sizeof(float));
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            a[i*n + j] = 0;
        }
    }
    return 0;
}
Why do I get an "Access Violation" error when executing this program?
The memory allocation is successful; the problem is in the nested for loops at some iteration count. I tried smaller values of m and n and the program works.
Does this mean I ran out of memory?
The problem is that m*n*sizeof(float) is likely an overflow, resulting in a relatively small value. Thus the malloc works, but it does not allocate as much memory as you're expecting and so you run off the end of the buffer.
Specifically, if your ints are 32 bits wide (which is common), then 65536 * 65536 is already an overflow, because you would need at least 33 bits to represent it. Signed integer overflow in C++ (and in C) is undefined behavior, but a common result is that the most significant bits are lopped off and you're left with the lower ones. In your case, that gives 0. That's then multiplied by sizeof(float), but zero times anything is still zero.
So you've tried to allocate 0 bytes. It turns out that malloc will let you do that, and it will give back a valid pointer rather than a null pointer (which is what you'd get if the allocation failed). (See Edit below.)
So you have a valid pointer, but it's not valid to dereference it. That fact that you are able to dereference it at all is a side-effect of the implementation: In order to generate a unique address that doesn't get reused, which is what malloc is required to do when you ask for 0 bytes, malloc probably allocated a small-but-non-zero number of bytes. When you try to reference far enough beyond those, you'll typically get an access violation.
EDIT:
It turns out that what malloc does when requesting 0 bytes may depend on whether you're using C or C++. In the old days, the C standard required a malloc of 0 bytes to return a unique pointer as a way of generating "special" pointer values. In modern C++, a malloc of 0 bytes is undefined (see Footnote 35 in Section 3.7.4.1 of the C++11 standard). I hadn't realized malloc's API had changed in this way when I originally wrote the answer. (I love it when a newbie question causes me to learn something new.) VC++2013 appears to preserve the older behavior (returning a unique pointer for an allocation of 0 bytes), even when compiling for C++.
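One way to guard against the overflow up front, sketched here as an illustration of my own (do the arithmetic in size_t and compare against SIZE_MAX before ever calling malloc):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    std::size_t m = 65536, n = 65536; // size_t, so the multiplications below are well defined

    if (n != 0 && m > SIZE_MAX / n / sizeof(float)) {
        std::puts("m * n * sizeof(float) would overflow size_t");
        return 1;
    }

    float *a = static_cast<float *>(std::malloc(m * n * sizeof(float)));
    if (a == nullptr) {
        std::puts("allocation failed"); // 16 GiB on a 64-bit system: may well fail anyway
        return 1;
    }
    std::free(a);
    return 0;
}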
You are a victim of two problems.
First, the size calculation:
As some people have pointed out, you are exceeding the range of size_t. You can verify the size you are trying to allocate with this code (it needs <iostream>, <cstdint> and <climits> for cout, SIZE_MAX and INT_MAX):
cout << "Max size_t is: " << SIZE_MAX<<endl;
cout << "Max int is : " << INT_MAX<<endl;
long long lsz = static_cast<long long>(m)*n*sizeof(float); // long long to see theoretical result
size_t sz = m*n*sizeof(float); // real result with overflow as will be used by malloc
cout << "Expected size: " << lsz << endl;
cout << "Requested size_t:" << sz << endl;
You'll be surprised, but with MSVC13 you are asking for 0 bytes because of the overflow (!!). You might get a different number with another compiler, but still a lower-than-expected size.
Second, malloc() might return a problematic pointer:
The call to malloc() can appear successful because it does not return nullptr, yet the allocated memory may be smaller than expected. Even requesting 0 bytes can appear to succeed, as documented here: if size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.
float *a = reinterpret_cast<float*>(malloc(m * n * sizeof(float))); // prefer C++ casts in C++ code
if (a == nullptr)
    cout << "Big trouble !"; // will not be called
Alternatives
If you absolutely want to stay with C-style allocation, prefer calloc(): you'll at least get a null pointer back, because the function notices that the multiplication would overflow:
float *b = reinterpret_cast<float*>(calloc(m,n*sizeof(float)));
But a better approach would be to use the operator new[]:
float *c = new (std::nothrow) float[m*n]; // this is the C++ way to do it
if (c == nullptr)
    cout << "new Big trouble !";
else {
    cout << "\nnew Array: " << c << endl;
    c[n*m - 1] = 3.0; // check that the last element is accessible
}
Edit:
It's also subject to the size_t limit.
Edit 2:
new[] throws bad_alloc exceptions when there is a problem, or even bad_array_new_length. You could try/catch these if you want. But if you prefer to get nullptr when there's not enough memory, you have to use (std::nothrow) as pointed out in the comments by Beat.
The best approach in your case, if you really need this huge number of floats, is to go for vectors. They are also subject to the size_t limitation, but since you in fact have a 2D array, you can use a vector of vectors (if you have enough memory):
vector <vector<float>> v (n, vector<float>(m));
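A sketch of that approach; with vectors, an allocation failure surfaces as a std::bad_alloc exception rather than as a misleadingly valid pointer:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

int main() {
    const std::size_t m = 65536, n = 65536;
    try {
        std::vector<std::vector<float>> v(n, std::vector<float>(m));
        v[n - 1][m - 1] = 3.0f; // the last element really is reachable
    } catch (const std::bad_alloc &) {
        std::cout << "not enough memory for the full 2D array\n";
    }
    return 0;
}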
According to the correct answer in Static array vs. dynamic array in C++, static arrays have fixed sizes.
However, this compiles and runs just fine:
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
    int myArray[2];
    myArray[0] = 0;
    myArray[1] = 1;
    cout << myArray[0] << endl;
    cout << myArray[1] << endl;
    myArray[4];
    myArray[2] = 2;
    myArray[3] = 3;
    cout << myArray[2] << endl;
    cout << myArray[3] << endl;
    return 0;
}
Does this mean a static array can be resized?
You're not actually enlarging the array. Let's see your code in detail:
int myArray[2];
myArray[0] = 0;
myArray[1] = 1;
You create an array of two positions, with indexes from 0 to 1. So far, so good.
myArray[4];
You're accessing the fifth element in the array (an element which surely does not exist in the array). This is undefined behaviour: anything can happen. You're not doing anything with that element, but that is not important.
myArray[2] = 2;
myArray[3] = 3;
Now you are accessing the third and fourth elements and changing their values. Again, this is undefined behaviour: you are changing memory locations near the created array, but "nothing else". The array itself remains the same.
Actually, you could check the size of the array by doing:
std::cout << sizeof( myArray ) / sizeof( int ) << std::endl;
You'll see that the size of the array has not changed. By the way, this trick only works in the same function in which the array is declared; as soon as you pass the array around, it decays into a pointer.
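A small sketch of that decay (my own illustration, using a 6-element array so the two numbers visibly differ on a typical 64-bit system):

#include <iostream>

void take(int arr[6]) {                   // the parameter is really int*: the array has decayed
    std::cout << sizeof(arr) << '\n';     // size of int*, e.g. 8 (compilers often warn here)
}

int main() {
    int myArray[6];
    std::cout << sizeof(myArray) << '\n'; // 6 * sizeof(int), e.g. 24: still a real array
    take(myArray);                        // decays to int* at the call
    return 0;
}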
In C++, array boundaries are not checked; that is mainly why you did not receive any error or warning. But again, accessing elements beyond the array limit is undefined behaviour: an error that may not show up immediately (something that looks fine but actually is not). The program may even appear to end without problems.
No, not a chance in hell. All you've done is illegally access it outside its bounds. The fact that this happens not to raise an error for you is utterly irrelevant. It is thoroughly UB.
First, this is not a static array; it is an array with automatic storage duration.
Next, the
myArray[4];
is not a new declaration; it is a discarded read of element #4 of the previously declared 2-element array -- undefined behavior.
Assignments that follow
myArray[2] = 2;
myArray[3] = 3;
write to memory that is not allocated to your program -- undefined behavior as well.
I've been studying C++ for a test and I am currently stuck with pointer arithmetic.
The basic problem is the following:
const int numColumns = 3; // const, so the array bounds are compile-time constants (standard C++)
const int numRows = 4;
int a[numRows][numColumns];
a[0][0] = 1;
a[0][1] = 2;
a[0][2] = 3;
a[1][0] = 4;
a[1][1] = 5;
a[1][2] = 6;
a[2][0] = 7;
a[2][1] = 8;
a[2][2] = 9;
a[3][0] = 10;
a[3][1] = 11;
a[3][2] = 12;
for (int i = numColumns - 1; i > -1; i--)
{
    cout << a[numRows-1][i] << endl;
}
A very simple program which prints the bottom row of the matrix in reverse, i.e. 12, 11, 10.
Now I am trying to do the equivalent with a int*.
What my classmates told me is to think of it like this:
array[i][j] == p[numColumns*i+j]
If that is correct, shouldn't the following be equivalent to what I'm looking for:
int* p = reinterpret_cast<int*>(a);
for (int i = numColumns - 1; i > -1; i--)
{
    cout << p[numColumns*(numRows-1) + i] << endl;
}
Thanks.
int array[3][5] is NOT an abstraction (in the C++ language) for int array[3*5]. The standard says that a 2-dimensional array (and N-dimensional arrays in general) is an array of arrays: array[3][5] is an array of three elements, where each element is an array containing 5 elements (integers in this case). C++'s type system does make that distinction.
According to the C++ standard, an array T array[N] is a contiguous block of memory containing the N elements of type T. That means a multidimensional array, say int array[3][5], is a contiguous block of memory containing 3 int[5] arrays, and each int[5] array is in turn a contiguous block of 5 ints.
On my machine, the memory ends up laid out exactly as you would expect - identical to int array[3*5]. The way the memory is treated is different however, due to the type system (which distinguishes between int[] and int[][]). This is why you need to use a reinterpret_cast which essentially tells your compiler "take this memory and without doing any conversion, treat it like this new type".
I'm not completely sure if this memory layout is guaranteed however. I couldn't find anything in the standard stating that arrays can't be padded. If they can be padded (again, I'm not sure) then it's possible that the int[5] array is not actually 5 elements long (a better example would be char[5], which I could see being padded to 8 bytes).
Also there is an appreciable difference between int* and int** since the latter doesn't guarantee contiguous memory.
EDIT: The reason C++ distinguishes between int[3*5] and int[3][5] is that it wants to guarantee the order of the elements in memory. In C++, array[0][1] and array[0][2] are sizeof(int) apart in memory. In Fortran, by contrast, array[0][0] and array[1][0] are sizeof(int) apart, because Fortran uses column-major representation.
Here's a diagram to help explain:
0 1 2
3 4 5
6 7 8
Can be made into an array that looks like {0,1,2,3,4,5,6,7,8} (row major) or an array that looks like {0,3,6,1,4,7,2,5,8} (column major).
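A sketch that makes the row-major claim concrete (pedantically, walking past the first row through a single int* is itself questionable under the standard, but it demonstrates the layout described above):

#include <iostream>

int main() {
    int array[3][5];
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 5; j++)
            array[i][j] = i * 5 + j; // fill with 0 .. 14 in row-major order

    int *p = &array[0][0];           // address of the very first element
    for (int k = 0; k < 3 * 5; k++)
        std::cout << p[k] << ' ';    // prints 0 1 2 ... 14: the rows are adjacent
    std::cout << '\n';
    return 0;
}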
Hint: in your original code, the type of a is not int*, so you shouldn't cast it straight to int*. Loosely speaking it behaves like a pointer to rows (not literally an int**, but not a plain pointer to int either).
If you want to access it like a 1-D array, then a has to be defined as a 1-D array as well.
@rwong: Really? I thought that multi-dimensional arrays were just an "abstraction" for us, since the following are equivalent:
int array[3][5];
int array[3*5];
Anyways, I determined what was wrong. As usual it was not my code: I had copy-pasted someone else's code and worked from there.
What I had was this:
for (int i = numRows-1; i > -1; i++) // bug: i++ never moves toward -1; it should be i--
{
    cout << p[numColumns*numRows-1+i] << endl; // bug: missing parentheses around (numRows-1)
}
It's funny, because I did not copy-paste my code from VS, but actually wrote it from scratch to "illustrate" my error.
Lesson to be learnt here ;)
Edit: I'm still not sure about what rwong explained here. Would anyone care to elaborate?
Another way to think about it: since a is, loosely, a pointer to rows, is there a part of a that's similar to an int*?
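To make the hint concrete, here is a minimal sketch (my own illustration, not from the thread). A 2D array does not decay to int**; it decays to a pointer to its first row, and each row in turn decays to an int*:

#include <iostream>
using namespace std;

int main() {
    int a[4][3] = {{1,2,3},{4,5,6},{7,8,9},{10,11,12}};

    int (*rows)[3] = a;   // a decays to "pointer to array of 3 ints"
    int *firstRow = a[0]; // a[0] decays to int*: the int*-like part of a

    cout << rows[3][2] << endl;  // 12, same as a[3][2]
    cout << firstRow[2] << endl; // 3, same as a[0][2]

    // Casting the whole block to int* and indexing row-major reads the same 12:
    int *p = reinterpret_cast<int*>(a);
    cout << p[3*3 + 2] << endl;  // numColumns*i + j with i = 3, j = 2
    return 0;
}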