MSVC Access Violation when setting array elements - c++

I have been struggling in finding an explanation to an error I get in the following code:
#include <stdlib.h>

int main() {
    int m = 65536;
    int n = 65536;
    float *a;
    a = (float *)malloc(m * n * sizeof(float));
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            a[i*n + j] = 0;
        }
    }
    return 0;
}
Why do I get an "Access Violation" Error when executing this program?
The memory allocation is successful; the problem is in the nested for loops, at some iteration count. I tried with smaller values of m and n and the program works.
Does this mean I ran out of memory?

The problem is that m*n*sizeof(float) is likely an overflow, resulting in a relatively small value. Thus the malloc works, but it does not allocate as much memory as you're expecting and so you run off the end of the buffer.
Specifically, if your ints are 32 bits wide (which is common), then 65536 * 65536 is already an overflow, because you would need at least 33 bits to represent it. Signed integer overflow in C++ (and I believe in C) results in undefined behavior, but a common result is that the most significant bits are lopped off, and you're left with the lower ones. In your case, that gives 0. That's then multiplied by sizeof(float), but zero times anything is still zero.
So you've tried to allocate 0 bytes. It turns out that malloc will let you do that, and it will give back a valid pointer rather than a null pointer (which is what you'd get if the allocation failed). (See Edit below.)
So you have a valid pointer, but it's not valid to dereference it. That fact that you are able to dereference it at all is a side-effect of the implementation: In order to generate a unique address that doesn't get reused, which is what malloc is required to do when you ask for 0 bytes, malloc probably allocated a small-but-non-zero number of bytes. When you try to reference far enough beyond those, you'll typically get an access violation.
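If you want to see the fix in action, here is a minimal sketch (my code, not the original poster's) that does the size arithmetic in a 64-bit type and verifies the result fits in size_t before calling malloc:

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    const std::uint64_t m = 65536, n = 65536;
    const std::uint64_t bytes = m * n * sizeof(float); // 16 GiB; no overflow in 64 bits
    if (bytes > SIZE_MAX) {                            // request not even expressible on this platform
        std::puts("allocation request exceeds size_t");
        return 1;
    }
    float *a = static_cast<float*>(std::malloc(static_cast<std::size_t>(bytes)));
    if (a == nullptr) {                                // malloc can still legitimately fail for 16 GiB
        std::puts("malloc failed");
        return 1;
    }
    std::free(a);
    return 0;
}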
EDIT:
It turns out that what malloc does when requesting 0 bytes may depend on whether you're using C or C++. In the old days, the C standard required a malloc of 0 bytes to return a unique pointer as a way of generating "special" pointer values. In modern C++, a malloc of 0 bytes is undefined (see Footnote 35 in Section 3.7.4.1 of the C++11 standard). I hadn't realized malloc's API had changed in this way when I originally wrote the answer. (I love it when a newbie question causes me to learn something new.) VC++2013 appears to preserve the older behavior (returning a unique pointer for an allocation of 0 bytes), even when compiling for C++.

You are the victim of two problems.
First the size calculation:
As some people have pointed out, you are exceeding the range of size_t. You can verify the size that you are trying to allocate with this code:
cout << "Max size_t is: " << SIZE_MAX<<endl;
cout << "Max int is : " << INT_MAX<<endl;
long long lsz = static_cast<long long>(m)*n*sizeof(float); // long long to see theoretical result
size_t sz = m*n*sizeof(float); // real result with overflow as will be used by malloc
cout << "Expected size: " << lsz << endl;
cout << "Requested size_t:" << sz << endl;
You'll be surprised, but with MSVC13 you are asking for 0 bytes because of the overflow (!!). You might get another number with a different compiler (but still a lower-than-expected size).
Second, malloc() might return a problematic pointer:
The call to malloc() can appear successful because it does not return nullptr, yet the allocated memory may be smaller than expected. Even requesting 0 bytes can appear successful, as documented here: If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.
float *a = reinterpret_cast<float*>(malloc(m*n*sizeof(float))); // prefer C++-style casts in the future
if (a == nullptr)
    cout << "Big trouble !"; // will not be called
Alternatives
If you absolutely want to use C, prefer calloc(): you'll at least get a null pointer back, because the function notices that the multiplication overflows:
float *b = reinterpret_cast<float*>(calloc(m, n*sizeof(float)));
But a better approach would be to use the operator new[]:
float *c = new (std::nothrow) float[m*n]; // this is the C++ way to do it
if (c == nullptr)
    cout << "new Big trouble !";
else {
    cout << "\nnew Array: " << c << endl;
    c[n*m - 1] = 3.0; // check that the last element is accessible
}
Edit:
It's also subject to the size_t limit.
Edit 2:
new[] throws bad_alloc exceptions when there is a problem, or even bad_array_new_length. You could try/catch these if you want. But if you prefer to get nullptr when there's not enough memory, you have to use (std::nothrow) as pointed out in the comments by Beat.
The best approach for your case, if you really need this huge number of floats, would be to go for vectors. They are also subject to the size_t limitation, but since you in fact have a 2D array, you can use vectors of vectors (if you have enough memory):
vector <vector<float>> v (n, vector<float>(m));
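For completeness, a short sketch (my addition, with smaller dimensions) of how the vector-of-vectors version is used:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    const size_t n = 1024, m = 1024;              // smaller sizes, just for the sketch
    vector<vector<float>> v(n, vector<float>(m)); // every element zero-initialized

    v[n - 1][m - 1] = 3.0f;                       // each row knows its own size
    cout << v.size() << " rows of " << v[0].size() << " floats" << endl;
    return 0;
}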

Related

C++ new[] operator creates array of length = length + 1?

Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>

int main()
{
    std::cout << "Enter a positive integer: ";
    int length;
    std::cin >> length;

    int *array = new int[length]; // use array new. Note that length does not need to be constant!
    //int *array;
    std::cout << "I just allocated an array of integers of length " << length << '\n';

    for (int n = 0; n <= length+1; n++)
    {
        array[n] = 1; // set element n to value 1
    }

    std::cout << "array[0] " << array[0] << '\n';
    std::cout << "array[length-1] " << array[length-1] << '\n';
    std::cout << "array[length] " << array[length] << '\n';
    std::cout << "array[length+1] " << array[length+1] << '\n';

    delete[] array; // use array delete to deallocate array
    array = 0; // use nullptr instead of 0 in C++11
    return 0;
}
We dynamically create an array of length "length" but we are able to assign a value at the index length+1. If we try to do length+2, we get an error.
Why is this? Why does C++ make the length = length + 1?
It doesn’t. You’re allowed to calculate the address array + n, for the purpose of checking that another address is less than it. Trying to access the element array[n] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + n + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe a compiler stored the array at the top of your memory space, the address &array[n+1] is a NULL pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.
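To make "check your accesses" concrete, here is a small sketch (my addition) using std::vector::at(), which bounds-checks and throws std::out_of_range instead of silently corrupting memory:

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    int length = 3;
    std::vector<int> array(length, 1); // every element set to 1

    try {
        array.at(length + 1) = 1;      // out of bounds: throws instead of scribbling on the heap
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    return 0;
}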
If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is that you are corrupting memory in the heap. In a small program this "seems to work" because the section of the heap you use isn't being used by any other code; in a larger one you will crash or behave randomly elsewhere, in a seemingly unrelated bit of code.
But arbitrary things could happen. The compiler could prove that a branch leads to access beyond the end of an array and dead-code-eliminate paths that lead to it (UB that time-travels), or it could hit a protected memory region and crash, or it could corrupt heap management data and cause a future new/delete to crash, or nasal demons, or whatever else.
In the for loop you are assigning elements beyond the bounds of the array, and remember that C++ does not do bounds checking.
So when you initialize the array you are writing beyond its end. Say the user enters 3 for length: you are assigning 1 to array[0] through array[4], because the condition is n <= length + 1.
The behavior of the array is unpredictable when you go beyond its bounds, but most likely your program will crash. In this case you are going 2 elements beyond its bounds, because you have used <= in the condition together with length + 1.
There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.

Trouble with listing elements in a pointer

I am working on a program in C++ in which the user can add phone numbers to a list. For this assignment, we have to use pointers while dynamically allocating the memory needed. The code below works fine, except for the fact that when the program lists the elements in the pointer, random numbers are spit out. I'm new to C++, so any way I could be pointed in the right direction of fixing this issue is greatly appreciated.
int *FirstArray = new int(size);
int *SecondArray = new int(size + 1);

if (size == 0) {
    cout << "Please enter the number which you would like to add";
    cin >> FirstArray[size];
    for (int x = 0; x <= size; x++) {
        cout << x << ". " << FirstArray[x] << endl;
    }
    for (int x = 0; x <= size; x++) {
        FirstArray[x] = SecondArray[x];
    }
    SecondArray = FirstArray;
    delete (FirstArray);
}
else {
    cout << "Please enter the number which you would like to add";
    cin >> SecondArray[size];
    for (int x = 0; x <= size; x++) {
        cout << x + 1 << ". " << SecondArray[x] << endl;
    }
}
size++;
Apart from the fact that a std::vector would really be the better choice for such an application, I think learning about pointers is a good starting point for understanding why using the std containers is better.
The whole if (size == 0) block in your code snippet is unsafe, and in further consequence so is the else block, because FirstArray[x] reads from memory which is not allocated for any x > 0.
So-called segmentation faults are then very likely in such cases, though they may be deferred by a debugger-friendly memory layout or other reasons.
Besides the fact that you never really had a list, just two values referred to by two single-element arrays (or just pointers), it's then clear why you get only random numbers from the memory pointed to by the pointers.
A pointer in C (or C++) does not restrict access to the elements following the first element.
This means that pointers can be used both for single values (which is exactly the same as an array with size == 1) and for arrays with more than one element.
Some more issues...
Use new int[] rather than new int(), because in this context parentheses () are understood by the compiler as the argument list to the compiler-generated 'constructor' of the data type int, which in the case of int() just sets the value. C++ consequently applies its type paradigms to primitive types as well, not only to classes. See another SO article on this topic.
Using new int[size] instead does what you want. It allocates memory for an integer array with 'size' elements and returns the pointer to the first element.
I think you do not need a SecondArray. A statement like SecondArray = FirstArray does not copy the elements anyway; it copies the pointers, leaving the memory allocated to SecondArray behind as a memory leak.
Then deleting FirstArray with delete (FirstArray) makes it even worse, because you delete FirstArray and SecondArray at once (both point to the same memory location), and any further access to SecondArray would be dangerous (segfault etc.). Note also that once you switch to new int[size], the matching deallocation is delete[] FirstArray.
Incrementing size++ at the end is likewise in vain (if I got your idea right), because the size should be known before you allocate and access the memory, not afterwards.
Resizing the array when 'size' changes cannot be done in place with new: you have to allocate a bigger array, copy the elements over, and delete[] the old one (which is essentially what the std containers do internally). Alternatively, you can give up the C++ style and switch to ANSI C, with malloc() for the initial allocation, realloc() for resizing, memcpy() for copying/assignment and finally free() for deallocation. Switching to ANSI C style in this case doesn't mean that you are not allowed to use it in a C++ context. BTW, in most standard C++ frameworks the new operator and the delete operator call malloc() and free() behind the scenes.
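To make the copy-and-grow pattern concrete, here is a minimal sketch (a hypothetical helper of my own, not part of the assignment code) that appends one element to a dynamically allocated int array:

#include <algorithm> // std::copy

// Grow 'arr' (currently holding 'size' elements) by one and append 'value'.
// Returns the new array; the caller owns it and must delete[] it later.
int* appendOne(int* arr, int size, int value) {
    int* bigger = new int[size + 1];    // allocate space for one more element
    std::copy(arr, arr + size, bigger); // copy the old elements over
    bigger[size] = value;               // append the new value
    delete[] arr;                       // release the old block
    return bigger;
}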
At the end of the day, using std::vector<> can make life MUCH easier ;-)

C++ dynamically allocated 2D array overallocate memory?

I started learning C++ and I wanted to implement a simple 2D array and get its size without using std::vector. However, I ran into weird errors with my second dimension:
int **data = new int*[2];
for (int i = 0; i < 2; i++) {
    data[i] = new int[3];
}
data[0][0] = 1;
data[0][1] = 2;
data[0][2] = 3;
data[1][0] = 4;
data[1][1] = 5;
data[1][2] = 6;
data[1][25] = 20; // Should segfault? AAAA
cout << "Data[1][25] = " << data[1][25] << endl; // Should segfault, no?
int n = sizeof(data[0]) / sizeof(int);
int m = sizeof(data) / sizeof(int);
cout << "M is " << m << " N is " << n << endl; // Reports m = 2, n = 2?!?!? BBBB
At AAAA I should be getting a segfault, no? Instead I am able to assign a value and later read it. The value of data[1][anything] is zero, as if it had been initialized. This is only a problem in the second dimension; the first dimension behaves as expected.
Later, at BBBB, I am not getting an accurate size for n. Am I doing something wrong?
C++ does not do bounds checking on arrays. Accessing data outside the bounds of an array is undefined behavior and anything can happen. It may cause a segfault, or it may not. If you have valid memory regions before or after the array, you can end up accessing or modifying that memory instead. This can lead to corruption of other data used by your program.
Also, your use of sizeof is incorrect. sizeof is a compile-time construct. It cannot be used to determine the size of an array through a pointer value at runtime. If you need that type of functionality, use std::array or std::vector.
char somearray[10];
int size = sizeof(somearray); // result is 10. Yay, it works.

char *somearrayptr = new char[10];
int size2 = sizeof(somearrayptr); // size2 = the size of char*, not char[10]
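By contrast, the standard containers carry their size with them, so querying it at run time works as expected (a minimal sketch of my own):

#include <array>
#include <iostream>
#include <vector>

int main() {
    std::array<char, 10> arr{}; // fixed size, part of the type
    std::vector<char> vec(10);  // dynamic size, tracked by the object

    std::cout << arr.size() << '\n'; // 10
    std::cout << vec.size() << '\n'; // 10; the size travels with the object
    return 0;
}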
At AAAA you have undefined behavior. Just about anything can happen from that point on, and, more interestingly, even before it.
In standard C++ there is no such behavior as a 'segfault'. An implementation could define some operations to do that, but I'm not aware of any that ever bothered. It just happens by chance in some cases.
Accessing an array outside its boundaries is undefined behavior. So there is no reason to expect anything in particular will happen: it could crash, return the right answer, return the wrong answer, silently corrupt data in another part of the program, or a whole host of other possibilities.
data[1][25] = 20; //Should segfault? AAAAA
It would segfault if you were not allowed to access the location. There is no checking in C++ to see whether the location you are accessing is valid from the code's point of view.
You obtained an output because that value happened to be stored at that location. It could have been anything. This is undefined behaviour and you may not get the same result every time.
See this answer; though it talks about local variables, it gives nice examples of how such access can be undefined behaviour.
data and data[0] are both pointers (it doesn't matter that one is single and one is double). They have a defined size on every implementation. In your case, the size of a pointer is twice the size of an int on your machine; hence the output. sizeof, when used on a pointer that points to an array (and not on something declared as an array, char a[] etc.), gives the size of the pointer.
Both data[0] and data are pointers. Pointers are size 4 on a 32-bit system and 8 on a 64-bit system. Therefore m and n are equal. The size of int is typically 4 (and evidently is here).

Pointer or Value in my case?

bool example1()
{
    long a;
    a = 0;
    cout << a;
    a = 1;
    cout << a;
    a = 2;
    cout << a;
    //and again...again until
    a = 1000000;
    cout << a + 1;
    return true;
}

bool example2()
{
    long* a = new long; //sorry for the mistake
    *a = 0;
    cout << *a;
    *a = 1;
    cout << *a;
    *a = 2;
    cout << *a;
    //and again...again until
    *a = 1000000;
    cout << *a + 1;
    return true;
}
Note that I do not delete a in example2(); just a newbie's questions:
1. While the two functions are executing, which one uses more memory?
2. After the functions return, which one makes the whole program use more memory?
Thanks for your help!
UPDATE: just replace long* a; with long* a = new long;
UPDATE 2: to avoid the case that we are not doing anything with a, I cout the value each time.
Original answer
It depends and there will be no difference, at the same time.
The first program is going to consume sizeof(long) bytes on the stack, and the second is going to consume sizeof(long*). Typically long* will be at least as big as a long, so you could say that the second program might use more memory (depends on the compiler and architecture).
On the other hand, stack memory is allocated with OS memory page granularity (4KB would be a good estimate), so both programs are almost guaranteed to use the same number of memory pages for the stack. In this sense, from the viewpoint of someone observing the system, memory usage is going to be identical.
But it gets better: the compiler is free to decide (depending on settings) that you are not really doing anything with these local variables, so it might decide to simply not allocate any memory at all in both cases.
And finally you have to answer the "what does the pointer point to" question (as others have said, the way the program is currently written it will almost surely crash due to accessing invalid memory when it runs).
Assuming that it does not (let's say the pointer is initialized to a valid memory address), would you count that memory as being "used"?
Update (long* a = new long edit):
Now we know that the pointer will be valid, and heap memory will be allocated for a long (but not released!). Stack allocation is the same as before, but now example2 will also use at least sizeof(long) bytes on the heap as well (in all likelihood it will use even more, but you can't tell how much because that depends on the heap allocator in use, which in turn depends on compiler settings etc).
Now from the viewpoint of someone observing the system, it is still unlikely that the two programs will exhibit different memory footprints (because the heap allocator will most likely satisfy the request for the new long in example2 from memory in a page that it has already received from the OS), but there will certainly be less free memory available in the address space of the process. So in this sense, example2 would use more memory. How much more? Depends on the overhead of the allocation which is unknown as discussed previously.
Finally, since example2 does not release the heap memory before it exits (i.e. there is a memory leak), it will continue using heap memory even after it returns while example1 will not.
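For reference, a sketch (my addition, not part of the question) of example2 without the leak; a std::unique_ptr releases the heap allocation automatically when the function returns:

#include <iostream>
#include <memory>

bool example2_fixed()
{
    std::unique_ptr<long> a = std::make_unique<long>(0); // heap long, freed on return
    std::cout << *a;
    *a = 1000000;
    std::cout << *a + 1;
    return true; // no delete needed, no leak
}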
There is only one way to know, which is by measuring. Since you never actually use any of the values you assign, a compiler could, under the "as-if rule" simply optimize both functions down to:
bool example1()
{
    return true;
}

bool example2()
{
    return true;
}
That would be a perfectly valid interpretation of your code under the rules of C++. It's up to you to compile and measure it to see what actually happens.
Sigh, an edit to the question made a difference to the above. The main point still stands: you can't know unless you measure it. Now both of the functions can be optimized to:
bool example1()
{
    cout << 0;
    cout << 1;
    cout << 2;
    //and again...again until
    cout << 1000001;
    return true;
}

bool example2()
{
    cout << 0;
    cout << 1;
    cout << 2;
    //and again...again until
    cout << 1000001;
    return true;
}
example2() never allocates memory for the value referenced by pointer a (this refers to the question as originally posted, before the new long update). If it did, it would take slightly more memory, because it would require the space for a long as well as space for the pointer to it.
Also, no matter how many times you assign a value to a, no more memory is used.
Example 2 has the problem of not allocating memory for the pointer to point to. Pointer a initially has an unknown value, which makes it point somewhere arbitrary in memory; assigning values through this pointer corrupts the contents of that somewhere.
Both examples use the same amount of memory (4 bytes each, on a typical 32-bit system).

In C++, what happens when the delete operator is called?

In C++, I understand that the delete operator, when used with an array, 'destroys' it, freeing the memory it used. But what happens when this is done?
I figured my program would just mark off the relevant part of the heap being freed for re-usage, and continue on.
But I also noticed that the first element of the array is set to null, while the other elements are left unchanged. What purpose does this serve?
int * nums = new int[3];
nums[0] = 1;
nums[1] = 2;
cout << "nums[0]: " << *nums << endl;
cout << "nums[1]: " << *(nums+1) << endl;
delete [] nums;
cout << "nums[0]: " << *nums << endl;
cout << "nums[1]: " << *(nums+1) << endl;
Two things happen when delete[] is called:
If the array is of a type that has a nontrivial destructor, the destructor is called for each of the elements in the array, in reverse order
The memory occupied by the array is released
Accessing the memory that the array occupied after calling delete results in undefined behavior (that is, anything could happen--the data might still be there, or your program might crash when you try to read it, or something else far worse might happen).
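A small sketch (my own, not from the answer) that makes the first point visible: give the element type a nontrivial destructor and watch delete[] run it once per element, in reverse order:

#include <iostream>

struct Tracer {
    int id = 0;
    ~Tracer() { std::cout << "destroying element " << id << '\n'; }
};

int main() {
    Tracer* t = new Tracer[3];
    t[0].id = 0; t[1].id = 1; t[2].id = 2;
    delete[] t; // prints 2, 1, 0: destructors run in reverse order
    return 0;
}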
The reasons for it being NULL are up to the heap implementation.
Some possible reasons are that it is using the space for its free-space tracking. It might be using it as a pointer to the next free block. It might be using it to record the size of the free block. It might be writing in some serial number for new/delete debug tracking.
It could just be writing NULL because it feels like it.
Whenever someone says int* nums = new int[3], the runtime system must be able to recover the number of objects, 3, knowing only the pointer nums, so that delete[] nums can destroy the right number of elements. (Strictly speaking, the count is only needed for types with nontrivial destructors; for a type like int an implementation may not store it at all.) The compiler can use any technique it wants to use, but there are two popular ones.
The code generated by nums = new int[3] might store the number 3 in a static associative array, where the pointer nums is used as the lookup key and the number 3 is the associated value. The code generated by delete[] nums would look up the pointer in the associative array, would extract the associated size_t, then would remove the entry from the associative array.
The code generated by nums = new int[3] might allocate an extra sizeof(size_t) bytes of memory (possibly plus some alignment bytes) and put the value 3 just before the first int object. Then delete[] nums would find 3 by looking at the fixed offset before the first int object (that is, before *num) and would deallocate the memory starting at the beginning of the allocation (that is, the block of memory beginning the fixed offset before *nums).
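A simplified, hypothetical illustration of the overallocation technique (my own sketch; real implementations also handle alignment and debugging bookkeeping):

#include <cstdlib>
#include <new>

// Stash the element count in a size_t header placed just before the array.
void* allocate_array(std::size_t count, std::size_t elem_size) {
    void* raw = std::malloc(sizeof(std::size_t) + count * elem_size);
    if (raw == nullptr) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = count;   // hide the count in the header
    return static_cast<std::size_t*>(raw) + 1; // hand out the address just past it
}

void deallocate_array(void* p) {
    std::size_t* header = static_cast<std::size_t*>(p) - 1; // step back to the header
    // header[0] is the element count that delete[] would use for destructor calls
    std::free(header);
}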
Neither technique is perfect. Here are a few of the tradeoffs.
The associative array technique is slower but safer: if someone forgets the [] when deallocating an array of things, (a) the entry in the associative array would be a leak, and (b) only the first object in the array would be destructed. This may or may not be a serious problem, but at least it might not crash the application.
The overallocation technique is faster but more dangerous: if someone says delete nums where they should have said delete[] nums, the address that is passed to operator delete(void* nums) would not be a valid heap allocation—it would be at least sizeof(size_t) bytes after a valid heap allocation. This would probably corrupt the heap. - C++ FAQs