Meaning behind memory surrounding array c++ - c++

I've been lately experimenting with dynamically allocated arrays. I got to conclusion that they have to store their own size in order to free the memory.
So I dug a little in memory with pointers and found that 6*4 bytes directly before and 1*4 bytes directly after array don't change upon recompilation (aren't random garbage).
I represented these as unsigned int types and printed them out in win console:
Here's what I got:
(array's content is between fdfdfdfd uints in representation)
So I figured out that third unsigned int directly before the array's first element is the size of allocated memory in bytes.
However I cannot find any information about rest of them.
Q: Does anyone know what the memory surrounding array's content means and care to share?
The code used in program:
#include <iostream>
void show(unsigned long val[], int n)
{
using namespace std;
cout << "Array length: " << n <<endl;
cout << "hex: ";
for (int i = -6; i < n + 1; i++)
{
cout << hex << (*(val + i)) << "|";
}
cout << endl << "dec: ";
for (int i = -6; i < n + 1; i++)
{
cout << dec << (*(val + i)) << "|";
}
cout << endl;
}
int main()
{
using namespace std;
unsigned long *a = new unsigned long[15]{ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 };
unsigned long *b = new unsigned long[15]{ 0 };
unsigned long *c = new unsigned long[17]{ 0 };
show(a, 15);
cout << endl;
show(b, 15);
cout << endl;
show(c, 17);
cout << endl;
cout << endl;
system("PAUSE");
delete[] a;
delete[] b;
delete[] c;
}

It typically means that you carried out your experiments using a debugging configuration of the project and debugging version of the standard library. That version of the library uses some pre-defined bit-patterns to mark the boundaries of each allocated memory block ("no man's land" areas). Later, it checks if these bit-patterns survived intact (e.g. at the moment of delete[]). If they did not, it implies that someone wrote beyond the boundaries of the memory block. Debug version of the library will issue a diagnostic message about the problem.
If you compile your test program in release (optimized) configuration with release (optimized) version of the standard library, these "no man's land" areas will not be created, these bit-patterns will disappear from memory and the associated memory checks will disappear from the code.
Note also the the memory layout you observed is typically specific for arrays of objects with no destructors or with trivial destructors (which is basically the same thing). In your case you were working with plain unsigned long.
Once you start allocating arrays of objects with non-trivial destructors, you will observe that it is not just the size of memory block (in bytes) that's stored by the implementation, but the exact size of the array (in elements) is typically stored there as well.

"I got to conclusion that they have to store their own size in order to free the memory." No they don't.
Array does not free it's memory. You never get an array from new/malloc. You get a pointer to memory under which you can store an array, but if you forget size you have requested you cannot get it back. The standard library often does depend on OS under the hood as well.
And even OS does not have to remember it. There are implementations with very simple memory management which basically returns you current pointer to the free memory, and move the pointer by the requested size. free does nothing and freed memory is forgotten.
Bottom line, memory management is implementation defined, and outside of what you get nothing is guaranteed. Compiler or OS can mess with it, so you need to look documentation specific for the environment.
Bit patterns that you talk about, are often used as safe guards, or used for debugging. E.g: When and why will an OS initialise memory to 0xCD, 0xDD, etc. on malloc/free/new/delete?

Related

Memory allocation in OS

Having this simple code in C++:
#include <iostream>
#include <memory>
#include <vector>
using namespace std;
class Empty{};
int main() {
array<unique_ptr<Empty>, 1024> empties;
for(size_t i =0; i < 1024; i++){
empties[i] = make_unique<Empty>();
}
for(auto& element : empties){
cout << "ptr: " << element.get() << endl;
}
return 0;
}
when running in ideone.com or Windows we get following result:
ptr: 0x2b601f0c9ca0
ptr: 0x2b601f0c9cc0
ptr: 0x2b601f0c9ce0
ptr: 0x2b601f0c9d00
ptr: 0x2b601f0c9d20
ptr: 0x2b601f0c9d40
ptr: 0x2b601f0c9d60 ...
For me it's totally strange. what kind of allocation algorithms in OS or standard library might cause, that there happened no allocation at address, that ends with number different than 0?
The reason I did this experiment is, that given the OS uses buddy algorithm, that manages pages and allocation request will cause OS to allocate a piece of contiguous memory, then a couple of next allocations (until running out of allocated memory) should be allocated quite nearby. If that would be the case , then probably cache issue with lists would not be so significant in some cases , but I got results I was not expecting anyways.
Also second number from the right in allocated nodes is totally randomly. What may cause such a behavior?
The minimum size of a class in C++ is one byte, if I recall correctly. As there is a consistent 32 byte spacing between the class, it could be that that happens to be the sizeof the empty class you made. To determine this, try adding
std::cout << "Empty class size: " << sizeof(Empty) << std::endl;
It probably won't be 32 bytes, instead, there will probably be some consistent spacing between each object.
Note:
Does this compile for you. It doesn't for me because empties cant be implicitly initialised.
Please notice that the printed pointers is inaccurate, the OS allow you to see them as subsequent pointers when they could be allocated to an entirely different pages.

c++ memory allocation howto

I am just starting C++ and I can't understand how my code works:
Ok I allocate memory, but at the time of the allocation nobody knows the size of the memory to be allocated. But still the code works.
How much memory is allocated? How the compiler knows how much memory I will need?
EDIT:
Sorry if my question was not clear. Let me please try clarify it. So I dynamically allocate some memory in the heap by using my pointer. But since there is no text in the sting variable, in my opinion it is quite difficult to know how much text (bytes) I will enter via getline.
I tried asking the size of two different text literals, and yes they are different in size.
sizeof("") // is 1 (because of the ending 0 maybe?)
sizeof("sometext") // is 9
But for the string: the sizeof gives me 4 both times. It's clear that the sizeof() gives me the length of the pointer pointing to the string.
How can I allocate memory? If I allocate memory for a new string, only allocates to a pointer pointing to the memory address of the first character in the string?
Obviously the characters I enter must be stored somewhere. And I first allocate the memory, and then I load some text into it.
Edit 2: make the edited code to look code, not plain text.
//Edit:
string a,b = "sometext";
cout << sizeof(a) << endl; //4
cout << sizeof(b); //4
//--------------------------------------------------------
#include <iostream>
#include <string>
#include <exception>
using namespace std;
int main()
{
//Defining struct
struct musicCD
{
string artist, title; // artist of the CD
};
//Memory allocation
musicCD *ptr;
try{ptr = new musicCD;}
catch(bad_alloc){cerr << "Out of memory :(";return -1;}
catch(...){cerr << "Something bad happened :O !";return -1;
}
//Get the data to store:
cout << "Please enter the data for the CD!' " << endl;
cout << "Please enter the artist: ";getline(cin, ptr->artist); cout << endl;
//Write out the data
cout << "The data entered: " << endl;
cout << "The artist performing is: \t" << ptr->artist << endl;
delete ptr;
return 0;
}
It seems like you are confused about how std::string, or any dynamic container, handles the fact that it's memory requirements are not predetermined. std::string for example does not store it's character data internally. Simply put, it contains a pointer that points to another dynamic allocated buffer which contains the actual data. std::string has constructors, a destructor and assignment operators that automatically manage the extra buffer, which contains the actual character data. This including reallocating, copying the data, updating the internal pointer and freeing the previous buffer when extra storage is needed. The size of the buffer that contains the actual data does not count towards the size of std::string, only the pointer to it does. Every instance of std::string, throughout it's lifetime, only directly contains a constant number of members which all have constant sizes. In c++ all types have a compile time constant size.
See Rule of five for a simplified implementation of string showing how it works. The size of the class rule_of_five from this example is simply the size of char* regardless of the content of the buffer pointed to by this pointer. The actual buffer is allocated later, during or after construction, which is after the initial allocation for the object itself has already finished.
Edit: There are some cases where a string can store it's character data internally when dealing with very short strings. This is an optimization not generally seen in other containers. See this answer.

MSVC Access Violation when setting array elements

I have been struggling in finding an explanation to an error I get in the following code:
#include <stdlib.h>
int main() {
int m=65536;
int n=65536;
float *a;
a = (float *)malloc(m*n*sizeof(float));
for (int i = 0; i < m; i++){
for (int j = 0; j < n; j++){
a[i*n + j] = 0;
}
}
return 0;
}
Why do I get an "Access Violation" Error when executing this program?
The memory allocation is succesful, the problem is in the nested for loops at some iteration count. I tried with a smaller value of m&n and the program works.
Does this mean I ran out of memory?
The problem is that m*n*sizeof(float) is likely an overflow, resulting in a relatively small value. Thus the malloc works, but it does not allocate as much memory as you're expecting and so you run off the end of the buffer.
Specifically, if your ints are 32 bits wide (which is common), then 65336 * 65336 is already an overflow, because you would need at least 33 bits to represent it. Signed integer overflows in C++ (and I believe in C) result in undefined behavior, but a common result is that the most significant bits are lopped off, and you're left with the lower ones. In your case, that gives 0. That's then multiplied by sizeof(float), but zero times anything is still zero.
So you've tried to allocate 0 bytes. It turns out that malloc will let you do that, and it will give back a valid pointer rather than a null pointer (which is what you'd get if the allocation failed). (See Edit below.)
So you have a valid pointer, but it's not valid to dereference it. That fact that you are able to dereference it at all is a side-effect of the implementation: In order to generate a unique address that doesn't get reused, which is what malloc is required to do when you ask for 0 bytes, malloc probably allocated a small-but-non-zero number of bytes. When you try to reference far enough beyond those, you'll typically get an access violation.
EDIT:
It turns out that what malloc does when requesting 0 bytes may depend on whether you're using C or C++. In the old days, the C standard required a malloc of 0 bytes to return a unique pointer as a way of generating "special" pointer values. In modern C++, a malloc of 0 bytes is undefined (see Footnote 35 in Section 3.7.4.1 of the C++11 standard). I hadn't realized malloc's API had changed in this way when I originally wrote the answer. (I love it when a newbie question causes me to learn something new.) VC++2013 appears to preserve the older behavior (returning a unique pointer for an allocation of 0 bytes), even when compiling for C++.
You are victim of 2 problems.
First the size calculation:
As some people have pointned out, you are exceeding the range of size_t. You can verify the size that you are trying to allocate with this code:
cout << "Max size_t is: " << SIZE_MAX<<endl;
cout << "Max int is : " << INT_MAX<<endl;
long long lsz = static_cast<long long>(m)*n*sizeof(float); // long long to see theoretical result
size_t sz = m*n*sizeof(float); // real result with overflow as will be used by malloc
cout << "Expected size: " << lsz << endl;
cout << "Requested size_t:" << sz << endl;
You'll be surprised but with MSVC13, you are asking 0 bytes because of the overflow (!!). You might get another number with a different compiler (resulting in a lower than expected size).
Second, malloc() might return a problem pointer:
The call for malloc() could appear as successfull because it does not return nullptr. The allocated memory could be smaller than expected. And even requesting 0 bytes might appear as successfull, as documented here: If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.
float *a = reinterpret_cast<float*>(malloc(m*n*sizeof(float))); // prefer casts in future
if (a == nullptr)
cout << "Big trouble !"; // will not be called
Alternatives
If you absolutely want to use C, prefer calloc(), you'll get at least a null pointer, because the function notices that you'll have an overflow:
float *b = reinterpret_cast<float*>(calloc(m,n*sizeof(float)));
But a better approach would be to use the operator new[]:
float *c = new (std::nothrow) float[m*n]; // this is the C++ way to do it
if (c == nullptr)
cout << "new Big trouble !";
else {
cout << "\nnew Array: " << c << endl;
c[n*m-1] = 3.0; // check that last elements are accessible
}
Edit:
It's also subject to the size_t limit.
Edit 2:
new[] throws bad_alloc exceptions when there is a problem, or even bad_array_new_length. You could try/catch these if you want. But if you prefer to get nullptr when there's not enough memory, you have to use (std::nothrow) as pointed out in the comments by Beat.
The best approach for your case, if you really need these huge number of floats, would be to go for vectors. As they are also subject to size_t limitation, but as you have in fact a 2D array, you could use vectors of vectors (if you have enough memory):
vector <vector<float>> v (n, vector<float>(m));

Difference between dynamic array and normal array [duplicate]

This question already has answers here:
What is the difference between Static and Dynamic arrays in C++?
(13 answers)
Closed 9 years ago.
I'm a beginer. I'm confused about the difference between them. I have read a few answers and realized one of the difference is that the dynamic array can be deleted while the normal array can't. But are there any other differences? Such as their functions, sizes or whatsoever?
Well I have read such an example, and I don't see any difference if I replace the dynamic array {p= new (nothrow) int[i];} with a normal array {int p [i];}.
#include <iostream>
#include <new>
using namespace std;
int main ()
{
int i,n;
int * p;
cout << "How many numbers would you like to type? ";
cin >> i;
p= new (nothrow) int[i];
if (p == 0)
cout << "Error: memory could not be allocated";
else
{
for (n=0; n<i; n++)
{
cout << "Enter number: ";
cin >> p[n];
}
cout << "You have entered: ";
for (n=0; n<i; n++)
cout << p[n] << ", ";
delete[] p;
}
return 0;
}
But are there any other differences?
Compile and run this, see what happens:
int main(int argc, char** argv){
char buffer[1024*1024*64];
buffer[0] = 0;
return 0;
}
Explanation:
Well, normal array is placed either on stack or within code segment (if it is global variable or static local variable). At least on windows/linux PCs. The stack has limited size (although you can change it using ulimit in linux and compiler settings on windows). So your array is too big for the stack, your program will instantly crash upon entering the function with that array ("segmentation fault" on linux, "stack overflow" or "access violation" on windows (forgoet which one)). Default limit for array size is 1Megabyte on windows (x86), and 8 megabytes on linux.
You cannot determine size of a block allocated with new. int *p = new int[146]; std::cout << sizeof(p) << std::endl. will print sizeof(int*), not size of allocated memory. However, sizeof works on arrays.
Theoretically, using dynamic memory allocation, you can allocate as much memory as you want (operating system may impose limits, though, and on 32bit system maximum allocated block size will be 2..3 GB). You can also control resource usage by freeing the memory, so your program won't be eating system ram/swap file for no reason.
Dynamic arrays are not automatically freed, you have delete them manually.
That's just a brief overview.
Dynamic memory allocation provides finer resource control and removes some limitations of local variables.
Speaking of which, although you CAN use new/delete, if you want to use arrays with variable size, you should use std::vector instead. Manual memory management is prone to errors, so you should make compiler do it for you when possible. As a result, it is advised to at least study STL(Standard Template Library) , smart pointers, RAII, Rule of Three.
Dynamic is just that. You can change the size of the resulting array at runtime rather than compile time. There are compiler extensions that allow you to use static arrays (int array[CONSTANT]) like a dynamic array in that you can specify the size at runtime but those are nonstandard.
Dynamic arrays are also allocated on the heap rather than the stack. This allows you to use all of the available memory if necessary to create the array. The stack has a limited size that depends on the OS.
Allocating memory on the heap allows you to for example do this:
int* createArray(int n) {
int* arr = new int[n];
return arr;
}
int main()
{
int* myArray = createArray(10);
// this array is now valid and can be used
delete[] myArray;
}
I'm sure there are numerous other intricacies between dynamic vs static arrays but those points are the big ones.

Pointer or Value in my case?

bool example1()
{
long a;
a = 0;
cout << a;
a = 1;
cout << a;
a = 2;
cout << a;
//and again...again until
a = 1000000;
cout << a+1;
return true;
}
bool example2()
{
long* a = new long;//sorry for the misstake
*a = 0;
cout << *a;
*a = 1;
cout << *a;
*a = 2;
cout << *a;
//and again...again until
*a = 1000000;
cout << *a + 1;
return true;
}
Note that I do not delete a in example2(), just a newbie's questions:
1. When the two functions are executing, which one use more memories?
2. After the function return, which one make the whole program use more memories?
Thanks for your help!
UPATE: just repace long* a; with long* a = new long;
UPDATE 2: to avoid the case that we are not doing anything with a, I cout the value each time.
Original answer
It depends and there will be no difference, at the same time.
The first program is going to consume sizeof(long) bytes on the stack, and the second is going to consume sizeof(long*). Typically long* will be at least as big as a long, so you could say that the second program might use more memory (depends on the compiler and architecture).
On the other hand, stack memory is allocated with OS memory page granularity (4KB would be a good estimate), so both programs are almost guaranteed to use the same number of memory pages for the stack. In this sense, from the viewpoint of someone observing the system, memory usage is going to be identical.
But it gets better: the compiler is free to decide (depending on settings) that you are not really doing anything with these local variables, so it might decide to simply not allocate any memory at all in both cases.
And finally you have to answer the "what does the pointer point to" question (as others have said, the way the program is currently written it will almost surely crash due to accessing invalid memory when it runs).
Assuming that it does not (let's say the pointer is initialized to a valid memory address), would you count that memory as being "used"?
Update (long* a = new long edit):
Now we know that the pointer will be valid, and heap memory will be allocated for a long (but not released!). Stack allocation is the same as before, but now example2 will also use at least sizeof(long) bytes on the heap as well (in all likelihood it will use even more, but you can't tell how much because that depends on the heap allocator in use, which in turn depends on compiler settings etc).
Now from the viewpoint of someone observing the system, it is still unlikely that the two programs will exhibit different memory footprints (because the heap allocator will most likely satisfy the request for the new long in example2 from memory in a page that it has already received from the OS), but there will certainly be less free memory available in the address space of the process. So in this sense, example2 would use more memory. How much more? Depends on the overhead of the allocation which is unknown as discussed previously.
Finally, since example2 does not release the heap memory before it exits (i.e. there is a memory leak), it will continue using heap memory even after it returns while example1 will not.
There is only one way to know, which is by measuring. Since you never actually use any of the values you assign, a compiler could, under the "as-if rule" simply optimize both functions down to:
bool example1()
{
return true;
}
bool example2()
{
return true;
}
That would a perfectly valid interpretation of your code under the rules of C++. It's up to you to compile and measure it to see what actually happens.
Sigh, an edit to the question made a difference to the above. The main point still stands: you can't know unless you measure it. Now both of the functions can be optimized to:
bool example1()
{
cout << 0;
cout << 1;
cout << 2;
//and again...again until
cout << 1000001;
return true;
}
bool example2()
{
cout << 0;
cout << 1;
cout << 2;
//and again...again until
cout << 1000001;
return true;
}
example2() never allocates memory for the value referenced by pointer a. If it did, it would take slightly more memory because it would require the space required for a long as well as space for the pointer to it.
Also, no matter how many times you assign a value to a, no more memory is used.
example 2 has a problem of not allocating memory for the pointer. Pointer a initially has an unknown value which makes it to point to somewhere in memory. assigning values to this pointer corrupts the content of that somewhere.
both examples use same amount of memory. (which is 4 bytes.)