C++ memory allocation howto

I am just starting C++ and I can't understand how my code works:
OK, I allocate memory, but at the time of the allocation nobody knows the size of the memory to be allocated. But the code still works.
How much memory is allocated? How does the compiler know how much memory I will need?
EDIT:
Sorry if my question was not clear. Let me try to clarify it. So I dynamically allocate some memory on the heap through my pointer. But since there is no text in the string variable yet, in my opinion it is quite difficult to know how much text (bytes) I will enter via getline.
I tried asking the size of two different text literals, and yes they are different in size.
sizeof("") // is 1 (because of the ending 0 maybe?)
sizeof("sometext") // is 9
But for the string: the sizeof gives me 4 both times. It's clear that the sizeof() gives me the length of the pointer pointing to the string.
How can I allocate memory? If I allocate memory for a new string, does it only allocate a pointer that points to the memory address of the first character in the string?
Obviously the characters I enter must be stored somewhere. And I first allocate the memory, and then I load some text into it.
Edit 2: make the edited code to look code, not plain text.
//Edit:
string a,b = "sometext";
cout << sizeof(a) << endl; //4
cout << sizeof(b); //4
//--------------------------------------------------------
#include <iostream>
#include <string>
#include <exception>
using namespace std;
int main()
{
//Defining struct
struct musicCD
{
string artist, title; // artist of the CD
};
//Memory allocation
musicCD *ptr;
try{ptr = new musicCD;}
catch(bad_alloc){cerr << "Out of memory :(";return -1;}
catch(...){cerr << "Something bad happened :O !";return -1;
}
//Get the data to store:
cout << "Please enter the data for the CD!' " << endl;
cout << "Please enter the artist: ";getline(cin, ptr->artist); cout << endl;
//Write out the data
cout << "The data entered: " << endl;
cout << "The artist performing is: \t" << ptr->artist << endl;
delete ptr;
return 0;
}

It seems like you are confused about how std::string, or any dynamic container, handles the fact that its memory requirements are not predetermined. std::string, for example, does not store its character data internally. Simply put, it contains a pointer that points to another dynamically allocated buffer which contains the actual data. std::string has constructors, a destructor and assignment operators that automatically manage the extra buffer, which contains the actual character data. This includes reallocating, copying the data, updating the internal pointer and freeing the previous buffer when extra storage is needed. The size of the buffer that contains the actual data does not count towards the size of std::string, only the pointer to it does. Every instance of std::string, throughout its lifetime, only directly contains a constant number of members which all have constant sizes. In C++ all types have a compile-time constant size.
See Rule of five for a simplified implementation of string showing how it works. The size of the class rule_of_five from this example is simply the size of char* regardless of the content of the buffer pointed to by this pointer. The actual buffer is allocated later, during or after construction, which is after the initial allocation for the object itself has already finished.
Edit: There are some cases where a string can store its character data internally when dealing with very short strings. This is an optimization not generally seen in other containers. See this answer.
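To make the "object holds only a pointer" idea concrete, here is a minimal sketch. The class name tiny_string and its members are hypothetical and chosen for illustration; this is not how std::string is actually implemented (real implementations also store a size and a capacity, and usually apply the short-string optimization mentioned above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, deliberately minimal string class: the object itself holds
// only a pointer; the characters live in a separately allocated buffer.
class tiny_string {
public:
    explicit tiny_string(const char* text = "") { assign(text); }
    tiny_string(const tiny_string& other) { assign(other.data_); }
    tiny_string& operator=(const tiny_string& other) {
        if (this != &other) {
            char* fresh = duplicate(other.data_);  // allocate before freeing
            delete[] data_;
            data_ = fresh;
        }
        return *this;
    }
    ~tiny_string() { delete[] data_; }

    std::size_t length() const { return std::strlen(data_); }
    const char* c_str() const { return data_; }

private:
    // Allocates a buffer that is exactly large enough for text plus '\0'.
    static char* duplicate(const char* text) {
        assert(text != nullptr);
        char* buffer = new char[std::strlen(text) + 1];
        std::strcpy(buffer, text);
        return buffer;
    }
    void assign(const char* text) { data_ = duplicate(text); }

    char* data_ = nullptr;  // the only data member, so sizeof stays constant
};
```

With this sketch, sizeof(tiny_string) is the same whether the object holds an empty string or a whole paragraph; only the separately allocated buffer differs in size.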

Related

Why does reallocating with realloc, a pre-allocated memory using allocator::allocate conserve the old start memory address?

The main problem was reallocating memory while expanding it and conserving data and the first starting memory address, which is used by many other parts of the program (such as a static starting memory).
This doesn't work with realloc, because it deallocates the previously allocated memory and returns another block with a new starting address:
using namespace std;
int *t = static_cast<int*>(malloc( 2*sizeof(int)));
cout << "address " << t << endl;
t = static_cast<int*>(realloc(t,10*sizeof(int)));
cout << "address " << t << endl;
=========================
// both of the addresses are different
address 0x55c454fc5180
address 0x55c454fc55b0
After testing many solutions (even direct access to the memory by system call), I found this one:
allocator<int> alloc;
int *t = alloc.allocate(2*sizeof(int));
cout << "address " << t << endl;
// reallocating memory using realloc
t = static_cast<int*>(realloc(t, 10*sizeof(int)));
cout << "address " << t << endl;
=========================
// now the addresses are the same
address 0x55c454fc5180
address 0x55c454fc5180
I tried to explain how this is possible, but I could not reconcile the two behaviors, and I want to know why and how it works.
Using realloc on an address allocated with std::allocator, new or anything similar has undefined behavior. It can only be used when the address comes from the malloc/calloc/realloc family of allocation functions. Never mix them.
In general realloc does not guarantee that the address of the allocation remains unchanged. There is no guarantee that realloc will be able to expand the allocation in place (e.g. there might not be enough memory free after the current allocation). It is the defined behavior of realloc to copy the memory block to a new allocation where sufficient space is free in such a situation. This also means that realloc can only be used with trivially-copyable types in C++.
If your program depends on the address of the allocation remaining unchanged, then it can't expand the allocation. You can have one of these, not both.
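A minimal sketch of using realloc within those rules, assuming the buffer came from malloc and holds a trivially copyable type. The helper names grow_ints and values_survive_realloc are hypothetical. Note that the result goes into a separate pointer first, so the old block is not leaked if realloc fails, and that nothing relies on the address staying the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: grows a malloc'ed int buffer to new_count elements.
// Returns the (possibly moved) buffer, or nullptr on failure, in which case
// the old buffer is still valid and still owned by the caller.
int* grow_ints(int* old_buffer, std::size_t new_count) {
    assert(new_count > 0);
    void* moved = std::realloc(old_buffer, new_count * sizeof(int));
    return static_cast<int*>(moved);
}

// Demonstrates that the contents survive even when the address changes.
bool values_survive_realloc() {
    int* t = static_cast<int*>(std::malloc(2 * sizeof(int)));
    if (t == nullptr) return false;
    t[0] = 1;
    t[1] = 2;

    int* bigger = grow_ints(t, 10);
    if (bigger == nullptr) {  // realloc failed: the old block is untouched
        std::free(t);
        return false;
    }
    t = bigger;  // from here on, only the new address may be used

    const bool ok = (t[0] == 1 && t[1] == 2);
    std::free(t);
    return ok;
}
```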
Also, you are using std::allocator<int> wrong. The argument to .allocate should be the number of elements of the array to allocate, not the number of bytes. Then afterwards you are supposed to call std::allocator_traits<std::allocator<int>>::construct or std::construct_at or a placement-new on each element of the array to construct and start the lifetime of the array elements.
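For comparison, a sketch of what the std::allocator sequence described above could look like when the element count (not the byte count) is passed and each element is constructed and destroyed explicitly. The function name sum_with_allocator is hypothetical and only exists to give the example something to return.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Allocates `count` ints through std::allocator, constructs them, sums them,
// then destroys and deallocates them again.
int sum_with_allocator(std::size_t count) {
    assert(count > 0);
    using Alloc = std::allocator<int>;
    using Traits = std::allocator_traits<Alloc>;

    Alloc alloc;
    int* values = Traits::allocate(alloc, count);  // count is ELEMENTS, not bytes

    // Raw storage holds no objects yet: construct each element explicitly.
    for (std::size_t i = 0; i < count; ++i)
        Traits::construct(alloc, values + i, static_cast<int>(i));

    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i];

    // Mirror image of the setup: destroy the elements, then release storage.
    for (std::size_t i = 0; i < count; ++i)
        Traits::destroy(alloc, values + i);
    Traits::deallocate(alloc, values, count);  // same count as in allocate
    return total;
}
```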
I am not sure why you are trying to use std::allocator here, but it is unlikely that you need it. If you just intend to create an array of int, you should use new int[n], not std::allocator. Or rather don't bother with manual memory management at all and just use std::vector<int>.
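If the real requirement is a stable starting address, one option with std::vector is to reserve the maximum size up front: as long as the size stays within the reserved capacity, the vector does not reallocate. The sketch below assumes such an upper bound is known in advance; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector up to its reserved capacity and reports whether the address
// of the first element stayed the same the whole time.
bool address_stable_within_capacity(std::size_t reserved) {
    assert(reserved > 0);
    std::vector<int> values;
    values.reserve(reserved);  // one allocation, made up front

    values.push_back(0);
    const int* const first = values.data();

    // No reallocation can happen while size() stays within the reservation.
    while (values.size() < reserved)
        values.push_back(static_cast<int>(values.size()));

    return values.data() == first;
}
```

Growing past the reserved capacity may still move the elements, so code that has to survive that should store indices rather than pointers.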

Meaning behind memory surrounding array c++

I've been lately experimenting with dynamically allocated arrays. I got to conclusion that they have to store their own size in order to free the memory.
So I dug a little in memory with pointers and found that 6*4 bytes directly before and 1*4 bytes directly after array don't change upon recompilation (aren't random garbage).
I represented these as unsigned int types and printed them out in win console:
Here's what I got:
(array's content is between fdfdfdfd uints in representation)
So I figured out that third unsigned int directly before the array's first element is the size of allocated memory in bytes.
However I cannot find any information about rest of them.
Q: Does anyone know what the memory surrounding array's content means and care to share?
The code used in program:
#include <iostream>
void show(unsigned long val[], int n)
{
using namespace std;
cout << "Array length: " << n <<endl;
cout << "hex: ";
for (int i = -6; i < n + 1; i++)
{
cout << hex << (*(val + i)) << "|";
}
cout << endl << "dec: ";
for (int i = -6; i < n + 1; i++)
{
cout << dec << (*(val + i)) << "|";
}
cout << endl;
}
int main()
{
using namespace std;
unsigned long *a = new unsigned long[15]{ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 };
unsigned long *b = new unsigned long[15]{ 0 };
unsigned long *c = new unsigned long[17]{ 0 };
show(a, 15);
cout << endl;
show(b, 15);
cout << endl;
show(c, 17);
cout << endl;
cout << endl;
system("PAUSE");
delete[] a;
delete[] b;
delete[] c;
}
It typically means that you carried out your experiments using a debugging configuration of the project and debugging version of the standard library. That version of the library uses some pre-defined bit-patterns to mark the boundaries of each allocated memory block ("no man's land" areas). Later, it checks if these bit-patterns survived intact (e.g. at the moment of delete[]). If they did not, it implies that someone wrote beyond the boundaries of the memory block. Debug version of the library will issue a diagnostic message about the problem.
If you compile your test program in release (optimized) configuration with release (optimized) version of the standard library, these "no man's land" areas will not be created, these bit-patterns will disappear from memory and the associated memory checks will disappear from the code.
Note also that the memory layout you observed is typically specific for arrays of objects with no destructors or with trivial destructors (which is basically the same thing). In your case you were working with plain unsigned long.
Once you start allocating arrays of objects with non-trivial destructors, you will observe that it is not just the size of memory block (in bytes) that's stored by the implementation, but the exact size of the array (in elements) is typically stored there as well.
"I got to conclusion that they have to store their own size in order to free the memory." No they don't.
An array does not free its memory. You never get an array from new/malloc. You get a pointer to memory in which you can store an array, but if you forget the size you requested, you cannot get it back. The standard library often depends on the OS under the hood as well.
And even the OS does not have to remember it. There are implementations with very simple memory management, which basically return the current pointer to the free memory and advance that pointer by the requested size. free does nothing, and freed memory is simply forgotten.
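Such a "bump" scheme can be sketched in a few lines. The class name bump_arena, the 1024-byte capacity and the member names are all assumptions made for illustration; real embedded allocators differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size arena: allocation just advances an offset, and
// deallocation does nothing at all, so no per-block size is ever recorded.
class bump_arena {
public:
    void* allocate(std::size_t bytes) {
        if (bytes > sizeof(storage_)) return nullptr;
        // Round up so every returned pointer stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > sizeof(storage_) - offset_) return nullptr;

        void* result = storage_ + offset_;
        offset_ += rounded;
        assert(offset_ <= sizeof(storage_));
        return result;
    }

    void deallocate(void*) {}  // intentionally empty: memory is never reused

    std::size_t used() const { return offset_; }

private:
    alignas(std::max_align_t) unsigned char storage_[1024];
    std::size_t offset_ = 0;
};
```

Because deallocate ignores its argument, nothing about an individual block has to be remembered, which is exactly why such an implementation does not need to store sizes.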
Bottom line: memory management is implementation-defined, and beyond the memory you asked for nothing is guaranteed. The compiler or the OS can do as it pleases with the rest, so you need to consult the documentation specific to your environment.
The bit patterns that you talk about are often used as safeguards or for debugging. E.g.: When and why will an OS initialise memory to 0xCD, 0xDD, etc. on malloc/free/new/delete?

Does std::string::clear reclaim the memory associated with a string?

For example, if loaded a text file into an std::string, did what I needed to do with it, then called clear() on it, would this release the memory that held the text? Or would I be better off just declaring it as a pointer, calling new when I need it, and deleting it when I'm done?
Calling std::string::clear() merely sets the size to zero. The capacity() won't change (nor will reserve()ing less memory than currently reserved change the capacity). If you want to reclaim the memory allocated for a string, you'll need to do something along the lines of
std::string(str).swap(str);
Copying the string str will generally only reserve a reasonable amount of memory and swapping it with str's representation will install the resulting representation into str. Obviously, if you want the string to be empty you could use
std::string().swap(str);
The only method the Standard provides for releasing unused memory is the member function shrink_to_fit(), which is itself a non-binding request. Using swap for this makes no sense, because the Standard does not say that unused memory will be released by that operation.
As an example
s.clear();
s.shrink_to_fit();
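To see what these two calls do on a particular implementation, the capacity can be recorded at each step. This is only a sketch: the struct and function names are hypothetical, and the actual numbers depend on the standard library in use, so none are claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Capacities observed at three points in the life of one string.
struct capacities {
    std::size_t filled;   // after construction with `length` characters
    std::size_t cleared;  // after clear()
    std::size_t shrunk;   // after shrink_to_fit()
};

capacities observe_capacity(std::size_t length) {
    std::string text(length, 'x');
    capacities seen{};
    seen.filled = text.capacity();

    text.clear();  // size becomes 0; the buffer is typically kept
    assert(text.empty());
    seen.cleared = text.capacity();

    text.shrink_to_fit();  // non-binding request to drop unused capacity
    seen.shrunk = text.capacity();
    return seen;
}
```

Printing the three fields for a large length (say 100000) shows whether the library in question actually gave the buffer back after shrink_to_fit().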
I realize the question is old, but I wanted to add some precision. I found this post while trying to understand a behavior that seemed to contradict the answers provided here.
With gcc 4.8.3, clear() might look like it is releasing memory in some scenarios.
std::string local("this is a test");
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
As expected I get
Before clear: 14
After clear: 14
Now let's add another string into the mix:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
This time I get:
Before clear: 14
After clear: 0
Looks like libstdc++ has some optimisation to share the memory holding the string content whenever possible. Depending on whether a string is shared or not, the behavior will differ. In the last example, when clear() is called, a new instance of the internals of std::string local is created, and it will be empty(). The capacity will be set to 0. One might conclude from the output of capacity() that some memory was freed, but that is not the case.
This can be proven with the following code:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << (int*)local.data() << ' ' << (int*)ref.data() << '\n';
Will give:
0x84668cc 0x84668cc
The two strings point to the same data; I was not expecting that. Add a local.clear() or anything that modifies local or ref, and the addresses will then differ, obviously.
Regards
... would this release the memory that held the text?
No.
Or would I be better off just declaring it as a pointer ...
No, you'd be better off declaring the string in the scope in which it is needed, and letting its destructor be called. If you must release the memory in a scope in which the string still exists, you can do this:
std::string().swap(the_string_to_clear);

In C++, what happens when the delete operator is called?

In C++, I understand that the delete operator, when used with an array, 'destroys' it, freeing the memory it used. But what happens when this is done?
I figured my program would just mark off the relevant part of the heap being freed for re-usage, and continue on.
But I noticed that also, the first element of the array is set to null, while the other elements are left unchanged. What purpose does this serve?
int * nums = new int[3];
nums[0] = 1;
nums[1] = 2;
cout << "nums[0]: " << *nums << endl;
cout << "nums[1]: " << *(nums+1) << endl;
delete [] nums;
cout << "nums[0]: " << *nums << endl;
cout << "nums[1]: " << *(nums+1) << endl;
Two things happen when delete[] is called:
If the array is of a type that has a nontrivial destructor, the destructor is called for each of the elements in the array, in reverse order
The memory occupied by the array is released
Accessing the memory that the array occupied after calling delete results in undefined behavior (that is, anything could happen--the data might still be there, or your program might crash when you try to read it, or something else far worse might happen).
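The first of the two steps can be observed with a type whose destructor records that it ran. This is a small sketch; the names tracked, destruction_log and destroy_array_of are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Global log of the ids of destroyed objects, in destruction order.
std::vector<int>& destruction_log() {
    static std::vector<int> log;
    return log;
}

// A type with a non-trivial destructor, so delete[] has real work to do.
struct tracked {
    int id = 0;
    ~tracked() { destruction_log().push_back(id); }
};

// Creates an array, numbers its elements, deletes it, and returns the order
// in which the destructors ran.
std::vector<int> destroy_array_of(int count) {
    assert(count > 0);
    destruction_log().clear();

    tracked* items = new tracked[count];
    for (int i = 0; i < count; ++i)
        items[i].id = i;

    delete[] items;  // runs ~tracked() on every element, last element first
    // `items` is dangling from here on; reading through it would be undefined.
    return destruction_log();
}
```

For three elements the returned order is 2, 1, 0, which matches the "reverse order" rule above.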
The reasons for it being NULL are up to the heap implementation.
Some possible reasons are that it is using the space for its free-space tracking. It might be using it as a pointer to the next free block. It might be using it to record the size of the free block. It might be writing in some serial number for new/delete debug tracking.
It could just be writing NULL because it feels like it.
Whenever someone says int* nums = new int[3], the runtime system is required to store the number of objects, 3, in a place that can be retrieved knowing only the pointer, nums. The compiler can use any technique it wants to use, but there are two popular ones.
The code generated by nums = new int[3] might store the number 3 in a static associative array, where the pointer nums is used as the lookup key and the number 3 is the associated value. The code generated by delete[] nums would look up the pointer in the associative array, would extract the associated size_t, then would remove the entry from the associative array.
The code generated by nums = new int[3] might allocate an extra sizeof(size_t) bytes of memory (possibly plus some alignment bytes) and put the value 3 just before the first int object. Then delete[] nums would find 3 by looking at the fixed offset before the first int object (that is, before *nums) and would deallocate the memory starting at the beginning of the allocation (that is, the block of memory beginning the fixed offset before *nums).
Neither technique is perfect. Here are a few of the tradeoffs.
The associative array technique is slower but safer: if someone forgets the [] when deallocating an array of things, (a) the entry in the associative array would be a leak, and (b) only the first object in the array would be destructed. This may or may not be a serious problem, but at least it might not crash the application.
The overallocation technique is faster but more dangerous: if someone says delete nums where they should have said delete[] nums, the address that is passed to operator delete(void* nums) would not be a valid heap allocation—it would be at least sizeof(size_t) bytes after a valid heap allocation. This would probably corrupt the heap. - C++ FAQs
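The overallocation technique can be imitated by hand. The sketch below is only an illustration of the idea, not what any particular compiler generates: the function names and the choice of header size are assumptions, and it uses malloc/free so that it stays independent of the compiler's own new[] bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// The header is as large as the strictest alignment, so the ints that follow
// it stay correctly aligned.
constexpr std::size_t header_size = alignof(std::max_align_t);
static_assert(header_size >= sizeof(std::size_t), "header must fit the count");

// Allocates room for `count` ints plus a hidden header holding the count.
int* allocate_counted(std::size_t count) {
    void* raw = std::malloc(header_size + count * sizeof(int));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &count, sizeof count);  // store the count up front
    return reinterpret_cast<int*>(static_cast<unsigned char*>(raw) + header_size);
}

// Recovers the element count knowing only the pointer to the first element.
std::size_t stored_count(const int* first) {
    assert(first != nullptr);
    std::size_t count = 0;
    std::memcpy(&count,
                reinterpret_cast<const unsigned char*>(first) - header_size,
                sizeof count);
    return count;
}

// Frees the whole block, which starts header_size bytes before `first`.
void free_counted(int* first) {
    if (first == nullptr) return;
    std::free(reinterpret_cast<unsigned char*>(first) - header_size);
}
```

Passing the pointer returned by allocate_counted directly to std::free would hand the heap an address in the middle of a block, which is the same kind of mistake as writing delete where delete[] was required.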

About the copy constructor and pointers

I'm reading through an example in primer and something which it talks about is not happening. Specifically, any implicit shallow copy is supposed to copy over the address of a pointer, not just the value of what is being pointed to (thus the same memory address). However, each of the pos properties are pointing to two different memory addresses (so I am able to change the value of one without affecting the other). What am I doing wrong?
Header
#include "stdafx.h"
#include <iostream>
class Yak
{
public:
int hour;
char * pos;
const Yak & toz(const Yak & yk);
Yak();
};
END HEADER
using namespace std;
const Yak & Yak:: toz(const Yak & yk)
{
return *this;
}
Yak::Yak()
{
pos = new char[20];
}
int _tmain(int argc, _TCHAR* argv[])
{
Yak tom;
tom.pos="Hi";
Yak blak = tom.toz(tom);
cout << &blak.pos << endl;
cout << &tom.pos << endl;
system("pause");
return 0;
}
You're printing the addresses of the pointers, not the address of the string they're pointing to:
cout << &blak.pos << endl;
cout << &tom.pos << endl;
They are two different pointers, so their addresses differ. However, they point to the same string:
cout << static_cast<void*>(blak.pos) << endl;
cout << static_cast<void*>(tom.pos) << endl;
(Note the cast static_cast<void*>(tom.pos). As Aaron pointed out in a comment, it is necessary because when a char* is output through operator<<, the stream library will assume the character pointed to is the first character of a zero-terminated string. Outputting a void*, OTOH, will output the address.)
Note that there is more wrong with your code. Here
Yak tom;
You are creating a new object. Its constructor allocates 20 characters, and stores their address in tom.pos. In the very next line
tom.pos="Hi";
you are assigning the address of a string literal to tom.pos, thereby discarding the address of the bytes you allocated, effectively leaking that memory.
Also note that Yak has no destructor, so even if you don't discard the 20 characters that way, when tom goes out of scope, tom.pos will be destroyed and thus the address of those 20 bytes lost.
However, due to your missing copy constructor, when you copy a Yak object, you end up with two of them having their pos element pointing to the same allocated memory. When they go out of scope, they'd both try to delete that memory, which is fatal.
To cut this short: Use std::string. It's much easier. Get a grip on the basics, using safe features of the language like std::string, the containers of the standard library, and smart pointers. Once you feel sure with these, tackle manual memory management.
However, keep in mind that I, doing C++ for about 15 years, consider manual resource management (memory is but one resource) error-prone and try to avoid it. If I have to do it, I hide each resource behind an object managing it - effectively falling back to automatic memory management. :)
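As a sketch of that advice applied to the class from the question: with a std::string member, the compiler-generated copy constructor, assignment operator and destructor already do the right thing. The helper copies_are_independent is hypothetical and only there to show the effect.

```cpp
#include <cassert>
#include <string>

// Same shape as the Yak from the question, but pos owns its characters.
struct Yak {
    int hour = 0;
    std::string pos;  // copies its contents when a Yak is copied
};

// Returns true if changing the copy leaves the original untouched.
bool copies_are_independent() {
    Yak tom;
    tom.pos = "Hi";     // copies the characters; nothing is leaked

    Yak blak = tom;     // implicit copy constructor: a deep copy of pos
    blak.pos = "Bye";   // modifies only blak's own buffer
    assert(blak.pos == "Bye");

    return tom.pos == "Hi";
}  // both strings release their memory here, exactly once each
```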
Which "primer" is this you're reading? Lippman's C++ Primer? If so, which edition? I'd be surprised if a recent edition of Lippman's book would let you loose on dynamic memory without first showing you the tools to tackle this and how to use them.
Unless I'm missing something, it seems you are printing out the address of each pos variable, not the address to which they both point. The addresses of the two pos variables are different, but the pointers they hold should be the same.
The code does not make a lot of sense.
The toz method takes a Yak but does nothing with it. Why? In fact, what is toz supposed to do, anyway?
tom.pos = "Hi" does not copy Hi into the new char[20] array that you allocate in the constructor. Instead it replaces the pos pointer with a pointer to a static const char array containing "Hi\0". You would need to use strcpy.
Your code has a memory leak in it. You are allocating 20 characters on construction and have pos point to the allocated memory. However you are then pointing pos at the memory containing "Hi" without deleting the memory first.
In any case, pos is a member variable of type char *. Each instance of Yak is going to have its own unique pos. Because pos is a pointer, however, it's possible that they are pointing to the same block of memory (and this is the case in your example).
What you are printing out is the address of the pos variable which is of type char *, and not the address of the memory it is pointing to, which is of type char. Remember that a pointer is a variable like any other, it has an address and a value (and that value, in the case of a pointer, is an address).
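The difference between a pointer's own address and the address it holds can be checked directly. In this small sketch the struct and function names are hypothetical; two pointer variables are compared both ways.

```cpp
#include <cassert>

// Result of comparing two pointers by value and by location.
struct pointer_facts {
    bool same_target;    // do both pointers hold the same address?
    bool same_location;  // are the two pointer variables the same object?
};

pointer_facts compare_pointers() {
    char text[] = "Hi";
    char* first = text;
    char* second = first;  // shallow copy: only the held address is copied
    assert(*second == 'H');

    pointer_facts facts{};
    facts.same_target = (first == second);      // compares the values
    facts.same_location = (&first == &second);  // compares where they live
    return facts;
}
```

Here same_target is true and same_location is false, which is the situation in the question: &blak.pos and &tom.pos differ even though blak.pos and tom.pos hold the same address.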
If you want blak to be a shallow copy of tom, try the following instead of using your function toz:
Yak *blak = &tom;
// Note the use of -> ; also, &blak->pos will work
cout << &(blak->pos) << endl;
cout << &tom.pos << endl;
Now, blak merely points to tom --- it has nothing else associated with it.
Also, there are two other issues with your code:
pos = new char[20];
You are allocating memory here which has not been freed when you're done with the Yak object --- this will lead to memory leaks (unless you plan to free it elsewhere). If the use of pos ends with the use of the Yak object it's associated with (e.g. tom), then I recommend adding delete[] pos; to the destructor of Yak (delete[] rather than delete, because the memory came from new[]). Or as sbi suggested, use std::string.
tom.pos = "Hi";
You're assigning a string literal to it. This means that pos will point to Hi, which actually lives in a read-only section of memory. You cannot change the string stored in pos, so I don't think this is what you intended. Therefore, allocating 20 bytes for pos at the beginning seems pointless.
Again, I recommend using string.