Does std::string::clear reclaim the memory associated with a string?

For example, if I loaded a text file into a std::string, did what I needed to do with it, then called clear() on it, would this release the memory that held the text? Or would I be better off just declaring it as a pointer, calling new when I need it, and deleting it when I'm done?

Calling std::string::clear() merely sets the size to zero. The capacity() won't change (nor will reserve()ing less memory than currently reserved change the capacity). If you want to reclaim the memory allocated for a string, you'll need to do something along the lines of
std::string(str).swap(str);
Copying the string str will generally only reserve a reasonable amount of memory and swapping it with str's representation will install the resulting representation into str. Obviously, if you want the string to be empty you could use
std::string().swap(str);

The standard way to request the release of unused memory is the member function shrink_to_fit() (available since C++11), although that request is non-binding. Using swap makes little sense because the Standard does not say that unused memory will be released when this operation is used.
As an example
s.clear();
s.shrink_to_fit();
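To compare the approaches discussed in the answers above, here is a minimal sketch that records capacity() after each step. The helper measure_release and the struct CapacityReport are hypothetical names introduced for illustration, and the exact capacities are implementation-defined, so treat the numbers as observations rather than guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Capacities observed at each step; the exact values are implementation-defined.
struct CapacityReport {
    std::size_t before_clear;
    std::size_t after_clear;
    std::size_t after_shrink;
    std::size_t after_swap;
};

// Hypothetical helper: fills a string with n characters, then applies each
// technique from the answers and records capacity() along the way.
CapacityReport measure_release(std::size_t n) {
    CapacityReport r{};
    std::string s(n, 'x');
    r.before_clear = s.capacity();

    s.clear();                 // size() becomes 0; the buffer is normally kept
    r.after_clear = s.capacity();

    s.shrink_to_fit();         // C++11: non-binding request to drop unused capacity
    r.after_shrink = s.capacity();

    s.assign(n, 'y');          // grow again so the swap idiom has something to release
    std::string().swap(s);     // s takes over the empty temporary's representation
    r.after_swap = s.capacity();
    return r;
}
```

Calling measure_release(100000) and printing the four fields shows on common implementations that clear() leaves the capacity alone, while shrink_to_fit() and the swap idiom bring it down to a small value.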

I realize the question is old, but I wanted to add some detail. I found this post while trying to understand a behavior that seemed to contradict the answers provided here.
With gcc 4.8.3, clear() might look like it's releasing memory in some scenarios.
std::string local("this is a test");
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
As expected I get
Before clear: 14
After clear: 14
Now let's add another string into the mix:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
This time I get:
Before clear: 14
After clear: 0
It looks like libstdc++ has an optimisation that shares the memory holding the string content whenever possible. The behavior differs depending on whether a string is shared or not. In the last example, when clear() is called, a new, empty internal representation is created for local, and its capacity is set to 0. One might conclude from the output of capacity() that some memory was freed, but that is not the case.
This can be proven with the following code:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << static_cast<const void*>(local.data()) << ' ' << static_cast<const void*>(ref.data()) << '\n';
Will give:
0x84668cc 0x84668cc
The two strings point to the same data, which I was not expecting. Add a local.clear() or anything else that modifies local or ref and the addresses will then differ, obviously.
Regards
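The sharing described in this answer is specific to the copy-on-write std::string that older libstdc++ releases shipped. As far as I know, the C++11 rules effectively rule out copy-on-write, and newer libstdc++ versions no longer use it by default, so a fresh copy may or may not share a buffer depending on the library. A small sketch for checking this on your own toolchain (shares_buffer and distinct_after_write are hypothetical helper names):

```cpp
#include <cassert>
#include <string>

// True when both strings currently expose the same character buffer.
bool shares_buffer(const std::string& a, const std::string& b) {
    return static_cast<const void*>(a.data()) == static_cast<const void*>(b.data());
}

// Copies `original`, then modifies the copy. Whatever the library did at copy
// time, the two strings must own distinct buffers after the write.
bool distinct_after_write(const std::string& original) {
    std::string copy(original);
    copy += '!';               // a write forces a copy-on-write library to unshare
    return !shares_buffer(original, copy);
}
```

Printing shares_buffer(local, ref) right after constructing ref from local tells you whether your library shares the buffer at copy time.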

... would this release the memory that held the text?
No.
Or would I be better off just declaring it as a pointer ...
No, you'd be better off declaring the string in the scope in which it is needed, and letting its destructor be called. If you must release the memory in a scope in which the string still exists, you can do this:
std::string().swap(the_string_to_clear);
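A minimal sketch of the "declare it in the scope where it is needed" advice: the text lives in an inner block and its buffer is released by the destructor at the closing brace. The function count_newlines and its task are hypothetical, chosen only to give the string something to do.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical task: count the newline characters in a stream.
std::size_t count_newlines(std::istream& in) {
    std::size_t count = 0;
    {
        // The whole text lives only inside this block.
        std::string text((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        for (char c : text) {
            if (c == '\n') ++count;
        }
    }   // text's destructor runs here and releases its buffer
    return count;
}
```

Any std::istream works as input, for example a std::ifstream opened on the text file or a std::istringstream in a test. No new, delete, or clear() is needed.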

Related

Memory isn't freed when scope is left in C++

I'm fairly new to C++ so please forgive me for my ignorance. I'm under the impression that anything between { and } is called a scope, and that you can create a separate scope inside a function, or anything else, just by adding more brackets. For example:
void foo(){
    std::cout << "I'm inside the scope of foo" << std::endl;
    {
        std::cout << "I'm inside a scope that's inside the scope of foo" << std::endl;
    }
}
I was learning about this in relation to pointers and memory leaks. My understanding is when you leave a scope all variables should be freed from memory unless the memory was manually allocated with new or malloc. In my testing, however, this does not seem to be the case. I've written the following script to test this:
#include <iostream>
void test(){
    {
        int regdata = 240;
        int* pointerInt = new int(1);
        *pointerInt = 15;
        std::cout << "RegData Addr: " << &regdata << std::endl;
        std::cout << "Value: " << regdata << std::endl;
        std::cout << "Pointer Addr: " << &pointerInt << std::endl;
        std::cout << "Pointer: " << pointerInt << std::endl;
        std::cout << "Value: " << *pointerInt << std::endl;
        std::cout << std::endl;
        std::cout << "Press any key then enter to leave the scope.";
        char temp;
        std::cin >> temp;
        //delete pointerInt;
    }
    std::cout << "The scope has been left." << std::endl;
    std::cout << "Press any key then enter to leave the function.";
    char temp;
    std::cin >> temp;
}
int main(){
    test();
    std::cout << "The function has been left." << std::endl;
    std::cout << "Press any key then enter to leave the program.";
    char temp;
    std::cin >> temp;
}
I start this program on my Windows 10 computer and have been monitoring the memory usage using the program Cheat Engine. Now, depending on whether or not I have delete commented out it will delete the bytes that hold 15 and replace them with random bytes when I leave the scope as it should. However, the memory holding the 240 is not freed until after I leave the scope of test (at which point the 240 is replaced with 1). And regardless of if the delete is commented out, the actual pointer itself is never deleted out of memory.
Is my compiler or my machine not compiling/running my code correctly? Or am I misunderstanding memory management between scopes? If it's the latter, please correct me so I can properly understand what is supposed to happen. Also let me know if something doesn't make sense!

Memory is not "freed" when you leave a scope. The lifetime of all variables with automatic storage duration that were declared within that scope ends. What happens when the lifetime of a variable ends? For simple values like an int or a pointer: likely absolutely nothing. The lifetime of the variable has ended, so attempting to use it results in undefined behavior, but usually nothing will immediately happen to the value. The compiler knows that the storage in which the variable resided is now available to be re-used, but until it actually re-uses it the old value will likely continue to exist. A compiler could immediately zero out the memory of variables whose lifetime has ended, and for a debug build maybe it will, but doing so would take time so most compilers won't bother.
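What the language does guarantee at the end of a scope is that destructors of automatic objects run. A minimal sketch that makes this observable without inspecting raw memory (Tracer and trace_scopes are hypothetical names for illustration):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records construction and destruction so the order can be inspected.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, const std::string& n) : log(l), name(n) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
};

std::vector<std::string> trace_scopes() {
    std::vector<std::string> log;
    {
        Tracer automatic(log, "automatic");
        Tracer* dynamic = new Tracer(log, "dynamic");
        log.push_back("leaving inner scope");
        delete dynamic;   // without this line "dtor dynamic" would never be logged
    }                     // "dtor automatic" is logged here
    log.push_back("after inner scope");
    return log;
}
```

The returned log shows the automatic object being destroyed exactly at the closing brace, while the dynamically allocated one is destroyed only because of the explicit delete. Neither event says anything about when the underlying bytes get overwritten.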

regdata is stored on the stack so the memory it uses will never normally be "freed" until the end of the thread that the code is running in.
What does happen is that the stack memory is now available again to be used for something else. Your calls to std::cout and std::cin will both need to use a certain amount of stack memory; if they use enough memory then they'll overwrite all of the values in your inner scope (depending on the implementation of your compiler, there's no guarantee that it'll reuse the inner scope's stack memory later in the function; it might decide it's faster to use more stack memory instead).
This is why regdata is always being overwritten with 1: it's a coincidence of later stack usage rather than a deliberate action by the compiler. Some compilers might deliberately overwrite stack memory after it's released to help with debugging, but in a normal release build that would be an unnecessary waste of time.

c++ memory allocation howto

I am just starting C++ and I can't understand how my code works:
OK, I allocate memory, but at the time of the allocation nobody knows the size of the memory to be allocated. But the code still works.
How much memory is allocated? How does the compiler know how much memory I will need?
EDIT:
Sorry if my question was not clear. Let me try to clarify it. I dynamically allocate some memory on the heap by using my pointer. But since there is no text in the string variable, in my opinion it is quite difficult to know how much text (bytes) I will enter via getline.
I tried asking the size of two different text literals, and yes they are different in size.
sizeof("") // is 1 (because of the ending 0 maybe?)
sizeof("sometext") // is 9
But for the string, sizeof gives me 4 both times. It's clear that sizeof() gives me the size of the pointer pointing to the string.
How is the memory allocated? When I allocate memory for a new string, does it only allocate a pointer to the memory address of the first character in the string?
Obviously the characters I enter must be stored somewhere. And I first allocate the memory, and then I load some text into it.
Edit 2: make the edited code to look code, not plain text.
//Edit:
string a,b = "sometext";
cout << sizeof(a) << endl; //4
cout << sizeof(b); //4
//--------------------------------------------------------
#include <iostream>
#include <new>      // std::bad_alloc
#include <string>
using namespace std;
int main()
{
    // Defining struct
    struct musicCD
    {
        string artist, title; // artist and title of the CD
    };
    // Memory allocation
    musicCD *ptr;
    try { ptr = new musicCD; }
    catch (const bad_alloc&) { cerr << "Out of memory :("; return -1; }
    catch (...) { cerr << "Something bad happened :O !"; return -1; }
    // Get the data to store:
    cout << "Please enter the data for the CD!" << endl;
    cout << "Please enter the artist: "; getline(cin, ptr->artist); cout << endl;
    // Write out the data
    cout << "The data entered: " << endl;
    cout << "The artist performing is: \t" << ptr->artist << endl;
    delete ptr;
    return 0;
}

It seems like you are confused about how std::string, or any dynamic container, handles the fact that its memory requirements are not predetermined. std::string, for example, does not store its character data internally. Simply put, it contains a pointer to a separate, dynamically allocated buffer which contains the actual data. std::string has constructors, a destructor and assignment operators that automatically manage this extra buffer. This includes reallocating, copying the data, updating the internal pointer and freeing the previous buffer when extra storage is needed. The size of the buffer that contains the actual data does not count towards the size of std::string; only the pointer to it does. Every instance of std::string, throughout its lifetime, directly contains only a constant number of members, all of which have constant sizes. In C++ all types have a compile-time constant size.
See Rule of five for a simplified implementation of string showing how it works. The size of the class rule_of_five from this example is simply the size of char* regardless of the content of the buffer pointed to by this pointer. The actual buffer is allocated later, during or after construction, which is after the initial allocation for the object itself has already finished.
Edit: There are some cases where a string can store its character data internally when dealing with very short strings. This is an optimization not generally seen in other containers. See this answer.
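The point about constant object size can be checked directly. A minimal sketch, with object_size and capacity_after_append as hypothetical helper names; the actual value of sizeof(std::string) differs between implementations, but it never depends on the contents:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof is evaluated at compile time, so it cannot depend on the contents.
std::size_t object_size(const std::string& s) {
    return sizeof(s);          // same as sizeof(std::string)
}

// Appends n characters and reports how large the string's storage has become.
std::size_t capacity_after_append(std::size_t n) {
    std::string s;
    s.append(n, 'x');          // the string reallocates its buffer as needed
    return s.capacity();
}
```

object_size returns the same number for an empty string and for one holding thousands of characters, while capacity_after_append grows with its argument. That growth happens in memory the string allocates for itself at run time, after the object has been created.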

lifetime of a temporary function parameter

Creating a temporary char buffer as a default function argument and binding an r-value reference to it allows us to compose statements on a single line whilst preventing the need to create storage on the heap.
const char* foo(int id, tmp_buf&& buf = tmp_buf()) // buf exists at call-site
Binding a reference/pointer to the temporary buffer and accessing it later yields undefined behaviour, because the temporary no longer exists.
As can be seen from the example app below the destructor for tmp_buf is called after the first output, and before the second output.
My compiler (gcc-4.8.2) doesn't warn that I'm binding a variable to a temporary. This means that using this kind of micro-optimisation to use an auto char buffer rather than std::string with associated heap allocation is very dangerous.
Someone else coming in and capturing the returned const char* could inadvertently introduce a bug.
1. Is there any way to get the compiler to warn for the second case below (capturing the temporary)?
Interestingly you can see that I tried to invalidate the buffer - which I failed to do, so it likely shows I don't fully understand where on the stack tmp_buf is being created.
2. Why did I not trash the memory in tmp_buf when I called try_stomp()? How can I trash tmp_buf?
3. Alternatively - is it safe to use in the manner I have shown? (I'm not expecting this to be true!)
code:
#include <cstdio>   // sprintf, snprintf
#include <iostream>
struct tmp_buf
{
    char arr[24];
    ~tmp_buf() { std::cout << " [~] "; }
};
const char* foo(int id, tmp_buf&& buf = tmp_buf())
{
    sprintf(buf.arr, "foo(%X)", id);
    return buf.arr;
}
void try_stomp()
{
    double d = 22./7.;
    char buf[32];
    snprintf(buf, sizeof(buf), "pi=%lf", d);
    std::cout << "\n" << buf << "\n";
}
int main()
{
    std::cout << "at call site: " << foo(123456789);
    std::cout << "\n";
    std::cout << "after call site: ";
    const char* p = foo(123456789);   // the temporary tmp_buf dies at the end of this statement
    try_stomp();
    std::cout << p << "\n";           // undefined behaviour: p dangles
    return 0;
}
output:
at call site: foo(75BCD15) [~]
after call site: [~]
pi=3.142857
foo(75BCD15)

For question 2.
The reason you didn't trash the variable is that the compiler probably allocated all the stack space it needed at the start of the function call. This includes all the stack space for the temporary objects, and objects that are declared inside a nested scope. You can't guarantee that the compiler does this (I think), rather than push objects on the stack as needed, but it is more efficient and easier to keep track of where your stack variables are this way.
When you call the try_stomp function, that function then allocates its stack after (or before, depending on your system) the stack for the main function.
Note that the default arguments for a function call are actually added by the compiler to the calling code, rather than being part of the called function (which is why they need to be part of the function declaration, rather than the definition, if it was declared separately).
So your stack when in try_stomp looks something like this (there is a lot more going on in the stack, but these are the relevant parts):
main - p
main - temp1
main - temp2
try_stomp - d
try_stomp - buf
So you can't trash the temporary from try_stomp, at least not without doing something really outrageous.
Again, you can't rely on this layout, as it is compiler dependent, and is just an example of how the compiler might do it.
The way to trash the temporary buffer would be to do it in the destructor of tmp_buf.
Also interestingly, MSVC seems to allocate stack space for all of the temporary objects separately, rather than re-use the stack space for both objects. This means that even repeated calls to foo won't trash each other. Again, you can't depend on this behavior (I think - I couldn't find a reference to it).
For question 3.
No, don't do this!
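For comparison, here are two safer shapes for the same formatting helper, as a sketch only. foo_string and foo_into are hypothetical names; the first returns an owning std::string, the second writes into a buffer whose lifetime the caller controls explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Option 1: return by value; the caller owns the result.
std::string foo_string(int id) {
    char buf[24];
    std::snprintf(buf, sizeof(buf), "foo(%X)", static_cast<unsigned>(id));
    return std::string(buf);
}

// Option 2: the caller supplies the buffer, so its lifetime is visible at the
// call site and no hidden temporary is involved.
const char* foo_into(int id, char* out, std::size_t out_size) {
    std::snprintf(out, out_size, "foo(%X)", static_cast<unsigned>(id));
    return out;
}
```

With option 1, capturing the result in a variable is always safe. With option 2, the returned pointer is valid for as long as the caller's own buffer is, which avoids a heap allocation without relying on the lifetime of a default-argument temporary.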

Why can I access an element I just erased from an stl vector in c++?

In this example, I create a vector with one integer in it and then I erase that integer from the vector. The size of the vector decreases, but the integer is still there! Why is the integer still there? How is it possible for a vector of size 0 to contain elements?
#include <vector>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
    vector<int> v;
    v.push_back(450);
    cout << "Before" << endl;
    cout << "Size: " << v.size() << endl;
    cout << "First element: " << (*v.begin()) << endl;
    v.erase(v.begin());
    cout << "After" << endl;
    cout << "Size: " << v.size() << endl;
    cout << "First element: " << *(v.begin()) << endl; // v is empty here
    return(0);
}
output:
Before
Size: 1
First element: 450
After
Size: 0
First element: 450

You are invoking undefined behavior by dereferencing an invalid memory location. Normally, the heap manager will not immediately release memory freed with delete, for efficiency purposes. However, that doesn't mean that you can access that memory location; the heap manager can reuse it for other purposes whenever it likes. So your program will behave unpredictably if you dereference an invalid memory location.

IIRC a vector doesn't release space unless specifically told to, so you're seeing an item which is still in its memory but not being tracked by the vector. This is part of the reason why you're supposed to check the size first (the other being that if you never assigned anything, you'll be dereferencing a garbage pointer).

To start, don't count on it being this way across all systems. How a vector works internally is completely implementation-dependent. By dereferencing an invalid memory location, you're circumventing the behavior that has been outlined in the documentation.
That is to say, you can only count on behavior working that is outlined in the STL docs.
The reason you can still access that memory location is that the particular implementation you are using doesn't immediately delete memory, but keeps it around for a while (probably for performance purposes). Another implementation could very well delete that memory immediately if the author so desired.

It is just that the vector has not freed the memory, but kept it around for future use.
This is what we call "undefined behaviour". There is no guarantee that it will work next time and it may easily crash the program on a future attempt. Don't do it.

What are your compiler options? I get a crash with the usual options, with both of the compilers I regularly use (g++ and VC++). In the case of g++, you have to set some additional options (-D_GLIBCXX_DEBUG, I think) for this behavior; as far as I can tell, it's the default for VC++. (My command for VC++ was just "cl /EHs bounds.cc".)
As others have said, it's undefined behavior, but with a good compiler, it will be defined to cause the program to crash.
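Independent of compiler options, the access can be made well defined by checking before dereferencing or by using the bounds-checked at(). A minimal sketch (try_front and at_throws_when_empty are hypothetical helper names):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true and writes the first element to `out` only when one exists.
bool try_front(const std::vector<int>& v, int& out) {
    if (v.empty()) return false;   // never dereference begin() of an empty vector
    out = v.front();
    return true;
}

// at() performs a bounds check and throws std::out_of_range instead of
// silently reading stale memory.
bool at_throws_when_empty(const std::vector<int>& v) {
    try {
        (void)v.at(0);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

After v.erase(v.begin()) on a one-element vector, size() is 0 while capacity() is unchanged, which is why the old value often still sits in memory. try_front then reports failure and at(0) throws, so neither path reads the stale element.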

About the copy constructor and pointers

I'm reading through an example in primer and something which it talks about is not happening. Specifically, any implicit shallow copy is supposed to copy over the address of a pointer, not just the value of what is being pointed to (thus the same memory address). However, each of the pos properties are pointing to two different memory addresses (so I am able to change the value of one without affecting the other). What am I doing wrong?
Header
#include "stdafx.h"
#include <iostream>
class Yak
{
public:
int hour;
char * pos;
const Yak & toz(const Yak & yk);
Yak();
};
END HEADER
using namespace std;
const Yak & Yak:: toz(const Yak & yk)
{
return *this;
}
Yak::Yak()
{
pos = new char[20];
}
int _tmain(int argc, _TCHAR* argv[])
{
Yak tom;
tom.pos="Hi";
Yak blak = tom.toz(tom);
cout << &blak.pos << endl;
cout << &tom.pos << endl;
system("pause");
return 0;
}

You're printing the addresses of the pointers, not the address of the string they're pointing to:
cout << &blak.pos << endl;
cout << &tom.pos << endl;
They are two different pointers, so their addresses differ. However, they point to the same string:
cout << static_cast<void*>(blak.pos) << endl;
cout << static_cast<void*>(tom.pos) << endl;
(Note the cast static_cast<void*>(tom.pos). As Aaron pointed out in a comment, it is necessary because when a char* is output through operator<<, the stream library assumes the character pointed to is the first character of a zero-terminated string. Outputting a void*, OTOH, will output the address.)
Note that there is more wrong with your code. Here
Yak tom;
You are creating a new object. Its constructor allocates 20 characters, and stores their address in tom.pos. In the very next line
tom.pos="Hi";
you are assigning the address of a string literal to tom.pos, thereby discarding the address of the bytes you allocated, effectively leaking that memory.
Also note that Yak has no destructor, so even if you don't discard the 20 characters that way, when tom goes out of scope, tom.pos will be destroyed and thus the address of those 20 bytes lost.
However, due to your missing copy constructor, when you copy a Yak object, you end up with two of them having their pos element pointing to the same allocated memory. If you added a destructor that frees pos, both objects would try to delete that memory when they go out of scope, which is fatal.
To cut this short: Use std::string. It's much easier. Get a grip on the basics, using safe features of the language like std::string, the containers of the standard library, and smart pointers. Once you feel sure with these, tackle manual memory management.
However, keep in mind that I, doing C++ for about 15 years, consider manual resource management (memory is but one resource) error-prone and try to avoid it. If I have to do it, I hide each resource behind an object managing it - effectively falling back to automatic memory management. :)
Which "primer" is this you're reading? Lippman's C++ Primer? If so, which edition? I'd be surprised if a recent edition of Lippman's book would let you loose on dynamic memory without first showing you the tools to tackle this and how to use them.
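The difference between the address of a pointer member and the address it holds, and the contrast with std::string, can be sketched as follows. RawYak, StringYak and is_shallow_copy are hypothetical names that loosely mirror the Yak class from the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Raw pointer member: the implicit copy copies the pointer value only.
struct RawYak {
    char* pos;
};

// std::string member: the implicit copy duplicates the characters.
struct StringYak {
    std::string pos;
};

// True when the two objects hold distinct pointer members that nevertheless
// point at the same characters.
bool is_shallow_copy(const RawYak& a, const RawYak& b) {
    return &a.pos != &b.pos && a.pos == b.pos;
}
```

Copying a RawYak gives two objects for which is_shallow_copy is true, so writing through one pos is visible through the other. Copying a StringYak gives two independent strings, and changing one leaves the other untouched.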

Unless I'm missing something, it seems you are printing out the address of each pos variable, not the address they both point to. The addresses of the two pos members are different, but the pointer values should be the same.

The code does not make a lot of sense.
The toz method takes a Yak but does nothing with it. Why? In fact, what is toz supposed to do, anyway?
tom.pos = "Hi" does not copy Hi into the new char[20] array that you allocate in the constructor. Instead it replaces the pos pointer with a pointer to a static const char array containing "Hi\0". You would need to use strcpy.

Your code has a memory leak in it. You are allocating 20 characters on construction and have pos point to the allocated memory. However you are then pointing pos at the memory containing "Hi" without deleting the memory first.
In any case, pos is a member variable of type char *. Each instance of Yak is going to have its own unique pos. Because pos is a pointer, however, it's possible that they are pointing to the same block of memory (and this is the case in your example).
What you are printing out is the address of the pos variable which is of type char *, and not the address of the memory it is pointing to, which is of type char. Remember that a pointer is a variable like any other, it has an address and a value (and that value, in the case of a pointer, is an address).

If you want blak to be a shallow copy of tom, try the following instead of using your function toz:
Yak *blak = &tom;
// Note the use of -> ; also, &blak->pos will work
cout << &(blak->pos) << endl;
cout << &tom.pos << endl;
Now, blak merely points to tom --- it has nothing else associated with it.
Also, there are two other issues with your code:
pos = new char[20];
You are allocating memory here which has not been freed when you're done with the Yak object --- this will lead to memory leaks (unless you plan to free it elsewhere). If the use of pos ends with the use of the Yak object it's associated with (e.g. tom) then I recommend adding delete[] pos; to the destructor of Yak (delete[] rather than delete, because the memory was allocated with new[]). Or as sbi suggested, use std::string.
tom.pos = "Hi";
You're assigning it to a string literal. This means that pos now points to "Hi", which typically resides in a read-only section of memory. You cannot change the string stored there, so I don't think this is what you intended. Therefore, allocating 20 bytes in the beginning to pos seems pointless.
Again, I recommend using string.
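If manual management is unavoidable, the class needs a destructor, a copy constructor and a copy assignment operator that agree with each other. A minimal sketch under that assumption; OwningYak is a hypothetical class, not the code from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class OwningYak {
public:
    explicit OwningYak(const char* text) : pos_(duplicate(text)) {}
    OwningYak(const OwningYak& other) : pos_(duplicate(other.pos_)) {}   // deep copy
    OwningYak& operator=(const OwningYak& other) {
        if (this != &other) {
            char* fresh = duplicate(other.pos_);   // allocate first so a throw leaves *this intact
            delete[] pos_;
            pos_ = fresh;
        }
        return *this;
    }
    ~OwningYak() { delete[] pos_; }   // new[] must be paired with delete[]

    const char* pos() const { return pos_; }

private:
    // Allocates a fresh buffer and copies the zero-terminated text into it.
    static char* duplicate(const char* text) {
        const std::size_t n = std::strlen(text) + 1;
        char* p = new char[n];
        std::memcpy(p, text, n);
        return p;
    }
    char* pos_;
};
```

Each OwningYak owns its own buffer, so copies never share memory and each destructor frees exactly one allocation. A std::string member gives the same behavior with none of this code.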
