Performance on strings initialization in C++


I have the following questions regarding strings in C++:
1>> Which is the better option (considering performance), and why?
1.
string a;
a = "hello!";
OR
2.
string *a;
a = new string("hello!");
...
delete(a);
2>>
string a;
a = "less";
a = "moreeeeeee";
How exactly is memory management handled in C++ when a longer string is assigned to a string that holds a shorter one? Are C++ strings mutable?

It is almost never necessary or desirable to say
string * s = new string("hello");
After all, you would (almost) never say:
int * i = new int(42);
You should instead say
string s( "hello" );
or
string s = "hello";
And yes, C++ strings are mutable.

All the following is what a naive compiler would do. Of course as long as it doesn't change the behavior of the program, the compiler is free to make any optimization.
string a;
a = "hello!";
First you initialize a to contain the empty string. (set length to 0, and one or two other operations). Then you assign a new value, overwriting the length value that was already set. It may also have to perform a check to see how big the current buffer is, and whether or not more memory should be allocated.
string *a;
a = new string("hello!");
...
delete(a);
Calling new requires the OS and the memory allocator to find a free chunk of memory. That's slow. Then you initialize it immediately, so you don't assign anything twice or require the buffer to be resized, like you do in the first version.
Then something bad happens, and you forget to call delete, and you have a memory leak, in addition to a string that is extremely slow to allocate. So this is bad.
string a;
a = "less";
a = "moreeeeeee";
Like in the first case, you first initialize a to contain the empty string. Then you assign a new string, and then another. Each of these may require a call to new to allocate more memory. Each line also requires length, and possibly other internal variables to be assigned.
Normally, you'd allocate it like this:
string a = "hello";
One line, perform initialization once, rather than first default-initializing, and then assigning the value you want.
It also minimizes errors, because you don't have a nonsensical empty string anywhere in your program. If the string exists, it contains the value you want.
About memory management, google RAII.
In short, string calls new/delete internally to resize its buffer. That means you never need to allocate a string with new. The string object has a fixed size, and is designed to be allocated on the stack, so that the destructor is automatically called when it goes out of scope. The destructor then guarantees that any allocated memory is freed. That way, you don't have to use new/delete in your user code, which means you won't leak memory.

Is there a specific reason why you constantly use assignment instead of initialization? That is, why don't you write
string a = "Hello";
etc.? This avoids a default construction and just makes more sense semantically. Creating a pointer to a string just for the sake of allocating it on the heap is never meaningful, i.e. your case 2 doesn't make sense and is slightly less efficient.
As to your last question, yes, strings in C++ are mutable unless declared const.

string a;
a = "hello!";
Two operations: this calls the default constructor std::string() and then operator=.
string *a; a = new string("hello!"); ... delete(a);
Only one operation: this calls the constructor std::string(const char*), but you must not forget to delete the pointer.
What about
string a("hello");

In case 1.1, your string members (which include pointer to the data) are held in stack and the memory occupied by the class instance is freed when a goes out of scope.
In case 1.2, memory for the members is allocated dynamically from heap too.
When you assign a char* constant to a string, memory that will contain the data will be realloc'ed to fit the new data.
You may see how much memory is allocated by calling string::capacity().
When you call string a("hello"), memory gets allocated in the constructor.
Both the constructor and the assignment operator call the same methods internally to allocate memory and copy the new data there.

If you look at the docs for the STL string class (I believe the SGI docs are compliant with the spec), many of the methods list complexity guarantees. I believe many of the complexity guarantees are intentionally left vague to allow different implementations. Some older implementations used a copy-on-write approach, so that assigning one string to another was a constant-time operation, but you could incur an unexpected cost when you tried to modify one of those instances. Since C++11, the requirements on std::string effectively rule out copy-on-write, so this no longer applies to conforming modern implementations.
You should also check out the capacity() function, which will tell you the maximum length string you can put into a given string instance before it will be forced to reallocate memory. You can also use reserve() to cause a reallocation to a specific amount if you know you're going to be storing a large string in the variable at a later time.
As others have said, as far as your examples go, you should really favor initialization over other approaches to avoid the creation of temporary objects.

Most likely
string a("hello!");
is faster than anything else.

You're coming from Java, right? In C++, objects are treated the same (in most ways) as the basic value types. Objects can live on the stack or in static storage, and be passed by value. When you declare a string in a function, that allocates on the stack however many bytes the string object takes. The string object itself does use dynamic memory to store the actual characters, but that's transparent to you. The other thing to remember is that when the function exits and the string you declared is no longer in scope, all of the memory it used is freed. No need for garbage collection (RAII is your best friend).
In your example:
string a;
a = "less";
a = "moreeeeeee";
This puts a block of memory on the stack and names it a, then the constructor is called and a is initialized to an empty string. The compiler stores the bytes for "less" and "moreeeeeee" in (I think) the .rdata section of your exe. String a will have a few fields, like a length field and a char* (I'm simplifying greatly). When you assign "less" to a, the operator=() method is called. It dynamically allocates memory to store the input value, then copies it in. When you later assign "moreeeeeee" to a, the operator=() method is again called and it reallocates enough memory to hold the new value if necessary, then copies it in to the internal buffer.
When string a's scope exits, the string destructor is called and the memory that was dynamically allocated to hold the actual characters is freed. Then the stack pointer is decremented and the memory that held a is no longer "on" the stack.

Creating a string directly in the heap is usually not a good idea, just like creating base types. It's not worth it since the object can easily stay on the stack and it has all the copy constructors and assignment operator needed for an efficient copy.
The std::string itself has a buffer on the heap that may be shared by several strings, depending on the implementation.
For instance, with Microsoft's STL implementation you could do this:
string a = "Hello!";
string b = a;
And both strings would share the same buffer until you changed one of them:
a = "Something else!";
That's why it was very bad to store the result of c_str() for later use; c_str() is only guaranteed to remain valid until the next non-const member function is called on that string object.
This led to very nasty concurrency bugs, which required the sharing functionality to be turned off with a define if you used such strings in a multithreaded application.

Related

Placement-new vs new-expression

Again with placement new I've found an example on this forum like this:
char *buf = new char[sizeof(string)]; // pre-allocated buffer
string *p = new (buf) string("hi"); // placement new
string *q = new string("hi"); // ordinary heap allocation
But I think here buf is a pointer to an allocated and constructed dynamic array of default-initialized characters. So the characters in the array are default-initialized and have indeterminate values.
I guess using placement new in the second line will construct an object on top of the previously constructed array of objects.
Why didn't the user call operator new for the array allocation rather than using a new-expression?:
char *buf = static_cast<char*>(operator new[](sizeof(string)));
After all, I think that if buf were a pointer to a dynamic array of non-default-constructible objects, the code would fail to compile with the new-expression but not with the operator new function.
Are my guesses correct?
Here is the link to the original answer:
What uses are there for "placement new"?

Why didn't the user call operator new for the array allocation rather than using a new-expression?:
We cannot answer that question because we aren't that user. You should ask that from the user - though given that the example was written in 1998 it might not be easy to contact them. My guess: They didn't know that non-placement operator new exists or they didn't know what it is used for. Reusing the memory of an array of char is an intuitive choice in such case.
Note that the example of creating a singular dynamic std::string object makes little sense in the first place (I'm assuming that's what string in the example is).
I have a similar question to you: Why are you using operator new[] in your suggestion and not operator new? Even more importantly, why not use an allocator?
Are my guesses correct?
Correct.
Correct.
This is a question and not a guess. I covered it above.
It would fail. But that's irrelevant since char is default constructible.

char is an object type that is both fundamental and trivial. Creating one doesn't, in practice, touch memory, and making an array of them does not either.
char* foo = new char[10];
and
char *foo = static_cast<char*>(operator new[](10));
end up doing exactly the same thing in machine code, except the second one is a lot more verbose.
There are some subtle differences in the abstract machine: in one, a bunch of chars are created; in the other, they are not created on that line. Coming up with a case where that matters is going to require a fair bit of language-lawyering effort (I am thinking disposal may be different, and some access might be different, especially in standard versions before C++20 addressed the malloc problem through implicit object creation).
After all, I think that if buf were a pointer to a dynamic array of non-default-constructible objects, the code would fail to compile with the new-expression but not with the operator new function.
Sure, but the cast would be a code smell, and the point of buf is to be storage for the later placement new. I guess it already is one;
void *foo = operator new[](10);
is less bonkers.
Just because you can static cast does not mean you should.

operator new[](sizeof(string)) is something odd: it is not the right way to create an object. In the best case it creates an object in memory implicitly (if operator new is implemented as a std::malloc call and the object is a POD type), without initializing or constructing one. All you could do otherwise is reinterpret_cast<char*>(new string); that line would just create a string object in dynamic storage and then make it anonymous by replacing the pointer type with char*.
The thing is, for placement new, buf does not have to point to dynamic memory. It can be a static buffer. It can be a pointer to a location within a larger storage area used to hold multiple objects, i.e. a memory pool. The new object is constructed at the given location.
Note that with placement new, std::string's data storage still behaves as it usually does: it allocates character data in dynamic memory. To use a memory pool for that data, the programmer should provide an appropriate allocator, and that is one of the purposes of the placement new operator.

No, buf isn't an array of objects. It's an array of characters, so basically an array of bytes. And while it was allocated with an array new, it's basically being used as a byte pointer.
The use of placement new is when you want to create an object at an exact location, but you want to do so following all the rules of C++ object creation, so constructors are called and vtables are set up. The usual use case for this is if you're doing your own custom memory allocation and reusing existing memory addresses. Firmware may use this to reuse memory as a pool. Or an RTOS may use it so that it doesn't exceed memory restrictions for a task.
This is actually a poor example of how it's used because of that. You'd never new an array and then placement new into it. You'd have a pointer to a block of allocated memory lying around, and you'd use placement new into that.

Where does the memory used for a string go when the string variable is re-assigned a new value?

Say I have this simple function
void foo(string a, string b) {
std::string a_string = a + a;
// reassign a new string to a_string
a_string = b + b + b;
// more stuff
}
Does the memory reserved for a+a get released as soon as a_string is assigned a new string?
Newbie in C++ memory management. I'm still wrapping my head around it.

It depends if it is a short or long string:
std::string a_string = a + a;
// reassign a new string to a_string
a_string = b + b + b;
First, a+a is constructed directly as a_string due to guaranteed copy elision (C++17). No freeing is going on here.
Then, if a_string is short enough, its characters are stored inside the string object itself (on the stack here), without any additional heap allocation. This Short String Optimization (SSO) is performed by most standard library implementations, but is not mandated by the standard.
If SSO took place, then this does not free any space in a_string but merely reuses it. The memory goes nowhere:
a_string = b + b + b;
However, if a_string is too long for SSO, then this line does free heap space allocated for the string.
It is clear to see where memory goes when the declaration of std::string is examined:
template<
    class CharT,
    class Traits = std::char_traits<CharT>,
    class Allocator = std::allocator<CharT>
> class basic_string;
The memory of the non-SSO string is allocated and freed with std::allocator. This allocator allocates and frees memory with the new and delete operators, which usually use malloc/free behind the scenes.
It is easy to find out how big an SSO string can be by running
std::cout << std::string().capacity() << '\n';
For clang 8.0.0 on 64 bit Intel SSO is for strings up to 22 chars, and for gcc 8.3 it is only 15 chars.

From what I know the string is just copied to memory allocated for a_string since it is more efficient than freeing memory and allocating new one. If the memory allocated for (a+a) is less than size allocated for a_string it gets resized.
String manages the memory for you. You do not have to ever allocate buffer space when you add or remove data to the string. If you add more than will fit in the currently-allocated buffer, string will reallocate it for you behind the scenes.
std::string and its automatic memory resizing

std::string is a resource manager class. It owns the underlying char*. The allocated memory is freed as soon as the destructor of the std::string object is called, i.e. when the object itself goes out of scope.
The same thing happens when a std::string is reassigned. The old memory is freed if it is too small for the new string, and potentially replaced with new heap memory. Resource management classes such as std::string give a strong guarantee never to leak memory (assuming a standard-conforming implementation).
Because you are assigning a temporary, no copy (and thus no reallocation) needs to take place. Instead, the contents of the temporary b + b + b will be moved into the string. This means copying the underlying pointer of the temporary into the existing string object and stripping the temporary of the ownership of that pointer. The temporary then no longer owns the memory and thus will not delete it when its destructor is called directly after the assignment. This has the tremendous advantage that only a pointer needs to be copied instead of a memcpy of the complete string.
Other resource management classes with that guarantee include smart pointers (std::unique_ptr, std::shared_ptr) and collections (e.g. std::vector, std::list, std::map...).
In modern C++ it is rarely necessary to do this kind of memory management manually. The mentioned resource management classes cover most cases and should be preferred over manual memory management nearly all of the time, unless it is really necessary and you know exactly what you are doing.

No, a string works a bit like a vector in C++ in that once the space is reserved in memory, it will not be released unless explicitly told to do so, or it passes its max capacity. This is to avoid resizing as much as possible, because doing so would mean allocating a new array of characters, copying over the necessary values, and deleting the old array. By maintaining the reserved memory, removing a character doesn't require making a brand new array, and the reallocation only needs to happen when the string isn't large enough to contain what you're trying to put in it. Hope that helps!

Notice that you have a lot of allocations, starting with the arguments passed by value instead of by const reference.
Then there is a+a, which you cannot really avoid.
Then the interesting part:
(b + b) + b, which creates 2 allocations for the 2 temporaries.
Then you use the move assignment, which replaces a_string with the temporary: no allocation.

String management in C++

I've decided to come back to C++ after some time spent in Java and now I'm quite confused about how strings work in C++.
To start with, suppose we have a function:
void fun() {
int a = 1;
Point b(1,2);
char c[] = "c-string";
}
As I understand, a and b are allocated on the stack. c (the pointer) is allocated on the stack too, but the contents ("c-string") live happily on the heap.
Q1: Are the contents of c automatically deallocated when the function fun ends?
Secondly let's suppose we have a c++ string:
void fun2() {
(1) string s = "c++ string";
(2) s += "append";
(3) s = "new contents";
(4) s = "a" + s + "c";
}
String documentation isn't too specific about how the strings work, so here are the questions:
Q2: Are the contents of s automatically deallocated after fun2 ends?
Q3: What does happen when we concatenate two strings? Should I care about memory usage? (line 2)
Q4: What happens when we overwrite the contents of a string (line 3) - what about memory, should I worry? Is the originally allocated space reused?
Q5: What if I construct a string like this (line 4). Is it expensive? Are string literals ("a","c") pooled (like in Java) or repeated throughout the final executable?
What I am ultimately trying to learn is how to correctly use strings in C++.
Thanks for reading this,
Queequeg

As I understand, a and b are allocated on the stack. c (the pointer) is allocated on the stack too, but the contents ("c-string") live happily on the heap.
That's wrong, they all live in automatic memory (the stack). Even the char array. In C++, a string is an object of type std::string.
Q1: Are the contents of c automatically deallocated when the function fun ends?
Yes.
Q2: Are the contents of s automatically deallocated after fun2 ends?
Yes.
Q3: What does happen when we concatenate two strings? Should I care about memory usage? (line 2)
They are concatenated, and the memory is managed automatically (assuming we're talking about std::string and not char[] or char*).
Q4: What happens when we overwrite the contents of a string (line 3) - what about memory, should I worry? Is the originally allocated space reused?
Implementation detail. It can be reused, it can be re-allocated if the previous memory can't hold the new contents.
Q5: What if I construct a string like this (line 4). Is it expensive? Are string literals ("a","c") pooled (like in Java) or repeated throughout the final executable?
String literals can be pooled, but it's not required. For large concatenations, it's usual to use a std::stringstream instead (similar to Java). But profile first; don't do premature optimizations. Note that neither of those are string literals though.
char* pStr = "this is a string literal";
This resides in read-only memory and can't be modified.
What I am ultimately trying to learn is how to correctly use strings in C++.
Use a std::string.

c is not a pointer. It is an array. You can tell that it's an array because it has square brackets, while pointers have a star. Since c is an automatic variable, it does not require any manual lifetime or memory management.
Ad Q2: s is also an automatic variable, and since it is of a well-designed class type, this means you don't need to take care of anything manually.
Ad Q3: Local string objects are suitably modified to contain the new string; in the process, there may or may not exist temporary string objects for the duration of the concatenation expression. (This applies only to line 4; there's no temporary in line 2.)
Ad Q4: Everything is fine and works as expected; see Q2. The original memory may or may not be used, depending on the details of the assignment. In your example, the original memory would probably be overwritten; in a case like s = std::string("hello");, the buffers of the two strings would probably be swapped.
Ad Q5: String literals are read-only global constants, which the compiler may implement any way it likes. The details are not so important; you will definitely end up with the desired string object in s. See Q3 re temporary objects.
To "learn how to use strings in C++", just go and use them. Treat them like integers. It'll be correct. The beauty of the standard library is that you really don't need to know "how things work"; when you use standard library classes in an idiomatic C++ fashion, all resource management is done automatically and efficiently for you.

Q1: Yes, it is de-allocated. The character array resides on the function's stack.
Q2: Yes, std::string takes care of releasing all of its resources on destruction, which happens on leaving scope, as it does for all automatically allocated variables.
Q3: No, you should not worry, unless profiling tells you you should.
Q4: You should not worry, and the original space may or may not be re-used. In any case, all space used by the strings is de-allocated on exiting the function.
Q5: Given the optimizations the compiler is allowed to make, the only way to determine for sure whether X is more expensive than Y is to profile both of them.

Regarding Q1, it should be noted that the literals "c++ string", "append", "new contents", "a" and "c" are in static memory.

Use of : Construction of objects at predetermined location in C++

What is the use of Construction of objects at predetermined locations in C++?
The following code illustrates Construction at predetermined location-
void *address = (void *) 0xBAADCAFE ;
MyClass *ptr = new (address) MyClass (/*arguments to constructor*/) ;
This eventually creates object of MyClass, at the predetermined "address".
(Assuming storage pointed by address is fairly large enough to hold MyClass object).
I would like to know the use of creating objects at such predetermined locations in memory.

One scenario where placement new is useful is:
You can preallocate big buffer once and then use many placement new operators.
This gives you better performance (you don't need a new allocation every time) and less fragmented memory (when you need small memory chunks). Typically this is what a std::vector implementation uses.
The downside is that you have to manage the allocated memory manually. Objects created by placement new require an explicit destructor invocation once they are no longer needed.
That said, it is always advisable to profile your application for bottlenecks instead of rushing to placement new as a premature optimization.

There are mainly two cases:
The first is when -for example in an embedded system- you have to construct an object in a given well-known place.
The second is when you want -for some reason- to manage memory in a way other than the default.
In C++, an expression like pA = new(...) A(...) does two consecutive things:
calls the void* operator new(size_t, ...) function and subsequently
calls A::A(...).
Since a new-expression is the only way to call A::A(), adding parameters to new lets you specialize the way memory is managed. The most trivial case is to "use memory already obtained by some other means".
This method is fine when allocation and construction need to be separated. The typical case is std::allocator, whose purpose is to allocate uninitialized memory for a given number of objects, while object construction happens later.
This happens, for example, in std::vector, since it has to allocate a capacity normally larger than its actual size, and then construct the objects as they are push_back-ed into the space that already exists.
In fact the default std::allocator implementation, when asked to allocate n objects, does something equivalent to return static_cast<T*>(::operator new(n*sizeof(T))), so it allocates the space but does not construct anything.
Assuming that std::vector stores:
T* pT; //the actual buffer
size_t sz; //the actual size
size_t cap; //the actual capacity
allocator<T> alloc;
an implementation of push_back can be:
template <class T>
void vector<T>::push_back(const T& t)
{
    if (sz == cap)
    {
        size_t ncap = cap + 1 + cap / 2; // just something more than cap
        T* npT = alloc.allocate(ncap);
        for (size_t i = 0; i < sz; ++i)
        {
            new (npT + i) T(pT[i]); // copy old values (may be a move in C++11)
            pT[i].~T(); // destroy old value, without deallocating
        }
        alloc.deallocate(pT, cap);
        pT = npT;
        cap = ncap;
        // now we have extra capacity
    }
    new (pT + sz) T(t); // copy the new value
    ++sz; // actual size grown
}
In essence, there is a need to separate the allocation (which relates to the buffer as a whole) from the construction of the elements (which has to happen in the already existing buffer).

Usually you use predetermined locations in embedded or driver code, where some hardware is addressed via certain address ranges.
But in this case the storage at that address is not necessarily meant to be accessed that way (you don't know what the new operator does with it), since the new operation is executed later on.
You use it as an initialization value (with new not really changing it).
Two purposes come to mind. First, in case you later forget a new, you instantly see your magic address in the debugger (in this case 0xBAADCAFE).
Second, you can use it when you fiddle around with the new operator and need an initial value, so you can debug it (e.g. you can see changes).
Or you have modified your new operator so that it does something with that magic number (e.g. you can use it for debugging or, as mentioned above, to really use memory at a specific address for certain hardware, or to switch between different allocation methods).
EDIT: To answer correctly for this case, one needs to see what the new operator really does; you should check that new's source code.

This particular behaviour is useful when you know the address of a class by having a long, DWORD, DWORD_PTR or otherwise sized pointer passed as an argument to a function and need to reconstruct a copy of the class for O-O use.
Alternatively, this could also be used to create a class in pre-allocated memory or a location which you have determined is static (ie: you are linking your application with some ancient ASM libraries).

Custom allocators, realtime (no lock here), and performance.

std::vector - how to free the memory of char* elements in a vector?

Consider the following C++ code:
using namespace std;
vector<char*> aCharPointerRow;
aCharPointerRow.push_back("String_11");
aCharPointerRow.push_back("String_12");
aCharPointerRow.push_back("String_13");
for (int i=0; i<aCharPointerRow.size(); i++) {
cout << aCharPointerRow[i] << ",";
}
aCharPointerRow.clear();
After the aCharPointerRow.clear(); line, the character pointer elements in aCharPointerRow should all be removed.
Is there a memory leak in the above C++ code? Do I need to explicitly free the memory allocated to the char* strings? If yes, how?
Thanks for any suggestion.

Is there a memory leak in the above C++ code?
There is no memory leak.
Since you never used new, you do not need to call delete. You only need to deallocate dynamic memory if it was allocated in the first place.
Note that ideally, you should be using a vector of std::string.
std::vector<std::string> str;
str.push_back("String_11");
str.push_back("String_12");
str.push_back("String_13");
You can use std::string::c_str() when you need the underlying character pointer (const char*), which a lot of C APIs expect as a parameter.

You are pushing in your vector string literals (the strings in "..."). These aren't allocated by you. They are given to you by the C++ compiler/runtime and they have a lifetime equal to the lifetime of the app, so you can't/mustn't free them.
See for example Scope of (string) literals
Note that everything I told you was based on the fact that you are using string literals. If you need to allocate your strings' memory, then you will have to use some automatic deallocators like std::unique_ptr (of C++11) or boost::unique_ptr or boost::shared_ptr (of Boost) or better use the std::string class as suggested by Als

The sample has no leak, since the pointers you give don't refer to dynamic memory.
But it is also badly written code: string literals are constant, yet C++ used to allow referring to them through a char* to retain backward compatibility with C libraries (the conversion was deprecated in C++03 and removed in C++11). If you intend to refer to string literals, you should use const char* instead of char* (an attempt to modify them then gives a compiler error rather than a run-time failure).
Another bad thing here is that in more extensive code, you sooner or later lose control over what the char* stored in the vector actually are: are they guaranteed to always be string literals, or can they also be dynamic char[] allocated in some other way? And who is responsible for their allocation and deallocation?
std::vector says nothing about that, and if you are in a position where you cannot give a clean answer to the above questions (each buffer referred to by a const char* may or may not outlive the vector), you are probably better off using std::vector<std::string> and treating the strings as "values" (not referenced objects), letting the string class do the dirty job.

There is no leak. As long as you're not making a copy of those strings, you don't need to explicitly delete or free() them.
