Default capacity of std::string? - c++

When I create a std::string using the default constructor, is ANY memory allocated on the heap? I'm hoping the answer does not depend on the implementation and is standardized. Consider the following:
std::string myString;

Unfortunately, the answer is no: according to N3290, this is not standardized.
Table 63 (page 643) says:
data() a non-null pointer that is copyable and can have 0 added to it
size() 0
capacity() an unspecified value
The table is identical for C++03.

No, it isn't standardized, but I don't know of any implementation that allocates memory on the heap by default. Quite a few do, however, include what's called the short string optimization (SSO): they reserve some space inside the string object itself, so as long as the string is no longer than that (it seems to be between 10 and 20 characters as a rule), no separate heap allocation is needed at all.
That's not standardized either though.

It is implementation dependent. Some string implementations use a small amount of automatically allocated storage for small strings, and then dynamically allocate more for larger strings.

It depends on the compiler. Take a look here, there is a good explanation:
http://www.learncpp.com/cpp-tutorial/17-3-stdstring-length-and-capacity/

Generally, yes, they allocate memory on the heap. For example, c_str() requires a terminating null character '\0'. Most implementations allocate this terminator ahead of time, as part of the string. So you'll get at least one byte allocated, often more.
If you really need specific behavior I'd advise writing your own class. Buffer/string classes are not that hard to write.

Related

Difference between std::string(size, '\0') and s.resize(size)?

Unlike std::vector, std::string does not provide a unary constructor that takes a size:
std::string s(size); // ERROR
Is there any difference between:
std::string s(size, '\0');
and
std::string s;
s.resize(size);
in terms of their performance on common implementations?
Will resize initialize the string to all zero characters or will it leave them an unspecified value?
If all zero, is there any way to construct a string of a given size, but leave the characters with an unspecified value?
There is a difference: with std::string s(size, '\0');, all of the memory needed for the string can be allocated at once. With the second example, if size is greater than the number of characters that fit in the small string optimization buffer, resize() has to perform an allocation of its own, and whether the default constructor allocated anything before that is implementation-dependent. Either way, the second form will not be more performant in that regard. The first example is also more concise and may be faster, so it is probably preferable. When calling s.resize(size);, all new characters are value-initialized, i.e. set to '\0'. There is no way to construct a string of a given size with unspecified values.
The actual answer depends on the implementation, but I'm fairly sure that std::string s(size, '\0'); is faster.
std::string s;
s.resize(size);
According to the documentation for std::string.
1) Default constructor. Constructs empty string (zero size and unspecified capacity).
The default constructor creates a string with an "unspecified capacity". My sense is that the implementation is free to choose a default capacity, probably in the realm of 10-15 characters (total speculation).
Then in the next line, you will reallocate the memory (resize) with the new size if the size is greater than the current capacity. This is probably not what you want!
If you really want to find out definitively, you can run a profiler on the two methods.
There is already a good answer from DeepCoder.
For the record, however, I'd like to point out that for strings (as for vectors) there are two distinct notions:
the size(): the number of actual (i.e. meaningful) characters in the string. You can change it using resize() (which takes an optional second parameter naming the filler char, if it should be something other than '\0')
the capacity(): the number of characters the string has room for without reallocating. It's at least the size but can be more. You can increase it with reserve()
If you're worried about allocation performance, I believe it's better to play with the capacity. The size should really be kept for real chars in the string not for padding chars.
By the way, more generally, s.resize(n) is the same as s.resize(n, char()). So if you'd like to fill it in the same way at construction, you could consider string s(n, char()). But as long as you don't use basic_string<T> with T different from char, '\0' does the trick just as well.
Resize does not leave elements uninitialized. According to the documentation: http://en.cppreference.com/w/cpp/string/basic_string/resize
s.resize(size) will value-initialize each appended character. That will cause each element of the resized string to be initialized to '\0'.
You would have to measure the performance difference of your specific C++ implementation to really decide if there's a worthwhile difference or not.
After looking at the machine code generated by Visual C++ for an optimized build, I can tell you the amount of code for either version is similar. What seems counterintuitive is that the resize() version measured faster for me. Still, you should check your own compiler and standard library.

Does the standard guarantee that std::string::resize will not reallocate memory if the new size is less than or equal to the old one?

I need to frequently empty a string and then append some chars to it.
std::string::clear() may reallocate.
Does std::string::resize(0) reallocate? The wording of the standard doesn't guarantee anything about it.
I think the best possible answer to this is the "Notes" section at http://en.cppreference.com/w/cpp/string/basic_string/clear.
Unlike for std::vector::clear, the C++ standard does not explicitly require that capacity is unchanged by this function, but existing implementations do not change capacity.
And if the capacity is unchanged, that would almost certainly mean that no allocation or freeing functions are called. That's probably the best you can do, short of looking at each and every implementation you care about.

Is the size of a dynamically-allocated array stored somewhere?

It seems to me that delete[] knows the size of a dynamically allocated array. My question is: Is there any way to get it out so that we don't need to provide the size explicitly when coding.
The method used by delete[] to figure out how many items it has to deal with is implementation dependent. You can't get to it or use it in any way.
Read C++ FAQ [16.14] After p = new Fred[n], how does the compiler know there are n objects to be destructed during delete[] p? (and the whole section for a general idea on free store management.)
My question is: Is there any way to get it out so that we don't need to provide the size explicitly when coding.
You don't need to, you just call delete [], with no size.
The way the compiler stores the size is an implementation detail and not specified. Most store it in some memory right before the array starts (not after, as others mentioned).
See this related question : How does delete[] "know" the size of the operand array?
Edited:
Since delete [] needs to call destructors for all elements of the array, the length must definitely be stored somewhere. As for why this memory is not accessible, to prevent errors such as walking outside of the array due to its unknown size, I am not really sure. Strictly speaking, the length of statically allocated arrays must be known at compile time and the length of dynamically allocated arrays must be stored by the runtime, so in both cases buffer overflow errors are theoretically 100% preventable, and yet both static and dynamic arrays are unsafe. My guess is that this is for performance reasons: bounds checking would make array access slower, and raw (C style) arrays offer the best performance at zero safety.
The implementation of this varies with compiler and runtime vendors, there might be some implementations it may be accessible and usable, but it wouldn't be considered standard and recommended practice. The logical place for the length to be stored is somewhere in the header of the allocated memory fragment before the actual address you will get for the first element of the array.
The C++ compiler has the size of dynamically allocated arrays buried deep somewhere; however, this is not accessible in any way while coding in C++ - so you'll have to store the size somewhere after allocation.
[Edit]: Though some versions of the Visual Studio compiler suite store the size at index -1, this is not to be trusted across compilers, or relied on at all when coding.
I think it is compiler-dependent and you cannot get at it from your application. The following links show two methods compilers use.
http://www.parashift.com/c++-faq-lite/compiler-dependencies.html#faq-38.7
http://www.parashift.com/c++-faq-lite/compiler-dependencies.html#faq-38.8
Compilers follow different approaches to record how much memory was allocated by new.
This is one of the approaches; I read it somewhere.
When the compiler allocates memory for a new call, it sets aside a little extra space, perhaps at the beginning, where it stores how much memory was allocated.
So when it encounters a delete call, it uses this stored value to decide how much memory has to be deallocated.

Size of memory allocated on heap

Can you check the size of memory allocated on the heap if the buffer contains zero ('\0') characters?
char *c = new char[6]; //random size memory
memset(c, 0, 6);
There's no reliable way to do that - you have to store that information yourself.
The operator new[]() function can be implemented (and replaced by you) in any way whatsoever, so you can't know the size unless you know the exact implementation in detail.
In Visual C++ the default implementation for built-in types just forwards calls to malloc(), so you could try _msize(), but again it's non-portable and maybe even unreliable.
No, in general [1] you can't. You have to store this information separately.
If you need to use that memory as a string or as an array, my advice is to use a std::string or std::vector, which do all this bookkeeping by themselves.
[1] i.e. "as far as the standard is concerned"
I see that your question is MSVC++-specific; in that case, some heap-debugging helpers are provided, but they work only when the project is compiled in debug mode; I think there's some other compiler-specific function to get the allocated size, but it wouldn't work if custom allocators are used.
On the other hand, APIs like LocalAlloc let you know how big is the allocated chunk of memory (see e.g. LocalSize).
But again, I think that it's a cleaner design to keep track of this information by yourself.
No. You need to store the amount of allocated memory as a separate variable, and you need to take it with you whenever you want to do something with your allocated structure. This is cumbersome, but may be fast. As a safe and comfortable replacement use std::vector, boost::array, etc.

How to allocate more memory for a buffer in C++?

I have pointer str:
char* str = new char[10];
I use the memory block str points to to store data.
How can I allocate more bytes for the buffer pointed to by str and not lose old data stored in the buffer?
Use std::string instead. It will do what you need without you worrying about allocation, copy etc. You can still access the raw memory via the c_str() function.
Even std::vector<char> will work well for you.
new[] another buffer, copy the data there (use memcpy() for that), then delete[] the old one, assign the new buffer address to the pointer originally holding the old buffer address.
You cannot do that with new. For that you need to use good old malloc, realloc, and free (do not mix malloc/realloc/free with new/delete).
The realloc function is what you are looking for. You have to use malloc/free instead of new/delete to use it.
If you are really using C++, the most correct solution would be to use std::vector, unless you are treating the data as a string, in which case you should use std::string (a typedef for std::basic_string<char>, which manages its buffer much like std::vector does, so no big deal). You are creating at least 10 chars, which hints that you are probably quite sure you'll need 10 chars but may need more. Maybe you are worried about the performance cost of allocating and deallocating memory. In that case, you can create your string and then reserve the estimated capacity you expect to need, so there won't be any reallocation at least until you reach that limit.
#include <string>

int main()
{
    std::string s;
    s.reserve( 10 ); // pre-allocate room for at least 10 characters
    // do whatever with s
}
As others have already pointed out, using std::string or std::vector lets you forget about copying, resizing, or deleting the reserved memory.
You have to allocate a different, bigger string array, and copy over the data from str to that new string array.
Allocation is a bit like finding a parking place.
You're asking here whether it's possible to add a trailer to a car that has been parked for a few days.
The answer is that C has something called realloc, which does the following:
If there is already enough room to add the trailer, do so. If not, park in another place big enough for both the trailer and the car, which is equivalent to copying your data.
In other words you'll get strong and random performance hits.
So what would you do in the real world? If you knew you might need to add some trailers to your car you'd probably park in a bigger place than required. And when exceeding the size required for the place, you'd move your car and your trailers to a place with a nice margin for future trailers.
That's precisely what the STL's string and vector do for you. You can even give them a hint about the size of your future trailers by calling "reserve". Using std::string is probably the best answer to your problem.
You can use realloc: http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/
I would add that this approach is not the favored C++ approach (depending on your needs you could use std::vector<char>, for instance).