If I have a string member within a struct that's then stored into an array, how does memory get allocated?
struct garage {
int ncars;
int nspaces;
int nmechanics;
string name;
};
But for that last member, name, string is basically a typedef of basic_string, so its memory gets allocated when it gets defined, right? For example: garage.name = "Cool Cars";
But if I don't define that member YET, and store the struct in an array:
garage nearby_garages[15];
garage g0, g1, g2;
nearby_garages[0] = g0; nearby_garages[1] = g1; nearby_garages[2] = g2;
garage current;
current = nearby_garages[1];
current.name = "Jack's Garage";
string size can vary depending on the length of the string/data. struct size can vary depending on string size, which means the array size can vary depending on struct size, but then the array would fall apart if it was pre-allocated. The only way I can see this working is if string is a pointer to a memory location not sandwiched within the struct. But I don't think that is what's happening here. Help please?
Your garage only has fixed-size members, so your array can be allocated on the stack with no problem. Internally, however, std::string uses new/malloc to allocate memory for your data.
Your garage holds a string object, which in turn holds a pointer to a chunk of memory containing your data. Nothing breaks here because the size of the string object is known when the garage is created, so the pointer already has space reserved for it inside the struct.
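A minimal sketch of that point, assuming a garage struct like the one in the question (the helper name array_size_is_fixed is made up for illustration). It checks that the array's footprint stays at 15 fixed-size elements even after one element receives a long name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct garage {
    int ncars;
    int nspaces;
    int nmechanics;
    std::string name;
};

// Returns true if the array occupies exactly 15 * sizeof(garage) bytes
// both before and after a long name is assigned to one element.
bool array_size_is_fixed() {
    garage nearby_garages[15] = {};
    const std::size_t before = sizeof(nearby_garages);

    nearby_garages[1].name = std::string(1000, 'x');  // a 1000-character name
    assert(nearby_garages[1].name.size() == 1000);

    const std::size_t after = sizeof(nearby_garages);
    return before == after && before == 15 * sizeof(garage);
}
```

sizeof is evaluated at compile time, so the comparison can never fail; the long name has to live somewhere other than the array slots.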
When you include literals such as "Jack's Garage", the compiler creates a special place to hold those strings; they are not allocated in the same memory segment.
Finally, when you write current.name = "Jack's Garage", C++ determines that it needs a conversion from a const char* to a std::string. Fortunately for all of us, such a conversion exists. Your assignment then behaves as if it were written
current.name = std::string("Jack's Garage");
Then the assignment operator of std::string will copy the value to current.name. New memory will be allocated by the string (outside the garage object itself, unless a small-string buffer is used) to hold that value, and memcpy will (probably) be called at a lower level.
std::string is similar in implementation to an std::vector: Essentially a pointer and size, two pointers (begin and end), or one pointer and the ability to query allocator block sizes.
In some cases, it may also implement SSO (Small String Optimization) where the string structure itself has a small buffer for short strings, and switches to using a pointer for longer strings.
Without SSO, the backing store for characters owned by an std::string is allocated upon construction or assignment with a literal (or with another string, if the implementation isn't COW), or re-allocated during a concatenation.
In your code above, current.name = "Jack's Garage", would be the allocation site (without SSO in this case).
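One way to see whether a particular string is using an inline buffer or a separate allocation is to check where data() points. This is only a sketch: stored_inline and long_name_lives_outside_object are hypothetical helpers, and whether a short string such as "Jack's Garage" is stored inline depends on your standard library.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if the character buffer lies inside the bytes
// of the string object itself (that is, SSO is in effect for this value).
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    std::less<const char*> before;  // gives a total order even for unrelated pointers
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}

// A 1000-character string cannot fit inside the string object,
// so its characters must be stored somewhere else.
bool long_name_lives_outside_object() {
    std::string name(1000, 'x');
    assert(name.size() > sizeof(std::string));
    return !stored_inline(name);
}
```

Calling stored_inline on a short string tells you whether your implementation applies SSO to it; the result is deliberately not assumed here.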
If a vector always provides contiguous memory storage, how does the compiler allocate memory to empty std::strings?
I have a vector to which I've pushed a number of classes with std::string as a private member. I then pass a reference to the vector as an argument to another method.
Is the string's data elsewhere in the heap referenced from the vector's contiguous array?
Allocating memory for std::string is trivial.
Internally, it'll have some sort of pointer that points to a block of memory in which the actual string data will be stored. So, allocating memory for a std::string is simply a matter of allocating space for a pointer, a size_t or something, and maybe a couple more primitives.
If you have a std::vector<std::string> for example, it's easy for the vector to allocate space for the std::string's because they're just k bytes each for some constant k. The string data will not be involved in this allocation.
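To make the "k bytes each" point concrete, here is a small sketch (both function names are invented for illustration). It walks a vector and checks that each std::string object starts exactly sizeof(std::string) bytes after the previous one, whatever the lengths of the strings:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if every element sits exactly sizeof(std::string) bytes after the previous one.
bool string_objects_are_contiguous(const std::vector<std::string>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        const char* a = reinterpret_cast<const char*>(&v[i]);
        const char* b = reinterpret_cast<const char*>(&v[i + 1]);
        if (b - a != static_cast<std::ptrdiff_t>(sizeof(std::string))) {
            return false;
        }
    }
    return true;
}

// Mixes an empty, a short, and a very long string in one vector.
void check_mixed_lengths() {
    std::vector<std::string> v;
    v.push_back("");
    v.push_back("short");
    v.push_back(std::string(10000, 'x'));
    assert(string_objects_are_contiguous(v));
}
```

The 10000-character element does not push its neighbours apart, because only the fixed-size string object lives in the vector's array.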
The details of what happens really in memory in this case are quite dependent on the specific STL implementation you're using.
Having said that, my impression is that in most implementations vector and string are implemented with something like (very simplified):
template<typename T>
class vector
{
//...
private:
T* _data;
};
class string
{
private:
char _smallStringsBuffer[kSmallSize];
char* _bigStringsBuffer;
};
The vector's data is dynamically allocated on the heap based on the capacity (which has got a default value when default-initialized, and grows while you add elements to the vector).
The string's data is stored inline in the string object for small strings (implementation-dependent value of "small"), and is dynamically allocated once the string becomes bigger. This is the case for a number of reasons, but mostly to allow more efficient handling of small strings.
The example you described is something like:
void MyFunction(const vector<string>& myVector)
{
// ...
}
int main()
{
vector<string> v = ...;
// ...
MyFunction(v);
// ...
return 0;
}
In this particular case only the basic data of the vector v will be on the stack, as v._data will be allocated on the heap. If v has capacity N, v._data's size on the heap will be sizeof(string) * N, where the size of the string is a constant that will depend on kSmallSize * sizeof(char) + sizeof(char*), based on the definition of the string above.
As for contiguous data, only if all strings collected in the vector have fewer characters than kSmallSize, will their data be "almost" contiguous in memory.
This is an important consideration for performance-critical code, but to be honest I don't think that most people would rely on standard STL's vectors and strings for such situations, as the implementation details can change over time and on different platforms and compilers. Furthermore, whenever your code goes out of the "fast" path, you won't notice except with spikes of latency that are going to be hard to keep in check.
Suppose we have the following:
string_class s1("hello");
string_class s2("goodbye");
If the internal representation of the string_class string is a c string, what happens to memory allocation when you swap the values? For example, let's say string_class allocates char* c_str_s1 = new char[5], but char* c_str_s2 = new char[10] (because, say, after 5 the size doubles). If we do something like std::swap(c_str_s1, c_str_s2), is the memory allocated for each c string swapped, or is minimum allocation given to each?
The pointers are swapped as-they-are: that means each of them after the swap will point to the memory allocated for the other. The contents of memory are not supposed to be changed in any other way.
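A short sketch of this, using raw buffers like the ones in the question (the function name is invented, and the first buffer is given 6 bytes rather than 5 so that "hello" fits with its terminator):

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Swaps two owning char* pointers and reports whether only the pointers
// changed hands: each now points at the other's original block.
bool swap_exchanges_pointers_only() {
    char* c_str_s1 = new char[6];   // "hello" plus '\0'
    char* c_str_s2 = new char[10];  // "goodbye" plus '\0', with room to spare
    std::strcpy(c_str_s1, "hello");
    std::strcpy(c_str_s2, "goodbye");

    char* const old1 = c_str_s1;
    char* const old2 = c_str_s2;

    std::swap(c_str_s1, c_str_s2);  // no allocation, no copying of characters

    const bool pointers_swapped = (c_str_s1 == old2) && (c_str_s2 == old1);
    assert(std::strcmp(c_str_s1, "goodbye") == 0);
    assert(std::strcmp(c_str_s2, "hello") == 0);

    delete[] c_str_s1;
    delete[] c_str_s2;
    return pointers_swapped;
}
```

After the swap, c_str_s1 refers to the 10-byte block and c_str_s2 to the 6-byte block; a class wrapping these pointers would need to swap its recorded capacities as well.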
When you swap std::strings, they internally exchange fields, including char* pointers; no allocation is performed.
EDIT: I missed the point that you are not using std::string, but you should consider using it anyway.
I found this example of using placement new in C++, and it doesn't make sense to me.
It is my view that this code is exception-prone, since more memory than what was allocated may be used.
char *buf = new char[sizeof(string)];
string *p = new (buf) string("hi");
If "string" is the C++ std::string class, then buf will get an allocation the size of an empty string object (which with my compiler gives 28 bytes), and then the way I see it, if you initialize your string with more chars you might exceed the memory allocated. For example:
string *p = new (buf) string("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii");
On my VS this seems to be working nevertheless, and I'm not sure if this is because the exception is somehow waived or I simply don't understand how string works. Can someone help clarify?
You're misunderstanding the (typical) internal implementation of std::string. Usually it's implemented something like this:
class string {
protected:
char *buffer;
size_t capacity;
size_t length;
public:
// normal interface methods
};
The key point is that there are two distinct blocks of memory: one for the string object itself, containing the members shown above, and one for the content of the string. When you do your placement new, it's only the string object that is placed into the provided memory, not the memory for buffer, where the content of the string is stored. That is allocated separately, automatically, by the string class as needed.
The size returned by sizeof is the number of bytes required to store the members of the class, with some implementation-defined padding. That memory must be allocated before the constructor of std::string can be called.
However, when the constructor runs, it may allocate a larger amount of memory, which indeed it must in order to store large strings. That amount of memory is not part of the sizeof size, and you don't need to allocate it yourself.
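As a sketch of why the placement new in the question is safe, the following builds a 100-character string inside a buffer of exactly sizeof(std::string) bytes. The function name is invented, and the use of alignas on a local array (instead of new char[]) is my own choice to keep the example self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Constructs a long string inside a buffer that is only as large as the
// string object itself, then returns the string's length.
std::size_t placement_new_length() {
    alignas(std::string) unsigned char buf[sizeof(std::string)];

    // Only the string object goes into buf; the 100 characters are
    // obtained separately by the string's constructor.
    std::string* p = new (buf) std::string(100, 'i');
    assert(static_cast<void*>(p) == static_cast<void*>(buf));

    const std::size_t length = p->size();

    // Placement new has no matching delete: call the destructor by hand
    // so the string can release the storage it acquired.
    p->~basic_string();
    return length;
}
```

Skipping the explicit destructor call would leak the string's internal buffer, since nothing else knows an object lives in buf.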
I'm a beginner to C++, I've got the following piece of code:
struct Airline {
string Name;
int diameter;
int weight;
};
Airline* myPlane = new Airline;
My question: when I call new, it allocates memory, if I recall correctly. How does the PC know how much memory to allocate, especially given that there is a string type in there?
Thanks
An std::string object is fixed-size; it contains a pointer to an actual buffer of characters along with its length. std::string's definition looks something like
class string
{
char *buffer;
size_t nchars;
public:
// interface
};
It follows that your Airline objects also have a fixed size.
Now, new does not only allocate; it also initializes your object, including the std::string, which means it probably sets the char pointer to 0 because the string is empty.
You can also get the size of the structure, by using sizeof:
cout << "sizeof(Airline) = " << sizeof(Airline) << endl;
This is because the compiler knows the fields inside the structure, and adds up the sizes of the structure members (plus any padding it inserts for alignment).
The string object is no different than your structure. It is actually a class in the standard library, and not a special type like int or float that is handled by the compiler. Like your structure, the string class contains fields that the compiler knows the size of, and so it knows the size of your complete structure and uses that when you use new.
The call to new will allocate sizeof(Airline) which is what is needed to hold an object of type Airline.
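A small sketch of this, reusing the Airline struct from the question (the function name name_length_after_new is made up). The static_assert shows the size is settled at compile time, and the function shows that giving the plane a long name does not change it:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Airline {
    std::string Name;
    int diameter;
    int weight;
};

// The size is known at compile time; it covers the string object and both ints.
static_assert(sizeof(Airline) >= sizeof(std::string) + 2 * sizeof(int),
              "Airline must be large enough for all of its members");

// Allocates an Airline, gives it a long name, and returns the name's length.
std::size_t name_length_after_new() {
    Airline* myPlane = new Airline();        // requests sizeof(Airline) bytes
    myPlane->Name = std::string(500, 'a');   // the string obtains storage for its text itself
    const std::size_t length = myPlane->Name.size();
    assert(sizeof(*myPlane) == sizeof(Airline));  // unchanged by the long name
    delete myPlane;                          // ~Airline runs ~string, which frees the text
    return length;
}
```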
As for how strings are managed: the string object holds some internal data to manage the memory of the actual data stored, but not the data itself (unless the small object optimization is in use). The idea is the same as what others have pointed out, that the string stores a pointer to the actual characters, but that is not precise enough: implementations will store that pointer plus the extra data required to provide size() and capacity() (and others, like reference counts in reference-counting implementations).
The memory for the string may or may not be within class string. Possibly (and probably), class string will manage its own memory, holding only a pointer to the memory used to store the data. Example:
struct String {
char *data; // size = 4
size_t size; // size = 4
}; // size = 8
struct Airline {
String Name; // size = 8
int diameter; // size = 4
int weight; // size = 4
}; // size = 16
Note that those are not necessarily actual sizes, they are just for example.
Also note that in C++ (unlike C, for example), for every class T, sizeof(T) is a compile-time constant, meaning that objects can never have dynamic size. In effect, as soon as you need data whose size is only known at runtime, it has to live in memory areas external to the object. This may imply the use of standard containers like std::string or std::vector, or even manually managed resources.
This in turn means operator new does not need to know the dynamic size of all members, recursively, but only the size of the outermost class, the one that you allocate. When this outer class needs more memory, it has to manage it itself. Some illustrative pseudocode:
Airline* myPlane = new Airline {
Name = {
data = new char[some-size]
...
}
...
}
The inner allocations are done by the holding constructors:
Airline::Airline() : Name(), ... {}
string::string() : data(new char[...]), ... {}
operator new does nothing but allocate some fixed-size memory as the "soil" for Airline (see the first pseudocode), and then "seeds" Airline's constructor, which has to manage its lifetime in that restricted volume of "soil" by invoking the string constructor (implicitly or explicitly), which itself does another new.
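The chain of allocations described above can be sketched with a toy string class. ToyString and ToyAirline are invented for illustration and are far simpler than a real std::string (no copying, no growth, no SSO):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy stand-in for a string: a fixed-size object that owns an external buffer.
class ToyString {
    char* data_;
    std::size_t size_;
public:
    explicit ToyString(const char* text)
        : data_(new char[std::strlen(text) + 1]),  // the inner allocation
          size_(std::strlen(text)) {
        assert(text != nullptr);
        std::memcpy(data_, text, size_ + 1);       // copy characters and '\0'
    }
    ~ToyString() { delete[] data_; }               // release the inner allocation
    ToyString(const ToyString&) = delete;          // copying left out for brevity
    ToyString& operator=(const ToyString&) = delete;
    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }
};

struct ToyAirline {
    ToyString Name;
    int diameter;
    int weight;
    // The outer constructor invokes the member's constructor,
    // which performs its own allocation.
    explicit ToyAirline(const char* name) : Name(name), diameter(0), weight(0) {}
};
```

Writing new ToyAirline("Boeing") therefore triggers two allocations: one of sizeof(ToyAirline) bytes for the object, and one inside ToyString's constructor for the seven bytes of text. Deleting the object undoes both in reverse order.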
When you allocate Airline, new will allocate enough space on the heap for two ints and the string object with its fixed-size fields.
A string will always be the same size on the stack. However, internally, the string stores a pointer to a character array.
My code converts C++ strings to C strings somewhat often, and I am wondering if the original string is allocated on the stack. Will the C string be allocated on the stack as well? For instance:
string s = "Hello, World!";
char* s2 = s.c_str();
Will s2 be allocated on the stack, or in the heap? In other words, will I need to delete s2?
Conversely, if I have this code:
string s = new string("Hello, mr. heap...");
char* s2 = s.c_str();
Will s2 now be on the heap, as its origin was on the heap?
To clarify, when I ask if s2 is on the heap, I know that the pointer is on the stack. I'm asking if what it points to will be on the heap or the stack.
string s = "Hello world";
char* s2 = s.c_str();
Will s2 be allocated on the stack, or in the heap? In other words... Will I need to delete s2?
No, don't delete s2!
s2 is on the stack if the above code is inside a function; if the code is at global or namespace scope, then s2 will be in some statically allocated, dynamically initialised data segment. Either way, it is a pointer to a character (in this case the first 'H' character in the null-terminated string representation of the text content of s). That text itself is wherever the s object felt like constructing that representation. Implementations are allowed to do that however they like, but the crucial implementation choice for std::string is whether it provides a "short-string optimisation" that allows very short strings to be embedded directly in the s object, and whether "Hello world" is short enough to benefit from that optimisation:
if so, then s2 would point to memory inside s, which will be stack- or statically-allocated as explained for s2 above
otherwise, inside s there would be a pointer to dynamically allocated (free-store / heap) memory wherein the "Hello world\0" content whose address is returned by .c_str() would appear, and s2 would be a copy of that pointer value.
Note that c_str() is const, so for your code to compile you need to change to const char* s2 = ....
You must not delete s2. The data to which s2 points is still owned and managed by the s object, and will be invalidated by any call to a non-const method of s or by s going out of scope.
string s = new string("Hello, mr. heap...");
char* s2 = s.c_str();
Will s2 now be on the heap, as its origin was on the heap?
This code doesn't compile, as s is not a pointer and a string doesn't have a constructor like string(std::string*). You could change it to either:
string* s = new string("Hello, mr. heap...");
...or...
string s = *new string("Hello, mr. heap...");
The latter creates a memory leak and serves no useful purpose, so let's assume the former. Then:
char* s2 = s.c_str();
...needs to become...
const char* s2 = s->c_str();
Will s2 now be on the heap, as its origin was on the heap?
Yes. In all the scenarios, specifically if s itself is on the heap, then:
even if there's a short string optimisation buffer inside s to which c_str() yields a pointer, it must be on the heap, otherwise
if s uses a pointer to further memory to store the text, that memory will also be allocated from the heap.
But again, even knowing for sure that s2 points to heap-allocated memory, your code does not need to deallocate that memory - it will be done automatically when s is deleted:
string* s = new string("Hello, mr. heap...");
const char* s2 = s->c_str();
// <...use s2 for something...>
delete s; // "destruct" s and deallocate the heap used for it...
Of course, it's usually better just to use string s("xyz"); unless you need a lifetime beyond the local scope, and a std::unique_ptr<std::string> or std::shared_ptr<std::string> otherwise.
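Putting the two cases side by side, here is a sketch with invented function names. In neither case is s2 ever deleted: the string owns the characters, and the string itself is cleaned up by scope exit or by the smart pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Automatic storage: s owns the characters, s2 merely points at them.
std::size_t length_of_local() {
    std::string s = "Hello world";
    const char* s2 = s.c_str();
    return std::strlen(s2);  // fine: s is still alive here
}   // s is destroyed here and releases whatever it allocated

// Dynamic storage: the unique_ptr deletes the string, and the string's
// destructor releases the text. Still nothing to delete via s2.
std::size_t length_of_heap_string() {
    std::unique_ptr<std::string> s(new std::string("Hello, mr. heap..."));
    const char* s2 = s->c_str();
    assert(s2 == s->data());  // same buffer, viewed through c_str()
    return std::strlen(s2);
}
```

Returning s2 itself from either function would be a bug, because the pointer dangles as soon as the owning string is destroyed.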
c_str() returns a pointer to an internal buffer in the string object. You don't ever free()/delete it.
It is only valid as long as the string it points into is in scope. In addition, if you call a non-const method of the string object, it is no longer guaranteed to be valid.
See std::string::c_str
std::string::c_str() returns a const char*, not a char *. That's a pretty good indication that you don't need to free it. Memory is managed by the instance (see some details in this link, for example), so it's only valid while the string instance is valid.
Firstly, even your original string is not allocated on the stack, as you seem to believe. At least not entirely. If your string s is declared as a local variable, only the string object itself is "allocated on the stack". The controlled sequence of that string object is allocated somewhere else. You are not supposed to know where it is allocated, but in most cases it is allocated on the heap. I.e. the actual string "Hello world" stored by s in your first example is generally allocated on the heap, regardless of where you declare your s.
Secondly, about c_str().
In the original specification of C++ (C++98) c_str generally returned a pointer to an independent buffer allocated somewhere. Again, you are not supposed to know where it is allocated, but in general case it was supposed to be allocated on the heap. Most implementations of std::string made sure that their controlled sequence was always zero-terminated, so their c_str returned a direct pointer to the controlled sequence.
In the new specification of C++ (C++11) it is now required that c_str returns a direct pointer to the controlled sequence.
In other words, in the general case the result of c_str will point to heap-allocated memory even for local std::string objects. Your first example is not different from your second example in that regard. However, in any case the memory pointed to by c_str() is not owned by you. You are not supposed to deallocate it. You are not supposed to even know where it is allocated.
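The C++11 requirement can be checked directly. This is a sketch with made-up helper names; it assumes a C++11 or later compiler:

```cpp
#include <cassert>
#include <string>

// True if c_str() exposes the string's own null-terminated buffer,
// as required since C++11.
bool c_str_is_direct(const std::string& s) {
    const bool same_as_data = (s.c_str() == s.data());
    const bool same_as_first = s.empty() || (s.c_str() == &s[0]);
    const bool terminated = (s.c_str()[s.size()] == '\0');
    return same_as_data && same_as_first && terminated;
}

// Exercises an empty, a short, and a long string.
void check_c_str_is_direct() {
    assert(c_str_is_direct(std::string()));
    assert(c_str_is_direct(std::string("Hello world")));
    assert(c_str_is_direct(std::string(1000, 'x')));
}
```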
s2 will be valid as long as s remains in scope. It's a pointer to memory that s owns. See e.g. this MSDN documentation: "the string has a limited lifetime and is owned by the class string."
If you want to use std::string inside a function as a factory for string manipulation, and then return C-style strings, you must allocate heap storage for the return value. Get space using malloc or new, and then copy the contents of s.c_str().
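A minimal sketch of such a factory, using new[] for the returned storage. The name make_greeting and the greeting text are invented for illustration; the important part is that the caller, not the std::string, owns the result:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical factory: builds text with std::string, then hands back an
// independent C string. The caller must release it with delete[].
char* make_greeting(const char* name) {
    assert(name != nullptr);
    std::string s = "Hello, ";
    s += name;

    char* result = new char[s.size() + 1];         // +1 for the terminating '\0'
    std::memcpy(result, s.c_str(), s.size() + 1);  // copy before s is destroyed
    return result;
}
```

A caller would write char* g = make_greeting("World"); and later delete[] g;. Returning s.c_str() directly would hand out a pointer into a string that no longer exists once the function returns.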
Will s2 be allocated on the stack, or in the heap?
Could be in either. For example, if the std::string class does small string optimization, the data will reside on the stack if its size is below the SSO threshold, and on the heap otherwise. (And this is all assuming the std::string object itself is on the stack.)
Will I need to delete s2?
No, the character array object returned by c_str is owned by the string object.
Will s2 now be on the heap, as its origin was on the heap?
In this case the data will likely reside in the heap anyway, even when doing SSO. But there's rarely a reason to dynamically allocate a std::string object.
That depends. If I remember correctly, CString makes a copy of the input string, so no, you wouldn't need to have any special heap allocation routines.