Can C++ automatic variables vary in size? - c++

In the following C++ program:
#include <string>
using namespace std;
int main()
{
string s = "small";
s = "bigger";
}
is it more correct to say that the variable s has a fixed size or that the variable s varies in size?

It depends on what you mean by "size".
The static size of s (as returned by sizeof(s)) will be the same.
However, the size occupied on the heap will vary between the two cases.
What do you want to do with the information?

i'll say yes and no.
s will be the same string instance but it's internal buffer (which is preallocated depending on your STL implementation) will contain a copy of the constant string you wanted to affect to it.
Should the constant string (or any other char* or string) have a bigger size than the internal preallocated buffer of s, s buffer will be reallocated depending on string buffer reallocation algorithm implemented in your STL implmentation.

This is going to lead to a dangerous discussion because the concept of "size" is not well defined in your question.
The size of a class s is known at compile time, it's simply the sum of the sizes of it's members + whatever extra information needs to be kept for classes (I'll admit I don't know all the details) The important thing to get out of this, however is the sizeof(s) will NOT change between assignments.
HOWEVER, the memory footprint of s can change during runtime through the use of heap allocations. So as you assign the bigger string to s, it's memory footprint will increase because it will probably need more space allocated on the heap. You should probably try and specify what you want.

The std::string variable never changes its size. It just refers to a different piece of memory with a different size and different data.

Neither, exactly. The variable s is referring to a string object.
#include <string>
using namespace std;
int main()
{
string s = "small"; //s is assigned a reference to a new string object containing "small"
s = "bigger"; //s is modified using an overloaded operator
}
Edit, corrected some details and clarified point
See: http://www.cplusplus.com/reference/string/string/ and in particular http://www.cplusplus.com/reference/string/string/operator=/
The assignment results in the original content being dropped and the content of the right side of the operation being copied into the object. similar to doing s.assign("bigger"), but assign has a broader range of acceptable parameters.
To get to your original question, the contents of the object s can have variable size. See http://www.cplusplus.com/reference/string/string/resize/ for more details on this.

A variable is an object we refer to by a name. The "physical" size of an object -- sizeof(s) in this case -- doesn't change, ever. They type is still std::string and the size of a std::string is always constant. However, things like strings and vectors (and other containers for that matter) have a "logical size" that tells us how many elements of some type they store. A string "logically" stores characters. I say "logically" because a string object doesn't really contain the characters directly. Usually it has only a couple of pointers as "physical members". Since the string objects manages a dynamically allocated array of characters and provides proper copy semantics and convenient access to the characters we can thing of those characters as members ("logical members"). Since growing a string is a matter of reallocating memory and updating pointers we don't even need sizeof(s) to change.

i would say this is string object , And it has capability to grow dynamically and vice-versa

Related

C++ string / container allocation

This is probably obvious to a C++ non-noob, but it's stumping me a bit - does a string member of a class allocate a variable amount of space in that class? Or does it just allocate a pointer internally to some other space in memory? E.g. in this example:
class Parent {
public:
vector<Child> Children;
}
class Child {
public:
string Name;
}
How is that allocated on the heap if I create a "new Parent()" and add some children with varying length strings? Is Parent 4 bytes, Child 4 bytes (or whatever the pointer size, plus fixed size internal data), and then a random pile of strings somewhere else on the heap? Or is it all bundled together in memory?
I guess in general, are container types always fixed size themselves, and just contain pointers to their variable-sized data, and is that data always on the heap?
Classes in C++ are always fixed size. When there is a variable sized component, e.g., the elements of a vector or the characters in a string, they may be allocated on the heap (for small strings they may also be embedded in the string itself; this is known as the small string optimization). That is, your Parent object would contain a std::vector<Child> where the Child objects are allocated on the heap (the std::vector<...> object itself probably keeps three words to its data but there are several ways things may be laid out). The std::string objects in Child allocate their own memory. That is, there may be quite a few memory allocations.
The C++ 2011 standard thoroughly defines allocators to support passing an allocation mechanism to an object and all its children. Of course, the classes need to also support this mechanism. If your Parent and Child classes had suitable constructors taking an allocator and would pass this allocator to all members doing allocations, it would be propagated through the system. This way, allocation of objects belong together can be arranged to be in reasonably close proximity.
Classes in C++ always have a fixed size. Therefore vector and string can only contain pointers to heap allocated memory* (although they contain typically more data then one pointer, since it also needs to store the length). Therefore the object itself always has a fixed length.
*For string this is not entirely correct. Often an optimization technique called short string optimization is used. In that case small strings are embedded inside the object (in the place where otherwise the pointer to heap data would be stored) and heap memory is only allocated if the string is too long.
Yes -- using your words -- container types always fixed size themselves, and just contain pointers to their variable-sized data.
If we have vector<int> vi;, the size of vi is always fixed, sizeof(vector<int>) to be exact, irrespective of the number of int's in vi.
does a string member of a class allocate a variable amount of space in that class?
No, it does not.
Or does it just allocate a pointer internally to some other space in memory?
No, it does not.
An std::string allocates wahtever sizeof(std::string) is.
Do not confuse
the size of an object
the size of the resources, that an object is responsible for.

Does the compiler copy a std::string into stack while passing it to a function in C++?

I have a simple question. I have a long std::string that I want to pass it to a function.
I wanna know that this string will be copy to stack then a copy of that will be passed or something like pointer will be passed and no additional space will be required?
(C++)
I have another little question: How much memory does an element of a string take?Just like char?
Yes, it will be deep copied, so use const reference is recommended.
void fun(const std::string & arg)
Typically std::string has 2 fields, a pointer pointing to dynamic allocated memory and the length, so it is 16+actual length on 64bit machines.
Spoiler Alert: My answer wont be that relevant, just an optimization technique.
If you dont want to duplicate the string, write your customized string class, which has two pointers or one pointer with size. In the past it has reduced me a lot of duplicates. This will work only as read-only and do a copy_on_write, i.e duplicate only if you encounter a write.
When passing an argument by value in C++ it is conceptually copied. Whether this copy really happens is another question, though, and depends on how the argument is passed and, to some extend, on the compiler: the compiler is explicitly allowed to elide certain copies, in particular copies of temporary objects. For example, when you return an object from a function and it us clear that the object will be returned, the copy is likely to be elided. Similarily, when passing the result of a function directly on to another function, it is likely not to be copied.
Beyond this C++ 2011 added another dimension of possibilities by supporting move constructors. These cover to some extend similar ground but also allow you to have better control: you can explicitly indicate that it would be acceptable for an object to be moved rather than being copied. Still, in no event will an object passed by reference.
With respect to the used bytes per element, the std::string uses just sizeof(cT) bytes (where cT is the character template argument of the std::basic_string). However, the string will overallocate the space in many cases and certainly when characters are added to the string. You can determine the overallocation by comparing size() and capacity() and control it to some extend with reserve() although this function isn't required of getting rid of any overallocation but the capacity() has to be at least as much as was last reserve()d. If the string is small (e.g. at most 15 characters) modern implementations won't make any allocation. This is called the string optimization.
With respect to the actual represention of the string: unless it is small it will use one word for the address of the storage, one word each for the the size and the capacity, and for strings with stateful allocators the size of the allocator (typically another word). Given alignment requirements this effectively means that in most cases the string will take four words in addition to the elements. Typically the small string optimization uses these words to store characters if the string firs there unless, of course, it needs to store a stateful allocator.

(Why) does an empty string have an address?

I guessed no, but this output of something like this shows it does
string s="";
cout<<&s;
what is the point of having empty string with an address ?
Do you think that should not cost any memory at all ?
Yes, every variable that you keep in memory has an address. As for what the "point" is, there may be several:
Your (literal) string is not actually "empty", it contains a single '\0' character. The std::string object that is created to contain it may allocate its own character buffer for holding this data, so it is not necessarily empty either.
If you are using a language in which strings are mutable (as is the case in C++), then there is no guarantee that an empty string will remain empty.
In an object-oriented language, a string instance with no data associated with it can still be used to call various instance methods on the string class. This requires a valid object instance in memory.
There is a difference between an empty string and a null string. Sometimes the distinction can be important.
And yes, I very much agree with the implementation of the language that an "empty" variable should still exist in and consume memory. In an object-oriented language an instance of an object is more than just the data that it stores, and there's nothing wrong with having an instance of an object that is not currently storing any actual data.
Following your logic, int i; would also not allocate any memory space, since you are not assigning any value to it. But how is it possible then, that this subsequent operation i = 10; works after that?
When you declare a variable, you are actually allocating memory space of a certain size (depending on the variable's type) to store something. If you want to use this space right way or not is up to you, but the declaration of the variable is what triggers memory allocation for it.
Some coding practices say you shouldn't declare a variable until the moment you need to use it.
An 'empty' string object is still an object - there may be more to its internal implementation than just the memory required to store the literal string itself. Besides that, most C-style strings (like the ones used in C++) are null-terminated, meaning even that "empty" string still uses one byte for the terminator.
Every named object in C++ has an address. There is even a specific requirement that the size of every type be at least 1 so that T[N] and T[N+1] are different, or so that in T a, b; both variables have distinct addresses.
In your case, s is a named object of type std::string, so it has an address. The fact that you constructed s from a particular value is immaterial. What matters is that s has been constructed, so it is an object, so it has an address.
s is a string object so it has an address. It has some internal data structures keeping track of the string. For example, current length of the string, current storage reserved for string, etc.
More generally, the C++ standard requires all objects to have a nonzero size. This helps ensure that every object has a unique address.
9 Classes
Complete objects and member subobjects of class type shall have nonzero size.
In C++, all classes are a specific, unchanging size. (varying by compiler and library, but specific at compile-time.) The std::string usually consists of a pointer, a length of allocation, and a length used. That's ~12 bytes, no matter how long the string is, and you have allocated std::string s on the call stack. When you display the address of the std::string, cout displays the location of the std::string in memory.
If the string doesn't point at anything, it won't allocate any space from the heap, which is like what you're thinking. But, all c-strings end in a trailing NULL, so the c-string "" is one character long, not zero. This means when you assign the c-string "" to the std::string, the std::string allocates 1 (or more) bytes, and assigns it the value of the trailing NULL character (usually zero '\0').
If there truly was no point to the empty string, then the programmer would not write the instruction at all. The language is loyal and trusting! And will never assume memory you allocate to be "wasted". Even if you are lost and heading over a cliff, it will hold your hand to the bitter end.
I think it'd be interesting to know, just as a curiosity though, that if you create a variable that isn't 'used' later, such as your empty string, the compiler may very well optimize it away so it incurs no cost to begin with. I guess compilers aren't as trusting...

c++ maximum std::string length is dictated by stack size or heap size?

as asked in the question.
std::string myVar; the maximum character it can hold is dictated by stack or heap?
Thank you
By default, the memory allocated for std::string is allocated dynamically.
Note that std::string has a max_size() function returning the maximum number of character supported by the implementation. The usefulness of this is questionable, though, as it's a implementation maximum, and doesn't take into consideration other resources, like memory. Your real limit is much lower. (Try allocating 4GB of contiguous memory, or take into account memory exhaustion elsewhere.)
A std::string object will be allocated the same way an int or any other type must be: on the stack if it's a local variable, or it might be static, or on the heap if new std::string is used or new X where X contains the string etc..
But, that std::string object may contain at least a pointer to additional memory provided by the allocator with which basic_string<> was instantiated - for the std::string typedef that means heap-allocated memory. Either directly in the original std::string object memory or in pointed-to heap you can expect to find:
a string size member,
possibly some manner of reference counter or links,
the textual data the string stores (if any)
Some std::string implementations have "short string" optimisations where they pack strings of only a few characters directly into the string object itself (for memory efficiency, often using some kind of union with fields that are used for other purposes when the strings are longer). But, for other string implementations, and even for those with short-string optimisations when dealing with strings that are too long to fit directly in the std::string object, they will have to follow pointers/references to the textual data which is stored in the allocator-provided (heap) memory.

Access Violation Using memcpy or Assignment to an Array in a Struct

Update 2:
Well I’ve refactored the work-around that I have into a separate function. This way, while it’s still not ideal (especially since I have to free outside the function the memory that is allocated inside the function), it does afford the ability to use it a little more generally. I’m still hoping for a more optimal and elegant solution…
Update:
Okay, so the reason for the problem has been established, but I’m still at a loss for a solution.
I am trying to figure out an (easy/effective) way to modify a few bytes of an array in a struct. My current work-around of dynamically allocating a buffer of equal size, copying the array, making the changes to the buffer, using the buffer in place of the array, then releasing the buffer seems excessive and less-than optimal. If I have to do it this way, I may as well just put two arrays in the struct and initialize them both to the same data, making the changes in the second. My goal is to reduce both the memory footprint (store just the differences between the original and modified arrays), and the amount of manual work (automatically patch the array).
Original post:
I wrote a program last night that worked just fine but when I refactored it today to make it more extensible, I ended up with a problem.
The original version had a hard-coded array of bytes. After some processing, some bytes were written into the array and then some more processing was done.
To avoid hard-coding the pattern, I put the array in a structure so that I could add some related data and create an array of them. However now, I cannot write to the array in the structure. Here’s a pseudo-code example:
main() {
char pattern[]="\x32\x33\x12\x13\xba\xbb";
PrintData(pattern);
pattern[2]='\x65';
PrintData(pattern);
}
That one works but this one does not:
struct ENTRY {
char* pattern;
int somenum;
};
main() {
ENTRY Entries[] = {
{"\x32\x33\x12\x13\xba\xbb\x9a\xbc", 44}
, {"\x12\x34\x56\x78", 555}
};
PrintData(Entries[0].pattern);
Entries[0].pattern[2]='\x65'; //0xC0000005 exception!!! :(
PrintData(Entries[0].pattern);
}
The second version causes an access violation exception on the assignment. I’m sure it’s because the second version allocates memory differently, but I’m starting to get a headache trying to figure out what’s what or how to get fix this. (I’m currently working around it by dynamically allocating a buffer of the same size as the pattern array, copying the pattern to the new buffer, making the changes to the buffer, using the buffer in the place of the pattern array, and then trying to remember to free the—temporary—buffer.)
(Specifically, the original version cast the pattern array—+offset—to a DWORD* and assigned a DWORD constant to it to overwrite the four target bytes. The new version cannot do that since the length of the source is unknown—may not be four bytes—so it uses memcpy instead. I’ve checked and re-checked and have made sure that the pointers to memcpy are correct, but I still get an access violation. I use memcpy instead of str(n)cpy because I am using plain chars (as an array of bytes), not Unicode chars and ignoring the null-terminator. Using an assignment as above causes the same problem.)
Any ideas?
It is illegal to attempt to modify string literals. Your
Entries[0].pattern[2]='\x65';
line attempts exactly that. In your second example you are not allocating any memory for the strings. Instead, you are making your pointers (in the struct objects) to point directly at string literals. And string literals are not modifiable.
This question gets asked several times every day. Read Why is this string reversal C code causing a segmentation fault? for more details.
The problem boils down to the fact that a char[] is not a char*, even if the char[] acts a lot like a char* in expressions.
Other answers have addressed the reason for the error: you're modifying a string literal which is not allowed.
This question is tagged C++ so the easy way to solve your problem is to use std::string.
struct ENTRY {
std::string pattern;
int somenum;
};
Based on your updates, your real problem is this: You want to know how to initialize the strings in your array of structs in such a way that they're editable. (The problem has nothing to do with what happens after the array of structs is created -- as you show with your example code, editing the strings is easy enough if they're initialized correctly.)
The following code sample shows how to do this:
// Allocate the memory for the strings, on the stack so they'll be editable, and
// initialize them:
char ptn1[] = "\x32\x33\x12\x13\xba\xbb\x9a\xbc";
char ptn2[] = "\x12\x34\x56\x78";
// Now, initialize the structs with their char* pointers pointing at the editable
// strings:
ENTRY Entries[] = {
{ptn1, 44}
, {ptn2, 555}
};
That should work fine. However, note that the memory for the strings is on the stack, and thus will go away if you leave the current scope. That's not a problem if Entries is on the stack too (as it is in this example), of course, since it will go away at the same time.
Some Q/A on this:
Q: Why can't we initialize the strings in the array-of-structs initialization? A: Because the strings themselves are not in the structs, and initializing the array only allocates the memory for the array itself, not for things it points to.
Q: Can we include the strings in the structs, then? A: No; the structs have to have a constant size, and the strings don't have constant size.
Q: This does save memory over having a string literal and then malloc'ing storage and copying the string literal into it, thus resulting in two copies of the string, right? A: Probably not. When you write
char pattern[] = "\x12\x34\x56\x78";
what happens is that that literal value gets embedded in your compiled code (just like a string literal, basically), and then when that line is executed, the memory is allocated on the stack and the value from the code is copied into that memory. So you end up with two copies regardless -- the non-editable version in the source code (which has to be there because it's where the initial value comes from), and the editable version elsewhere in memory. This is really mostly about what's simple in the source code, and a little bit about helping the compiler optimize the instructions it uses to do the copying.