String class and char in C++ - c++

When I have
char anything[20];
cout << sizeof anything;
it prints 20.
However
string anymore;
cout << sizeof anymore; // it prints 4
getline(cin, anymore); // let's suppose I type more than one hundred characters
cout << sizeof anymore; // it still prints 4 !
I would like to understand how c++ manages this. Thanks

sizeof is a compile-time construct. It has nothing to do with runtime, but rather gives a fixed result based on the type passed to it (or the type of the value passed to it). So char[20] is 20 bytes, but a string might be 4 or 8 bytes or whatever depending on the implementation. The sizeof isn't telling you how much storage the string allocated dynamically to hold its contents.
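A minimal sketch of the compile-time point, assuming a C++11 compiler. The constant name kHandleSize is made up for illustration. Both checks are settled by the compiler before the program ever runs:

```cpp
#include <cstddef>
#include <string>

// sizeof is answered by the compiler from the type alone.
static_assert(sizeof(char[20]) == 20, "20 chars of 1 byte each");

// The operand of sizeof is never evaluated, so no 500-character string is
// built here; only the type of the expression (std::string) matters.
constexpr std::size_t kHandleSize = sizeof(std::string(500, 'x'));
static_assert(kHandleSize == sizeof(std::string), "same type, same size");
```

The actual value of kHandleSize depends on the standard library in use; the sketch only shows that it cannot depend on the contents.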

sizeof is a compile-time operator. It tells you the size of the type.

It's because anything is an array of 20 characters. The sizeof of each character is 1 byte, so the total is 20 bytes.
The string class, on the other hand, contains a pointer to the beginning of a char array and a size_t (an unsigned int, for example); in your case the object takes 4 bytes. sizeof doesn't know how much memory you allocated for the string; it only knows that you have a pointer to something, because it is a compile-time operator.

sizeof isn't what you have decided that it should be. It doesn't magically perceive the semantics of whatever type you throw at it. All it knows is how many bytes are used up, directly for storing the instance of a type.
For an array of five characters, that's 5. For a pointer (to anything, including an array), that's usually 4 or 8. For an std::string, it's however many bytes your C++ Standard Library implementation happens to need to do its work. This work usually involves dynamic allocation, so the four bytes you're looking at likely represent just enough storage for a pointer.
It is not to be confused with specific "size" semantics. For std::string, that's anymore.length(), which uses whatever internal magic is required to report the length of the character buffer that is stored somewhere, possibly (and usually) indirectly.
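To make the two notions of "size" concrete, here is a small sketch. The names LineSizes, read_line_and_measure and measure_long_line are hypothetical, and a std::istringstream stands in for typing at cin:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical pair of numbers for one line of input.
struct LineSizes {
    std::size_t object_bytes;  // sizeof the std::string object
    std::size_t characters;    // what length() reports
};

// Reads one line and reports both notions of "size".
LineSizes read_line_and_measure(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return LineSizes{sizeof line, line.length()};
}

// Usage sketch: a 150-character line stands in for keyboard input.
LineSizes measure_long_line() {
    std::istringstream in(std::string(150, 'a') + "\n");
    LineSizes result = read_line_and_measure(in);
    assert(result.characters == 150);                    // the contents grew...
    assert(result.object_bytes == sizeof(std::string));  // ...the object did not
    return result;
}
```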
For what it's worth, I'm very surprised that a std::string could take up only four bytes. I'd expect it'd store at least "length" and a pointer, which is usually going to take more than four bytes.

The 'string' type is an instantiation of a class template (std::basic_string<char>). In an implementation like yours the object holds just a pointer to the character data, which is 4 bytes on 32-bit systems.

Related

Size of string data type in C++ array

This is very simple code in C++. The addresses of the strings are separated by a constant gap of 28 bytes. What do these 28 bytes contain? I am trying to find an analogy with the gap of 4 bytes in an array of integers. As far as I know, the 4 bytes determine the upper limit of the value an integer can reach. What happens in the case of the 28 bytes? Does it really contain 28*8 bits of character data? I do not believe that. I have tried giving it a large amount of text, and it still prints without any issues.
string str[3] = { "a", "b", "c" };
for (int i = 0; i < 3; ++i) {
    cout << &str[i] << endl;
}
What do these 28 bytes contain?
It contains the object of type string. We don't know any more unless we know how you have defined that type.
If string is an alias of std::string, then it is a class defined by the standard library. The exact contents and thus the exact size depend on and vary between standard library implementations, and the target architecture.
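The gap itself is simply the size of one array element, the same way the 4-byte gap in an int array is sizeof(int) on that platform. A small sketch, assuming string means std::string (the function name string_array_stride is made up):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Distance in bytes between two neighbouring elements of a std::string array.
std::size_t string_array_stride() {
    std::string str[3] = { "a", "b", "c" };
    const char* first  = reinterpret_cast<const char*>(&str[0]);
    const char* second = reinterpret_cast<const char*>(&str[1]);
    assert(second > first);  // elements are laid out in increasing address order
    return static_cast<std::size_t>(second - first);
}
```

Whatever number this returns on a given implementation (28 in the question), it equals sizeof(std::string) there.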
If we consider what some implementation might do in practice:
Does it really contain 28*8 bits of character data - I do not believe that.
Believe it or not, (modern) string implementations really do contain roughly sizeof(string) bytes of character data (minus some overhead) when those characters fit in that space.
They use advanced tricks to change the internal layout to support longer strings. For those, they use pointers. Typically, there would be a pointer to beginning, pointer to end of string (storing offset is another option) and pointer (or offset) to the end of dynamic storage. This representation is essentially identical to a vector.
If you read the standard library headers that you use, you'll find the exact definition of the class there.
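One way to observe this without reading the headers is to check whether the character buffer lies inside the string object itself. This is only a sketch: the helper names are hypothetical, it assumes std::uintptr_t is available, and the result for short strings depends entirely on the library (older reference-counted implementations never store characters inline):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character buffer lies inside the string object
// itself, i.e. no separate dynamic storage is in use for the characters.
bool stored_inline(const std::string& s) {
    const std::uintptr_t object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t object_end   = object_begin + sizeof s;
    const std::uintptr_t buffer       = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}

// A 1000-character buffer cannot fit inside the object, whatever the
// library does for short strings.
void check_long_string_is_external() {
    assert(!stored_inline(std::string(1000, 'x')));
}
```

Calling stored_inline(std::string("a")) on your own toolchain tells you whether that particular library keeps short strings inside the object.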

Writing a string through serial port (c++ Linux) [duplicate]

This is a rather simple problem but is pretty confusing.
string R = "hhhh" ;
cout<< sizeof( R )<<endl;
OUTPUT:
4
Variation:
string R = "hhuuuuuuhh" ;
cout<< sizeof( R )<<endl;
OUTPUT2:
4
What is going wrong ? Should I use char array instead ?
Think of sizeof being compile-time evaluable. It evaluates to the size of the type, not the size of the contents. You can even write sizeof(std::string) which will be exactly the same as sizeof(foo) for any std::string instance foo.
To compute the number of characters in a std::string, use size().
If you have a character array, say char c[6] then the type of c is an array of 6 chars. So sizeof(c) (known at compile-time) will be 6 as the C++ standard defines the size of a single char to be 1.
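A short sketch of the array case, with hypothetical helper names. It also shows the usual trap: once the array is passed to a function as a pointer, sizeof reports the pointer, and only strlen (a run-time scan) counts the characters:

```cpp
#include <cstddef>
#include <cstring>

char c[6] = "hi";  // 6 bytes reserved; only 'h', 'i', '\0' are meaningful

static_assert(sizeof(char) == 1, "guaranteed by the C++ standard");
static_assert(sizeof c == 6, "size of the array type char[6]");

// Inside a function the parameter is a pointer; the array length is gone.
constexpr std::size_t size_seen_by_callee(const char* p) {
    return sizeof p;  // size of a pointer, typically 4 or 8
}

// strlen counts characters up to the terminator at run time.
std::size_t characters_in(const char* p) {
    return std::strlen(p);
}
```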
sizeof expression returns the size required for storage of the type expression evaluates to (see http://en.cppreference.com/w/cpp/language/sizeof). In case of std::string, this contains a pointer to the data (and possibly a buffer for small strings), but not the data itself, so it doesn't (and can't) depend on string length.
Your string variable will consist of a part most often stored on the stack, which has fixed dimensions. The size of this part is what's returned by sizeof. Inside this fixed part is a pointer (or reference) to a part stored on the heap, which actually contains your characters and has a varying size. However, the size of this part is only known at runtime, while sizeof is computed at compile time.
You may wonder why. Things like this are both the strength and the weakness of C++. C++ is a totally different beast from e.g. languages like Python and C#. While the latter languages can produce all kinds of dynamically changing meta-data (like the size or type of a variable), the price that is paid is that they're all slow. C++, while being a bit 'spartan', can run rings around such languages. In fact most 'dynamic' languages are in fact implemented (programmed) in C/C++.

Which is more efficient way for storing strings, an array or a pointer in C/C++? [duplicate]

We can store string using 2 methods.
Method 1: using array
char a[]="str";
Method 2:
char *b="str";
In method 1 the memory is used only in storing the string "str" so the memory used is 4 bytes.
In method 2 the memory is used in storing the string "str" on 'Read-Only-Memory' and then in storing the pointer to the 1st character of the string.
So the memory used must be 4 bytes for storing string in ROM and then 8 bytes for storing pointer (in 64-bit machine) to the first character.
In total the 1st method uses 4 bytes and method 2 uses 12 bytes. So is method 1 always better than method 2 for storing strings in C/C++?
Unless you are on a highly resource-limited system, you should not care too much about the memory used by a pointer. Anyway, optimizing compilers may produce the same code in both cases.
You should care more about Undefined Behaviour in the second case!
char a[] = "str";
correctly declares a non-const character array which is initialized to "str". That means that a[0] = 'S'; is perfectly allowed and will change a to "Str".
But with
char *b = "str";
you declare a non-const pointer to a literal char array which is implicitly const. That means that b[0] = 'S'; tries to modify a string literal and is Undefined Behaviour: it can work, segfault, or anything in between, including not changing the string.
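A minimal sketch of the safe versions of both methods (function names are made up). Declaring the pointer as const char* turns the accidental write into a compile error instead of Undefined Behaviour:

```cpp
#include <cassert>
#include <string>

// Method 1: the array is a private, writable copy of the literal.
std::string capitalised_copy() {
    char a[] = "str";
    a[0] = 'S';             // well-defined: modifies the local array
    return std::string(a);
}

// Method 2, const-correct: the pointer refers to the literal itself.
char first_char_of_literal() {
    const char* b = "str";
    // b[0] = 'S';          // rejected by the compiler: *b is const
    return b[0];
}

// Usage sketch.
void check_both_methods() {
    assert(capitalised_copy() == "Str");
    assert(first_char_of_literal() == 's');
}
```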
All of the numbers that you cite, and the type of the memory where the string literal is stored are platform specific.
Which is more efficient way for storing strings, an array or a pointer
Some pedantry about terminology: A pointer can not store a string; it stores an address. The string is always stored in an array, and a pointer can point to it. String literals in particular are stored in an array of static storage duration.
Method 1: using array char a[]="str";
This makes a copy of the content of the string literal, into a local array of automatic storage duration.
Method 2: char *b="str";
You may not bind a non-const pointer to a string literal in standard C++. This is ill-formed in that language (since C++11; prior to that the conversion was merely deprecated). Even in C (and extensions of C++) where this conversion is allowed, this is quite dangerous, since you might accidentally pass the pointer to a function that tries to modify the pointed-to string. Const correctness replaces accidental UB with a compile-time error.
Ignoring that, this doesn't make a copy of the literal, but points to it instead.
So is the method 1 always better than method 2 for storing strings in C/C++?
Memory use is not the only metric that matters. Method 1 requires copying of the string from the literal into the automatic array. Not making copies is usually faster than making copies. This becomes more and more important with increasingly longer strings.
The major difference between methods 1 and 2 is that you may modify the local array of method 1 but you may not modify string literals. If you need a modifiable buffer, then method 2 doesn't give you that, regardless of its efficiency.
Additional considerations:
Suppose your system is not a RAM-based PC but rather a computer with true non-volatile memory (NVM), such as a microcontroller. The string literal "str" would then in both cases get stored in NVM.
In the array case, the string literal has to be copied from NVM at run time, whereas in the pointer case you don't have to make a copy; you can just point straight at the string literal.
This also means that on such systems, assuming 32 bit, the array version will occupy 4 bytes of RAM for the array, while the pointer version will occupy 4 bytes of RAM for the pointer. Both cases will have to occupy 4 bytes of NVM for the string literal.
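The byte counts used in this thread can be checked by the compiler. A sketch, with the hypothetical names method1 and method2 standing in for a and b; the pointer size is platform-specific (4 on the 32-bit system assumed above, commonly 8 on 64-bit), so it is not hard-coded here:

```cpp
#include <cstddef>

char method1[] = "str";        // array holding 's', 't', 'r', '\0'
const char* method2 = "str";   // pointer to the literal

static_assert(sizeof method1 == 4, "three characters plus the terminator");
static_assert(sizeof "str" == 4, "the literal itself is a const char[4]");
static_assert(sizeof method2 == sizeof(const char*), "only the pointer is counted");
```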

std::string implementation in GCC and its memory overhead for short strings

I am currently working on an application for a low-memory platform that requires an std::set of many short strings (>100,000 strings of 4-16 characters each). I recently transitioned this set from std::string to const char * to save memory and I was wondering whether I was really avoiding all that much overhead per string.
I tried using the following:
std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";
But it just gave me an output of 4 bytes, indicating that the string contains a pointer. I'm well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.
Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is it significant over this size of data set?
Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):
char: 1 bytes
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes
Well, at least with GCC 4.4.5, which is what I have handy on this
machine, std::string is a typedef for std::basic_string<char>, and
basic_string is defined in
/usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of
indirection in that file, but what it comes down to is that nonempty
std::strings store a pointer to one of these:
struct _Rep_base
{
    size_type _M_length;
    size_type _M_capacity;
    _Atomic_word _M_refcount;
};
Followed in-memory by the actual string data. So std::string is
going to have at least three words of overhead for each string, plus
any overhead for having a higher capacity than length (probably
not, depending on how you construct your strings -- you can check by
asking the capacity() method).
There's also going to be overhead from your memory allocator for doing
lots of small allocations; I don't know what GCC uses for C++, but
assuming it's similar to the dlmalloc allocator it uses for C, that
could be at least two words per allocation, plus some space to align
the size to a multiple of at least 8 bytes.
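Putting those numbers together gives a rough per-string estimate. This is only back-of-the-envelope arithmetic under the assumptions in this answer (three words of header, two words of allocator bookkeeping, blocks rounded up to 8 bytes); the function names are hypothetical and real allocators may differ:

```cpp
#include <cstddef>

// Rounds n up to the next multiple of `multiple`.
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Rough per-string footprint under the assumptions stated above:
//   handle     : 1 word (the pointer that sizeof(std::string) reports)
//   header     : 3 words (_M_length, _M_capacity, _M_refcount)
//   characters : capacity + 1 for the terminator
//   allocator  : 2 words of bookkeeping, block rounded up to 8 bytes
constexpr std::size_t estimated_bytes(std::size_t capacity, std::size_t word) {
    return word + round_up(3 * word + capacity + 1 + 2 * word, 8);
}

// 32-bit platform (4-byte words), 10-character string:
// 4 + round_up(12 + 11 + 8, 8) = 4 + 32 = 36 bytes.
static_assert(estimated_bytes(10, 4) == 36, "worked example");
```

So under these assumptions a 10-character string costs about 36 bytes, against 11 bytes of actual character data.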
I'm going to guess you are on a 32 bit, 8 bit per byte platform. I'm also going to guess that at least on the gcc version you are using, that they are using a reference counted implementation for std::string. The 4 byte sizeof you see is a pointer to a structure containing the reference count and the string data (and any allocator state if applicable).
In this design of gcc's the only "short" string has size == 0, in which case it can share a representation with every other empty string. Otherwise you get a refcounted COW string.
To investigate this yourself, code up an allocator that keeps track of how much memory it allocates and deallocates, and how many times. Use this allocator to investigate the implementation of the container you're interested in.
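A minimal sketch of such a tracking allocator, assuming C++11. All names here (AllocStats, g_stats, CountingAllocator, CountedString, measure) are made up for illustration, and it only counts what the string requests, not what the underlying allocator adds on top:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Totals recorded by the allocator below.
struct AllocStats {
    std::size_t calls = 0;
    std::size_t bytes = 0;
};
AllocStats g_stats;

// Minimal C++11 allocator that records every allocation request.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_stats.calls;
        g_stats.bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Builds a string of `length` characters and returns what it requested.
AllocStats measure(std::size_t length) {
    g_stats = AllocStats{};
    CountedString s(length, 'x');
    assert(s.size() == length);
    return g_stats;
}
```

Comparing measure(4).bytes and measure(16).bytes against the string lengths shows how much each string requests beyond its characters on your implementation.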
If it's guaranteed that ">100,000 strings of 4-16 characters each", then don't use std::string. Instead, write your own ShortString class. It's interesting that "sizeof(std::string) == 4", how is that possible? What are sizeof(char) and sizeof(void *)?
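As a rough idea of what such a class could look like, here is a sketch of a hypothetical ShortString for keys of at most 16 characters. It is not a drop-in replacement for std::string, and the per-node overhead of std::set itself is unaffected by it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>

// Hypothetical fixed-capacity string for keys of at most 16 characters.
// All characters live inside the object: no pointer, no heap allocation.
class ShortString {
public:
    static const std::size_t kMaxLength = 16;

    explicit ShortString(const char* text) : length_(0) {
        const std::size_t n = std::strlen(text);
        assert(n <= kMaxLength);  // longer input is a caller error in this sketch
        std::memcpy(data_, text, n);
        length_ = static_cast<unsigned char>(n);
    }

    std::size_t size() const { return length_; }

    // Ordering so the type can be used as a std::set key.
    friend bool operator<(const ShortString& a, const ShortString& b) {
        const std::size_t common = a.length_ < b.length_ ? a.length_ : b.length_;
        const int cmp = std::memcmp(a.data_, b.data_, common);
        return cmp != 0 ? cmp < 0 : a.length_ < b.length_;
    }

private:
    char data_[kMaxLength];  // not null-terminated; length_ marks the end
    unsigned char length_;
};

// Usage sketch: duplicates collapse, as with any std::set key.
std::size_t distinct_keys() {
    std::set<ShortString> keys;
    keys.insert(ShortString("alpha"));
    keys.insert(ShortString("beta"));
    keys.insert(ShortString("alpha"));
    return keys.size();
}
```

On common compilers sizeof(ShortString) comes out at 17 bytes, all of it usable for the key, with no separate allocation per string.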
I've performed some comparisons about std::string overhead. In general it is about 48 bytes! Take a look at the article on my blog:
http://jovislab.com/blog/?p=76