Get size of a std::string's string in bytes - c++

I would like to get the bytes a std::string's string occupies in memory, not the number of characters. The string contains a multibyte string. Would std::string::size() do this for me?
EDIT: Does size() also include the terminating NULL?

std::string operates on bytes, not on Unicode characters, so std::string::size() will indeed return the size of the data in bytes (without the overhead that std::string needs to store the data, of course).
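No, size() does not count a terminating null. std::string stores only the characters you tell it to store. Since C++11 it also keeps a '\0' after them so that c_str() can return a null-terminated array, but that terminator is not included in size(). A null is only counted if you explicitly put a null character into the string.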

You could be pedantic about it:
std::string x("X");
std::cout << x.size() * sizeof(std::string::value_type);
But std::string::value_type is char and sizeof(char) is defined as 1.
This only becomes important if you typedef the string type (because it may change in the future or because of compiler options).
// Some header file:
typedef std::basic_string<T_CHAR> T_string;
// Source a million miles away
T_string x("X");
std::cout << x.size() * sizeof(T_string::value_type);

std::string::size() is indeed the size in bytes.

To get the amount of memory in use by the string you would have to sum the capacity() with the overhead used for management. Note that it is capacity() and not size(). The capacity determines the number of characters (charT) allocated, while size() tells you how many of them are actually in use.
In particular, std::string implementations don't usually *shrink_to_fit* the contents, so if you create a string and then remove elements from the end, the size() will be decremented, but in most cases (this is implementation defined) capacity() will not.
Some implementations might not allocate the exact amount of memory required, but rather obtain blocks of given sizes to reduce memory fragmentation. In an implementation that used power of two sized blocks for the strings, a string with size 17 could be allocating as much as 32 characters.

Yes, size() will give you the number of char in the string. One character in multibyte encoding take up multiple char.

There is an inherent conflict in the question as written: std::string is defined as std::basic_string<char,...>, that is, its element type is char (1 byte), but later you stated "the string contains a multibyte string" ("multibyte" == wchar_t?).
The size() member function does not count a trailing null. Its value represents the number of characters (elements of the character type), not bytes.
Assuming you intended to say your multibyte string is std::wstring (alias for std::basic_string<wchar_t,...>), the memory footprint for the std::wstring's characters, including the null-terminator is:
std::wstring myString;
...
size_t bytesCount = (myString.size() + 1) * sizeof(wchar_t);
It's instructive to consider how one would write a reusable template function that would work for ANY potential instantiation of std::basic_string<> like this**:
// Return the number of bytes occupied by inString's characters,
// optionally counting the null terminator provided by inString.c_str().
template <typename CharT>
inline std::size_t stringBytes(const std::basic_string<CharT>& inString, bool bCountNull)
{
    return (inString.size() + (bCountNull ? 1 : 0)) * sizeof(CharT);
}
** For simplicity, ignores the traits and allocator types rarely specified explicitly for std::basic_string<> (they have defaults).

Related

Memory efficiency of C++ arrays

Somewhere in my brainstem a voice whispers: "In C++, an array does not need more memory than the number of elements need."
std::string str = "aabbcc";
std::array<std::string, 3> str_array = {"aa", "bb", "cc"};
Accordingly, both should have the same size, because, unlike in Java, there is no separate size field or similar. But I haven't found a reference.
Is this true? Under which circumstances is it not?
Storing strings in any language is more complicated than you think. A C++ std::string must provide contiguous storage for its contents. Apart from that, a std::string can hold other things, such as a pointer/iterator to the last character or the number of characters in it. std::string::size is required to be O(1), so the string must store more information than just a buffer. Also, most standard library implementations provide SSO (small string optimization): short strings are stored in a small buffer inside the std::string object itself, which avoids unnecessary dynamic allocations. You can also reserve more memory than you need. Let's say you need to collect 800-1000 characters in a loop. You can do it like this:
std::string str;
for(...)
str += some_character;
But this may cause repeated reallocations (and copies) as the string grows. If you can estimate the number of characters you want to store, you should reserve memory up front:
std::string str;
str.reserve(1000);
for(...)
str.push_back(some_character);
Then, if you want to save memory, you can request shrink_to_fit (it is a non-binding request, so the implementation may ignore it):
str.shrink_to_fit();
There are also other things you must be aware of:
reserve increases capacity, but size stays the same. This means that std::string must also store (or be able to calculate) how many more characters its buffer can hold.
string literals are null-terminated
std::basic_string::c_str must return a null-terminated array of characters, so std::string also stores a null terminator after its characters (since C++11, c_str() must run in constant time and data() must return the same null-terminated buffer, so the terminator is kept in place rather than added on demand)
there are more encodings and character sets; ASCII is just one of them. In UTF-8 and UTF-16 encoded strings, several stored elements may be needed to make up one code point, which complicates things further.
In C++, an array does not need more memory than the number of elements need.
This is true. A raw array has a size equal to the size of its element type times the number of elements. So,
int array[10];
has a size of sizeof(int) * std::size(array). std::array is the same, but it is allowed to have padding, so
std::array<int, 10> array;
has the size of sizeof(int) * std::size(array) + P where P is some integer amount of padding.
Your example though isn't quite the same thing.
A std::string is a container. It has its own size that is separate from what it contains. So sizeof(std::string) will always be the same regardless of how many characters are in the string. So, ignoring short string optimization,
std::string str = "aabbcc";
takes up sizeof(std::string) plus however much memory the string allocated for its underlying character buffer. That is not the same value as
std::array<std::string, 3> str_array = {"aa", "bb", "cc"};
Since you now have 3 * sizeof(std::string) plus whatever each string allocated.
Accordingly, both should have the same size (e.g. 6 bytes),
Not a correct deduction.
The memory used by a std::string, if you want to call its size, consists of at least a pointer and the memory allocated to hold the data.
The memory allocated to hold the data can also include the space required to hold the terminating null character.
Given
std::string s = "aabbcc";
std::string a = "aa";
std::string b = "bb";
std::string c = "cc";
mem(s) != mem(a) + mem(b) + mem(c)
Virtually every string can hold the following info:
The size of the string i.e. num of chars it contains.
The capacity of memory holding the string's chars.
The value of the string.
Additionally it may also hold:
A copy of its allocator and (in older copy-on-write implementations, which C++11 no longer permits) a reference count for the value.
They don’t have the same size. Strings are stored null-terminated, giving you an extra byte for each string.

Strncpy should only be used with fixed length arrays

According to this StackOverflow comment strncpy should never be used with a non-fixed length array.
strncpy should never be used unless you're working with fixed-width, not-necessarily-terminated string fields in structures/binary files. – R.. Jan 11 '12 at 16:22
I understand that it is redundant if you are dynamically allocating memory for the string, but is there a reason why it would be bad to use strncpy over strcpy?
strncpy will copy data up to the limit you specify--but if it reaches that limit before the end of the string, it'll leave the destination unterminated.
In other words, there are two possibilities with strncpy. One is that you get behavior precisely like strcpy would have produced anyway (except slower, since it fills the remainder of the destination buffer with NULs, which you virtually never actually want or care about). The other is that it produces a result you generally can't put to any real use.
If you want to copy a string up to a maximum length into a fixed-length buffer, you can (for example) use sprintf with a precision to do the job:
char buffer[256];
sprintf(buffer, "%.255s", source);
Unlike strncpy, this always zero-terminates the result, so the result is always usable as a string.
If you don't want to use sprintf (or similar), I'd advise just writing a function that actually does what you want, something on this general order:
void copy_string(char *dest, char const *source, size_t max_len) {
    if (max_len == 0)  // no room even for the terminator
        return;
    size_t i;
    for (i = 0; i < max_len - 1 && source[i]; i++)
        dest[i] = source[i];
    dest[i] = '\0';
}
Since you've tagged this as C++ (in addition to C): my advice would be to generally avoid this whole mess in C++ by just using std::string.
If you really have to work with NUL-terminated sequences in C++, you might consider another possibility:
template <size_t N>
void copy_string(char (&dest)[N], char const *source) {
    size_t i;
    for (i = 0; i < N - 1 && source[i]; i++)
        dest[i] = source[i];
    dest[i] = '\0';
}
This only works when the destination is an actual array (not a pointer), but for that case, it gets the compiler to deduce the size of the array, instead of requiring the user to pass it explicitly. This will generally make the code a tiny bit faster (less overhead in the function call) and much harder to screw up and pass the wrong size.
The argument against using strncpy is that it does not guarantee that your string will be null-terminated.
A less error-prone way to copy a string in C when using non-fixed-length arrays is to use snprintf, which does guarantee null termination of your string (as long as the buffer size is non-zero).
A good blog post commenting on the *n* functions:
These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer.
This means that using strncpy and other such functions when not dealing with fixed arrays introduces an unnecessary risk of non-null-terminated strings, which can be time bombs in your code.
char * strncpy ( char * destination, const char * source, size_t num );
Limitations of strncpy():
It doesn't put a null terminator on the destination string if it is completely filled: if source is at least num characters long, no null character is implicitly appended at the end of destination.
If num is greater than the length of the source string, the destination is padded with null characters up to num characters.
Like strcpy, it is not a memory-safe operation: it does not check that destination actually has room for num characters, so passing a num larger than the destination buffer is a potential cause of buffer overruns.
Refer: Why should you use strncpy instead of strcpy?
We have two functions for copying a string from one buffer to another:
1> strcpy
2> strncpy
Both can be used with fixed and non-fixed length arrays. strcpy doesn't check the upper bound of the destination when copying; strncpy copies at most the number of characters you give it. If the source is too long, strcpy writes past the end of the destination, which is undefined behavior: it may silently corrupt the memory of the current process or crash it. strncpy does not return an error code (it simply returns destination); it stops at the limit, leaving the result unterminated. So strncpy is safer than strcpy against overruns, provided the limit is correct and you terminate the result yourself.

Not sure why I am getting different lengths when using a string or a char

When I call gethostname using a char array, my length is 25, but when I use a std::string, my length is 64. Not really sure why. In both cases I am declaring the same size, HOST_NAME_MAX.
char hostname[HOST_NAME_MAX];
BOOL host = gethostname(hostname, sizeof hostname);
expectedComputerName = hostname;
int size2 = expectedComputerName.length();
std::string test(HOST_NAME_MAX, 0);
host = gethostname(&test[0], test.length());
int testSize = test.length();
A std::string object can contain NULs (i.e. '\0' characters). You are storing the name in the first bytes of a string object that was created with a length of HOST_NAME_MAX.
Storing something at the beginning of the string data won't change the length of the string, which therefore remains HOST_NAME_MAX.
When you create a string from a char pointer instead, the resulting std::string object contains the characters up to, but excluding, the first NUL character (0x00). The reason is that a C string cannot contain NULs, because the first NUL marks the end of the string.
Consider what you're doing in each case. In the former code snippet, you're declaring a character array capable of holding HOST_NAME_MAX-1 characters (plus 1 for the null terminator). You then load some string data into that buffer via the call to gethostname and obtain its length by assigning the buffer to a std::string object using the std::string::operator= overload that takes a const char *. One effect of this is that the std::string's internal size is set to the strlen of the buffer, which is not necessarily the same as HOST_NAME_MAX. A call to std::string::length simply returns that size.
In the latter case, you're using the std::string constructor that takes a size and an initial character to construct test. This constructor sets the internal size to whatever size you passed in, which is HOST_NAME_MAX. The fact that you then copy some data into std::string's internal buffer has no bearing on its size. As in the other case, a call to the length() member function simply returns the size, which is HOST_NAME_MAX, regardless of whether the text actually stored in the buffer is shorter than that.
As @MattMcNabb mentioned in the comments, you could fix this by:
test.resize( strlen(test.c_str()) );
Why might you want to do this? Consistency with the char buffer approach might be one reason, but another may be performance. In the latter case you're not only setting the length of the string to HOST_NAME_MAX but also its capacity (ignoring SSO for brevity), which you can see starting on line 242 of libstdc++'s std::string implementation. In terms of performance, this means that even though only, say, 25 characters are actually meaningful in your test string, the next time you append to it (via +=, std::string::append, etc.) it will very likely have to reallocate and grow, because the internal size and internal capacity are equal. Following @MattMcNabb's suggestion, however, the string's internal size is reduced to the length of the actual payload while the capacity stays the same as before, so you avoid the almost immediate re-growth and re-copy of the string.

Count of bytes in std::string [duplicate]

I'm using BSPlib and I want to use bsp_put, which requires me to set the size of the string I'm sending.
Even if you aren't familiar with BSP, the question isn't really specific to it. Thanks.
Multiply the number of characters (given by size(), or capacity() if you want to know the total amount allocated rather than the amount in use) by the size of the character type.
If it's std::string itself, an alias for std::basic_string<char>, then the character size is one byte, so size() (or capacity()) alone will do.
strlen returns the length of a plain C string.
A C string is as long as the number of characters between the beginning of the string and the terminating null character.
If you're using a std::string object, you can use its length or size method:
http://www.cplusplus.com/reference/string/string/length/
The number of characters in a std::string can be had by the "size()" member of std::string.
std::string s = "Hey, look, I'm a string!";
std::string::size_type len = s.size();
std::cout << "My string is " << len << " characters long." << std::endl;
As people have pointed out, you cannot rely upon the memory organization of std::string, except in two cases: std::string::data() and std::string::c_str(). Each of these functions returns a pointer to contiguous memory that holds the same characters as the string. (Before C++11 that memory might be a separate copy; either way you must not write through it, except via the non-const data() overload added in C++17.) Historically, the difference between the two calls was whether the memory has a terminating null byte: before C++11, data() was not guaranteed to be null-terminated, while c_str() was. Since C++11, both return the same null-terminated buffer.
// assuming that bsp_put_bytes takes a pointer & len
bsp_put_bytes(s.data(), s.size());
// and bsp_put_string takes a C-style string
bsp_put_string(s.c_str());
Carefully read the caveats in the documentation for these functions, including how long the pointed-to characters remain valid (modifying the string can invalidate the pointer).
std::string myString("this is the text of my string");
char *copyOfString = strdup(myString.c_str());
size_t myStringLength = strlen(copyOfString);
free(copyOfString);
That's probably the most efficient way of getting the length of the string. Let me know how impressed your coworkers are when you show them your new solution using this example.
