Count of bytes in std::string [duplicate] - c++

Im using BSPlib and I want to use the bsp_put which requires me to set the size of the string I'm sending.
Even if you aren't familiar with BSP, this is not closely related. thanks.

Multiply the number of characters (given by size(), or capacity() if you want to know the total amount allocated rather than the amount in use) by the size of the character type.
If it's std::string itself, an alias for std::basic_string<char>, then the character size is one byte, so size() (or capacity()) alone will do.

strlen returns the length of string of a plain C string.
A C string is as long as the amount of characters between the beginning of the string and the terminating null character.
If you're using the String object you can use the length or size method of the object:
http://www.cplusplus.com/reference/string/string/length/

The number of characters in a std::string can be had by the "size()" member of std::string.
std::string s = "Hey, look, I'm a string!"
std::string::size_type len = s.size();
std::cout << "My string is " << len << "characters long." << std::endl;
As people have pointed out, you cannot rely upon the memory organization of std::string, except for two cases: std::string::data() and std::string::c_str(). Each of these functions return a pointer to contiguous memory, which memory holds the same characters as the string. (The memory may or may not point to the real string, but it doesn't matter, you can't write to it anyway.) The difference between the two calls is whether the memory has a terminating null byte: data() has no terminating character, c_str() does.
// assuming that bsp_put_bytes takes a pointer & len
bsp_put_bytes(s.data(), s.size());
// and bsp_put_string takes a C-style string
bsp_put_string(s.c_str());
Carefully read the caveats in the links I gave you, including the valid lifetime of the pointed-to characters.

std::string myString("this is the text of my string");
const char *copyOfString = strdup(myString.c_str());
size_t myStringLength = strlen(copyOfString);
free(copyOfString);
That's probably the most efficient way of getting the length of the string. Let me know how impressed your coworkers are when you show them your new solution using this example.

Related

in the C ++ stl, does the string container actually contain a string with a closing 0? [duplicate]

Will the below string contain the null terminator '\0'?
std::string temp = "hello whats up";
No, but if you say temp.c_str() a null terminator will be included in the return from this method.
It's also worth saying that you can include a null character in a string just like any other character.
string s("hello");
cout << s.size() << ' ';
s[1] = '\0';
cout << s.size() << '\n';
prints
5 5
and not 5 1 as you might expect if null characters had a special meaning for strings.
Not in C++03, and it's not even guaranteed before C++11 that in a C++ std::string is continuous in memory. Only C strings (char arrays which are intended for storing strings) had the null terminator.
In C++11 and later, mystring.c_str() is equivalent to mystring.data() is equivalent to &mystring[0], and mystring[mystring.size()] is guaranteed to be '\0'.
In C++17 and later, mystring.data() also provides an overload that returns a non-const pointer to the string's contents, while mystring.c_str() only provides a const-qualified pointer.
This depends on your definition of 'contain' here. In
std::string temp = "hello whats up";
there are few things to note:
temp.size() will return the number of characters from first h to last p (both inclusive)
But at the same time temp.c_str() or temp.data() will return with a null terminator
Or in other words int(temp[temp.size()]) will be zero
I know, I sound similar to some of the answers here but I want to point out that size of std::string in C++ is maintained separately and it is not like in C where you keep counting unless you find the first null terminator.
To add, the story would be a little different if your string literal contains embedded \0. In this case, the construction of std::string stops at first null character, as following:
std::string s1 = "ab\0\0cd"; // s1 contains "ab", using string literal
std::string s2{"ab\0\0cd", 6}; // s2 contains "ab\0\0cd", using different ctr
std::string s3 = "ab\0\0cd"s; // s3 contains "ab\0\0cd", using ""s operator
References:
https://akrzemi1.wordpress.com/2014/03/20/strings-length/
http://en.cppreference.com/w/cpp/string/basic_string/basic_string
Yes if you call temp.c_str(), then it will return null-terminated c-string.
However, the actual data stored in the object temp may not be null-terminated, but it doesn't matter and shouldn't matter to the programmer, because when then programmer wants const char*, he would call c_str() on the object, which is guaranteed to return null-terminated string.
With C++ strings you don't have to worry about that, and it's possibly dependent of the implementation.
Using temp.c_str() you get a C representation of the string, which will definitely contain the \0 char. Other than that, i don't really see how it would be useful on a C++ string
std::string internally keeps a count of the number of characters. Internally it works using this count. Like others have said, when you need the string for display or whatever reason, you can its c_str() method which will give you the string with the null terminator at the end.

Is it Safe to strncpy Into a string That Doesn't Have Room for the Null Terminator?

Consider the following code:
const char foo[] = "lorem ipsum"; // foo is an array of 12 characters
const auto length = strlen(foo); // length is 11
string bar(length, '\0'); // bar was constructed with string(11, '\0')
strncpy(data(bar), foo, length);
cout << data(bar) << endl;
My understanding is that strings are always allocated with a hidden null element. If this is the case then bar really allocates 12 characters, with the 12th being a hidden '\0' and this is perfectly safe... If I'm wrong on that then the cout will result in undefined behavior because there isn't a null terminator.
Can someone confirm for me? Is this legal?
There have been a lot of questions about why to use strncpy instead of just using the string(const char*, const size_t) constructor. My intent has been to make my toy code close to my actual code which contains a vsnprintf. Unfortunately even after getting excellent answers here I've found that vsnprintf doesn't behave the same as strncpy, and I've asked a follow up question here: Why is vsnprintf Not Writing the Same Number of Characters as strncpy Would?
This is safe, as long as you copy [0, size()) characters into the string . Per [basic.string]/3
In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.
So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
There are two different things here.
First, does strncpy add an additional \0 in this instance (11 non-\0 elements to be copied in a string of size 11). The answer is no:
Copies at most count characters of the byte string pointed to by src (including the terminating null character) to character array pointed to by dest.
If count is reached before the entire string src was copied, the resulting character array is not null-terminated.
So the call is perfectly fine.
Then data() gives you a proper \0-terminated string:
c_str() and data() perform the same function. (since C++11)
So it seems that for C++11, you are safe. Whether the string allocates an additional \0 or not doesn't seems to be indicated in the documentation, but the API is clear that what you are doing is perfectly fine.
You have allocated an 11-character std::string. You are not trying to read nor write anything past that, so that part will be safe.
So the real question is whether you have messed up the internals of the string. Since you haven't done anything that isn't allowed, how would that be possible? If it's required for the string to internally keep a 12-byte buffer with a null padding at the end in order to fulfill its contract, that will be the case no matter what operations you performed.
Yes it's safe according to the char * strncpy(char* destination, const char* source, size_t num):
Copy characters from string
Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.

Does std::string have a null terminator?

Will the below string contain the null terminator '\0'?
std::string temp = "hello whats up";
No, but if you say temp.c_str() a null terminator will be included in the return from this method.
It's also worth saying that you can include a null character in a string just like any other character.
string s("hello");
cout << s.size() << ' ';
s[1] = '\0';
cout << s.size() << '\n';
prints
5 5
and not 5 1 as you might expect if null characters had a special meaning for strings.
Not in C++03, and it's not even guaranteed before C++11 that in a C++ std::string is continuous in memory. Only C strings (char arrays which are intended for storing strings) had the null terminator.
In C++11 and later, mystring.c_str() is equivalent to mystring.data() is equivalent to &mystring[0], and mystring[mystring.size()] is guaranteed to be '\0'.
In C++17 and later, mystring.data() also provides an overload that returns a non-const pointer to the string's contents, while mystring.c_str() only provides a const-qualified pointer.
This depends on your definition of 'contain' here. In
std::string temp = "hello whats up";
there are few things to note:
temp.size() will return the number of characters from first h to last p (both inclusive)
But at the same time temp.c_str() or temp.data() will return with a null terminator
Or in other words int(temp[temp.size()]) will be zero
I know, I sound similar to some of the answers here but I want to point out that size of std::string in C++ is maintained separately and it is not like in C where you keep counting unless you find the first null terminator.
To add, the story would be a little different if your string literal contains embedded \0. In this case, the construction of std::string stops at first null character, as following:
std::string s1 = "ab\0\0cd"; // s1 contains "ab", using string literal
std::string s2{"ab\0\0cd", 6}; // s2 contains "ab\0\0cd", using different ctr
std::string s3 = "ab\0\0cd"s; // s3 contains "ab\0\0cd", using ""s operator
References:
https://akrzemi1.wordpress.com/2014/03/20/strings-length/
http://en.cppreference.com/w/cpp/string/basic_string/basic_string
Yes if you call temp.c_str(), then it will return null-terminated c-string.
However, the actual data stored in the object temp may not be null-terminated, but it doesn't matter and shouldn't matter to the programmer, because when then programmer wants const char*, he would call c_str() on the object, which is guaranteed to return null-terminated string.
With C++ strings you don't have to worry about that, and it's possibly dependent of the implementation.
Using temp.c_str() you get a C representation of the string, which will definitely contain the \0 char. Other than that, i don't really see how it would be useful on a C++ string
std::string internally keeps a count of the number of characters. Internally it works using this count. Like others have said, when you need the string for display or whatever reason, you can its c_str() method which will give you the string with the null terminator at the end.

strncpy equivalent for std::string?

Is there an exact equivalent to strncpy in the C++ Standard Library? I mean a function, that copies a string from one buffer to another until it hits the terminating 0? For instance when I have to parse strings from an unsafe source, such as TCP packets, so I'm able to perform checks in length while coping the data.
I already searched a lot regarding this topic and I also found some interesting topics, but all of those people were happy with std::string::assign, which is also able to take a size of characters to copy as a parameter. My problem with this function is, that it doesn't perform any checks if a terminating null was already hit - it takes the given size serious and copies the data just like memcpy would do it into the string's buffer. This way there is much more memory allocated and copied than it had to be done, if there were such a check while coping.
That's the way I'm working around this problem currently, but there is some overhead I'd wish to avoid:
// Get RVA of export name
const ExportDirectory_t *pED = (const ExportDirectory_t*)rva2ptr(exportRVA);
sSRA nameSra = rva2sra(pED->Name);
// Copy it into my buffer
char *szExportName = new char[nameSra.numBytesToSectionsEnd];
strncpy(szExportName,
nameSra.pSection->pRawData->constPtr<char>(nameSra.offset),
nameSra.numBytesToSectionsEnd);
szExportName[nameSra.numBytesToSectionsEnd - 1] = 0;
m_exportName = szExportName;
delete [] szExportName;
This piece of code is part of my parser for PE-binaries (of the routine parsing the export table, to be exact). rva2sra converts a relative virtual address into a PE-section relative address. The ExportDirectory_t structure contains the RVA to the export name of the binary, which should be a zero-terminated string. But that doesn't always have to be the case - if someone would like it, it would be able to omit the terminating zero which would make my program run into memory which doesn't belong to the section, where it would finally crash (in the best case...).
It wouldn't be a big problem to implement such a function by myself, but I'd prefer it if there were a solution for this implemented in the C++ Standard Library.
If you know that the buffer you want to make a string out of has at least one NUL in it then you can just pass it to the constructor:
const char[] buffer = "hello\0there";
std::string s(buffer);
// s contains "hello"
If you're not sure, then you just have to search the string for the first null, and tell the constructor of string to make a copy of that much data:
int len_of_buffer = something;
const char* buffer = somethingelse;
const char* copyupto = std::find(buffer, buffer + len_of_buffer, 0); // find the first NUL
std::string s(buffer, copyupto);
// s now contains all the characters up to the first NUL from buffer, or if there
// was no NUL, it contains the entire contents of buffer
You can wrap the second version (which always works, even if there isn't a NUL in the buffer) up into a tidy little function:
std::string string_ncopy(const char* buffer, std::size_t buffer_size) {
const char* copyupto = std::find(buffer, buffer + buffer_size, 0);
return std::string(buffer, copyupto);
}
But one thing to note: if you hand the single-argument constructor a const char* by itself, it will go until it finds a NUL. It is important that you know there is at least one NUL in the buffer if you use the single-argument constructor of std::string.
Unfortunately (or fortunately), there is no built in perfect equivalent of strncpy for std::string.
The std::string class in STL can contain null characters within the string ("xxx\0yyy" is a perfectly valid string of length 7). This means that it doesn't know anything about null termination (well almost, there are conversions from/to C strings). In other words, there's no alternative in the STL for strncpy.
There are a few ways to still accomplish your goal with a shorter code:
const char *ptr = nameSra.pSection->pRawData->constPtr<char>(nameSra.offset);
m_exportName.assign(ptr, strnlen(ptr, nameSra.numBytesToSectionsEnd));
or
const char *ptr = nameSra.pSection->pRawData->constPtr<char>(nameSra.offset);
m_exportName.reserve(nameSra.numBytesToSectionsEnd);
for (int i = 0; i < nameSra.numBytesToSectionsEnd && ptr[i]; i++)
m_exportName += ptr[i];
Is there an exact equivalent to strncpy in the C++ Standard Library?
I certainly hope not!
I mean a function, that copies a string from one buffer to another until it hits the terminating 0?
Ah, but that's not what strncpy() does -- or at least it's not all it does.
strncpy() lets you specify the size, n, of the destination buffer, and copies at most n characters. That's fine as far as it goes. If the length of the source string ("length" defined as the number of characters preceding the terminating '\0') exceeds n, the destination buffer is padded with additional \0's, something that's rarely useful. And if the length if the source string exceeds n, then the terminating '\0' is not copied.
The strncpy() function was designed for the way early Unix systems stored file names in directory entries: as a 14-byte fixed-size buffer that can hold up to a 14-character name. (EDIT: I'm not 100% sure that was the actual motivation for its design.) It's arguably not a string function, and it's not just a "safer" variant of strcpy().
You can achieve the equivalent of what one might assume strncpy() does (given the name) using strncat():
char dest[SOME_SIZE];
dest[0] = '\0';
strncat(dest, source_string, SOME_SIZE);
This will always '\0'-terminate the destination buffer, and it won't needlessly pad it with extra '\0' bytes.
Are you really looking for a std::string equivalent of that?
EDIT : After I wrote the above, I posted this rant on my blog.
There is no built-in equivalent. You have to roll your own strncpy.
#include <cstring>
#include <string>
std::string strncpy(const char* str, const size_t n)
{
if (str == NULL || n == 0)
{
return std::string();
}
return std::string(str, std::min(std::strlen(str), n));
}
The string's substring constructor can do what you want, although it's not an exact equivalent of strncpy (see my notes at the end):
std::string( const std::string& other,
size_type pos,
size_type count = std::string::npos,
const Allocator& alloc = Allocator() );
Constructs the string with a substring [pos, pos+count) of other. If count == npos or if the requested substring lasts past the end of the string, the resulting substring is [pos, size()).
Source: http://www.cplusplus.com/reference/string/string/string/
Example:
#include <iostream>
#include <string>
#include <cstring>
int main ()
{
std::string s0 ("Initial string");
std::string s1 (s0, 0, 40); // count is bigger than s0's length
std::string s2 (40, 'a'); // the 'a' characters will be overwritten
strncpy(&s2[0], s0.c_str(), s2.size());
std::cout << "s1: '" << s1 << "' (size=" << s1.size() << ")" << std::endl;
std::cout << "s2: '" << s2 << "' (size=" << s2.size() << ")" << std::endl;
return 0;
}
Output:
s1: 'Initial string' (size=14)
s2: 'Initial string' (size=40)
Differences with strncpy:
the string constructor always appends a null-terminating character to the result, strncpy does not;
the string constructor does not pad the result with 0s if a null-terminating character is reached before the requested count, strncpy does.
Use the class' constructor:
string::string str1("Hello world!");
string::string str2(str1);
This will yield an exact copy, as per this documentation: http://www.cplusplus.com/reference/string/string/string/
std::string has a constructor with next signature that can be used :
string ( const char * s, size_t n );
with next description:
Content is initialized to a copy of the string formed by the first n characters in the array of characters pointed by s.

How to get the number of bytes occupied by a specific string in the program?

Im using BSPlib and I want to use the bsp_put which requires me to set the size of the string I'm sending.
Even if you aren't familiar with BSP, this is not closely related. thanks.
Multiply the number of characters (given by size(), or capacity() if you want to know the total amount allocated rather than the amount in use) by the size of the character type.
If it's std::string itself, an alias for std::basic_string<char>, then the character size is one byte, so size() (or capacity()) alone will do.
strlen returns the length of string of a plain C string.
A C string is as long as the amount of characters between the beginning of the string and the terminating null character.
If you're using the String object you can use the length or size method of the object:
http://www.cplusplus.com/reference/string/string/length/
The number of characters in a std::string can be had by the "size()" member of std::string.
std::string s = "Hey, look, I'm a string!"
std::string::size_type len = s.size();
std::cout << "My string is " << len << "characters long." << std::endl;
As people have pointed out, you cannot rely upon the memory organization of std::string, except for two cases: std::string::data() and std::string::c_str(). Each of these functions return a pointer to contiguous memory, which memory holds the same characters as the string. (The memory may or may not point to the real string, but it doesn't matter, you can't write to it anyway.) The difference between the two calls is whether the memory has a terminating null byte: data() has no terminating character, c_str() does.
// assuming that bsp_put_bytes takes a pointer & len
bsp_put_bytes(s.data(), s.size());
// and bsp_put_string takes a C-style string
bsp_put_string(s.c_str());
Carefully read the caveats in the links I gave you, including the valid lifetime of the pointed-to characters.
std::string myString("this is the text of my string");
const char *copyOfString = strdup(myString.c_str());
size_t myStringLength = strlen(copyOfString);
free(copyOfString);
That's probably the most efficient way of getting the length of the string. Let me know how impressed your coworkers are when you show them your new solution using this example.