Is the: "std::string can hold '\0' character" by design? - c++

The fact that std::string can actually hold '\0' characters comes up all the time. This is of course inconsistent with C-style strings.
So I'm wondering: is this by design, is it an omission, or is it just that the standard doesn't forbid it and compilers allow it to happen?

I'm wondering what your quarrel is. '\0' is just another character. There is no efficient way to forbid it in a general purpose 'char' string.
It is unfortunate that the same character has a special meaning in C, but you have to deal with that, like every other restriction imposed by legacy code, as soon as you interoperate with it.
This shouldn't be an issue as long as you stick to code that uses std::string exclusively.
To address your comment, we need to look at the constructor that takes a char*, which is basic_string(const charT* s, const Allocator& a = Allocator()) in 21.4.2 9/10 in n3242. It says that the size of the internal string is determined through traits::length(s), which in the case of std::string is strlen, and strlen requires its argument to be null-terminated. So yes, if you construct a std::string from a const char*, that argument needs to be null-terminated.
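To make the difference between the two constructors concrete, here is a minimal sketch. The helper names from_c_string and from_buffer are made up for illustration; only the std::string constructors themselves come from the standard.

```cpp
#include <cstddef>
#include <string>

// Builds a std::string from a C string: traits::length (strlen) determines
// the size, so everything after the first '\0' is dropped.
inline std::string from_c_string(const char* s) {
    return std::string(s);
}

// Builds a std::string from a pointer plus an explicit count: all n
// characters are copied, embedded '\0' characters included.
inline std::string from_buffer(const char* s, std::size_t n) {
    return std::string(s, n);
}
```

With the literal "ab\0cd", from_c_string yields a string of size 2, while from_buffer("ab\0cd", 5) yields a string of size 5 whose third character is '\0'.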

There is a set of functions that accept 'char *' arguments and assume that the string is terminated by a zero. If you use them carefully, you can certainly have strings with 0's in them.
STL strings, in contrast, intentionally permit zero bytes, since they don't use 0 for termination. So the simple answer to your question is, 'yes, by design.'

The standard doesn't say that '\0' is a special character for std::string. Therefore, a compliant implementation of std::string should not treat '\0' specially. The exception, of course, is when a const char* is passed to a member function of a string: that argument is assumed to be null-terminated.

By design.
C can also have strings that are not null-terminated:
char sFoo[4];
strncpy(sFoo,"Test",sizeof(sFoo));
Here sFoo holds a string that is not NUL-terminated.
And it can have length-counted strings that may contain 0, like
struct String {
    char *str;
    size_t length;
    size_t capacity;
};
String literals are NUL-terminated, but string literals are not the only thing that counts as a string.
So NUL termination is common practice, but that does not mean 0 is an invalid character.

strncpy vs. strncat
That said, strncpy and strncat etc. will append a null terminator if there's room.
Actually strncpy and strncat are very different:
strncpy writes a "NUL-filled n-byte string" to an n-byte buffer: a string whose length l is at most n, with the last n - l bytes filled with NUL. Note the plural: all the trailing bytes are zeroed, not just one. Also note that the maximum allowed value for l really is n, so there can be zero NUL bytes: the buffer may not hold a NUL-terminated string. (strnlen, which is a POSIX function rather than standard C and is available with GCC, can measure such a "NUL-filled n-byte string".)
In contrast, strncat outputs a NUL-terminated string to a buffer. In both cases the string is truncated if it is too long, but with strncpy an n-letter string will fit in an n-byte buffer, whereas with strncat a result of n letters will only fit in an (n+1)-byte buffer.
This difference causes a lot of confusion for C beginners and even non-beginners. I have even seen lessons and books that teach "safe C programming" and contain confused and contradictory information about these standard functions.
These so-called "safe" C string manipulation functions (the "strn*" family) have been heavily criticized in the C "secure programming" community, and better-designed (but non-standard) alternatives have been invented (notably the "strl*" family: strlcpy...).
Summary:
strncpy will append a null terminator if there's room;
strncat will append a null terminator always.
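The summary can be checked with a small sketch. The function names here are invented for the demonstration; the buffer sizes mirror the n versus n+1 point made above.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if the first n bytes of buf contain a '\0'.
inline bool has_terminator(const char* buf, std::size_t n) {
    return std::memchr(buf, '\0', n) != nullptr;
}

// strncpy with a 4-letter source and a 4-byte buffer: there is no room
// left, so no NUL is written.
inline bool strncpy_result_is_terminated() {
    char buf[4];
    std::strncpy(buf, "Test", sizeof buf);
    return has_terminator(buf, sizeof buf);
}

// strncat appends at most 4 characters and then always writes a NUL,
// so an initially empty buffer needs 4 + 1 bytes.
inline bool strncat_result_is_terminated() {
    char buf[5] = "";
    std::strncat(buf, "Testing", 4);
    return has_terminator(buf, sizeof buf) && std::strlen(buf) == 4;
}
```

The first function returns false and the second returns true. Some compilers warn about the strncpy call precisely because it truncates without terminating.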

Related

Is it Safe to strncpy Into a string That Doesn't Have Room for the Null Terminator?

Consider the following code:
const char foo[] = "lorem ipsum"; // foo is an array of 12 characters
const auto length = strlen(foo); // length is 11
string bar(length, '\0'); // bar was constructed with string(11, '\0')
strncpy(data(bar), foo, length);
cout << data(bar) << endl;
My understanding is that strings are always allocated with a hidden null element. If this is the case then bar really allocates 12 characters, with the 12th being a hidden '\0' and this is perfectly safe... If I'm wrong on that then the cout will result in undefined behavior because there isn't a null terminator.
Can someone confirm for me? Is this legal?
There have been a lot of questions about why to use strncpy instead of just using the string(const char*, const size_t) constructor. My intent has been to make my toy code close to my actual code which contains a vsnprintf. Unfortunately even after getting excellent answers here I've found that vsnprintf doesn't behave the same as strncpy, and I've asked a follow up question here: Why is vsnprintf Not Writing the Same Number of Characters as strncpy Would?
This is safe, as long as you only write to the characters at indices [0, size()) of the string. Per [basic.string]/3:
In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.
So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
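As a rough sketch of the pattern from the question, wrapped in a hypothetical helper called copy_into_sized_string: it writes through &bar[0] instead of data(bar) so that it also compiles before C++17, where data() only returns a pointer to const.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies exactly strlen(src) characters into a pre-sized std::string.
// strncpy writes no terminator here (the count is reached first); the
// string's own terminator at data() + size() is never touched.
inline std::string copy_into_sized_string(const char* src) {
    const std::size_t length = std::strlen(src);
    std::string bar(length, '\0');
    std::strncpy(&bar[0], src, length);
    return bar;
}
```

For "lorem ipsum" the result has size() 11, compares equal to the source, and c_str()[11] is still '\0'.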
There are two different things here.
First, does strncpy add an additional \0 in this instance (11 non-\0 elements to be copied in a string of size 11). The answer is no:
Copies at most count characters of the byte string pointed to by src (including the terminating null character) to character array pointed to by dest.
If count is reached before the entire string src was copied, the resulting character array is not null-terminated.
So the call is perfectly fine.
Then data() gives you a proper \0-terminated string:
c_str() and data() perform the same function. (since C++11)
So it seems that for C++11, you are safe. Whether the string allocates an additional \0 or not doesn't seem to be indicated in the documentation, but the API is clear that what you are doing is perfectly fine.
You have allocated an 11-character std::string. You are not trying to read nor write anything past that, so that part will be safe.
So the real question is whether you have messed up the internals of the string. Since you haven't done anything that isn't allowed, how would that be possible? If it's required for the string to internally keep a 12-byte buffer with a null padding at the end in order to fulfill its contract, that will be the case no matter what operations you performed.
Yes it's safe according to the char * strncpy(char* destination, const char* source, size_t num):
Copy characters from string
Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.

Strncpy should only be used with fixed length arrays

According to this StackOverflow comment strncpy should never be used with a non-fixed length array.
strncpy should never be used unless you're working with fixed-width, not-necessarily-terminated string fields in structures/binary files. – R.. Jan 11 '12 at 16:22
I understand that it is redundant if you are dynamically allocating memory for the string, but is there a reason why it would be bad to use strncpy over strcpy?
strncpy will copy data up to the limit you specify--but if it reaches that limit before the end of the string, it'll leave the destination unterminated.
In other words, there are two possibilities with strncpy. One is that you get behavior precisely like strcpy would have produced anyway (except slower, since it fills the remainder of the destination buffer with NULs, which you virtually never actually want or care about). The other is that it produces a result you generally can't put to any real use.
If you want to copy a string up to a maximum length into a fixed-length buffer, you can (for example) use sprintf to do the job:
char buffer[256];
sprintf(buffer, "%.255s", source);
Unlike strncpy, this always zero-terminates the result, so the result is always usable as a string.
If you don't want to use sprintf (or similar), I'd advise just writing a function that actually does what you want, something on this general order:
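// Copies at most max_len-1 characters and always terminates dest.
void copy_string(char *dest, char const *source, size_t max_len) {
    if (max_len == 0) return;  // no room even for the terminator
    size_t i;
    for (i=0; i<max_len-1 && source[i]; i++)
        dest[i] = source[i];
    dest[i] = '\0';
}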
Since you've tagged this as C++ (in addition to C): my advice would be to generally avoid this whole mess in C++ by just using std::string.
If you really have to work with NUL-terminated sequences in C++, you might consider another possibility:
template <size_t N>
void copy_string(char (&dest)[N], char const *source) {
    size_t i;
    for (i=0; i<N-1 && source[i]; i++)
        dest[i] = source[i];
    dest[i] = '\0';
}
This only works when the destination is an actual array (not a pointer), but for that case, it gets the compiler to deduce the size of the array, instead of requiring the user to pass it explicitly. This will generally make the code a tiny bit faster (less overhead in the function call) and much harder to screw up and pass the wrong size.
The argument against using strncpy is that it does not guarantee that your string will be null-terminated.
A less error-prone way to copy a string in C when using non-fixed length arrays is to use snprintf, which does guarantee null termination of your string.
A good Blog Post Commenting on *n* functions.
These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer.
This means that using strncpy and other such functions when not dealing with fixed arrays introduces an unnecessary risk of non-null-terminated strings, which can be time bombs in your code.
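A minimal sketch of the snprintf approach mentioned above. The wrapper name copy_truncating is hypothetical; the snprintf behavior it relies on (always terminating when the size is non-zero, and returning the length the full output would have had) is standard.

```cpp
#include <cstddef>
#include <cstdio>

// Copies src into dest (capacity dest_size) and always NUL-terminates
// when dest_size > 0. Returns true only if the whole source fit.
inline bool copy_truncating(char* dest, std::size_t dest_size, const char* src) {
    const int needed = std::snprintf(dest, dest_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dest_size;
}
```

Copying "lorem" into a 4-byte buffer leaves "lor" plus a terminator and returns false, so the caller can tell that truncation happened, which strncpy never reports.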
char * strncpy ( char * destination, const char * source, size_t num );
Limitations of strncpy():
It doesn't put a null-terminator on the destination string if it is completely filled. And, no null-character is implicitly appended at the end of destination if source is longer than num.
If num is greater than the length of source string, the destination string is padded with null characters up to num length.
Like strcpy, it is not a memory-safe operation. Because it does not check for sufficient space in destination before it copies source, it is a potential cause of buffer overruns.
Refer: Why should you use strncpy instead of strcpy?
There are two functions for copying one string to another:
1> strcpy
2> strncpy
Both can be used with fixed and non-fixed length arrays. strcpy doesn't check the upper bound of the destination when copying; strncpy stops after the count you pass. When that count is reached, strncpy does not report an error: it simply stops copying and leaves the destination without a null terminator. strcpy, by contrast, keeps writing past the end of the destination, which is undefined behavior and may corrupt the memory of the current process or crash it. So strncpy is safer than strcpy against overruns, provided the count really is the destination size and you terminate the result yourself.

why char* passed to FUNCTION always with the len of the string

I have been learning C/C++ recently, but I don't understand the difference between
int a(char* str, int len)
{
    cout << str << len;
    return 0;
}
and
int a(char* str)
{
    cout << str << strlen(str);
    return 0;
}
When you pass char* without a length, how would you know how many elements to process? char* means a pointer to a character. When you pass a pointer, you have no idea (and cannot find out) how much memory (if any) was allocated for the pointer.
That's why C-strings are null-terminated (they end with a '\0' character), so you can detect their length by iterating the pointer. Hence, if you want to use a pointer without giving the length of its allocated memory, you need to obey some conventions. But in general, e.g. when passing a buffer, you shouldn't expect any end-signalling character, so in this case you need to pass the length; otherwise you may end up reading or writing out of bounds.
For your particular example, you're fine with passing only a pointer provided you use your function only on C-strings, since strlen(str) uses this convention of counting until encountering a '\0'.
Buffer overflows are one of the most messy and nightmarish programming errors, and they can result in serious security issues. That's why you should try (whenever possible) to use std::string from the C++ standard library instead of C-style char* strings.
A C-string should always contain a terminating character, which we call the null character. It is the character with value 0 (not the digit '0', but character code 0).
When we create a char array and initialize it with some text, the '\0' is automatically added to the end.
char c[] = "Hello";
This will create an array of char with six elements. Yes, six elements.
c = {'H', 'e', 'l', 'l', 'o', '\0'}
When you print c, the output routine reads until it finds that '\0'. What if someone replaces it?
c[5] = '!';
Then the system can't determine the end of the text. It will keep on reading memory (which does not belong to that variable, or maybe even to the program) until it hits a null char.
That is the main reason to pass the size (or the number of chars to read) to a function.
On the other hand, if you need to read some data from a stream, you can use a buffer. In that case, you should specify how many bytes to read, in that way you will not cause buffer overflows.
The answers above are to the point, so I'm going to discuss another perspective on the practice of passing a length along with a char *.
As others said, the data pointed to by a char * does not always end with \0, and strlen() only works when it does. There are certain use cases, for example binary encodings, where data is represented as a string; in such cases the char * data would not end with \0. Besides, there can be use cases where you read or write only up to a certain length. In such cases it is always necessary to test whether the requested length is within the total length of the string. So, as a common convention, the length is passed explicitly and can be used in whatever way the function needs.
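To illustrate the binary-data case, here is a small sketch of a function that takes a pointer and an explicit length. The name count_byte is invented for the example.

```cpp
#include <cstddef>

// Counts how often `wanted` occurs in the first len bytes of data.
// Because the length is explicit, '\0' is treated like any other byte
// and nothing is read beyond data[len - 1].
inline std::size_t count_byte(const char* data, std::size_t len, char wanted) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == wanted) ++count;
    }
    return count;
}
```

For a 4-byte payload {'a', '\0', 'b', '\0'} it finds two zero bytes, whereas anything based on strlen would stop after the first 'a'.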

strncpy to already created char []

There is a class
class Cow{
char name[20];
char* hobby;
double weight;
public:
[..]
Cow & operator=(const Cow &c);
[..]
};
and I'm wondering how to write the definition of the operator= method.
I wrote a definition like this:
Cow & Cow::operator=(const Cow &c){
if(this==&c)
return *this;
delete [] hobby;
hobby=new char [strlen(c.hobby)+1];
weight=c.weight;
strncpy(name,c.name,20);
return *this;
}
But what if name[20] already holds something like "Philip Maciejowsky" and I strncpy "Adam" into it? After operator=(...), will name be equal to "Adamip Maciejowsky"?
How do I fix it if it overwrites like that?
Use strcpy() or add a null terminator after using strncpy(). strncpy() does not add the null terminator (\0) when the source is at least as long as the count, whereas strcpy() always does.
My advice: use std::string instead of c-styled null terminated string.
When in Rome, do as the Romans do!
From http://cplusplus.com
No null-character is implicitly appended at the end of destination if source is longer than num (thus, in this case, destination may not be a null terminated C string).
Since "Adam" is shorter than num (20), that case does not apply here: strncpy() copies "Adam" and pads the remaining bytes of the destination (the old "Philip Maciejowsky") with \0. So the output is "Adam", not:
Adamip Maciejowsky. That mixed result would only come from a copy that writes no terminator at all, such as memcpy(name, "Adam", 4). Using strcpy(), or doing memset(destination, 0, lengthOfDestination) and then calling strncpy(), will result in your output being Adam as well. There are multiple ways to do what you're trying to do.
First, if you're using C++ you shouldn't be using C-style strings and should instead be using the class std::string which makes everything easier in every way.
Assuming you're required to use char* strings, strncpy takes care of this. C-style strings are null-terminated, meaning that a string such as "test" takes up five bytes. The bytes are, in order, {'t', 'e', 's', 't', 0}. The zero (or null) byte serves as a marker that the end of the string has been reached.
From the manpage for strncpy on my system:
The following sets chararray to abc\0\0\0:
char chararray[6];
(void)strncpy(chararray, "abc", sizeof(chararray));
So this means that the string will contain "adam\0\0\0\0\0\0\0[etc.]" where \0 represents the null byte. String functions will stop processing when they read the first null (because, remember, with C-style strings, there's no way to know the length of the string without scanning through it looking for \0).
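The padding described in the manpage excerpt can be checked directly against the 20-byte name field from the question. This is only a sketch, and nul_bytes_after_copy is a made-up helper name.

```cpp
#include <cstddef>
#include <cstring>

// Overwrites a 20-byte name field with strncpy(name, new_name, 20), the
// same call the operator= in the question uses, and reports how many of
// the 20 bytes are '\0' afterwards.
inline std::size_t nul_bytes_after_copy(char (&name)[20], const char* new_name) {
    std::strncpy(name, new_name, sizeof name);
    std::size_t zeros = 0;
    for (char c : name) {
        if (c == '\0') ++zeros;
    }
    return zeros;
}
```

Starting from "Philip Maciejowsky" and copying "Adam" leaves 4 letters followed by 16 zero bytes, so none of the old name survives.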

Is there a safe version of strlen?

std::strlen doesn't handle c strings that are not \0 terminated. Is there a safe version of it?
PS I know that in c++ std::string should be used instead of c strings, but in this case my string is stored in a shared memory.
EDIT
Ok, I need to add some explanation.
My application is getting a string from a shared memory (which is of some length), therefore it could be represented as an array of characters. If there is a bug in the library writing this string, then the string would not be zero terminated, and the strlen could fail.
You've added that the string is in shared memory. That's guaranteed readable, and of fixed size. You can therefore use size_t MaxPossibleSize = startOfSharedMemory + sizeOfSharedMemory - input; strnlen(input, MaxPossibleSize) (mind the extra n in strnlen).
This will return MaxPossibleSize if there's no \0 in the shared memory following input, or the string length if there is. (The maximal possible string length is of course MaxPossibleSize-1, in case the last byte of shared memory is the first \0)
C strings that are not null-terminated are not C strings, they are simply arrays of characters, and there is no way of finding their length.
If you define a c-string as
const char* cowSays = "moo";
then you automagically get the '\0' at the end and strlen would return 3. If you define it like:
char iDoThis[1024] = {0};
you get an empty buffer (an array of characters, all of which are null characters). You can then fill it with what you like as long as you don't overrun the buffer length. At the start strlen would return 0, and once you have written something you would also get the correct number from strlen.
You could also do this:
char uhoh[100];
int len = strlen(uhoh);
but that would be bad, because you have no idea what is in that array. It might hit a null character, or it might not. The point is that the null character is the defined standard way to declare that the string is finished.
Not having a null character means by definition that the string is not finished. Changing that will break the paradigm of how the string works. What you want to do is make up your own rules. C++ will let you do that, but you will have to write a lot of code yourself.
EDIT
From your newly added info, what you want to do is loop over the array and check for the null character by hand. You should also do some validation if you are expecting ASCII characters only (especially if you are expecting alpha-numeric characters). This assumes that you know the maximum size.
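A hand-written loop along those lines might look like the following sketch. The name checked_length and the choice of "printable ASCII" as the validation rule are assumptions made for the example, not something the question specifies.

```cpp
#include <cstddef>

// Scans at most max_len bytes. Returns the string length if a '\0' is
// found and every byte before it is printable ASCII (0x20..0x7E).
// Returns max_len as a "not valid" marker otherwise.
inline std::size_t checked_length(const char* buf, std::size_t max_len) {
    for (std::size_t i = 0; i < max_len; ++i) {
        const unsigned char c = static_cast<unsigned char>(buf[i]);
        if (c == '\0') return i;                   // terminator found
        if (c < 0x20 || c > 0x7E) return max_len;  // unexpected byte
    }
    return max_len;                                // no terminator in range
}
```

Since a valid result is always strictly less than max_len, a return value equal to max_len tells the caller that the data was either unterminated or contained a byte outside the expected range.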
If you do not need to validate the content of the string then you could use one of the strnlen family of functions:
http://msdn.microsoft.com/en-us/library/z50ty2zh%28v=vs.80%29.aspx
http://linux.about.com/library/cmd/blcmdl3_strnlen.htm
size_t safe_strlen(const char *str, size_t max_len)
{
const char * end = (const char *)memchr(str, '\0', max_len);
if (end == NULL)
return max_len;
else
return end - str;
}
Yes, since C11:
size_t strnlen_s( const char *str, size_t strsz );
Located in <string.h>
Get a better library, or verify the one you have. If you can't trust your library to do what it says it will, then how the h%^&l do you expect your program to?
That said, assuming you know the length of the buffer the string resides in, what about
buffer[-1+sizeof(buffer)]=0 ;
x = strlen(buffer) ;
make buffer bigger than needed and you can then test the lib.
assert(x<-1+sizeof(buffer));
C11 includes "safe" functions such as strnlen_s. strnlen_s takes an extra maximum length argument (a size_t). This argument is returned if a null character isn't found after checking that many characters. If a null pointer is provided, it returns 0.
size_t strnlen_s(const char *, size_t);
While part of C11, it is recommended that you check that your compiler supports these bounds-checking "safe" functions via its definition of __STDC_LIB_EXT1__. Furthermore, a user must also set another macro, __STDC_WANT_LIB_EXT1__, to 1, before including string.h, if they intend to use such functions. See here for some Stack Overflow commentary on the origins of these functions, and here for C++ documentation.
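As a sketch of how the two macros fit together, the wrapper below (bounded_length is a hypothetical name) uses strnlen_s when the library advertises it and otherwise falls back to memchr. Treat the use of these C macros from C++ code as an assumption about your toolchain; many common libraries do not provide the bounds-checking functions at all, in which case the fallback branch is what compiles.

```cpp
#define __STDC_WANT_LIB_EXT1__ 1  // must come before <string.h>
#include <string.h>
#include <stddef.h>

// Returns the length of str, looking at no more than max_len bytes.
// Returns max_len if no '\0' is found, and 0 for a null pointer.
inline size_t bounded_length(const char* str, size_t max_len) {
#ifdef __STDC_LIB_EXT1__
    return strnlen_s(str, max_len);
#else
    if (str == NULL) return 0;
    const void* end = memchr(str, '\0', max_len);
    return end ? static_cast<size_t>(static_cast<const char*>(end) - str)
               : max_len;
#endif
}
```

Both branches give the same results for the same inputs, so callers don't need to know which one was compiled.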
GCC and Clang also support the POSIX function strnlen, and provide it within string.h. Microsoft also provides strnlen, which can likewise be found within string.h.
You will need to encode your string. For example:
struct string
{
size_t len;
char *data;
} __attribute__((packed));
You can then accept any array of characters if you know the first sizeof(size_t) bytes of the shared memory location is the size of the char array. It gets tricky when you want to chain arrays this way.
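One way to read such a length-prefixed record safely is sketched below. It assumes the layout described above (sizeof(size_t) bytes of length followed directly by the characters) and that the size of the region is known; read_prefixed is a made-up name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Reads a length-prefixed record: sizeof(std::size_t) bytes of length,
// followed by that many characters. Returns false if the stored length
// cannot fit inside the region, so a corrupt header is never trusted.
inline bool read_prefixed(const char* region, std::size_t region_size,
                          std::string& out) {
    std::size_t len = 0;
    if (region_size < sizeof len) return false;
    std::memcpy(&len, region, sizeof len);  // avoids unaligned access
    if (len > region_size - sizeof len) return false;
    out.assign(region + sizeof len, len);
    return true;
}
```

The check against region_size matters for the same reason as the missing terminator in the question: a buggy writer could store a length that points past the end of the segment.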
It's better to trust your other end to terminate its strings, or to roll your own strlen that does not go outside the boundaries of the shared memory segment (provided you know at least the size of that segment).
If you need to get the size of shared memory, try to use
// get memory size
struct shmid_ds shm_info;
size_t shm_size;
int shm_rc;
if((shm_rc = shmctl(shmid, IPC_STAT, &shm_info)) < 0)
exit(101);
shm_size = shm_info.shm_segsz;
If you are sure that the data is null-terminated, shm_size - 1 is an upper bound on the string length. Otherwise you can null-terminate it with data[shm_size - 1] = '\0'; and then use strlen(data);
a simple solution:
buff[BUFF_SIZE - 1] = '\0';
Of course this will not tell you whether the string originally was exactly BUFF_SIZE-1 long or was just not terminated, so you need extra logic for that.
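The extra logic could be as small as checking for a terminator before overwriting the last byte. A sketch, with terminate_and_check as an invented name:

```cpp
#include <cstddef>
#include <cstring>

// Forces termination of a fixed-size buffer and reports whether a
// terminator was already present somewhere in it beforehand.
inline bool terminate_and_check(char* buff, std::size_t buff_size) {
    if (buff_size == 0) return false;
    const bool was_terminated =
        std::memchr(buff, '\0', buff_size) != nullptr;
    buff[buff_size - 1] = '\0';
    return was_terminated;
}
```

A false return means the last byte held data that has just been overwritten, so the caller knows the writer did not terminate the string.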
How about this portable nugget:
int safeStrlen(char *buf, int max)
{
int i;
for(i=0; i<max && buf[i]; i++){}
return i;
}
As Neil Butterworth already said in his answer above: C-Strings which are not terminated by a \0 character, are no C-Strings!
The only option you have is to write an immutable adaptor, or something similar, that creates a valid copy of the C-string with a terminating \0 character. Of course, if the input is wrong and a C-string is defined like:
char cstring[3] = {'1','2','3'};
then using it will indeed result in unexpected behavior, because memory may now contain something like 123#4x\0. The result of strlen(), for example, is then 6 and not 3 as expected.
The following approach shows how to create a safe C-String in any case:
char *createSafeCString(char cStringToCheck[]) {
    // Cast size_t to integer (strlen itself still needs a \0 somewhere after the data)
    int size = static_cast<int>(strlen(cStringToCheck));
    // Allocate the new array on the heap so it outlives this function; the caller must delete[] it
    char *pszCString = new char[size + 1];
    // Copy the data from the input char array to the new one
    strncpy(pszCString, cStringToCheck, size);
    // Set the last character to the \0 termination character
    pszCString[size] = '\0';
    return pszCString;
}
This ensures that manipulating the returned C-string does not write into memory that belongs to something else.
I know this is not what you wanted, but there is no other way to obtain the length of a char array without termination. This isn't even a real approach. It just ensures that things keep working even if the user (or dev) feeds in *****.