C++: Does strcat() overwrite or move the null?

Now let's look at this small program:
char s[20]="One";
strcat(s,"Two");
cout<<s<<endl;
Here at first s has the value "One" and for visual representation this is the value of s:
O - n - e - \0
Then I add "Two" to the end of the string producing this:
O - n - e - T - w - o - \0
Now, as you can see, the only null in the string was at first after "One"; now it is after "OneTwo".
My question is:
Is the null overwritten by the string "Two", which then adds its own null at the end?
Or is the null that was already there in the beginning moved back to be at the end again?
(This question might seem not to make a difference but I like to know about everything I learn)
Thank you

The first \0 is overwritten, and a new \0 is added at the end of the concatenated string. There is no scope for "moving" anything here. These are locations to which values get assigned.
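To see the assignment happen, here is a minimal sketch that pre-fills the buffer with a non-zero filler so every \0 that gets written is visible. The names visible_bytes, Snapshot and strcat_snapshot are made up for illustration, and the '#' filler is an assumption that is not part of the original program.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: renders n bytes of buf, showing each zero byte as '.'.
std::string visible_bytes(const char* buf, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < n; ++i)
        out += (buf[i] == '\0') ? '.' : buf[i];
    return out;
}

struct Snapshot {
    std::string before;  // first 8 bytes before strcat
    std::string after;   // first 8 bytes after strcat
};

Snapshot strcat_snapshot() {
    char s[20];
    std::memset(s, '#', sizeof s);   // non-zero filler so every written '\0' shows up
    std::strcpy(s, "One");           // s: O n e \0 # # ...
    assert(s[3] == '\0');            // the terminator sits right after "One"

    Snapshot snap;
    snap.before = visible_bytes(s, 8);
    std::strcat(s, "Two");           // 'T' is stored in s[3]; a '\0' is stored in s[6]
    snap.after = visible_bytes(s, 8);
    return snap;
}
```

With this setup, before reads One.#### and after reads OneTwo.#, so the byte that held the old terminator now holds 'T' and a terminator has been written three positions later.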

Although the question has been answered correctly and repeatedly, it may be nice to get a most officialest answer from the source™. Or at least from the sources I can find with Google.
This document, which claims to be the C++ Standard (or a working draft thereof), says:
The C++ Standard library provides 209 standard functions from the C
library [including strcat].
--"Standard C library", C.2.7, pg 811
Jumping over to this document claiming to be the C International Standard, we see:
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1. If copying takes place between objects
that overlap, the behavior is undefined.
--"The strcat function", 7.21.3.1, pg 327
strcat does indeed overwrite the null character.

Yes, the \0 at the end of the first argument to strcat gets overwritten, and a \0 becomes the last character of the concatenated string.
It doesn't move as such; the function just writes a \0 as the last character of the concatenated string.

The only way to know for sure is to look at the source of your particular version of strcat. The typical implementation will overwrite the first null and copy the null from the second string into the last position.
This is really nit-picking though, you won't be able to detect the difference in the output no matter which method is used.

Related

Why doesn't strlen() count the byte of the terminating NUL-character, when the NUL-character is defined to be part of a string?

I know that strlen() does not count the terminating NUL character. I know that this is a fact. Thus, this question is NOT asking why strlen() might "presumably" not return the right string length, which has already been asked and answered well here on StackOverflow, e.g. in this thread, or this one.
So let's go ahead to my question:
In ISO/IEC 9899:1990 (E), 7.1.1, it is stated:
A string is a contiguous sequence of characters terminated by and including the first null character.
What is the reason why strlen() deviates from this definition and does not "want" to count a string's terminating NUL character?
Why?
Because you would expect this pseudocode's assertion to hold true:
str1 = "foo"
str2 = "bar"
str3 = concatenate(str1, str2)
Assert strlen(str1) + strlen(str2) == strlen(str3)
If the terminating '\0' were counted by strlen, the above assertion would not hold, which would be much more of an overall headache than the current C string behavior. More importantly, it would in my opinion be quite unintuitive and illogical.
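The pseudocode can be turned into a small C++ check. This is only a sketch: lengths_add_up is a made-up name, and the 64-byte scratch buffer is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when strlen is additive under concatenation
// for the two given strings.
bool lengths_add_up(const char* a, const char* b) {
    char joined[64];  // arbitrary scratch size chosen for this sketch
    // Both texts plus one terminator must fit into the scratch buffer.
    assert(std::strlen(a) + std::strlen(b) < sizeof joined);
    std::strcpy(joined, a);   // copies a including its '\0'
    std::strcat(joined, b);   // overwrites that '\0' and appends b with its own '\0'
    return std::strlen(a) + std::strlen(b) == std::strlen(joined);
}
```

Calling lengths_add_up("foo", "bar") returns true precisely because neither strlen call counts a terminator. If each one did, the left-hand side would come out one larger than the right-hand side.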
Taking your doubt as a reasonable point, we can state that a C string consists of two parts:
1. the string's useful content ("the text");
2. the null terminating character.
The null terminating character is purely a technical measure that lets the C library functions determine the end of the string. Still, if one types a declaration:
char * str = "some string";
one would logically expect its length to be 11, which is as many characters as can be seen in this statement. Hence strlen() yields only the length of part 1 of the string.
Not really an answer to your question, but consider this example:
char string[] = "string";
printf("sizeof: %zu\n", sizeof(string));
printf("strlen: %zu\n", strlen(string));
This prints
sizeof: 7
strlen: 6
So sizeof counts the \0, but strlen doesn't.
Questions like this, that ask why a certain age-old decision was made one way and not another way, are hard to answer. I can say that it's perfectly obvious to me, anyway, that strlen should count just the real, "interesting" characters that are in the string, and ignore the \0 at the end that merely terminates it. I'm used to accounting for the \0 separately. I imagine it would have been considerably more of a nuisance overall if strlen had been defined the other way. But I can't prove this with convincing arguments, and I've been using strlen with its current definition for so long that I'm probably hopelessly biased; I might be saying "it's perfectly obvious to me that..." even if strlen's definition were quite wrong.
There is a difference between the physical, stored representation of a C style string and the logical representation of a C style string.
The physical representation, how the string is actually stored in memory or other media, includes the null character. The null character is included when discussing the physical representation because it takes up an additional piece of storage. In order to be a C style string, the null character must be stored.
However, the logical representation of a string does not include the null character. The logical representation of a string includes only the text characters that the programmer wants to manipulate.
I suspect that the null character, a value of binary zero, was chosen because the original ASCII character set defined a character value of zero as the NUL character. As one of the lowest values among the various teletype control codes, it seems to be the ASCII character least likely to appear in text. See ASCII Character Codes.
Another nice quality of using a binary zero as the string terminator is that it is the value that represents logical false, so iterating over a string is often a matter of incrementing an array index or a pointer while the current character is logically true, since all characters other than the end-of-string indicator have a non-zero, or logically true, value.
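As a small illustration of that idiom, here is a sketch of a length counter whose loop condition is simply the current character. The name count_chars is made up. It behaves like strlen for valid C strings, but it is not how any particular library implements strlen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts characters up to, but not including, the '\0'.
std::size_t count_chars(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (*s++)   // a non-zero character is "true"; the '\0' is "false" and ends the loop
        ++n;
    return n;
}
```

For example, count_chars("string") yields 6, and count_chars("") yields 0 because the very first character tested is already the terminator.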
Due to how close to the hardware that the C programming language is, the programmer needs to be concerned about both representations, the physical representation when allocating memory to store a string which includes the null character and the logical representation which is the string without the null character.
The various C style string manipulation functions in the Standard Library (strlen(), strcpy(), etc.) are all designed around the logical representation of a C style string. They perform their actions by using the null character as not being part of the text but rather as a special indicator character which indicates the end of the string. However as a part of their operations they need to be aware of the null character and its use as a special symbol. For instance when strcpy() or strcat() are used to copy strings, they must also copy the null character that indicates the end of the string even though it is not part of the actual text of the logical representation.
This choice allows text strings to be stored as arrays of characters, as befits the hardware orientation and efficiency characteristics of C. There is no need to create an additional built in type for text strings and it fits well with the lean character of the C programming language.
C++ is able to provide the std::string because of being object oriented and having the additional facilities of the language that allows for objects to be created and managed. The C programming language, due to its simple syntax and lack of object oriented facilities does not have this convenience.
The problem with this approach is that the programmer needs to be aware of both the physical representation and the logical representation of text strings and be able to accommodate the needs of both when writing programs.
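A typical place where both representations meet is allocation. The following is a sketch, with the made-up name duplicate, of copying a C string into freshly allocated storage: the logical length comes from strlen, while the physical size is one byte larger.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical helper: returns an owned copy of a C string.
std::unique_ptr<char[]> duplicate(const char* text) {
    assert(text != nullptr);
    const std::size_t logical = std::strlen(text);   // text characters only
    const std::size_t physical = logical + 1;        // plus the terminating '\0'
    std::unique_ptr<char[]> copy(new char[physical]);
    std::memcpy(copy.get(), text, physical);         // copies the '\0' as well
    return copy;
}
```

Allocating only strlen(text) bytes would leave no room for the terminator, which is the classic off-by-one mistake with C strings.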

Please explain the need for appending '\0' when converting a string to char

While using proc/mysql for C++, I have taken a string as user input and converted it to char via strcpy(c,s.c_str());, where c is the binding variable through which I'll add the value to the database table and s is the string (user input). It is working fine, but my teacher is asking me to append '\0' at the end. I can't understand why I need to.
Your teacher is deluded.
c_str() in itself appends a zero [or rather, std::string reserves space for an extra character when creating the string, and makes sure this is zero at least at the point of c_str() returning - in C++11, it is guaranteed that there is an extra character space filled with zero at the end of the string, always].
You DO need a zero at the end of a string to mark the end of the string in a C-style string, such as those used by strcpy.
[As others have pointed out, you should also check that the string fits before copying, and I would suggest rejecting it if it won't fit, as truncating it will lead to other problems - as well as checking that there aren't any SQL injection attacks and a multitude of other things required for "good practice in an SQL environment"]
While the teacher is deluded on appending '\0' to the string, your code exhibits another very bad bug.
You should never use strcpy in such a fashion. You should always use some routine which controls the number of characters copied, like strncpy(), or other alternatives, and provide it with the size of the receiving variable. Otherwise you are just asking for trouble.
Just guessing, it's a protection against buffer overflow. If c is only N bytes long and s.c_str() returns a pointer to a string of length N+k, you'd write k bytes after c, which is bad.
Now let's say (if you didn't SEGFAULT already) you pass this c NUL-terminated string to a C function: you have no guarantee that the \0 you wrote after c is still there. This C function will then read an undefined number of bytes after c, which is worse.
Anyway, use ::strncpy():
char c[64];
::strncpy(c, s.c_str(), sizeof(c));
c[sizeof(c)-1] = '\0';
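If you would rather reject an oversized input than silently truncate it, as suggested earlier in this thread, a sketch along these lines may help. The name copy_if_fits is made up, and what the caller does on failure is left open.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: copies s into dest only if the text and its '\0' both fit.
// Returns false and leaves dest untouched otherwise.
bool copy_if_fits(char* dest, std::size_t capacity, const std::string& s) {
    assert(dest != nullptr);
    if (s.size() + 1 > capacity)
        return false;                            // reject instead of truncating
    std::memcpy(dest, s.c_str(), s.size() + 1);  // c_str() already ends in '\0'
    return true;
}
```

With char c[64];, the call copy_if_fits(c, sizeof c, s) succeeds for any s of up to 63 characters, and no extra '\0' has to be appended by hand because the terminator supplied by c_str() is copied along with the text.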

how to make a not null-terminated c string?

I am wondering: given char *cs = .....; what will happen to strlen() and printf("%s",cs) if cs points to a memory block which is huge but has no '\0' in it?
I wrote these lines:
char s2[3] = {'a','a','a'};
printf("str is %s,length is %zu",s2,strlen(s2));
I get the result "aaa", "3", but I think this result is only because a '\0' (or a 0 byte) happens to reside at the location s2+3.
How do I make a C string that is not null-terminated? strlen and the other C string functions rely heavily on the '\0' byte. What if there is no '\0'? I just want to know this rule deeper and better.
PS: my curiosity was aroused by studying the following post on SO.
How to convert a const char * to std::string
and these word in that post :
"This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."
If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.
You can still print a non-terminated character array with printf, as long as you give the length:
printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);
or, if you have access to the array itself, not a pointer:
printf("str is %.*s", (int)(sizeof s2), s2);
You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.
A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.
So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.
Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
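The std::string route can be sketched as follows. The wrapper name from_unterminated is made up; the constructor it calls is the standard one taking a pointer and a count.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a std::string from exactly `length` characters.
// No terminator is required in the source, because none is searched for.
std::string from_unterminated(const char* data, std::size_t length) {
    assert(data != nullptr);
    return std::string(data, length);
}
```

For the array from the question, from_unterminated(s2, sizeof s2) gives a std::string holding "aaa" with size() == 3, without ever reading s2[3].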
Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.
You cannot depend on this. C strings need NUL characters (zeroes) at the end to work correctly. C string handling is messy, and error-prone; there are libraries and APIs that help make it less so… but it's still easy to screw up. :)
In this particular case, your string could be initialized as one of these:
A: char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
B: const char *s2 = "aaa"; // if you don't need to modify the string after creation
C: char s2[]="aaa"; // if you DO need to modify the string afterwards
Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string declaration in a way that alters the length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and keeping the explicit null terminator at the end.
What happens is that strlen keeps going, reading memory values until it eventually gets to a null. It then assumes that is the terminator and returns a length that could be massively large. If you're using strlen in an environment that expects C strings to be used, you could then copy this huge buffer of data into another one that is just not big enough, causing buffer overrun problems, or at best, you could copy a large amount of garbage data into your buffer.
Copying a non-null-terminated C string into a std::string will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std::string that contains the first 3 good characters and then a load of wastage. That's inefficient.
The moral is, if you're using the CRT functions to operate on C strings, they must be null-terminated. It's no different from any other API: you must follow the rules that the API sets down for correct usage.
Of course, there is no reason you cannot use the CRT functions if you always use the specific-length versions (eg strncpy) but you will have to limit yourself to just those, always, and manually keep track of the correct lengths.
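One way to stay within known bounds, sketched here with the made-up name bounded_length, is to search for the terminator with memchr, which takes an explicit byte count and never looks past it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: length of the text in buf, reading at most `capacity` bytes.
// Returns capacity itself when no '\0' is found inside the buffer.
std::size_t bounded_length(const char* buf, std::size_t capacity) {
    assert(buf != nullptr);
    const void* hit = std::memchr(buf, '\0', capacity);
    if (hit == nullptr)
        return capacity;  // unterminated: report the whole buffer
    return static_cast<std::size_t>(static_cast<const char*>(hit) - buf);
}
```

For an unterminated three-byte array this returns 3 without touching the byte after it, whereas strlen would keep reading.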
Convention states that a char array with a terminating \0 is a null terminated string. This means that all str*() functions expect to find a null-terminator at the end of the char-array. But that's it, it's convention only.
By convention also strings should contain printable characters.
If you create an array like you did char arr[3] = {'a', 'a', 'a'}; you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.
The C standard does not define the term string until the section 7 - Library functions. The definition in C11 7.1.1p1 reads:
A string is a contiguous sequence of characters terminated by and including the first null character.
(emphasis mine)
If the definition of string is a sequence of characters terminated by a null character, a sequence of non-null characters not terminated by a null is not a string, period.
What you have done is undefined behavior.
strlen and printf read past the end of the array, into a memory location that is not yours.
Change it to
char s2[] = {'a','a','a','\0'};

C++: strcpy Function copies null?

While using string manipulation functions, specifically strcpy, I wrote this small program.
char s1[8]="Hellopo";
char s2[4]="sup";
strcpy(s1,s2);
cout<<s1<<endl;
When I printed out s1, it actually just printed out "sup". I expected it to print "suplopo".
Then I did this:
cout<<s1+4 << endl;
It printed out "opo".
And the output of cout<<s1+3<<endl; was nothing.
So, after thinking a bit about it, I came to this conclusion: since C++ stops outputting the string once it reaches the null terminator, the null must have been copied by the strcpy function, resulting in this string:
s - u - p - \0 - o - p - o - \0
Please tell me if this is correct or not. And if I'm not, please correct me.
And if you have any more info to provide please do.
Your reasoning is correct, and would have easily been confirmed by any decent manual:
The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest.
Your reasoning regarding the copying of the terminating character is correct. The C++ standard (which is the definitive specification for the language) defers to C on this matter (for example, C++14 defers to C99, and C++17 defers to C11).
The C11 standard has this to say about strcpy:
7.24.2.3 The strcpy function
Synopsis:
#include <string.h>
char *strcpy(char * restrict s1, const char * restrict s2);
Description:
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
Returns:
The strcpy function returns the value of s1.
If you just wanted to replace the first three characters of your string, you can use memcpy() to copy a specific number of bytes:
memcpy(s1, s2, strlen(s2));
Keep in mind that this will just copy those bytes and nothing more. If s1 isn't already a string of at least the length of s2, it's unlikely to end well :-)
And just keep one thing in mind re your comment "... resulting in this string: sup\0opo\0".
That is not a string. A string in C (and a legacy string in C++) is defined as a series of characters up to and including the first \0 terminator.
You may well have a series of characters up to the original (now second) \0 but the string is actually shorter than that. This may seem a little pedantic but it's important to understand the definitions.
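To make the array contents visible, here is a sketch that dumps all 8 bytes of s1 after each kind of copy, with zero bytes drawn as '.'. The helper names show, after_strcpy and after_memcpy are made up; the arrays are the ones from the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: renders all 8 bytes of the array, zero bytes shown as '.'.
std::string show(const char (&a)[8]) {
    std::string out;
    for (char c : a)
        out += (c == '\0') ? '.' : c;
    return out;
}

std::string after_strcpy() {
    char s1[8] = "Hellopo";
    const char s2[4] = "sup";
    std::strcpy(s1, s2);                         // copies 's','u','p','\0'
    return show(s1);
}

std::string after_memcpy() {
    char s1[8] = "Hellopo";
    const char s2[4] = "sup";
    assert(std::strlen(s2) <= std::strlen(s1));  // stay inside the existing string
    std::memcpy(s1, s2, std::strlen(s2));        // copies 's','u','p' only
    return show(s1);
}
```

after_strcpy() gives sup.opo. (the array still holds the old tail, but the string ends at the first '.'), while after_memcpy() gives suplopo. because no terminator was copied.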
You are correct. For the effect you initially expected, you would use strncpy. strncpy copies the null terminator only if the count you pass is larger than the length of the source string; with a count of strlen(s2), only the three characters are copied and the rest of s1 is left untouched.
This is correct.
http://pubs.opengroup.org/onlinepubs/009695399/functions/strcpy.html
The strcpy() function shall copy the string pointed to by s2
(including the terminating null byte) into the array pointed to by s1.
Yes, this is correct. strcpy will include the null terminator. It's important as if you copy the string to a new memory block you want it null terminated by default. I believe strncpy might be what you're after in this case.
Also I know it was only testing code but I'd be careful in this day and age of using +X offsets in strings... ascii normally bites people in the rear end in the utf8/unicode world we now live in.
From the man page for strcpy:
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer pointed to
by dest. The strings may not overlap, and the destination string
dest must be large enough to receive the copy.
Yes, you are correct in your reasoning, and if you had explicitly cast the 4th character, s1[3], to an integer like this:
cout<<(int)s1[3];
you would have gotten "0" as the output, which is the ASCII value of the null character (NUL).

Why are strings in C++ usually terminated with '\0'?

In many code samples, people usually use '\0' after creating a new char array like this:
string s = "JustAString";
char* array = new char[s.size() + 1];
strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';
Why should we use '\0' here?
The title of your question references C strings. C++ std::string objects are handled differently than standard C strings. \0 is important when using C strings, and when I use the term string in this answer, I'm referring to standard C strings.
\0 acts as a string terminator in C. It is known as the null character, or NUL, and standard C strings are null-terminated. This terminator signals code that processes strings - standard libraries but also your own code - where the end of a string is. A good example is strlen which returns the length of a string: strlen works using the assumption that it operates on strings that are terminated using \0.
When you declare a constant string with:
const char *str = "JustAString";
then the \0 is appended automatically for you. In other cases, where you'll be managing a non-constant string as with your array example, you'll sometimes need to deal with it yourself. The docs for strncpy, which is used in your example, are a good illustration: strncpy copies over the null terminator character except in the case where the specified length is reached before the entire string is copied. Hence you'll often see strncpy combined with the possibly redundant assignment of a null terminator. strlcpy and strcpy_s were designed to address the potential problems that arise from neglecting to handle this case.
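In your particular example, array[s.size()] = '\0'; is not one of those redundant cases: strncpy is told to copy only s.size() characters, so the specified length is reached before the terminator of s.c_str() is copied, and the last byte of array is never written by the function. The explicit assignment is what terminates the string.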
The documentation for standard C string utilities will indicate when you'll need to be careful to include such a null terminator. But read the documentation carefully: as with strncpy the details are easily overlooked, leading to potential buffer overflows.
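Whether strncpy writes a terminator depends only on the count it is given, not on how large the destination really is. The sketch below, with the made-up name strncpy_terminates, fills a scratch buffer with '#' and reports whether the byte just after the text ended up as '\0'.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if strncpy(dest, s, count) stored a '\0' at dest[s.size()].
bool strncpy_terminates(const std::string& s, std::size_t count) {
    assert(count <= s.size() + 1);        // never exceed the scratch buffer
    std::string buf(s.size() + 1, '#');   // '#' filler: an unwritten byte stays '#'
    std::strncpy(&buf[0], s.c_str(), count);
    return buf[s.size()] == '\0';
}
```

For s = "JustAString", a count of s.size() leaves the last byte untouched, so the function returns false, while a count of s.size() + 1 copies the terminator and the function returns true.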
Why are strings in C++ usually terminated with '\0'?
Note that C++ strings and C strings are not the same.
In C++, string refers to std::string, which is an instantiation of the class template std::basic_string and provides a lot of intuitive functions to handle the string.
Note that a C++ std::string does not rely on \0 to mark its end, since it stores its length, but the class provides functions to fetch the underlying string data as a \0-terminated C-style string.
In C, a string is a sequence of characters that ends with a \0.
Unless a special character like \0 is used, there would be no way of knowing where a string ends.
It is also aptly known as the string null terminator.
Of course, there could be other ways of bookkeeping to track the length of the string, but using a special character has two straightforward advantages:
It is more intuitive and
There are no additional overheads
Note that \0 is needed because most of Standard C library functions operate on strings assuming they are \0 terminated.
For example:
While using printf(), if you have a string which is not \0-terminated, then printf() keeps writing characters to stdout until a \0 is encountered; in short, it might even print garbage.
Why should we use '\0' here?
There are two scenarios in which you do not need to \0-terminate a string yourself:
In any usage where you are explicitly bookkeeping the length of the string, and
If you are using a standard library API that will implicitly add a \0 to the string.
In your case the second scenario does not quite apply, because of the count passed to strncpy().
array[s.size()] = '\0';
The above statement is therefore needed in your example.
strncpy() copies at most s.size() characters to your array here. Note that it writes a null terminator only if the source string ends before that count is reached. Since the source is exactly s.size() characters long, no \0 is copied, and the explicit assignment is what terminates the array. Passing s.size() + 1 as the count would let strncpy() copy the \0 as well.
'\0' is the null termination character. If your character array didn't have it and you tried to do a strcpy you would have a buffer overflow. Many functions rely on it to know when they need to stop reading or writing memory.
strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';
Why should we use '\0' here?
You shouldn't have to: that second line becomes a waste of space if strncpy is given room to copy the terminator itself. strncpy already adds a null terminator if you know how to use it. The code can be rewritten as:
strncpy(array, s.c_str(), s.size()+1);
strncpy is sort of a weird function, it assumes that the first parameter is an array of the size of the third parameter. So it only copies null termination if there is any space left after copying the strings.
You could also have used memcpy() in this case, it will be slightly more efficient, though perhaps makes the code less intuitive to read.
In C, we represent a string with an array of char (or wchar_t), and use a special character to signal the end of the string. As opposed to Pascal, which stores the length of the string at index 0 of the array (thus the string has a hard limit on the number of characters), there is theoretically no limit on the number of characters that a string (represented as an array of characters) can have in C.
The special character is expected to be NUL in all the functions from the standard library in C, and also in other libraries. If you want to use the library functions that rely on the exact length of the string, you must terminate the string with NUL. You can certainly define your own terminating character, but you must understand that library functions involving strings (as arrays of characters) may not work as you expect, and that will cause all sorts of errors.
In the snippet of code given, there is a need to explicitly set the terminating character to NUL, since you don't know whether there is trash data in the allocated array. It is also good practice, since in a large code base you may not see the initialization of the array of characters.