printf("... %c ...",'\0') and family - what will happen? - c++

How will various functions that take printf format string behave upon encountering the %c format given value of \0/NULL?
How should they behave? Is it safe? Is it defined? Is it compiler-specific?
e.g. sprintf() - will it crop the result string at the NULL? What length will it return?
Will printf() output the whole format string or just up to the new NULL?
Will va_args + vsprintf/vprintf be affected somehow? If so, how?
Do I risk memory leaks or other problems if I e.g. shoot this NULL at a point in std::string.c_str()?
What are the best ways to avoid this caveat (sanitize input?)

Any function that takes a standard C string will stop at the first null, no matter how it got there.
When you use the %c in a format and use 0 for the character value, it will insert a null into the output. printf will output the null as part of the output. sprintf will also insert the null into the result string, but the string will appear to end at that point when you pass the output to another function.
A std::string will happily contain a null anywhere within its contents, but when you use the c_str method to pass it to a function that takes a C string, the point above applies: that function will stop at the first null.

What happens when you output a NUL depends on the output device.
It is a non-printing character, i.e. isprint('\0') == 0; so when output to a display device, it has no visible effect. If redirected to a file however (or if calling fprintf()), it will insert a NUL (zero byte) into the file; the meaning of that will depend on how the file is used.
When output to a C string, it will be interpreted as a string terminator by standard string handling functions, although any other subsequent format specifiers will still result in data being placed in the buffer after the NUL, which will be invisible to standard string handling functions. This may still be useful if ultimately the array is not to be interpreted as a C string.
Do I risk memory leaks or other problems if I e.g. shoot this NULL at a point in std::string.c_str()?
It is entirely unclear what you mean by that, but if you are suggesting using the pointer returned by std::string.c_str() as the buffer for sprintf(); don't! c_str() returns a const char*, modifying the string through such a pointer is undefined. That however is a different problem, and not at all related to inserting a NUL into a string.
What are the best ways to avoid this caveat (sanitize input?)
I am struggling to think of a circumstance where you could "accidentally" write such code, so why would you need to guard against it!? Do you have a particular circumstance in mind? Even though I find it implausible, and probably unnecessary, what is so hard about:
if( c != 0 )
{
printf( "%c", c ) ;
}
or perhaps more usefully (since there are other characters you might want to avoid in the output)
if( isgraph(c) || isspace(c) )
{
printf( "%c", c ) ;
}
which will output only visible characters and whitespace (space, '\t','\f','\v','\n','\r').
Note that you might also consider isprint() rather than isgraph(c) || isspace(c), but that excludes '\t','\f','\v','\n' and '\r'
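As a rough sketch of the same filter applied to a whole buffer rather than a single character: the helper name keep_visible is made up for illustration, and the cast to unsigned char is there because passing a negative char value to the <cctype> functions is undefined.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keep only visible characters and whitespace,
// dropping NUL and other control characters.
std::string keep_visible(const std::string& in)
{
    std::string out;
    for (char ch : in) {
        // <cctype> functions require a value representable as unsigned char.
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isgraph(c) || std::isspace(c)) {
            out.push_back(ch);
        }
    }
    return out;
}
```

The result can then be passed to printf("%s", ...) without an embedded NUL cutting it short.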

printf() and sprintf() will continue past a '\0' character inserted with %c, because their output is defined in terms of the content of the format string, and %c does not denote the end of the format string.
This includes their count; thus:
sprintf(x, "A%cB", '\0')
must always return 3 (although strlen(x) afterwards would return 1).
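A small sketch of how that return value can be used, assuming the caller wants to keep every byte that was written: the function name format_with_embedded_nul is hypothetical, and snprintf is used instead of sprintf only to bound the write.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Formats "A%cB" with a zero character and returns every byte written,
// including the embedded NUL, by using the return value as the length.
std::string format_with_embedded_nul()
{
    char buf[8] = {};
    const int n = std::snprintf(buf, sizeof buf, "A%cB", '\0');
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf) {
        return std::string();  // encoding error or truncated output
    }
    // n counts all three characters; std::strlen(buf) would stop after 'A'.
    return std::string(buf, static_cast<std::size_t>(n));
}
```

The returned std::string has size 3, while std::strlen on its c_str() still sees only the leading "A".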

Related

string copied to character array

I'm using C++14 and as per my understanding std::string in C++11 and above are terminated by null character.
I'm unable to understand why the below code does not work.
string a="your";
char b[5];
for(int i=0; a[i]!='\0';i++)
b[i] =a[i] ;
cout<<b;
Output is:
"yourul"(followed by 2 random characters)
The problem is that you don't copy the terminator to the destination array. Instead you end the loop when you encounter it (without copying it).
Without knowing the use-case or the need for the array (there seldom is any), don't use manual copying like that. If, for whatever reason you can't use e.g. a.c_str() or a.data() or even &a[0], then use strncpy instead:
strncpy(b, a.c_str(), sizeof b - 1); // -1 to leave space for the terminator
b[sizeof b - 1] = '\0'; // And make sure string is terminated
Do note that the guarantee of an existing terminator in the std::string object depends on the C++ standard used. Before C++11 there were no guarantees that the terminator existed in the string. In practice it still did (for simplicity's sake) but there were no guarantees by the specification.
So std::string does have the backwards compatibility guaranteed that the sequences returned by c_str() and data() are 0-terminated, and that s[s.size()] gives you a NUL. (Note that writing over any of these terminators is undefined behavior.)
However, your code has a different bug: the loop terminates the moment it encounters the NUL, and does not copy it over to b. So the last element of b has unspecified content, and reading it is undefined behavior.
Don't write the copy loop yourself. Use strcpy or one of its variants. Or better yet, if at all possible, don't use writeable C-style strings at all.
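If a char array really is needed, one possible sketch is below. The helper name copy_to_array is invented for this example; it relies on std::string::copy, which copies characters without adding a terminator, so the terminator is written explicitly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy `src` into a fixed-size array and always
// terminate the result. Truncates if the array is too small.
template <std::size_t N>
void copy_to_array(const std::string& src, char (&dest)[N])
{
    static_assert(N > 0, "destination needs room for the terminator");
    // copy() writes at most N-1 characters and returns how many it wrote.
    const std::size_t n = src.copy(dest, N - 1);
    dest[n] = '\0';  // add the terminator that the manual loop forgot
}
```

With char b[5] and the string "your", this leaves b holding the four letters followed by '\0', so cout << b prints just "your".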

wcslen or strlen should prevent infinite loop?

This function counts the number of characters between the beginning of the string and the terminating null character.
size_t wcslen(const wchar_t* sz)
{
size_t l = 0;
while (*sz++) ++l;
return l;
}
Now, if there is no terminating null character, should this function detect that? How would it detect it? Is there a limit on the loop, so that it is not actually infinite?
And how is the function to do that? The definition of the
length is "until the terminating nul character". A "safer"
version of the function might take an additional maximum length,
which would correspond to the length of the buffer where the
data was held. But the use of nul terminated strings is
universal in C, and most of the time, if you're calling this
function, it's because the function calling you only gave you a
pointer, and you don't know the actual length of the buffer.
In practice, if the input doesn't have a terminating nul
character, you'll get a buffer overrun, reading memory beyond
the end of your buffer. When doing so, sooner or later, you'll
either encounter a byte which contains a 0, and consider that
the end of your string, or you'll end up with an address which
isn't mapped to your process, and you'll crash.
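The "safer" version with an additional maximum length could look roughly like this. The name bounded_wcslen is hypothetical (it is similar in spirit to POSIX wcsnlen), and it only helps when the caller actually knows the buffer size.

```cpp
#include <cstddef>

// Hypothetical bounded variant: scans at most `max_len` elements.
// Returns max_len if no terminator is found within that range.
std::size_t bounded_wcslen(const wchar_t* sz, std::size_t max_len)
{
    std::size_t len = 0;
    while (len < max_len && sz[len] != L'\0') {
        ++len;
    }
    return len;
}
```

A result equal to max_len tells the caller that no terminator was found inside the buffer, instead of reading past its end.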
This function stops at the first 0 value. If there is no wchar_t with a 0 value at the end of the sz parameter, it will continue through the rest of memory until a 0 value is reached.
If there is no null terminator for a string the behavior is undefined. You cannot tell what you will get.
Well, the contract is that you supply a NUL-terminated string. If you don't, all bets are off and the behaviour of the function is undefined.
If there is no null-terminator (though I have trouble imagining memory space all filled with garbage without any zeroes), then eventually sz will overflow, after which you'll try to dereference NULL which will get you an exception.

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?
That's because you get a string that is not null-terminated when the separator length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.
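A short sketch that checks this at compile time; the helper separator_length is made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// Let the compiler size the array: two '|' characters plus the terminator.
const char separator[] = "||";
static_assert(sizeof separator == 3, "two characters plus the terminator");

// Because the terminator is present, the array is safe to hand to any
// function that expects a C string, such as strlen or puts.
std::size_t separator_length()
{
    return std::strlen(separator);
}
```

Here sizeof reports 3 bytes while strlen reports 2 characters; the difference is the terminator.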
Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.
There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char seperator[] = "||";
you will get a string of size 3, not size 2.
Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
Edit: the following is incorrect. See comments below.
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.
In C strings are ended with a special '\0' character, so your separator literal "||" is actually one character longer. puts function just prints every character until it encounters '\0' - in your case one after the IP string.
In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
In the code above, ip has room for 15 chars: 14 "regular characters" and the null byte. That is too short for the longest dotted-quad IPv4 address (255.255.255.255 is 15 characters plus the null byte), so it should be char ip[16] = "1.2.3.4";
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';
Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* seperator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.
A C "string" always ends in NUL, but you just do not give it to the string if you write
char separator[2] = "||". And puts expects this \0 at the end. In the first case it writes until it finds a \0, and here you can see where it is found: at the end of the IP address. Interestingly enough, you can even see how the local variables are laid out on the stack.
The line: char seperator[2] = "||"; should get you undefined behaviour since the length of that character array (which includes the null at the end) will be 3.
Also, what compiler have you compiled the above code with? I compiled with g++ and it flagged the above line as an error.
Strings in C/C++ are null-terminated, i.e. have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"

STL basic_string length with null characters

Why is it that you can insert a '\0' char in a std::basic_string and the .length() method is unaffected but if you call char_traits<char>::length(str.c_str()) you get the length of the string up until the first '\0' character?
e.g.
string str("abcdefgh");
cout << str.length(); // 8
str[4] = '\0';
cout << str.length(); // 8
cout << char_traits<char>::length(str.c_str()); // 4
Great question!
The reason is that a C-style string is defined as a sequence of bytes that ends with a null byte. When you use .c_str() to get a C-style string out of a C++ std::string, then you're getting back the sequence the C++ string stores with a null byte after it. When you pass this into strlen, it will scan across the bytes until it hits a null byte, then report how many characters it found before that. If the string contains a null byte, then strlen will report a value that's smaller than the whole length of the string, since it will stop before hitting the real end of the string.
An important detail is that strlen and char_traits<char>::length are NOT the same function. However, the C++ ISO spec for char_traits<charT>::length (§21.1.1) says that char_traits<charT>::length(s) returns the smallest i such that char_traits<charT>::eq(s[i], charT()) is true. For char_traits<char>, the eq function just returns if the two characters are equal by doing a == comparison, and constructing a character by writing char() produces a null byte, and so this is equal to saying "where is the first null byte in the string?" It's essentially how strlen works, though the two are technically different functions.
A C++ std::string, however, is a more general notion of "an arbitrary sequence of characters." The particulars of its implementation are hidden from the outside world, though it's probably represented either by a start and stop pointer or by a pointer and a length. Because this representation does not depend on what characters are being stored, asking the std::string for its length tells you how many characters are there, regardless of what those characters actually are.
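A minimal sketch of the difference, assuming the string is built through the (pointer, length) constructor so that the embedded null survives; the function name make_with_nul is hypothetical.

```cpp
#include <cstring>
#include <string>

// Builds a std::string with an embedded NUL by passing an explicit length;
// the (pointer, length) constructor does not stop at the first zero byte.
std::string make_with_nul()
{
    return std::string("abcd\0efg", 8);
}
```

For the returned string, size() is 8 while std::strlen on c_str() is 4, and constructing a new std::string from c_str() alone would silently drop everything after the null.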
Hope this helps!

Defining a string with no null terminating char(\0) at the end

What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only and not in STL string.
Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 103};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or using memcpy (or memmove, or bcopy, though in this case there is no benefit in doing that):
memcpy(s, "string", 6);
or strncpy
strncpy(s, "string", 6);
What should be understood is that there is no such thing as a string in C (in C++ there are string objects, but that's another story entirely). So-called strings are just char arrays. And even the name char is misleading: it is not a character but just a kind of numerical type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers or such, and byte implies 8 bits.
As char will very often be used to store a character code, C designers thought of a simpler way than store a number in a char. You could put a letter between simple quotes and the compiler would understand it must store this character code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero-terminated string.
"C sz strings" is not a type at all, just an array of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't call any of these functions, as they will likely do something wrong (or you must be extra careful and use functions with an n in their name, like strncpy).
The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a 0-terminated string. If you had kept the size you could just jump to the end of the string; with the sz convention, you have to check it char by char. Other kinds of problems occur when dealing with encoded Unicode or such. But at the time C was created this convention was very simple and did the job perfectly.
Nowadays, the letters between double quotes like "string" are not plain modifiable char arrays as in the past: in C++ a string literal is an array of const char (which converts to const char *), and in C modifying one is undefined behaviour. That means that the literal is a constant that should not be modified (if you want to modify it you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time.
The terminating null is there to terminate the string. Without it, you need some other method to determine its length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You can use std::string (in C++), although you said you're not interested in std::string.
Preferably you would use some pre-existing technology that handles unicode, or at least understands string encoding (i.e., wchar.h).
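For the Pascal-style layout mentioned above, reading the data back might look like the following sketch. The function name from_length_prefixed is made up, and it assumes the first byte really does hold the number of characters that follow.

```cpp
#include <cstddef>
#include <string>

// Reads a length-prefixed ("Pascal-style") buffer: the first byte holds the
// length and the following bytes hold the characters. No terminator is used.
std::string from_length_prefixed(const unsigned char* p)
{
    const std::size_t len = p[0];
    return std::string(reinterpret_cast<const char*>(p + 1), len);
}
```

Because the length lives in a single unsigned char, this layout caps the string at 255 characters on typical platforms.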
And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
char characters[10];
} ThisIsNotACString;
C++ std::strings are not NUL terminated.
P.S.: NULL is a macro [1]. NUL is '\0'. Don't mix them up.
[1] C.2.2.3 Macro NULL
The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>,
<ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International
Standard (18.1).
In C++ you can use the string class and not deal with the null char at all.
Just for the sake of completeness and nail this down completely.
vector<char>
Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).
In C there generally won't be an easier solution. You could possibly do what pascal did and put the length of the string in the first character, but this is a bit of a pain and will limit your string length to the size of the integer that can fit in the space of the first char.
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.
The reason for the NUL termination is so that the handler of the string can determine its length. If you don't use a NUL terminator, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, so long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.
Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.