Should wcslen or strlen prevent an infinite loop? - c++

This function counts the number of characters between the beginning of the string and the terminating null character.
size_t wcslen(const wchar_t* sz)
{
size_t l = 0;
while (*sz++) ++l;
return l;
}
Now, if there is no terminating null character, should this function detect that? How would it detect it? Is there a limit on the loop, so that it is not really infinite?

And how is the function to do that? The definition of the
length is "until the terminating nul character". A "safer"
version of the function might take an additional maximum length,
which would correspond to the length of the buffer where the
data was held. But the use of nul terminated strings is
universal in C, and most of the time, if you're calling this
function, it's because the function calling you only gave you a
pointer, and you don't know the actual length of the buffer.
In practice, if the input doesn't have a terminating nul
character, you'll get a buffer overrun, reading memory beyond
the end of your buffer. When doing so, sooner or later, you'll
either encounter a byte which contains a 0, and consider that
the end of your string, or you'll end up with an address which
isn't mapped to your process, and you'll crash.

This function stops at the first 0 value. If there is no wchar_t with the value 0 at the end of the sz parameter, it will continue through the rest of memory until a 0 value is reached.

If there is no null terminator for a string the behavior is undefined. You cannot tell what you will get.

Well, the contract is that you supply a NUL-terminated string. If you don't, all bets are off and the behaviour of the function is undefined.

If there is no null-terminator (though I have trouble imagining memory space all filled with garbage without any zeroes), then eventually sz will overflow, after which you'll try to dereference NULL which will get you an exception.

Related

Unpredictable string length when second argument of cin.getline() is greater than the length of array

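#include <cstring>
#include <iostream>
using namespace std;
int main() {
    char str1[10], str2[10];
    cin.getline(str1, 14); // bug: 14 > sizeof str1, so a long line overruns str1
    cin.getline(str2, 10);
    cout << strlen(str1) << '\t' << strlen(str2);
}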
The output of the above code was as follows:
1234567890123
bye
13 3
How can the length of str1 be greater than 10?
It can't. You overran your buffer and overwrote memory outside of the array. Your program happened not to crash or teleport a cat into your monitor before it found a '\0' no earlier than 13 bytes in memory from the start of your 10-element array.
The behaviour of your overrunning a char array is undefined. To be clear, you need to ensure there is sufficient space for your data and a \0 string terminator; otherwise the behaviour of cout will be undefined.
The compiler is allowed to do anything if it encounters this.
Your output is a common manifestation, but you must not rely on such behaviour.
Because it is likely writing into the space reserved for str2.
But this is undefined behaviour; it could do anything (most likely a segfault, access violation, or whatever it is called on your OS).

Char pointer giving me some really strange characters

When I run the example code, wordLength is 7 (hence the output 7). But my char array gets some really weird characters at the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
The desired output is: aaaaaaa. What is the garbage after it? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength + 1];
//add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop, and
wordchar[wordLength] = '\0';
after the for loop (i is no longer in scope there); C strings are null-terminated.
Without this, it keeps on printing until it finds the first null character, printing all the garbage values along the way.
You omitted the trailing zero; that's the cause.
In C and C++, the whole ecosystem assumes that the end of a string is marked by a trailing zero ('\0', or simply 0 numerically). This is different from, for example, Pascal strings, where the memory representation starts with a number that tells how many of the following characters make up the string.
So if you have some string content you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise strstr and other string manipulation functions run off the end and keep working on the memory section that follows, and the ones that write can modify it. Without a trailing zero, strlen will also give a wrong result, since it counts until it encounters the first zero.
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. An exploit takes advantage of the side effect that functions run off the end and manipulate things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question) you better use STL's std::string, and STL methods instead of C style manipulations.

char[] vs LPCSTR strange behavior

Could you please explain why, in order to convert a char array like this:
char strarr[5] = {65,83,67,73,73}; //ASCII
Into LPCSTR to be accepted by GetModuleHandleA() and GetProcAddress(), I have to first append 0 to the end ?
i.e. I have:
char strarr[6] = {65,83,67,73,73,0};
And only then convert as (LPCSTR)&strarr.
What I don't understand is why the first one works only sometimes (i.e. if I do not add 0 at the end), while if I do add a zero at the end, it works all the time. Why do I have to add a zero?
Oh and a side question - why in C++ do I have to explicitly state the size of array in [], when I am initializing it with elements right away? (If I don't state the size, then it does not work)
Thanks.
Those functions expect null-terminated strings.
Since you only give them a pointer to a char array, they cannot possibly know its size, hence the need for a particular value (the terminating null character) to indicate the end of the string.

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?
That's because you get a string that is not null-terminated when the length of separator is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.
Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.
There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char separator[] = "||";
you will get an array of size 3, not size 2.
Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
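Edit: the following is incorrect, as pointed out in the comments. When a char array is initialized from a string literal and has room to spare, the terminating '\0' is stored and any remaining elements are zero-initialized, so the extra byte is guaranteed to be 0.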
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.
In C, strings end with a special '\0' character, so your separator literal "||" is actually one character longer. The puts function just prints every character until it encounters '\0'; in your case, the one after the IP string.
In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
In the code above, ip has space for 15 characters: 14 "regular" characters and the null byte. That is enough for "1.2.3.4", but too short for the longest dotted IPv4 address, "255.255.255.255" (15 characters), which needs char ip[16].
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';
Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* separator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.
A C "string" always ends in a null character, but you do not leave room for one if you write
char separator[2] = "||". puts expects this \0 at the end; in the first case it writes until it finds a \0, and here you can see that it is found at the end of the IP address. Interestingly enough, you can even see how the local variables are laid out on the stack.
In C++ the line char separator[2] = "||"; is ill-formed, since the string literal (including the null at the end) has 3 characters; C accepts it but leaves the array without a terminator, so passing it to puts is undefined behaviour.
Also, which compiler did you compile the above code with? I compiled it with g++ and it flagged the above line as an error.
Strings in C/C++ are null-terminated, i.e. they have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"

printf("... %c ...",'\0') and family - what will happen?

How will various functions that take a printf format string behave upon encountering the %c format given a value of \0/NULL?
How should they behave? Is it safe? Is it defined? Is it compiler-specific?
e.g. sprintf() - will it crop the result string at the NULL? What length will it return?
Will printf() output the whole format string or just up to the new NULL?
Will va_args + vsprintf/vprintf be affected somehow? If so, how?
Do I risk memory leaks or other problems if I e.g. shoot this NULL at a point in std::string.c_str()?
What are the best ways to avoid this caveat (sanitize input?)
Any function that takes a standard C string will stop at the first null, no matter how it got there.
When you use the %c in a format and use 0 for the character value, it will insert a null into the output. printf will output the null as part of the output. sprintf will also insert the null into the result string, but the string will appear to end at that point when you pass the output to another function.
A std::string will happily contain a null anywhere within its contents, but when you take the c_str method to pass it to a function see the above answer.
What happens when you output a NUL depends on the output device.
It is a non-printing character, i.e. isprint('\0') == 0, so when output to a display device it has no visible effect. If redirected to a file, however (or when calling fprintf()), it will insert a NUL (zero byte) into the file; the meaning of that will depend on how the file is used.
When output to a C string, it will be interpreted as a string terminator by standard string handling functions, although any other subsequent format specifiers will still result in data being placed in the buffer after the NUL, which will be invisible to standard string handling functions. This may still be useful if ultimately the array is not to be interpreted as a C string.
Do I risk memory leaks or other problems if I e.g. shoot this NULL at a point in std::string.c_str()?
It is entirely unclear what you mean by that, but if you are suggesting using the pointer returned by std::string.c_str() as the buffer for sprintf(); don't! c_str() returns a const char*, modifying the string through such a pointer is undefined. That however is a different problem, and not at all related to inserting a NUL into a string.
What are the best ways to avoid this caveat (sanitize input?)
I am struggling to think of a circumstance where you could "accidentally" write such code, so why would you need to guard against it!? Do you have a particular circumstance in mind? Even though I find it implausible, and probably unnecessary, what is so hard about:
if( c != 0 )
{
printf( "%c", c ) ;
}
or perhaps more usefully (since there are other characters you might want to avoid in the output)
if( isgraph(c) || isspace(c) )
{
printf( "%c", c ) ;
}
which will output only visible characters and whitespace (space, '\t','\f','\v','\n','\r').
Note that you might also consider isprint() rather than isgraph(c) || isspace(c), but that excludes '\t','\f','\v','\n' and '\r'
printf() and sprintf() will continue past a '\0' character inserted with %c, because their output is defined in terms of the content of the format string, and %c does not denote the end of the format string.
This includes their count; thus:
sprintf(x, "A%cB", '\0')
must always return 3 (although strlen(x) afterwards would return 1).