Possible Duplicate:
How does “while(*s++ = *t++)” work?
I had the following question during an interview. Can someone please explain it to me?
void question( char *s, char *t)
{
while (*s++ = *t++);
}
It introduces a massive security vulnerability into your program. Do not write, or use, code like this under any circumstances.
If we break the code down, we get:
*t++ reads the character pointed to by t, and increments t; the expression's value is the character that was read.
*s++ = expression writes that character to where s points, and increments s; the expression's value is the character that was written.
while (expression); keeps looping as long as the expression's value is non-zero; in this case, until we wrote a character with the value zero.
So the function keeps copying characters from t to s until it reaches a zero-valued character. There is no way to tell whether s points to a large enough array to hold these, so in general it will write beyond the end of the array and cause undefined behaviour; anything from subtle behaviour with no unwanted effects, to a crash, to the execution of malicious code.
You can only call this function if you know in advance (an upper bound for) how many characters will be copied; if you know that, then there are (usually) more efficient ways to copy the data than checking each character's value. Therefore, you should (almost) never use this function, or the C library function (strcpy) that it approximates.
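To illustrate why the destination size matters, here is a minimal sketch of a bounded variant. The name `bounded_copy` and its `dst_size` parameter are hypothetical, not a standard API. It never writes more than `dst_size` bytes, always null-terminates, and reports whether the whole string fit.

```cpp
#include <cstddef>

// Hypothetical bounded variant of question(): copies at most dst_size - 1
// characters from t to s and always writes a terminating '\0'.
// Returns true if the whole of t (including its '\0') fit into s.
bool bounded_copy(char* s, std::size_t dst_size, const char* t)
{
    if (dst_size == 0)
        return false;              // no room even for the terminator
    std::size_t i = 0;
    while (i + 1 < dst_size && t[i] != '\0') {
        s[i] = t[i];               // copy one character
        ++i;
    }
    s[i] = '\0';                   // always terminate the destination
    return t[i] == '\0';           // false means t was truncated
}
```

The caller still has to pass the correct capacity; the function only removes the overflow, not the need to know the size.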
This use of a zero-valued character to terminate a string is a common idiom in C; in C++ it is usually more convenient to use the std::string class to represent strings instead. In that case, the equivalent code would be simply s = t, which would manage the strings' memory safely.
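A minimal sketch of that std::string version: the assignment copies the characters and std::string allocates whatever memory it needs, so there is no fixed-size destination to overflow. The function name `copy_string` is only illustrative.

```cpp
#include <string>

// Illustrative std::string counterpart of question(): the assignment
// copies t into s, and std::string grows its storage as required.
void copy_string(std::string& s, const std::string& t)
{
    s = t;
}
```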
Copies the string pointed to by t into the memory pointed to by s.
operator= returns the assigned value. t is supposed to point to a null-terminated string, and s should point to memory large enough to store that string, including its terminator.
So the while loop stops right after the \0 at the end of the string pointed to by t has been copied: every character of t, including the terminating \0, ends up in s.
Expanded a little, it's the same as:
while( *t != '\0' ) // while the current char is not the '\0' terminator
{
*s = *t; // copy it into s
++s; // increment s, to point to the next byte
++t; // increment t, to point to the next char, that will be copied
}
*s = *t; // copy the last char of t - the '\0'
It copies the null-terminated string t into s, with the same semantics as strcpy.
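The following sketch wraps the original one-liner in a function and checks it against the standard strcpy. Here t is declared `const char*` (the question used `char*`), the destination buffers are deliberately large enough, and the extra parentheses only silence the usual "assignment used as condition" compiler warning.

```cpp
#include <cstring>

// The interview function: copies t into s, including the terminating '\0'.
// The caller must guarantee that s has room for strlen(t) + 1 chars.
void question(char* s, const char* t)
{
    while ((*s++ = *t++))
        ;   // all the work happens in the loop condition
}
```

Pre-filling both buffers with 'x' makes it visible that the terminator is copied and nothing after it is touched.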
Related
I'm learning about pointers in c++.
I have researched and found a hand-written version of strlen that looks something like this.
int strlen(const char *a){
const char *b;
for (b=a;*b;++b);
return b-a;
}
Would anyone be able to explain this block of code in plain english? In particular, why is *b set as the terminating condition in the for loop?
This is not an answer to homework. It's just a question that arose while I was researching. Thanks.
This is a particularly terse piece of C code, with a for loop that does not have a body.
The idea is to set pointer b to the beginning of the string a, and keep advancing it until you hit the character '\0', which indicates the end of the string (i.e. serves as the null terminator). Nothing else needs to be done in that loop, hence its body is empty.
Once the loop is over, subtracting a from b yields the number of characters between the initial character of the string and its null terminator, i.e. the length of the string.
Here is a more readable way to write the same loop:
for (b=a ; *b != '\0' ; ++b) // Use explicit comparison to zero
; // Put semicolon on a separate line
When a C expression is used where a logical condition is required (such as the condition of a for loop), it is implicitly compared to zero. Hence, *b != '\0' is the same as *b.
In both C and C++, strings are really null-terminated byte strings. That null terminator is equal to zero, and in both C and C++ the value zero is equivalent to false.
What the loop does is to iterate until the "current character" (pointed to by b) becomes equal to the terminator.
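A runnable version of the same loop, renamed `my_strlen` (a hypothetical name) so it does not clash with the standard strlen, and returning std::size_t, the type the standard function returns:

```cpp
#include <cstddef>

// Walks b from the start of the string to its '\0' terminator;
// the distance travelled is the length of the string.
std::size_t my_strlen(const char* a)
{
    const char* b;
    for (b = a; *b != '\0'; ++b)   // `*b != '\0'` is equivalent to `*b`
        ;                          // empty body: all work is in the header
    return static_cast<std::size_t>(b - a);  // characters before the '\0'
}
```

A string literal with an embedded zero shows that the count stops at the first '\0', not at the end of the array.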
When I run the example code, the wordLength is 7 (hence the output 7). But my char array gets some really weird characters at the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
Desired output is: aaaaaaa... What is the garbage behind it?? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength + 1];
// add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
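Putting both fixes together in a minimal sketch: the helper names `make_a_string` and `make_a_std_string` are hypothetical and stand in for the loop in the question, which fills every position with 'a'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the question: returns a heap-allocated,
// null-terminated C string of `length` copies of 'a'. Caller must delete[].
char* make_a_string(std::size_t length)
{
    char* wordchar = new char[length + 1];   // +1 for the terminator
    for (std::size_t i = 0; i < length; ++i)
        wordchar[i] = 'a';
    wordchar[length] = '\0';                 // mark the end of the string
    return wordchar;
}

// The std::string alternative needs neither manual sizing nor delete[].
std::string make_a_std_string(std::size_t length)
{
    return std::string(length, 'a');
}
```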
Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop, and
wordchar[wordLength] = '\0';
after the for loop, because C strings are null terminated.
Without this, it keeps printing until it finds the first null character, printing all the garbage values in between.
You omitted the trailing zero; that's the cause.
In C and C++, the whole ecosystem determines string length by assuming a trailing zero ('\0', or simply 0 numerically). This is different from, for example, Pascal strings, whose memory representation starts with a number telling how many of the following characters make up the string.
So if you have some string content you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise strstr and other string functions run off the end of your data and keep working on the memory that follows; functions that write, such as strcpy or strcat, can then corrupt it. Without the trailing zero, strlen will also give a wrong result, because it counts until it encounters the first zero.
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. An exploit takes advantage of the side effect that these functions run off the end and manipulate things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question), you had better use the standard library's std::string and its methods instead of C-style manipulation.
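A short sketch of the difference between the two representations: std::string records its length separately (closer to the Pascal approach), so it can even hold an embedded zero, while strlen on the same bytes stops at the first zero. The function name `lengths` is illustrative only.

```cpp
#include <cstring>
#include <string>
#include <utility>

// Returns {length stored by std::string, length found by strlen}
// for the same five bytes "ab\0cd". std::string keeps the length
// separately; strlen only counts up to the first zero byte.
std::pair<std::size_t, std::size_t> lengths()
{
    const std::string s("ab\0cd", 5);   // explicit length keeps all 5 bytes
    return {s.size(), std::strlen(s.c_str())};
}
```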
I am taking a line of input which is separated by a space and trying to read the data into two integer variables.
for instance: "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
separator+1, //Start with the next char after the separator
input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a c string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.
First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
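A self-contained sketch of the std::stoi approach applied to the question's input. `parse_pair` is a hypothetical name, and, like the question's code, it assumes the input contains a space.

```cpp
#include <string>
#include <utility>

// Splits "0 1" at the first space and converts both halves with std::stoi
// (C++11). std::stoi throws std::invalid_argument if a half is not a number.
// Assumes a space is present: find() would otherwise return npos.
std::pair<int, int> parse_pair(const std::string& input)
{
    const std::string::size_type separator = input.find(' ');
    int child1 = std::stoi(input.substr(0, separator));
    int child2 = std::stoi(input.substr(separator + 1));  // to end of input
    return {child1, child2};
}
```

The temporaries returned by substr are consumed by std::stoi within the same full-expression, so no pointer outlives them.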
Now, an explanation.
input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour.
The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
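A minimal sketch of that fix, keeping atoi as in the question: the substrings are stored in named std::string objects, so the pointers returned by c_str() stay valid while atoi reads them. `parse_with_atoi` is a hypothetical name, and a space is assumed to be present.

```cpp
#include <cstdlib>
#include <string>

// Named locals keep the substrings alive until the end of the function,
// so each c_str() pointer remains valid while atoi reads it.
void parse_with_atoi(const std::string& input, int& child1, int& child2)
{
    const std::string::size_type separator = input.find(' ');
    const std::string child1_string = input.substr(0, separator);
    const std::string child2_string = input.substr(separator + 1);
    child1 = std::atoi(child1_string.c_str());
    child2 = std::atoi(child2_string.c_str());
}
```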
Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
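A self-contained version of the stream approach. The function name `read_pair` is hypothetical; it reports failure instead of leaving the variables in an unknown state when the input does not start with two integers.

```cpp
#include <sstream>
#include <string>

// operator>> skips any whitespace (spaces, tabs) between the numbers.
// Returns false if two integers could not be read.
bool read_pair(const std::string& input, int& child1, int& child2)
{
    std::istringstream buffer(input);
    return static_cast<bool>(buffer >> child1 >> child2);
}
```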
input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.
I'm working on writing my own string class and am having trouble overloading the += operator so that a char can be appended to a MyString. I figured this would work, but with no luck. Here's the implementation I tried. Any assistance on getting it to work correctly will be much appreciated.
MyString& MyString::operator +=(char c)
{
char derp[1] = {c};
strcat(value, derp);
return *this;
}
This is not going to work for several reasons:
derp is not a null-terminated array, which it has to be if you pass it as a parameter to strcat
There is no check that the buffer that value represents can actually hold more data; neither is there a facility to make sure that the buffer is always null-terminated (which again it needs to be because you are passing it to strcat)
Even if you correct the above, your string class will never be able to include the character \0 as part of a string value because that will be mistaken for a null terminator; in technical terms, your string class would not be "binary safe"; to fix this you need to drop strcat and similar functions entirely and switch to memcpy and friends
Apart from the above, overloading operator += like this allows for code such as
MyString str("foo");
str += 80; // this compiles, but should it?
Finally, the str*** family of functions will get needlessly slower as your strings get larger (because they have to scan the string from the beginning each time to determine where it ends). Keeping your own length variable and switching to mem*** will fix this issue as well.
The use of strcat is incorrect as it requires a null terminated source string and is being provided with a buffer with no null terminator.
value will only be capable of holding a finite number of characters, and there is no attempt to increase the size of value.
Assuming value is large enough and you keep the length of the string in your instance (here, size), I'd say:
value[size] = c;
value[size+1] = '\0';
++size;
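A minimal sketch that combines these suggestions: the class keeps its own `size` and `capacity`, grows the buffer when needed, copies with memcpy, and keeps a '\0' after the last character so c_str() still works. Everything beyond `value`, `size` and `operator+=` (the constructor, `capacity`, `grow`, `length`, `c_str`) is an assumption about how the rest of the class might look; copying is disabled only to keep the sketch short.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal string class; only the parts needed for += are shown.
class MyString {
public:
    explicit MyString(const char* s = "")
        : size(std::strlen(s)), capacity(size + 1), value(new char[capacity])
    {
        std::memcpy(value, s, size + 1);         // copy including the '\0'
    }
    ~MyString() { delete[] value; }
    MyString(const MyString&) = delete;          // copying omitted in sketch
    MyString& operator=(const MyString&) = delete;

    MyString& operator+=(char c)
    {
        if (size + 2 > capacity)                 // room for c plus '\0'?
            grow((size + 2) * 2);
        value[size] = c;                         // may even be '\0'
        ++size;
        value[size] = '\0';                      // keep c_str() usable
        return *this;
    }

    std::size_t length() const { return size; }  // O(1), no scanning
    const char* c_str() const { return value; }

private:
    void grow(std::size_t new_capacity)
    {
        char* bigger = new char[new_capacity];
        std::memcpy(bigger, value, size + 1);     // size chars plus '\0'
        delete[] value;
        value = bigger;
        capacity = new_capacity;
    }

    std::size_t size;       // characters stored, excluding the terminator
    std::size_t capacity;   // bytes allocated for value
    char* value;
};
```

Because the length is tracked separately, appending '\0' is counted correctly (the class is binary safe), although c_str() consumers will still stop at that byte. The sketch does not address the `str += 80` concern; one option is an additional `MyString& operator+=(int) = delete;` overload.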
Sounds easy, but I've got a bug and I'm not sure what's causing it.
nopunccount = 0;
char *ra = new char[sizeof(npa)];
while (nopunccount <= strlen(npa)) {
ra[nopunccount] = npa[strlen(npa) - nopunccount];
nopunccount++;
}
ra never gets a value into it and I have verified that npa has char values to provide within the nopunccount range.
Any help is appreciated // :)
nopunccount starts as 0, so in the first iteration of the loop the character assigned to ra[0] is npa[strlen(npa)]. This is the terminating '\0' of that string. So the resulting string in ra starts with a '\0' and is therefore considered to end at that first byte by the usual string functions.
What does the declaration of npa look like? If it is a pointer, sizeof(npa) will be the size of a pointer, rather than the allocated size. If these are zero-terminated strings (also known as "C strings"), then use strlen, not sizeof. If these aren't strings, you need to track how much you allocated in a separate variable.
I have some other critiques of this code, possibly unrelated to your problem.
while (nopunccount <= strlen(npa)) {
strlen is an O(n) operation. This code will traverse the string npa in every loop iteration. It's best to only compute the length once.
ra[nopunccount] = npa[strlen(npa) - nopunccount];
Same problem here.
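Putting these critiques together, a minimal sketch of the reversal. `reversed_c_string` is a hypothetical name, npa is assumed to be a null-terminated string, and the caller owns (and must delete[]) the returned buffer.

```cpp
#include <cstddef>
#include <cstring>

// Returns a new null-terminated string holding npa reversed.
// The length is computed once, and the source's '\0' is not copied
// to the front; instead a fresh terminator is written at the end.
char* reversed_c_string(const char* npa)
{
    const std::size_t len = std::strlen(npa);   // O(n), done once
    char* ra = new char[len + 1];               // strlen, not sizeof, plus 1
    for (std::size_t i = 0; i < len; ++i)
        ra[i] = npa[len - 1 - i];               // last char goes first
    ra[len] = '\0';
    return ra;
}
```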