pointers and increment operator - c++

I have the following simple code:
#include<iostream>
const char str[]={'C','+','+'};
int main()
{
const char *cp=str;
std::cout<<*str<<std::endl;
while (*cp++>0)
std::cout<<*cp;
}
can't understand why it prints
C
++
Shouldn't the postfix increment operator evaluate the expression but return the value unchanged? (I double checked precedence of increment, dereference and relational operators and it should work)

This line prints: C and the carriage return.
std::cout<<*str<<std::endl;
Then you are looping on following characters but there is no ending character (c.f. buffer overflow)
Here's the code fixed up.
#include <iostream>
const char str[]={'C','+','+', '\0'};
int main()
{
const char* cp = str;
std::cout<< *str << std::endl;
while (*cp++ > 0)
std::cout << *cp;
return 0;
}
This code is even simpler if you want to display "C++"
#include <iostream>
const char str[]={'C','+','+', '\0'};
int main()
{
const char* cp = str;
while (*cp > 0)
std::cout << *cp++;
std::cout << std::endl;
return 0;
}

Your problem is here:
while (*cp++>0)
std::cout<<*cp;
The postfix operator increments the value after the expression it was used in is evaluated. There are two different expressions referring to cp in this while statement though: the first one tests and increments, but the second one prints. Since the second one is a separate statement, the increment has already happened by then.
i.e. you currently test against 'C', then increment regardless of the result, then print '+' through the now-incremented pointer. If it were iterating a string literal, this code would also increment once after reaching the 0, although you don't see a result of that because it skips the print (because it iterates over a hand-created array with no null terminator, what it actually does here is fall off the end and exhibit undefined behaviour; this is a big error in itself, you can't rely on there being a zero at the end if one wasn't put there).

shouldn't the postfix increment operator evaluate the expression but return the value unchanged?
No, postfix operator returns the value of operand first then, only it is increased by 1.
here,
while (*cp++>0)
std::cout<<*cp;
by the time it reaches std::cout<<*cp; the value of cp is incremented, and hence the result ++.

In the line
while(*cp++>0)
The post increment operator is executed, after the evaluation happens.
i.e. cp points to C during the first evaluation of the while condition.
So,
*cp => 'C' which is greater than 0.
Before moving to the next line (Inside the loop), the post increment operator gets executed, making cp point to the first +
After + is printed, the while condition is executed again, this time *cp returns '+'. since '+' > 0, the control enters the loop for the second time.
Before entering the loop, the post increment operator executes again, making cp point to the second '+', which is printed.
Now, while condition is executed for the third time. Here, *cp returns +. Thus, control enters the loop again.
Before entering the loop, post increment executes again. This time, it makes cp point to the next character, which is \0.
The \0 is printed, which makes no difference in this code. Then, when the while condition executes again, *cp returns \0, which is not > 0. So, the condition fails.
Edit: Saw in the comments that you wanted to print the entire String in the same line. Change the loop to:
while(*cp > 0)
std:cout<<*cp++;

Related

Explanation of the Function that Defines strlen

I'm learning about pointers in c++.
I have researched and found the manual function that defines strlen to be something like this.
int strlen(const char *a){
const char *b;
for (b=a;*b;++b);
return b-a;
}
Would anyone be able to explain this block of code in plain english? In particular, why is *b set as the terminating condition in the for loop?
This is not an answer to homework. It's just a question that arose while I was researching. Thanks.
This is a particularly terse piece of C code, with a for loop that does not have a body.
The idea is to set pointer b to the beginning of the string a, and keep advancing it until you hit character '\0', which indicates the end of the stirng (i.e. serves as null terminator). Nothing else needs to be done in that loop, hence its body is empty.
Once the loop is over, subtracting a from b yields the number of characters between the initial character of the string and its null terminator, i.e. the length of the string.
Here is a more readable way to write the same loop:
for (b=a ; *b != '\0' ; ++b) // Use explicit comparison to zero
; // Put semicolon on a separate line
When C expression is used in a statement that requires a logical expression, an implicit comparison to zero is applied. Hence, *b != '\0' is the same as *b.
In both C and C++ strings are really called null terminated byte strings. That null terminator is equal to zero. And in both C and C++ the value zero is equivalent to false.
What the loop does is to iterate until the "current character" (pointed to by b) becomes equal to the terminator.

C ++ While statement and string handling confusion?

const char ca[] = {'h','e','l','l','o'};
const char *cp = ca;
// printf("%s\n",cp);
while(*cp) {
cout << *cp << endl;
++cp;
}
first it print:
h
e
l
l
o
// end
then I uncomment "printf" statement:
hello%s
h
e
l
l
o
%
s
// end
Why are the results so different?,how does the "While" condition exit(the exact value of *cp)?
C++ primer 5th page 110
while(*cp)
This loop condition is looping until it finds a NUL character. It's equivalent to:
while(*cp != '\0')
ca isn't NUL-terminated, so the loop runs off the end of the array and invokes undefined behavior. Undefined behavior means anything can happen.†
To fix this, add a NUL terminator with
const char ca[] = {'h','e','l','l','o','\0'};
or, equivalently,
const char ca[] = "hello";
† It appears that in the first case the loop ends right away because there happens to be a NUL byte in memory following the 'o'. But in the second case the "%s" string happens to be adjacent to ca and so that gets printed, too. "%s" is a proper NUL-terminated, so the loop ends after printing it.
The fact that an innocuous printf() call can change the behavior of an unrelated loop is an example of how unpredictable undefined behavior can be. It isn't always so benign. It could make your program crash. It could even make it continue working for a while and then misbehave later in a completely baffling way. Don't rely on undefined behavior behaving predictably.

C++ toupper Syntax

I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. What I've been using it for is for every character of a string, it converts the character into an uppercase character if possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
As from the documentation, the character is passed by value.
Because of that, the answer is no, it shouldn't.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed and returned by value.
If you don't assign the returned value to a variable, it will be definitely lost.
That's why in your example it is reassigned so that to replace the original one.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway so the logic reduces to this.
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <utility>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns the int value of the char of that uppercase character. Every time a function doesn't take a pointer or reference as a parameter the parameter will be passed by value which means that there is no possible way to see the changes from outside the function because the parameter will actually be a copy of the variable passed to the function, the way you catch the changes is by saving what the function returns. In this case, the character upper-cased.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."

Recursive Reverse Function

I was wondering whether someone could explain how this small code snippet works.
void reverse(char *s)
{
if(*s)
reverse(s+1);
else
return;
cout << *s;
}
When you call this function in main, it supposedly prints out the reverse of any string put in (ie. hello would cout as olleh) but I don't understand how. As I understand it, the if statement increases the value of s until it reaches the end of the string, printing out each value in the string as it goes. Why doesn't that just re-print the string. Clearly I am missing something. (Sorry btw, I am new to C++ and recursive functions)
Consider how the "hello" string is stored in memory: let's say the address of its 'h' character happens to be 0xC000. Then the rest of the string would be stored as follows:
0xC000 'h'
0xC001 'e'
0xC002 'l'
0xC003 'l'
0xC004 'o'
0xC005 '\0'
Now consider a series of invocations of reverse: the initial invocation passes 0xC000; the call of reverse from inside the reverse passes s+1, so the next level gets 0xC001; the next one gets 0xC002, and so on.
Note that each level calls the next level, until the level that sees '\0'. Before we get to zero, the stack is "loaded" like this:
reverse(0xC004) // the last invocation before we hit '\0'
reverse(0xC003)
reverse(0xC002)
reverse(0xC001)
reverse(0xC000) // the earliest invocation
Now when the top invocation calls reverse(0xC005), the check for *s fails, and the function returns right away without printing anything. At this point the stack starts "unwinding", printing whatever is pointed to by its s argument:
0xC004 -> prints 'o', then returns to the previous level
0xC003 -> prints 'l', then returns to the previous level
0xC002 -> prints 'l', then returns to the previous level
0xC001 -> prints 'e', then returns to the previous level
0xC000 -> prints 'h', then returns for good.
That's how the reverse of the original "hello" string gets printed.
Try visualizing how the call stack builds up. Each recursive call creates another stack frame with a copy of s incremented by one. When the exit condition occurs, the stack starts unwinding and cout statement gets called for each frame. Because of LIFO principle, the string is printed in reverse.
Let's consider a string of length 3. reverse(i) is short for the function called on the i-th index of the string (well technically it's the char pointer + i, but that explanation requires a little more advanced understanding).
It's also useful to note (as Robert pointed out) that *s returns false if s points to \0, which indicates the end of the string, so, in this case, it will simply return.
Here's what happens:
reverse(0)
calls reverse(1)
calls reverse(2)
calls reverse(3)
end of string - return
prints 2
prints 1
prints 0
Lets look at a simple example, assume s = "the"
We would have:
rev(t) ->increment pointer
rev(h)->increment pointer
rev(e) ->increment pointer
rev('\0') (this would just return)
Then we would be back in the body of rev(e) which would print e
Then back to rev(h) which would print h
Then finally back to rev(t) which would print t
In that order then we would have: eht ("the" in reverse)
Perhaps it would be clearer to write the function as:
void reverse(const char *s)
{
if(*s)
{
reverse(s+1);
std::cout << *s;
}
}
i.e each not null characther will be printed after calling the next one (by the calling to reverse)
PD: Use "const char*" instead of "char*" if you will not modify the given string.

'\0' related issue

Looking at this loop that copies one c-string to another:
void strcpyr(char *s, char *t)
{
while(*s++=*t++)// Why does this work?
;
}
Why do we not check for the '\0' character in the while loop, like this?
while((*s++=*r++)!='\0')..
How does the first loop terminate?
The statement *s++=*t++ not only assigns the next character from t to s but also returns the current value of *t as the result of the expression. The while loop terminates on any false value, including '\0'.
Think of it this way. If you did:
char c = *s++ = *t++;
in addition to copying a char from *t to *s and incrementing both, it would also set c to the current value of *t.
When we hit the '\0' in the string initially pointed to by t, the *s++=*t++, which does the assignment, also returns the value that's assigned to the position pointed to by s, or '\0', which evaluates to false and terminates the loop.
In your second example, you explicitly rely on the fact that the assignment returns the assigned character, while the first example implicitly uses this fact (and the fact that the 0 character (also written '\0') is considered to be false, while all other characters evaluate to true, so the expression c != '\0' will yield the same result as c.
The loop is going to terminate because '\0' is effectively 0, and the what the "while" is evaluating is not a the result of an equality test (==), but the right-value of the assignment expression.
The reason we are not explicitly checking for zero is that in C 0 is false.
Therefore the loop
while(*s++=*t++)
;
will terminate when the character pointed to by t is 0.
-Adam
I think you mean to write this:
void strcpyr(char *s, char *t) {
while (*s++ = *t++);
}
The loop terminates when the value pointed to by "t" is zero. For C (and C++) loops and conditionals, any integer that is non-zero is true.
The while loop is testing the result of the assignment.
The result of an assignment is the value assigned into the left-hand side of the statement. On the last iteration, when *t == '\0', the '\0' is assigned into s, which becomes the value the while loop considers before deciding it is time to quit.
In C, a=b is actually an expression which is evaluated as 'b'. It is easier to write:
if(a=b) {
//some block
}
then:
a=b;
if(a!=0) {
//some block
}
In C language within if, while, for statement the check that is made is: is expression not zero?