Pointer increment and decrement - c++

I was solving a question my teacher gave me and hit a little snag.
I am supposed to give the output of the following code (it's written in Turbo C++):
#include<iostream.h>
void main()
{
char *p = "School";
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
}
The output the program gives is:
T,chool
ijool,j,i
I understand the part where the pointer itself is incremented and then the value the pointer points to is incremented. But I don't understand why the string prints out as ijool.
Can someone help me out?

The program you showed is non-standard and ill-formed (and should not compile).
"Small" problems:
The proper header for input/output streams in C++ is <iostream>, not <iostream.h>
main() returns an int, not a void.
cout and endl cannot be used without a using namespace std; at the beginning of the file, or better: use std::cout and std::endl.
"Core" problems:
char* p = "School"; is a pointer to a string literal. This conversion is deprecated in C++03 and ill-formed since C++11. Aside from that, string literals are normally read-only, and attempts to modify them often result in segfaults (modifying a string literal is undefined behavior by the standard). So you have undefined behavior every time you modify something through p, because what it points to is the string literal.
More subtle (and the practical explanation): you are modifying p several times in the line std::cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<std::endl;. Before C++17 this is undefined behavior: the order used for the operations on p is not defined, and here it seems the compiler starts from the right. (C++17 sequences the operands of << from left to right, but the string-literal modification above remains undefined.) See "sequence points" and "sequenced before/after" for a better explanation.
You might be interested in the live code here, which is closer to what you seemed to expect from your program.

Let's assume you correct:
the header to <iostream> - there is no iostream.h header
your uses of cout and endl with std::cout and std::endl respectively
the return type of main to int
Okay,
char *p = "School";
The string literal "School" is of type "array of 7 const char." The conversion to char* was deprecated in C++03. In C++11, this is invalid.
c=++(*(p++));
Here we hit undefined behaviour. As I said before, the chars in a string literal are const. You simply can't modify them. The prefix ++ here will attempt to modify the S character in the string literal.
So from this point onwards, there's no use making conjectures about what should happen. You have undefined behaviour. Anything can happen.

Even if the preceding lines were legal, this line is also undefined behavior, which means that you cannot accurately predict what the output will be:
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
Notice how it modifies the value of p multiple times on that line (really between sequence points)? That's not allowed. At best you can say "on this compiler with this run-time library and this environment at this moment of execution I observed the following behavior", but because it is undefined behavior you can't count on it to do the same thing every time you run the program, or even if the same code is encountered multiple times within the same run of the program.

There are at least three problems with this code (and maybe more; I'm not a C++ expert).
The first problem is that string constants like "School" should not be modified, as they can be placed in read-only parts of the program memory that the OS maps directly to the exe file on disk (for instance, the OS may share them between several running instances of the same program, or avoid writing those parts of memory to the swap file when RAM is low, since it knows it can get the original from the exe). In fact, the example crashes on my compiler. To modify the string, you should allocate a modifiable duplicate of it, such as with strdup.
The second problem is it's using cout and endl from the std namespace without declaring that. You should prefix their accesses with std:: or add a using namespace std; declaration.
The third problem is that the order in which the operations on the second cout line happen is undefined behavior, leading to the apparently mysterious change of the string between the time it was displayed at the end of the first cout line and the next line.
Since this code is not intended to do anything in particular, there are different, valid ways you could fix it. This will probably run:
#include <iostream>
#include <string.h>
#include <stdlib.h>
using namespace std;
int main()
{
char *string = strdup("School");
char *p = string;
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
free(string);
}
(On my compiler this outputs T,chool on the first line and diool,i,d on the second.)
It still has undefined behavior though. To fix that, rework the second cout line as follows:
cout << p << ",";
cout << ++(*(p--)) << ",";
cout << ++(*(p++)) << endl;
That should give T,chool on the first line and chool,d,U on the second (assuming a character set that has A to Z in order).

p++ moves p from "School" to "chool". But since it is p++, not ++p, the expression yields the old value of p, so the character that gets incremented is the 'S' that p pointed to before it moved. Now c = 'T' (from 'S').
When you output p, you output the remainder of the string, which we identified before as "chool".
Since it is best to learn from trial and error, run this code with a debugger. It is a great tool that will serve you forever, and it will help with the second set of cout statements. If you need help with gdb or the VS debugger, we can walk through it.

Related

Is character array size dynamic in C/CPP/C++?

My knowledge till now was that arrays in C and CPP/C++ have fixed sizes. However, I recently encountered 2 pieces of code which seem to contradict this fact. I am attaching the pics here. I want to hear everyone's thoughts on how these are working. I'm also pasting the code and doubts here:
1.
#include <iostream>
#include <string.h>
using namespace std;
int main()
{
char str1[]="Good"; //size of str1 should be 5
char str2[]="Afternoon"; //size of str2 should be 10
cout<<"\nSize of str1 before the copy: "<<sizeof(str1);
cout<<"\nstr1: "<<str1;
strcpy(str1,str2); //copying str2 into str1
cout<<"\nSize of str1 after the copy: "<<sizeof(str1);
cout<<"\nstr1: "<<str1;
return 0;
}
O/P:
Size of str1 before the copy: 5
str1: Good
Size of str1 after the copy: 5
str1: Afternoon
In the first snippet I am using strcpy to copy the contents of char str2[], that is "Afternoon", into char str1[], whose size is 5 less than the size of str2. So theoretically the line strcpy(str1,str2) should give an error, as the size of str1 is less than the size of str2 and fixed. But it executes, and more surprising is the fact that even after str1 contains the word "Afternoon" the size is still the same.
2.
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char first_string[10]; // declaration of char array variable
char second_string[20]; // declaration of char array variable
int i; // integer variable declaration
cout<<"Enter the first string: ";
cin>>first_string;
cout<<"\nEnter the second string: ";
cin>>second_string;
for(i=0;first_string[i]!='\0';i++);
for(int j=0;second_string[j]!='\0';j++)
{
first_string[i]=second_string[j];
i++;
}
first_string[i]='\0';
cout<<"After concatenation, the string would look like: "<<first_string;
return 0;
}
O/P:
Enter the first string: good
Enter the second string: afternoon
After concatenation, the string would look like: goodafternoon
Here also, even if I provide a string of length 20 as input to second_string[], it's still able to concatenate both strings and put them in first_string[], even though the concatenated string will clearly be longer than first_string[], whose size is 10.
I tried to assign a string of greater length to a string variable of smaller length. Technically it should not work, but it worked anyway.
There are two misunderstandings here:
sizeof is the size of the array at compile time. It has nothing to do with the contents of the array. You can change the contents all you like and sizeof will still be the same. If you want the length of a string use the function strlen.
Most of the time when you break the rules of C++ it leads to undefined behaviour. Copying a string into an array that is too small to hold that string is one example of undefined behaviour.
You said
So theoretically the line strcpy(str1,str2) should give error as size of str1 is less than size of str2 and fixed.
This is untrue. Undefined behaviour does not mean that there must be an error. It means exactly what it says, the behaviour of your program is undefined, anything could happen. That might mean an error message, or it might mean a crash, or it might mean that your program seems to work. The behaviour is undefined.
You aren't alone in thinking as you did. I reckon the purpose of sizeof and the nature of undefined behaviour are two of the commonest beginner misunderstandings.
And to answer the question in the title: the size of a character array is fixed in C++; nothing in your example contradicts that.
I've honestly never seen a C++ programmer write char stringname[20] = "string";. That just isn't the way you'd handle strings in C++⁰.
And neither would a C programmer use array notation, because well, it's just not common; you'd typically use arrays for things that aren't strings, even if the type of a "string literal" is actually char[length + 1].
Your access beyond the end of an array is simply a bug. It is undefined behaviour. A buffer overflow. A static code analyzer, quite possibly even a compiler, would tell you that this is a mortal sin. The str* functions know literally nothing about the size of your array, they only see a pointer to the first element, and your array literally knows nothing about the length of the string it contains, which is given by the terminating zero character's position. You're mixing up two things here!
In C++, you'd definitely use the std::string class to read from cin, exactly to avoid the problem with buffer overflows.
So, honestly: if you're a C++ beginner, maybe try to ignore C strings for now. They're not the C++ way of dealing with string data other than fixed string literals (i.e., things between "" in your source code), and the C way of string handling is still the dominant cause of remotely exploitable bugs in software, as far as I can tell. C++ is not C and, honestly, when it comes to handling strings, that's for the better. Including both <string.h> and <iostream> is a pretty reliable indication of a programming beginner who has access to bad guides that treat C++ as extended C. But that's simply not true; it's a very different programming language with some far-reaching C compatibility, and you would not, and should not, mix these two languages: as a beginner, it's hard enough to learn one¹.
⁰ Technically speaking, it even feels wrong; a string literal in C++ is an array of const char, whereas in C it's an array of plain char. C and C++ are not the same language.
¹ If you feel like you're explaining C++ to people, and sometimes feel overwhelmed with making a good explanation for things to people who are not expert C programmers already, Kate Gregory made a nice talk on why teaching C to teach C++ is a really bad idea, which I agree with, even if she overstresses a few points.

strange behavior of std::string assign,clear and operator[]

I am observing some strange behavior of string operation.
Ex :
#include <iostream>
#include <string>
int main()
{
std::string name("ABCDEFGHIJ");
std::cout << "Hello, " << name << "!\n";
name.clear();
std::cout << "Hello, " << name << "!\n";
name.assign("ABCDEF",6);
std::cout << "Hello, " << name << "!\n";
std::cout << "Hello, " << name[8] << "!\n";
}
Output:
Hello, ABCDEFGHIJ!
Hello, !
Hello, ABCDEF!
Hello, I!
string::clear is actually not clearing because I am able to access the data even after clear. As per documentation when we are accessing something out of bound the result is undefined. But here I am getting the same result every time.
Can somebody explain how it works at the memory level when we call clear or operator[]?
Welcome to C++'s amazing attraction called "undefined behavior".
When name contains a six-character string, "ABCDEF", name[8] attempts to access a nonexistent member of the string, which is undefined behavior.
Which means that the result of this operation is completely meaningless.
The C++ standard does not define the result of accessing a nonexistent member character of the string; hence the undefined behavior. The potential results of this operation can be:
1. Some previous value that was in the string, at the given position.
2. Some garbage, random character.
3. Your program crashes.
4. Anything else.
5. A result that's different every time you execute the program, selected from options 1 through 4.
name.assign("ABCDEF",6);
Now the string has length 6. So you may legally only access elements 0 through 5.
std::cout << "Hello, " << name[8] << "!\n";
Therefore this is Undefined Behaviour. The compiler is free to do whatever the hell it pleases. Not just with the statement, but with the whole program, even the preceding lines!
At this time, it returned the character that used to be at that position earlier. It could have returned anything else, it could have crashed, it could have skipped that statement altogether, it could have skipped the assignment, and many other funny things (up to and including making demons fly out of your nose!).
And I am saying that because all that behaviour (except the demons) can actually be observed in the wild in various circumstances.
As others said, accessing a std::string outside its logical boundaries (i.e. [0, size()]; notice that size() is included) is undefined behavior, so the compiler can make anything happen.
Now, the particular flavor of UB you are seeing is nothing particularly unexpected.
clear() just zeroes the logical length of the string, but the memory that it used is retained (it's actually required by the standard, and quite some code would work way slower without this behavior).
Given that there's no good reason to waste time in zeroing out the old data, by accessing the string out of bounds you are seeing what was at that index previously.
This may change if you e.g. call the shrink_to_fit() method after clear(), which asks to the string to free all the extra memory it's keeping.
I'd like to add to the other answers that you can use std::string::at instead of using the operator[].
std::string::at does boundary checking and will throw a std::out_of_range when you try to access an element that is out of range.
I ran your code through a debugger. Take note of the capacity of the string: it is still 15, because assign did not change the capacity. So you won't get a "garbage" value, as everyone is saying; you're getting the exact same data, stored in the same location. As stated, the string is just a pointer to a memory address, and indexing goes x bytes past it to access the element. Since 8 is a constant offset, name[8] always reaches the exact same memory location.
Here is a picture of the string in the debugger:

C++ toupper Syntax

I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. I've been using it to convert every character of a string to uppercase where possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
According to the documentation, the character is passed by value.
Because of that, the answer is no, it shouldn't.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed and returned by value.
If you don't assign the returned value to a variable, it is simply lost.
That's why in your example it is assigned back, replacing the original one.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway, so the logic reduces to this:
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns the uppercase version of that character as an int. Whenever a function doesn't take a pointer or reference parameter, the argument is passed by value: the parameter is a copy of the variable passed to the function, so there is no way for the caller to see changes made to it inside the function. The way to get the result is to save what the function returns, in this case the upper-cased character.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."

Able to Access Elements with Index Greater than Array Length

The following code seems to be running when it shouldn't. In this example:
#include <iostream>
using namespace std;
int main()
{
char data[1];
cout<<"Enter data: ";
cin>>data;
cout<<data[2]<<endl;
}
Entering a string with a length greater than 1 (e.g., "Hello"), will produce output as if the array were large enough to hold it (e.g., "l"). Should this not be throwing an error when it tried to store a value that was longer than the array or when it tried to retrieve a value with an index greater than the array length?
The following code seems to be running when it shouldn't.
It is not about "should" or "shouldn't". It is about "may" or "may not".
That is, your program may run, or it may not.
It is because your program invokes undefined behavior. Accessing an array element beyond the array-length invokes undefined behavior which means anything could happen.
The proper way to write your code is to use std::string as:
#include <iostream>
#include <string>
//using namespace std; DONT WRITE THIS HERE
int main()
{
std::string data;
std::cout<<"Enter data: ";
std::cin>>data; //read the entire input string, no matter how long it is!
std::cout<<data<<std::endl; //print the entire string
if ( data.size() > 2 ) //check if data has at least 3 characters
{
std::cout << data[2] << std::endl; //print 3rd character
}
}
It can crash with different compilation options or when compiled on another machine, because, according to the documentation, running that code gives an undefined result.
It is not safe to be doing this. What it is doing is writing over the memory that happens to lie after the buffer. Afterwards, it is then reading it back out to you.
This only appears to work because your cin and cout operations don't say: "This is a pointer to one char, I will only write one char." Instead they assume enough space has been allocated for them. cin keeps reading until it hits whitespace, and cout keeps printing until it hits the null terminator \0.
To fix this, you can replace this with:
std::string data;
C++ will let you make big memory mistakes.
Some 'rules' that will save you most of the time:
1: Don't use char[]. Instead use string.
2: Don't use pointers to pass or return arguments. Pass by reference, return by value.
3: Don't use arrays (e.g. int[]). Use vectors. You still have to check your own bounds.
With just those three you'll be writing somewhat "safe", non-C-like code.

C/C++: Optimization of pointers to string constants

Have a look at this code:
#include <iostream>
using namespace std;
int main()
{
const char* str0 = "Watchmen";
const char* str1 = "Watchmen";
char* str2 = "Watchmen";
char* str3 = "Watchmen";
cerr << static_cast<void*>( const_cast<char*>( str0 ) ) << endl;
cerr << static_cast<void*>( const_cast<char*>( str1 ) ) << endl;
cerr << static_cast<void*>( str2 ) << endl;
cerr << static_cast<void*>( str3 ) << endl;
return 0;
}
Which produces an output like this:
0x443000
0x443000
0x443000
0x443000
This was on the g++ compiler running under Cygwin. The pointers all point to the same location even with no optimization turned on (-O0).
Does the compiler always optimize so much that it searches all the string constants to see if they are equal? Can this behaviour be relied on?
It can't be relied on; it is an optimization that is not part of any standard.
I changed the corresponding lines of your code to:
const char* str0 = "Watchmen";
const char* str1 = "atchmen";
char* str2 = "tchmen";
char* str3 = "chmen";
The output for the -O0 optimization level is:
0x8048830
0x8048839
0x8048841
0x8048848
But for the -O1 it's:
0x80487c0
0x80487c1
0x80487c2
0x80487c3
As you can see, GCC (v4.1.2) reused the first string for all subsequent substrings. It's the compiler's choice how to arrange string constants in memory.
It's an extremely easy optimization, probably so much so that most compiler writers don't even consider it much of an optimization at all. Setting the optimization flag to the lowest level doesn't mean "Be completely naive," after all.
Compilers will vary in how aggressive they are at merging duplicate string literals. They might limit themselves to a single subroutine — put those four declarations in different functions instead of a single function, and you might see different results. Others might do an entire compilation unit. Others might rely on the linker to do further merging among multiple compilation units.
You can't rely on this behavior, unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. I'd be wary about relying on it in my own code, even if portability weren't a concern, because behavior is liable to change even between different versions of a single vendor's compiler.
You surely should not rely on that behavior, but most compilers will do this. A literal value ("Hello", 42, etc.) will typically be stored once, and any pointers to it will naturally resolve to that single copy.
If you find that you need to rely on that, then be safe and recode as follows:
const char *watchmen = "Watchmen";
const char *foo = watchmen;
const char *bar = watchmen;
You shouldn't count on that of course. An optimizer might do something tricky on you, and it should be allowed to do so.
It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.
He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:
if (3 = x) break;
Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.
Anyway, it's clear the C compiler was putting all literal 3's in the same place. If a C compiler back in the '80s was capable of doing this, it can't be too tough to do. I'd expect it to be very common.
I would not rely on the behavior, because I am doubtful the C or C++ standards would make explicit this behavior, but it makes sense that the compiler does it. It also makes sense that it exhibits this behavior even in the absence of any optimization specified to the compiler; there is no trade-off in it.
All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:
char *s = "literal";
You are, in a sense, casting away the const-ness of the string. Nevertheless, you can't do away with the read-only attribute of the string: if you try to modify it, you'll typically be caught at run time (for example, by a crash) rather than at compile time. (Which is actually a good reason to use const char * when assigning string literals to a variable of yours.)
No, it can't be relied on, but storing read-only string constants in a pool is a pretty easy and effective optimization. It's just a matter of storing an alphabetical list of strings, and then outputting them into the object file at the end. Think of how many "\n" or "" constants are in an average code base.
If a compiler wanted to get extra fancy, it could reuse suffixes too: "\n" can be represented by pointing to the last character of "Hello\n". But that likely comes with very little benefit for a significant increase in complexity.
Anyway, I don't believe the standard says anything about where anything is stored really. This is going to be a very implementation-specific thing. If you put two of those declarations in a separate .cpp file, then things will likely change too (unless your compiler does significant linking work.)