How does C++ interpret a cout with a '+' in it? - c++

I've been moving back and forth with Java/C++ so I messed up with my console output and accidentally wrote lines like:
cout << "num" + numSamples << endl;
cout << "max" + maxSampleValue << endl;
Each of which gave me bits and pieces of other strings I had in my program. I realize my mistake now, but what was C++ interpreting those lines as so that they output different parts of different strings in my program?

There is a simple reason for this:
"num" and "max" are string literals. Their type is const char[4], which decays to const char * in this expression.
Assuming numSamples is an integer, what you are doing is pointer arithmetic.
You are basically printing whatever string starts numSamples bytes past the address of "num".
If you did cout << "num" + 1 << endl this would print "um".
You probably figured it out, but the correct way to do this is: cout << "num" << numSamples << endl;
Also, you asked:
But what was C++ interpreting those lines as so that they output
different parts of different strings in my program?
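As stated before, "num" is a string literal. String literals generally sit together in the same place in the binary: the read-only data section (.rodata on many platforms). All your other string literals sit in this region too, so when you advance the pointer by some number of bytes, it will likely point into some other string literal, and (part of) that literal gets printed. Strictly speaking, moving the pointer outside the array is undefined behavior, so none of this is guaranteed.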

It is just pointer arithmetic. For instance,
cout << "num" + 1 << endl;
would print a string starting from ((address of)"num" + 1), i.e. "um".
If the value we add moves the pointer beyond the end of the array that holds the literal, then we have undefined behavior.

Keep in mind that although iostreams overload << (and >>) to do I/O, these were originally defined as bit-shift operators. When the compiler's parsing the expression, it's basically seeing "operand, operator, operand", then checking whether the operands are of types to which that operator can be applied.
In this case, we have a string literal and (apparently) some sort of integer, and the compiler knows how to do math on those (after converting the string literal to a pointer).
From the compiler's viewpoint, however, this isn't a whole lot different from something like:
int a, b=1, c=2, d = 3;
a = b << c + d;
The difference is that with operands of type int, the meaning is fairly obvious: it's doing addition and bit-shifting. With iostreams, the meaning attached to << and >> changes, but the syntax allowed for them in an expression is unchanged.
From the compiler's "viewpoint" the only question is whether the operands to + are of types for which + is allowed--in this case, you have pointer to char + integer, so that's allowed. The result is a pointer to char, and it has an overload of << that takes a left-hand operand of type ostream and a right-hand operand of type pointer to char, so the expression as a whole is fine (as far as it cares).

"num" type is char const [4]. If numSamples is integer, "num" + numSamples type is const char *. So you call operator << cor std::cout, that overloaded for const char * and prints string that starts from address addres("num") + numSamples in pointer arithmetic.
Try this:
cout << "num" + std::to_string(numSamples) << endl;
cout << "max" + std::to_string(maxSampleValue) << endl;
You can find the std::to_string() function in <string> (since C++11).

If you look at the operator precedence table, you will see that + binds more tightly than <<, so this subexpression is evaluated first and its result is then passed to operator <<:
"num" + numSamples
Now "num" decays to a const char * and I'm assuming numSamples is an integral type. Since the left-hand side of the + is a pointer and you are adding to it, that's pointer arithmetic. cout now gets a pointer to a location in memory that is numSamples or maxSampleValue bytes past the location of "num" or "max". Most likely all of your string literals were laid out in the same region of memory, which is why you saw pieces of them rather than random gibberish.

Related

C++ Comparison of String Literals

I'm a c++ newbie (just oldschool c). My son asked for help with this and I'm unable to explain it. If he had asked me "how do I compare strings" I would have told him to use strcmp(), but that isn't what is confusing me. Here is what he asked:
int main()
{
cout << ("A"< "Z");
}
will print 1
int main()
{
cout << ("Z"< "A");
}
will also print 1, but
int main()
{
cout << ("Z"< "A");
cout << ("A"< "Z");
}
will then print 10. Individually both cout statements print 1, but executed in a row I get a different answer?
You are comparing memory addresses. Apparently your compiler places the string literals in memory in the order it encounters them, so the first is "lesser" than the second.
Since in the first snippet it sees "A" first and "Z" second, "A" is lesser. Since it sees "Z" first in the second, "Z" is lesser. In the last snippet, it already has literals "A" and "Z" placed when the second command rolls around.
String literals have static storage duration. All of these comparisons compare the addresses of the memory the compiler allocated for the string literals. It seems that the first string literal the compiler encounters is stored at a lower address than the next one it encounters.
Thus in this program
int main()
{
cout << ("Z"< "A");
cout << ("A"< "Z");
}
string literal "Z" was allocated at a lower address than string literal "A" because the compiler found it first.
Take into account that comparison
cout << ("A"< "A");
can give different results depending on the options of the compiler because the compiler may either allocate two extents of memory for the string literals or use only one copy of the string literals that are the same.
From the C++ Standard (2.14.5 String literals)
12 Whether all string literals are distinct (that is, are stored in
nonoverlapping objects) is implementation defined. The effect of
attempting to modify a string literal is undefined.
The same is valid for C.
In the statement:
cout << ("A"< "Z");
You have created 2 string literals: "A" and "Z". These are of type const char[2], an array holding the character and a null terminator, and in this expression each decays to a const char * pointing at its first character. The comparison here is comparing the pointers and not the values that they point to. This comparison of memory addresses is what gives you the compiler warning. The result is determined by where the compiler places the literals in memory, which is somewhat arbitrary from compiler to compiler. In this case it looks like the first literal found gets the lower address with your compiler.
Just like in C, to compare these string literals properly you need to use strcmp, which does a value comparison.
However, when you do it the more idiomatic C++ way:
cout << (std::string("A") < std::string("Z"));
Then you get the proper comparison of the values as that comparison operator is defined for std::string.
If you want to compare actual C++ strings, you need to declare C++ strings:
int main()
{
const std::string a("A");
const std::string z("Z");
cout << (z < a) << endl; // false
cout << (a < z) << endl; // true
}
In C++, the results are unspecified. I will be using N3337 for C++11.
First, we have to look at what the type of a string literal is.
§2.14.5
9 Ordinary string literals and UTF-8 string literals are also
referred to as narrow string literals. A narrow string literal has
type "array of n const char", where n is the size of the string
as defined below, and has static storage duration (3.7).
Arrays are colloquially said to decay to pointers.
§4.2
1 An lvalue or rvalue of type "array of N T" or "array of unknown
bound of T" can be converted to a prvalue of type "pointer to T".
The result is a pointer to the first element of the array.
Since your string literals both contain one character, they're the same type (const char[2], including the null character).
Therefore the following paragraph applies:
§5.9
2 [...]
Pointers to objects or functions of the same type (after pointer
conversions) can be compared, with a result defined as follows:
[...]
— If two pointers p and q of the same type point to different
objects that are not members of the same object or elements of the
same array or to different functions, or if only one of them is null,
the results of p<q, p>q, p<=q, and p>=q are unspecified.
Unspecified means that the behavior depends on the implementation. We can see that GCC gives a warning about this:
warning: comparison with string literal results in unspecified behaviour [-Waddress]
std::cout << ("Z" < "A");
The behavior may change across compilers or compiler settings; for what happens in practice, see Wintermute's answer.
You are comparing memory addresses. The example that follows shows how to compare 2 strings:
#include "stdafx.h"
#include <iostream>
#include <cstring> //prototype for strcmp()
int _tmain(int argc, _TCHAR* argv[])
{
using namespace std;
cout << strcmp("A", "Z") << endl; // prints a negative value (often -1)
cout << strcmp("Z", "A") << endl; // prints a positive value (often 1)
return 0;
}
The string constants ("A" and "Z") in C++ are represented by the C concept - array of characters where the last character is '\0'. Such constants have to be compared with strcmp() type of function.
If you would like to use the C++ std::string comparison you have to explicitly state it:
cout << (std::string( "A") < "Z");
A string literal used like this is converted to a pointer to a memory area. So at first you compare only memory addresses with code such as
"Z"< "A"
Comparing strings is done with functions. Which one depends on what kind of string you have. You have char array strings, but they might also be objects, and those objects have other comparison functions. For instance, CString in MFC has Compare but also CompareNoCase.
For your strings you should use strcmp. If you debug and step in, you can see what the function does: it compares the strings char by char and returns a negative or positive integer at the first difference, or zero if they are the same.
int result = strcmp("Z", "A");

Why does cout << &r give different output than cout << (void*)&r?

This might be a stupid question, but I'm new to C++ so I'm still fooling around with the basics. Testing pointers, I bumped into something that didn't produce the output I expected.
When I ran the following:
char r ('m');
cout << r << endl;
cout << &r << endl;
cout << (void*)&r << endl;
I expected this:
m
0042FC0F
0042FC0F
..but I got this:
m
m╠╠╠╠ôNh│hⁿB
0042FC0F
I was thinking that perhaps since r is of type char, cout would interpret &r as a char* and [for some reason] output the pointer value - the bytes comprising the address of r - as a series of chars, but then why would the first one be m, the content of the address pointed to, rather than the char representation of the first byte of the pointer address? It was as if cout interprets &r as r, but instead of just outputting 'm', it goes on to output more chars, interpreted from the byte values of the subsequent 11 memory addresses. Why? And why 11?
I'm using MSVC++ (Visual Studio 2013) on 64 bit Win7.
Postscript: I got a lot of correct answers here (as expected, given the trivial nature of the question). Since I can only accept one, I made it the first one I saw. But thanks, everyone.
So to summarize and expand on the instinctive theories mentioned in my question:
Yes, cout does interpret &r as char*, but since char* is a 'special thing' in C++ that essentially means a null terminated string (rather than a pointer [to a single char]), cout will attempt to print out that string by outputting chars (interpreted from the byte contents of the memory address of r onwards) until it encounters '\0'. Which explains the 11 extra characters (it just coincidentally took 11 more bytes to hit that NUL).
And for completeness - the same code, but with int instead of char, performs as expected:
int s (3);
cout << s << endl;
cout << &s << endl;
cout << (void*)&s << endl;
Produces:
3
002AF940
002AF940
A char * is a special thing in C++, inherited from C. It is, in most circumstances, a C-style string. It is supposed to point to an array of chars, terminated with a 0 (a NUL character, '\0').
So it tries to print this, following on in to the memory after the 'm', looking for a terminating '\0'. This makes it print some random garbage. This is known as Undefined Behaviour.
There is an operator<< overload specifically for char* strings. This outputs the null-terminated string, not the address. Since the pointer you're passing this overload isn't a null-terminated string, you also get Undefined Behavior when operator<< runs past the end of the buffer.
Conversely, the void* overload will print the address.
Because operator<< is overloaded based on the data type.
If you give it a char, it assumes you want that character.
If you give it a void*, it assumes you want an address.
However, if you give it a char*, it takes that as a C-style string and attempts to output it as such. Since the original intent of C++ was "C with classes", handling of C-style strings was a necessity.
The reason you get all the rubbish at the end is simply because, despite your assertion to the compiler, it isn't actually a C-style string. Specifically, it is not guaranteed to have a string-terminating NUL character at the end so the output routines will just output whatever happens to be in memory after it.
This may work (if there's a NUL there), it may print gibberish (if there's a NUL nearby), or it may fall over spectacularly (if there's no NUL before it gets to memory it cannot read). It's not something you should rely on.
Because there's an overload of operator<< which takes a const char pointer as its second argument and prints out a string. The overload that takes a void pointer prints only the address.
A char * is often - usually even - a pointer to a C-style null-terminated string (or a string literal) and is treated as such by ostreams. A void * by contrast unambiguously indicates a pointer value is required.
The output operator (operator<<()) is overloaded for char const* and void const*. When passing a char* the overload for char const* is a better match and chosen. This overload expects a pointer to the start of a null terminated string. You give it a pointer to an individual char, i.e., you get undefined behavior.
If you want to try with a well-defined example you can use
char s[] = { 'm', 0 };
std::cout << s[0] << '\n';
std::cout << &s[0] << '\n';
std::cout << static_cast<void*>(&s[0]) << '\n';

Why doesn't a character array return a memory address when called directly?

I understand from here that the name of an array is the address of the first element in the array, so this makes sense to me:
int nbrs[] = {1,2};
cout << nbrs << endl; // Outputs: 0x28ac60
However, why is the entire C-string returned here and not the address of ltrs?
char ltrs[] = "foo";
cout << ltrs << endl; // Outputs: foo
Because iostreams have an overload for char * that prints out what the pointer refers to, up to the first byte that contains a \0.
If you want to print out the address, cast to void * first.
cout has operator<<() overloaded for char*, so that it outputs every character until it reaches a null character rather than outputting the address held in the pointer.
cout, and generally, C++ streams, can handle C strings in a special way. cout operators <<, >> are overloaded to handle a number of different things, and this is one of them.

C++ Concatenation Oops

So I've been going back and forth between C++, C# and Java lately, and while writing some C++ code I did something like this.
string LongString = "Long String";
char firstChar = LongString.at(0);
And then tried using a method that looks like this,
void MethodA(string str)
{
//some code
cout << str;
//some more code
}
Here's how I implemented it.
MethodA("1. "+ firstChar );
Though perfectly valid in C# and Java, this did something weird in C++.
I expected something like
//1. L
but it gave me part of some other string literal later in the program.
what did I actually do?
I should note I've fixed the mistake so that it prints what I expect but I'm really interested in what I mistakenly did.
Thanks ahead of time.
C++ does not define addition on string literals as concatenation. Instead, a string literal decays to a pointer to its first element; a single character is interpreted as a numeric value so the result is a pointer offset from one location in the program's read-only memory segment to another.
To get addition as concatenation, use std::string:
MethodA(std::string() + "1. " + firstChar);
MethodA(std::string("1. ")+ firstChar );
(since "1. " is const char[4] and has no concat methods)
The problem is that "1. " is a string literal (array of characters), that will decay into a pointer. The character itself is a char that can be promoted to an int, and addition of a const char* and an int is defined as calculating a new pointer by offsetting the original pointer by that many positions.
Your code in C++ is calling MethodA with the result of adding (int)firstChar (ASCII value of the character) to the string literal "1. ", which if the value of firstChar is greater than 4 (which it probably is) will be undefined behavior.
MethodA("1. "+ firstChar ); //your code
doesn't do what you want it to do. It is pointer arithmetic: it just adds an integral value (which is firstChar) to the address of the string literal "1. ", then the result (which is of char const* type) is passed to the function, where it is converted into string type. Depending on the value of firstChar, it could invoke undefined behavior. In fact, in your case, it does invoke undefined behavior, because the resulting pointer points beyond the string literal.
Write this:
MethodA(string("1. ")+ firstChar ); //my code
String literals in C++ are not instances of std::string, but rather constant arrays of chars. So adding a char to one triggers an implicit conversion to a character pointer, which is then advanced by the numerical value of the character, and the result happened to point into another string literal stored in the program's read-only data section.

What does the + operator do in cout?

In the following code I got confused and added a + where it should be <<
#include <iostream>
#include "Ship.h"
using namespace std;
int main()
{
cout << "Hello world!" << endl;
char someLetter = aLetter(true);
cout <<"Still good"<<endl;
cout << "someLetter: " + someLetter << endl;
return 0;
}
Should be
cout << "someLetter: " << someLetter << endl;
The incorrect code outputted:
Hello world!
Still good
os::clear
What I don't understand is why the compiler didn't catch any errors and what does os::clear mean? Also why wasn't "someLetter: " at the start of the line?
Here, "someLetter: " is a string literal, i.e. an array of const char that decays to a const char * pointer, usually pointing to a read-only area of memory where all the string literals are stored.
someLetter is a char, so "someLetter: " + someLetter performs pointer arithmetic and adds the value of someLetter to the address stored in the pointer. The end result is a pointer that points somewhere past the string literal you intended to print.
In your case, it seems the pointer ends up in the symbol table and pointing to the second character of the name of the ios::clear method. This is completely arbitrary though, the pointer might end up pointing to another (possibly inaccessible) location, depending on the value of someLetter and the content of the string literal storage area. In summary, this behavior is undefined, you cannot rely on it.
The + operator has nothing to do with cout.
As seen in this table, + has higher precedence than <<, so the offending line of code gets parsed as follows:
(cout << ("someLetter: " + someLetter)) << endl;
In other words, + is applied to a char pointer and a char. A char is an integral data type, so you are really performing pointer arithmetic, adding the integer value of the char on the right-hand side to the pointer on the left-hand side, producing a new char pointer.
The + is doing pointer arithmetic on "someLetter: ".
I think that the C string "someLetter: " is using the char someLetter as an index, and therefore pointing to some part of memory. Hence the behaviour.
In C++ if you do silly things you get strange behaviour. The language gives you plenty of rope to hang yourself with.
You have to remember that string literals decay to plain pointers to some memory area. What "someLetter: " + someLetter does is add a value to that pointer and then try to print whatever it points to.