C++ Comparison of String Literals - c++

I'm a c++ newbie (just oldschool c). My son asked for help with this and I'm unable to explain it. If he had asked me "how do I compare strings" I would have told him to use strcmp(), but that isn't what is confusing me. Here is what he asked:
int main()
{
cout << ("A"< "Z");
}
will print 1
int main()
{
cout << ("Z"< "A");
}
will also print 1, but
int main()
{
cout << ("Z"< "A");
cout << ("A"< "Z");
}
will then print 10. Individually both cout statements print 1, but executed in a row I get a different answer?

You are comparing memory addresses. Apparently your compiler places the string literals in memory in the order it encounters them, so the first is "lesser" than the second.
Since in the first snippet it sees "A" first and "Z" second, "A" is lesser. Since it sees "Z" first in the second, "Z" is lesser. In the last snippet, it already has literals "A" and "Z" placed when the second command rolls around.

String literals have static storage duration. In all these comparisons there are compared addresses of memory allocated by the compiler for string literals. It seems that the first string literal that is encountered by the compiler is stored in memory with a lower address compared with the next encountered string literal.
Thus in this program
int main()
{
cout << ("Z"< "A");
cout << ("A"< "Z");
}
string literal "Z" was alllocated with a lower address than string literal "A" because it was found first by the compiler.
Take into account that comparison
cout << ("A"< "A");
can give different results depending on the options of the compiler because the compiler may either allocate two extents of memory for the string literals or use only one copy of the string literals that are the same.
From the C++ Standard (2.14.5 String literals)
12 Whether all string literals are distinct (that is, are stored in
nonoverlapping objects) is implementation defined. The effect of
attempting to modify a string literal is undefined.
The same is valid for C.

In the statement:
cout << ("A"< "Z");
You have created 2 string literals: "A" and "Z". These are of type const char * which is a pointer to a null terminated array of characters. The comparison here is comparing the pointers and not the values that they point to. It's this comparing of memory addresses here which is what gives you the compiler warning. The result of the comparison is going to be determined by where the compiler allocated the memory to which is going to be somewhat arbitrary from compiler to compiler. In this case it looks like the first literal found is getting assigned the first memory address by your compiler.
Just like in C to compare these string literals properly you need to use strcmp which will do a value comparison.
However when you do something the more idiomatic c++ way by doing:
cout << (std::string("A") < std::string("Z"));
Then you get the proper comparison of the values as that comparison operator is defined for std::string.

If you want to compare actual C++ strings, you need to declare C++ strings:
int main()
{
const std::string a("A");
const std::string z("Z");
cout << (z < a) << endl; // false
cout << (a < z) << endl; // true
}

In C++, the results are unspecified. I will be using N3337 for C++11.
First, we have to look at what the type of a string literal is.
§2.14.5
9 Ordinary string literals and UTF-8 string literals are also
referred to as narrow string literals. A narrow string literal has
type "array of n const char", where n is the size of the string
as defined below, and has static storage duration (3.7).
Arrays are colloquially said to decay to pointers.
§4.2
1 An lvalue or rvalue of type "array of N T" or "array of unknown
bound of T" can be converted to a prvalue of type "pointer to T".
The result is a pointer to the first element of the array.
Since your string literals both contain one character, they're the same type (char[2], including the null character.)
Therefore the following paragraph applies:
§5.9
2 [...]
Pointers to objects or functions of the same type (after pointer
conversions) can be compared, with a result defined as follows:
[...]
— If two pointers p and q of the same type point to different
objects that are not members of the same object or elements of the
same array or to different functions, or if only one of them is null,
the results of p<q, p>q, p<=q, and p>=q are unspecified.
Unspecified means that the behavior depends on the implementation. We can see that GCC gives a warning about this:
warning: comparison with string literal results in unspecified behaviour [-Waddress]
std::cout << ("Z" < "A");
The behavior may change across compilers or compiler settings but in practice for what happens, see Wintermute's answer.

You are comparing memory addresses. The example that follow explains how to compare 2 strings:
#include "stdafx.h"
#include <iostream>
#include <cstring> //prototype for strcmp()
int _tmain(int argc, _TCHAR* argv[])
{
using namespace std;
cout << strcmp("A", "Z"); // will print -1
cout << strcmp("Z", "A"); // will print 1
return 0;
}

The string constants ("A" and "Z") in C++ are represented by the C concept - array of characters where the last character is '\0'. Such constants have to be compared with strcmp() type of function.
If you would like to use the C++ std::string comparison you have to explicitly state it:
cout << (std::string( "A") < "Z");

A String is representing a pointer to memory area. So you at first compare only memory addresses with such code
"Z"< "A"
comparing strings is done with functions. They depend on "what kind of string" you have. You have char array strings, but they mid also be objects. These objects have other comparision functions. For instance the CString in MFC has the Compare but also the CompareNoCase function.
For your strings you best use the strcmp. If you debug and step in you see what the function does: it compares every char of both strings and return an integer if the first difference occurs or zero if the same.
int result = strcmp("Z", "A");
Here you find some further sample code

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer a specific char object, it does not contain any size or range information of the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way, that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in the .text section of the object file, not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply put its value in the free space of the .text section. The specifics depend on the compiler you use, but be assured the people who wrote are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there."
const char *b = "Hello there."
cout << (a == b);
// prints "1" which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on unix to see all the strings in an executable easily i.e. strings a.out. You will see many more strings than what you coded as many exist as part of the standard library other required things for an executable.)
So how does it know to just print the string and then stop at the end? Well a c-style string is required to end with a null byte (\0). The complier implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there."
// This is valid
const char *c = b + 3; // *c = "re."
// This, however, is not valid
const char d[] = b + 3;

difference beеween compare array of char and pointer to char string with char string

#include <iostream>
int main() {
char* a = "test";
char b[] = "test";
if ( a == "test" ) // work
std::cout << "1";
if ( b == "test" ) // don't
std::cout << "2";
}
What exactly happend in both variants? Just memmory adress compare?
In both cases you are not comparing the actual strings (use strcmp for this), but addresses:
In the first case, you are comparing the address stored in a - the start address of a string literal "test" - with the start address of a (conceptually) different string literal, that happens to have the same content. However, if there are multiple identical string literals in your code, the compiler is allowed to store them all in the same place to save memory and as a result, the comparison yields true (although this is not guaranteed to happen every time).
In the second case however, you are comparing the address of the first element of b with that of the string literal. Here, b is a local array that contains a copy of the string "test" but resides at a completely different memory region, so this comparison fails (and will always fail)
Note: Unless you have a very good reason not to, you should of course - as mentioned by PaulEvans - use std::string instead of an char array to store strings. This will give you all the nice value semantic properties and operator overloads you'd expect.
In both cases you're comparing pointers not strings so it's blind luck any of them worked.
The best way to to compare strings is with std::string. Something like:
std::string c = "test";
if (c == "test")
std::cout << "c really is \"test\"!\n";

Const char* pointers

I have a miss-understanding with pointers.
const char* p = "Some text";
const char* q = "Some text";
if (p == q)
cout << "\nSAME ADDRESS!";
cout << "\np = " << p;
cout << "\nq = " << q; //same outputs "Some text"
In Visual Studio 2015 the syntax if (p == q) doesn't compare addresses, it compares the values... I thought it was supposed to be something like if (*p == *q).
So how do I compare addresses? I thought using if (&p == &q), but it's been said that it would be addresses of pointers, not the address of what they point to.
I think you're mixing up two different concepts. If you have two pointers p and q of any type, then p == q always compares the addresses stored in the pointers rather than the objects being pointed at. As a result, the statement
if (p == q)
tests whether p and q are literally pointing at the same object. As to why you're getting back the same objects - many compilers, as an optimization, will pool together all string literals of the same value into a single string literal and set all pointers to share it. This optimization saves space in the binary. It's also why it's undefined behavior to write to a string literal - if you could write to a string literal, then you might accidentally pollute other strings initialized to that literal.
Independently, the streams library is specifically programmed so that if you try to print out a const char *, it will treat that const char* as a pointer to the start of a null-terminated string, and then print out all of the characters in that string.
If you wanted to instead print out the addresses, you can cast the pointers to void* first:
cout << static_cast<const void*>(p) << endl;
cout << static_cast<const void*>(q) << endl;
The reason this works is that the streams library is designed so that if you try to print out a pointer that isn't a character pointer, it displays the address. The typecast basically tells the compiler "please forget that this is actually a char* and instead treat it like a generic pointer."
Similarly, if you want to compare the strings themselves, use strcmp:
if (strcmp(p, q) == 0) {
// Equal!
}
That said, in C++, you should probably consider using std::string instead of const char*, since it's safer and easier to use.
In Visual studio 2015 the syntax if (p == q) doesn't comparing addresses, it compares the values...
It does compare addresses. String (or other) literals with the exact same content will be optimized to be instantiated only once (at the same address).
To check the actual address use a statement like
cout << (void*)p << ' ' << (void*)q << endl;
Usually it depends on compiler options (implementation-defined) whether the compiler will store identical string literals as one string literal or separatly.
So for this code snippet
const char* p = "Some text";
const char* q = "Some text";
if (p == q)
the condition in the if statement can either evaluate to true or to false.
From the C++ Standard (2.14.5 String literals)
12 Whether all string literals are distinct (that is, are stored in
nonoverlapping objects) is implementationdefined. The effect of
attempting to modify a string literal is undefined.
Thus in this statement
if (p == q)
there are indeed compared pointers but the result of the comparison is implementation-defined and in general can be controlled by the programmer by means of setting compiler options.

Why does cout << &r give different output than cout << (void*)&r?

This might be a stupid question, but I'm new to C++ so I'm still fooling around with the basics. Testing pointers, I bumped into something that didn't produce the output I expected.
When I ran the following:
char r ('m');
cout << r << endl;
cout << &r << endl;
cout << (void*)&r << endl;
I expected this:
m
0042FC0F
0042FC0F
..but I got this:
m
m╠╠╠╠ôNh│hⁿB
0042FC0F
I was thinking that perhaps since r is of type char, cout would interpret &r as a char* and [for some reason] output the pointer value - the bytes comprising the address of r - as a series of chars, but then why would the first one would be m, the content of the address pointed to, rather than the char representation of the first byte of the pointer address.. It was as if cout interprets &r as r but instead of just outputting 'm', it goes on to output more chars - interpreted from the byte values of the subsequent 11 memory addresses.. Why? And why 11?
I'm using MSVC++ (Visual Studio 2013) on 64 bit Win7.
Postscript: I got a lot of correct answers here (as expected, given the trivial nature of the question). Since I can only accept one, I made it the first one I saw. But thanks, everyone.
So to summarize and expand on the instinctive theories mentioned in my question:
Yes, cout does interpret &r as char*, but since char* is a 'special thing' in C++ that essentially means a null terminated string (rather than a pointer [to a single char]), cout will attempt to print out that string by outputting chars (interpreted from the byte contents of the memory address of r onwards) until it encounters '\0'. Which explains the 11 extra characters (it just coincidentally took 11 more bytes to hit that NUL).
And for completeness - the same code, but with int instead of char, performs as expected:
int s (3);
cout << s << endl;
cout << &s << endl;
cout << (void*)&s << endl;
Produces:
3
002AF940
002AF940
A char * is a special thing in C++, inherited from C. It is, in most circumstances, a C-style string. It is supposed to point to an array of chars, terminated with a 0 (a NUL character, '\0').
So it tries to print this, following on in to the memory after the 'm', looking for a terminating '\0'. This makes it print some random garbage. This is known as Undefined Behaviour.
There is an operator<< overload specifically for char* strings. This outputs the null-terminated string, not the address. Since the pointer you're passing this overload isn't a null-terminated string, you also get Undefined Behavior when operator<< runs past the end of the buffer.
Conversely, the void* overload will print the address.
Because operator<< is overloaded based on the data type.
If you give it a char, it assumes you want that character.
If you give it a void*, it assumes you want an address.
However, if you give it a char*, it takes that as a C-style string and attempts to output it as such. Since the original intent of C++ was "C with classes", handling of C-style strings was a necessity.
The reason you get all the rubbish at the end is simply because, despite your assertion to the compiler, it isn't actually a C-style string. Specifically, it is not guaranteed to have a string-terminating NUL character at the end so the output routines will just output whatever happens to be in memory after it.
This may work (if there's a NUL there), it may print gibberish (if there's a NUL nearby), or it may fall over spectacularly (if there's no NUL before it gets to memory it cannot read). It's not something you should rely on.
Because there's an overload of operator<< which takes a const char pointer as it's second argument and prints out a string. The overload that takes a void pointer prints only the address.
A char * is often - usually even - a pointer to a C-style null-terminated string (or a string literal) and is treated as such by ostreams. A void * by contrast unambiguously indicates a pointer value is required.
The output operator (operator<<()) is overloaded for char const* and void const*. When passing a char* the overload for char const* is a better match and chosen. This overload expects a pointer to the start of a null terminated string. You give it a pointer to an individual char, i.e., you get undefined behavior.
If you want to try with a well-defined example you can use
char s[] = { 'm', 0 };
std::cout << s[0] << '\n';
std::cout << &s[0] << '\n';
std::cout << static_cast<void*>(&s[0]) << '\n';

C++ Concatenation Oops

So I've been going back and forth from C++, C# and Java lately and well writing some C++ code I did something like this.
string LongString = "Long String";
char firstChar = LongString.at(0);
And then tried using a method that looks like this,
void MethodA(string str)
{
//some code
cout << str;
//some more code }
Here's how I implemented it.
MethodA("1. "+ firstChar );
though perfectly valid in C# and Java this did something weird in C++.
I expected something like
//1. L
but it gave me part of some other string literal later in the program.
what did I actually do?
I should note I've fixed the mistake so that it prints what I expect but I'm really interested in what I mistakenly did.
Thanks ahead of time.
C++ does not define addition on string literals as concatenation. Instead, a string literal decays to a pointer to its first element; a single character is interpreted as a numeric value so the result is a pointer offset from one location in the program's read-only memory segment to another.
To get addition as concatenation, use std::string:
MethodA(std::string() + "1. " + firstChar);
MethodA(std::string("1. ")+ firstChar );
since "1. " is const char[4] and has no concat methods)
The problem is that "1. " is a string literal (array of characters), that will decay into a pointer. The character itself is a char that can be promoted to an int, and addition of a const char* and an int is defined as calculating a new pointer by offsetting the original pointer by that many positions.
Your code in C++ is calling MethodA with the result of adding (int)firstChar (ASCII value of the character) to the string literal "1. ", which if the value of firstChar is greater than 4 (which it probably is) will be undefined behavior.
MethodA("1. "+ firstChar ); //your code
doesn't do what you want it to do. It is a pointer arithmetic : it just adds an integral value (which is firstChar) to the address of string-literal "1. ", then the result (which is of char const* type) is passed to the function, where it converts into string type. Based on the value of firstChar, it could invoked undefined behavior. In fact, in your case, it does invoke undefined behavior, because the resulting pointer points to beyond the string-literal.
Write this:
MethodA(string("1. ")+ firstChar ); //my code
String literals in C++ are not instances of std::string, but rather constant arrays of chars. So by adding a char to it an implicit cast to character pointer which is then incremented by the numerical value of the character, whick happened to point to another string literal stored in .data section.