I was looking at some example C++ code for a hardware interface I'm working with and noticed a lot of statements along the following lines:
if ( NULL == pMsg ) return rv;
I'm sure I've heard people say that putting the constant first is a good idea, but why is that? Is it just so that if you have a large statement you can quickly see what you're comparing against or is there more to it?
So that you don't mix comparison (==) with assignment (=).
As you know, you can't assign to a constant. If you try, the compiler will give you an error.
Basically, it's one of the defensive programming techniques: it protects you from yourself.
To stop you from writing:
if ( pMsg = NULL ) return rv;
by mistake. A good compiler will warn you about this, however, so most people don't use the "constant first" style, as they find it harder to read.
It stops the single = assignment bug.
E.g.,
if ( NULL = pMsg ) return rv;
won't compile, whereas
if ( pMsg = NULL) return rv;
will compile and give you headaches
To clarify what I wrote in some of the comments, here is a reason not to do this in C++ code.
Someone writes, say, a string class and decides to add a cast operator to const char*:
class BadString
{
public:
BadString(const char* s) : mStr(s) { }
operator const char*() const { return mStr.c_str(); }
bool operator==(const BadString& s) { return mStr == s.mStr; }
// Other stuff...
private:
std::string mStr;
};
Now someone blindly applies the constant == variable "defensive" programming pattern:
BadString s("foo");
if ("foo" == s) // Oops. This compares pointers and is never true.
{
// ...
}
This is, IMO, a more insidious problem than accidental assignment because you can't tell from the call site that anything is obviously wrong.
Of course, the real lessons are:
Don't write your own string classes.
Avoid implicit cast operators, especially when doing (1).
But sometimes you're dealing with third-party APIs you can't control. For example, the _bstr_t string class common in Windows COM programming suffers from this flaw.
When the constant is first, the compiler will warn you if you accidentally write = rather than == since it's illegal to assign a value to a constant.
Compilers outputting warnings is good, but some of us in the real world can't afford to treat warnings as errors. Reversing the order of variable and constant means this simple slip always shows up as an error and prevents compilation. You get used to this pattern very quickly, and the bug it protects against is a subtle one, which is often difficult to find once introduced.
They said, "to prevent mixing of assignment and comparison".
In reality I think it is nonsense: if you are disciplined enough never to forget to put the constant on the left side, you surely won't mix up '=' with '==', will you? ;)
I forget the article, but the quote went something like:
"Evidently it's easier to remember to put the constant first than it is to remember to use ==." ;)
I'm reading the documentation of std::experimental::optional and I have a good idea of what it does, but I don't understand when I should use it or how I should use it. The site doesn't contain any examples as of yet, which makes it harder for me to grasp the true concept of this object. When is std::optional a good choice to use, and how does it compensate for what was not found in the previous Standard (C++11)?
The simplest example I can think of:
std::optional<int> try_parse_int(std::string s)
{
//try to parse an int from the given string,
//and return "nothing" if you fail
}
The same thing might be accomplished with a reference argument instead (as in the following signature), but using std::optional makes the signature and usage nicer.
bool try_parse_int(std::string s, int& i);
Another way that this could be done is especially bad:
int* try_parse_int(std::string s); //return nullptr if fail
This requires dynamic memory allocation, worrying about ownership, etc. - always prefer one of the other two signatures above.
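For illustration, here is a minimal sketch of how the optional-returning version might be implemented (just one possible approach, using std::stoi and treating its exceptions or any trailing garbage as failure):

#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

std::optional<int> try_parse_int(std::string s)
{
    try {
        std::size_t pos = 0;
        int value = std::stoi(s, &pos);
        if (pos != s.size())          // reject trailing garbage such as "42abc"
            return std::nullopt;
        return value;
    } catch (const std::exception&) { // invalid_argument or out_of_range
        return std::nullopt;
    }
}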
Another example:
class Contact
{
std::optional<std::string> home_phone;
std::optional<std::string> work_phone;
std::optional<std::string> mobile_phone;
};
This is vastly preferable to having something like a std::unique_ptr<std::string> for each phone number! std::optional gives you data locality, which is great for performance.
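For instance, reading the fields back stays simple (a small sketch, assuming the members are accessible for the sake of the example):

Contact c;
c.mobile_phone = "555-0100";                            // set
// home_phone and work_phone remain "not set"
std::cout << c.home_phone.value_or("(none)") << '\n';   // prints "(none)"
std::cout << c.mobile_phone.value_or("(none)") << '\n'; // prints "555-0100"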
Another example:
template<typename Key, typename Value>
class Lookup
{
std::optional<Value> get(Key key);
};
If the lookup doesn't have a certain key in it, then we can simply return "no value."
I can use it like this:
Lookup<std::string, std::string> location_lookup;
std::string location = location_lookup.get("waldo").value_or("unknown");
Another example:
std::vector<std::pair<std::string, double>> search(
std::string query,
std::optional<int> max_count,
std::optional<double> min_match_score);
This makes a lot more sense than, say, having four function overloads that take every possible combination of max_count (or not) and min_match_score (or not)!
It also eliminates the accursed "Pass -1 for max_count if you don't want a limit" or "Pass std::numeric_limits<double>::min() for min_match_score if you don't want a minimum score"!
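At the call site you can then pass std::nullopt for whichever constraint you want to leave out (using the search signature above):

// at most 10 results, no minimum score
auto top_results = search("badgers", 10, std::nullopt);

// no result limit, but require a match score of at least 0.5
auto good_matches = search("badgers", std::nullopt, 0.5);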
Another example:
std::optional<int> find_in_string(std::string s, std::string query);
If the query string isn't in s, I want "no int" -- not whatever special value someone decided to use for this purpose (-1?).
For additional examples, you could look at the boost::optional documentation. boost::optional and std::optional will basically be identical in terms of behavior and usage.
An example is quoted from New adopted paper: N3672, std::optional:
optional<int> str2int(string); // converts string to int if possible
int get_int_from_user()
{
string s;
for (;;) {
cin >> s;
optional<int> o = str2int(s); // 'o' may or may not contain an int
if (o) { // does optional contain a value?
return *o; // use the value
}
}
}
but I don't understand when I should use it or how I should use it.
Consider a case where you are writing an API and you want to express that "not having a value to return" is not an error. For example, you need to read data from a socket, and when a data block is complete, you parse it and return it:
class YourBlock { /* block header, format, whatever else */ };
// appends the newly read raw data to an internal cache and, when that
// completes a parsable block, returns it
std::optional<YourBlock> cache_and_get_block(const char* raw_data);
If the appended data completed a parsable block, you can process it; otherwise, keep reading and appending data:
void your_client_code(some_socket_object& socket)
{
char raw_data[1024]; // max 1024 bytes of raw data (for example)
while(socket.read(raw_data, 1024))
{
if(auto block = cache_and_get_block(raw_data))
{
// process *block here
// then return or break
}
// else [ no error; just keep reading and appending ]
}
}
Edit: regarding the rest of your questions:
When is std::optional a good choice to use
When you compute a value and need to return it, it makes for better semantics to return by value than to take a reference to an output value (that may not be generated).
When you want to ensure that client code has to check the output value (whoever writes the client code may not check for errors; if you attempt to use an uninitialized pointer you get a core dump, whereas if you attempt to use an uninitialized std::optional you get a catchable exception).
[...] and how does it compensate for what was not found in the previous Standard (C++11).
Prior to C++11, you had to use a different interface for "functions that may not return a value": either return by pointer and check for NULL, or accept an output parameter and return an error/result code for "not available".
Both impose extra effort and attention on the client implementer to get things right, and both are a source of confusion (the first pushes the client implementer to think of the operation as an allocation and requires the client code to implement pointer-handling logic; the second lets client code get away with using invalid/uninitialized values).
std::optional nicely takes care of the problems arising with previous solutions.
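To make the contrast concrete, here are the three interface styles side by side (the function and type names are made up for illustration):

// pre-C++11, style 1: return by pointer; the caller must check for null
// and worry about who deletes the Widget
Widget* find_widget_ptr(int id);

// pre-C++11, style 2: output parameter plus a success flag; nothing stops
// the caller from using 'out' even when false was returned
bool find_widget_out(int id, Widget& out);

// with std::optional: the return type itself says "there may be no value"
std::optional<Widget> find_widget_opt(int id);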
I often use optionals to represent optional data pulled from configuration files; that is, where the data (such as an expected, yet not required, element within an XML document) is only optionally provided. The optional lets me show explicitly and clearly whether the data was actually present in the XML document, especially when the data can have a "not set" state in addition to an "empty" state and a "set" state (fuzzy logic). With an optional, set and not set are clear; empty would be clear with a value of 0 or null.
This shows how "not set" is not equivalent to "empty". Conceptually, a pointer to an int (int* p) can express the same thing: a null pointer (p == 0) is not set, a value of 0 (*p == 0) is set and empty, and any other value (*p != 0) is set to that value.
For a practical example, I had a piece of geometry pulled from an XML document with a value called render flags, where the geometry can either override the render flags (set), disable the render flags (set to 0), or simply not affect the render flags (not set); an optional is a clear way to represent this.
Clearly a pointer to an int can accomplish the goal in this example, or better, a shared pointer, as it can offer a cleaner implementation; however, I would argue it's about code clarity in this case. Is a null always "not set"? With a pointer it is not clear, because null literally means not allocated or created, which could, yet might not necessarily, mean "not set". It is worth pointing out that a pointer must be released (and, in good practice, set to 0 afterwards), whereas, like a shared pointer, an optional doesn't require explicit cleanup, so there is no concern about mixing up the cleanup with the optional having not been set.
I believe it's about code clarity. Clarity reduces the cost of code maintenance, and development. A clear understanding of code intention is incredibly valuable.
Using a pointer to represent this would require overloading the concept of the pointer. To represent "null" as "not set", you would typically add one or more comments throughout the code to explain the intention. That's not a bad alternative to an optional; however, I always opt for an implicit implementation rather than explicit comments, as comments are not enforceable (for example, by compilation). Examples of such implicit devices (features provided purely to enforce intention) include the various C++-style casts, "const" (especially on member functions), and the "bool" type, to name a few. Arguably you don't really need these code features, as long as everyone obeys the intentions or comments.
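A hypothetical sketch of the render-flags example described above, with the three states made explicit (the names are made up):

struct Geometry
{
    // not set  -> leave the scene's render flags alone
    // set to 0 -> explicitly disable the render flags
    // set to n -> override the render flags with n
    std::optional<int> render_flags;
};

void apply_render_flags(const Geometry& g, int& scene_flags)
{
    if (g.render_flags)                // only touch the flags if the XML provided them
        scene_flags = *g.render_flags;
}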
Everyone, I have a question about C++. What do you actually prefer to use:
int* var = 0;
if(!var)...//1)
or
if(var == 0)..//2)
What are the pros and cons? Thanks in advance.
I prefer if (!var), because then you cannot accidentally assign to var, and you do not have to pick between 0 and NULL.
I've always been taught to use if (!var), and it seems that all the idiomatic C(++) I've ever read follows this. This has a few nice semantics:
There's no accidental assignment
You're testing the existence of something (ie. if it's NULL). Hence, you can read the statement as "If var does not exist" or "If var is not set"
Maps closely to what you'd idiomatically write if var was a boolean (bools aren't available in C, but people still mimic their use)
Similar to the previous point, it maps closely to predicate function calls, ie. if (!isEmpty(nodeptr)) { .. }
Mostly personal preference. Some will advise that the first is preferred because it becomes impossible to accidentally assign instead of compare. But by that same token, if you make a habit of putting the rvalue on the left of the comparison, the compiler will catch you when you blow it:
if( 0 == var )
...which is certainly true. I simply find if( !var ) to be a little more expressive, and less typing.
They will both evaluate the same, so there's no runtime difference. The most important thing is that you pick one and stick with it. Don't mix one with the other.
Either one is good, though I personally prefer 2 - or something like it.
Makes more sense to me to read:
if ( ptr != NULL )
than
if ( ptr )
The second I might mistake for a plain boolean at a glance, but with the first I can tell immediately that it's a pointer.
Either way, I think it's important to pick one and stick with that for consistency, though - rather than having it done in different ways throughout your product.
A few years have passed...
The C++11 standard introduced nullptr for checking for null pointers, and it should be used, as it ensures that the comparison is actually done on a pointer.
So the most robust check would be to do:
if(myPtr != nullptr)
The problem with !var is that it is also valid if var is a boolean or a numerical value (int, byte, etc.); it simply tests whether the value equals 0 or not.
Whereas if you use nullptr, you are sure that you are checking a pointer and not any other type:
For example:
int valA = 0;
int *ptrA = &valA;
if(!valA) // No compile error
if(!*ptrA) // No compile error
if(!ptrA) // No compile error
if(valA != nullptr) // Compile error, valA is not a pointer
if(*ptrA != nullptr) // Compile error, *ptrA is not a pointer
if(ptrA != nullptr) // No compile error
So, it's pretty easy to make a mistake when manipulating a pointer to an int, as in your example; that's why nullptr should be used.
I would prefer option 3:
if(not var)
{
}
The new (well, since 1998) operator keywords for C++ can sometimes make code easier to read.
Strangely enough, they seem to be very little known. There seem to be more people who know what trigraphs are (thank you, IOCCC!) than people who know what the operator keywords are. Operator keywords have a similar reason to exist: they allow you to program in C++ even if the keyboard does not provide all the ASCII characters. But unlike trigraphs, operator keywords make code more readable instead of obfuscating it.
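For anyone who hasn't seen them, the operator keywords are simply alternative spellings of the usual operators; a small self-contained example:

#include <iostream>

int main()
{
    int* p = nullptr;
    int  x = 5;

    if (not p and x > 0)          // same as: if (!p && x > 0)
        std::cout << "p is null and x is positive\n";

    if (x == 5 or x == 42)        // same as: if (x == 5 || x == 42)
        std::cout << "x is 5 or 42\n";
}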
DON'T use NULL, use 0.
Strictly speaking, a pointer is not a bool, so use the '==' and '!=' operators.
NEVER use 'if ( ptr == 0 )', use 'if ( 0 == ptr )' instead.
DON'T use C-pointers, use smart_ptr instead. :-)
According to the High Integrity C++ Coding Standard Manual: for comparisons between constants and variables you should put the constant on the left side to prevent an assignment instead of an equality comparison, for example:
Avoid:
if(var == 10) {...}
Prefer
if(10 == var) {...}
Particularly in your case I prefer if(0 == var), because it is clear that you are comparing the pointer against null (0). Because the ! operator can be overloaded, it could have other meanings depending on what your pointer is pointing to, e.g. if( !(*var) ) {...}.
My program crashes intermittently when it tries to copy a character array that is not terminated by a null character ('\0').
class CMenuButton {
TCHAR m_szNode[32];
CMenuButton() {
memset(m_szNode, '\0', sizeof(m_szNode));
}
};
int main() {
....
CString szTemp = ((CMenuButton*)pButton)->m_szNode; // sometime it crashes here
...
return 0;
}
I suspect someone copied the character array without the terminating '\0', so it ended up like this:
Stack
m_szNode $%#^&!&!&!*#*#&!(*#(!*##&#&*&##!^&*&#(*!#*((*&*SDFKJSHDF*(&(*&(()(**
Can you tell me what is happening and what I should do to prevent copying from a wild pointer? Help will be very much appreciated!
I guess I'm unable to check whether the character array is properly terminated before copying...
I suspect that your real problem could be that pButton is a bad pointer, so check that out first.
The only way to be 100% sure that a pointer is correct, and points to a correctly sized/allocated object is to never use pointers you didn't create, and never accept/return pointers. You would use cookies, instead, and look up your pointer in some sort of cookie -> pointer lookup (such as a hash table). Basically, don't trust user input.
If you are more concerned with finding bugs, and less about 100% safety against things like buffer overrun attacks, etc. then you can take a less aggressive approach. In your function signatures, where you currently take pointers to arrays, add a size parameter. E.g.:
void someFunction(char* someString);
Becomes
void someFunction(char* someString, size_t size_of_buffer);
Also, force the termination of arrays/strings in your functions. If you hit the end, and it isn't null-terminated, truncate it.
Make it so you can provide the size of the buffer when you call these, rather than calling strlen (or equivalent) on all your arrays before you call them.
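For example, a hypothetical helper along these lines (the name copy_bounded is made up) copies into a caller-supplied buffer and always terminates it:

#include <cstddef>
#include <cstring>

// Copies at most bufferSize - 1 characters from src into dest and always
// null-terminates dest; anything longer than the buffer is truncated.
void copy_bounded(char* dest, std::size_t bufferSize, const char* src)
{
    if (dest == NULL || bufferSize == 0)
        return;
    if (src == NULL) {
        dest[0] = '\0';
        return;
    }
    std::strncpy(dest, src, bufferSize - 1);
    dest[bufferSize - 1] = '\0';   // force termination even if src was too long
}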
This is similar to the approach taken by the "safe string functions" that were created by Microsoft (some of which were proposed for standardization). Not sure if this is the perfect link, but you can google for additional links:
http://msdn.microsoft.com/en-us/library/ff565508(VS.85).aspx
There are two possibilities:
pButton doesn't point to a CMenuButton like you think it does, and the cast is causing undefined behavior.
The code that sets m_szNode is incorrect, overflowing the given size of 32 characters.
Since you haven't shown us either piece of code, it's difficult to see what's wrong. Your initialization of m_szNode looks OK.
Is there any reason that you didn't choose a CString for m_szNode?
My approach would be to make m_szNode a private member in CMenuButton, and explicitly NULL-terminate it in the mutator method.
class CMenuButton {
private:
TCHAR m_szNode[32];
public:
void set_szNode( const TCHAR* x ) {
    // copy at most 31 characters, then force null termination
    _tcsncpy( m_szNode, x, 31 );
    m_szNode[ 31 ] = _T('\0');
}
};
For a specific example, consider atoi(const std::string &), which doesn't exist. This is very frustrating, since we as programmers need such conversions so often.
A more general question is: why doesn't the C++ standard library reimplement the standard C libraries with C++ strings, C++ vectors, and the other C++ standard elements, rather than preserving the old C standard libraries and forcing us to use the old char* interfaces?
It's time-consuming, and the code to translate data types between the two interfaces is not easy to make elegant.
Is it for compatibility reasons, considering there was much more legacy C code then than there is today, and preserving the C standard interfaces makes translating C code to C++ much easier?
In addition, I have heard that many other libraries available for C++ add a lot of enhancements and extensions to the STL. Do those libraries provide such functions?
PS: Considering the many answers to the first, specific question, I have edited the question to clarify it and to outline the parts I am more curious about.
Another, more general question is why the STL does not reimplement all the standard C libraries
Because the old C libraries do the trick. The C++ standard library only re-implements existing functionality if it can do so significantly better than the old version. And for some parts of the C library, the benefit of writing new C++ implementations just isn't big enough to justify the extra standardization work.
As for atoi and the like, there are new versions of these in the C++ standard library, in the std::stringstream class.
To convert from a type T to a type U:
T in;
U out;
std::stringstream sstr;
sstr << in;   // write the source value into the stream
sstr >> out;  // read it back out as the target type
As with the rest of the IOStream library, it's not perfect, it's pretty verbose, it's impressively slow and so on, but it works, and usually it is good enough. It can handle integers of all sizes, floating-point values, C and C++ strings and any other object which defines the operator <<.
EDIT: In addition, I have heard that many other libraries available for C++ add a lot of enhancements and extensions to the STL. Do those libraries provide such functions?
Boost has a boost::lexical_cast which wraps std::stringstream. With that function, you can write the above as:
U out = boost::lexical_cast<U>(in);
Even in C, using atoi isn't a good way to convert user input. It doesn't provide any error checking at all. Providing a C++ version of it wouldn't be all that useful: considering that it wouldn't throw or do anything extra, you could just pass .c_str() to the existing one and use it.
Instead you should use strtol in C code, which does do error checking. In C++03, you can use stringstreams to do the same, but their use is error-prone: what exactly do you need to check for, .bad(), .fail(), or .eof()? How do you eat up remaining whitespace? What about formatting flags? Such questions shouldn't bother the average user, who just wants to convert a string. boost::lexical_cast does a good job, and, incidentally, C++0x adds utility functions to facilitate fast and safe conversions, through C++ wrappers that throw if the conversion fails:
int stoi(const string& str, size_t *idx = 0, int base = 10);
long stol(const string& str, size_t *idx = 0, int base = 10);
unsigned long stoul(const string& str, size_t *idx = 0, int base = 10);
long long stoll(const string& str, size_t *idx = 0, int base = 10);
unsigned long long stoull(const string& str, size_t *idx = 0, int base = 10);
Effects: the first two functions call strtol(str.c_str(), ptr, base), and the last three functions
call strtoul(str.c_str(), ptr, base), strtoll(str.c_str(), ptr, base), and strtoull(str.c_str(), ptr, base), respectively. Each function returns the converted result, if any. The argument ptr designates a pointer to an object internal to the function that is used to determine what to store at *idx. If the function does not throw an exception and idx != 0, the function stores in *idx the index of the first unconverted element of str.
Returns: the converted result.
Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that no conversion could be performed. Throws out_of_range if the converted value is outside the range of representable values for the return type.
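For example, with these C++11 functions a failed conversion becomes an exception you can actually catch (a small sketch):

#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    try {
        int a = std::stoi("123");            // fine: a == 123
        int b = std::stoi("not a number");   // throws std::invalid_argument
        std::cout << a << ' ' << b << '\n';
    } catch (const std::invalid_argument&) {
        std::cout << "no conversion could be performed\n";
    } catch (const std::out_of_range&) {
        std::cout << "value out of range for int\n";
    }
}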
There's no good way to know if atoi fails. It always returns an integer. Is that integer a valid conversion? Or is the 0 or -1 or whatever indicating an error? Yes it could throw an exception, but that would change the original contract, and you'd have to update all your code to catch the exception (which is what the OP is complaining about).
If translation is too time consuming, write your own atoi:
int atoi(const std::string& str)
{
std::istringstream stream(str);
int ret = 0;
stream >> ret;
return ret;
}
I see that solutions are offered that use std::stringstream or std::istringstream.
This might be perfectly OK for single-threaded applications, but if an application has lots of threads and frequently calls an atoi(const std::string& str) implemented in this way, it will suffer performance degradation.
Read this discussion for example: http://gcc.gnu.org/ml/gcc-bugs/2009-05/msg00798.html.
And see a backtrace of the constructor of std::istringstream:
#0 0x200000007eb77810:0 in pthread_mutex_unlock+0x10 ()
from /usr/lib/hpux32/libc.so.1
#1 0x200000007ef22590 in std::locale::locale (this=0x7fffeee8)
at gthr-default.h:704
#2 0x200000007ef29500 in std::ios_base::ios_base (this=<not available>)
at /tmp/gcc-4.3.1.tar.gz/gcc-4.3.1/libstdc++-v3/src/ios.cc:83
#3 0x200000007ee8cd70 in std::basic_istringstream<char,std::char_traits<char>,std::allocator<char> >::basic_istringstream (this=0x7fffee4c,
__str=#0x7fffee44, __mode=_S_in) at basic_ios.h:456
#4 0x4000f70:0 in main () at main.cpp:7
So every time you enter atoi() and create a local variable of type std::stringstream, you lock a global mutex, and in a multithreaded application this is likely to result in waiting on that mutex.
So, in a multithreaded application it's better not to use std::stringstream. For example, simply call atoi(const char*):
inline int atoi(const std::string& str)
{
return atoi(str.c_str());
}
For your example, you've got two options:
std::string mystring("4");
int myint = atoi(mystring.c_str());
Or something like:
std::string mystring("4");
std::istringstream buffer(mystring);
int myint = 0;
buffer >> myint;
The second option gives you better error management than the first.
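For example, you can test the extraction itself, which atoi gives you no way to do (reusing the istringstream option above with input that isn't a number):

std::string mystring("not a number");
std::istringstream buffer(mystring);
int myint = 0;
if (buffer >> myint)
{
    // conversion succeeded
}
else
{
    // conversion failed; atoi would simply have returned 0 here
}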
You can write a more generic string-to-number converter like this:
template <class T>
T strToNum(const std::string &inputString,
std::ios_base &(*f)(std::ios_base&) = std::dec)
{
T t;
std::istringstream stringStream(inputString);
if ((stringStream >> f >> t).fail())
{
throw std::runtime_error("Invalid conversion");
}
return t;
}
// Example usage
unsigned long ulongValue = strToNum<unsigned long>(strValue);
int intValue = strToNum<int>(strValue);
int intValueFromHex = strToNum<int>(strHexValue,std::hex);
unsigned long ulOctValue = strToNum<unsigned long>(strOctVal, std::oct);
For conversions I find it simplest to use Boost's lexical_cast (except that it can be overly rigorous in checking the validity of conversions from string to other types).
It certainly isn't very fast (it just uses std::stringstream under the hood), but it is significantly more convenient, and performance is often not needed where you convert values (e.g., to create error output messages and such). (If you do lots of these conversions and need extreme performance, chances are you are doing something wrong and shouldn't be performing any conversions at all.)
Because the old C libraries still work with the standard C++ types, with very little adaptation. You can easily turn a const char* into a std::string with a constructor, and go back with std::string::c_str(). In your example, with std::string s, just call atoi(s.c_str()) and you're fine. As long as you can switch back and forth easily, there's no need to add new functionality.
I can't come up with C functions that work on arrays rather than container classes, except for things like qsort() and bsearch(), and the STL has better ways to do such things. If you have specific examples, I could consider them.
C++ does need to support the old C libraries for compatibility purposes, but the tendency is to provide new techniques where warranted, and to provide interfaces to the old functions when there isn't much of an improvement to be had. For example, Boost's lexical_cast is an improvement over functions such as atoi() and strtol(), much as the standard C++ string is an improvement over the C way of doing things. (Sometimes this is subjective. While C++ streams have considerable advantages over the C I/O functions, there are times when I'd rather drop back to the C way of doing things. Some parts of the C++ standard library are excellent, and some parts, well, aren't.)
There are all sorts of ways to parse a number from a string. atoi can easily be used with a std::string via atoi(str.c_str()) if you really want, but atoi has a bad interface because there is no sure way to determine whether an error occurred during parsing.
Here's one slightly more modern C++ way to get an int from a std::string:
std::istringstream tmpstream(str);
if (tmpstream >> intvar)
{
// ... success! ...
}
The tongue-in-cheek answer is: because the STL is a half-hearted attempt to show how powerful C++ templates can be. Before they got to every corner of the problem space, time was up.
There are two reasons. First, creating an API takes time and effort: creating a good API takes a lot of time and a huge effort, and creating a great API takes an insane amount of time and an incredible effort. When the STL was created, OO was still pretty new to the C++ crowd. They didn't yet have the ideas for how to make a fluent and simple API. Today we think iterators are so 1990, but at the time people thought, "Bloody hell, why would I need that? for (int i=0; i<...) has been good enough for three decades!"
So the STL didn't become a great, fluent API. This isn't all C++'s fault, because you can make good APIs with C++. But it was the first attempt to do so, and it shows. Before the API could mature, it was turned into a standard, and all the ugly shortcomings were set in stone. Second, on top of this, there was all the legacy code and all the libraries that could already do everything, so the pressure wasn't really there.
To solve your misery, give up on the STL and have a look at its successors: try Boost and maybe Qt. Yes, Qt is a UI library, but it also has a pretty good standard library.
Since C++11, you can use std::stoi. It is like atoi but for std::string.
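For example (note that, unlike atoi, it throws on failure instead of silently returning 0):

int n = std::stoi("42");                   // n == 42
int m = std::stoi(std::string("123abc"));  // m == 123; parsing stops at the 'a'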
Have a look at this code:
#include <iostream>
using namespace std;
int main()
{
const char* str0 = "Watchmen";
const char* str1 = "Watchmen";
char* str2 = "Watchmen";
char* str3 = "Watchmen";
cerr << static_cast<void*>( const_cast<char*>( str0 ) ) << endl;
cerr << static_cast<void*>( const_cast<char*>( str1 ) ) << endl;
cerr << static_cast<void*>( str2 ) << endl;
cerr << static_cast<void*>( str3 ) << endl;
return 0;
}
Which produces an output like this:
0x443000
0x443000
0x443000
0x443000
This was on the g++ compiler running under Cygwin. The pointers all point to the same location even with no optimization turned on (-O0).
Does the compiler always optimize so much that it searches all the string constants to see if they are equal? Can this behaviour be relied on?
It can't be relied on, it is an optimization which is not a part of any standard.
I changed the corresponding lines of your code to:
const char* str0 = "Watchmen";
const char* str1 = "atchmen";
char* str2 = "tchmen";
char* str3 = "chmen";
The output for the -O0 optimization level is:
0x8048830
0x8048839
0x8048841
0x8048848
But for the -O1 it's:
0x80487c0
0x80487c1
0x80487c2
0x80487c3
As you can see, GCC (v4.1.2) reused the tail of the first string for the subsequent (suffix) strings. It is the compiler's choice how to arrange string constants in memory.
It's an extremely easy optimization, probably so much so that most compiler writers don't even consider it much of an optimization at all. Setting the optimization flag to the lowest level doesn't mean "Be completely naive," after all.
Compilers will vary in how aggressive they are at merging duplicate string literals. They might limit themselves to a single subroutine — put those four declarations in different functions instead of a single function, and you might see different results. Others might do an entire compilation unit. Others might rely on the linker to do further merging among multiple compilation units.
You can't rely on this behavior, unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. I'd be wary about relying on it in my own code, even if portability weren't a concern, because behavior is liable to change even between different versions of a single vendor's compiler.
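The practical consequence: if you care whether two strings have the same contents, compare the contents rather than the pointers. A small illustration:

#include <cstring>
#include <iostream>

int main()
{
    const char* a = "Watchmen";
    const char* b = "Watchmen";

    // may print 1 or 0, depending on whether the compiler pooled the literals
    std::cout << (a == b) << '\n';

    // always prints 1: compares the characters, not the addresses
    std::cout << (std::strcmp(a, b) == 0) << '\n';
}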
You surely should not rely on that behavior, but most compilers will do this. Any literal value ("Hello", 42, etc.) will be stored once, and any pointers to it will naturally resolve to that single reference.
If you find that you need to rely on that, then be safe and recode as follows:
char *watchmen = "Watchmen";
char *foo = watchmen;
char *bar = watchmen;
You shouldn't count on that of course. An optimizer might do something tricky on you, and it should be allowed to do so.
It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.
He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:
if (3 = x) break;
Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.
Anyway, it's clear the C compiler was putting all the literal 3's in the same place. If a C compiler back in the '80s was capable of doing this, it can't be too tough to do. I'd expect it to be very common.
I would not rely on the behavior, because I doubt the C or C++ standards make this behavior explicit, but it makes sense that the compiler does it. It also makes sense that it exhibits this behavior even in the absence of any optimization specified to the compiler; there is no trade-off in it.
All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:
char *s = "literal";
You are, in a sense, casting away the string's constness. Nevertheless, you can't do away with the read-only attribute of the string: if you try to modify it, you'll be caught at run time rather than at compile time. (Which is actually a good reason to use const char * when assigning string literals to a variable of yours.)
No, it can't be relied on, but storing read-only string constants in a pool is a pretty easy and effective optimization. It's just a matter of storing an alphabetical list of strings, and then outputting them into the object file at the end. Think of how many "\n" or "" constants are in an average code base.
If a compiler wanted to get extra fancy, it could re-use suffixes too: "\n" can be represented by pointing to the last character of "Hello\n". But that likely comes with very little benefit for a significant increase in complexity.
Anyway, I don't believe the standard says anything about where anything is stored really. This is going to be a very implementation-specific thing. If you put two of those declarations in a separate .cpp file, then things will likely change too (unless your compiler does significant linking work.)