Does multiple #define of the same string use the same constant string? - c++

Does multiple #define of the same string use the same constant string? Say I do the following in multiple places:
#define TEST @"test"
The compiler is smart enough to know these all refer to the same constant string in the data section, right?

Your question really has less to do with what #define does than with how the compiler treats string literals. The compiler inserts the string object into the program image, which is read-only and doesn't implement the retain count. This is an optimization so that the string doesn't need to be created at runtime.
Usually the compiler is smart enough to recognize that you are using the same string literal, and the same constant string will be used, but it will not be on the heap.
Also check this question: Authoritative description of Objective-C string literals?

The compiler does something called string interning. It is not a required optimization, so if your code relies on TEST being at the same address everywhere, you may run into problems. For the most part, yes: it will try to reuse identical strings and make them all point to the same string (in read-only memory).
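A rough C++ analogue of the situation, as a sketch only: the question is about Objective-C, so the macro TEST_STRING and the helper names below are hypothetical stand-ins. Each use of the macro is replaced by its own literal before compilation, so whether two uses share an address is up to the compiler. Comparing contents is the check that does not depend on interning.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macro standing in for the #define in the question.
#define TEST_STRING "test"

// Each function body expands to its own "test" literal.
const char* first_use()  { return TEST_STRING; }
const char* second_use() { return TEST_STRING; }

// Reliable: compares the characters, not the addresses.
bool same_contents(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// May be true or false, depending on the compiler and its options.
bool same_address(const char* a, const char* b) {
    return a == b;
}
```

Here same_contents(first_use(), second_use()) holds everywhere, while same_address(first_use(), second_use()) should be treated as informational only.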

Related

Are repeated constant c_strings duplicated?

So let's say, for instance, that in my program I pass a string to a method.
someMethod("hello World");
On compilation, I'm assuming the literal "hello World" is recognized as constant without my directly declaring it so.
If it is recognized as constant, are duplicates stored at the same address?
More specifically, in C++11?
So, as a concrete scenario, let's say I populate a map from strings to objects.
map<std::string,Shader> list;
list["shaders/sprite.vs"] = Shader("shaders/sprite.vs");
... (Sometime later in another file)
//Some call that needs a shader, that I have stored in a map.
SomeGLFunction("shaders/sprite.vs");
Excuse the obvious need for a variable to hold it.
Without the /GF compiler option to enable string pooling, will the compiler commonly store all three literals separately?
From the C++ Standard (2.13.5 String literals)
16 Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all string literals are distinct (that is,
are stored in nonoverlapping objects) and whether successive
evaluations of a string-literal yield the same or a different object
is unspecified
So it is unspecified whether identical string literals are distinct objects or not. Usually it depends on compiler options.
If you have for example such a call like this
someMethod("hello World");
in a loop, then only one string literal is used, so the function will typically receive the same address of the first character of the string literal in each iteration of the loop.
However if you will write
if ( "hello World" == "hello World" )
{
//...
}
then the condition can yield either true or false depending on the corresponding compiler option.
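One way to observe the loop case on a given compiler is to record the addresses a function receives. This is only a sketch: some_method, seen_addresses, and distinct_addresses_in_loop are hypothetical names, and the count it reports is an observation about one build, not something the standard promises.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Addresses observed so far (hypothetical bookkeeping for the experiment).
std::set<const void*>& seen_addresses() {
    static std::set<const void*> seen;
    return seen;
}

// Hypothetical stand-in for someMethod: records the address it receives.
void some_method(const char* text) {
    assert(text != nullptr);
    seen_addresses().insert(text);
}

// Evaluates the same literal expression repeatedly and returns how many
// distinct addresses were observed across all iterations.
std::size_t distinct_addresses_in_loop(int iterations) {
    seen_addresses().clear();
    for (int i = 0; i < iterations; ++i) {
        some_method("hello World");
    }
    return seen_addresses().size();
}
```

A result of 1 is what one would normally expect to see, but code should not depend on it.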
Maybe. A compiler should do that. It doesn't have to.

Why is it wrong to modify the contents of a pointer to a string literal?

If I write:
char *aPtr = "blue"; //would be better const char *aPtr = "blue"
aPtr[0]='A';
I get a warning. The code above may appear to work, but it isn't standard: it has undefined behavior, because the pointer refers to a string literal stored in read-only memory. The question is:
Why is it like this?
By contrast, this code:
char a[]="blue";
char *aPtr=a;
aPtr[0]='A';
is OK. I want to understand what happens under the hood.
The first is a pointer to a read-only value created by the compiler and placed in a read-only section of the program. You cannot modify the characters at that address because they are read-only.
The second creates an array and copies each element from the initializer (see this answer for more details on that). You can modify the contents of the array, because it's a simple variable.
The first one works the way it does because doing anything else would require dynamically-allocating a new variable, and would require garbage collection to free it. That is not how C and C++ work.
The primary reason that string literals can't be modified (without undefined behavior) is to support string literal merging.
Long ago, when memory was much tighter than today, compiler authors noticed that many programs had the same string literals repeated many times--especially things like mode strings being passed to fopen (e.g., f = fopen("filename", "r");) and simple format strings being passed to printf (e.g., printf("%d\n", a);).
To save memory, they'd avoid allocating separate memory for each instance of these strings. Instead, they'd allocate one piece of memory, and point all the pointers at it.
In a few cases, they got even trickier than that, merging literals that weren't even entirely identical. For example, consider code like this:
printf("%s\t%d\n", name, a);
/* ... */
printf("%d\n", b);
In this case, the string literals aren't entirely identical, but the second one is identical to the end of the first. The compiler would still allocate one piece of memory: one pointer would point to the beginning of the memory, and the other to the position of the %d in that same block.
With the possibility of (but no requirement for) string literal merging, it's essentially impossible to say what behavior you'll get when you modify a string literal. If string literals are merged, modifying one string literal might modify others that are identical, or end identically. If string literals are not merged, modifying one will have no effect on any other.
MMUs added another dimension: they allowed memory to be marked as read-only, so attempting to modify a string literal would result in a signal of some sort--but only if the system had an MMU (which was often optional at one time) and also depending on whether the compiler/linker decided to put the string literals in memory they'd marked constant or not.
Since they couldn't define what the behavior would be when you modified a string literal, they decided that modifying a string literal would produce undefined behavior.
The second case is entirely different. Here you've defined an array of char. It's clear that if you define two separate arrays, they're still separate, regardless of content, so modifying one can't possibly affect the other. The behavior is clear and always has been, so doing so gives defined behavior. The fact that the array in question might be initialized from a string literal doesn't change that.
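The array case can be checked directly. The following is a minimal sketch (the function name arrays_stay_independent is made up for illustration): two arrays initialized from identical literals are separate, writable objects, so changing one leaves the other alone.

```cpp
#include <cassert>
#include <cstring>

// Two arrays initialized from identical literals: the characters are
// copied into each array, so the arrays are separate writable objects.
bool arrays_stay_independent() {
    char first[]  = "blue";
    char second[] = "blue";
    assert(std::strcmp(first, second) == 0);  // same contents before the write

    char* p = first;
    p[0] = 'A';                               // well defined: first is a plain array

    return std::strcmp(first, "Alue") == 0 &&
           std::strcmp(second, "blue") == 0 &&
           &first[0] != &second[0];           // distinct objects, distinct addresses
}
```

The same write through a pointer to the literal itself would be undefined behavior, which is why the literal version is best declared as const char*.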

C++: Format not a string literal and no format arguments [duplicate]

I've been trying to print a string literal but seems like I'm doing it wrong, since I'm getting a warning on compilation. It's probably due to wrong formatting or my misunderstanding of c_str() function, which I assume should return a string.
parser.y: In function ‘void setVal(int)’:
parser.y:617:41: warning: format not a string literal and no format arguments [-Wformat-security]
Line 617:
sprintf(temp, constStack.top().c_str());
Having those declarations
#include <stack>
const int LENGTH = 15;
char *temp = new char[LENGTH];
stack<string> constStack;
How can I provide proper formatting for the string?
Simple - provide a format string:
sprintf(temp, "%s", constStack.top().c_str());
But much, much better:
string temp = constStack.top();
You are telling me in your comment that the problem is not so much the warning as the fact that your code doesn't do what you expect it to.
The solution to this and other, similar problems is to get rid of the strong C influence in your C++ code. Specifically, don't use raw dynamically allocated char arrays or sprintf. Use std::string instead.
In this case, you are using sprintf very incorrectly. Have you ever seen its signature? It goes like this:
int sprintf(char *str, char const *format, ...);
str is the output of the operation. format describes what the output should be. The rest are the format arguments, which must by pure convention match what's described in format.
Now this "rest", written as ..., means that you can pass any number of arguments, even zero. And this is why your code even compiles (delivering a nice example for why ... is a dangerous feature, by the way).
In your code, the output string is, possibly incorrectly, your temp string. And the format to describe the output is, almost certainly incorrectly, what happens to sit on top of your stack.
Is this just about assigning one string to another, using sprintf simply because it more or less can do that as a very special case of what its feature set offers? There's no need for such hacks, as C++ has string assignment out of the box with std::string:
std::string temp = constStack.top();
Notice that this also eliminates the need to know the length of the string in advance.
If, for some reason, you really need formatting (but your question doesn't really show any need for it), then learn more about string streams as an alternative solution to format strings.
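For the string-stream route mentioned above, a minimal sketch might look like this. The helper name describe_top and the "value=" prefix are invented for the example and do not come from the question's code.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical helper: formats the top of the stack as "value=<top>".
// The stream grows as needed, so no buffer length has to be chosen.
std::string describe_top(const std::stack<std::string>& values) {
    assert(!values.empty());
    std::ostringstream out;
    out << "value=" << values.top();
    return out.str();
}
```

Because the stack contents are streamed as data, characters such as % have no special meaning here.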
As the warning indicates, it is issued because of the -Wformat-security option. You could simply disable the warning by removing the option, but that would perhaps be unwise.
The security issue is perhaps theoretical unless your code is to be widely distributed. Of perhaps more immediate concern is the possibility of your code crashing or behaving abnormally.
The problem is that the string is variable, and may at runtime contain formatting characters that cause it to attempt to read non-existent arguments. If, for example, the string is received from user input and the user entered "%s", it would attempt to read a string from somewhere on the stack. That would at best place junk in temp; worse, if the memory read happened not to contain a nul character in the first 15 bytes, it would overrun temp and corrupt the heap (in this case). Heap corruptions are probably worse than stack corruptions: the latent bug can remain unnoticed in your code for a long time, only to start crashing after some unrelated change, and if it does crash, it is unlikely to be anywhere near the cause.
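If a fixed-size char buffer really is required, one hedged option is std::snprintf with a constant "%s" format, so the variable text is only ever treated as data and the write is bounded. The helper name copy_as_data is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Copies text into a fixed-size buffer, treating it purely as data.
// Returns true if the whole string fit, false if it had to be truncated.
// The buffer is always left null-terminated.
bool copy_as_data(char* buffer, std::size_t size, const std::string& text) {
    assert(buffer != nullptr && size > 0);
    // The format string is a constant; text is only ever an argument.
    int needed = std::snprintf(buffer, size, "%s", text.c_str());
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

With a 15-byte buffer, an input such as "%s%s%d" is copied literally, and an overlong input is cut to 14 characters instead of overrunning the buffer.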

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit:
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined whether identical anonymous const char[m] arrays occupy the same location in memory (a concept referred to as interning).
The function you want, in C, is strcmp(const char*, const char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo";
std::string s2 = "bar";
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
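The array type and the decay to a pointer can both be checked at compile time. This is a small sketch using standard type traits; the helper first_char is a hypothetical name added only to show the conversion happening at a function boundary.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of array type, so decltype yields a
// reference to an array of 5 const char: 'M', 'a', 'y', 'a', '\0'.
static_assert(std::is_same<decltype("Maya"), const char (&)[5]>::value,
              "the literal has array type");

// Array-to-pointer conversion ("decay") gives a pointer to the first element.
static_assert(std::is_same<std::decay<decltype("Maya")>::type, const char*>::value,
              "the literal decays to const char*");

// Hypothetical helper: the argument arrives as a pointer, not as an array.
char first_char(const char* text) {
    assert(text != nullptr);
    return *text;
}
```

Once both operands of == have decayed like this, the comparison is between two addresses, which is why the contents never enter into it.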
To compare strings, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends on whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to.
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring> in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <iostream>
#include <string>
// ...
if (std::string("Maya") == "Maya")
std::cout << "Maya is Maya\n";
else
std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class exists in the C++ standard library: it replaces this earlier kind of string when you want to do anything useful with strings in C++. The problem itself is one that pretty much everyone who has ever learned C or C++ stumbles over fairly early in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
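The layout described above can be verified with a short sketch. The function count_until_terminator is a hypothetical helper that does what strlen does: it walks the bytes until it meets the zero byte.

```cpp
#include <cassert>
#include <cstddef>

// sizeof sees the whole array, including the terminator the compiler adds.
static_assert(sizeof("wibble") == 7, "six letters plus the null terminator");

// Returns the number of characters before the first zero byte, which is
// how the length of a null-terminated string is found.
std::size_t count_until_terminator(const char* text) {
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}
```

So "wibble" occupies seven bytes, while its length as a C string is six.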
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the literal "Maya" refers to that array. Because an array used in an expression like this is converted to a pointer to its first element, the expression "Maya" ends up being treated as a pointer to (const) char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed to by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one by one, and tell you whether they're really the same.
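To make the byte-by-byte idea concrete, here is a simplified sketch of an equality check in that style. It is not the library's implementation of strcmp() (which also reports ordering), and same_c_string is a made-up name.

```cpp
#include <cassert>

// Hypothetical helper: true when both C strings hold the same characters.
// Walks both blocks of memory in step until a difference or a terminator.
bool same_c_string(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    // Equal only if both strings ended at the same position.
    return *a == *b;
}
```

Unlike ==, this gives the same answer whether or not the two strings happen to live at the same address.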
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!

(c/c++) do copies of string literals share memory in TEXT section?

If I call a function like
myObj.setType("fluid");
many times in a program, how many copies of the literal "fluid" are saved in memory? Can the compiler recognize that this literal is already defined and just reference it again?
This has nothing to do with C++ (the language). Instead, it is an "optimization" that a compiler can do. So the answer is yes and no, depending on the compiler/platform you are using.
@David This is from the latest draft of the language:
§ 2.14.6 (page 28)
Whether all string literals are
distinct (that is, are stored in
nonoverlapping objects) is
implementation-defined. The effect of
attempting to modify a string literal
is undefined.
The emphasis is mine.
In other words, string literals in C++ are immutable because modifying a string literal is undefined behavior. So the compiler is free to eliminate redundant copies.
BTW, I am talking about C++ only ;)
Yes, it can. Of course, it depends on the compiler. For VC++, it's even configurable:
http://msdn.microsoft.com/en-us/library/s0s0asdt(VS.80).aspx
Yes it can, but there's no guarantee that it will. Define a constant if you want to be sure.
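Defining a constant could look roughly like the sketch below. The names kFluidType and Thing are hypothetical, and the class is only a stand-in for whatever myObj is in the question.

```cpp
#include <cassert>

// One named array: every use in this translation unit refers to this
// single object, regardless of whether the compiler pools literals.
constexpr char kFluidType[] = "fluid";  // hypothetical name

// Hypothetical stand-in for the class in the question.
class Thing {
public:
    void setType(const char* type) {
        assert(type != nullptr);
        type_ = type;  // stores the pointer; kFluidType has static storage
    }
    const char* type() const { return type_; }

private:
    const char* type_ = nullptr;
};
```

Within one translation unit, every call such as setType(kFluidType) passes the same address. If the constant lives in a header, each translation unit gets its own copy unless it is declared inline (C++17) or defined once in a source file and declared extern elsewhere.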
This is a compiler implementation issue. Many compilers that I have used have an option to share or merge duplicate string literals. Allowing duplicate string literals speeds up the compilation process but produces larger executables.
I believe that C/C++ specifies no particular handling for that case, but that in most cases the compiler would keep multiple copies of that string.
2.13.4/2: "whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined".
This permits the optimisation you're asking about.
As an aside, there may be a slight ambiguity, at least locally within that section of the standard. The definition of string literal doesn't quite make clear to me whether the following code uses one string literal twice, or two string literals once each:
const char *a = "";
const char *b = "";
But the next paragraph says "In translation phase 6 adjacent narrow string literals are concatenated". Unless it means to say that something can be adjacent to itself, I think the intention is pretty clear that this code uses two string literals, which are concatenated in phase 6. So it's not one string literal twice:
const char *c = "a" "a";
Still, if you did read that "a" and "a" are the same string literal, then the standard requires the optimisation you're talking about. But I don't think they are the same literal, I think they're different literals that happen to consist of the same characters. This is perhaps made clear elsewhere in the standard, for instance in the general information on grammar and parsing.
Whether it's made clear or not, many compiler-writers have interpreted the standard the way I think it is, so I might as well be right ;-)
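The phase 6 concatenation referred to above is easy to confirm. This is a minimal sketch, and the helper name joined_equals is invented for it.

```cpp
#include <cassert>
#include <cstring>

// Adjacent literals are joined into a single literal in translation phase 6,
// so the result is "aa": two characters plus the terminator.
static_assert(sizeof("a" "a") == 3, "two characters plus the terminator");

// Hypothetical helper: compares the concatenated literal with expected text.
bool joined_equals(const char* expected) {
    assert(expected != nullptr);
    return std::strcmp("a" "a", expected) == 0;
}
```

This shows only that the two tokens combine into one literal. It says nothing about whether separate "a" literals elsewhere in the program share storage.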