C++ String Literal Changing After Function Terminates - c++

I have three functions like this:
MyStruct foo() {
//do something...
return get_var("string literal");
}
MyStruct get_var(const string &literal) {
return (MyStruct) {some_attribute, &*literal.begin(), literal.size()}; //struct needs const char*
}
void bar() {
MyStruct var;
//do stuff
var = foo();
std::cout << var.string_attribute;
}
This should print "string literal", but instead, the first half of the string is a random jumble of characters.
If I do this:
MyStruct get_var(const string &literal) {
std::cout << literal;
return (MyStruct) {some_attribute, &*literal.begin(), literal.size()}; //struct needs const char*
}
It prints correctly only the first time. And if I do this:
MyStruct foo() {
//do something...
string my_literal = "string literal";
std::cout << my_literal;
return get_var(my_literal);
}
It prints correctly the first and second times, but not the third. I have no idea what's happening; I thought string literals lasted forever, so it shouldn't be overwritten or anything.
Any help is greatly appreciated.

C++ is an old language that grew out of C; as a result, both the behavior and the terminology used to describe that behavior can be rather confusing.
A "string literal" is a sequence of characters in the source code surrounded by quotes. In most contexts it evaluates to a pointer to a null-terminated sequence of characters (a "C string"). Under normal circumstances* said sequence of characters will indeed remain valid for the entire lifetime of the program.
The type string in your code, on the other hand, is probably referring to std::string (via using namespace std somewhere), which is a class representing an automatically managed string.
When you do get_var("string literal"); or string my_literal = "string literal"; the "C string" is implicitly converted to a std::string. This operation creates a copy of the sequence of characters. Unlike the original sequence of characters this sequence of characters will be freed when the std::string that owns it is destroyed.
&*literal.begin() is a somewhat unorthodox way to get a pointer to the sequence of characters owned by the std::string; using c_str() would be more usual. That isn't relevant to your problem, though. The important bit is that the sequence of characters in memory is the one owned by the std::string, not the original sequence from the string literal.
In the case of get_var("string literal"); the std::string is destroyed as soon as the statement completes. In the case of string my_literal = "string literal"; it is destroyed when the variable my_literal goes out of scope. Either way it is destroyed before foo() returns. So when you do std::cout << var.string_attribute; you are referencing a stale pointer for which the associated memory has already been freed.
The reason it works "sometimes" is that memory managers do not generally overwrite memory as soon as it is freed. Typically the memory is not actually overwritten until something re-uses it.
Edit: misread your question. It is possible for a use-after-free to "work" sometimes, but that is not what is going on here. The cout calls you say are working are at points in the code where the std::string is still alive.
* Excluding cases like unloading shared libraries at runtime that are beyond the scope of the C standard.

Enable maximum compiler warnings. It should alert you to the fact that you're trying to return a pointer to a temporary.
The line string my_literal = "string literal"; creates a string, then passes a const reference to that string into the function. Then at the end of foo(), my_literal is DESTROYED. It is GONE. Any pointers to that are now INVALID.
Absolutely any bad thing can happen after that, it is undefined behavior.

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer to a specific char object, it does not contain any size or range information about the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way, that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in the .text section of the object file, not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply put its value in the free space of the .text section. The specifics depend on the compiler you use, but be assured the people who wrote it are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there.";
const char *b = "Hello there.";
cout << (a == b);
// typically prints "1", which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on unix to see all the strings in an executable easily, i.e. strings a.out. You will see many more strings than what you coded, as many exist as part of the standard library and other required things for an executable.)
So how does it know to just print the string and then stop at the end? Well, a c-style string is required to end with a null byte (\0). The compiler implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there.";
// This is valid
const char *c = b + 3; // c points to "re."
// This, however, is not valid
const char d[] = b + 3;

The following statement doesn't create a value on the Heap, nor the stack. Where does it exist in memory until then?

This is from my Computer Science Class
"This is dangerous (and officially deprecated in the C++ standard) because you haven't allocated memory for str1 to point at."
            — jD3V's Computer Science Professor
The Quote Above is Referring to this Line of Code
char* str1 = "Hello world";
To be clear:
I Get that using a pointer, as shown in the Line of Code above, is deprecated. I also know that it shouldn't appear in my code.
The Part I Don't Get:
The example line of code — char* str1 = "Hello world"; — works, and that surprises me.
It says that no memory has been allocated for the pointer to point at, though the pointer could still be accessed to obtain the C-String "Hello World". I am unaware of another place in memory, though my guess is that there has to be one, because if the following statement doesn't exist on the heap — "and its not placed in the stack according to my debugger" — then it must live in another memory location.
I am trying to be able to understand, and locate where the variables I declare are at in memory, and I am unable to do that here.
I would like to know...
In the example I showed above, where is the string "Hello World", and the str1 pointer that points at it, located in memory, if not in the Heap, or on the Stack?
[Disclaimer: I wrote this answer when the question was tagged both [c] and [c++]. The answer is definitely different for C versus C++. I am leaning somewhat towards C in this answer.]
char* str = "Hello world";
This is perfectly fine in C.
According to my CS Professor, in reference to the statement above, he says...
"This is dangerous (and officially deprecated in the C++ standard) because you haven't allocated memory for str to point at."
Either you misunderstood, or your professor is very badly confused.
The code is deprecated in C++ because you neglected to declare str as being a pointer to unmodifiable (i.e. const) characters. But there is nothing, absolutely nothing, wrong with the allocation of the pointed-to string.
When you write
char *str = "Hello world";
the compiler takes care of allocating memory for str to point to.
The compiler behaves more or less exactly as if you had written
static char __hidden_string[] = "Hello world";
char *str = __hidden_string;
or maybe
static const char __hidden_string[] = "Hello world";
char *str = (char *)__hidden_string;
Now, where is that __hidden_string array allocated? Certainly not on the stack (you'll notice it's declared static), and certainly not on the heap, either.
Once upon a time, the __hidden_string array was typically allocated in the "initialized data" segment, along with (most of) the rest of your global variables. That meant you could get away with modifying it, if you wanted to.
These days, some/many/most compilers allocate __hidden_string in a nonwritable segment, perhaps even the code segment. In that case, if you try to modify the string, you'll get a bus error or the equivalent.
For backwards compatibility reasons, C compilers cannot treat a string constant as if it were of type const char []. If they did, you'd get a warning whenever you wrote something like
char *str = "Hello world";
and to some extent that warning would be a good thing, because it would force you to write
const char *str = "Hello world";
making it explicit that str points to a string that you're not allowed to modify.
But C did not adopt this rule, because there's too much old code it would have broken.
C++, on the other hand, very definitely has adopted this rule. When I try char *str = "Hello world"; under two different C++ compilers, I get warning: conversion from string literal to 'char *' is deprecated. It's likely this is what your professor was trying to get at.
Summary:
"any strings in double quotes" are const lvalue string literals, stored somewhere in compiled program.
You can't modify such a string, but you can store a pointer to it (const, of course) and use it without modifying it:
const char *str = "some string";
For example:
int my_strcmp(const char *str1, const char *str2) { ... }
int main()
{
...
const char *rule2_str= "rule2";
// compare some strings
if (my_strcmp(my_string, "rule1") == 0)
std::cout << "Execute rule1" << std::endl;
else if (my_strcmp(my_string, rule2_str) == 0)
std::cout << "Execute rule2" << std::endl;
...
}
If you want to modify the string, you can copy the string literal into your own array: char array[] = "12323". Your array will then be initialized as a string with a terminating zero at the end:
Actually char array[] = "123" is same as char array[] = {'1', '2', '3', '\0'}.
For example:
int main()
{
char my_string[] = "12345";
my_string[0] = '5'; // correct! (note the character '5', not the integer 5)
std::cout << my_string << std::endl; // 52345
}
Remember that your array then has a fixed size, so you can't change its size; for dynamically sized strings use std::string.
The problem is in lvalue and rvalue. An lvalue is a "locator value": it has a specified place in memory and you can take its address. An rvalue has no defined place in memory. For example, in int a = 5 the 5 is an rvalue, so you cannot take its address. When you try to write to the memory behind char* str = "Hello World" with something like str[0] = 'x', you will typically get a segmentation fault, which means you tried to access memory that is not available to you for writing. By the way, the address-of operator & cannot be applied to an rvalue; trying to do so is a compile-time error.
The "Hello World" literal itself is an lvalue stored in the program's segment of memory, but it is specified in such a way that the program can't modify it directly.

Allocating C-style string on the Heap

guys, I need a little technical help.
I'm working in C++, don't have much experience working in it but know the language somewhat. I need to use a C-style string (char array) but I need to allocate it on the heap.
If you look at this very simple piece of code:
#include <iostream>
using namespace std;
char* getText()
{
return "Hello";
}
int main()
{
char* text;
text = getText();
cout << text;
//delete text; // Calling delete results in an error
}
Now, I'm assuming that the "Hello" string is allocated on the stack, within getText(), which means the pointer will be "floating" as soon as getText returns, am I right?
If I'm right, then what's the best way to put "Hello" on the heap so I can use that string outside of getText and call delete on the pointer if I need to?
No, there's no hidden stack allocation going on there. "Hello" is static data, and ends up in the .data segment of your program.
This also means the string is shared for all calls to getText. A common use when this would be acceptable is if you have a large list of error messages that map to error codes. Functions like strerror work like this, so that you can get descriptive error messages for standard library error codes. But nobody is supposed to modify the return value of strerror (also because it is const). In your case, your function definition should read:
const char *getText()
If you do want a private copy of the string returned, you can use the strdup function to make a copy:
return strdup("Hello");
Use a std::string, from the <string> header. Then use its .c_str() member function. Then you don't have to care about allocation and deallocation: it takes care of it for you, correctly.
Cheers & hth.,
This is not right. "Hello" is a static string constant and it really should be const char*.
A narrow string literal has type "array of n const char", where n is the size of the string as defined below, and has static storage duration.
Static storage is neither automatic ("on the stack") nor dynamic ("on the heap"). It is allocated prior to the actual runtime of your program, so pointers to string literals never become invalid.
Note that char* p = "Hello" is deprecated because it is dangerous: the type system cannot prevent you from trying to change the string literal through p (which would result in undefined behavior). Use const char* p = "Hello" instead.

C/C++ Char Pointer Crash

Let's say that a function which returns a fixed ‘random text’ string is written like
char *Function1()
{
return "Some text";
}
then the program could crash if it accidentally tried to alter the value doing
Function1()[1]='a';
What are the square brackets after the function call attempting to do that would make the program crash? If you're familiar with this, any explanation would be greatly appreciated!
The string you're returning in the function is usually stored in a read-only part of your process. Attempting to modify it will cause an access violation. (EDIT: Strictly speaking, it is undefined behavior, and in some systems it will cause an access violation. Thanks, John).
This is usually the case because the string itself is hardcoded along with the code of your application. When loading, pointers are established to point to those read-only sections of your process that hold literal strings. In fact, whenever you write some string in C, it is treated as a const char* (a pointer to const memory).
The signature of that function should really be const char* Function1();.
You are trying to modify a string literal. According to the Standard, this evokes undefined behavior. Another thing to keep in mind (related) is that string literals are always of type const char*. There is a special dispensation to convert a pointer to a string literal to char*, taking away the const qualifier, but the underlying string is still const. So by doing what you are doing, you are trying to modify a const. This also evokes undefined behavior, and is akin to trying to do this:
const char* val = "hello";
char* modifyable_val = const_cast<char*>(val);
modifyable_val[1] = 'n'; // this evokes UB
Instead of returning a const char* from your function, return a string by value. This will construct a new string based on the string literal, and the calling code can do whatever it wants:
#include <string>
std::string Function1()
{
return "Some text";
}
...later:
std::string s = Function1();
s[1] = 'a';
Now, if you are trying to change the value that Function1() returns, then you'll have to do something else. I'd use a class:
#include <string>
class MyGizmo
{
public:
std::string str_;
MyGizmo() : str_("Some text") {};
};
int main()
{
MyGizmo gizmo;
gizmo.str_[1] = 'n';
}
You can use static char string for return value, but you never use it. It's just like access violation error. The behavior of it is not defined in c++ Standard.
It's not the brackets, but the assignment. Your function returns not a simple char *, but effectively a const char * (I could be wrong here, but the memory is read-only), so you are trying to change unchangeable memory. As for the brackets, they just give you access to an element of the array.
Note also that you can avoid the crash by placing the text in a regular array:
char Function1Str[] = "Some text";
char *Function1()
{
return Function1Str;
}
The question shows that you do not understand the string literals.
Imagine this code:
char* pch = "Here is some text";
char* pch2 = "some text";
char* pch3 = "Here is";
Now, how the compiler allocates memory for the strings is entirely a matter for the compiler. The memory might be organised like this:
Here is<NULL>Here is some text<NULL>
with pch2 pointing to memory location inside the pch string.
The key here is understanding the memory. Using the Standard Template Library (STL) would be good practice, but it may be quite a steep learning curve for you.

Returning value from a function

const char *Greet(const char *c) {
string name;
if(c)
name = c;
if (name.empty())
return "Hello, Unknown";
return name.c_str();
}
int _tmain(int argc, _TCHAR* argv[])
{
cout << Greet(0) << '\t' << Greet("Hello, World") << endl;
return 0;
}
I see 2 bugs with the above code.
Returning c_str from a string object that is defined local to the function. String gets destroyed when function returns and clearly c_str() will point to some memory that is de-allocated.
Returning "Hello, Unknown" from within the function. This is again an array of const chars allocated in the stack which should get de-allocated as well when the function returns. However, it does not and I am guessing that is because of Return Value Optimization.
Is my above understanding correct?
PS: I tested the above code with both gcc and MSVC10. GCC runs the above code fine and does not generate any runtime errors or undefined behaviors both for the string object as well as for the constant string. MSVC10 displays garbage data for the string object but prints the constant string correctly.
Number 1 is correct. The pointer returned from c_str() is invalidated when name is destroyed. Dereferencing the pointer after name has been destroyed results in undefined behavior. In your tests, under gcc it appears to work; under Visual C++ it prints garbage. Any results are possible when the behavior is undefined.
Number 2 is incorrect. "Hello, Unknown" is a string literal. String literals have static storage duration (they exist from when the program starts up to when it terminates). You are returning a pointer to this string literal, and that pointer is valid even after the function returns.
String literals have static storage, so are not deallocated at the end of the function.