strcat adds second parameter twice - c++

class Vars{
public:
char *appData = getenv("AppData");
string datadir = strcat(appData, "\\Bob");
};
cout << v.datadir;
outputs "C:\Users\Adam\AppData\Roaming\Bob\Bob"
instead of "C:\Users\Adam\AppData\Roaming\Bob"
It always adds the second parameter twice. How come?

"The string pointed by the pointer returned by this function shall not be modified by the program." Changing the value like you did (by strcat) leads to unpredictable behavior. The solution is to simply copy the immutable given string to a string and do the concatenation there.
What about making a new public function that does this:
string datadir(getenv("AppData"));
datadir += "\\Bob";
This is pre-C++11 code.

The issue is that you are modifying memory that you should not be. You get a pointer from getenv, but that is pointing to memory that you do not control (emphasis mine).
The pointer returned points to an internal memory block, whose content or validity may be altered by further calls to getenv (but not by other library functions).
The string pointed by the pointer returned by this function shall not be modified by the program. Some systems and library implementations may allow to change environmental variables with specific functions (putenv, setenv...), but such functionality is non-portable.
By calling strcat(appData, "\\Bob"); you are writing \Bob into a piece of memory you do not control. The operating system may decide to do any number of things with it. As has already been pointed out by @Liviu, it is much better to take a copy of the original value and append to that.
std::string appData( getenv("AppData") );
appData += "\\Bob";
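A minimal sketch of the copy-then-append approach described above, assuming C++11 or later. The names join_dir and data_dir are hypothetical helpers, not part of the original code. The null check is there because getenv returns a null pointer when the variable is not set.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns "<base>\<leaf>" as a new std::string,
// or an empty string when base is null (getenv returns null for an unset variable).
std::string join_dir(const char* base, const std::string& leaf) {
    if (base == nullptr) {
        return std::string();
    }
    std::string result(base);  // copy; the environment block itself is never written to
    result += '\\';
    result += leaf;
    return result;
}

// Hypothetical wrapper for the case in the question.
std::string data_dir() {
    return join_dir(std::getenv("AppData"), "Bob");
}
```

Because all the appending happens in a std::string the program owns, calling data_dir() repeatedly gives the same result each time instead of growing the environment string.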

Related

Why is string s = (char*)tag.c_str(); wrong when tag is a string

Why is the following code wrong?
string tag = "hello"; string s = (char*)tag.c_str();
what does this mean: storing addresses to internal storage of temporary string objects is wrong.
can someone please help? What exactly happens during above conversion?
The quoted message is wrong for the shown example because tag is not a "temporary string object", but rather a variable with longer lifetime than the pointer. It is also wrong because the pointer is only used within the full-expression where c_str was called, so the pointer is not being "stored" for later.
The cast to char* is unnecessary and should be removed. Also, the call to c_str is unnecessary. A simpler way to copy the string is this:
std::string s = tag;
Here is a wrong program for which the message would be correct:
const char* wrong = "wrong"s.c_str(); // don't do this
std::cout << wrong; // kaboom; behaviour of program is undefined
Here, the address of the internal storage of a temporary string object is being stored, which is wrong because the temporary is destroyed at the end of the full-expression, leaving the pointer invalid and therefore useless.
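For contrast with the wrong program above, here is a small sketch of two patterns that keep the pointer valid. make_greeting and both length functions are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper that returns a string by value (a temporary at the call site).
std::string make_greeting() { return "hello"; }

// Safe: the named std::string outlives every use of the pointer.
std::size_t length_via_named_string() {
    const std::string owner = make_greeting();  // owner keeps the buffer alive
    const char* p = owner.c_str();              // valid while owner lives and is not modified
    return std::strlen(p);
}

// Also safe: the temporary lives until the end of the full-expression,
// so the pointer is still valid while strlen runs.
std::size_t length_in_one_expression() {
    return std::strlen(make_greeting().c_str());
}
```

The difference from the broken version is only where the pointer is used: either the string has a name and a longer lifetime, or the pointer never escapes the expression that created the temporary.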
string tag = "hello";
Initializes tag object with value of "hello". Exactly how tag does that is not your concern. It does it and that is all you need to know (for now).
string s = (char*)tag.c_str();
First you ask tag object to give you access to its internal buffer where it is storing string data (see c_str() description). You then change original return type (const char*), then use that to initialize new std::string object.
I don't know what intentions you have with s, but one of the following would be a better choice:
std::string s = tag; // s is now a copy of tag.
std::string &s = tag; // s is now a reference to tag (what you do to s will be reflected in tag).
std::string s(std::move(tag)); // contents of tag moved to s
If for some reason you need to access the memory hidden inside std::string (that is what c_str() method does), you definitely shouldn't throw away const. You may think it works, but it would be better to say that it works now.
Not only is storing addresses to internal storage of temporary string objects wrong, mostly storing address to internal storage of string is wrong -- the fact that it is a temporary object only makes it worse. Here are two reasons:
Since the object is temporary it means that that address will definitely be invalid the moment the object is destroyed. By invalid I mean that it will no longer point to start of string. It will still point somewhere, but who knows what that will be.
Even if object wasn't temporary, storing that address is dangerous. Say someone assigned a new value to that string, something long. The way std::string is implemented may either lead to allocation or reallocation. That means that object now uses different place to store its data. The old place gets released and thus becomes available for reuse by others. You are still holding old pointer which now points to what? Who knows!
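The reallocation hazard described above can be sketched as follows. length_after_growth is a hypothetical function; the point is only that the pointer is fetched again after the last modification instead of being kept across it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical demo: reads the string through a pointer fetched AFTER the last modification.
std::size_t length_after_growth() {
    std::string s = "short";
    const char* before = s.c_str();  // valid only until s is modified
    s.assign(1000, 'x');             // may reallocate; `before` must not be dereferenced any more
    (void)before;                    // kept only to show the stale pointer; it is never used again
    const char* after = s.c_str();   // fetch a fresh pointer instead
    return std::strlen(after);
}
```

Whether the assignment really moves the buffer depends on the implementation, which is exactly why the old pointer cannot be relied on.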

Assigning a string literal to std::string

I am aware that the following code will create an array of characters and remain in memory until the program ends:
char* str = "this is a string";
As for this statement, it creates a local array of characters that will be freed when str goes out of scope:
char str[] = "this is a string";
What I'm curious is, what happens when I write it like this:
std::string str = "this is a string";
str should make a copy of the string in its own memory (local), but what about the string literal itself? Will it have the lifetime of the program or will it be freed when str goes out of scope?
When you write this
std::string str = "this is a string";
C++ should find a constructor of std::string that takes const char*, calls it to make a temporary object, invokes the copy constructor to copy that temporary into str, and then destroys the temporary object.
However, there is an optimization that allows C++ compiler to skip construction and destruction of the temporary object, so the result is the same as
std::string str("this is a string");
but what about the string literal itself? Will it have the lifetime of the program or will it be freed when str goes out of scope?
String literal itself when used in this way is not accessible to your program. Typically, C++ places it in the same segment as other string literals, uses it to pass to the constructor of std::string, and forgets about it. The optimizer is allowed to eliminate duplicates among all string literals, including ones used only in the initialization of other objects.
Most operating systems partition your program's memory into a few parts:
The Stack
The Heap
The Data segment
The BSS segment
The Code segment
you already know about the stack and the heap, but what about the others?
the code segment keeps all the operations in binary form.
now it gets interesting:
let's see the following code:
int x;
class Y{ static int y; };
int main(){
printf("hello world");
return 0;
}
Where does the program allocate x and y? They are not local or dynamically allocated, so where?
The Data segment keeps all the static and global variables. When the program is loaded, this segment reserves enough bytes to hold all of them, including any objects. Before main, the program calls each global object's constructor, and after main finishes it calls each object's destructor in the reverse order of construction.
The BSS segment, often considered part of the data segment, holds global and static variables that are zero-initialized, such as null pointers.
So assuming the string literal wasn't optimized away, the program stores it in the data segment (typically in a read-only part of it). It will live on as long as the program lives. Moreover, because it is a string literal, you can most likely even see it inside the exe: open the exe as a text file and at some point along the way you will see the string clearly.
Now what about
std::string str = "hello world"; ?
This is a funky situation. str itself lives on the stack. The actual inner buffer lives on the heap, but the string literal which is used to initialize the string lives in the data segment, and the code which makes str's value turn into hello world lives in the code segment. Needless to say, if we were to program in assembly, we would need to build this ecosystem with our own bare hands.
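To make the layout described above a little more concrete, here is an annotated sketch. The segment names in the comments describe what typical toolchains do and are not guaranteed by the language; all identifiers are hypothetical.

```cpp
#include <string>

int global_counter;        // static storage duration, zero-initialized (typically .bss)
int global_answer = 42;    // static storage duration, explicitly initialized (typically .data)

// The literal has static storage duration; compilers typically place it in read-only data.
const char* greeting_literal() { return "hello world"; }

// The std::string object is a local (automatic) variable. Its characters are a copy
// of the literal, kept in a buffer that the string itself owns.
std::string greeting_copy() {
    std::string str = "hello world";
    return str;
}
```

The language only promises the storage durations; where each piece physically ends up is up to the compiler, linker and operating system.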
I will offer a counter question: why do you care?
The C++ Standard specifies the behavior of the language, and the first core principle when it comes to implementations is basically known as the as-if rule:
§ 1.9 Program execution
1/ The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
In your case:
std::string str = "this is a string";
There are various valid scenarios:
you do not use str afterward? then this whole code portion may be completely elided
you immediately assign 'T' to str[0] afterward? then the two might be coalesced into std::string str = "This is a string";
...
and there is no guarantee as to what your compiler will do. It may depend on the compiler you use, the standard library implementation you use, the architecture/OS you are compiling for and even the arguments passed to the compiler...
Thus, if you want to know in your case, you will have to inspect the machine code generated. Asking coliru, for the following code:
#include <string>
int main(int argc, char** argv) {
std::string str = "this is a string";
}
Clang produces the following in the IR:
@.str = private unnamed_addr constant [17 x i8] c"this is a string\00", align 1
which in turn gives the following assembly:
.L.str:
.asciz "this is a string"
.size .L.str, 17
So you have it, for these specific conditions, "this is a string" will be as-is in the binary and will be loaded in read-only memory. It will stay in the address space of the process until the end, the OS may page it out or not depending on RAM pressure.
A few of your initial statements are not quite correct:
For the char * and char [] example, in both cases the variable itself, str remains in scope and accessible until the program ends if it's declared in the global namespace.
If it's declared in a function or a method's scope, it's accessible while the scope remains active. Both of them.
As far as what actually happens to the memory that's used to store the actual literal strings, that's unspecified. A particular C++ implementation is free to manage runtime memory in whatever manner is more convenient for it, as long as the results are compliant with the C++ standard. As far as C++ goes, you are not accessing the memory used by the str object, you're only referencing the str object itself.
Of course, you are free to take a native char * pointer, pointing to one of the characters in the str. But whether or not the pointer is valid is tied directly to the scope of the underlying object. When the corresponding str object goes out of scope, the pointer is no longer valid, and accessing the contents of the pointer becomes undefined behavior.
Note that in the case where str is in the global namespace, the scope of str is the lifetime of the program, so the point is moot. But when str is in a local scope, and it goes out of scope, using the pointer becomes undefined behavior. What happens to the underlying memory is irrelevant. The C++ standard doesn't really define much what should or should not happen to memory in the underlying implementation, but what is or is not undefined behavior.
Based on that, you can pretty much figure out the answer for the std::string case yourself. It's the same thing. You are accessing the std::string object, and not the underlying memory, and the same principle applies.
But note that in addition to the scoping issue, some, but not all, methods of the std::string object are also specified as invalidating all existing direct pointers, and iterators, to its contents, so this also affects whether or not a direct char * to one of the characters in the std::string remains valid.

Questions and Verifications on immutable [string] objects c++

I've been doing some reading on immutable strings in general and in c++, here, here, and I think I have a decent understanding of how things work. However I have built a few assumptions that I would just like to run by some people for verification. Some of the assumptions are more general than the title would suggest:
While a const string in c++ is the closest thing to an immutable string in STL, it is only locally immutable and therefore doesn't experience the benefit of being a smaller object. So it has all the trimmings of a standard string object but it can't access all of the member functions. This means that it doesn't create any optimization in the program over non-const? But rather just protects the object from modification? I understand that this is an important attribute but I'm simply looking to know what it means to use this
I'm assuming that an object's member functions exist only once in read-only memory, and how is probably implementation specific, so does a const object have a separate location in memory? Or are the member functions limited in another way? If there are only 'const string' objects and no non-const strings in a code base, does the compiler leave out the inaccessible functions?
I recall hearing that each string literal is stored only once in read-only memory in c++, however I don't find anything on this here. In other words, if I use some string literal multiple times in the same program, each instance references the same location in memory. I'm going to assume no, but would two string objects initialized by the same string literal point to the same string until one is modified?
I apologize if I have included too many disjunct thoughts in the same post, they are all related to me as string representation and just learning how to code better.
As far as I know, std::string cannot assume that the input string is a read-only constant string from your data segment. Therefore, point (3) does not apply. It will most likely allocate a buffer and copy the string in the buffer.
Note that C++ (like C) has a const qualifier for compilation time, it is a good idea to use it for two reasons: (a) it will help you find bugs, a statement such as a = 5; if a is declared const fails to compile; (b) the compiler may be able to optimize the code more easily (it may otherwise not be able to figure out that the object is constant.)
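However, C++ has a special cast to remove the const-ness of a variable. So our a variable can be cast and assigned a value as in const_cast<int&>(a) = 5;. An std::string can also get its const-ness removed. Be aware that writing through such a cast is only well-defined when the underlying object was not itself defined const; otherwise the behavior is undefined. (Note that C does not have a special cast, but it offers the same escape hatch with the same caveat: * (int *) &a = 5)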
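A small sketch of the one case where casting away const and writing is well-defined: the object being referred to is not const itself, only the reference to it is. add_one_through_const_ref is a hypothetical name.

```cpp
// Well-defined only because the caller's object is itself non-const.
// Passing an object that was defined const would make the write undefined behavior.
int add_one_through_const_ref(const int& value) {
    const_cast<int&>(value) += 1;  // strips const from the reference, not from the object
    return value;
}
```

Called with an ordinary int variable holding 4, the function leaves 5 in that variable. This is a demonstration of the mechanism rather than a pattern to recommend.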
Are all class members defined in the final binary?
No. std::string, like most of the STL, uses templates. Templates are compiled once per unit (your .o object files) and the linker will remove duplicates automatically. So if you look at the size of all the .o files and add them up, the final output will be a lot smaller.
That also means only the functions that are used in a unit are compiled and saved in the object file. Any other functions "disappear". That being said, often function A calls function B, so B will be defined, even if you did not explicitly call it.
On the other hand, because these are templates, very often the functions get inlined. But that is a choice by the compiler, not the language or the STL (although you can use the inline keyword for fun; the compiler has the right to ignore it anyway).
Smaller objects... No, in C++ an object has a very specific size that cannot change. Otherwise the sizeof(something) would vary from place to place and C/C++ would go berserk!
Static strings that are saved in read-only data sections, however, can be optimized. If the linker/compiler are good enough, they will be able to merge the same string in a single location. These are just plain char * or wchar_t *, of course. The Microsoft compiler has been able to do that one for a while now.
Yet, the const on a string does not always force your string to be put in a read-only data section. That will generally depend on your command line option. C++ may have corrected that, but I think C still put everything in a read/write section unless you use the correct command line option. That's something you need to test to make sure (your compiler is likely to do it, but without testing you won't know.)
Finally, although std::string may not use it, C++ offers a quite interesting keyword called mutable. If you heard about it, you would know that a variable member can be marked as mutable and that means even const functions can modify that variable member. There are two main reason for using that keyword: (1) you are writing a multi-thread program and that class has to be multi-thread safe, in that case you mark the mutex as mutable, very practical; (2) you want to have a buffer used to cache a computed value which is costly, that buffer is only initialized when that value is requested to not waste time otherwise, that buffer is made mutable too.
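The caching use of mutable mentioned in (2) might look like the sketch below. Samples and its members are hypothetical names, and the class is deliberately not thread-safe; combining it with case (1) would mean adding a mutable mutex around the cache update.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical class that caches an expensive result inside a const member function.
class Samples {
public:
    explicit Samples(std::vector<int> values) : values_(std::move(values)) {}

    // Logically const: the observable value never changes, only the cache does.
    int sum() const {
        if (!cached_) {
            sum_ = std::accumulate(values_.begin(), values_.end(), 0);
            cached_ = true;
            ++computations_;
        }
        return sum_;
    }

    // Exposed only so the caching can be observed.
    int computations() const { return computations_; }

private:
    std::vector<int> values_;
    mutable bool cached_ = false;   // mutable: may change even through a const object
    mutable int sum_ = 0;
    mutable int computations_ = 0;
};
```

Calling sum() twice on a const Samples object computes the total once and returns the cached value the second time.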
Therefore the "immutable" concept is only really something that you can count on at a higher level. In practice, reality is often quite different. For example, an std::string c_str() function may reallocate the buffer to add the necessary '\0' terminator, yet that function is marked as being a const:
const CharT* c_str() const;
Actually, an implementation is free to allocate a completely different buffer, copy its existing data to that buffer and return that bare pointer. That means internally the std::string could allocate many buffers to store large strings (instead of using realloc() which can be costly.)
One thing, though... with a copy-on-write implementation (common before C++11, whose requirements effectively rule it out for std::string), when you copy string A into string B (B = A;) the string data does not get copied. Instead A and B share the same data buffer. Once you modify A or B, and only then, the data gets copied. With such an implementation, calling a function which accepts a string by copy does not waste that much time:
int func(std::string a)
{
...
if(some_test)
{
// deep copy only happens here
a += "?";
}
}
std::string b;
func(b);
Under copy on write, the characters of string b do not get copied at the time func() gets called. And if func() never modifies 'a', the string data remains the same all along. This is often referred to as a shallow copy or copy on write.
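Whatever the implementation does internally, the observable semantics are the same: a by-value parameter is an independent copy. The sketch below shows that, plus the usual way to avoid the copy when the function only reads. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Takes a copy; modifying it never affects the caller's string.
std::string with_question_mark(std::string a) {
    a += "?";
    return a;
}

// Takes a reference; no characters are copied, whatever the implementation.
std::size_t length_of(const std::string& s) {
    return s.size();
}
```

Passing by const reference gives the "no copy unless needed" behaviour without depending on copy on write being present.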

C++ strings - How to avoid obtaining invalid pointer?

In our C++ code, we have our own string class (for legacy reasons). It supports a method c_str() much like std::string. What I noticed is that many developers are using it incorrectly. I have reduced the problem to the following line:
const char* x = std::string("abc").c_str();
This seemingly innocent code is quite dangerous in the sense that the destructor on std::string gets invoked immediately after the call to c_str(). As a result, you are holding a pointer to a de-allocated memory location.
Here is another example:
std::string x("abc");
const char* y = x.substr(0,1).c_str();
Here too, we are using a pointer to de-allocated location.
These problems are not easy to find during testing as the memory location still contains valid data (although the memory location itself is invalid).
I am wondering if you have any suggestions on how I can modify class/method definition such that developers can never make such a mistake.
The modern part of the code should not deal with raw pointers like that.
Call c_str only when providing an argument to a legacy function that takes const char*. Like:
legacy_print(x.substr(0,1).c_str())
Why would you want to create a local variable of type const char*? Even if you write a copying version c_str_copy() you will just get more headache because now the client code is responsible for deleting the resulting pointer.
And if you need to keep the data around for a longer time (e.g. because you want to pass the data to multiple legacy functions) then just keep the data wrapped in a string instance the whole time.
For the basic case, you can add a ref qualifier on the "this" object, to make sure that .c_str() is never immediately called on a temporary. Of course, this can't stop them from storing in a variable that leaves scope before the pointer does.
const char *c_str() & { return ...; }
But the bigger-picture solution is to replace all functions that take a "const char *" in your codebase with functions that take one of your string classes (at the very least, you need two: an owning string and a borrowed slice) and make sure that none of your string classes can be implicitly constructed from a "const char *".
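One possible way to flesh out the ref-qualifier idea from this answer, assuming C++11. LegacyString is a hypothetical stand-in for the legacy class, and this sketch differs slightly from the one-liner above: the accessor is const-qualified so const lvalues can still call it, and the rvalue overload is explicitly deleted.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the legacy string class.
class LegacyString {
public:
    explicit LegacyString(const char* text) : data_(text) {}

    // Callable on lvalues (const or not): a named object outlives the call expression.
    const char* c_str() const & { return data_.c_str(); }

    // Calling c_str() on a temporary is a compile-time error.
    const char* c_str() const && = delete;

private:
    std::string data_;
};

// Compile-time check (detection idiom) that the rvalue call really is rejected.
template <typename T, typename = void>
struct can_call_c_str : std::false_type {};

template <typename T>
struct can_call_c_str<T, decltype(void(std::declval<T>().c_str()))> : std::true_type {};

static_assert(can_call_c_str<LegacyString&>::value, "lvalues may call c_str()");
static_assert(!can_call_c_str<LegacyString>::value, "temporaries may not call c_str()");
```

The trade-off is that this also rejects the harmless pattern of passing temporary.c_str() directly as a function argument, so callers have to name the string first.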
The simplest solution would be to change your destructor to write a null at the beginning of the string at destruction time. (Alternatively, fill the entire string with an error message or 0's; you can have a flag to disable this for release code.)
While it doesn't directly prevent programmers from making the mistake of using invalid pointers, it will definitely draw attention to the problem when the code doesn't do what it should do. This should help you flush out the problem in your code.
(As you mentioned, at the moment the errors go unnoticed because for the most part the code will happily run with the invalid memory.)
Consider using Valgrind or Electric Fence to test your code. Either of these tools should trivially and immediately find these errors.
I am not sure that there is much you can do about people using your library incorrectly if you warn them about it. Consider the actual stl string library. If I do this:
const char * lala = std::string("lala").c_str();
std::cout << lala << std::endl;
const char * lala2 = std::string("lalb").c_str();
std::cout << lala << std::endl;
std::cout << lala2 << std::endl;
I am basically creating undefined behavior. In the case where I run it on ideone.com I get the following output:
lala
lalb
lalb
So clearly the memory of the original lala has been overwritten. I would just make it very clear to the user in the documentation that this sort of coding is bad practice.
You could remove the c_str() function and instead provide a function that accepts a reference to an already created empty smart pointer that resets the value of the smart pointer to a new copy of the string. This would force the user to create a non temporary object which they could then use to get the raw c string and it would be destructed and free the memory when exiting the method scope.
This assumes though that your library and its users would be sharing the same heap.
EDIT
Even better, create your own smart pointer class for this purpose whose destructor calls a library function in your library to free the memory so it can be used across DLL boundaries.

Compiler ignores code returning a local char* from a function

As far as I know the following code is bad. But, Visual Studio 2010 doesn't give me any warning.
char* CEmployee::GetEmployeeName()
{
char* szEmployeeName = "";
CEmployeeModel* model = GetSwitchMod();
if (model != NULL)
{
szEmployeeName = model->GetName();
}
return szEmployeeName;
}
It's not the compiler's job to debug your code.
lint or similar static checker might find this. Try running Code Analysis if you have one of the premium VS versions that includes it. Make sure you build with /W4 and fix all warning errors.
You're not returning a reference to a local variable, as you're returning by value, so the local variable — the pointer — is copied.
Don't confuse the pointer with its pointee.
If anything, you'd be returning a dangling pointer (though in practice the string literal buffer is likely to be in static memory somewhere). Dangling pointers don't tend to be diagnosed at compile-time.
If model->GetName() returns a dynamically-allocated buffer, making the pointer no longer point to the string literal, then your code is fine.
TRWTF is that you didn't write char const* szEmployeeName = "". Leaving out the const has been deprecated for over a decade, and is illegal in C++0x. It's a concern that so many people are still doing this.
It's even worse that there are still people using char* for strings, instead of std::string.
Returning szEmployeeName here is actually not an error - the string is allocated statically in read-only memory (the .rodata section in ELF executables). Quoting the (C++03) Standard:
2.13.4.1
An ordinary string literal has type “array of n const char” and static storage duration (3.7), where n is the size of the string as defined below, and is initialized with the given characters.
3.7.1
All objects which neither have dynamic storage duration nor are local have static storage duration. The storage for these objects shall last for the duration of the program
On the other hand, trying to modify this string results in undefined behaviour - in this particular case, you'll most likely get a crash at runtime. szEmployeeName should be really declared as const char* (and there are historical reasons why the standard allows initializing a plain char * with a string literal). Again, quoting the Standard:
2.13.4.2
The effect of attempting to modify a string literal is undefined.
You're returning a pointer to a char at the end. Are you sure the memory that the pointer is referring to is still active when the code leaves the function* (what is the lifetime of model->GetName()'s return)
*EDIT: "loop" is wrong.
This code isn't necessarily "wrong" in all cases. If the thing pointed to by the pointer returned from GetName is still alive, and the pointer returned from GetEmployeeName is not written to then the code appears to be well-formed. The compiler can't reasonably be expected to do a full analysis of all your code to tell you if there's an actual problem with your pointer manipulation.
You should be using std::string as @Tomalak Geret'kal noted in his answer. That then resolves all these lifetime issues.
There's a certain point at which you should be able to say "Why am I writing code this way???" and the compiler isn't going to go to extra-ordinary lengths to warn you about every possible undefined behavior in your program (it's undefined for a reason).
This code is fine. There's nothing going on here that could possibly cause the target of szEmployeeName to be freed.
If model is NULL, then you return a pointer to "". Using a non-const pointer certainly is questionable, but the string literal "" survives for the lifetime of your program, it's not an error to return it.
If model is non-null, you return the pointer returned by model->GetName(). Since CEmployee::GetEmployeeName() doesn't free any memory, the pointer is just as valid when returned as it was when you got it from model->GetName(). Specifically, either the pointer is valid, or it is a dangling pointer, indicating a bug in CEmployeeModel->GetName().
There are no circumstances where CEmployeeModel::GetName() is correct but CEmployee::GetEmployeeName returns a bad pointer.
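A short sketch of the two points made in these answers: returning a pointer to a string literal is fine because the literal has static storage duration, and returning std::string sidesteps the lifetime question entirely. Both functions are hypothetical and are not the asker's CEmployee code.

```cpp
#include <string>

// Fine: a string literal has static storage duration, so the pointer
// stays valid after the function returns. Note the const.
const char* empty_name() { return ""; }

// Hypothetical std::string-based variant: copies the characters so the caller
// owns them. raw_name may be null, in which case an empty string is returned.
std::string employee_name(const char* raw_name) {
    return raw_name ? std::string(raw_name) : std::string();
}
```

With the second form, the caller no longer needs to know how long the model keeps its own buffer alive.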