The value of c-strings changes when I call an unrelated method - c++

I am working on a header file that defines a namespace in which some c-strings are defined.
namespace env {
const char* C_NAME;
const char* SYS_DRIVE;
const char* PROCESSOR;
const char* PROCESSOR_ARCHITECTURE;
const char* OSNAME;
}
My main function looks like this:
int main(int argc, char* argv[], char* env[]) {
initialize_environment_vars(env);
cout << "C_NAME\t\t\t" << env::C_NAME << endl;
/*...*/
return 0;
}
My problem is that the strings I initialize in initialize_environment_vars() do not have the values I want them to have.
void initialize_environment_vars(char* env[]) {
int id = PRIVATE::findEntry(env, "COMPUTERNAME");
env::C_NAME = (str::getAfter(env[id], "=")).c_str(); // getAfter() returns a string
//std::cout << env::C_NAME << std::endl; //Right value!!!
id = PRIVATE::findEntry(env, "SystemDrive");
std::cout << env::C_NAME; //Value at env[id]
/*Here the other constants are initialized in the same way.*/
}
I have found out that in the function initialize_environment_vars() the variables have the right value until I call the function findEntry() to look for another entry.
int PRIVATE::findEntry(const char* const arr[], std::string toFind) {
bool found = false;
int i = 0;
std::string actual;
while(arr[i] && !found) {
actual = arr[i];
if(str::contains(actual, toFind)) {
found = true;
break;
}
i++;
}
if(found)
return i;
else { /*Error message and exit program*/ }
}
After reading the post "string::c_str query", I also suspected that my use of .c_str() in initialize_environment_vars() is wrong, because the string returned by getAfter() would be destroyed right after .c_str() is called. But this does not seem to be the case, since env::C_NAME is valid in main().
Thus I have two questions:
Why does my PRIVATE::findEntry(const char* const [], std::string) function change the value of env::C_NAME the way I use it above, even though it only returns an int and modifies neither the array nor its entries?
Why is env::C_NAME still valid in 'main()'? Should it not become invalid after the destructor of the string that str::getAfter(const std::string&, std::string) returns is called? (ANSWERED)

Why does my PRIVATE::findEntry(const char* const [], std::string) function change the value of env::C_NAME the way I use it above, even though it only returns an int and modifies neither the array nor its entries?
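The return value of c_str() is only guaranteed to be valid while the string it came from is alive and unmodified. In initialize_environment_vars() that string is the temporary returned by str::getAfter(), which is destroyed at the end of the full expression, so env::C_NAME dangles right away. Most likely, the std::string objects created inside findEntry() (such as actual) then reuse the heap memory that was just freed, which is why C_NAME appears to take on the contents of another environment entry.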
Why is env::C_NAME still valid in 'main()'? Should it not become invalid after the destructor of the string that str::getAfter(const std::string&, std::string) returns is called?
It's not. It just happens to contain what you want it to contain, rather than happening to contain something else. If you rely on a coin flip coming up heads, you are doing something broken, but it might happen to work. If you do it again, it might happen not to work. That's why we don't do stuff like this.
Do not confuse how code happens to behave with how you should expect code to behave. In situations where we expect things to be invalid, we have no idea how the code will actually behave. It might happen to do something good, it might happen to do something disastrous. It might change with the compiler options, compiler version, platform, or other parameters.
You have two obvious options. You can change the type of these variables from const char * to std::string or you can use malloc or strdup to allocate memory that will remain valid.
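As a minimal sketch of the first option, assuming C++17 (for inline variables in a header) and a hypothetical valueOf() helper standing in for the question's PRIVATE::findEntry() and str::getAfter(), the namespace can hold std::string values that own their characters. valueOf() matches "NAME=" at the start of an entry, a stricter test than the question's contains():

```cpp
#include <cstddef>
#include <string>

// Sketch, assuming C++17: `inline` lets these definitions live in a header
// that is included by several .cpp files without duplicate-symbol errors.
namespace env {
inline std::string C_NAME;
inline std::string SYS_DRIVE;
}

// Hypothetical stand-in for PRIVATE::findEntry() + str::getAfter():
// returns the text after "name=" in a null-terminated array, or "" if absent.
std::string valueOf(const char* const envp[], const std::string& name) {
    const std::string prefix = name + "=";
    for (std::size_t i = 0; envp[i] != nullptr; ++i) {
        const std::string entry = envp[i];
        if (entry.compare(0, prefix.size(), prefix) == 0)
            return entry.substr(prefix.size());
    }
    return "";
}

// Each std::string owns a copy of its characters, so later allocations
// (such as those inside another lookup) cannot change env::C_NAME.
void initialize_environment_vars(const char* const envp[]) {
    env::C_NAME = valueOf(envp, "COMPUTERNAME");
    env::SYS_DRIVE = valueOf(envp, "SystemDrive");
}
```

std::cout << env::C_NAME works unchanged, and env::C_NAME.c_str() is available wherever a const char* is required, as long as the string is not modified in the meantime.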

Your "c-strings" in your env namespace are just pointers... nothing more. For example, env::C_NAME points to the address that once held the string you got with (str::getAfter(env[id], "=")). Who knows what's there now? You could change your "c-strings" in env to be char buffers of fixed size and use strcpy() to copy the content into them (beware of overrunning the end of the buffers, though), or you could leave them as pointers, malloc() space for your copies of the strings, and then strcpy() the original strings into your malloc()ed buffers, or, the best option, use std::string and don't worry about the nitty gritty.
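For the pointer option, a copy helper could look like the sketch below. copyCString() is a hypothetical name; strdup() does the same job where it is available (POSIX, and standard C since C23).

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: malloc()s room for the text plus its '\0' terminator
// and strcpy()s the characters into it. The caller owns the result and must
// eventually free() it. Returns nullptr if malloc() fails.
char* copyCString(const std::string& s) {
    char* buf = static_cast<char*>(std::malloc(s.size() + 1));
    if (buf != nullptr)
        std::strcpy(buf, s.c_str());
    return buf;
}
```

With this, env::C_NAME = copyCString(str::getAfter(env[id], "=")); is safe: the temporary string is still alive while its characters are copied, and the copy does not depend on it afterwards.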

Related

How does the '->' operator work and is it a good implementation to modify a large string?

I want to begin by saying that I have worked with pointers before and assumed I understood how they worked. For example:
int x = 5;
int *y = &x;
*y = 3;
std::cout << x; // Would output 3
But then I wanted to make a method which modifies a rather large string, and I believe it would therefore be better to pass a reference to the string in order to avoid passing the entire string back and forth. So I pass my string to myFunc() and do the same thing as I did with the numbers above, which means I can modify *str as I do in the code below. But in order to call std::string methods, I need to use the -> operator.
#include <iostream>
#include <string>
int myFunc(std::string *str) { // Retrieve the address to which str will point to.
*str = "String from myFunc"; // This is how I would normally change the value of myString
str->replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
return 0;
}
int main() {
std::string myString = "String from main";
myFunc(&myString); // Pass address of myString to myFunc()
}
My questions are:
Since str in myFunc is an address, why can an address use an operator such as ->, and how does it work? Is it as simple as calling the method of the object at the address str? str->replace(); // str->myString.replace()?
Is this a good implementation of modifying a large string, or would it be better to pass the string to the method and return the string when it's modified?
ptr->x is identical to (*ptr).x unless -> is overloaded for the type you're dereferencing. On plain pointers, it works as you'd expect.
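A small sketch of that equivalence on a std::string (the function name arrowDemo() is just for illustration):

```cpp
#include <string>

// For a raw pointer, p->member is shorthand for (*p).member:
// dereference the pointer to reach the object, then access its member.
std::string arrowDemo() {
    std::string s = "String from main";
    std::string* p = &s;       // p holds the address of s
    p->replace(0, 1, "s");     // same as (*p).replace(0, 1, "s")
    (*p).append("!");          // same as p->append("!")
    return s;                  // both calls modified s itself
}
```

Calling arrowDemo() returns "string from main!".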
As for implementation, profile it when you implement it. You can't know what the compiler will do with this once you turn optimizations on. For example, if the function gets inlined, you won't even have any extra indirection in the first place, and it won't matter which way you do it. As long as you don't allocate a new string, differences should generally be negligible.
str is a pointer to a std::string object. The arrow operator, ->, dereferences the pointer and then accesses its member. Alternatively, you can write (*str).replace(0,1,"s"); here, * dereferences the pointer and then . accesses the member function replace().
Pointers are often confusing; it is better to use references when possible.
void myFunc(std::string &str) { // str is a reference: another name for the caller's string
str = "String from myFunc"; // This is how I would normally change the value of myString
str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}
int main() {
std::string myString = "String from main";
myFunc(myString); // Pass myString by reference (no & needed at the call site)
}
Is this a good implementation of modifying a large string, or would it be better to pass the string to the method and return the string when it's modified?
If you don't want to change the original string then create a new string and return it.
If it's OK for your application to modify the original string, then do it. You can also return a reference to the modified string if you need to chain function calls.
std::string& myFunc(std::string &str) { // str refers to the caller's string; returning it allows chaining
str = "String from myFunc"; // This is how I would normally change the value of myString
return str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}
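For the other option, returning a new string, a sketch might look like this (modifiedCopy() is a hypothetical name). Taking the input by const reference avoids a copy on the way in, and since C++11 the returned local is moved or the copy is elided, so nothing large is copied on the way out either:

```cpp
#include <string>

// Leaves the caller's string untouched and returns a modified copy.
std::string modifiedCopy(const std::string& in) {
    std::string out = in;          // the one copy we actually want
    if (!out.empty())
        out.replace(0, 1, "s");    // same edit as myFunc()
    return out;                    // moved or elided, not deep-copied
}
```

Usage: std::string changed = modifiedCopy(original); leaves original as it was.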

string to char* function

I'm quite new to C/C++. I have a question about the code below:
char* string2char(String command){
if (command.length() != 0) {
char *p = const_cast<char*>(command.c_str());
return p;
}
}
void setup() {}
void loop() {
String string1 = "Bob";
char *string1Char = string2char(string1);
String string2 = "Ross";
char *string2Char = string2char(string2);
Serial.println(string1Char);
Serial.println(string2Char);
}
This basically outputs repeatedly:
Ross
Ross
I understand I'm failing to grasp the concept of how pointers are working here - would someone be able to explain it? And how would I alter this so that it could show:
Bob
Ross
This function :
char* string2char(String command){
if (command.length() != 0) {
char *p = const_cast<char*>(command.c_str());
return p;
}
}
Does not make much sense: it takes the string by value and returns a pointer to its internal buffer, with the constness cast away (don't do that). You are getting odd behaviour because you are returning a pointer into an object that has already been destroyed; pass it by reference instead. Also, I'm curious why you need to do all this. Can't you just pass:
Serial.println(string1.c_str());
Serial.println(string2.c_str());
As noted by Mark Ransom in the comments, when you pass the string by value, the string command is a local copy of the original string. Therefore you can't return a pointer to its c_str(), because that one points at the local copy, which will go out of scope when the function is done. So you get the same bug as described here: How to access a local variable from a different function using pointers?
A possible solution is to rewrite the function like this:
const char* string2char(const String& command){
return command.c_str();
}
Now the string is passed by reference, so c_str() refers to the same string object as the one in the caller (string1). I also took the liberty of fixing const-correctness at the same time.
Please note that you cannot modify the string by the pointer returned by c_str()! So it is very important to keep this const.
The problem here is that you've passed String command to the function by value, which makes a copy of whatever String you pass in. So, when you call const_cast<char*>(command.c_str()); you're making a pointer to the C string of that copied String. Since the copy lives only within the scope of the function, its memory is freed when the function returns, and the pointer is left dangling. What you want to do is change the parameter to String & command, which passes a reference to the original string, whose memory won't be freed when the function returns.
Your issue revolves around your argument.
char* string2char(String command){
// create a new string that's a copy of the thing you pass in, and call it command
if (command.length() != 0) {
char *p = const_cast<char*>(command.c_str());
// get the const char* that this string contains.
// It's valid only while the string command does; and is invalidated on changing the string.
return p; /// and destroy command - making p invalid
}
}
There are 2 ways to resolve this. The first, and more complex, is to pass command in by reference, i.e. const String& command, and then work with that.
The alternative, which is much simpler, is to delete your function completely: change your char* to const char* and just call c_str() on the string, i.e.
String string1 = "Bob";
const char *string1Char = string1.c_str();
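The question uses Arduino's String and Serial, which aren't available in standard C++. Assuming std::string in their place and std::cout instead of Serial.println(), the by-reference fix can be sketched like this; printBoth() prints Bob and then Ross:

```cpp
#include <iostream>
#include <string>

// The pointer is taken from the caller's string itself (passed by const
// reference), so it stays valid while that string is alive and unmodified.
const char* string2char(const std::string& command) {
    return command.c_str();
}

// Equivalent of loop(): std::string stands in for Arduino's String,
// std::cout for Serial.println().
void printBoth() {
    std::string string1 = "Bob";
    const char* string1Char = string2char(string1);
    std::string string2 = "Ross";
    const char* string2Char = string2char(string2);
    std::cout << string1Char << '\n' << string2Char << '\n';
}
```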

Map C-style string to int using C++ STL?

Mapping of string to int is working fine.
std::map<std::string, int> // working
But I want to map C-style string to int
For example:
char A[10] = "apple";
map<char*,int> mapp;
mapp[A] = 10;
But when I try to access the value mapped to "apple", I get a garbage value instead of 10. Why doesn't it behave the same as std::string?
map<char*,int> mapp;
The key type here is not "c string". At least not, if we define c string to be "an array of characters, with null terminator". The key type, which is char*, is a pointer to a character object. The distinction is important. You aren't storing strings in the map. You are storing pointers, and the strings live elsewhere.
Unless you use a custom comparison function object, std::map compares keys with std::less<key_type> by default, which for char* compares the pointer values, not the characters. Two pointers are equal if, and only if, they point to the same object.
Here is an example of three objects:
char A[] = "apple";
char B[] = "apple";
const char (&C)[6] = "apple";
The first two are arrays; the third is an lvalue reference bound to a string literal object, which is also an array. Being separate objects, they of course also have different addresses. So, if you were to write:
mapp[A] = 10;
std::cout << mapp[B];
std::cout << mapp[C];
The output would be 0 for each, because you hadn't set mapp[B] or mapp[C], so operator[] inserts them with value-initialized (zero) values. The key values are different, even though each array contains the same characters. (For mapp[C] to compile, mapp would need const char* keys, since C refers to const characters.)
Solution: Don't use operator< to compare pointers to c strings. Use std::strcmp instead. With std::map, this means using a custom comparison object. However, you aren't done with caveats yet. You must still make sure that the strings stay in memory as long as they are pointed to by the keys in the map. For example, this would be a mistake:
char A[] = "apple";
mapp[A] = 10;
return mapp; // oops, we returned mapp outside of the scope
// but it contains a pointer to the string that
// is no longer valid outside of this scope
Solution: Take care of scope, or just use std::string.
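Putting the strcmp() advice together, a sketch with a custom comparison object (CStrLess, CStrMap and lookupDemo() are illustrative names) could look like this. Using const char* keys also lets string literals be used directly:

```cpp
#include <cstring>
#include <map>

// Orders keys by the characters they point to, not by pointer value.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// The map stores only pointers: every key string must outlive the map.
using CStrMap = std::map<const char*, int, CStrLess>;

int lookupDemo() {
    char A[] = "apple";
    char B[] = "apple";   // a different array holding the same characters
    CStrMap mapp;         // declared after A, so it is destroyed before A
    mapp[A] = 10;
    return mapp.at(B);    // found via strcmp(): returns 10
}
```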
It can be done but you need a smarter version of string:
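#include <cstring>
#include <map>
struct CString {
CString(const char *str) {
std::strncpy(string, str, sizeof string - 1); // copy at most 49 characters...
string[sizeof string - 1] = '\0'; // ...and always null-terminate
}
char string[50]; // Or char * if you want to go that way, but then you will need a copy
// constructor and careful memory management, so you can already see hardships ahead.
bool operator<(const CString &rhs) const { // must be const for std::map
return std::strcmp(string, rhs.string) < 0;
}
}; // a struct definition ends with a semicolon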
map<CString,int> mapp;
mapp["someString"] = 5;
But as you can likely see, this is a huge hassle. There are probably some things that I have missed or overlooked as well.
You could also use a comparison function:
struct cmpStr{
bool operator()(const char *a, const char *b) const {
return strcmp(a, b) < 0;
}
};
map<char *, int, cmpStr> mapp; // pass the comparator as the third template argument
char A[5] = "A";
mapp[A] = 5;
But there is a lot of external memory management: what happens if A's memory goes away but the map remains? UB. This is still a nightmare.
Just use a std::string.
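The std::string version of the original example, as a minimal sketch (makeFruitMap() is an illustrative name): the map copies the characters, so both the lookup problem and the lifetime problem go away.

```cpp
#include <map>
#include <string>

// std::string keys own a copy of the characters, and std::string's
// operator< compares contents, so any array or literal with the same text
// finds the entry.
std::map<std::string, int> makeFruitMap() {
    char A[10] = "apple";
    std::map<std::string, int> mapp;
    mapp[A] = 10;     // copies "apple" into the map
    return mapp;      // safe: A goes away, but the key is a copy
}
```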

assignment of c_str() to a string

I have a problem which I cannot fix on my own.
string filenameRaw;
filenameRaw= argv[1];
function(filenameRaw.c_str(),...);
function(const char* rawDataFile,const char* targetfieldFile,const char* resultFile,const char* filename)
...
this->IOPaths.rawData=rawDataFile;
...
works very fine so far. Now I try to put another string in the variable IOPaths.rawData...
function(const char* rawDataFile,const char* targetfieldFile,const char* resultFile,const char* filename)
...
string filenameRaw;
filenameRaw=reader.Get("paths", "rawData", "UNKNOWN");
...
const char* rawDataFile1=filenameRaw.c_str();
cout << "Compare: " << strcmp(rawDataFile,rawDataFile1) <<endl;
...
this->IOPaths.rawData=rawDataFile1;
This does not work any more. Later in my program I get errors with the filename. The strcmp definitely gives 0, so the strings must be equal. Does anyone have an idea what I am doing wrong?
The validity of the output of c_str() is limited to, at most, the lifetime of the object on which c_str() was called.[1]
I suspect that this->IOPaths.rawData is pointing to deallocated memory once filenameRaw is out of scope.
An adequate remedy would be to pass the std::string around rather than [const] char*. Some older library implementations used copy-on-write for std::string so that copies shared their data; C++11 rules that out, but copying a short path is cheap, and you can pass by const reference or std::move() where copies matter.
[1] In certain instances (such as if the object is modified), it could be less.
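A sketch of that remedy, where Paths, Processor, setRawData() and loadFromConfig() are hypothetical stand-ins for the question's classes. The member that used to be a const char* now owns its characters:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the type of IOPaths.
struct Paths {
    std::string rawData;   // was: const char*
};

class Processor {
public:
    Paths IOPaths;

    void setRawData(std::string path) {
        // path is already our own copy; move it into the member.
        this->IOPaths.rawData = std::move(path);
    }
};

// Mirrors the question's second variant: the local filenameRaw dies at the
// end of this function, but the member keeps its own copy.
void loadFromConfig(Processor& p) {
    std::string filenameRaw = "data/raw.bin";   // hypothetical; e.g. reader.Get(...)
    p.setRawData(filenameRaw);
}
```

Where an API still needs a const char*, call IOPaths.rawData.c_str() at the point of use, while the object is alive.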

Caching a const char * as a return type

Was reading up a bit on my C++, and found this article about RTTI (Runtime Type Identification):
http://msdn.microsoft.com/en-us/library/70ky2y6k(VS.80).aspx . Well, that's another subject :) However, I stumbled upon a strange statement about the name() method of the type_info class. It says: "The type_info::name member function returns a const char* to a null-terminated string representing the human-readable name of the type. The memory pointed to is cached and should never be directly deallocated."
How can you implement something like this yourself? I've often struggled with this exact problem before: I don't want to make a new char array for the caller to delete, so I've stuck to std::string thus far.
So, for the sake of simplicity, let's say I want to make a method that returns "Hello World!", let's call it
const char *getHelloString() const;
Personally, I would make it somehow like this (Pseudo):
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // strcpy(destination, source)
return returnVal;
}
.. But this would mean that the caller should do a delete[] on my return pointer :(
How about this:
const char *getHelloString() const
{
return "HelloWorld!";
}
Returning a literal directly means the space for the string is allocated in static storage by the compiler and will be available throughout the duration of the program.
I like all the answers about how the string could be statically allocated, but that's not necessarily true for all implementations, particularly the one whose documentation the original poster linked to. In this case, it appears that the decorated type name is stored statically in order to save space, and the undecorated type name is computed on demand and cached in a linked list.
If you're curious about how the Visual C++ type_info::name() implementation allocates and caches its memory, it's not hard to find out. First, create a tiny test program:
#include <cstdio>
#include <typeinfo>
#include <vector>
int main(int argc, char* argv[]) {
std::vector<int> v;
const std::type_info& ti = typeid(v);
const char* n = ti.name();
printf("%s\n", n);
return 0;
}
Build it and run it under a debugger (I used WinDbg) and look at the pointer returned by type_info::name(). Does it point to a global structure? If so, WinDbg's ln command will tell the name of the closest symbol:
0:000> ?? n
char * 0x00000000`00857290
"class std::vector<int,class std::allocator<int> >"
0:000> ln 0x00000000`00857290
0:000>
ln didn't print anything, which indicates that the string wasn't in the range of addresses owned by any specific module. It would be in that range if it was in the data or read-only data segment. Let's see if it was allocated on the heap, by searching all heaps for the address returned by type_info::name():
0:000> !heap -x 0x00000000`00857290
Entry User Heap Segment Size PrevSize Unused Flags
-------------------------------------------------------------------------------------------------------------
0000000000857280 0000000000857290 0000000000850000 0000000000850000 70 40 3e busy extra fill
Yes, it was allocated on the heap. Putting a breakpoint at the start of malloc() and restarting the program confirms it.
Looking at the declaration in <typeinfo> gives a clue about where the heap pointers are getting cached:
struct __type_info_node {
void *memPtr;
__type_info_node* next;
};
extern __type_info_node __type_info_root_node;
...
_CRTIMP_PURE const char* __CLR_OR_THIS_CALL name(__type_info_node* __ptype_info_node = &__type_info_root_node) const;
If you find the address of __type_info_root_node and walk down the list in the debugger, you quickly find a node containing the same address that was returned by type_info::name(). The list seems to be related to the caching scheme.
The MSDN page linked in the original question seems to fill in the blanks: the name is stored in its decorated form to save space, and this form is accessible via type_info::raw_name(). When you call type_info::name() for the first time on a given type, it undecorates the name, stores it in a heap-allocated buffer, caches the buffer pointer, and returns it.
The linked list may also be used to deallocate the cached strings during program exit (however, I didn't verify whether that is the case). This would ensure that they don't show up as memory leaks when you run a memory debugging tool.
Well gee, if we are talking about just a function that always returns the same value, it's quite simple.
const char * foo()
{
static char return_val[] = "HelloWorld!";
return return_val;
}
The tricky bit is when you start caching the result: then you have to consider threading, when your cache gets invalidated, and whether to store things in thread-local storage. But if it's just a one-off output that is immediately copied, this should do the trick.
Alternatively, if you don't have a fixed size, you either have to use a static buffer of some arbitrary size, which a result might eventually outgrow, or turn to a managed class such as std::string.
const char * foo()
{
static std::string output;
DoCalculation(output);
return output.c_str();
}
also the function signature
const char *getHelloString() const;
is only applicable for member functions.
At which point you don't need to deal with static function local variables and could just use a member variable.
I think that since they know that there are a finite number of these, they just keep them around forever. It might be appropriate for you to do that in some instances, but as a general rule, std::string is going to be better.
They can also look up new calls to see if they made that string already and return the same pointer. Again, depending on what you are doing, this may be useful for you too.
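One way to get those semantics yourself, as a sketch rather than how any particular library does it: keep each generated string in a function-local static std::set, return c_str() of the stored element, and hand back the same pointer for repeated requests. internString() is a hypothetical name.

```cpp
#include <mutex>
#include <set>
#include <string>

// The first request stores the string; later requests with the same text
// return the same pointer. The cache, not the caller, owns the memory until
// program exit. std::set nodes never move and stored elements are never
// modified, so c_str() of a stored element stays valid.
const char* internString(const std::string& text) {
    static std::set<std::string> pool;
    static std::mutex poolMutex;              // guards concurrent callers
    std::lock_guard<std::mutex> lock(poolMutex);
    return pool.insert(text).first->c_str();
}
```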
Be careful when implementing a function that allocates a chunk of memory and then expects the caller to deallocate it, as you do in the OP:
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // strcpy(destination, source)
return returnVal;
}
By doing this you are transferring ownership of the memory to the caller. If you call this code from some other function:
int main()
{
const char * str = getHelloString(); // the return type is const char*
delete[] str; // memory from new[] must be released with delete[]
return 0;
}
...the semantics of transferring ownership of the memory is not clear, creating a situation where bugs and memory leaks are more likely.
Also, at least under Windows, if the two functions are in 2 different modules you could potentially corrupt the heap. In particular, if main() is in hello.exe, compiled in VC9, and getHelloString() is in utility.dll, compiled in VC6, you'll corrupt the heap when you delete the memory. This is because VC6 and VC9 both use their own heap, and they aren't the same heap, so you are allocating from one heap and deallocating from another.
Why does the return type need to be const? Don't think of the method as a get method, think of it as a create method. I've seen plenty of API that requires you to delete something a creation operator/method returns. Just make sure you note that in the documentation.
/* create a hello string
* must be deleted after use
*/
char *createHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // strcpy(destination, source)
return returnVal;
}
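If you do go the "create" route, a std::unique_ptr<char[]> return type makes the ownership transfer part of the signature instead of a comment. A minimal sketch, written as a free function:

```cpp
#include <cstring>
#include <memory>

// The caller receives sole ownership; unique_ptr<char[]> calls delete[]
// automatically, so the caller cannot forget it or use the wrong delete.
std::unique_ptr<char[]> createHelloString() {
    const char text[] = "Hello World!";
    std::unique_ptr<char[]> buf(new char[sizeof text]);   // 13 bytes incl. '\0'
    std::memcpy(buf.get(), text, sizeof text);
    return buf;
}
```

Usage: auto s = createHelloString(); std::puts(s.get()); with nothing to delete afterwards.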
What I've often done when I need this sort of functionality is to have a char * pointer in the class - initialized to null - and allocate when required.
viz:
class CacheNameString
{
private:
char *name;
public:
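CacheNameString():name(NULL) { }
~CacheNameString() { free(name); } // release the last cached copy (free(NULL) is a no-op)
CacheNameString(const CacheNameString&) = delete; // a copy would free() the same buffer twice
CacheNameString& operator=(const CacheNameString&) = delete;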
const char *make_name(const char *v)
{
if (name != NULL)
free(name);
name = strdup(v);
return name;
}
};
Something like this would do:
const char *myfunction() {
static char *str = NULL; /* this only happens once */
delete [] str; /* delete previous cached version */
str = new char[strlen("whatever") + 1]; /* allocate space for the string and its NUL terminator */
strcpy(str, "whatever");
return str;
}
EDIT: Something that occurred to me is that a good replacement for this could be returning a boost::shared_ptr instead. That way the caller can hold onto it as long as they want and doesn't have to worry about explicitly deleting it. A fair compromise IMO.
The advice warning about the lifetime of the returned string is sound advice. You should always be careful to recognise your responsibilities when it comes to managing the lifetime of returned pointers. The practice is quite safe, however, provided the variable pointed to will outlast the call to the function that returned it. Consider, for instance, the pointer to const char returned by the c_str() method of std::string. It points to memory managed by the string object, which is guaranteed to be valid as long as the string object is not destroyed or modified (for example, made to reallocate its internal memory).
In the case of the std::type_info class, it is a part of the C++ standard, as its namespace implies, but the standard leaves it to the implementation where the string returned from name() lives. In some implementations it is static data created by the compiler and linker as part of the run time type identification (RTTI) system; Visual C++, as another answer here shows, builds it on demand and caches it on the heap. Either way the library owns that memory, so you should not attempt to delete it.
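A sketch of that idea with std::shared_ptr from the standard library instead of Boost (makeSharedCString() and the text parameter are hypothetical). Replacing the cached copy no longer invalidates pointers that callers are still holding:

```cpp
#include <cstring>
#include <memory>

// Copies text into a new[] buffer owned by a shared_ptr whose deleter
// uses delete[] (required for memory allocated with new[]).
std::shared_ptr<const char> makeSharedCString(const char* text) {
    const std::size_t size = std::strlen(text) + 1;   // include the '\0'
    char* raw = new char[size];
    std::memcpy(raw, text, size);
    return std::shared_ptr<const char>(raw, [](const char* p) { delete[] p; });
}

// Like the cached myfunction() above, but replacing the cache only drops
// the cache's own reference; callers holding the old buffer keep it alive.
// Not thread-safe: concurrent calls would race on `cached`.
std::shared_ptr<const char> myfunction(const char* text) {
    static std::shared_ptr<const char> cached;
    cached = makeSharedCString(text);
    return cached;
}
```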
I think something like this can only be implemented "cleanly" using objects and the RAII idiom.
When the object's destructor is called (the object goes out of scope), we can safely assume that the const char* pointers aren't being used anymore.
example code:
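#include <cstring>
#include <stack>
class ICanReturnConstChars
{
std::stack<char*> cached_strings; // every buffer handed out so far
public:
ICanReturnConstChars() = default;
ICanReturnConstChars(const ICanReturnConstChars&) = delete; // a copy would delete[] twice
ICanReturnConstChars& operator=(const ICanReturnConstChars&) = delete;
const char* yeahGiveItToMe(const char* text){
char* newmem = new char[std::strlen(text) + 1];
std::strcpy(newmem, text); // write something to newmem
cached_strings.push(newmem); // std::stack uses push()/top()/pop()
return newmem;
}
~ICanReturnConstChars(){
while(!cached_strings.empty()){
delete [] cached_strings.top();
cached_strings.pop();
}
}
};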
The only other possibility I know of is to return a smart pointer.
It's probably done using a static buffer:
const char* GetHelloString()
{
static char buffer[256] = { 0 };
strcpy( buffer, "Hello World!" );
return buffer;
}
This buffer is like a global variable that is accessible only from this function.
You can't rely on GC; this is C++. That means you must keep the memory available until the program terminates, because you simply don't know when it becomes safe to delete[] it. So, if you want to construct and return a const char*, simply new[] it and return it. Accept the unavoidable leak.