Confusing std::string::c_str() behavior in VS2010

I'm sure I've done something wrong, but for the life of me I can't figure out what! Please consider the following code:
cerr<<el.getText()<<endl;
cerr<<el.getText().c_str()<<endl;
cerr<<"---"<<endl;
const char *value = el.getText().c_str();
cerr<<"\""<<value<<"\""<<endl;
field.cdata = el.getText().c_str();
cerr<<"\""<<field.cdata<<"\""<<endl;
el is an XML element and getText returns a std::string. As expected, el.getText() and el.getText().c_str() print the same value. However, value ends up as "" (the empty string) when it is assigned the result of c_str(). This code originally set field.cdata = value, so it was clearing field.cdata out. After changing it to assign the same expression that value is set from, it works fine and the final line prints the expected value.
Since el is on the stack, I thought I might have been clobbering it - but even after value is set, the underlying value in el is still correct.
My next thought was that there was some weird compiler-specific issue with assigning things to const pointers, so I wrote the following:
std::string thing = "test";
std::cout << thing << std::endl;
std::cout << thing.c_str() << std::endl;
const char* value = thing.c_str();
std::cout << value << std::endl;
As expected, I get 'test' three times.
So now I have no clue what is going on. It would seem obvious that there is something strange going on in my program that's not happening in the sample, but I don't know what it is and I'm out of ideas about how to keep looking. Can somebody enlighten me, or at least point me in the right direction?

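I assume that el.getText() is returning a temporary string object (that is, it returns std::string by value). When that object is destroyed, the pointer returned by c_str() is no longer valid (keep in mind that there are other ways the pointer returned by c_str() can be invalidated, too, such as modifying the string). That is also why your small test works: thing is a named local variable, so it is still alive when you print value.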
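The temporary object is destroyed at the end of the full expression it's created in (in your example above, at the semicolon). The same rule explains why field.cdata = el.getText().c_str(); appears to work, assuming field.cdata is a std::string: the assignment copies the characters before the temporary is destroyed.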
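To make the lifetime rule concrete, here is a minimal sketch. Element and its getText() are hypothetical stand-ins for the XML class in the question, assumed to return std::string by value. Using c_str() of the temporary inside the same full expression is fine, and so is binding the temporary to a const reference (which extends its lifetime); keeping the raw pointer past the semicolon is not.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for the XML element in the question:
// getText() returns a std::string by value, i.e. a temporary.
struct Element {
    std::string text;
    std::string getText() const { return text; }
};

// Safe: the temporary returned by getText() lives until the end of the
// full expression, so the c_str() pointer is valid for the whole strlen call.
std::size_t textLength(const Element& el) {
    return std::strlen(el.getText().c_str());
}

// Also safe: binding the temporary to a const reference extends its
// lifetime to that of the reference, so `value` stays valid in this scope.
std::string copyViaReference(const Element& el) {
    const std::string& ref = el.getText();
    const char* value = ref.c_str();
    return std::string(value);
}

// NOT safe (the pattern from the question): the temporary dies at the
// semicolon, so `value` dangles from the very next line on.
//   const char* value = el.getText().c_str();
```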
You may be able to solve your problem with something like the following:
const char *value = strdup(el.getText().c_str());
which creates a copy of the string as a raw char array in dynamically allocated memory. You then become responsible for calling free() on that pointer at some point when that data is no longer needed.
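If you don't need a separately owned raw buffer, a simpler option than strdup()/free() is to keep your own std::string copy, which owns its memory and releases it automatically. A sketch, where getText() and Field are hypothetical stand-ins and field.cdata is assumed to be a std::string:

```cpp
#include <string>

// Hypothetical stand-ins for the question's types.
std::string getText() { return "some cdata"; }   // returns by value, like el.getText()
struct Field { std::string cdata; };            // assumption: cdata is a std::string

void fill(Field& field) {
    // Owning copy: `value` keeps the characters alive, so value.c_str()
    // stays valid until `value` is modified or goes out of scope.
    std::string value = getText();
    const char* p = value.c_str();   // safe here, `value` is still alive
    // Assigning a const char* to a std::string copies the characters.
    field.cdata = p;
}
```

Because nothing has to be freed by hand, there is no way to forget the matching free() that strdup() requires.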

Related

Manipulating std::string

The code below compiles and runs without any fault, error or warning (although I think there might be some illegal memory access happening). Strangely, the length of the string comes out differently depending on how I measure it (strlen versus std::string::size()):
strlen(l_str.c_str()) gives 1500, whereas
l_str.size() gives 0.
#include <string.h>
#include <string>
#include <stdio.h>
#include<iostream>
using namespace std;
void strRet(void* data)
{
char ar[1500];
memset(ar,0,1500);
for(int i=0;i<1500;i++)
ar[i]='a';
memset(data,0,1500); // This might not be correct but it works fine
memcpy(data,ar,1500);
}
int main()
{
std::string l_str;
cout<<endl<<"size before: "<<l_str.length();
int var=10;
strRet((void *)l_str.c_str());
printf("Str after call: %s\n",l_str.c_str());
cout<<endl<<"size after(using strlen): "<<strlen(l_str.c_str());
cout<<endl<<"Size after(using size function): "<<l_str.size();
printf("var value after call: %d\n",var);
return 0;
}
Please tell me if I'm doing something I'm not supposed to do!
Also, I wanted to know which memory bytes are set to 0 when I call memset(data,0,1500). For example, if my string variable's starting address is 100, does memset set the 1500 bytes from address 100 onwards to 0, or some other memory range?

memset(data,0,1500); // This might not be correct but it works fine
It isn't correct, and it doesn't "work fine". This is Undefined Behaviour, and you're making the common mistake of assuming that if it compiles, and your computer doesn't instantly catch fire, everything is fine.
It really isn't.
I've done something which I wasn't supposed to do!
Yes, you have. You took a std::string, a non-trivial object with its own state and behaviour, asked it for the address of some memory it controls, and cast that to void*.
There's no reason to do that. You should very rarely see void* in C++ code, and seeing C-style casts to any type is pretty worrying. Here the cast also silently discards the const from the const char* that c_str() returns, which is why the compiler never complained about you writing through it.
Don't take void* pointers into objects with state and behaviour like std::string until you understand what you're doing and why this is wrong. Then, when that day comes, you still won't do it because you'll know better.
We can look at the first problem in some fine detail, if it helps:
(void *)l_str.c_str()
what does c_str() return? A pointer to some memory owned by l_str
where is this memory? No idea, that's l_str's business. If this standard library implementation uses the small string optimization, it may be inside the l_str object. If not, it may be dynamically allocated.
how much memory is allocated at this location? No idea, that's l_str's business. All we can say for sure is that there is at least one legally-addressable char (l_str.c_str()[0] == '\0') and that it's legal to use the address l_str.c_str()+1 (but only as a one-past-the-end pointer, so you can't dereference it)
So, the statement
strRet((void *)l_str.c_str());
passes strRet a pointer to a location containing one or more addressable chars, of which the first is zero. That's everything we can say about it.
Now let's look again at the problematic line
memset(data,0,1500); // This might not be correct but it works fine
Why would we expect there to be 1500 chars at this location? If you'd documented strRet as requiring a buffer of at least 1500 allocated chars, would it look reasonable to actually pass l_str.c_str() when you know l_str has just been default constructed as an empty string? It's not as if you asked l_str to allocate that storage for you.
You could start to make this work by giving l_str a chance to allocate the memory you intend to write, by calling
l_str.reserve(1500);
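before calling strRet. This still won't tell l_str that you filled it with 'a's, though, because you did that by changing the raw memory behind its back: its size() stays 0, which is exactly why strlen() finds 1500 characters while size() reports 0. (Writing through the pointer returned by c_str() is undefined behaviour in any case.)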
If you want this to work correctly, you could replace the entirety of strRet with
std::string l_str(1500, 'a');
or, if you want to change an existing string correctly, with
#include <algorithm> // std::fill_n
#include <iterator>  // std::back_inserter
#include <string>
void strRet(std::string& out) {
    // this just speeds it up, since we know the size in advance
    out.reserve(1500);
    // this is in case the string wasn't already empty
    out.clear();
    // and this actually does the work
    std::fill_n(std::back_inserter(out), 1500, 'a');
}
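A self-contained sketch of the corrected version, plus two variants that are not from the original answer: std::string::assign(count, ch) as a one-line equivalent, and, for the case where you are stuck with a C-style function that fills a raw buffer, sizing the string with resize() first so that the string actually owns the chars being written. legacyFill is a hypothetical function modelled on the question's strRet.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>
#include <string>

// Corrected strRet from the answer: the string's own member functions
// do the writing, so size() stays in sync with the contents.
void strRet(std::string& out) {
    out.reserve(1500);
    out.clear();
    std::fill_n(std::back_inserter(out), 1500, 'a');
}

// Equivalent one-liner: replace the contents with 1500 'a's.
void strRetAssign(std::string& out) {
    out.assign(1500, 'a');
}

// Hypothetical C-style function that fills a raw buffer of n bytes.
void legacyFill(void* data, std::size_t n) {
    std::memset(data, 'a', n);
}

// resize() makes the string own 1500 writable chars; since C++11,
// &out[0] points to size() contiguous chars that may be written.
void strRetViaLegacy(std::string& out) {
    out.resize(1500);
    legacyFill(&out[0], out.size());
}
```

With any of these, strlen(s.c_str()) and s.size() agree, because the string knows how many chars it holds.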

strange behavior of std::string assign,clear and operator[]

I am observing some strange behavior of string operations.
Example:
#include <iostream>
#include <string>
int main()
{
std::string name("ABCDEFGHIJ");
std::cout << "Hello, " << name << "!\n";
name.clear();
std::cout << "Hello, " << name << "!\n";
name.assign("ABCDEF",6);
std::cout << "Hello, " << name << "!\n";
std::cout << "Hello, " << name[8] << "!\n";
}
Output:
Hello, ABCDEFGHIJ!
Hello, !
Hello, ABCDEF!
Hello, I!
It looks like string::clear isn't actually clearing, because I can still access the data after clear(). According to the documentation, accessing something out of bounds gives an undefined result. But here I get the same result every time.
Can somebody explain how it works at the memory level when we call clear() or operator[]?

Welcome to C++'s amazing attraction called "undefined behavior".
When name contains a six-character string, "ABCDEF", name[8] attempts to access a nonexistent member of the string, which is undefined behavior.
This means that the result of this operation is completely meaningless.
The C++ standard does not define the result of accessing a nonexistent member character of the string; hence the undefined behavior. The potential results of this operation can be:
1. Some previous value that was in the string, at the given position.
2. Some garbage, random character.
3. Your program crashes.
4. Anything else.
5. A result that's different every time you execute the program, selected from options 1 through 4.

name.assign("ABCDEF",6);
Now the string has length 6. So you may legally only access elements 0 through 5.
std::cout << "Hello, " << name[8] << "!\n";
Therefore this is Undefined Behaviour. The compiler is free to do whatever the hell it pleases. Not just with the statement, but with the whole program, even the preceding lines!
This time, it returned the character that used to be at that position earlier. It could have returned anything else, it could have crashed, it could have skipped that statement altogether, it could have skipped the assignment, and many other funny things (up to and including making demons fly out of your nose!).
And I am saying that because all of that behaviour (except the demons) can actually be observed in the wild in various circumstances.

As others said, accessing a std::string outside its logical boundaries (i.e. [0, size()]; notice that size() is included) is undefined behavior, so the compiler can make anything happen.
Now, the particular flavor of UB you are seeing is nothing particularly unexpected.
clear() just sets the logical length of the string to zero, but the memory it used is retained. The standard doesn't explicitly require this for std::string, but existing implementations keep the capacity, and a lot of code would run much slower without this behavior.
Given that there's no good reason to waste time zeroing out the old data, by accessing the string out of bounds you are seeing what was at that index previously.
This may change if you, e.g., call the shrink_to_fit() method after clear(), which asks the string to free the extra memory it's keeping. Note that this is a non-binding request, and for a string short enough to live in the small-string buffer, as in your example, it may change nothing.
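A small sketch you can run to look at this on your own implementation. The capacity values it reports are implementation-specific (only the size after clear() is guaranteed to be 0), and a long string is used so the small-string buffer does not hide the effect.

```cpp
#include <cstddef>
#include <string>

// Size and capacity captured after an operation.
struct Snapshot {
    std::size_t size;
    std::size_t capacity;
};

Snapshot afterClear(std::string s) {
    s.clear();           // size() becomes 0; implementations keep the buffer
    return {s.size(), s.capacity()};
}

Snapshot afterShrink(std::string s) {
    s.clear();
    s.shrink_to_fit();   // non-binding request to release unused capacity
    return {s.size(), s.capacity()};
}
```

On a typical implementation afterClear reports a capacity of at least the original length, which is why the old characters are still sitting in memory.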

I'd like to add to the other answers that you can use std::string::at instead of using the operator[].
std::string::at does boundary checking and will throw a std::out_of_range when you try to access an element that is out of range.
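A minimal sketch of the checked access applied to the question's example. checkedGet and its '?' fallback are hypothetical choices for illustration; the real point is that at() compares the index against size() and throws std::out_of_range instead of reading stale memory.

```cpp
#include <stdexcept>
#include <string>

// Returns the character at index i, or '?' (hypothetical sentinel) if
// i >= s.size(). std::string::at() throws std::out_of_range in that case.
char checkedGet(const std::string& s, std::size_t i) {
    try {
        return s.at(i);
    } catch (const std::out_of_range&) {
        return '?';
    }
}
```

Unlike operator[], at(size()) also throws, so index 6 of "ABCDEF" is rejected as well.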

I ran your code through a debugger. Take note of the capacity of the string: it is still 15, because assign() did not change the capacity. So in practice you don't get a "garbage" value, as everyone is saying; you get the exact same data, stored in the same location. Internally the string reads its characters from a buffer (for a string this short, typically a small buffer inside the string object itself), and name[8] simply reads the byte 8 positions past the start of that buffer, so it goes to the exact same memory location every time.
[Image: the string in the debugger]

Adding two LPCWSTR variables

I'm trying to add two LPCWSTR variables, as in:
Shader = L"shader.fx";
Path = L"Source/Shaders/";
return Path + Shader;
I've tried a thousand different ways, but my latest has been this:
LPCWSTR ShaderFile = GetShader(L"shader.fx");
....
LPCWSTR GetShader(std::wstring _Shader)
{
std::wstring ShaderPath = static_cast<std::wstring>(SHADER_DIRECTORY) + _Shader;
LPCWSTR Return = ShaderPath.c_str();
return Return;
}
Now when I put a breakpoint on the return, the value seems fine: Return = Source/Shaders/shader.fx, as expected. But when I press F10 to step back into my object, the ShaderFile variable turns out to be something completely random, a bunch of what look like Arabic characters.
Could anyone point me in the right direction? As I said, the function seems to work fine; it's only when I press F10 past the breakpoint and back into my project that the variable holds something completely different.

What's happening is that you're returning the address of data that is destroyed by the return: ShaderPath is a local variable, so its buffer is freed when the function exits. Everything seems fine before the function returns, but immediately after it returns, the pointer refers to (at least potentially) garbage.
If at all possible, just return the std::wstring, and somewhere in the calling code call its c_str() member function when you really need it in the form of a raw buffer.
If you can't do that, and simply must return the result as a raw LPCWSTR, then you'll probably have to allocate the space dynamically:
wchar_t *ret = new wchar_t[ShaderPath.size() + 1]; // +1 for the terminating L'\0'
wcscpy(ret, ShaderPath.c_str()); // wide-character copy, declared in <cwchar>
return ret;
Then, the calling code will need to delete [] the memory when it's no longer needed.
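A portable sketch of this dynamic-allocation approach. It uses const wchar_t* in place of LPCWSTR (LPCWSTR is a typedef for const wchar_t*) so it compiles without windows.h, and the SHADER_DIRECTORY value below is only an illustrative assumption. MSVC may warn that wcscpy is deprecated and suggest wcscpy_s.

```cpp
#include <cwchar>
#include <string>

// Assumed value, for illustration only; in the question this comes
// from the project's own SHADER_DIRECTORY definition.
const wchar_t* const SHADER_DIRECTORY = L"Source/Shaders/";

// Returns a heap-allocated, null-terminated copy. The caller must delete[] it.
const wchar_t* GetShader(const std::wstring& shader) {
    std::wstring shaderPath = SHADER_DIRECTORY + shader;
    wchar_t* ret = new wchar_t[shaderPath.size() + 1];  // +1 for L'\0'
    std::wcscpy(ret, shaderPath.c_str());
    return ret;  // still valid after shaderPath is destroyed
}
```

Every caller now has to remember the matching delete[], which is the maintenance burden the next paragraph warns about.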
You really want to avoid the latter, and just return an std::wstring though. It's much simpler and cleaner, and will save the nearly inevitable problems with either deleting the buffer before you're finished using it, or else forgetting to delete it when you are done using it (still serious problems in C, but essentially unheard of in decently written C++).

wstring::c_str() returns a pointer to the string's internal buffer.
In your case the local variable is destroyed when you exit the function, so the memory the returned pointer refers to is deallocated and you get unexpected results.
A possible solution would be to copy the string into a buffer you allocate yourself (with room for the terminating null), using wcscpy().

The problem is that the c_str() method is returning a pointer into the local variable ShaderPath's memory. When the function exits, ShaderPath is destroyed, along with the data pointed to by your LPCWSTR.
Why don't you just store the variable as a wstring, and whenever you need the LPCWSTR you can call c_str()?
std::wstring GetShader(std::wstring _Shader)
{
return static_cast<std::wstring>(SHADER_DIRECTORY) + _Shader;
}
Assuming you had a function Foo(LPCWSTR path), you would use it like:
Foo(GetShader(L"shader.fx").c_str());
or
std::wstring ShaderFile = GetShader(L"shader.fx");
Foo(ShaderFile.c_str());
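A self-contained sketch of this recommended approach. The SHADER_DIRECTORY value is an illustrative assumption, and Foo is a hypothetical consumer standing in for any API that takes an LPCWSTR (here it just measures the string). Passing GetShader(...).c_str() straight into Foo is safe because the temporary wstring lives until the end of that full expression.

```cpp
#include <cwchar>
#include <string>

// Assumed value, for illustration only.
const std::wstring SHADER_DIRECTORY = L"Source/Shaders/";

// Returning std::wstring by value: the caller owns the characters.
std::wstring GetShader(const std::wstring& shader) {
    return SHADER_DIRECTORY + shader;
}

// Hypothetical consumer taking a const wchar_t* (what LPCWSTR is).
std::size_t Foo(const wchar_t* path) {
    return std::wcslen(path);
}
```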

libpq error message deallocation

Here comes a stupid question. libpq's PQerrorMessage function returns a char const*:
char const* msg = PQerrorMessage(conn);
Now since it's const, I don't think I should be deallocating it and I've never seen that done in any examples. But then, when and how does it get freed?
How could it know when I'm finished using my msg pointer?
At first I thought it gets deallocated once another error message is requested but that's not the case.
// cause some error
char const* msg1 = PQerrorMessage(pgconn);
// cause another error
char const* msg2 = PQerrorMessage(pgconn);
// still works
std::cout << msg1 << msg2 << std::endl;
Can someone shed some light on this for me?
Edit: credits to Dmitriy Igrishin
I asked this on the PostgreSQL mailing list, and it turns out that my initial assumption was correct.
The msg1 pointer should not have been valid; I just got lucky.
Edit: from the PostgreSQL docs:
PQerrorMessage
Returns the error message most recently generated by an operation on the connection.
char *PQerrorMessage(const PGconn *conn);
Nearly all libpq functions will set a message for PQerrorMessage if they fail. Note that by libpq convention, a nonempty PQerrorMessage result can consist of multiple lines, and will include a trailing newline. The caller should not free the result directly. It will be freed when the associated PGconn handle is passed to PQfinish. The result string should not be expected to remain the same across operations on the PGconn structure.

Do as the docs say: don't expect its contents to remain constant, and save them away in a std::string rather than storing the pointer.
// cause some error
std::string msg1 = PQerrorMessage(pgconn);
// cause another error
std::string msg2 = PQerrorMessage(pgconn);
// works all the time
std::cout << msg1 << msg2 << std::endl;
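To see why the copy matters without needing a database, here is a sketch with a hypothetical FakeConn that, like the behaviour the docs describe for PGconn, keeps the latest error message in storage it owns and overwrites it on the next error. None of these names are libpq API.

```cpp
#include <string>

// Hypothetical stand-in for a connection object that owns and reuses
// a single error-message buffer.
struct FakeConn {
    std::string errorBuffer;
};

void failWith(FakeConn& conn, const char* message) {
    conn.errorBuffer = message;   // overwrites (and may reallocate) the old message
}

const char* fakeErrorMessage(const FakeConn& conn) {
    return conn.errorBuffer.c_str();   // pointer into storage the "library" owns
}
```

Copying each message into a std::string right away keeps it intact no matter what the library later does with its own buffer; a saved raw pointer, by contrast, may see the new text or dangle after a reallocation.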

A library function that returns a plain-old-pointer to allocated memory is very old-school and C-ish, but there are still a lot of them around. There's no way other than documentation to know if the intent of the library designer was to transfer ownership of the allocated storage to your code. The modern library designer can return a shared_ptr<> to make their intention about storage lifetime completely clear, or wrap the string up as an std::string, which also handles allocation and deletion under the covers.
The const char* declaration doesn't really say anything about the storage lifetime. Instead, it says don't modify the storage. For an old-school function that returns allocated storage, you just have to know that deleting the storage isn't the same as modifying it. The old-school function might return a const char* to tell you not to write into the buffer at all, for instance because only as many chars as the string needs are allocated, and if you write off the end, chaos will ensue.
Of course this function might be returning data from a static table, in which case you should neither write into it nor delete it. Again, when you use plain-old-pointers, there's no way to know.

Splitting a std::string into two const char*s resulting in the second const char* overwriting the first

I am taking a line of input which is separated by a space and trying to read the data into two integer variables.
for instance: "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
separator+1, //Start with the next char after the separator
input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a c string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.

First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
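Applied to both halves of the input, the std::stoi version might look like the sketch below. parseChildren is a hypothetical helper name, and it assumes the input contains a space. Each substr temporary lives until the end of its statement, and std::stoi has finished with it by then, so nothing dangles. std::stoi throws std::invalid_argument or std::out_of_range on bad input.

```cpp
#include <string>
#include <utility>

// Splits "A B" at the first space and converts both halves (C++11).
std::pair<int, int> parseChildren(const std::string& input) {
    std::string::size_type separator = input.find(' ');
    int child1 = std::stoi(input.substr(0, separator));
    int child2 = std::stoi(input.substr(separator + 1));  // to end of string
    return {child1, child2};
}
```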
Now, an explanation.
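input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour. Here, the second temporary std::string most likely reuses the memory the first one just released (a freed heap block or the same stack slot, depending on the implementation), which is why the debugger shows both pointers at the same address.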

The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
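The question's atoi approach with that fix applied might look like this sketch (splitWithAtoi is a hypothetical name). Each substring is kept in a named local std::string, so the pointers from c_str() stay valid while atoi reads them.

```cpp
#include <cstdlib>
#include <string>

void splitWithAtoi(const std::string& input, int& child1, int& child2) {
    std::string::size_type separator = input.find(' ');
    // Named locals: they live until the end of the function.
    std::string child1_str = input.substr(0, separator);
    std::string child2_str = input.substr(separator + 1);
    child1 = std::atoi(child1_str.c_str());
    child2 = std::atoi(child2_str.c_str());
}
```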

Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
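A complete version of that snippet, with a check for malformed input added (readChildren is a hypothetical name). The stream skips any amount of whitespace between the numbers, and converting the stream to bool reports whether both reads succeeded.

```cpp
#include <sstream>
#include <string>

// Reads two whitespace-separated integers; returns false if either is
// missing or not a number.
bool readChildren(const std::string& input, int& child1, int& child2) {
    std::istringstream buffer(input);
    return static_cast<bool>(buffer >> child1 >> child2);
}
```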

input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.

We Keep Coding
