I'm hoping someone can help answer a question about strings in C++. I've tried to strip out any extraneous code from here, so it won't compile (missing namespace, defines, etc.). This is not a "bug" problem. If working code samples are needed, please specify what code you would like (and for which question), and I'd be happy to put something more detailed up.
//Foo.c
#define EXIT "exit"
Bar* bar; //See question C
//1
foo(const string& text) {
cout << text;
bar = new Bar(text); //See question C
}
//2
foo(const char* text) {
cout << text;
}
//3
foo(string text) {
cout << text;
}
int main() {
....
{ foo(EXIT); } //braces for scope, see question C)
bar->print(); //4
....
}
class Bar {
public:
    Bar(const string& txt) : strBar(txt) { }
    void print() { cout << strBar; }
private:
    const string& strBar;
};
Assuming that only one of the three foo() methods is uncommented, they are not meant to be overloaded. I have a couple of questions here:
A) If I could figure out how to use OllyDbg well enough to change the string literal "exit" into "axit" AFTER the call to foo() is made, I believe the output would still be "exit" in cases 1 and 3, and "axit" in case 2. Is this correct?
B) In case 1 and 3, I believe that because the method is asking for a String (even if it is a reference in case 1), there is an implicit call to the string constructor (it accepts const char*), and that constructor ALWAYS makes a copy, never a reference. (see cplusplus.com string page ) Is this correct (especially the ALWAYS)?
C) In case 1, if I initialised a new class which had a string& attribute to which I assigned the text variable, will this reference wind up pointing to bad memory when we leave the scope? IE, when we reach 4, I believe the following has happened (assuming foo(const string& text) is the uncommented function):
1. A temporary string object is created for the line foo(EXIT) that copies the literal.
2. The reference to the temp object is passed through to bar and to the strBar attribute.
3. Once the code moves on and leaves the scope in which foo(EXIT) was called, I believe that the temp string object goes out of scope and disappears, which means strBar now references an area of memory with undefined contents, thinking it is still a string.
D) Going back to A, I believe in case 2 (foo(const char* text)) that this call to foo references the literal itself, not a copy, which is why fiddling with the literal in memory would change the output. Is this correct? Could I continue to pass the literal through (say to Bar) if I continued to use const char*?
E) How would you go about testing any of this beyond "this is how it works"? and "read the specs"? I don't need step by step instructions, but some ideas on what I should have done to answer the question myself using the tools I have available (Visual Studio, OllyDbg, suggestions for others?) would be great. I've spent a goodly amount of time trying to do it, and I'd like to hear what people have to say.
A) I don't know anything about OllyDbg, but in all cases std::ostream makes its own copy of text before foo returns, so changing the variables after the call will not affect the output.
B) Yes, the string constructor will always make its own copy of the char* data during the implicit construction for the parameter.
C) Yes, when you call foo, a temporary string is automatically created and used, and after the call ends it is destroyed, leaving bar's strBar reference pointing at invalid memory.
D) You are correct. foo(const char* text) makes a copy of the pointer to the data, but does not copy the data. But since operator<<(ostream, char*) makes a copy of the data, changing the data will not affect the output. I don't see why you couldn't pass the const char* literal through.
E) Take a class, read a tutorial, or read the specs. Trial and error won't get you far in the standard library for this sort of question.
For these, the concept is encapsulation. The objects in the C++ standard library are all encapsulated, so that the results of any operation are what you would expect, and it is really hard to accidentally mess with their internals to make things fail or leak. If you tell ostream to print the data at a char*, it will either (A) do it immediately, or (B) make its own copy before it returns in case you mess with the char* later.
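For question C, one way to sidestep the dangling reference is to let Bar own its text. The following is only a sketch under assumptions: the by-value member, the std::ostream parameter on print, and the demo helper are hypothetical and not part of the original code.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical variant of Bar that stores a copy instead of a reference,
// so it no longer depends on the lifetime of the caller's string.
class Bar {
public:
    explicit Bar(std::string txt) : strBar(std::move(txt)) {}
    void print(std::ostream& out) const { out << strBar; }
private:
    std::string strBar;  // owned copy, valid for as long as the Bar lives
};

// Builds a Bar from a temporary string inside an inner scope, then prints
// it after that scope has ended. Returns the printed text.
std::string demo() {
    std::unique_ptr<Bar> bar;
    {
        // The temporary std::string is gone after this statement,
        // but Bar keeps its own copy of the characters.
        bar.reset(new Bar(std::string("exit")));
    }
    std::ostringstream out;
    bar->print(out);
    return out.str();
}
```

The cost is one extra string per Bar, which is usually a fair trade for not having to reason about the lifetime of whatever was passed in.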
I'm learning to code c++ and I've come to this problem:
I have this struct:
struct storeData
{
string name;
string username;
string fav_food;
string fav_color;
}data[30];
And I need to check if two usernames are equal so I made this statement:
for(i=0;i<c;i++){
if(data[c].username.compare(data[i].username)==0){
cout<<"Username already taken"<<endl;
}
}
And it works well, the problem that I have is that I'm required to make a function let's call it: isTaken that returns the error message, so I can use it whenever I need to for example delete a username so I don't have to copy/paste the code again.
So I began looking for an answer for that, many forums present a way to send the whole struct like this:
void isTaken(struct storeData *data)
which I understand, but because I'm using string it is not working, so I guess it's because string is an object? I'm using the <string> library. I'm sorry if I'm not being clear; I'm looking for a way to call isTaken(data[c].user); but I don't know how to declare the function. I think it is also because string is not the same as a C string, but I'm not really sure. I've been looking for a solution and could not find one.
I tried void isTaken(struct storeData *data), but I got an error saying that I can't convert std::string to basic_string, which makes sense if I'm correct about string. I tried converting the string into a C string but could not get anywhere. I'm open to suggestions and corrections because I want to improve my code. I also could not find the answer here, so if someone has a link to a problem like this, please let me know.
Thank you so much for your time, have a good day.
Do you mean an array of structs instead of a struct of arrays?
In the example you are giving I see only an array of structs each of which has multiple string objects in it. You see, a string is a class coming from std and I wouldn't call it an array. If you want to know how to pass an array to a function, you should read about it (I'm sure you can find such a question in SO). If you want to have an array within your struct, then the struct will take care of the memory of the array, but you should definitely read about constructors.
You got an error because you are passing a string argument to a function which requires a struct pointer:
void isTaken(struct storeData *data);
...
isTaken(data[c].user);
but what you actually need is a function which takes the array of your users, its size, and the username you want to check:
bool IsUsernameTaken(struct storeData data[], int dataSize, const string &username){
for(int i = 0; i<dataSize; i++){
if(username == data[i].username)
return true;
}
return false;
}
A C string looks like this
data
A C++ string usually looks like this
size
capacity
ptr
|
v
data
or if using short string optimization and the string is short enough
size
data
data
all are zero terminated.
Making a shallow copy of a C string only costs copying the pointer to it, whereas copying a C++ string might cost copying the 3 members and possibly an allocation for the data. That is not ideal, therefore most C++ functions take a reference to a string, making the cost equivalent to that of the C string.
All code is untested.
bool Find(const std::string& target);
Making a deep copy of a C string would also cost an allocation.
In C++ you have many options to do a search, for your struct it could look like this. In case your member variables are private you must use an access function
// needs <algorithm> and <iterator>; a predicate requires std::find_if, not std::find
auto found = std::find_if(std::begin(data), std::begin(data) + c,
    [&target](const storeData& auser) { return auser.GetName() == target; });
return (found != std::begin(data) + c);
The first two parameters are the range that is searched; the second (end) iterator itself is not included. A lambda is used to check the name; a free function with the right signature would also do.
std::string& GetName() { return name; }
Stricter C++ practice (const-correctness) would advise adding two consts to that in case you don't need to change name.
const std::string& GetName() const { return name; }
The first const means the returned string can't be changed, and the second says the function won't change anything in your class. This const version is required because I used a const storeData& auser in the lambda, and only const member functions can be called through a const reference.
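Putting the pieces of this answer together, a search with a const accessor could look roughly like the sketch below. The User class and the IsTaken signature are hypothetical stand-ins for the asker's storeData and isTaken, chosen only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical user record with a private name and a const accessor.
class User {
public:
    explicit User(std::string name) : name_(std::move(name)) {}
    const std::string& GetName() const { return name_; }
private:
    std::string name_;
};

// Returns true if any of the first `count` users has the given name.
bool IsTaken(const User* users, std::size_t count, const std::string& target) {
    const User* end = users + count;
    // find_if returns `end` when no element satisfies the predicate.
    return std::find_if(users, end, [&target](const User& u) {
        return u.GetName() == target;
    }) != end;
}
```

The function only reports whether the name exists; printing "Username already taken" is left to the caller, so the same check can be reused when deleting a user.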
I have the following mock up code of a class which uses an attribute to set a filename:
#include <iostream>
#include <iomanip>
#include <sstream>
class Test {
public:
Test() { id_ = 1; }
/* Code which modifies ID */
void save() {
std::string filename ("file_");
filename += getID();
std::cout << "Saving into: " << filename <<'\n';
}
private:
const std::string getID() {
std::ostringstream oss;
oss << std::setw(4) << std::setfill('0') << id_;
return oss.str();
}
int id_;
};
int main () {
Test t;
t.save();
}
My concern is about the getID method. At first sight it seems pretty inefficient since I am creating the ostringstream and its corresponding string to return. My questions:
1) Since it returns const std::string is the compiler (GCC in my case) able to optimize it?
2) Is there any way to improve the performance of the code? Maybe move semantics or something like that?
Thank you!
Creating an ostringstream, just once, prior to an expensive operation like opening a file, doesn't matter to your program's efficiency at all, so don't worry about it.
However, you should worry about one bad habit exhibited in your code. To your credit, you seem to have identified it already:
1) Since it returns const std::string is the compiler (GCC in my case) able to optimize it?
2) Is there any way to improve the performance of the code? Maybe move semantics or something like that?
Yes. Consider:
class Test {
// ...
const std::string getID();
};
int main() {
std::string x;
Test t;
x = t.getID(); // HERE
}
On the line marked // HERE, which assignment operator is called? We want to call the move assignment operator, but that operator is prototyped as
string& operator=(string&&);
and the argument we're actually passing to our operator= is of type "reference to an rvalue of type const string" — i.e., const string&&. The rules of const-correctness prevent us from silently converting that const string&& to a string&&, so when the compiler is creating the set of assignment-operator functions it's possible to use here (the overload set), it must exclude the move-assignment operator that takes string&&.
Therefore, x = t.getID(); ends up calling the copy-assignment operator (since const string&& can safely be converted to const string&), and you make an extra copy that could have been avoided if only you hadn't gotten into the bad habit of const-qualifying your return types.
Also, of course, the getID() member function should probably be declared as const, because it doesn't need to modify the *this object.
So the proper prototype is:
class Test {
// ...
std::string getID() const;
};
The rule of thumb is: Always return by value, and never return by const value.
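The overload-resolution argument above can be observed with a small stand-in type. This is only a sketch: Tracker, makeConst, makePlain and the two helper functions are hypothetical names introduced for illustration, and Tracker merely records which assignment operator ran.

```cpp
#include <string>

// Hypothetical type that records which assignment operator was used last.
struct Tracker {
    std::string lastAssign = "none";
    Tracker() = default;
    Tracker(const Tracker&) = default;
    Tracker(Tracker&&) = default;
    Tracker& operator=(const Tracker&) { lastAssign = "copy"; return *this; }
    Tracker& operator=(Tracker&&) { lastAssign = "move"; return *this; }
};

const Tracker makeConst() { return Tracker(); }  // const-qualified return value
Tracker makePlain() { return Tracker(); }        // plain return value

std::string assignFromConst() {
    Tracker x;
    x = makeConst();  // a const rvalue cannot bind to Tracker&&
    return x.lastAssign;
}

std::string assignFromPlain() {
    Tracker x;
    x = makePlain();  // a non-const rvalue prefers Tracker&&
    return x.lastAssign;
}
```

With std::string in place of Tracker the same selection happens, which is where the avoidable copy comes from.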
1) Since it returns const std::string is the compiler (GCC in my case)
able to optimize it?
Makes no sense to return a const object unless returning by reference
2) Is there any way to improve the performance of the code? Maybe move
semantics or something like that?
If id_ does not change, just create the value in the constructor. Using a static method may help:
static std::string format_id(int id) {
std::ostringstream oss;
oss << std::setw(4) << std::setfill('0') << id;
return oss.str();
}
And then:
Test::Test()
: id_(1)
, id_str_(format_id(id_))
{ }
Update:
This answer is not totally valid for the problem because id_ does change. I will not remove it because maybe someone will find it useful for their case. Anyway, I wanted to clarify some thoughts:
Must be static in order to be used in variable initialization
There was a mistake in the code (now corrected), which used the member variable id_.
It makes no sense to return a const object by value, because returning by value will just copy (ignoring optimizations) the result to a new variable, which is in the scope of the caller (and might be not const).
My advice
An option is to update the id_str_ field any time id_ changes (you must have a setter for id_). Given that you're already changing the member id_, I assume there will be no issue updating another.
This approach allows getID() to be implemented as a simple getter (it should be const, by the way) with no performance issues, and the string field is recomputed only when id_ changes.
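A minimal sketch of that advice, assuming a single setter is the only place where id_ changes (setID is a hypothetical name; the original class only says "Code which modifies ID"):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

class Test {
public:
    Test() : id_(1), id_str_(format_id(id_)) {}

    // Hypothetical setter: updates the number and its cached text together.
    void setID(int id) {
        id_ = id;
        id_str_ = format_id(id);
    }

    // Plain const getter, no formatting work on each call.
    const std::string& getID() const { return id_str_; }

private:
    static std::string format_id(int id) {
        std::ostringstream oss;
        oss << std::setw(4) << std::setfill('0') << id;
        return oss.str();
    }

    int id_;              // declared first, so it is initialized first
    std::string id_str_;  // cached zero-padded form of id_
};
```

The trade-off is an extra string member per object and the discipline of never touching id_ outside the setter.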
One possibility would be to do something like this:
std::string getID(int id) {
    std::string ret = std::string(4, '0') + std::to_string(id);
    return ret.substr(ret.length() - 4);
}
If you're using an implementation that includes the short string optimization (e.g., VC++) chances are pretty good that this will give a substantial speed improvement (a quick test with VC++ shows it at around 4-5 times as fast).
OTOH, if you're using an implementation that does not include short string optimization, chances are pretty good it'll be substantially slower. For example, running the same test with g++, produces code that's about 4-5 times slower.
One more point: if your ID number might be more than 4 digits long, this doesn't give the same behavior--it always returns a string of exactly 4 characters rather than the minimum of 4 created by the stringstream code. If your ID numbers may exceed 9999, then this code simply won't work for you.
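If IDs above 9999 are possible, a variant that pads to a minimum of four characters without truncating might look like the sketch below. padID is a hypothetical name, and the code assumes the ID is never negative.

```cpp
#include <string>

// Pads with leading zeros up to 4 characters, but never cuts digits off.
// Assumes id is non-negative; a minus sign is not handled specially.
std::string padID(int id) {
    const std::string digits = std::to_string(id);
    if (digits.size() >= 4) {
        return digits;  // already wide enough, return unchanged
    }
    return std::string(4 - digits.size(), '0') + digits;
}
```

For non-negative values this matches what the std::setw(4) / std::setfill('0') stream code produces.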
You could change getID in this way:
std::string getID() {
thread_local std::ostringstream oss;
oss.str(""); // replaces the input data with the given string
oss.clear(); // resets the error flags
oss << std::setw(4) << std::setfill('0') << id_;
return oss.str();
}
This way it won't create a new ostringstream every single time.
In your case it isn't worth it (as Chris Dodd says opening a file and writing to it is likely to be 10-100x more expensive)... just to know.
Also consider that in any reasonable library implementation std::to_string will be at least as fast as stringstream.
1) Since it returns const std::string is the compiler (GCC in my case)
able to optimize it?
There is a rationale for this practice, but it's essentially obsolete (e.g. Herb Sutter recommended returning const values for non-primitive types).
With C++11 it is strongly advised to return values as non-const so that you can take full advantage of rvalue references.
About this topic you can take a look at:
Purpose of returning by const value?
Should I return const objects?
Okay, so. I've been working on a class project (we haven't covered std::string and std::vector yet though obviously I know about them) to construct a time clock of sorts. The main portion of the program expects time and date values as formatted c-strings (e.g. "12:45:45", "12/12/12" etc.), and I probably could have kept things simple by storing them the same way in my basic class. But, I didn't.
Instead I did this:
class UsageEntry {
public:
....
typedef time_t TimeType;
typedef int IDType;
...
// none of these getters are thread safe
// furthermore, the char* the getters return should be used immediately
// and then discarded: its contents will be modified on the next call
// to any of these functions.
const char* getUserID();
const char* getDate();
const char* getTimeIn();
const char* getTimeOut();
private:
IDType m_id;
TimeType m_timeIn;
TimeType m_timeOut;
char m_buf[LEN_MAX];
};
And one of the getters (they all do basically the same thing):
const char* UsageEntry::getDate()
{
strftime(m_buf, LEN_OF_DATE, "%D", localtime(&m_timeIn));
return m_buf;
}
And here is a function that uses this pointer:
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
// because of the m_buf kludge, this cannot be a single
// call to fprintf
fprintf(outFile, "%s,", m_data[index].getUserID());
fprintf(outFile, "%s,", m_data[index].getDate());
fprintf(outFile, "%s,", m_data[index].getTimeIn());
fprintf(outFile, "%s\n", m_data[index].getTimeOut());
fflush(outFile);
} // end of TDataSet::writeOut
How much trouble will this cause? Or to look at it from another angle, what other sorts of interesting and !!FUN!! behaviour can this cause? And, finally, what can be done to fix it (besides the obvious solution of using strings/vectors instead)?
Somewhat related: How do the C++ library functions that do similar things handle this? e.g. localtime() returns a pointer to a struct tm object, which somehow survives the end of that function call at least long enough to be used by strftime.
There is not enough information to determine if it will cause trouble because you do not show how you use it. As long as you document the caveats and keep them in mind when using your class, there won't be issues.
There are some common gotchas to watch out for, but hopefully these are common sense:
Deleting the UsageEntry will invalidate the pointers returned by your getters, since those buffers will be deleted too. (This is especially easy to run into if using locally declared UsageEntrys, as in MadScienceDream's example.) If this is a risk, callers should create their own copy of the string. Document this.
It does not look like m_timeIn is const, and therefore it may change. Calling the getter will modify the internal buffer and these changes will be visible to anything that has that pointer. If this is a risk, callers should create their own copy of the string. Document this.
Your getters are neither reentrant nor thread-safe. Document this.
It would be safer to have the caller supply a destination buffer and length as a parameter. The function can return a pointer to that buffer for convenience. This is how e.g. read works.
A strong API can avoid issues. Failing that, good documentation and common sense can also reduce the chance of issues. Behavior is only unexpected if nobody expects it, this is why documentation about the behavior is important: It generally eliminates unexpected behavior.
Think of it like the "CAUTION: HOT SURFACE" warning on top of a toaster oven. You could design the toaster oven with insulation on top so that an accident can't happen. Failing that, the least you can do is put a warning label on it and there probably won't be an accident. If there's neither insulation nor a warning, eventually somebody will burn themselves.
Now that you've edited your question to show some documentation in the header, many of the initial risks have been reduced. This was a good change to make.
Here is an example of how your usage would change if user-supplied buffers were used (and a pointer to that buffer returned):
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
char userId[LEN_MAX], date[LEN_MAX], timeIn[LEN_MAX], timeOut[LEN_MAX];
fprintf(outFile, "%s,%s,%s,%s\n",
m_data[index].getUserID(userId, sizeof(userId)),
m_data[index].getDate(date, sizeof(date)),
m_data[index].getTimeIn(timeIn, sizeof(timeIn)),
m_data[index].getTimeOut(timeOut, sizeof(timeOut))
);
fflush(outFile);
} // end of TDataSet::writeOut
How much trouble will this cause? Or to look at it from another angle,
what other sorts of interesting and !!FUN!! behaviour can this cause?
And, finally, what can be done to fix it (besides the obvious solution
of using strings/vectors instead)?
Well there is nothing very FUN here, it just means that the results of your getter cannot outlive the corresponding instance of UsageEntry or you have a dangling pointer.
How do the C++ library functions that do similar things handle this?
e.g. localtime() returns a pointer to a struct tm object, which
somehow survives the end of that function call at least long enough to
be used by strftime.
The documentation of localtime says:
Return value
pointer to a static internal std::tm object on success, or NULL otherwise. The structure may be shared between
std::gmtime, std::localtime, and std::ctime, and may be overwritten on
each invocation.
The main problem here, as the main problem with most pointer based code, is the issue of ownership. The problem is the following:
const char* val;
{
UsageEntry ue;
val = ue.getDate();
}//ue goes out of scope
std::cout << val << std::endl;//SEGFAULT (maybe, really nasal demons)
Because val is actually owned by ue, you shoot yourself in the foot if they exist in different scopes. You COULD document this, but it is oh-so-much simpler to pass the buffer in as an argument (just like the strftime function does).
(Thanks to odedsh below for pointing this one out)
Another issue is that subsequent calls will blow away the info gained. The example odedsh used was
fprintf(outFile, "%s\n%s",ue.getUserID(), ue.getDate());
but the problem is more pervasive:
const char* id = ue.getUserID();
const char* date = ue.getDate();//Changes id!
This violates the "Principle of Least Astonishment" because, well, it's weird.
This design also breaks the rule of thumb that each class should do exactly one thing. In this case, UsageEntry both provides accessors to get the formatted time as a string AND manages that string's buffer.
We recently had a lecture in college where our professor told us about different things to be careful about when programming in different languages.
The following is an example in C++:
std::string myFunction()
{
return "it's me!!";
}
int main(int argc, const char * argv[])
{
const char* tempString = myFunction().c_str();
char myNewString[100] = "Who is it?? - ";
strcat(myNewString, tempString);
printf("The string: %s", myNewString);
return 0;
}
The idea why this would fail is that return "it's me!!" implicitly calls the std::string constructor with a char[]. This string gets returned from the function and the function c_str() returns a pointer to the data from the std::string.
As the string returned from the function is not referenced anywhere, it should be deallocated immediately. That was the theory.
However, letting this code run works without problems.
Would be curious to hear what you think.
Thanks!
Your analysis is correct. What you have is undefined behaviour. This means pretty much anything can happen. It seems in your case the memory used for the string, although de-allocated, still holds the original contents when you access it. This often happens because the OS does not clear out de-allocated memory. It just marks it as available for future use. This is not something the C++ language has to deal with: it is really an OS implementation detail. As far as C++ is concerned, the catch-all "undefined behaviour" applies.
I guess deallocation does not imply memory clean-up or zeroing. And obviously this could lead to a segfault in other circumstances.
I think that the reason is that the stack memory has not been rewritten, so it can get the original data. I created a test function and called it before the strcat.
std::string myFunction()
{
return "it's me!!";
}
void test()
{
std::string str = "this is my class";
std::string hi = "hahahahahaha";
return;
}
int main(int argc, const char * argv[])
{
const char* tempString = myFunction().c_str();
test();
char myNewString[100] = "Who is it?? - ";
strcat(myNewString, tempString);
printf("The string: %s\n", myNewString);
return 0;
}
And get the result:
The string: Who is it?? - hahahahahaha
This proved my idea.
As others have mentioned, according to the C++ standard this is undefined behavior.
The reason why this "works" is because the memory has been given back to the heap manager which holds on to it for later reuse. The memory has not been given back to the OS and thus still belongs to the process. That's why accessing freed memory does not cause a segmentation fault. The problem remains however that now two parts of your program (your code and the heap manager or new owner) are accessing memory that they think uniquely belongs to them. This will destroy things sooner or later.
The fact that the string is deallocated does not necessarily mean that the memory is no longer accessible. As long as you do nothing that could overwrite it, the memory is still usable.
As said above, it's undefined behaviour. It doesn't work for me (in Debug configuration).
The std::string destructor is called immediately after the assignment to tempString, when the full expression that uses the temporary string object finishes. This leaves tempString pointing to released memory (which in your case still contains the characters of "it's me!!").
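A well-defined version keeps the returned string in a named variable for as long as the pointer is needed. The sketch below wraps the professor's example in a hypothetical buildGreeting function and swaps strcat for strncat as an extra precaution:

```cpp
#include <cstring>
#include <string>

std::string myFunction() {
    return "it's me!!";
}

std::string buildGreeting() {
    // The named variable lives until the end of this function, so the
    // pointer returned by c_str() stays valid while strncat reads it.
    const std::string kept = myFunction();
    char myNewString[100] = "Who is it?? - ";
    std::strncat(myNewString, kept.c_str(),
                 sizeof(myNewString) - std::strlen(myNewString) - 1);
    return myNewString;
}
```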
You cannot conclude that there are no problems just because you got the expected result by coincidence.
There are other means to detect 'problems' :
Static analysis.
Valgrind would catch the error, showing you both the offending action (strcat trying to copy from a freed block) and the deallocation that freed it.
Invalid read of size 1
at 0x40265BD: strcat (mc_replace_strmem.c:262)
by 0x80A5BDB: main() (valgrind_sample_for_so.cpp:20)
[...]
Address 0x5be236d is 13 bytes inside a block of size 55 free'd
at 0x4024B46: operator delete(void*) (vg_replace_malloc.c:480)
by 0x563E6BC: std::string::_Rep::_M_destroy(std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
by 0x80A5C18: main() (basic_string.h:236)
[...]
The one true way would be to prove the program correct. But that is really hard for procedural languages, and C++ makes it harder.
Actually, string literals have static storage duration. They are packed inside the executable itself. They are not on the stack, nor dynamically allocated. In the usual case, it is correct that this would be pointing to invalid memory and be undefined behavior, however for strings, the memory is in static storage, so it will always be valid.
Unless I'm missing something, I think this is an issue of scope. myFunction() returns a std::string. The string object is not directly assigned to a variable. But it remains in scope until the end of main(). So, tempString will point to perfectly valid and available space in memory until the end of the main() code block, at which time tempString will also fall out of scope.
I am taking a line of input which is separated by a space and trying to read the data into two integer variables.
for instance: "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
separator+1, //Start with the next char after the separator
input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a c string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.
First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
Now, an explanation.
input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour.
The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
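Applied to the code in the question, that fix might look like the following sketch. parseChildren is a hypothetical wrapper, and like the original it does no validation beyond what atoi provides.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

std::pair<int, int> parseChildren(const std::string& input) {
    const std::string::size_type separator = input.find(' ');
    // Named strings stay alive until the end of the function, so the
    // pointers returned by c_str() remain valid for both atoi calls.
    const std::string first = input.substr(0, separator);
    const std::string second = input.substr(separator + 1);
    return std::make_pair(std::atoi(first.c_str()),
                          std::atoi(second.c_str()));
}
```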
Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
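As a self-contained sketch, with a hypothetical readChildren wrapper that also reports whether both numbers were actually read:

```cpp
#include <sstream>
#include <string>

// Reads two whitespace-separated integers from input.
// Returns false if either extraction fails.
bool readChildren(const std::string& input, int& child1, int& child2) {
    std::istringstream buffer(input);
    return static_cast<bool>(buffer >> child1 >> child2);
}
```

Checking the stream state is what atoi cannot offer: atoi returns 0 both for the text "0" and for input that is not a number at all.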
input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.