Optimizing string creation - c++

I have the following mock up code of a class which uses an attribute to set a filename:
#include <iostream>
#include <iomanip>
#include <sstream>
class Test {
public:
Test() { id_ = 1; }
/* Code which modifies ID */
void save() {
std::string filename ("file_");
filename += getID();
std::cout << "Saving into: " << filename <<'\n';
}
private:
const std::string getID() {
std::ostringstream oss;
oss << std::setw(4) << std::setfill('0') << id_;
return oss.str();
}
int id_;
};
int main () {
Test t;
t.save();
}
My concern is about the getID method. At first sight it seems pretty inefficient since I am creating the ostringstream and its corresponding string to return. My questions:
1) Since it returns const std::string is the compiler (GCC in my case) able to optimize it?
2) Is there any way to improve the performance of the code? Maybe move semantics or something like that?
Thank you!

Creating an ostringstream, just once, prior to an expensive operation like opening a file, doesn't matter to your program's efficiency at all, so don't worry about it.
However, you should worry about one bad habit exhibited in your code. To your credit, you seem to have identified it already:
1) Since it returns const std::string is the compiler (GCC in my case) able to optimize it?
2) Is there any way to improve the performance of the code? Maybe move semantics or something like that?
Yes. Consider:
class Test {
// ...
const std::string getID();
};
int main() {
std::string x;
Test t;
x = t.getID(); // HERE
}
On the line marked // HERE, which assignment operator is called? We want to call the move assignment operator, but that operator is prototyped as
string& operator=(string&&);
and the argument we're actually passing to our operator= is of type "reference to an rvalue of type const string" — i.e., const string&&. The rules of const-correctness prevent us from silently converting that const string&& to a string&&, so when the compiler is creating the set of assignment-operator functions it's possible to use here (the overload set), it must exclude the move-assignment operator that takes string&&.
Therefore, x = t.getID(); ends up calling the copy-assignment operator (since const string&& can safely be converted to const string&), and you make an extra copy that could have been avoided if only you hadn't gotten into the bad habit of const-qualifying your return types.
Also, of course, the getID() member function should probably be declared as const, because it doesn't need to modify the *this object.
So the proper prototype is:
class Test {
// ...
std::string getID() const;
};
The rule of thumb is: Always return by value, and never return by const value.
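The overload-resolution argument above can be checked with a small sketch. Tracker, makeConst and makePlain are hypothetical names introduced only for this illustration; the counters record which assignment operator was selected.

```cpp
#include <utility>

// Hypothetical helper type: counts which assignment operator was chosen.
struct Tracker {
    int copies = 0;
    int moves = 0;
    Tracker() = default;
    Tracker(const Tracker&) = default;
    Tracker(Tracker&&) = default;
    Tracker& operator=(const Tracker&) { ++copies; return *this; }
    Tracker& operator=(Tracker&&) noexcept { ++moves; return *this; }
};

// Returns by const value: the result cannot bind to Tracker&&.
const Tracker makeConst() { return Tracker(); }

// Returns by plain value: the result binds to Tracker&&.
Tracker makePlain() { return Tracker(); }
```

Assigning the result of makeConst() to an existing Tracker should bump copies, while assigning the result of makePlain() should bump moves.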

1) Since it returns const std::string is the compiler (GCC in my case)
able to optimize it?
Makes no sense to return a const object unless returning by reference
2) Is there any way to improve the performance of the code? Maybe move
semantics or something like that?
If id_ does not change, just create the value once in the constructor; a static method may help:
static std::string format_id(int id) {
std::ostringstream oss;
oss << std::setw(4) << std::setfill('0') << id;
return oss.str();
}
And then:
Test::Test()
: id_(1)
, id_str_(format_id(id_))
{ }
Update:
This answer is not entirely valid for the problem, because id_ does change. I will not remove it, since someone may find it useful for their own case. Anyway, I wanted to clarify a few points:
format_id must be static in order to be used in the member initialization.
There was a mistake in the code (now corrected), which used the member variable id_.
It makes no sense to return a const object by value, because returning by value will just copy (ignoring optimizations) the result to a new variable, which is in the scope of the caller (and might be not const).
My advice
An option is to update the id_str_ field any time id_ changes (you must have a setter for id_). Given that you're already changing the member id_, I assume there will be no issue with updating another one.
This approach lets you implement getID() as a simple getter (it should be const, by the way) with no performance issues, and the string field is computed only once per change.
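A minimal sketch of that advice, assuming a setter named setID (a hypothetical name; the question does not show how id_ is modified). The formatted string is refreshed whenever the ID changes, so the getter only returns a reference to the cached value.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

class Test {
public:
    explicit Test(int id = 1) : id_(id), id_str_(format_id(id)) {}

    // Hypothetical setter: keeps the cached string in sync with id_.
    void setID(int id) {
        id_ = id;
        id_str_ = format_id(id);
    }

    // Simple const getter; no formatting work happens here.
    const std::string& getID() const { return id_str_; }

private:
    static std::string format_id(int id) {
        std::ostringstream oss;
        oss << std::setw(4) << std::setfill('0') << id;
        return oss.str();
    }

    int id_;
    std::string id_str_;
};
```

The trade-off is one extra std::string member per object in exchange for formatting only when the ID actually changes.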

One possibility would be to do something like this:
std::string getID(int id) {
    std::string ret(4, '0');             // four leading zeros
    ret += std::to_string(id);           // append the decimal digits
    return ret.substr(ret.length() - 4); // keep only the last four characters
}
If you're using an implementation that includes the short string optimization (e.g., VC++) chances are pretty good that this will give a substantial speed improvement (a quick test with VC++ shows it at around 4-5 times as fast).
OTOH, if you're using an implementation that does not include short string optimization, chances are pretty good it'll be substantially slower. For example, running the same test with g++, produces code that's about 4-5 times slower.
One more point: if your ID number might be more than 4 digits long, this doesn't give the same behavior--it always returns a string of exactly 4 characters rather than the minimum of 4 created by the stringstream code. If your ID numbers may exceed 9999, then this code simply won't work for you.
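If IDs above 9999 have to keep all their digits, a variant that pads instead of truncating is one option. This is only a sketch: padID is a hypothetical name, and it assumes non-negative IDs.

```cpp
#include <string>

// Pads with leading zeros up to a minimum width of 4, never truncating.
// Assumes id is non-negative.
std::string padID(int id) {
    std::string digits = std::to_string(id);
    if (digits.length() < 4) {
        digits.insert(0, 4 - digits.length(), '0');
    }
    return digits;
}
```

For IDs up to 9999 this gives the same text as the std::setw(4)/std::setfill('0') version, and longer IDs pass through unchanged.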

You could change getID in this way:
std::string getID() {
thread_local std::ostringstream oss;
oss.str(""); // replaces the stream's buffered contents with the given (empty) string
oss.clear(); // resets the error flags
oss << std::setw(4) << std::setfill('0') << id_;
return oss.str();
}
This way it won't create a new ostringstream on every call.
In your case it isn't worth it (as Chris Dodd says, opening a file and writing to it is likely to be 10-100x more expensive), but it is good to know.
Also consider that in any reasonable library implementation std::to_string will be at least as fast as stringstream.
1) Since it returns const std::string is the compiler (GCC in my case)
able to optimize it?
There is a rationale for this practice, but it's essentially obsolete (e.g. Herb Sutter recommended returning const values for non-primitive types).
With C++11 it is strongly advised to return values as non-const so that you can take full advantage of rvalue references.
About this topic you can take a look at:
Purpose of returning by const value?
Should I return const objects?

Related

Cannot Get C++ Function to Use String from User

Please help.
I have this program from here that calls a function in a header file.
#include <iostream>
#include "md5.h"
#include <string>
using namespace std;
int main(){
string x;
char* v_MD5String;
MD5 md5 ;
v_MD5String = "Hello World";
x = puts(md5.digestString(v_MD5String));
cout << x;
return 0;}
The function called:
char* digestString( char *string ){
Init() ;
Update( (unsigned char*)string, strlen(string) ) ;
Final() ;
return digestChars ;}
The above works, however when I use input from the user it compiles, but the run crashes without any errors.
In the program, this is changed:
v_MD5String = "Hello World";
to this:
cin >> v_MD5String;
What should I do to get this to work?
Thanks.
So, if I understand correctly, you have the following function declared in a header file which you cannot modify:
char* digestString( char *string );
You should first know that this is questionable coding style. The function takes a char * rather than a char const *, which implies that the passed data is changed, yet it also returns something. I had to dig around in the implementation posted on the page you linked to find out that string is really an input parameter, so that the author just forgot about using const and the data is not going to be changed anyway.
(The data not going to be modified is at least my assumption upon superficial code analysis and some compile tests. You should ask the author to be really sure!)
If you use this function in C++, your first task should be to provide a safer, easy-to-understand wrapper function which uses real C++ strings (the std::string class), not C strings (which happen to be completely unencapsulated pointers to characters in memory, which is fine in the C world but not in C++). You already use one std::string in your program. That's good. Now use it more:
std::string SafeDigestString(MD5 &md5, std::string const &input)
{
// the input of digestString will never be modified:
return md5.digestString(const_cast<char *>(input.c_str()));
}
Both the const & and the parameter name make it clear that we are dealing with input.
Note that I used a const_cast<char *> to pass the std::string's C-compatible data representation, which is char const *, to the digestString function. This is one of the rare cases where a const_cast is appropriate; it's also a typical one, namely making up for shortcomings with regards to const declarations in other code you have to use. If all functions in the MD5 class correctly declared their input parameters const, then no const_cast would be needed.
Also note that I just prepend every std identifier with std::, rather than having using namespace std. This is often the better, simpler, more consistent choice.
Now that we have our safe C++ mechanism in place, main becomes drastically simpler:
int main()
{
MD5 md5;
std::string result = SafeDigestString(md5, "Hello World");
std::cout << result << "\n";
}
We have laid the base to implement user input, which is best done with the std::getline function:
int main()
{
MD5 md5;
std::string input;
std::getline(std::cin, input);
std::string result = SafeDigestString(md5, input);
std::cout << result << "\n";
}
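The wrapper idea does not depend on the MD5 class. As a self-contained sketch, legacyLength below is a hypothetical stand-in for any legacy function that takes char* but only reads its argument; safeLength is the std::string-based wrapper around it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a legacy API that forgot `const` on an input parameter.
// It only reads from `text`; that is the assumption that makes the cast below safe.
std::size_t legacyLength(char* text) {
    return std::strlen(text);
}

// Wrapper with a const-correct C++ interface.
std::size_t safeLength(const std::string& input) {
    return legacyLength(const_cast<char*>(input.c_str()));
}
```

Because the caller owns a real std::string, there is no uninitialised char* for the input to be written into, which is the usual cause of a crash when reading with cin >> into a bare pointer.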

Why doesn't the string class have a << operator (operator<<) predefined so that strings work like ostringstreams?

It seems to me that defining the << operator (operator<<) to work directly with strings is more elegant than having to work with ostringstreams and then converting back to strings. Is there a reason why c++ doesn't do this out of the box?
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
template <class T>
string& operator<<(string& s, T a) {
ostringstream ss;
ss << a;
s.append(ss.str());
return s;
}
int main() {
string s;
// this prints out: "inserting text and a number(1)"
cout << (s << "inserting text and a number (" << 1 << ")\n");
// normal way
ostringstream os;
os << "inserting text and a number(" << 1 << ")\n";
cout << os.str();
}
Streams contain additional state. Imagine if this were possible:
std::string str;
int n = 1234;
str << std::hex;
str << n;
return str; // would return "4d2" (std::hex adds no "0x" prefix unless std::showbase is set)
In order to maintain this additional state, strings would have to have storage for this state. The C++ standards committee (and C++ programmers in general) have generally frowned upon superfluous resource consumption, under the motto "pay only for what you use". So, no extra fields in the string class.
The subjective answer is that I think the std::string class was quite poorly designed to begin with, especially compared to other parts of C++'s excellent standard library, and adding features to std::string is just going to make things worse. This is a very subjective opinion, so feel free to dismiss me as a raving lunatic.
The problem with the idea of strings being output streams is that they would become too heavy.
Strings are intended to "hold string data", not to format some output. Output streams have a heavy "state" which can be manipulated (see <iomanip>) and thus has to be stored. This means that, of course, this has to be stored for every string in every program, but almost none of them are used as an output stream; so it's a huge waste of resources.
C++ follows the "zero overhead" design principle (or at least no more overhead than totally necessary). Not having a string class which doesn't add any unnecessary overhead would be a huge violation of this design principle. If this was the case: what would people do in overhead-critical cases? Use C-strings... ouch!
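The "state" in question is easy to observe. In this small sketch (stickyHex is a hypothetical name), the base set by std::hex is stored in the stream and still applies to a later insertion:

```cpp
#include <ios>
#include <sstream>
#include <string>

std::string stickyHex() {
    std::ostringstream oss;
    oss << std::hex << 1234; // stores the hexadecimal base flag in the stream
    oss << ' ' << 255;       // the flag is still set, so this is written in hex too
    return oss.str();
}
```

A plain std::string has nowhere to remember such flags, which is the storage cost the answers above refer to.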
In C++11, an alternative is to use the operator+= with std::to_string to append to a string, which can also be chained like the operator<< of the output stream. You can wrap both += and to_string in a nice operator<< for string if you like:
template <class Number>
std::string& operator<<(std::string& s, Number a) {
return s += std::to_string(a);
}
std::string& operator<<(std::string& s, const char* a) {
return s += a;
}
std::string& operator<<(std::string& s, const std::string &a) {
return s += a;
}
Your example, updated using this method: http://ideone.com/4zbVtD
Probably lost in the depths of time now but formatted output was always associated with streams in C (since they didn't have "real" strings) and this may have been carried over into C++ (which was, after all, C with classes). In C, the way to format to a string is to use sprintf, a variation on fprintf, the output-to-stream function.
Obviously conjecture on my part, but someone probably thought, as you did, that the formatting facilities of streams would be brilliant to have on strings as well, so they subclassed the stream classes to produce one that used a string as its "output".
That seems the elegant solution to getting it working as quickly as possible. Otherwise, you would have had formatting code duplicated in streams and strings.

Bind temporary to non-const reference

Rationale
I try to avoid assignments in C++ code completely. That is, I use only initialisations and declare local variables as const whenever possible (i.e. always except for loop variables or accumulators).
Now, I’ve found a case where this doesn’t work. I believe this is a general pattern but in particular it arises in the following situation:
Problem Description
Let’s say I have a program that loads the contents of an input file into a string. You can either call the tool by providing a filename (tool filename) or by using the standard input stream (cat filename | tool). Now, how do I initialise the string?
The following doesn’t work:
bool const use_stdin = argc == 1;
std::string const input = slurp(use_stdin ? static_cast<std::istream&>(std::cin)
: std::ifstream(argv[1]));
Why doesn’t this work? Because the prototype of slurp needs to look as follows:
std::string slurp(std::istream&);
That is, the argument is non-const and as a consequence I cannot bind it to a temporary. There doesn’t seem to be a way around this using a separate variable either.
Ugly Workaround
At the moment, I use the following solution:
std::string input;
if (use_stdin)
input = slurp(std::cin);
else {
std::ifstream in(argv[1]);
input = slurp(in);
}
But this is rubbing me the wrong way. First of all it’s more code (in SLOCs) but it’s also using an if instead of the (here) more logical conditional expression, and it’s using assignment after declaration which I want to avoid.
Is there a good way to avoid this indirect style of initialisation? The problem can likely be generalised to all cases where you need to mutate a temporary object. Aren’t streams in a way ill-designed to cope with such cases (a const stream makes no sense, and yet working on a temporary stream does make sense)?
Why not simply overload slurp?
std::string slurp(char const* filename) {
std::ifstream in(filename);
return slurp(in);
}
int main(int argc, char* argv[]) {
bool const use_stdin = argc == 1;
std::string const input = use_stdin ? slurp(std::cin) : slurp(argv[1]);
}
It is a general solution with the conditional operator.
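The question never shows slurp itself. One common way to write the std::istream& overload, given here only as an assumed sketch and not as the asker's actual implementation, is to stream the source's buffer into a string stream:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads everything remaining in `in` and returns it as one string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf(); // copies the whole underlying stream buffer
    return buffer.str();
}
```

With this in place, the char const* overload above only has to open the file and forward to it.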
The solution with the if is more or less the standard solution when
dealing with argv:
if ( argc == 1 ) {
    process( std::cin );
} else {
    for ( int i = 1; i != argc; ++ i ) {
        std::ifstream in( argv[i] );
        if ( in.is_open() ) {
            process( in );
        } else {
            std::cerr << "cannot open " << argv[i] << std::endl;
        }
    }
}
This doesn't handle your case, however, since your primary concern is to
obtain a string, not to "process" the filename args.
In my own code, I use a MultiFileInputStream that I've written, which
takes a list of filenames in the constructor, and only returns EOF when
the last has been read: if the list is empty, it reads std::cin. This
provides an elegant and simple solution to your problem:
MultiFileInputStream in(
std::vector<std::string>( argv + 1, argv + argc ) );
std::string const input = slurp( in );
This class is worth writing, as it is generally useful if you often
write Unix-like utility programs. It is definitely not trivial, however,
and may be a lot of work if this is a one-time need.
A more general solution is based on the fact that you can call a
non-const member function on a temporary, and the fact that most of the
member functions of std::istream return a std::istream&—a
non const-reference which will then bind to a non const reference. So
you can always write something like:
std::string const input = slurp(
use_stdin
? std::cin.ignore( 0 )
: std::ifstream( argv[1] ).ignore( 0 ) );
I'd consider this a bit of a hack, however, and it has the more general
problem that you can't check whether the open (called by the constructor
of std::ifstream) worked.
More generally, although I understand what you're trying to achieve, I
think you'll find that IO will almost always represent an exception.
You can't read an int without having defined it first, and you can't
read a line without having defined the std::string first. I agree
that it's not as elegant as it could be, but then, code which correctly
handles errors is rarely as elegant as one might like. (One solution
here would be to derive from std::ifstream to throw an exception if
the open didn't work; all you'd need is a constructor which checked for
is_open() in the constructor body.)
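The parenthetical suggestion could look roughly like this. CheckedIfstream is a hypothetical name, and the choice of std::runtime_error is an assumption; any exception type would do.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// An ifstream that refuses to exist in a "failed to open" state.
class CheckedIfstream : public std::ifstream {
public:
    explicit CheckedIfstream(const std::string& filename)
        : std::ifstream(filename) {
        if (!is_open()) {
            throw std::runtime_error("cannot open " + filename);
        }
    }
};
```

A temporary of this type can then be used in an expression such as the ignore( 0 ) trick above without silently reading from a stream that never opened.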
All SSA-style languages need to have phi nodes to be usable, realistically. You would run into the same problem in any case where you need to construct from two different types depending on the value of the condition. The ternary operator cannot handle such cases. Of course, in C++11 there are other tricks, like moving the stream or suchlike, or using a lambda, and the design of IOstreams is virtually the exact antithesis of what you're trying to do, so in my opinion, you would just have to make an exception.
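The lambda trick mentioned above can be sketched as follows. An immediately invoked lambda lets the if/else live inside the initialiser, so the string can still be declared const. The names loadInput and fallback are hypothetical, the slurp body is an assumed implementation, and a null filename stands in for the "no argument given" case.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Assumed implementation of slurp: reads the rest of the stream into a string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// filename == nullptr means "read from the fallback stream" (e.g. std::cin).
std::string loadInput(std::istream& fallback, const char* filename) {
    std::string const input = [&]() -> std::string {
        if (filename == nullptr) {
            return slurp(fallback);
        }
        std::ifstream in(filename);
        return slurp(in);
    }(); // the lambda is called immediately; `input` is initialised, never assigned
    return input;
}
```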
Another option might be an intermediate variable to hold the stream:
std::istream&& is = argc==1? std::move(cin) : std::ifstream(argv[1]);
std::string const input = slurp(is);
This takes advantage of the fact that named rvalue references are lvalues.

String Reference from String Literal C++

I'm hoping someone can help answer a question about strings in C++. I've tried to strip out any extraneous code from here, so it wont compile (missing namespace, defines, etc...). This is not a "bug" problem. If working code samples are needed, please specify what code you would like (for which question), I'd be happy to put something more detailed up.
//Foo.c
#define EXIT "exit"
Bar* bar; //See question C
//1
foo(const string& text) {
cout << text;
bar = new Bar(text); //See question C
}
//2
foo(const char* text) {
cout << text;
}
//3
foo(string text) {
cout << text;
}
int main() {
....
{ foo(EXIT); } //braces for scope, see question C)
bar->print(); //4
....
}
class Bar {
private const string& strBar;
Bar::Bar(const string& txt) : strBar(txt) { }
Bar::print() { cout << strBar; }
}
Assuming that only one of the three foo() methods is uncommented, they are not meant to be overloaded. I have a couple of questions here:
A) If I could figure out how to use OllyDbg well enough to fiddle the string literal "exit" into "axit" AFTER the call foo() is made, I believe the output would still be "exit" in cases 1 and 3, and "axit" in case 2. Is this correct?
B) In case 1 and 3, I believe that because the method is asking for a String (even if it is a reference in case 1), there is an implicit call to the string constructor (it accepts const char*), and that constructor ALWAYS makes a copy, never a reference. (see cplusplus.com string page ) Is this correct (especially the ALWAYS)?
C) In case 1, if I initialised a new class which had a string& attribute to which I assigned the text variable, will this reference wind up pointing to bad memory when we leave the scope? IE, when we reach 4, I believe the following has happened (assuming foo(const string& text) is the uncommented function):
1. A temporary string object is created for the line foo(EXIT) that copies the literal.
2. The reference to the temp object is passed through to bar and to the strBar attribute
3. Once the code moves on and leaves the scope in which foo(EXIT) was called, I believe that the temp string object goes out of scope and disappears, which means strBar now references an area of memory with undefined contents, thinking it is still a string.
D) Going back to A, I believe in case 2 (foo(const char* text)) that this call to foo references the literal itself, not a copy, which is why fiddling with the literal in memory would change the output. Is this correct? Could I continue to pass the literal through (say to Bar) if I continued to use const char*?
E) How would you go about testing any of this beyond "this is how it works"? and "read the specs"? I don't need step by step instructions, but some ideas on what I should have done to answer the question myself using the tools I have available (Visual Studio, OllyDbg, suggestions for others?) would be great. I've spent a goodly amount of time trying to do it, and I'd like to hear what people have to say.
A) I don't know anything about OllyDbg, but in all cases std::ostream makes its own copy of text before foo returns, so changing the variables after the call will not affect the output.
B) Yes, the string constructor will always make its own copy of a char* during the implicit construction for the parameter.
C) Yes, when you call foo, a string is automatically created and used, and after the call ends, it is destroyed, leaving bar pointing at invalid memory.
D) You are correct. foo(const char* text) makes a copy of the pointer to the data, but does not copy the data. But since operator<<(ostream, char*) makes a copy of the data, changing the data will not affect the output. I don't see why you couldn't pass the const char* literal through.
E) Take a class, read a tutorial, or read the specs. Trial and error won't get you far in the standard library for this sort of question.
For these, the concept is encapsulation. The objects in the C++ standard library are all encapsulated, so that the results of any operation are what you would expect, and it is really hard to accidentally mess with their internals to make things fail or leak. If you tell ostream to print the data at a char *, it will (A) do it immediately, or (B) make its own copy before it returns in case you mess with the char* later.
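For point C, the usual way to avoid the dangling reference is to let Bar own its string rather than refer to the caller's temporary. A minimal sketch, with the member renamed to strBar_ and the accessor text() and factory makeBar() added as hypothetical names:

```cpp
#include <string>

class Bar {
public:
    // The parameter may bind to a temporary, but the member stores its own copy.
    explicit Bar(const std::string& txt) : strBar_(txt) {}

    const std::string& text() const { return strBar_; }

private:
    std::string strBar_; // a value, not a reference: it lives as long as the Bar does
};

// The temporary std::string built from the literal dies at the end of the
// return statement, but the Bar keeps its copy.
Bar makeBar() {
    return Bar("exit");
}
```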

C++: Is it possible to get a std::string out of an object that overloads the << operator?

I have an object that can be printed to the console with std::cout << obj, but I can't get a std::string out of it, because it doesn't seem to implement something like a .string() method. I thought I might be able to use that overloaded operator to just get string representations of everything instead of having to implement a function to do it myself every time I need it, though having found nothing on the subject makes me think this isn't possible.
Use a std::ostringstream. It is a C++ stream implementation which writes to a string.
You can use a std::ostringstream.
std::ostringstream os;
os << obj;
std::string result = os.str();
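The three lines above can be wrapped once in a small template and reused for any type that has an operator<<. In this sketch, stringify and the Point type are hypothetical names used only for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Works for any T that can be written to a std::ostream.
template <typename T>
std::string stringify(const T& obj) {
    std::ostringstream os;
    os << obj;
    return os.str();
}

// Example type that only knows how to print itself.
struct Point {
    int x;
    int y;
};

std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}
```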
There are different ways of doing it, you can manually implement it in terms of std::ostringstream, or you can use a prepacked version of it in boost::lexical_cast. For more complex operations, you can implement a in-place string builder like the one I provided as an answer here (this solves a more complex problem of building generic strings, but if you want to check it is a simple generic solution).
It seems that the linked question has been removed from StackOverflow, so I will provide the basic skeleton. The first thing to consider is what we want from the in-place string builder, which is basically to avoid creating unnecessary objects:
void f( std::string const & x );
f( make_string() << "Hello " << name << ", you are " << age << " years old." );
For that to work, make_string() must provide an object that is able to take advantage of the already existing operator<< for the different types. And the whole expression must be convertible to std::string. The basic implementation is rather simple:
class make_string {
std::ostringstream buffer;
public:
template <typename T>
make_string& operator<<( T const & obj ) {
buffer << obj;
return *this;
}
operator std::string() const {
return buffer.str();
}
};
This takes care of most of the implementation with the very least amount of code. It has some shortcomings, for example it does not take manipulators (make_string() << std::hex << 30), for that you have to provide extra overloads that take the manipulators (function pointers). There are other small issues with this implementation, most of which can be overcome by adding extra overloads, but the basic implementation above is enough for most regular cases.
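The extra manipulator overloads mentioned above might look like this. It is a sketch of one possible extension, not the original author's code: the two added overloads accept the function-pointer types that manipulators such as std::hex and std::endl have.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

class make_string {
    std::ostringstream buffer;
public:
    template <typename T>
    make_string& operator<<(T const& obj) {
        buffer << obj;
        return *this;
    }
    // Manipulators that act on std::ios_base, e.g. std::hex, std::dec.
    make_string& operator<<(std::ios_base& (*manip)(std::ios_base&)) {
        buffer << manip;
        return *this;
    }
    // Manipulators that act on std::ostream, e.g. std::endl, std::flush.
    make_string& operator<<(std::ostream& (*manip)(std::ostream&)) {
        buffer << manip;
        return *this;
    }
    operator std::string() const {
        return buffer.str();
    }
};
```

Parameterised manipulators such as std::setw(4) return objects rather than functions, so they already go through the generic template overload.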