Related
I am trying to write a lexer, when I try to copy isdigit buffer value in an array of char, I get this core dumped error although I have done the same thing with identifier without getting error.
#include<fstream>
#include<iostream>
#include<cctype>
#include <cstring>
#include<typeinfo>
using namespace std;
int isKeyword(char buffer[]){
char keywords[22][10] = {"break","case","char","const","continue","default", "switch",
"do","double","else","float","for","if","int","long","return","short",
"sizeof","struct","void","while","main"};
int i, flag = 0;
for(i = 0; i < 22; ++i){
if(strcmp(keywords[i], buffer) == 0)
{
flag = 1;
break;
}
}
return flag;
}
int isSymbol_Punct(char word)
{
int flag = 0;
char symbols_punct[] = {'<','>','!','+','-','*','/','%','=',';','(',')','{', '}','.'};
for(int x= 0; x< 15; ++x)
{
if(word==symbols_punct[x])
{
flag = 1;
break;
}
}
return flag;
}
int main()
{
char buffer[15],buffer1[15];
char identifier[30][10];
char number[30][10];
memset(&identifier[0], '\0', sizeof(identifier));
memset(&number[0], '\0', sizeof(number));
char word;
ifstream fin("program.txt");
if(!fin.is_open())
{
cout<<"Error while opening the file"<<endl;
}
int i,k,j,l=0;
while (!fin.eof())
{
word = fin.get();
if(isSymbol_Punct(word)==1)
{
cout<<"<"<<word<<", Symbol/Punctuation>"<<endl;
}
if(isalpha(word))
{
buffer[j++] = word;
// cout<<"buffer: "<<buffer<<endl;
}
else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (j != 0))
{
buffer[j] = '\0';
j = 0;
if(isKeyword(buffer) == 1)
cout<<"<"<<buffer<<", keyword>"<<endl;
else
{
cout<<"<"<<buffer<<", identifier>"<<endl;
strcpy(identifier[i],buffer);
i++;
}
}
else if(isdigit(word))
{
buffer1[l++] = word;
cout<<"buffer: "<<buffer1<<endl;
}
else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (l != 0))
{
buffer1[l] = '\0';
l = 0;
cout<<"<"<<buffer1<<", number>"<<endl;
// cout << "Type is: "<<typeid(buffer1).name() << endl;
strcpy(number[k],buffer1);
k++;
}
}
cout<<"Identifier Table"<<endl;
int z=0;
while(strcmp(identifier[z],"\0")!=0)
{
cout <<z<<"\t\t"<< identifier[z]<<endl;
z++;
}
// cout<<"Number Table"<<endl;
// int y=0;
// while(strcmp(number[y],"\0")!=0)
// {
// cout <<y<<"\t\t"<< number[y]<<endl;
// y++;
// }
}
I am getting this error when I copy buffer1 in number[k] using strcpy. I do not understand why it is not being copied. When i printed the type of buffer1 to see if strcpy is not generating error, I got A_15, I searched for it, but did not find any relevant information.
The reason is here (line 56):
int i,k,j,l=0;
You might think that this initializes i, j, k, and l to 0, but in fact it only initializes l to 0. i, j, and k are declared here, but not initialized to anything. As a result, they contain random garbage, so if you use them as array indices you are likely to end up overshooting the bounds of the array in question.
At that point, anything could happen—in other words, this is undefined behavior. One likely outcome, which is probably happening to you, is that your program tries to access memory that hasn't been assigned to it by the operating system, at which point it crashes (a segmentation fault).
To give a concrete demonstration of what I mean, consider the following program:
#include <iostream>
void print_var(std::string name, int v)
{
std::cout << name << ": " << v << "\n";
}
int main(void)
{
int i, j, k, l = 0;
print_var("i", i);
print_var("j", j);
print_var("k", k);
print_var("l", l);
return 0;
}
When I ran this, I got the following:
i: 32765
j: -113535829
k: 21934
l: 0
As you can see, i, j, and k all came out such that using them as indices into any of the arrays you declared would exceed their bounds. Unless you are very lucky, this will happen to you, too.
You can fix this by initializing each variable separately:
int i = 0;
int j = 0;
int k = 0;
int l = 0;
Initializing each on its own line makes the initializations easier to see, helping to prevent mistakes.
A few side notes:
I was able to spot this issue immediately because I have my development environment configured to flag lines that provoke compiler warnings. Using a variable before it's being initialized should provoke such a warning if you're using a reasonable compiler, so you can fix problems like this as you run into them. Your development environment may support the same feature (and if it doesn't, you might consider switching to something that does). If nothing else, you can turn on warnings during compilation (by passing -Wall -Wextra to your compiler or the like—check its documentation for the specifics).
Since you declared your indices as int, they are signed integers, which means they can hold negative values (as j did in my demonstration). If you try to index into an array using a negative index, you will end up dereferencing a pointer to a location "behind" the start of the array in memory, so you will be in trouble even with an index of -1 (remember that a C-style array is basically just a pointer to the start of the array). Also, int probably has only 32 bits in your environment, so if you're writing 64-bit code then it's possible to define arrays too large for an int to fully cover, even if you were to index into the array from the middle. For these sorts of reasons, it's generally a good idea to type raw array indices as std::size_t, which is always capable of representing the size of the largest possible array in your target environment, and also is unsigned.
You describe this as C++ code, but I don't see much C++ here aside from the I/O streams. C++ has a lot of amenities that can help you guard against bugs compared to C-style code (which has to be written with great care). For example, you could replace your C-style arrays here with instances of std::array, which has a member function at() that does subscripting with bounds checking; that would have thrown a helpful exception in this case instead of having your program segfault. Also, it doesn't seem like you have a particular need for fixed-size arrays in this case, so you may better off using std::vector; this will automatically grow to accommodate new elements, helping you avoid writing outside the vector's bounds. Both support range-based for loops, which save you from needing to deal with indices by hand at all. You might enjoy Bjarne's A Tour of C++, which gives a nice overview of idiomatic C++ and will make all the wooly reference material easier to parse. (And if you want to pick up some nice C habits, both K&R and Kernighan and Pike's The Practice of Programming can save you much pain and tears).
Some general hints that might help you to avoid your cause of crash totally by design:
As this is C++, you should really refer to established C++ data types and schemes here as far as possible. I know, that distinct stuff in terms of parser/lexer writing can become quite low-level but at least for the things you want to achieve here, you should really appreciate that. Avoid plain arrays as far as possible. Use std::vector of uint8_t and/or std::string for instance.
Similar to point 1 and a consequence: Always use checked bounds iterations! You don't need to try to be better than the optimizer of your compiler, at least not here! In general, one should always avoid to duplicate container size information. With the stated C++ containers, this information is always provided on data source side already. If not possible for very rare cases (?), use constants for that, directly declared at/within data source definition/initialization.
Give your variables meaningful names, declare them as local to their used places as possible.
isXXX-methods - at least your ones, should return boolean values. You never return something else than 0 or 1.
A personal recommendation that is a bit controversional to be a general rule: Use early returns and abort criteria! Even after the check for file reading issues, you proceed further.
Try to keep your functions smart and non-boilerplate! Use sub-routines for distinct sub-tasks!
Try to avoid using namespace that globally! Even without exotic building schemes like UnityBuilds, this can become error-prone as hell for huger projects at latest.
the arrays keywords and symbols_punct should be at least static const ones. The optimizer will easily be able to recognize that but it's rather a help for you for fast code understanding at least. Try to use classes here to compound the things that belong together in a readable, adaptive, easy modifiable and reusable way. Always keep in mind, that you might want to understand your own code some months later still, maybe even other developers.
In java you can do something like:
Scanner sc = new Scanner(System.in);
int total = 0;
for(int i = 0; i<something;i++){
total+=sc.nextInt(); // <<< Doesn't require an extra variable
}
And my question is: can you do something similar in C or C++ ? and if there is, is it better?
This is what I currently do:
int total;
int aux; // <<< Need an extra variable to read input
for(int i = 0; i<something;i++){
scanf("%d",&aux);
total+=aux; // <<< and add the read value here
}
The obvious way to do it in C++ would be something like this:
int total = std::accumulate(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
0);
As it stands, this reads all the ints it can from the input file rather than requiring a separate specification of the number of input values. You could specify an N if you wanted to badly enough, but at least in my experience, you're not very likely to want that.
If you really want to specify N directly, the cleanest way to handle the situation would probably be to define an accumulate_n that works about like std::accumulate:
template <class InIt, class T>
T accumulate_n(InIt in, size_t n, T init) {
for (size_t i=0; i<n; i++)
init += *in++;
return init;
}
You'd use this about like the previous version, but (obviously enough) specifying the number of values to read:
int total = accumulate_n(std::istream_iterator<int>(std::cin),
something,
0);
I suppose I should add that (especially for production code) you'd probably want to add some constraints on the template parameters in the accumulate_n definition above. I also haven't tried to do anything about the possibility of bad input, such as containing something other than a number, or simply containing fewer items than specified. These can be dealt with, but offhand I don't remember how Java deals with them; I'd probably have to do some thinking/research to find/figure out exactly what reaction to such problems would be most appropriate.
Reading a number of variables from an input stream in c++ usually looks like follows (regarding your sample):
int total = 0;
int aux;
while(std::cin >> aux) {
// break on 'something' condition
total += aux;
}
So, I don't see a way how to do it without an auxiliary variable, that actually receives the value read from the std::istream, unless you provide a wrapper class yourself, that just provides that java like behavior.
"can you do something similar in C or C++ ? and if there is, is it better?"
I doubt it's worth to write such wrapper class for std::istream in c++ (You can just consider to use the std::accumulate() algorithm as mentioned in #JerryCoffin's answer).
For c language, there's no alternate choice, I actually can see/know about.
I want to generate some variables using for command. look at code below:
for (char ch='a'; ch<='z'; ch++)
int ch=0;
It just an example, after running code above, I want to have int a, int b, int c ...
another example:
for (int i=0; i<10; i++)
int NewiEnd=0;
For example after running code above, we will have int New1End, int New2End etc.
Hope I'm clear enough, How can I do such thing in C++??
No, not possible, not exactly. However, this is possible:
std::map<char,int> vars;
for (char ch='a'; ch<='z'; ch++)
vars[ch] = 0;
std::cout << vars['a'] << vars['b'] << vars['c'];
You can also have std::map<std::string, int>.
std::map<std::string,int> vars;
for (int i=0; i<10; i++)
vars["New" + std::to_string(i) + "End"] = 0;
std::cout << vars["New5End"];
What you're trying to do isn't possible in C or C++.
What you seem to want is a map of the type:
std::map<std::string, int> ints;
This will let you call "variables" by name:
ints["a"] = 0;
ints["myVariable"] = 10;
Or as given in your example:
std::map<char, int> ints;
for (char ch='a'; ch<='z'; ch++)
ints[ch] = 0;
If you are just about to use 'a' - 'z' you could use an array of ints:
int ints['z' + 1];
ints['a'] = 0;
ints['z'] = 0;
But this allocates unnecessary space for the ascii characters below 'a'.
In C/C++ the variable names have "gone away" by the time the code has been compiled and run. You can't print out the name of an existing variable at run time via "reflection"...much less make new named variables. People looking for this feature find out that the only generalized way you can do it falls down to using the preprocessor:
generic way to print out variable name in c++
The preprocessor could theoretically be applied to your problem as well, with certain constraints:
Writing a while loop in the C preprocessor
But anyone reading your code would probably drive a stake through your heart, and be justified in doing so. Both Sunday-morning laziness and a strong belief that it's not what you (should) want leads me to not try and write a working example. :-)
(For the curious, the preprocessor is not Turing-Complete, although there are some "interesting" experiments)
The nature of C/C++ is to have you build up named tables on an as-needed basis. The languages that offer this feature by default make you pay for the runtime tracking of names whether you wind up using reflection or not, and that's not in the spirit of this particular compiled language. Others have given you answers that are more on the right track.
Windows XP SP3. Core 2 Duo 2.0 GHz.
I'm finding the boost::lexical_cast performance to be extremely slow. Wanted to find out ways to speed up the code. Using /O2 optimizations on visual c++ 2008 and comparing with java 1.6 and python 2.6.2 I see the following results.
Integer casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
s = boost::lexical_cast<string>(i);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
s = new Integer(i).toString();
}
python:
for i in xrange(1,10000000):
s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
c++ is as slow as python and 6 times slower than java.
Double casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
s = boost::lexical_cast<string>(d);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
double d = i*1.0;
s = new Double(d).toString();
}
python:
for i in xrange(1,10000000):
d = i*1.0
s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles c++ is actually half the speed of python and 20 times slower than the java solution!!. Any ideas on improving the boost::lexical_cast performance? Does this stem from the poor stringstream implementation or can we expect a general 10x decrease in performance from using the boost libraries.
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to boost 1.49, but I do remember making my code faster on an older version. So I guess:
the following answer is still valid (if only for learning purposes)
there was probably an optimization introduced somewhere between the two versions (I'll search that)
which means that boost is still getting better and better
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library either with criticism or with code.
I guess you missed the point of lexical_cast's real value...
Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).
In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.
Comparing apples.
If you want to compare a Java Integer.toString method, then you should compare it with either C sprintf or C++ ostream facilities.
The C++ stream solution would be 6 times faster (on my g++) than lexical_cast, and quite less extensible:
inline void toString(const int value, std::string & output)
{
// The largest 32-bit integer is 4294967295, that is 10 chars
// On the safe side, add 1 for sign, and 1 for trailing zero
char buffer[12] ;
sprintf(buffer, "%i", value) ;
output = buffer ;
}
The C sprintf solution would be 8 times faster (on my g++) than lexical_cast but a lot less safe:
inline void toString(const int value, char * output)
{
sprintf(output, "%i", value) ;
}
Both solutions are either as fast or faster than your Java solution (according to your data).
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with this Java pseudo code:
Source s ;
Target t = Target.fromString(Source(s).toString()) ;
Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.
Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, in a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.
This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
So, what are the uses of lexical_cast?
The main uses of lexical casting are:
Ease of use (hey, a C++ cast that works for everything being a value!)
Combining it with template heavy code, where your types are parametrized, and as such you don't want to deal with specifics, and you don't want to know the types.
Still potentially relatively efficient, if you have basic template knowledge, as I will demonstrate below
The point 2 is very very important here, because it means we have one and only one interface/function to cast a value of a type into an equal or similar value of another type.
This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently, and still, keep the lexical_cast ease-of-use feature.
It took me some minutes to look at the lexical_cast source, and come with a viable solution. Add to your C++ code the following code:
#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT
namespace boost
{
template<>
std::string lexical_cast<std::string, int>(const int &arg)
{
// The largest 32-bit integer is 4294967295, that is 10 chars
// On the safe side, add 1 for sign, and 1 for trailing zero
char buffer[12] ;
sprintf(buffer, "%i", arg) ;
return buffer ;
}
}
#endif
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 time faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.
And it took me 10 minutes of looking at boost code, and write a remotely efficient and correct 32-bit version. And with some work, it could probably go faster and safer (if we had direct write access to the std::string internal buffer, we could avoid a temporary external buffer, for example).
You could specialize lexical_cast for int and double types. Use strtod and strtol in your's specializations.
namespace boost {
template<>
inline int lexical_cast(const std::string& arg)
{
char* stop;
int res = strtol( arg.c_str(), &stop, 10 );
if ( *stop != 0 ) throw_exception(bad_lexical_cast(typeid(int), typeid(std::string)));
return res;
}
template<>
inline std::string lexical_cast(const int& arg)
{
char buffer[65]; // large enough for arg < 2^200
ltoa( arg, buffer, 10 );
return std::string( buffer ); // RVO will take place here
}
}//namespace boost
int main(int argc, char* argv[])
{
std::string str = "22"; // SOME STRING
int int_str = boost::lexical_cast<int>( str );
std::string str2 = boost::lexical_cast<std::string>( str_int );
return 0;
}
This variant will be faster than using default implementation, because in default implementation there is construction of heavy stream objects. And it is should be little faster than printf, because printf should parse format string.
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out then back in to and from a temporary stream) ends up being slower than specific routines.
(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])
Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.
[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
public class Test
{
static int instanceCall(int i)
{
String s = new Integer(i).toString();
return s == null ? 0 : 1;
}
static int staticCall(int i)
{
String s = Integer.toString(i);
return s == null ? 0 : 1;
}
public static void main(String[] args)
{
// count used to avoid dead code elimination
int count = 0;
// *** instance
// Warmup calls
for (int i = 0; i < 100; ++i)
count += instanceCall(i);
long start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i)
count += instanceCall(i);
long finish = System.currentTimeMillis();
System.out.printf("10MM Time taken: %d ms\n", finish - start);
// *** static
// Warmup calls
for (int i = 0; i < 100; ++i)
count += staticCall(i);
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i)
count += staticCall(i);
finish = System.currentTimeMillis();
System.out.printf("10MM Time taken: %d ms\n", finish - start);
if (count == 42)
System.out.println("bad result"); // prevent elimination of count
}
}
The runtimes, using JDK 1.6.0-14, server VM:
10MM Time taken: 688 ms
10MM Time taken: 547 ms
And in client VM:
10MM Time taken: 687 ms
10MM Time taken: 610 ms
Even though theoretically, escape analysis may permit allocation on the stack, and inlining may introduce all code (including copying) into the local method, permitting elimination of redundant copying, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in code cache that don't justify themselves in real code, as opposed to microbenchmarks like seen here.
What lexical cast is doing in your code can be simplified to this:
string Cast( int i ) {
ostringstream os;
os << i;
return os.str();
}
There is unfortunately a lot going on every time you call Cast():
a string stream is created possibly allocating memory
operator << for integer i is called
the result is stored in the stream, possibly allocating memory
a string copy is taken from the stream
a copy of the string is (possibly) created to be returned.
memory is deallocated
Thn in your own code:
s = Cast( i );
the assignment involves further allocations and deallocations are performed. You may be able to reduce this slightly by using:
string s = Cast( i );
instead.
However, if performance is really importanrt to you, you should considerv using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread safe, but that might not matter for your specific needs.
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow, because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:
a stream object has to be created and destroyed
in the string output case above, note that C++ compilers have a hard time avoiding buffer copies (an alternative is to format directly to the output buffer, like sprintf does, though sprintf won't safely handle buffer overruns)
lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures
lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort and because it has a uniform prototype. I don't personally see why either of these properties necessitate slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).
Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
lexical_cast may or may not be as slow in relation to Java and Python as your bencharks indicate because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical cast or the iostream methods it uses are measured by your benchmarks because C++ doesn't defer these operations. However, in the case of Java and Python, the associated deallocations may in fact have simply been deferred to a future garbage collection cycle and missed by the benchmark measurements. (Unless a GC cycle by chance occurs while the benchmark is in progress and in that case you'd be measuring too much). So it's hard to know for sure without examining specifics of the Java and Python implementations how much "cost" should be attributed to the deferred GC burden that may (or may not) be eventually imposed.
This kind of issue obviously may apply to many other C++ vs garbage collected language benchmarks.
As Barry said, lexical_cast is very general, you should use a more specific alternative, for example check out itoa (int->string) and atoi (string -> int).
if speed is a concern, or you are just interested in how fast such casts can be with C++, there's an interested thread regarding it.
Boost.Spirit 2.1(which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents(strtol(), atoi() etc. ).
I use this very fast solution for POD types...
namespace DATATYPES {
typedef std::string TString;
typedef char* TCString;
typedef double TDouble;
typedef long THuge;
typedef unsigned long TUHuge;
};
namespace boost {
template<typename TYPE>
inline const DATATYPES::TString lexical_castNumericToString(
const TYPE& arg,
const DATATYPES::TCString fmt) {
enum { MAX_SIZE = ( std::numeric_limits<TYPE>::digits10 + 1 ) // sign
+ 1 }; // null
char buffer[MAX_SIZE] = { 0 };
if (sprintf(buffer, fmt, arg) < 0) {
throw_exception(bad_lexical_cast(typeid(TYPE),
typeid(DATATYPES::TString)));
}
return ( DATATYPES::TString(buffer) );
}
template<typename TYPE>
inline const TYPE lexical_castStringToNumeric(const DATATYPES::TString& arg) {
DATATYPES::TCString end = 0;
DATATYPES::TDouble result = std::strtod(arg.c_str(), &end);
if (not end or *end not_eq 0) {
throw_exception(bad_lexical_cast(typeid(DATATYPES::TString),
typeid(TYPE)));
}
return TYPE(result);
}
template<>
inline DATATYPES::THuge lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::THuge>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::THuge& arg) {
return (lexical_castNumericToString<DATATYPES::THuge>(arg,"%li"));
}
template<>
inline DATATYPES::TUHuge lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::TUHuge>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TUHuge& arg) {
return (lexical_castNumericToString<DATATYPES::TUHuge>(arg,"%lu"));
}
template<>
inline DATATYPES::TDouble lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::TDouble>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TDouble& arg) {
return (lexical_castNumericToString<DATATYPES::TDouble>(arg,"%f"));
}
} // end namespace boost
I've discovered that std::strings are very slow compared to old-fashioned null-terminated strings, so much slow that they significantly slow down my overall program by a factor of 2.
I expected STL to be slower, I didn't realise it was going to be this much slower.
I'm using Visual Studio 2008, release mode. It shows assignment of a string to be 100-1000 times slower than char* assignment (it's very difficult to test the run-time of a char* assignment). I know it's not a fair comparison, a pointer assignment versus string copy, but my program has lots of string assignments and I'm not sure I could use the "const reference" trick in all places. With a reference counting implementation my program would have been fine, but these implementations don't seem to exist anymore.
My real question is: why don't people use reference counting implementations anymore, and does this mean we all need to be much more careful about avoiding common performance pitfalls of std::string?
My full code is below.
#include <string>
#include <iostream>
#include <time.h>
using std::cout;
void stop()
{
}
int main(int argc, char* argv[])
{
#define LIMIT 100000000
clock_t start;
std::string foo1 = "Hello there buddy";
std::string foo2 = "Hello there buddy, yeah you too";
std::string f;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
f = foo1;
foo1 = foo2;
foo2 = f;
}
double stl = double(clock() - start) / CLOCKS\_PER\_SEC;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
}
double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC;
char* goo1 = "Hello there buddy";
char* goo2 = "Hello there buddy, yeah you too";
char *g;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
g = goo1;
goo1 = goo2;
goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
cout << "Empty loop = " << emptyLoop << "\n";
cout << "char* loop = " << charLoop << "\n";
cout << "std::string = " << stl << "\n";
cout << "slowdown = " << (stl - emptyLoop) / (charLoop - emptyLoop) << "\n";
std::string wait;
std::cin >> wait;
return 0;
}
Well there are definitely known problems regarding the performance of strings and other containers. Most of them have to do with temporaries and unnecessary copies.
It's not too hard to use it right, but it's also quite easy to Do It Wrong. For example, if you see your code accepting strings by value where you don't need a modifiable parameter, you Do It Wrong:
// you do it wrong
void setMember(string a) {
this->a = a; // better: swap(this->a, a);
}
You better had taken that by const reference or done a swap operation inside, instead of yet another copy. Performance penalty increases for a vector or list in that case. However, you are right definitely that there are known problems. For example in this:
// let's add a Foo into the vector
v.push_back(Foo(a, b));
We are creating one temporary Foo just to add a new Foo into our vector. In a manual solution, that might create the Foo directly into the vector. And if the vector reaches its capacity limit, it has to reallocate a larger memory buffer for its elements. What does it do? It copies each element separately to their new place using their copy constructor. A manual solution might behave more intelligent if it knows the type of the elements before-hand.
Another common problem is introduced temporaries. Have a look at this
string a = b + c + e;
There are loads of temporaries created, which you might avoid in a custom solution that you actually optimize onto performance. Back then, the interface of std::string was designed to be copy-on-write friendly. However, with threads becoming more popular, transparent copy on write strings have problems keeping their state consistent. Recent implementations tend to avoid copy on write strings and instead apply other tricks where appropriate.
Most of those problems are solved however for the next version of the Standard. For example instead of push_back, you can use emplace_back to directly create a Foo into your vector
v.emplace_back(a, b);
And instead of creating copies in a concatenation above, std::string will recognize when it concatenates temporaries and optimize for those cases. Reallocation will also avoid making copies, but will move elements where appropriate to their new places.
For an excellent read, consider Move Constructors by Andrei Alexandrescu.
Sometimes, however, comparisons also tend to be unfair. Standard containers have to support the features they have to support. For example if your container does not keep map element references valid while adding/removing elements from your map, then comparing your "faster" map to the standard map can become unfair, because the standard map has to ensure that elements keep being valid. That was just an example, of course, and there are many such cases that you have to keep in mind when stating "my container is faster than standard ones!!!".
It looks like you're misusing char* in the code you pasted. If you have
std::string a = "this is a";
std::string b = "this is b"
a = b;
you're performing a string copy operation. If you do the same with char*, you're performing a pointer copy operation.
The std::string assignment operation allocates enough memory to hold the contents of b in a, then copies each character one by one. In the case of char*, it does not do any memory allocation or copy the individual characters one by one, it just says "a now points to the same memory that b is pointing to."
My guess is that this is why std::string is slower, because it's actually copying the string, which appears to be what you want. To do a copy operation on a char* you'd need to use the strcpy() function to copy into a buffer that's already appropriately sized. Then you'll have an accurate comparison. But for the purposes of your program you should almost definitely use std::string instead.
When writing C++ code using any utility class (whether STL or your own) instead of eg. good old C null terminated strings, you need to rememeber a few things.
If you benchmark without compiler optimisations on (esp. function inlining), classes will lose. They are not built-ins, even stl. They are implemented in terms of method calls.
Do not create unnesessary objects.
Do not copy objects if possible.
Pass objects as references, not copies, if possible,
Use more specialised method and functions and higher level algorithms. Eg.:
std::string a = "String a"
std::string b = "String b"
// Use
a.swap(b);
// Instead of
std::string tmp = a;
a = b;
b = tmp;
And a final note. When your C-like C++ code starts to get more complex, you need to implement more advanced data structures like automatically expanding arrays, dictionaries, efficient priority queues. And suddenly you realise that its a lot of work and your classes are not really faster then stl ones. Just more buggy.
You are most certainly doing something wrong, or at least not comparing "fairly" between STL and your own code. Of course, it's hard to be more specific without code to look at.
It could be that you're structuring your code using STL in a way that causes more constructors to run, or not re-using allocated objects in a way that matches what you do when you implement the operations yourself, and so on.
This test is testing two fundamentally different things: a shallow copy vs. a deep copy. It's essential to understand the difference and how to avoid deep copies in C++ since a C++ object, by default, provides value semantics for its instances (as with the case with plain old data types) which means that assigning one to the other is generally going to copy.
I "corrected" your test and got this:
char* loop = 19.921
string = 0.375
slowdown = 0.0188244
Apparently we should cease using C-style strings since they are soooo much slower! In actuality, I deliberately made my test as flawed as yours by testing shallow copying on the string side vs. strcpy on the :
#include <string>
#include <iostream>
#include <ctime>
using namespace std;
#define LIMIT 100000000
char* make_string(const char* src)
{
return strcpy((char*)malloc(strlen(src)+1), src);
}
int main(int argc, char* argv[])
{
clock_t start;
string foo1 = "Hello there buddy";
string foo2 = "Hello there buddy, yeah you too";
start = clock();
for (int i=0; i < LIMIT; i++)
foo1.swap(foo2);
double stl = double(clock() - start) / CLOCKS_PER_SEC;
char* goo1 = make_string("Hello there buddy");
char* goo2 = make_string("Hello there buddy, yeah you too");
char *g;
start = clock();
for (int i=0; i < LIMIT; i++) {
g = make_string(goo1);
free(goo1);
goo1 = make_string(goo2);
free(goo2);
goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
cout << "char* loop = " << charLoop << "\n";
cout << "string = " << stl << "\n";
cout << "slowdown = " << stl / charLoop << "\n";
string wait;
cin >> wait;
}
The main point is, and this actually gets to the heart of your ultimate question, you have to know what you are doing with the code. If you use a C++ object, you have to know that assigning one to the other is going to make a copy of that object (unless assignment is disabled, in which case you'll get an error). You also have to know when it's appropriate to use a reference, pointer, or smart pointer to an object, and with C++11, you should also understand the difference between move and copy semantics.
My real question is: why don't people use reference counting
implementations anymore, and does this mean we all need to be much
more careful about avoiding common performance pitfalls of
std::string?
People do use reference-counting implementations. Here's an example of one:
shared_ptr<string> ref_counted = make_shared<string>("test");
shared_ptr<string> shallow_copy = ref_counted; // no deep copies, just
// increase ref count
The difference is that string doesn't do it internally as that would be inefficient for those who don't need it. Things like copy-on-write are generally not done for strings either anymore for similar reasons (plus the fact that it would generally make thread safety an issue). Yet we have all the building blocks right here to do copy-on-write if we wish to do so: we have the ability to swap strings without any deep copying, we have the ability to make pointers, references, or smart pointers to them.
To use C++ effectively, you have to get used to this way of thinking involving value semantics. If you don't, you might enjoy the added safety and convenience but do it at heavy cost to the efficiency of your code (unnecessary copies are certainly a significant part of what makes poorly written C++ code slower than C). After all, your original test is still dealing with pointers to strings, not char[] arrays. If you were using character arrays and not pointers to them, you'd likewise need to strcpy to swap them. With strings you even have a built-in swap method to do exactly what you are doing in your test efficiently, so my advice is to spend a bit more time learning C++.
If you have an indication of the eventual size of your vector you can prevent excessive resizes by calling reserve() before filling it up.
The main rules of optimization:
Rule 1: Don't do it.
Rule 2: (For experts only) Don't do it yet.
Are you sure that you have proven that it is really the STL that is slow, and not your algorithm?
Good performance isn't always easy with STL, but generally, it is designed to give you the power. I found Scott Meyers' "Effective STL" an eye-opener for understanding how to deal with the STL efficiently. Read!
As others said, you are probably running into frequent deep copies of the string, and compare that to a pointer assignment / reference counting implementation.
Generally, any class designed towards your specific needs, will beat a generic class that's designed for the general case. But learn to use the generic class well, and learn to ride the 80:20 rules, and you will be much more efficient than someone rolling everything on their own.
One specific drawback of std::string is that it doesn't give performance guarantees, which makes sense. As Tim Cooper mentioned, STL does not say whether a string assignment creates a deep copy. That's good for a generic class, because reference counting can become a real killer in highly concurrent applications, even though it's usually the best way for a single threaded app.
They didn't go wrong. STL implementation is generally speaking better than yours.
I'm sure that you can write something better for a very particular case, but a factor of 2 is too much... you really must be doing something wrong.
If used correctly, std::string is as efficient as char*, but with the added protection.
If you are experiencing performance problems with the STL, it's likely that you are doing something wrong.
Additionally, STL implementations are not standard across compilers. I know that SGI's STL and STLPort perform generally well.
That said, and I am being completely serious, you could be a C++ genius and have devised code that is far more sophisticated than the STL. It's not likely , but who knows, you could be the LeBron James of C++.
I would say that STL implementations are better than the traditional implementations. Also did you try using a list instead of a vector, because vector is efficient for some purpose and list is efficient for some other
std::string will always be slower than C-strings. C-strings are simply a linear array of memory. You cannot get any more efficient than that, simply as a data structure. The algorithms you use (like strcat() or strcpy()) are generally equivalent to the STL counterparts. The class instantiation and method calls will be, in relative terms, significantly slower than C-string operations (even worse if the implementation uses virtuals). The only way you could get equivalent performance is if the compiler does optimization.
string const string& char* Java string
---------------------------------------------------------------------------------------------------
Efficient no ** yes yes yes
assignment
Thread-safe yes yes yes yes
memory management yes no no yes
done for you
** There are 2 implementations of std::string: reference counting or deep-copy. Reference counting introduces performance problems in multi-threaded programs, EVEN for just reading strings, and deep-copy is obviously slower as shown above. See:
Why VC++ Strings are not reference counted?
As this table shows, 'string' is better than 'char*' in some ways and worse in others, and 'const string&' is similar in properties to 'char*'. Personally I'm going to continue using 'char*' in many places. The enormous amount of copying of std::string's that happens silently, with implicit copy constructors and temporaries makes me somewhat ambivalent about std::string.
A large part of the reason might be the fact that reference-counting is no longer used in modern implementations of STL.
Here's the story (someone correct me if I'm wrong): in the beginning, STL implementations used reference counting, and were fast but not thread-safe - the implementors expected application programmers to insert their own locking mechanisms at higher levels, to make them thread-safe, because if locking was done at 2 levels then this would slow things down twice as much.
However, the programmers of the world were too ignorant or lazy to insert locks everywhere. For example, if a worker thread in a multi-threaded program needed to read a std::string commandline parameter, then a lock would be needed even just to read the string, otherwise crashes could ensue. (2 threads increment the reference count simultaneously on different CPU's (+1), but decrement it separately (-2), so the reference count goes down to zero, and the memory is freed.)
So implementors ditched reference counting and instead had each std::string always own its own copy of the string. More programs worked, but they were all slower.
So now, even a humble assignment of one std::string to another, (or equivalently, passing a std::string as a parameter to a function), takes about 400 machine code instructions instead of the 2 it takes to assign a char*, a slowdown of 200 times.
I tested the magnitude of the inefficiency of std::string on one major program, which had an overall slowdown of about 100% compared with null-terminated strings. I also tested raw std::string assignment using the following code, which said that std::string assignment was 100-900 times slower. (I had trouble measuring the speed of char* assignment). I also debugged into the std::string operator=() function - I ended up knee deep in the stack, about 7 layers deep, before hitting the 'memcpy()'.
I'm not sure there's any solution. Perhaps if you need your program to be fast, use plain old C++, and if you're more concerned about your own productivity, you should use Java.
#define LIMIT 800000000
clock_t start;
std::string foo1 = "Hello there buddy";
std::string foo2 = "Hello there buddy, yeah you too";
std::string f;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
f = foo1;
foo1 = foo2;
foo2 = f;
}
double stl = double(clock() - start) / CLOCKS_PER_SEC;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
}
double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC;
char* goo1 = "Hello there buddy";
char* goo2 = "Hello there buddy, yeah you too";
char *g;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
g = goo1;
goo1 = goo2;
goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
TfcMessage("done", 'i', "Empty loop = %1.3f s\n"
"char* loop = %1.3f s\n"
"std::string loop = %1.3f s\n\n"
"slowdown = %f",
emptyLoop, charLoop, stl,
(stl - emptyLoop) / (charLoop - emptyLoop));