Meaning of acronym SSO in the context of std::string - c++

In a C++ question about optimization and code style, several answers referred to "SSO" in the context of optimizing copies of std::string. What does SSO mean in that context?
Clearly not "single sign on". "Shared string optimization", perhaps?

Background / Overview
Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.
SSO is the Short / Small String Optimization. A std::string typically stores the string as a pointer to the free store ("the heap"), which gives similar performance characteristics as if you were to call new char [size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string create a small automatic array, something like char [20]. If you have a string that is 20 characters or smaller (given this example, the actual size varies), it stores it directly in that array. This avoids the need to call new at all, which speeds things up a bit.
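To make the idea concrete, here is a minimal sketch (illustrative only, not any real library's layout; the 16-byte buffer, the names, and the omitted copy support are my simplifications):

#include <cstring>
#include <cstddef>

// Toy string: short contents live in the internal buffer, long contents on the heap.
class toy_string {
public:
    explicit toy_string(const char* s) {
        size_ = std::strlen(s);
        if (size_ < sizeof(buffer_)) {            // fits: no allocation needed
            std::memcpy(buffer_, s, size_ + 1);
            data_ = buffer_;
        } else {                                  // too long: fall back to the heap
            data_ = new char[size_ + 1];
            std::memcpy(data_, s, size_ + 1);
        }
    }
    ~toy_string() { if (data_ != buffer_) delete[] data_; }
    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }
private:
    toy_string(const toy_string&);                // copying omitted for brevity
    toy_string& operator=(const toy_string&);
    char*       data_;
    std::size_t size_;
    char        buffer_[16];                      // the SSO buffer
};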
EDIT:
I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".
Implementation details
At the minimum, a std::string needs to store the following information:
The size
The capacity
The location of the data
The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.
You don't pay for what you don't use.
First, consider the naive implementation based on what I outlined above:
class string {
public:
    // all 83 member functions
private:
    std::unique_ptr<char[]> m_data;
    size_type m_size;
    size_type m_capacity;
    std::array<char, 16> m_sso;
};
For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is absolutely no overlap where I need all of them. A smarter solution that wastes no space would look a little more like this (untested, for example purposes only):
class string {
public:
    // all 83 member functions
private:
    size_type m_size;
    union {
        class {
            // This is probably better designed as an array-like class
            std::unique_ptr<char[]> m_data;
            size_type m_capacity;
        } m_large;
        std::array<char, sizeof(m_large)> m_small;
    };
};
I'd assume that most implementations look more like this.
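To illustrate how such a layout might be read, here is an untested sketch of the accessors (the names and the raw char* simplification are mine, not taken from any real implementation):

#include <array>
#include <cstddef>

// Sketch: a size-prefixed union layout, with accessors that dispatch on
// whether the contents currently live in the small buffer.
class sso_layout_demo {
public:
    typedef std::size_t size_type;

    sso_layout_demo() : m_size(0), m_small() {}

    const char* data() const {
        return is_small() ? m_small.data() : m_large.m_data;
    }
    size_type capacity() const {
        return is_small() ? sizeof(large_repr) - 1 : m_large.m_capacity;
    }
    size_type size() const { return m_size; }

private:
    struct large_repr {           // heap-backed representation
        char*     m_data;         // owning pointer (management omitted here)
        size_type m_capacity;
    };

    // Everything that fits in the buffer counts as "small".
    bool is_small() const { return m_size < sizeof(large_repr); }

    size_type m_size;
    union {
        large_repr m_large;
        std::array<char, sizeof(large_repr)> m_small;
    };
};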

SSO is the abbreviation for "Small String Optimization", a technique where small strings are embedded in the body of the string class rather than using a separately allocated buffer.

As already explained by the other answers, SSO means Small / Short String Optimization.
The motivation behind this optimization is the well-documented observation that applications in general handle far more short strings than long ones.
As explained by David Stone in his answer above, the std::string class uses an internal buffer to store contents up to a given length, and this eliminates the need to dynamically allocate memory. This makes the code more efficient and faster.
This other related answer clearly shows that the size of the internal buffer depends on the std::string implementation, which varies from platform to platform (see benchmark results below).
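One way to probe the buffer size on your own machine (a heuristic, not something the standard guarantees: it assumes the implementation keeps short contents inside the string object itself):

#include <cstddef>
#include <iostream>
#include <string>

// Heuristic: the contents are "in place" if the data pointer points
// inside the string object's own footprint.
static bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end   = obj_begin + sizeof(s);
    return s.data() >= obj_begin && s.data() < obj_end;
}

int main() {
    for (std::size_t n = 0; n <= 32; ++n) {
        std::string s(n, 'x');
        std::cout << n << ": " << (stored_inline(s) ? "in place" : "heap") << '\n';
    }
}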
Benchmarks
Here is a small program that benchmarks the copy operation of lots of strings with the same length.
It starts printing the time to copy 10 million strings with length = 1.
Then it repeats with strings of length = 2. It keeps going until the length is 50.
#include <string>
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib> // rand()

static const char CHARS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static const int ARRAY_SIZE = sizeof(CHARS) - 1;
static const int BENCHMARK_SIZE = 10000000;
static const int MAX_STRING_LENGTH = 50;

using time_point = std::chrono::high_resolution_clock::time_point;

void benchmark(std::vector<std::string>& list) {
    const time_point t1 = std::chrono::high_resolution_clock::now();
    // force a copy of each string in the loop iteration
    for (const auto s : list) {
        std::cout << s;
    }
    const time_point t2 = std::chrono::high_resolution_clock::now();
    const auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cerr << list[0].length() << ',' << duration << '\n';
}

void addRandomString(std::vector<std::string>& list, const int length) {
    std::string s(length, 0);
    for (int i = 0; i < length; ++i) {
        s[i] = CHARS[rand() % ARRAY_SIZE];
    }
    list.push_back(s);
}

int main() {
    std::cerr << "length,time\n";
    for (int length = 1; length <= MAX_STRING_LENGTH; length++) {
        std::vector<std::string> list;
        for (int i = 0; i < BENCHMARK_SIZE; i++) {
            addRandomString(list, length);
        }
        benchmark(list);
    }
    return 0;
}
If you want to run this program, you should do it like ./a.out > /dev/null so that the time to print the strings isn't counted.
The numbers that matter are printed to stderr, so they will show up in the console.
I have created charts with the output from my MacBook and Ubuntu machines.
Note that there is a huge jump in the time to copy the strings when the length reaches a given point.
That's the moment when strings don't fit in the internal buffer anymore and memory allocation has to be used.
Note also that on the Linux machine, the jump happens when the length of the string reaches 16.
On the MacBook, the jump happens when the length reaches 23. This confirms that SSO depends on the platform implementation.
[Chart: copy time vs. string length, Ubuntu]
[Chart: copy time vs. string length, MacBook Pro]

Guaranteed allocation size in std::string initialization

There are a variety of questions around this topic already on this site:
Guarantees on C++ std::string heap memory allocation?
std::string allocation policy
My question is different from these in that I am interested in how std::string determines its original capacity, and how I can rightsize a string when I know exactly how many bytes I will need. Calling reserve(n) can result in the string allocating more memory, and I need it to be 24 bytes (right above the SSO threshold, so it can't fit in the internal buffer). Overallocation would be quite dramatic because I potentially hold millions of these in memory; if the allocation is, e.g., rounded up to 32 bytes, the 33% overhead really hurts. Naturally I would also like to avoid the potential reallocation from shrink_to_fit.
My understanding is that you can get an exact allocation size by initializing as std::string(rightsized_constant, size) via the const char*, size_t ctor, but of course nothing guarantees this.
Is there a reasonably clean way to get this using std::string?
Solution I used after some thinking, discussion with a coworker and brief experiments with clang++/g++ on Linux.
template<typename T>
struct Reader {
    void operator()(T key, std::string* append_to);
};
...
Reader<KeyType> a;
Reader<KeyType> b;
std::string s;
std::vector<KeyType> keys = ...;
std::vector<std::string> out;
out.reserve(keys.size());
for (const auto& key : keys) {
    a(key, &s);
    b(key, &s);
    out.emplace_back(s); // relies on the assumption that the copy will be rightsized
    s.clear();           // relies on the implicit guarantee that this doesn't release memory
} // s is rightsized after the first iteration
Still curious if anything actually gives a guarantee.
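For what it's worth, here is a small experiment along the lines of the assumptions above (hedged: the printed capacities are entirely implementation-dependent, which is exactly the problem):

#include <iostream>
#include <string>

int main() {
    const char raw[24] = {};            // exactly the 24 bytes I need
    const std::size_t n = sizeof(raw);

    std::string a(raw, n);              // construct with the exact size
    std::string b;
    b.reserve(n);                       // may round the capacity up
    std::string c(a);                   // copy of an already-sized string

    std::cout << "ctor(ptr,size): " << a.capacity() << '\n'
              << "reserve(24):    " << b.capacity() << '\n'
              << "copy ctor:      " << c.capacity() << '\n';
    return 0;
}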

What difference does it make (in memory terms) if I declare arguments as variables in advance instead of writing them inline in the function call?

For example, for the dummy function write(int length, const char* text){...}, is there any difference in terms of memory between these two approaches?
write(18,"The cake is a lie.");
or
int len = 18;
char txt[19] = "The cake is a lie.";
write(len, txt);
Bonus: what if there's some repetition? i.e. A loop calls the function repeatedly using an array whose elements are the intended arguments.
I'm asking this question, especially the bonus, in hopes of better understanding how each consumes memory to optimize my efficiency when writing on memory-sensitive platforms like Arduino. That said, if you know of an even more efficient way, please share! Thanks!
It depends on whether char txt[19] is declared in scope of a function or at a global (or namespace) scope.
If in scope of a function, then txt will be allocated on the stack and initialized at run time from a copy of the string literal residing in a (read-only) data segment.
If at global scope, then it will be allocated at build time in the data segment.
Bonus: if it's allocated in some sub-scope, like a loop body, then you should assume it will be initialized during every loop iteration (the optimizer might do some tricks but don't count on it).
Example 1:
int len = 18;
char txt[19] = "The cake is a lie.";
int main() {
write(len,txt);
}
Here len (an int) and txt (19 bytes + alignment padding) will be allocated in the program's data segment at build time.
Example 2:
int main() {
int len = 18;
char txt[19] = "The cake is a lie.";
write(len,txt);
}
Here the string literal "The cake is a lie." will be allocated in the program's data segment at build time. In addition, len and txt (19 bytes + padding) may be allocated on the stack at run time. The optimizer may omit the len allocation and maybe even txt, but don't count on it, as it's going to depend on many factors, like whether write's body is visible, what it does exactly, the quality of the optimizer, etc. When in doubt, look at the generated code (godbolt now supports AVR targets).
Example 3:
int main() {
write(18,"The cake is a lie.");
}
Here the string literal "The cake is a lie." will be allocated in the program's data segment at build time. The 18 will be embedded in the program code.
Since you're developing on AVR, there are some additional specifics worth mentioning: the program itself executes from flash, and the initialized data (such as your string) is copied from flash to RAM at startup. It is possible to avoid that copy and keep the data in flash using the PROGMEM keyword (though to do anything meaningful with the data you will generally need to copy it into RAM first).
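For example, with avr-libc / Arduino (a sketch, assuming the usual <avr/pgmspace.h> facilities; write is the dummy function from the question):

#include <avr/pgmspace.h>
#include <string.h>

void write(int length, const char* text);    // the dummy function from the question

// Keep the literal in flash instead of having it copied to RAM at startup.
const char TXT[] PROGMEM = "The cake is a lie.";

void use_it() {
    char buf[sizeof(TXT)];
    strcpy_P(buf, TXT);                      // copy from flash to a RAM buffer when needed
    write(strlen(buf), buf);
}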

Using sizeof for classes

There is a simple C++ class
#include <set>     // std::set
#include <limits>  // std::numeric_limits
using namespace std;

class LASet
{
public:
    long int maxValue, minValue;
    int _count;
    set<long int> _mySet;

    LASet()
    {
        maxValue = 0;
        _count = 0;
        minValue = std::numeric_limits<long int>::max();
        _mySet.clear();
    }

    void Add(long int value)
    {
        if (_mySet.find(value) != _mySet.end())
            return;
        if (value > maxValue)
            maxValue = value;
        if (value < minValue)
            minValue = value;
        _mySet.insert(value);
        _count++;
    }

    // some helper functions....
};
As I instantiate some objects from this class, I want to find their sizes during the runtime. So, I simply wrote the sizeof after many cycles of the execution for each object. For example:
LASet LBAStartSet;
...
...
cout << "sizeof LBAStartSet = " << sizeof( LBAStartSet ) << '\n';
That cout line reports only 72 (i.e. 72 bytes) for every object, but the top command shows the process using 15 GB of memory.
I read the manual at cppreference that sizeof doesn't work on some incomplete data types. Is my data type incomplete?
Any feedback is appreciated.
As I instantiate some objects from this class, I want to find their sizes during the runtime.
You seem to assume that the size of an object can change at run time. That's not possible: the size of a type is fixed at compile time and never changes while the program runs.
Is my data type incomplete?
No. If it were, then your program would be ill-formed and the compiler would likely refuse to compile. A type is incomplete if it is only declared, but not defined. You've defined LASet, so it is complete.
You also seem to assume that elements within std::set (I assume that's what you use) increase the size of the set object. They don't; they can't, because the size of the set object has to remain the same. Instead, the elements are stored outside the object, in so-called dynamic storage.
So, the memory used by your program is roughly:
sizeof LBAStartSet
+ LBAStartSet._mySet.size() * (sizeof(long) + overhead_of_set_node + padding)
+ overhead_of_dynamic_allocation
+ static_objects
Static objects include things like std::cout. Overhead of dynamic allocation depends on implementation, but it could be O(n) in the number of dynamically allocated objects.
I want to monitor the growth of the data size during the execution. Is there any way to find that?
Not in standard C++. But there are OS-specific ways. In Linux, for example, there is the /proc pseudo-filesystem, and you may be interested in the contents of /proc/self/status.
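For example (Linux-only sketch; the field names such as VmPeak and VmRSS come from proc(5)):

#include <fstream>
#include <iostream>
#include <string>

// Print the peak virtual size and current resident set size of this process.
void print_memory_usage() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 7, "VmPeak:") == 0 ||
            line.compare(0, 6, "VmRSS:") == 0) {
            std::cout << line << '\n';
        }
    }
}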
I think the problem is in the member _mySet. It is an object, and sizeof(LASet) does not increase when you insert values into _mySet, because sizeof(LASet) is evaluated at compile time.
You could use set::size to find out how many elements are in a set. If you do that for all instances of LASet, you get a rough approximation of the needed memory.
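A rough back-of-the-envelope helper along those lines (the 32-byte per-node overhead is a guess; real node sizes vary by implementation, and this reuses the LASet definition from the question):

#include <cstddef>

// Very rough estimate of the memory behind one LASet instance.
std::size_t approx_bytes(const LASet& s) {
    const std::size_t per_node_overhead = 32;   // guess: tree-node bookkeeping
    return sizeof(LASet)
         + s._mySet.size() * (sizeof(long int) + per_node_overhead);
}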

C++ array and vector dynamic item size

I guess I'm still not understanding the limitations of C++ containers and arrays. According to this post and this one, it is impossible to store items of dynamic size in an STL vector.
However with the following code I can dynamically re-size an element of a vector with the results one would expect if it was ok to have items of varying and changing size in a vector.
string test = "TEST";
vector<string> studentsV;
for (int i = 0; i < 5; ++i)
{
    studentsV.push_back(test);
}
studentsV[2].resize(100);
for (string s : studentsV)
{
    cout << s << "end" << endl;
}
Result:
TESTend
TESTend
TEST
end
TESTend
TESTend
I can re-size the string element to any size, and it works fine. I can also do the same with a regular C-style array. So, what is the difference between the above posts and what I am doing, and can you give an example of what "dynamic item size" really means, because apparently I am not understanding.
A std::string uses dynamic memory to increase the size of the string being stored. This is not what those articles are talking about.
What they mean, is that sizeof(std::string) is constant. The actual object representing a std::string will always have the same size, but it might do additional allocations in another part of memory.
A std::vector is really just a friendly wrapper around a dynamically-sized array. The definition of an array in C or C++ is a contiguous block of memory where all elements are of equal size.
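To see this concretely (the printed sizeof value is implementation-specific, but it is the same on both lines):

#include <iostream>
#include <string>

int main() {
    std::string s = "short";
    std::cout << "sizeof: " << sizeof(s) << ", size: " << s.size() << '\n';

    s.assign(1000, 'x');   // the characters move to a heap allocation...
    std::cout << "sizeof: " << sizeof(s) << ", size: " << s.size() << '\n';
    // ...but sizeof(s) is unchanged.
    return 0;
}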
can you give an example of what "dynamic item size" really means, because apparently I am not understanding.
This is the core of your question.
Namely: if all C++ classes (even ones that manage dynamic memory as part of their implementations) have a fixed and known footprint size via sizeof()...just what sort of thing is it that you can't put in a std::vector?
Since something like a std::string and a std::bitset are classes of different sizes, you couldn't have a vector of [string string bitset string bitset string]. But the type system already wouldn't let you do that. So that can't be what they're talking about.
They're just saying there's no hook for supporting structures like this from the C world:
struct packetheader {
    int id;
    int filename_len;
};

struct packet {
    struct packetheader h;
    char filename[1];
};
You couldn't make a std::vector<packet> and expect to find some parameter to push_back letting you specify a per-item size. You'd lose any data you'd allocated outside of the structure boundary.
So to use something like that, you'd have to do std::vector<packet*> and store pointers.
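As a hedged sketch of that C idiom, reusing the packet struct above (malloc provides the extra trailing bytes; this "struct hack" is exactly what a vector of packet values cannot model):

#include <cstdlib>
#include <cstring>
#include <vector>

// Allocate a packet with room for 'name' after the header, C-style.
packet* make_packet(int id, const char* name) {
    const std::size_t len = std::strlen(name) + 1;
    packet* p = static_cast<packet*>(std::malloc(sizeof(packet) + len - 1));
    p->h.id = id;
    p->h.filename_len = static_cast<int>(len);
    std::memcpy(p->filename, name, len);     // spills past filename[1] on purpose
    return p;
}

int main() {
    std::vector<packet*> packets;            // pointers, not variable-sized packets
    packets.push_back(make_packet(1, "a.txt"));
    for (std::size_t i = 0; i < packets.size(); ++i)
        std::free(packets[i]);
    return 0;
}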
The size of std::string is not dynamic. std::string is probably implemented with a pointer to dynamically allocated memory. This makes sizeof(std::string) static and possibly different from the size of the actual string.

Very poor boost::lexical_cast performance

Windows XP SP3. Core 2 Duo 2.0 GHz.
I'm finding the boost::lexical_cast performance to be extremely slow. Wanted to find out ways to speed up the code. Using /O2 optimizations on visual c++ 2008 and comparing with java 1.6 and python 2.6.2 I see the following results.
Integer casting:
c++:
std::string s;
for (int i = 0; i < 10000000; ++i)
{
    s = boost::lexical_cast<string>(i);
}
java:
String s = new String();
for (int i = 0; i < 10000000; ++i)
{
    s = new Integer(i).toString();
}
python:
for i in xrange(1, 10000000):
    s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
c++ is as slow as python and 6 times slower than java.
Double casting:
c++:
std::string s;
for (int i = 0; i < 10000000; ++i)
{
    double d = i * 1.0;
    s = boost::lexical_cast<string>(d);
}
java:
String s = new String();
for (int i = 0; i < 10000000; ++i)
{
    double d = i * 1.0;
    s = new Double(d).toString();
}
python:
for i in xrange(1, 10000000):
    d = i * 1.0
    s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles, C++ is actually half the speed of Python and 20 times slower than the Java solution! Any ideas on improving the boost::lexical_cast performance? Does this stem from a poor stringstream implementation, or can we expect a general 10x decrease in performance from using the Boost libraries?
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to boost 1.49, but I do remember making my code faster on an older version. So I guess:
the following answer is still valid (if only for learning purposes)
there was probably an optimization introduced somewhere between the two versions (I'll search that)
which means that boost is still getting better and better
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library either with criticism or with code.
I guess you missed the point of lexical_cast's real value...
Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).
In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.
Comparing apples.
If you want to compare a Java Integer.toString method, then you should compare it with either C sprintf or C++ ostream facilities.
The C++ std::string solution below (which formats into a local buffer with sprintf and assigns the result to a std::string) would be 6 times faster (on my g++) than lexical_cast, and somewhat less extensible:
inline void toString(const int value, std::string & output)
{
    // The largest 32-bit integer is 4294967295, that is 10 chars
    // On the safe side, add 1 for sign, and 1 for trailing zero
    char buffer[12];
    sprintf(buffer, "%i", value);
    output = buffer;
}
The C sprintf solution would be 8 times faster (on my g++) than lexical_cast but a lot less safe:
inline void toString(const int value, char * output)
{
    sprintf(output, "%i", value);
}
Both solutions are at least as fast as your Java solution (according to your data).
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with this Java pseudo code:
Source s ;
Target t = Target.fromString(Source(s).toString()) ;
Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.
Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, in a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.
This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
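For instance (a small sketch: any type that can be streamed both ways works with lexical_cast unchanged; the point type here is made up for illustration):

#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>

struct point {
    int x, y;
};

// Streaming in both directions is all lexical_cast needs.
std::ostream& operator<<(std::ostream& os, const point& p) {
    return os << p.x << ' ' << p.y;
}
std::istream& operator>>(std::istream& is, point& p) {
    return is >> p.x >> p.y;
}

int main() {
    point q = { 3, 4 };
    std::string s = boost::lexical_cast<std::string>(q);
    point p = boost::lexical_cast<point>("7 8");
    std::cout << s << " / " << p.x << ',' << p.y << '\n';
    return 0;
}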
So, what are the uses of lexical_cast?
The main uses of lexical casting are:
Ease of use (hey, a C++ cast that works for everything being a value!)
Combining it with template heavy code, where your types are parametrized, and as such you don't want to deal with specifics, and you don't want to know the types.
Still potentially relatively efficient, if you have basic template knowledge, as I will demonstrate below
Point 2 is very important here, because it means we have one and only one interface/function to cast a value of one type into an equal or similar value of another type.
This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently, and still, keep the lexical_cast ease-of-use feature.
It took me some minutes to look at the lexical_cast source, and come with a viable solution. Add to your C++ code the following code:
#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT
namespace boost
{
template<>
std::string lexical_cast<std::string, int>(const int &arg)
{
// The largest 32-bit integer is 4294967295, that is 10 chars
// On the safe side, add 1 for sign, and 1 for trailing zero
char buffer[12] ;
sprintf(buffer, "%i", arg) ;
return buffer ;
}
}
#endif
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 times faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.
And it took me 10 minutes of looking at the boost code to write a reasonably efficient and correct 32-bit version. With some more work, it could probably be made faster and safer (for example, if we had direct write access to the std::string internal buffer, we could avoid the temporary external buffer).
You could specialize lexical_cast for the int and double types. Use strtol and strtod in your specializations.
#include <boost/lexical_cast.hpp>
#include <cstdlib>   // strtol
#include <string>

namespace boost {
    template<>
    inline int lexical_cast<int, std::string>(const std::string& arg)
    {
        char* stop;
        int res = strtol(arg.c_str(), &stop, 10);
        if (*stop != 0)
            throw_exception(bad_lexical_cast(typeid(int), typeid(std::string)));
        return res;
    }

    template<>
    inline std::string lexical_cast<std::string, int>(const int& arg)
    {
        char buffer[65];          // comfortably large enough for any integer type
        ltoa(arg, buffer, 10);    // non-standard; sprintf(buffer, "%d", arg) is the portable alternative
        return std::string(buffer); // RVO will take place here
    }
} // namespace boost

int main(int argc, char* argv[])
{
    std::string str = "22"; // SOME STRING
    int int_str = boost::lexical_cast<int>(str);
    std::string str2 = boost::lexical_cast<std::string>(int_str);
    return 0;
}
This variant will be faster than the default implementation, because the default implementation constructs heavy stream objects. It should also be a little faster than printf, because printf has to parse the format string.
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out then back in to and from a temporary stream) ends up being slower than specific routines.
(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])
Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.
[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
public class Test
{
    static int instanceCall(int i)
    {
        String s = new Integer(i).toString();
        return s == null ? 0 : 1;
    }

    static int staticCall(int i)
    {
        String s = Integer.toString(i);
        return s == null ? 0 : 1;
    }

    public static void main(String[] args)
    {
        // count used to avoid dead code elimination
        int count = 0;

        // *** instance
        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += instanceCall(i);

        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += instanceCall(i);
        long finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);

        // *** static
        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += staticCall(i);

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += staticCall(i);
        finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);

        if (count == 42)
            System.out.println("bad result"); // prevent elimination of count
    }
}
The runtimes, using JDK 1.6.0-14, server VM:
10MM Time taken: 688 ms
10MM Time taken: 547 ms
And in client VM:
10MM Time taken: 687 ms
10MM Time taken: 610 ms
Even though, in theory, escape analysis may permit allocation on the stack, and inlining may bring all the code (including the copying) into the local method, permitting elimination of redundant copies, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in the code cache that don't justify themselves in real code (as opposed to microbenchmarks like the one seen here).
What lexical cast is doing in your code can be simplified to this:
string Cast( int i ) {
    ostringstream os;
    os << i;
    return os.str();
}
There is unfortunately a lot going on every time you call Cast():
a string stream is created possibly allocating memory
operator << for integer i is called
the result is stored in the stream, possibly allocating memory
a string copy is taken from the stream
a copy of the string is (possibly) created to be returned.
memory is deallocated
Then in your own code:
s = Cast( i );
the assignment involves further allocations, and deallocations are performed. You may be able to reduce this slightly by using:
string s = Cast( i );
instead.
However, if performance is really important to you, you should consider using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread safe, but that might not matter for your specific needs.
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
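A minimal sketch of the "reuse a stream" idea mentioned above (the same thread-safety caveat applies, since the stream is a plain static):

#include <sstream>
#include <string>

// Reuses one stream instead of constructing and destroying one per call.
// Not thread safe: the static stream is shared by all callers.
std::string Cast(int i) {
    static std::ostringstream os;
    os.str("");      // discard the previous contents
    os.clear();      // reset any error flags
    os << i;
    return os.str();
}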
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow, because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:
a stream object has to be created and destroyed
in the string output case above, note that C++ compilers have a hard time avoiding buffer copies (an alternative is to format directly to the output buffer, like sprintf does, though sprintf won't safely handle buffer overruns)
lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures
lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort and because it has a uniform prototype. I don't personally see why either of these properties necessitate slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).
Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
lexical_cast may or may not be as slow in relation to Java and Python as your benchmarks indicate, because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical_cast or the iostream methods it uses are measured by your benchmarks, because C++ doesn't defer these operations. However, in the case of Java and Python, the associated deallocations may in fact have simply been deferred to a future garbage collection cycle and missed by the benchmark measurements. (Unless a GC cycle by chance occurs while the benchmark is in progress, in which case you'd be measuring too much.) So it's hard to know for sure, without examining the specifics of the Java and Python implementations, how much "cost" should be attributed to the deferred GC burden that may (or may not) eventually be imposed.
This kind of issue obviously may apply to many other C++ vs garbage collected language benchmarks.
As Barry said, lexical_cast is very general; you should use a more specific alternative, for example itoa (int -> string) and atoi (string -> int).
If speed is a concern, or you are just interested in how fast such casts can be in C++, there's an interesting thread regarding it.
Boost.Spirit 2.1 (which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents (strtol(), atoi(), etc.).
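If all you need is int <-> std::string, a pair of small helpers in that spirit might look like this (sketch using snprintf and strtol; error handling deliberately minimal):

#include <cstdio>
#include <cstdlib>
#include <string>

// int -> string without any stream machinery.
std::string int_to_string(int value) {
    char buf[16];                                  // enough for any 32-bit int
    std::snprintf(buf, sizeof(buf), "%d", value);
    return std::string(buf);
}

// string -> int; returns fallback if the input isn't a clean integer.
int string_to_int(const std::string& s, int fallback = 0) {
    char* end = 0;
    const long v = std::strtol(s.c_str(), &end, 10);
    return (end != 0 && *end == '\0') ? static_cast<int>(v) : fallback;
}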
I use this very fast solution for POD types...
#include <boost/lexical_cast.hpp>
#include <cstdio>    // sprintf
#include <cstdlib>   // strtod
#include <limits>    // numeric_limits
#include <string>

namespace DATATYPES {
    typedef std::string   TString;
    typedef char*         TCString;
    typedef double        TDouble;
    typedef long          THuge;
    typedef unsigned long TUHuge;
};

namespace boost {

    template<typename TYPE>
    inline const DATATYPES::TString lexical_castNumericToString(
            const TYPE& arg,
            const DATATYPES::TCString fmt) {
        // digits10 can under-report by one digit, so allow an extra digit,
        // a sign and a terminating null. Note: for TDouble with "%f" even
        // this can be too small for very large values.
        enum { MAX_SIZE = std::numeric_limits<TYPE>::digits10 + 3 };
        char buffer[MAX_SIZE] = { 0 };
        if (sprintf(buffer, fmt, arg) < 0) {
            throw_exception(bad_lexical_cast(typeid(TYPE),
                                             typeid(DATATYPES::TString)));
        }
        return DATATYPES::TString(buffer);
    }

    template<typename TYPE>
    inline const TYPE lexical_castStringToNumeric(const DATATYPES::TString& arg) {
        DATATYPES::TCString end = 0;
        DATATYPES::TDouble result = std::strtod(arg.c_str(), &end);
        if (not end or *end not_eq 0) {
            throw_exception(bad_lexical_cast(typeid(DATATYPES::TString),
                                             typeid(TYPE)));
        }
        return TYPE(result);
    }

    template<>
    inline DATATYPES::THuge lexical_cast<DATATYPES::THuge, DATATYPES::TString>(
            const DATATYPES::TString& arg) {
        return lexical_castStringToNumeric<DATATYPES::THuge>(arg);
    }

    template<>
    inline DATATYPES::TString lexical_cast<DATATYPES::TString, DATATYPES::THuge>(
            const DATATYPES::THuge& arg) {
        return lexical_castNumericToString<DATATYPES::THuge>(arg, "%li");
    }

    template<>
    inline DATATYPES::TUHuge lexical_cast<DATATYPES::TUHuge, DATATYPES::TString>(
            const DATATYPES::TString& arg) {
        return lexical_castStringToNumeric<DATATYPES::TUHuge>(arg);
    }

    template<>
    inline DATATYPES::TString lexical_cast<DATATYPES::TString, DATATYPES::TUHuge>(
            const DATATYPES::TUHuge& arg) {
        return lexical_castNumericToString<DATATYPES::TUHuge>(arg, "%lu");
    }

    template<>
    inline DATATYPES::TDouble lexical_cast<DATATYPES::TDouble, DATATYPES::TString>(
            const DATATYPES::TString& arg) {
        return lexical_castStringToNumeric<DATATYPES::TDouble>(arg);
    }

    template<>
    inline DATATYPES::TString lexical_cast<DATATYPES::TString, DATATYPES::TDouble>(
            const DATATYPES::TDouble& arg) {
        return lexical_castNumericToString<DATATYPES::TDouble>(arg, "%f");
    }

} // end namespace boost