Why is StringCopyFromLiteral faster than StringCopyFromString? - c++

The Quick C++ Benchmarks example:

static void StringCopyFromLiteral(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::string from_literal("hello");
    // Make sure the variable is not optimized away by compiler
    benchmark::DoNotOptimize(from_literal);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromLiteral);

static void StringCopyFromString(benchmark::State& state) {
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
    std::string from_string(x);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromString);
http://quick-bench.com/IcZllt_14hTeMaB_sBZ0CQ8x2Ro
What if I understand assembly...
More results:
http://quick-bench.com/39fLTvRdpR5zdapKSj2ZzE3asCI

The answer is simple. In the case where you construct an std::string from a small string literal, the compiler optimizes this case by directly populating the contents of the string object using constants in assembly. This avoids expensive looping as well as tests to see whether small string optimization (SSO) can be applied. In this case it knows SSO can be applied so the code the compiler generates simply involves writing the string directly into the SSO buffer.
Note this assembly code generated for the StringCopyFromLiteral case:
// Populate SSO buffer (each set of 4 characters is backwards since
// x86 is little-endian)
19.63% movb $0x6f,0x4(%r15) // "o"
19.35% movl $0x6c6c6568,(%r15) // "lleh"
// Set size
20.26% movq $0x5,0x10(%rsp) // size = 5
// Write the terminating NUL into the SSO buffer (buf[5] = '\0')
20.07% movb $0x0,0x1d(%rsp)
You're looking at the constant values right there. That's not very much code, and no loop is required. In fact, the std::string constructor doesn't even have to be invoked! The compiler is just putting stuff in memory in the same places where the std::string constructor would.
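For readers who don't read x86, here is a rough, purely illustrative C++ sketch of what those four stores amount to, assuming a libstdc++-style layout (data pointer, then size, then a 16-byte in-object buffer). The struct, its field offsets, and the function name are assumptions for illustration, not the real std::string definition.

#include <cstddef>
#include <cstring>

// Hypothetical layout, modelled on libstdc++'s small-string representation.
struct sso_string_layout {
  char*       data;     // points into buf while SSO is active
  std::size_t size;
  char        buf[16];  // the in-object (SSO) buffer
};

inline void construct_hello(sso_string_layout& s) {
  std::memcpy(s.buf, "hell", 4);  // movl $0x6c6c6568,(%r15)
  s.buf[4] = 'o';                 // movb $0x6f,0x4(%r15)
  s.size = 5;                     // movq $0x5,0x10(%rsp)
  s.buf[5] = '\0';                // movb $0x0,0x1d(%rsp): terminating NUL
  s.data = s.buf;                 // string uses its own buffer, no heap
}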
If the compiler cannot apply this optimization, the results are quite different -- in particular, if we "hide" the fact that the source is a string literal by first copying the literal into a char array, the results flip:
char x[] = "hello";
for (auto _ : state) {
  std::string created_string(x);
  benchmark::DoNotOptimize(created_string);
}
Now the "from-char-pointer" case takes twice as long! Why?
I suspect that this is because the "copy from char pointer" case cannot simply check how long the string is by looking at a stored value. It needs to know whether small string optimization can be performed. There are a few ways it could go about this:
Measure the length of the string first, make an allocation (if needed), then copy the source to the destination. In the case where SSO does apply (it almost certainly does here) I'd expect this to take twice as long since it has to walk the source twice -- once to measure, once to copy.
Copy from the source character-by-character, appending to the new string. This requires testing on each append operation whether the string is now too long for SSO and needs to be copied into a heap-allocated char array. If the string is currently in a heap-allocated array, it needs to instead test if the allocation needs to be resized. This would also take quite a bit longer since there is at least one test for each character in the source string.
Copy from the source in chunks to lower the number of tests that need to be performed and to avoid walking the source twice. This would be faster than the character-by-character approach both because the number of tests would be lower and, because the source is not being walked twice, the CPU memory cache is going to be more effective. This would only show significant speed improvements for long strings, which we don't have here. For short strings it would work about the same as the first approach (measure, then copy).
Contrast this to the case when it's copying from another string object: it can simply look at the size() of the other string and immediately know whether it can perform SSO, and if it can't perform SSO then it also knows exactly how much memory to allocate for the new string.
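To make that contrast concrete, here is a deliberately simplified sketch of where the extra work comes from. The helper names are hypothetical and this is not real library code; it only illustrates which pass over the characters each constructor path needs.

#include <cstring>
#include <string>

// From a char pointer: the length is unknown, so the source must be walked
// once just to measure it before the SSO-or-heap decision can be made; the
// copy itself is then a second pass over the same bytes.
std::size_t length_for_copy(const char* src) {
  return std::strlen(src);
}

// From another std::string: the length is already stored, so the SSO-or-heap
// decision and any allocation size are known immediately, and only the copy
// pass touches the characters.
std::size_t length_for_copy(const std::string& src) {
  return src.size();
}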


String.length woes

Edit: Solutions must compile against Microsoft Visual Studio 2012.
I want to use a known string length to declare another string of the same length.
The reasoning is that the second string will act as a container for operations done to the first string, which must remain unmodified.
e.g.
const string messy = "a bunch of letters";
string dostuff(string sentence) {
  string organised NNN????? // Idk, just needs the same size.
  for (int x = 0; x < NNN?; x++) {
    organised[x] = sentence[x]++; // Doesn't matter what this does.
  }
}
In both cases above, the declaration and the exit condition, the NNN? stands for the length of 'messy'.
How do I discover the length at compile time?
std::string has two constructors which could fit your purposes.
The first, a copy constructor:
string organised(sentence);
The second, a constructor which takes a character and a count. You could initialize a string with a temporary character.
string organised(sentence.length(), '_');
Alternatively, you can:
Use an empty string and append (+=) text to it as you go along, or
Use a std::stringstream for the same purpose.
The stringstream will likely be more efficient.
Overall, I would prefer the copy constructor if the length is known.
std::string isn't a compile time type (it can't be a constexpr), so you can't use it directly to determine the length at compile time.
You could initialize a constexpr char[] and then use sizeof on that:
constexpr char messychar[] = "a bunch of letters";
// - 1 to avoid including NUL terminator which std::string doesn't care about
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
const string messy(messychar);
and use that, but frankly, that's pretty ugly; the length would be known at compile time, but organised would still need to use the count-and-char constructor, which runs on each call, allocating and initializing only to have the contents replaced in the loop.
While it's not a compile time construct, you'd avoid that initialization cost by just using reserve and += to build the new string, which with the constexpr length could be done in an ugly but likely efficient way as:
constexpr char messychar[] = "a bunch of letters";
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
// messy itself may not be needed, but if it is, it's initialized optimally
// by using the compile time calculated length, so there is no need to scan for
// NUL terminators, and it can reserve the necessary space in the initial alloc
const string messy(messychar, messylen);

string dostuff(string sentence) {
  string organised;
  organised.reserve(messylen);
  for (size_t x = 0; x < messylen; x++) {
    organised += sentence[x]++; // Doesn't matter what this does.
  }
  return organised;
}
This avoids setting organised's values more than once, allocating more than once (well, possibly twice if initial construction performs it) per call, and only performs a single read/write pass of sentence, no full read followed by read/write or the like. It also makes the loop constraint a compile time value, so the compiler has the opportunity to unroll the loop (though there is no guarantee of this, and even if it happens, it may not be helpful).
Also note: In your example, you mutate sentence, but it's accepted by value, so you're mutating the local copy, not the caller copy. If mutation of the caller value is required, accept it by reference, and if mutation is not required, accept by const reference to avoid a copy on every call (I understand the example code was filler, just mentioning this).
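For illustration, the two signatures would look like this (declarations only; which one applies depends on whether sentence really needs to be modified):

#include <string>

// Caller's string must be modified: take it by non-const reference.
std::string dostuff(std::string& sentence);

// Caller's string is only read: take it by const reference, avoiding a copy
// on every call.
std::string dostuff(const std::string& sentence);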

Efficient const char* concatenations and output to std::string [duplicate]

Consider first that the amount of total data that will be stored in the output string will almost certainly be small, so I doubt any of these has a noticeable effect on performance. My primary goal is to find a way to concatenate a series of const char*'s of unknown size that doesn't look terrible while also keeping efficiency in mind. Below are the results of my search:
Method 1:
std::string str = std::string(array1) + array2 + array3;
Method 2:
std::string str(array1);
str += array2;
str += array3;
I decided on the first method as it is short and concise. If I'm not mistaken, both methods will invoke the same series of operations. Unoptimized, the code would first create a temporary string and internally allocate some amount of space for its buffer >= sizeof(array1). If that buffer is sufficiently large, the additional + operations will not require any new allocations. Finally, if move semantics are supported, then the buffers of the temporary and named str are swapped.
Are there any other ways to perform such an operation that also look nice and don't incur terrible overhead?
Remember that sizeof(array) yields the actual size of its operand (for a char array, the length including the NUL terminator) only when the operand really is an array of known size, not a pointer -- and you wrote 'series of const char*'s of unknown size'. So, assuming you want to create a universal solution, strlen() should come under consideration instead.
I don't think you can avoid all additional operations. In case of many concatenations, the best solution would be to allocate buffer, that is large enough to store all concatenated strings.
We can easily deduce, that the most optimal version of append() in this case is:
string& append (const char* s, size_t n);
Why? Because the reference says: 'If s does not point to an array long enough (...), it causes undefined behavior'. So we can assume that internally no additional checks take place (especially no additional strlen() calls). Which is good, since you are completely sure that the values passed to append() are correct, and you can avoid unnecessary overhead.
Now, the actual concatenation can be done like this:
const size_t len_1 = strlen(array_1);
const size_t len_2 = strlen(array_2);
const size_t len_3 = strlen(array_3);

std::string target_string;
// Preallocate enough space for all arrays; only one allocation takes place.
target_string.reserve(len_1 + len_2 + len_3);
target_string.append(array_1, len_1);
target_string.append(array_2, len_2);
target_string.append(array_3, len_3);
I do not know if this solution 'looks good' in your opinion, but it's definitely clear and is optimized for this use case.
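If the same pattern comes up often, it can be wrapped in a small helper. A sketch (the concat name is made up for this example) that measures every piece once, reserves once, and then appends through the (pointer, length) overload:

#include <cstring>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

std::string concat(std::initializer_list<const char*> pieces) {
  // Measure each piece exactly once and remember the lengths.
  std::vector<std::pair<const char*, std::size_t>> measured;
  measured.reserve(pieces.size());
  std::size_t total = 0;
  for (const char* p : pieces) {
    const std::size_t len = std::strlen(p);
    measured.emplace_back(p, len);
    total += len;
  }

  std::string result;
  result.reserve(total);                 // single allocation
  for (const auto& m : measured)
    result.append(m.first, m.second);    // no further strlen() calls
  return result;
}

// Usage: std::string str = concat({array1, array2, array3});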

C++ std::string append vs push_back()

This really is a question just for my own interest; I haven't been able to determine the answer from the documentation.
I see on http://www.cplusplus.com/reference/string/string/ that append has complexity:
"Unspecified, but generally up to linear in the new string length."
while push_back() has complexity:
"Unspecified; Generally amortized constant, but up to linear in the new string length."
As a toy example, suppose I wanted to append the characters "foo" to a string. Would
myString.push_back('f');
myString.push_back('o');
myString.push_back('o');
and
myString.append("foo");
amount to exactly the same thing? Or is there any difference? You might figure that append would be more efficient because the implementation would know how much memory is required to extend the string by the specified number of characters, while push_back may need to secure memory on each call?
In C++03 (for which most of "cplusplus.com"'s documentation is written), the complexities were unspecified because library implementers were allowed to do Copy-On-Write or "rope-style" internal representations for strings. For instance, a COW implementation might require copying the entire string if a character is modified and there is sharing going on.
In C++11, COW and rope implementations are banned. You should expect constant amortized time per character added or linear amortized time in the number of characters added for appending to a string at the end. Implementers may still do relatively crazy things with strings (in comparison to, say std::vector), but most implementations are going to be limited to things like the "small string optimization".
In comparing push_back and append, push_back deprives the underlying implementation of potentially useful length information which it might use to preallocate space. On the other hand, append requires that an implementation walk over the input twice in order to find that length, so the performance gain or loss is going to depend on a number of unknowable factors such as the length of the string before you attempt the append. That said, the difference is probably extremely Extremely EXTREMELY small. Go with append for this -- it is far more readable.
I had the same doubt, so I made a small test to check this (g++ 4.8.5 with C++11 profile on Linux, Intel, 64 bit under VmWare Fusion).
And the result is interesting:
push :19
append :21
++++ :34
It could be that this is because of the (large) string length, but operator+ is very expensive compared with push_back and append.
It is also interesting that when the operator receives only a character (not a string), it behaves very similarly to push_back.
So as not to depend on pre-allocated variables, each test is run in its own scope.
Note : the vCounter simply uses gettimeofday to compare the differences.
TimeCounter vCounter;

{
  string vTest;
  vCounter.start();
  for (int vIdx = 0; vIdx < 1000000; vIdx++) {
    vTest.push_back('a');
    vTest.push_back('b');
    vTest.push_back('c');
  }
  vCounter.stop();
  cout << "push :" << vCounter.elapsed() << endl;
}

{
  string vTest;
  vCounter.start();
  for (int vIdx = 0; vIdx < 1000000; vIdx++) {
    vTest.append("abc");
  }
  vCounter.stop();
  cout << "append :" << vCounter.elapsed() << endl;
}

{
  string vTest;
  vCounter.start();
  for (int vIdx = 0; vIdx < 1000000; vIdx++) {
    vTest += 'a';
    vTest += 'b';
    vTest += 'c';
  }
  vCounter.stop();
  cout << "++++ :" << vCounter.elapsed() << endl;
}
Add one more opinion here.
I personally consider it better to use push_back() when adding characters one by one from another string. For instance:
string FilterAlpha(const string& s) {
  string new_s;
  for (auto& it : s) {
    if (isalpha(it)) new_s.push_back(it);
  }
  return new_s;
}
If using append() here, I would replace push_back(it) with append(1, it), which is not that readable to me.
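For reference, the append-based version of the same function would look like this (functionally equivalent, just arguably less readable):

#include <cctype>
#include <string>

std::string FilterAlphaAppend(const std::string& s) {
  std::string new_s;
  for (auto& it : s) {
    // append(count, ch) appends `count` copies of the character
    if (std::isalpha(static_cast<unsigned char>(it))) new_s.append(1, it);
  }
  return new_s;
}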
Yes, I would also expect append() to perform better for the reasons you gave, and in a situation where you need to append a string, using append() (or operator+=) is certainly preferable (not least also because the code is much more readable).
But what the Standard specifies is the complexity of the operation. And that is generally linear even for append(), because ultimately each character of the string being appended (and possibly all existing characters, if reallocation occurs) needs to be copied (this is true even if memcpy or similar is used).

C++ faster way to do string addition?

I'm finding standard string addition to be very slow so I'm looking for some tips/hacks that can speed up some code I have.
My code is basically structured as follows:
inline void add_to_string(string data, string &added_data) {
  if (added_data.length() < 1) added_data = added_data + "{";
  added_data = added_data + data;
}

int main()
{
  int some_int = 100;
  float some_float = 100.0;
  string some_string = "test";
  string added_data;
  added_data.reserve(1000*64);
  for (int ii = 0; ii < 1000; ii++)
  {
    //variables manipulated here
    some_int = ii;
    some_float += ii;
    some_string.assign(ii % 20, 'A');
    //then we concatenate the strings!
    stringstream fragment;
    fragment << some_int << "," << some_float << "," << some_string;
    add_to_string(fragment.str(), added_data);
  }
  return 0;
}
Doing some basic profiling, I'm finding that a ton of time is being used in the for loop. Are there some things I can do that will significantly speed this up? Will it help to use C strings instead of C++ strings?
String addition is not the problem you are facing. std::stringstream is known to be slow due to its design. On every iteration of your for-loop the stringstream is responsible for at least 2 allocations and 2 deletions. The cost of each of these 4 operations is likely more than that of the string addition.
Profile the following and measure the difference:
std::string stringBuffer;
for (int ii = 0; ii < 1000; ii++)
{
  //variables manipulated here
  some_int = ii;
  some_float += ii;
  some_string.assign(ii % 20, 'A');
  //then we concatenate the strings!
  char buffer[128];
  sprintf(buffer, "%i,%f,%s", some_int, some_float, some_string.c_str());
  stringBuffer = buffer;
  add_to_string(stringBuffer, added_data);
}
Ideally, replace sprintf with _snprintf or the equivalent supported by your compiler.
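Where a C99/C++11 library is available, the standard snprintf can be used instead; a minimal sketch of the loop body wrapped into a hypothetical helper, with the buffer size enforced and truncation detectable from the return value:

#include <cstdio>
#include <string>

std::string format_row(int some_int, float some_float, const std::string& some_string) {
  char buffer[128];
  // snprintf never writes past the buffer; it returns the number of
  // characters that would have been written, so truncation can be detected.
  const int n = std::snprintf(buffer, sizeof(buffer), "%i,%f,%s",
                              some_int, some_float, some_string.c_str());
  if (n < 0 || n >= static_cast<int>(sizeof(buffer))) {
    // formatting error or truncated output; handle as appropriate
  }
  return std::string(buffer);
}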
As a rule of thumb, use stringstream for formatting by default and switch to the faster and less safe functions like sprintf, itoa, etc. whenever performance matters.
Edit: that, and what didierc said: added_data += data;
You can save lots of string operations if you do not call add_to_string in your loop.
I believe this does the same (although I am not a C++ expert and do not know exactly what stringstream does):
stringstream fragment;
for (int ii = 0; ii < 1000; ii++)
{
  //variables manipulated here
  some_int = ii;
  some_float += ii;
  some_string.assign(ii % 20, 'A');
  //then we concatenate the strings!
  fragment << some_int << "," << some_float << "," << some_string;
}
// inlined add_to_string call without the if-statement ;)
added_data = "{" + fragment.str();
I see you used the reserve method on added_data, which should help by avoiding multiple reallocations of the string as it grows.
You should also use the += string operator where possible:
added_data += data;
I think the above should save significant time by avoiding unnecessary copies of added_data back and forth through a temporary string when doing the concatenation.
This += operator is a simpler version of the string::append method; it just copies data directly onto the end of added_data. Since you made the reserve, that operation alone should be very fast (almost equivalent to a strcpy).
But why go through all this when you are already using a stringstream to handle the formatting? Keep it all in there to begin with!
The stringstream class is indeed not very efficient.
You may have a look at the stringstream class documentation for more information on how to use it, if necessary, but your solution of using a string as a buffer seems to avoid that class's speed issue.
At any rate, stay away from any attempt at reimplementing the speed critical code in pure C unless you really know what you are doing. Some other SO posts support the idea of doing it,, but I think it's best (read safer) to rely as much as possible on the standard library, which will be enhanced over time, and take care of many corner cases you (or I) wouldn't think of. If your input data format is set in stone, then you might start thinking about taking that road, but otherwise it's premature optimization.
If you start added_data with a "{", you would be able to remove the if from your add_to_string method: the test succeeds exactly once, when the string is still empty, so you might as well make it non-empty right away.
In addition, your add_to_string makes a copy of the data; this is not necessary, because it does not get modified. Accepting the data by const reference should speed things up for you.
Finally, changing your added_data from string to stringstream should let you append to it in a loop, without the stringstream intermediary that gets created, copied, and thrown away on each iteration of the loop.
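A minimal sketch of the first two suggestions applied together (the third, streaming everything into a single stringstream, is essentially what the previous answer's code shows):

#include <string>

// data is taken by const reference, so no copy is made per call, and the
// emptiness test is gone because added_data starts out holding "{".
inline void add_to_string(const std::string& data, std::string& added_data) {
  added_data += data;
}

// At the call site:
//   std::string added_data = "{";
//   added_data.reserve(1000 * 64);
//   ...
//   add_to_string(fragment.str(), added_data);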
Please have a look at Twine used in LLVM.
A Twine is a kind of rope, it represents a concatenated string using a
binary-tree, where the string is the preorder of the nodes. Since the
Twine can be efficiently rendered into a buffer when its result is used,
it avoids the cost of generating temporary values for intermediate string
results -- particularly in cases when the Twine result is never
required. By explicitly tracking the type of leaf nodes, we can also avoid
the creation of temporary strings for conversion operations (such as
appending an integer to a string).
It may be helpful in solving your problem.
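If the project already depends on LLVM's support libraries, usage looks roughly like this (a sketch; it assumes the LLVM headers and the Support library are available, and the function name is made up):

#include <string>
#include "llvm/ADT/Twine.h"

std::string make_message(int some_int, const std::string& some_string) {
  // Build and render in one expression: Twine nodes reference their
  // operands, so the whole tree must not outlive the temporaries it points
  // at. str() renders the concatenation into a single std::string.
  return (llvm::Twine(some_int) + "," + some_string).str();
}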
How about this approach?
This is a DevPartner for MSVC 2010 report.
string newstring = stringA & stringB;
I don't think strings are slow; it's the conversions that can make them slow, and maybe your compiler checking variable types for mismatches.

C++ string literals vs. const strings

I know that string literals in C/C++ have static storage duration, meaning that they live "forever", i.e. as long as the program runs.
Thus, if I have a function that is being called very frequently and uses a string literal like so:
void foo(int val)
{
  std::stringstream s;
  s << val;
  lbl->set_label("Value: " + s.str());
}
where the set_label function takes a const std::string& as a parameter.
Should I be using a const std::string here instead of the string literal or would it make no difference?
I need to minimise as much runtime memory consumption as possible.
edit:
I meant to compare the string literal with a const std::string prefix("Value: "); that is initialized in some sort of a constants header file.
Also, the concatenation here returns a temporary (let us call it "Value: 42"), and a const reference to this temporary is being passed to the function set_label(). Am I correct in this?
Thank you again!
Your program operates on the same literal every time. There is no more efficient form of storage. A std::string would be constructed, duplicated on the heap, then freed every time the function runs, which would be a total waste.
This will use less memory and run much faster (use snprintf if your compiler supports it):
void foo(int val)
{
  char msg[32];
  lbl->set_label(std::string(msg, sprintf(msg, "Value: %d", val)));
}
For even faster implementations, check out C++ performance challenge: integer to std::string conversion
How will you build your const std::string? If you build it from some string literal, in the end it will just be worse (or identical, if the compiler does a good job). A string literal does not consume much memory, and it is static memory, which may not be the kind of memory you are running low on.
If you could read all your string literals from, say, a file, and give the memory back to the OS when the strings are no longer used, there might be some way to reduce the memory footprint (but it would probably slow the program down considerably).
But there are probably many other ways to reduce memory consumption before resorting to that kind of thing.
Store them in some kind of resource and load/unload them as necessary.