C++, const reference is actually faster than move?


After testing this code:
#include <iostream>
#include <chrono>
#include <vector>
#include <string>
void x(std::vector<std::string>&& v){ }
void y(const std::vector<std::string>& v) { }
int main() {
std::vector<std::string> v = {};
auto tp = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000000; ++i)
x(std::move(v));
auto t2 = std::chrono::high_resolution_clock::now();
auto time = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - tp);
std::cout << "1- It took: " << time.count() << " seconds\n";
tp = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000000; ++i)
y(v);
t2 = std::chrono::high_resolution_clock::now();
time = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - tp);
std::cout << "2- It took: " << time.count() << " seconds\n";
std::cin.get();
}
I find that using a const reference is actually ~15s faster than using move semantics. Why is that? I thought move semantics were faster; otherwise, why would they have been added? What did I get wrong about move semantics? Thanks.

Your code makes no sense. Here is a simpler version of your code, substituted with int and cleaned up. Here is the assembly version of the code, compiled with -std=c++11 -O2:
https://goo.gl/6MWLNp
There is NO difference between the assembly for the rvalue and lvalue functions. Whatever is the cause doesn't matter because the test itself doesn't make use of move semantics.
The reason is probably because the compiler optimizes both functions to the same assembly. You're not doing anything with either, so there's no point in doing anything different in the assembly than a simple ret.
Here is a better example, this time, swapping the first two items in the vector:
https://goo.gl/Sp6sk4
Ironically, you can see that the second function actually just calls the rvalue reference version automatically as a part of its execution.
Assuming that a function A which calls B is slower than just executing the function B, the speed of x() should outperform y().
std::move() itself has an additional cost. All things else being constant, calling std::move() is more costly than not calling std::move(). This is why the "move semantics" is slower in the code you gave us. In reality, the code is slower because you're not actually doing anything--both functions simply return as soon as they execute. You can also see that one version appears to call std::move() while the other doesn't.
Edit: The above doesn't appear to be true. std::move() is not usually a true function call; it is essentially a static_cast<T&&> wrapped in a small template, which the compiler normally inlines away when optimizing.
In the example I gave you, I'm actually making use of the move semantics. Most of the assembly is not important, but you can see that y() calls x() as a part of its execution. y() should therefore be slower than x().
tl;dr: You're not actually using move semantics because your functions don't need to do anything at all. Make the functions use copying/moving, and you'll see that even the assembly uses part of the "move semantics" code as a part of its copying code.
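To make the tl;dr concrete, here is a minimal sketch in which both functions actually take the data. The helper names take_by_move and take_by_copy are hypothetical (they are not from the question); the point is only that the move version reuses the caller's buffer while the copy version has to duplicate every string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes over the caller's buffer, no per-element work.
std::vector<std::string> take_by_move(std::vector<std::string>&& v) {
    std::vector<std::string> local(std::move(v)); // vector move constructor runs here
    return local;
}

// Hypothetical helper: has to duplicate every string in the vector.
std::vector<std::string> take_by_copy(const std::vector<std::string>& v) {
    std::vector<std::string> local(v); // vector copy constructor runs here
    return local;
}
```

Timing these two against a vector with many long strings should show the gap that the empty x() and y() could not, though the exact numbers will depend on compiler and flags.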

Related

C++ constructor performance

In C++17 if we design a class like this:
class Editor {
public:
// "copy" constructor
Editor(const std::string& text) : _text {text} {}
// "move" constructor
Editor(std::string&& text) : _text {std::move(text)} {}
private:
std::string _text;
};
It might seem (to me at least), that the "move" constructor should be much faster than the "copy" constructor.
But if we try to measure actual times, we will see something different:
int current_time()
{
return chrono::high_resolution_clock::now().time_since_epoch().count();
}
int main()
{
int N = 100000;
auto t0 = current_time();
for (int i = 0; i < N; i++) {
std::string a("abcdefgh"s);
Editor {a}; // copy!
}
auto t1 = current_time();
for (int i = 0; i < N; i++) {
Editor {"abcdefgh"s};
}
auto t2 = current_time();
cout << "Copy: " << t1 - t0 << endl;
cout << "Move: " << t2 - t1 << endl;
}
Both copy and move times are in the same range. Here's one of the outputs:
Copy: 36299550
Move: 35762602
I tried with strings as long as 285604 characters, with the same result.
Question: why is the "copy" constructor Editor(const std::string& text) : _text {text} {} so fast? Doesn't it actually create a copy of the input string?
Update: I ran the benchmark given here using the following command: g++ -std=c++1z -O2 main.cpp && ./a.out
Update 2: Fixing the move constructor, as @Caleth suggests (removing const from const std::string&& text), improves things!
Editor(std::string&& text) : _text {std::move(text)} {}
Now benchmark looks like:
Copy: 938647
Move: 64

It also depends on your optimization flags. With no optimization, you can (and I did!) get even worse results for the move:
Copy: 4164540
Move: 6344331
Running the same code with -O2 optimization gives a much different result:
Copy: 1264581
Move: 791
See it live on Wandbox.
That's with clang 9.0. On GCC 9.1, the difference is about the same for -O2 and -O3 but not quite as stark between copy and move:
Copy: 775
Move: 508
I'm guessing that's a small string optimization kicking in.
In general, the containers in the standard library work best with optimizations on because they have a lot of little functions that the compiler can easily inline and collapse when asked to do so.
Also in that first constructor, per Herb Sutter, "Prefer passing a read-only parameter by value if you’re going to make a copy of the parameter anyway, because it enables move from rvalue arguments."
Update: For very long strings (300k characters), the results are similar to the above (now using std::chrono::duration in milliseconds to avoid int overflows) with GCC 9.1 and optimizations:
Copy: 22560
Move: 1371
and without optimizations:
Copy: 22259
Move: 1404
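The Herb Sutter advice quoted above can be sketched like this. It is only an illustration of the pass-by-value idea applied to the Editor class from the question, not a claim about what the asker should ship: a single constructor takes the string by value and moves it into the member, so an lvalue argument costs one copy and an rvalue argument costs no copy at all. The text() accessor is an addition for demonstration purposes.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Editor {
public:
    // One constructor serves both cases:
    //  - lvalue argument: copied into `text`, then moved into _text
    //  - rvalue argument: moved into `text`, then moved into _text
    explicit Editor(std::string text) : _text(std::move(text)) {}

    // Accessor added only for this sketch.
    const std::string& text() const { return _text; }

private:
    std::string _text;
};
```

Usage would look like Editor e1(a); for a named string a (one copy) and Editor e2(std::move(a)); when the caller no longer needs a.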

const std::string&& looks like a typo.
You can't move from it, so you get a copy instead.
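A small sketch of why that is, using a hypothetical Tracked type that counts which constructor runs (none of these names are from the question). std::move on a const object produces a const rvalue, which cannot bind to a move constructor's non-const T&& parameter, so overload resolution falls back to the copy constructor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how it gets constructed.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

struct ConstRvalueHolder {
    // std::move(t) yields const Tracked&&; only the copy constructor accepts it.
    explicit ConstRvalueHolder(const Tracked&& t) : value(std::move(t)) {}
    Tracked value;
};

struct RvalueHolder {
    // std::move(t) yields Tracked&&, so the move constructor is selected.
    explicit RvalueHolder(Tracked&& t) : value(std::move(t)) {}
    Tracked value;
};
```

Constructing a ConstRvalueHolder from a temporary bumps the copy counter, while the same call on RvalueHolder bumps the move counter.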

So your test is really looking at the number of times we have to "build" a string object.
So in the first test:
for (int i = 0; i < N; i++) {
std::string a("abcdefgh"s); // Build a string once.
Editor {a}; // copy! // Here you build the string again.
} // So basically two expensive memory
// allocations, plus copying the string
While in the second test:
for (int i = 0; i < N; i++) {
Editor {"abcdefgh"s}; // You build a string once.
// Then internally you move the allocated
// memory (so only one expensive memory
// allocation and copying the string)
}
So the difference between the two loops is one extra string copy.
Here is the problem: even I, as a human, can spot one easy peephole optimization (and the compiler is better at this than I am).
for (int i = 0; i < N; i++) {
std::string a("abcdefgh"s); // This string is only used in a single
// place where it is passed to a
// function as a const parameter
// So we can optimize it out of the loop.
Editor {a};
}
So suppose we manually yank the string outside the loop (equivalent to a valid compiler optimization).
This loop has the same effect:
std::string a("abcdefgh"s);
for (int i = 0; i < N; i++) {
Editor {a};
}
Now this loop only has 1 allocation and copy.
So now both loops look the same in terms of the expensive operations.
Now, as a human I am not going to spot (quickly) all the optimizations possible. I am just trying to point out that a quick test like this will not account for many of the optimizations the compiler performs, so estimating and timing code this way is hard.

On paper you're right, but in practice this is quite easily optimisable so you'll probably find the compiler has ruined your benchmark.
You could benchmark with "optimisations" turned off, but that in itself holds little real-world benefit. It may be possible to trick the compiler in release mode by adding some code that prevents such an optimisation, but off the top of my head I can't imagine what that would look like here.
It's also a relatively small string that can be copied really quickly nowadays.
I think you should just trust your instinct here (because it's correct), while remembering that in practice it might not actually make a lot of difference. But the move certainly won't be worse than the copy.
Sometimes we can and should write obviously "more efficient" code without being able to prove that it'll actually perform better on any particular day of the week with any particular phase of the moon/planetary alignment, because compilers are already trying to make your code as fast as possible.
People may tell you that this is therefore a "premature optimisation", but it really isn't: it's just sensible code.

Should I return reference to heap object or return value?

I have these two simple functions. I thought that func1 was a good solution since you pass an object by reference. My textbook gave func2 as the answer for the best solution. Is this only because you aren't deallocating heapstr? What if I declared heapstr in main and then passed it to the function so I was able to delete it afterwards?
#include <iostream>
#include <string>
using namespace std;
string& func1(const string &str) {
string* heapstr=new string();
for (int i = 0; i < str.size(); ++i) {
*heapstr += str[i];
}
return *heapstr;
}
string func2(const string &str) {
string heapstr;
for (int i = 0; i < str.size(); ++i) {
heapstr += str[i];
}
return heapstr;
}
int main() {
cout << func1("aaa") << endl;
cout << func2("aaa") << endl;
}

Should I return reference to heap object or return value?
Return by value.
There are a lot of reasons why, but none are really related to performance, because the compiler is good enough at optimising things, and even if it wasn't, most programs are I/O-bound, i.e. the time you wait for data from files or network sockets eats up all your performance, not the time spent by CPU operations themselves.
See for example the "C++ Core Guidelines" by Herb Sutter and Bjarne Stroustrup, which say at section "Return containers by value (relying on move or copy elision for efficiency)":
Reason
To simplify code and eliminate a need for explicit memory management.
As for your two functions...
My textbook gave func2 as the answer for the best solution. Is this only because you aren't deallocating heapstr?
The memory leak is one of the problems. But the point is simply that returning by value is simpler and less error-prone. It's all about correctness, not speed. You wouldn't return an int* if you could just return an int instead, would you?
What if I declared heapstr in main and then passed it to the function so I was able to delete it afterwards?
You would introduce a lot of possibilities for memory leaks, crashes and undefined behaviour into your code. It would become longer, harder to write, harder to read, harder to maintain, harder to debug and harder to justify in a code review. In return, you would gain absolutely nothing.

The text book is correct. (Shocker.)
Func1 is faulty in every respect in which it differs from func2. It allocates an object from the heap without regard to how that object will be deleted. Then it returns a reference to the new object, hiding the pointer that might have otherwise been used to delete it. There is no efficiency gain, in fact Func1 is probably a bit slower. In any case, recite after me: "Avoid Early Optimization."
Since the advent of the Standard Template Library, many moons ago, it is almost never best to use operator new. The last time I used operator new was ca. 2003, and I wrapped the pointer in the equivalent of what we now know as a unique_ptr. Before you use operator new, read all about smart pointers and RAII.
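As a rough sketch of what that looks like for the functions in the question: the names copy_by_value and copy_to_heap are hypothetical, and std::make_unique assumes C++14 or later. The first function is the func2 approach; the second shows that even if the string really had to live on the heap, an owning smart pointer removes the need for a manual delete.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Preferred: return by value; copy elision or a move keeps this cheap.
std::string copy_by_value(const std::string& str) {
    std::string result;
    for (std::size_t i = 0; i < str.size(); ++i) {
        result += str[i];
    }
    return result;
}

// Only if the object must live on the heap: return an owning smart pointer.
// The string is freed automatically when the unique_ptr goes out of scope.
std::unique_ptr<std::string> copy_to_heap(const std::string& str) {
    auto result = std::make_unique<std::string>(); // C++14
    for (std::size_t i = 0; i < str.size(); ++i) {
        *result += str[i];
    }
    return result;
}
```

Unlike func1, neither version hides the ownership of the allocation from the caller.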

Since this is a textbook example, you should consider its context to understand what exactly it wants to show (is its goal to minimize memory usage, or to use a safe programming pattern?). But here are two hints:
When you use new operator to allocate memory, you must de-allocate it using delete. The code has memory leak for heapstr at func1.
Also, in more realistic projects it's not safe to share objects between methods. Managing them (i.e. who currently modifies the object, or who is responsible for deallocating its memory when it is no longer needed) becomes hard.
PS: I do not have C++17, but it optimizes the following as well. For more details, read @BoPersson's comments.
PS: Stack allocation is faster, but in your example you have a copy operation at func2's return. In your example, as @Jive Dadson said, there is no difference due to compiler optimization, but in the general case, consider the following code:
#include <iostream>
#include <string>
using namespace std;
string& func1(const string &str) {
string* heapstr = new string();
cout << "func1 " << heapstr << endl;
for (int i = 0; i < str.size(); ++i) {
*heapstr += str[i];
}
return *heapstr;
}
string func2(const string &str) {
string heapstr;
for (int i = 0; i < str.size(); ++i) {
heapstr += str[i];
}
cout << &heapstr << endl;
return heapstr;
}
int main() {
string a = func1("aaa");
string b = func2("aaa");
cout << "main " << a << endl;
}
PS: (As @Jive Dadson said, there is no difference in your example, but there is in mine.) If we define performance as run time, maybe func1. If we define performance as memory usage, also func1. If we define performance as good programming pattern, func2. Overall, func2 is preferred.

Why is allocation on the heap faster than allocation on the stack?

As far as my knowledge on resource management goes, allocating something on the heap (operator new) should always be slower than allocating on the stack (automatic storage), because the stack is a LIFO-based structure, thus it requires minimal bookkeeping, and the pointer of the next address to allocate is trivial.
So far, so good. Now look at the following code:
/* ...includes... */
using std::cout;
using std::cin;
using std::endl;
int bar() { return 42; }
int main()
{
auto s1 = std::chrono::steady_clock::now();
std::packaged_task<int()> pt1(bar);
auto e1 = std::chrono::steady_clock::now();
auto s2 = std::chrono::steady_clock::now();
auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
auto e2 = std::chrono::steady_clock::now();
auto first = std::chrono::duration_cast<std::chrono::nanoseconds>(e1-s1);
auto second = std::chrono::duration_cast<std::chrono::nanoseconds>(e2-s2);
cout << "Regular: " << first.count() << endl
<< "Make shared: " << second.count() << endl;
pt1();
(*sh_ptr1)();
cout << "As you can see, both are working correctly: "
<< pt1.get_future().get() << " & "
<< sh_ptr1->get_future().get() << endl;
return 0;
}
The results seem to contradict the stuff explained above:
Regular: 6131
Make shared: 843
As you can see, both are working
correctly: 42 & 42
Program ended with exit code: 0
In the second measurement, apart from the call to operator new, the constructor of the std::shared_ptr (auto sh_ptr1) has to finish. I can't understand why this is faster than the regular allocation.
What is the explanation for this?

The problem is that the first call to the constructor of std::packaged_task is responsible for initializing a load of per-thread state that is then unfairly attributed to pt1. This is a common problem of benchmarking (particularly microbenchmarking) and is alleviated by warmup; try reading How do I write a correct micro-benchmark in Java?
If I copy your code but run both parts first, the results are the same to within the limits of the resolution of the system clock. This demonstrates another issue of microbenchmarking, that you should run small tests multiple times to allow total time to be measured accurately.
With warmup and running each part 1000 times, I get the following (example):
Regular: 132.986
Make shared: 211.889
The difference (approx 80ns) accords well with the rule of thumb that malloc takes 100ns per call.
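A minimal sketch of the warmup-plus-repetition approach described above. The helper average_ns and the run counts are assumptions for illustration, not the code that produced the numbers in this answer. An optimizing compiler may still remove work whose result is unused, so treat any output as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>

// Hypothetical helper: average cost of `op` in nanoseconds,
// measured only after an untimed warmup phase.
template <typename Op>
double average_ns(Op op, int warmup_runs, int timed_runs) {
    for (int i = 0; i < warmup_runs; ++i) {
        op(); // untimed: absorbs one-off initialization costs
    }
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timed_runs; ++i) {
        op();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / timed_runs;
}

int bar() { return 42; }

// Automatic-storage construction, as in the "Regular" measurement.
double regular_ns() {
    return average_ns([] { std::packaged_task<int()> pt(bar); (void)pt; }, 10, 1000);
}

// Heap construction through make_shared, as in the "Make shared" measurement.
double make_shared_ns() {
    return average_ns(
        [] { auto p = std::make_shared<std::packaged_task<int()>>(bar); (void)p; },
        10, 1000);
}
```

Printing regular_ns() and make_shared_ns() from main gives per-call averages that no longer depend on which of the two happens to run first.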

It is a problem with your micro-benchmark: if you swap the order in which you measure the timing, you would get opposite results (demo).
It looks like the first-time call of std::packaged_task constructor causes a big hit. Adding an untimed
std::packaged_task<int()> ignore(bar);
before measuring the time fixes this problem (demo):
Regular: 505
Make shared: 937

I've tried your example at ideone and got a result similar to yours:
Regular: 67950
Make shared: 696
Then I reversed the order of tests:
auto s2 = std::chrono::steady_clock::now();
auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
auto e2 = std::chrono::steady_clock::now();
auto s1 = std::chrono::steady_clock::now();
std::packaged_task<int()> pt1(bar);
auto e1 = std::chrono::steady_clock::now();
and found an opposite result:
Regular: 548
Make shared: 68065
So that's not a difference between stack and heap, but a difference between the first and second call. Maybe you need to look into the internals of std::packaged_task.

What's the point of using boost::mem_fn if we have boost::bind?

I'm having a look at the Boost libraries that were included in C++'s Technical Report 1 and trying to understand what each does.
I've just finished running an example for boost::mem_fn and now I'm wondering what's the point of using it instead of the better boost::bind. As far as I understand, both of them return a function object pointing to a member function. I find mem_fn so limited that I can't find a scenario where using it would be better than bind.
Am I missing something? Is there any case in which bind cannot replace mem_fn?

mem_fn is much smaller than bind, so if you only need the functionality of mem_fn it's a lot less code to pull in.

mem_fn is smaller and faster than bind. Try the following program with your favorite compiler and compare:
The size of the resulting executable and
The number of seconds reported as being spent.
You can compare the performance of bind versus mem_fn by changing the 1 to a 0 in the #if line.
#include <iostream>
#include <functional>
#include <chrono>
struct Foo
{
void bar() {}
};
int main(int argc, const char * argv[])
{
#if 1
auto bound = std::bind( &Foo::bar, std::placeholders::_1 );
#else
auto bound = std::mem_fn( &Foo::bar );
#endif
Foo foo;
auto start = std::chrono::high_resolution_clock::now();
for( size_t i = 0; i < 100000000; ++i )
{
bound( foo );
}
auto end = std::chrono::high_resolution_clock::now();
auto delta = std::chrono::duration_cast< std::chrono::duration< double >>( end - start );
std::cout << "seconds = " << delta.count() << std::endl;
return 0;
}
Results will vary, but on my current system the mem_fn version of the executable is 220 bytes smaller and runs about twice as fast as the bind version.
And as a bonus feature, mem_fn doesn't require you to remember to add std::placeholders::_1 like bind does (on pain of an obscure templated compiler error).
So, prefer mem_fn when you can.
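For a side-by-side view of the two spellings, here is a small sketch using the std:: versions, as in the benchmark above. The call_both function and the calls counter are hypothetical additions, there only to make the effect observable. Both wrappers end up calling Foo::bar on the object they are given.

```cpp
#include <cassert>
#include <functional>

struct Foo {
    int calls = 0;
    void bar() { ++calls; }
};

// Calls Foo::bar once through each wrapper and returns the total call count.
int call_both(Foo& foo) {
    auto via_mem_fn = std::mem_fn(&Foo::bar);                    // no placeholder needed
    auto via_bind = std::bind(&Foo::bar, std::placeholders::_1); // placeholder required
    via_mem_fn(foo);
    via_bind(foo);
    return foo.calls;
}
```

Leaving out std::placeholders::_1 in the bind line is the mistake that leads to the obscure template error mentioned above.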

Well, bind depends on mem_fn so there you go. How and why I'll leave for you to discover since although interesting, I haven't got the time to investigate right now (bind is complicated).

boost::lambda has a similar overlap of functionality with the two you mentioned. I think they all sort of evolved with similar intent, about the same time, with different approaches, resulting in confusion and incompatibility issues. It'd be nice if they all merged under one lambda umbrella.
So, no, there is no overarching design that calls for both libraries to co-exist.

How efficient is std::string compared to null-terminated strings?

I've discovered that std::strings are very slow compared to old-fashioned null-terminated strings, so much slower that they slow down my overall program by a factor of 2.
I expected STL to be slower, but I didn't realise it was going to be this much slower.
I'm using Visual Studio 2008, release mode. It shows assignment of a string to be 100-1000 times slower than char* assignment (it's very difficult to test the run-time of a char* assignment). I know it's not a fair comparison, a pointer assignment versus string copy, but my program has lots of string assignments and I'm not sure I could use the "const reference" trick in all places. With a reference counting implementation my program would have been fine, but these implementations don't seem to exist anymore.
My real question is: why don't people use reference counting implementations anymore, and does this mean we all need to be much more careful about avoiding common performance pitfalls of std::string?
My full code is below.
#include <string>
#include <iostream>
#include <time.h>
using std::cout;
void stop()
{
}
int main(int argc, char* argv[])
{
#define LIMIT 100000000
clock_t start;
std::string foo1 = "Hello there buddy";
std::string foo2 = "Hello there buddy, yeah you too";
std::string f;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
f = foo1;
foo1 = foo2;
foo2 = f;
}
double stl = double(clock() - start) / CLOCKS_PER_SEC;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
}
double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC;
const char* goo1 = "Hello there buddy";
const char* goo2 = "Hello there buddy, yeah you too";
const char *g;
start = clock();
for (int i=0; i < LIMIT; i++) {
stop();
g = goo1;
goo1 = goo2;
goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
cout << "Empty loop = " << emptyLoop << "\n";
cout << "char* loop = " << charLoop << "\n";
cout << "std::string = " << stl << "\n";
cout << "slowdown = " << (stl - emptyLoop) / (charLoop - emptyLoop) << "\n";
std::string wait;
std::cin >> wait;
return 0;
}

Well there are definitely known problems regarding the performance of strings and other containers. Most of them have to do with temporaries and unnecessary copies.
It's not too hard to use it right, but it's also quite easy to Do It Wrong. For example, if you see your code accepting strings by value where you don't need a modifiable parameter, you Do It Wrong:
// you do it wrong
void setMember(string a) {
this->a = a; // better: swap(this->a, a);
}
You would have been better off taking that by const reference, or doing a swap operation inside, instead of making yet another copy. The performance penalty increases for a vector or list in that case. However, you are definitely right that there are known problems. For example, in this:
// let's add a Foo into the vector
v.push_back(Foo(a, b));
We are creating one temporary Foo just to add a new Foo into our vector. In a manual solution, that might create the Foo directly into the vector. And if the vector reaches its capacity limit, it has to reallocate a larger memory buffer for its elements. What does it do? It copies each element separately to their new place using their copy constructor. A manual solution might behave more intelligently if it knows the type of the elements beforehand.
Another common problem is introduced temporaries. Have a look at this:
string a = b + c + e;
There are loads of temporaries created, which you might avoid in a custom solution that you actually optimize for performance. Back then, the interface of std::string was designed to be copy-on-write friendly. However, with threads becoming more popular, transparent copy-on-write strings have problems keeping their state consistent. Recent implementations tend to avoid copy-on-write strings and instead apply other tricks where appropriate.
Most of those problems are solved however for the next version of the Standard. For example instead of push_back, you can use emplace_back to directly create a Foo into your vector
v.emplace_back(a, b);
And instead of creating copies in a concatenation above, std::string will recognize when it concatenates temporaries and optimize for those cases. Reallocation will also avoid making copies, but will move elements where appropriate to their new places.
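To make the push_back versus emplace_back difference visible, here is a sketch with a hypothetical Foo that counts its own constructions. The counters and helper functions are assumptions for illustration. With enough capacity reserved up front, push_back(Foo(a, b)) builds a temporary and then moves it into the vector, while emplace_back(a, b) builds the element in place.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how it is created.
struct Foo {
    static int constructed;
    static int copied;
    static int moved;
    Foo(int a, int b) : a(a), b(b) { ++constructed; }
    Foo(const Foo& o) : a(o.a), b(o.b) { ++copied; }
    Foo(Foo&& o) noexcept : a(o.a), b(o.b) { ++moved; }
    int a;
    int b;
};
int Foo::constructed = 0;
int Foo::copied = 0;
int Foo::moved = 0;

void reset_counters() {
    Foo::constructed = Foo::copied = Foo::moved = 0;
}

// Builds a temporary Foo, then moves it into the vector.
void add_with_push_back(std::vector<Foo>& v, int a, int b) {
    v.push_back(Foo(a, b));
}

// Forwards the arguments and builds the Foo directly inside the vector.
void add_with_emplace_back(std::vector<Foo>& v, int a, int b) {
    v.emplace_back(a, b);
}
```

For a cheap-to-move type the saving is small; it matters more when the element has no efficient move constructor.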
For an excellent read, consider Move Constructors by Andrei Alexandrescu.
Sometimes, however, comparisons also tend to be unfair. Standard containers have to support the features they have to support. For example if your container does not keep map element references valid while adding/removing elements from your map, then comparing your "faster" map to the standard map can become unfair, because the standard map has to ensure that elements keep being valid. That was just an example, of course, and there are many such cases that you have to keep in mind when stating "my container is faster than standard ones!!!".

It looks like you're misusing char* in the code you pasted. If you have
std::string a = "this is a";
std::string b = "this is b";
a = b;
you're performing a string copy operation. If you do the same with char*, you're performing a pointer copy operation.
The std::string assignment operation allocates enough memory to hold the contents of b in a, then copies each character one by one. In the case of char*, it does not do any memory allocation or copy the individual characters one by one, it just says "a now points to the same memory that b is pointing to."
My guess is that this is why std::string is slower, because it's actually copying the string, which appears to be what you want. To do a copy operation on a char* you'd need to use the strcpy() function to copy into a buffer that's already appropriately sized. Then you'll have an accurate comparison. But for the purposes of your program you should almost definitely use std::string instead.
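The distinction can be sketched as follows. The helper names shallow_copy and deep_copy are hypothetical, and the destination buffer is assumed to be large enough for the source string, as the paragraph above requires.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Pointer assignment: the result refers to the very same characters.
const char* shallow_copy(const char* src) {
    return src;
}

// strcpy into a buffer the caller has already sized: a real
// character-by-character copy, comparable to std::string assignment.
void deep_copy(char* dst, const char* src) {
    std::strcpy(dst, src);
}

// std::string assignment always produces an independent string.
std::string string_copy(const std::string& src) {
    std::string dst;
    dst = src;
    return dst;
}
```

A fair benchmark would time deep_copy against string_copy, not shallow_copy against string_copy.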

When writing C++ code using any utility class (whether STL or your own) instead of e.g. good old C null-terminated strings, you need to remember a few things.
If you benchmark without compiler optimisations on (esp. function inlining), classes will lose. They are not built-ins, not even the STL ones. They are implemented in terms of method calls.
Do not create unnecessary objects.
Do not copy objects if possible.
Pass objects as references, not copies, if possible.
Use more specialised methods and functions and higher-level algorithms. E.g.:
std::string a = "String a";
std::string b = "String b";
// Use
a.swap(b);
// Instead of
std::string tmp = a;
a = b;
b = tmp;
And a final note. When your C-like C++ code starts to get more complex, you need to implement more advanced data structures like automatically expanding arrays, dictionaries, and efficient priority queues. And suddenly you realise that it's a lot of work and your classes are not really faster than the STL ones. Just more buggy.

You are most certainly doing something wrong, or at least not comparing "fairly" between STL and your own code. Of course, it's hard to be more specific without code to look at.
It could be that you're structuring your code using STL in a way that causes more constructors to run, or not re-using allocated objects in a way that matches what you do when you implement the operations yourself, and so on.

This test is testing two fundamentally different things: a shallow copy vs. a deep copy. It's essential to understand the difference and how to avoid deep copies in C++, since a C++ object, by default, provides value semantics for its instances (as is the case with plain old data types), which means that assigning one to the other is generally going to copy.
I "corrected" your test and got this:
char* loop = 19.921
string = 0.375
slowdown = 0.0188244
Apparently we should cease using C-style strings since they are soooo much slower! In actuality, I deliberately made my test as flawed as yours by testing shallow copying on the string side vs. strcpy on the char* side:
#include <string>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <ctime>
using namespace std;
#define LIMIT 100000000
char* make_string(const char* src)
{
return strcpy((char*)malloc(strlen(src)+1), src);
}
int main(int argc, char* argv[])
{
clock_t start;
string foo1 = "Hello there buddy";
string foo2 = "Hello there buddy, yeah you too";
start = clock();
for (int i=0; i < LIMIT; i++)
foo1.swap(foo2);
double stl = double(clock() - start) / CLOCKS_PER_SEC;
char* goo1 = make_string("Hello there buddy");
char* goo2 = make_string("Hello there buddy, yeah you too");
char *g;
start = clock();
for (int i=0; i < LIMIT; i++) {
g = make_string(goo1);
free(goo1);
goo1 = make_string(goo2);
free(goo2);
goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
cout << "char* loop = " << charLoop << "\n";
cout << "string = " << stl << "\n";
cout << "slowdown = " << stl / charLoop << "\n";
string wait;
cin >> wait;
}
The main point is, and this actually gets to the heart of your ultimate question, you have to know what you are doing with the code. If you use a C++ object, you have to know that assigning one to the other is going to make a copy of that object (unless assignment is disabled, in which case you'll get an error). You also have to know when it's appropriate to use a reference, pointer, or smart pointer to an object, and with C++11, you should also understand the difference between move and copy semantics.
My real question is: why don't people use reference counting
implementations anymore, and does this mean we all need to be much
more careful about avoiding common performance pitfalls of
std::string?
People do use reference-counting implementations. Here's an example of one:
shared_ptr<string> ref_counted = make_shared<string>("test");
shared_ptr<string> shallow_copy = ref_counted; // no deep copies, just
// increase ref count
The difference is that string doesn't do it internally as that would be inefficient for those who don't need it. Things like copy-on-write are generally not done for strings either anymore for similar reasons (plus the fact that it would generally make thread safety an issue). Yet we have all the building blocks right here to do copy-on-write if we wish to do so: we have the ability to swap strings without any deep copying, we have the ability to make pointers, references, or smart pointers to them.
To use C++ effectively, you have to get used to this way of thinking involving value semantics. If you don't, you might enjoy the added safety and convenience but do it at heavy cost to the efficiency of your code (unnecessary copies are certainly a significant part of what makes poorly written C++ code slower than C). After all, your original test is still dealing with pointers to strings, not char[] arrays. If you were using character arrays and not pointers to them, you'd likewise need to strcpy to swap them. With strings you even have a built-in swap method to do exactly what you are doing in your test efficiently, so my advice is to spend a bit more time learning C++.
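As an illustration of the "building blocks" remark, here is a rough single-threaded sketch of copy-on-write built from shared_ptr and string. CowString is a hypothetical wrapper, not a standard type, and the use_count() check is not safe if several threads touch the same buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical wrapper: copies share one buffer until one of them writes.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Reading never copies the characters.
    const std::string& read() const { return *data_; }

    // Writing detaches first if another CowString shares the buffer.
    std::string& write() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_); // deep copy happens here
        }
        return *data_;
    }

    // True while both objects still point at the same buffer.
    bool shares_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};
```

Copying a CowString only bumps a reference count; the deep copy is deferred until the first write() on a shared buffer.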

If you have an indication of the eventual size of your vector you can prevent excessive resizes by calling reserve() before filling it up.
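A small sketch of the effect, using a hypothetical count_reallocations helper that watches capacity() change while n elements are pushed. With reserve(n) called first, the capacity does not change during the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how often the vector's capacity changes
// while n elements are appended.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n); // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Without the reserve() call the count depends on the implementation's growth factor, but it is greater than zero for any non-trivial n.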

The main rules of optimization:
Rule 1: Don't do it.
Rule 2: (For experts only) Don't do it yet.
Are you sure that you have proven that it is really the STL that is slow, and not your algorithm?

Good performance isn't always easy with STL, but generally, it is designed to give you the power. I found Scott Meyers' "Effective STL" an eye-opener for understanding how to deal with the STL efficiently. Read!
As others said, you are probably running into frequent deep copies of the string, and comparing that to a pointer assignment / reference counting implementation.
Generally, any class designed towards your specific needs will beat a generic class that's designed for the general case. But learn to use the generic class well, and learn to ride the 80:20 rule, and you will be much more efficient than someone rolling everything on their own.
One specific drawback of std::string is that it doesn't give performance guarantees, which makes sense. As Tim Cooper mentioned, STL does not say whether a string assignment creates a deep copy. That's good for a generic class, because reference counting can become a real killer in highly concurrent applications, even though it's usually the best way for a single threaded app.

They didn't go wrong. STL implementation is generally speaking better than yours.
I'm sure that you can write something better for a very particular case, but a factor of 2 is too much... you really must be doing something wrong.

If used correctly, std::string is as efficient as char*, but with the added protection.
If you are experiencing performance problems with the STL, it's likely that you are doing something wrong.
Additionally, STL implementations are not standard across compilers. I know that SGI's STL and STLPort perform generally well.
That said, and I am being completely serious, you could be a C++ genius and have devised code that is far more sophisticated than the STL. It's not likely, but who knows, you could be the LeBron James of C++.

I would say that STL implementations are better than the traditional implementations. Also, did you try using a list instead of a vector? A vector is efficient for some purposes and a list for others.

std::string will always be slower than C-strings. C-strings are simply a linear array of memory. You cannot get any more efficient than that, simply as a data structure. The algorithms you use (like strcat() or strcpy()) are generally equivalent to the STL counterparts. The class instantiation and method calls will be, in relative terms, significantly slower than C-string operations (even worse if the implementation uses virtuals). The only way you could get equivalent performance is if the compiler does optimization.

                                  string   const string&   char*   Java string
--------------------------------------------------------------------------------
Efficient assignment              no **    yes             yes     yes
Thread-safe                       yes      yes             yes     yes
Memory management done for you    yes      no              no      yes
** There are 2 implementations of std::string: reference counting or deep-copy. Reference counting introduces performance problems in multi-threaded programs, EVEN for just reading strings, and deep-copy is obviously slower as shown above. See:
Why VC++ Strings are not reference counted?
As this table shows, 'string' is better than 'char*' in some ways and worse in others, and 'const string&' is similar in properties to 'char*'. Personally I'm going to continue using 'char*' in many places. The enormous amount of copying of std::string's that happens silently, with implicit copy constructors and temporaries makes me somewhat ambivalent about std::string.
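To make the 'string' versus 'const string&' columns concrete, here is a small sketch (the two helper functions are hypothetical) that checks whether a function parameter is the caller's own object or a separate copy:

```cpp
#include <string>

// Hypothetical helper: a const reference parameter aliases the caller's
// string, so no copy is made and the addresses match.
bool same_object_by_ref(const std::string& param, const std::string* caller) {
    return &param == caller;
}

// Hypothetical helper: a by-value parameter is a freshly constructed
// copy, so it lives at a different address than the caller's string.
bool same_object_by_value(std::string param, const std::string* caller) {
    return &param == caller;
}
```

Called as same_object_by_ref(s, &s) and same_object_by_value(s, &s), the first returns true and the second false, which is the silent copy the answer above is wary of.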

A large part of the reason might be the fact that reference-counting is no longer used in modern implementations of STL.
Here's the story (someone correct me if I'm wrong): in the beginning, STL implementations used reference counting, and were fast but not thread-safe - the implementors expected application programmers to insert their own locking mechanisms at higher levels, to make them thread-safe, because if locking was done at 2 levels then this would slow things down twice as much.
However, the programmers of the world were too ignorant or lazy to insert locks everywhere. For example, if a worker thread in a multi-threaded program needed to read a std::string command-line parameter, then a lock would be needed even just to read the string, otherwise crashes could ensue. (Two threads increment the reference count simultaneously on different CPUs, so it rises by only 1, but they decrement it separately, so it falls by 2; the count reaches zero and the memory is freed while still in use.)
So implementors ditched reference counting and instead had each std::string always own its own copy of the string. More programs worked, but they were all slower.
So now, even a humble assignment of one std::string to another (or equivalently, passing a std::string by value as a parameter to a function) takes about 400 machine code instructions instead of the 2 it takes to assign a char*, a slowdown of 200 times.
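A small sketch of what "owning its own copy" means in practice, and of how a move differs from a copy. The helper names are hypothetical, and the sketch assumes a non-reference-counted std::string and a string longer than the implementation's small-string buffer; under an old copy-on-write library the first check would behave differently:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: true if a copy of `s` has its own character buffer
// (the deep-copy behaviour described above).
bool copy_has_own_buffer(const std::string& s) {
    std::string copy = s;
    return copy.data() != s.data();
}

// Hypothetical helper: true if moving from a long string hands over the
// existing buffer instead of copying the characters. Assumption: `s` is
// longer than the small-string buffer, so its characters live on the heap.
bool move_reuses_buffer(std::string s) {
    const char* before = s.data();
    std::string moved = std::move(s);
    return moved.data() == before;
}
```

On the common standard library implementations both checks are expected to hold for, say, a 64-character string: the copy allocates and fills a new buffer, while the move only transfers a pointer.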
I tested the magnitude of the inefficiency of std::string on one major program, which had an overall slowdown of about 100% compared with null-terminated strings. I also tested raw std::string assignment using the following code, which said that std::string assignment was 100-900 times slower. (I had trouble measuring the speed of char* assignment). I also debugged into the std::string operator=() function - I ended up knee deep in the stack, about 7 layers deep, before hitting the 'memcpy()'.
I'm not sure there's any solution. Perhaps if you need your program to be fast, use plain old C++, and if you're more concerned about your own productivity, you should use Java.
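#include <cstdio>
#include <ctime>
#include <string>

#define LIMIT 800000000

// stop() is not defined in the original post. This stand-in (an assumption)
// writes to a volatile so the compiler cannot delete the timing loops.
volatile int sink = 0;
void stop() { sink = 0; }

int main() {
    std::clock_t start;

    // std::string: each assignment copies the characters.
    std::string foo1 = "Hello there buddy";
    std::string foo2 = "Hello there buddy, yeah you too";
    std::string f;
    start = std::clock();
    for (int i = 0; i < LIMIT; i++) {
        stop();
        f = foo1;
        foo1 = foo2;
        foo2 = f;
    }
    double stl = double(std::clock() - start) / CLOCKS_PER_SEC;

    // Empty loop: baseline cost of the loop and the stop() call.
    start = std::clock();
    for (int i = 0; i < LIMIT; i++) {
        stop();
    }
    double emptyLoop = double(std::clock() - start) / CLOCKS_PER_SEC;

    // char*: each assignment copies only a pointer.
    // const is required because string literals are not modifiable.
    const char* goo1 = "Hello there buddy";
    const char* goo2 = "Hello there buddy, yeah you too";
    const char* g;
    start = std::clock();
    for (int i = 0; i < LIMIT; i++) {
        stop();
        g = goo1;
        goo1 = goo2;
        goo2 = g;
    }
    double charLoop = double(std::clock() - start) / CLOCKS_PER_SEC;

    // The original reported results through TfcMessage(), which is not
    // defined in the post; printf is substituted so the snippet compiles.
    std::printf("Empty loop = %1.3f s\n"
                "char* loop = %1.3f s\n"
                "std::string loop = %1.3f s\n\n"
                "slowdown = %f\n",
                emptyLoop, charLoop, stl,
                (stl - emptyLoop) / (charLoop - emptyLoop));
    return 0;
}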
