I have a large .txt file that I need to load and store in a vector. The file is around 5 MB in size (500 000 lines, every line around 10-20 characters, delimited with '\n'). I did some benchmarking of the time needed to read the whole file using the sample code below:
#include <iostream>
#include <vector>
#include <fstream>
#include <string>

int main()
{
    std::fstream input("words.txt");
    std::vector<std::string> vector;
    std::string line;

    while (input >> line) {
        vector.push_back(line);
    }
}
I was curious if passing the string as an rvalue reference would prove faster, but instead it was slower by about 10 ms.
#include <iostream>
#include <vector>
#include <fstream>
#include <string>

int main()
{
    std::fstream input("words.txt");
    std::vector<std::string> vector;
    std::string line;

    while (input >> line) {
        vector.push_back(std::move(line));
    }
}
The average load time for the first code example was about 58 ms, and for the second one 68-70 ms. I thought moving was always at least as fast as copying, which is why this doesn't seem right to me.
Does anyone know why this is happening?
The benchmarks were done using:
perf stat -r 100 ./a.out
on Arch Linux, and the code was compiled using GCC 10.2 with the C++17 standard.
Also, if anyone knows a more optimal way to do this, it would be much appreciated.
If you invoke g++ -E you can ogle the relevant code:
Copy construction:
basic_string(const basic_string& __str)
: _M_dataplus(_M_local_data(),
              _Alloc_traits::_S_select_on_copy(__str._M_get_allocator()))
{
    _M_construct(__str._M_data(), __str._M_data() + __str.length());
}
Move construction:
# 552 "/usr/local/include/c++/10.2.0/bits/basic_string.h" 3
basic_string(basic_string&& __str) noexcept
: _M_dataplus(_M_local_data(), std::move(__str._M_get_allocator()))
{
    if (__str._M_is_local())
    {
        traits_type::copy(_M_local_buf, __str._M_local_buf,
                          _S_local_capacity + 1);
    }
    else
    {
        _M_data(__str._M_data());
        _M_capacity(__str._M_allocated_capacity);
    }
    _M_length(__str.length());
    __str._M_data(__str._M_local_data());
    __str._M_set_length(0);
}
Notably, (to support the Short-String-Optimisation) the move constructor needs to look at ._M_is_local() to work out whether to copy or move (so has a branch to predict), and it clears out the moved-from string / sets its length to 0. Extra work = extra time.
@Manuel made an interesting comment:
When you move line it gets empty, so the next iteration needs to allocate space. std::string has a small buffer as an optimization (most of the implementations, if not all) and the copy just copies chars, no memory allocation. That could be the difference.
That doesn't quite add up as stated, but there are subtle differences here. For input of space-separated words long enough to need dynamic allocation:
a) the move version may have cleared out line's dynamic buffer after the last word, and therefore needs to reallocate; if the input is long enough it may have to reallocate one or more times to increase capacity.
b) the copy version will likely have a large-enough buffer (its capacity is increased as necessary, effectively acting as a high-water mark for the words seen), but then a dynamic allocation is needed when copy constructing inside push_back. The exact size of that allocation is known up front though - it won't need to be resized to grow capacity.
That does suggest copy could be faster when there's a lot of variation in the input word lengths.
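You can watch the capacity effect from (a) directly. A standalone sketch (not the original benchmark; the word is deliberately long enough to defeat SSO, and the exact numbers are implementation-specific):

#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string line = "a word long enough to need a dynamically allocated buffer";
    std::vector<std::string> v;

    std::cout << "initial capacity: " << line.capacity() << '\n';
    v.push_back(line);             // copy: line keeps its buffer
    std::cout << "after copy:       " << line.capacity() << '\n';
    v.push_back(std::move(line));  // move: the buffer is stolen
    std::cout << "after move:       " << line.capacity() << '\n';
    // With libstdc++ the last line typically prints 15 (the SSO capacity),
    // so the next long word read into line has to allocate again.
}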
Also, if anyone knows a more optimal way to do this, it would be much appreciated.
If you really care about performance, I'd recommend benchmarking memory mapping the file and creating a vector of string_views into it: that's likely to be considerably faster.
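A minimal sketch of that idea, assuming POSIX (error handling and unmapping deliberately omitted; the string_views are only valid while the mapping is alive):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string_view>
#include <vector>

// Map the whole file read-only, then point string_views at each line.
std::vector<std::string_view> load_lines(const char* path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    const char* data = static_cast<const char*>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    close(fd);  // the mapping stays valid after closing the descriptor

    std::vector<std::string_view> lines;
    const char* begin = data;
    const char* end = data + st.st_size;
    for (const char* p = begin; p != end; ++p)
    {
        if (*p == '\n')
        {
            lines.emplace_back(begin, p - begin);
            begin = p + 1;
        }
    }
    if (begin != end)                 // final line without a trailing '\n'
        lines.emplace_back(begin, end - begin);
    return lines;                     // mapping intentionally left alive
}

No per-line allocation happens at all; the trade-off is that the vector only refers to the mapped bytes instead of owning them.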
Related: the Quick C++ Benchmarks example:
static void StringCopyFromLiteral(benchmark::State& state) {
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        std::string from_literal("hello");
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(from_literal);
    }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromLiteral);

static void StringCopyFromString(benchmark::State& state) {
    // Code before the loop is not measured
    std::string x = "hello";
    for (auto _ : state) {
        std::string from_string(x);
    }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromString);
http://quick-bench.com/IcZllt_14hTeMaB_sBZ0CQ8x2Ro
What if I understand assembly...
More results:
http://quick-bench.com/39fLTvRdpR5zdapKSj2ZzE3asCI
The answer is simple. In the case where you construct an std::string from a small string literal, the compiler optimizes this case by directly populating the contents of the string object using constants in assembly. This avoids expensive looping as well as tests to see whether small string optimization (SSO) can be applied. In this case it knows SSO can be applied so the code the compiler generates simply involves writing the string directly into the SSO buffer.
Note this assembly code in the StringCreation case:
// Populate SSO buffer (each set of 4 characters is backwards since
// x86 is little-endian)
19.63% movb $0x6f,0x4(%r15) // "o"
19.35% movl $0x6c6c6568,(%r15) // "lleh"
// Set size
20.26% movq $0x5,0x10(%rsp) // size = 5
// Probably set heap pointer. 0 (nullptr) = use SSO buffer
20.07% movb $0x0,0x1d(%rsp)
You're looking at the constant values right there. That's not very much code, and no loop is required. In fact, the std::string constructor doesn't even have to be invoked! The compiler is just putting stuff in memory in the same places where the std::string constructor would.
If the compiler cannot apply this optimization, the results are quite different -- in particular, if we "hide" the fact that the source is a string literal by first copying the literal into a char array, the results flip:
char x[] = "hello";
for (auto _ : state) {
    std::string created_string(x);
    benchmark::DoNotOptimize(created_string);
}
Now the "from-char-pointer" case takes twice as long! Why?
I suspect that this is because the "copy from char pointer" case cannot simply check to see how long the string is by looking at a value. It needs to know whether small string optimization can be performed. There are a few ways it could go about this:
Measure the length of the string first, make an allocation (if needed), then copy the source to the destination. In the case where SSO does apply (it almost certainly does here) I'd expect this to take twice as long since it has to walk the source twice -- once to measure, once to copy.
Copy from the source character-by-character, appending to the new string. This requires testing on each append operation whether the string is now too long for SSO and needs to be copied into a heap-allocated char array. If the string is currently in a heap-allocated array, it needs to instead test if the allocation needs to be resized. This would also take quite a bit longer since there is at least one test for each character in the source string.
Copy from the source in chunks to lower the number of tests that need to be performed and to avoid walking the source twice. This would be faster than the character-by-character approach both because the number of tests would be lower and, because the source is not being walked twice, the CPU memory cache is going to be more effective. This would only show significant speed improvements for long strings, which we don't have here. For short strings it would work about the same as the first approach (measure, then copy).
Contrast this to the case when it's copying from another string object: it can simply look at the size() of the other string and immediately know whether it can perform SSO, and if it can't perform SSO then it also knows exactly how much memory to allocate for the new string.
I am looking for a quick way to store strings from a file into a vector of strings, such that I can reserve the number of lines ahead of time. What is the best way to do this? Should I count the newline characters first, or just get the total size of the file and reserve, say, size / 80 as a rough estimate? Ideally I don't want the vector to reallocate each time, which would slow things down tremendously for a large file. Ideally I would count the number of items ahead of time, but should I do this by opening the file in binary mode, counting the newlines, and then reopening it? That seems wasteful; I'm curious to hear some thoughts on this. Also, is there a way to use emplace_back to get rid of the temporary somestring in the getline code below? I did see the following two implementations for counting the number of lines ahead of time: Fastest way to find the number of lines in a text (C++)
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> vs;
std::string somestring;
std::ifstream somefile("somefilename");

while (std::getline(somefile, somestring))
    vs.push_back(somestring);
Also, I could do something to get the total size ahead of time; can I just transform the char* in this case into the vector directly? This goes back to my reserve hint of passing size / 80, or some such constant, to give an estimated size to the reserve up front.
#include <iostream>
#include <fstream>

int main()
{
    char* contents;
    std::ifstream istr("test.txt");
    if (istr)
    {
        std::streambuf* pbuf = istr.rdbuf();
        // which I can use as a reserve hint, say size / 80
        std::streamsize size = pbuf->pubseekoff(0, istr.end);
        // maybe I can construct the vector from the char buf directly?
        pbuf->pubseekoff(0, istr.beg);
        contents = new char[size];
        pbuf->sgetn(contents, size);
        delete[] contents;   // don't leak the buffer
    }
    return 0;
}
Rather than waste time counting the lines ahead of time, I would just reserve() an initial value, then start pushing the actual lines; if you happen to push the reserved number of items, just reserve() some more space before continuing with more pushing, repeating as needed.
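A sketch of that strategy (the initial guess and the growth factor are arbitrary; tune them for your data):

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> vs;
    std::string line;
    std::ifstream somefile("somefilename");

    std::size_t capacity = 4096;            // initial guess
    vs.reserve(capacity);
    while (std::getline(somefile, line)) {
        if (vs.size() == capacity) {        // reserved space exhausted:
            capacity *= 2;                  // grab more before continuing
            vs.reserve(capacity);
        }
        vs.push_back(line);
    }
}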
The strategy for reserving space in a std::vector is designed to "grow on demand". That is, you will not allocate one string at a time; you will first allocate one, then, say, ten, then one hundred, and so on (not exactly those numbers, but that's the idea). In other words, the implementation of std::vector::push_back already manages this for you.
Consider the following example: I read the entire text of War and Peace (65007 lines) using two versions: one which pre-allocates and one which does not (i.e., one reserves zero space and the other reserves the full 65007 lines; text from http://www.gutenberg.org/cache/epub/2600/pg2600.txt):
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <boost/timer/timer.hpp>

void reader(size_t N = 0) {
    std::string line;
    std::vector<std::string> lines;
    lines.reserve(N);
    std::ifstream fp("wp.txt");
    while (std::getline(fp, line)) {
        lines.push_back(line);
    }
    std::cout << "read " << lines.size() << " lines" << std::endl;
}

int main() {
    {
        std::cout << "no pre-allocation" << std::endl;
        boost::timer::auto_cpu_timer t;
        reader();
    }
    {
        std::cout << "full pre-allocation" << std::endl;
        boost::timer::auto_cpu_timer t;
        reader(65007);
    }
    return 0;
}
Results:
no pre-allocation
read 65007 lines
0.027796s wall, 0.020000s user + 0.000000s system = 0.020000s CPU (72.0%)
full pre-allocation
read 65007 lines
0.023914s wall, 0.020000s user + 0.010000s system = 0.030000s CPU (125.4%)
You see, for a non-trivial amount of text the difference is a few milliseconds.
Do you really need to know the lines beforehand? Is it really a bottleneck? Are you saving, say, one second of Wall time but complicating your code ten-fold by preallocating the lines?
I'm finding standard string addition to be very slow, so I'm looking for some tips/hacks that can speed up some code I have.
My code is basically structured as follows:
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

inline void add_to_string(string data, string &added_data) {
    if (added_data.length() < 1) added_data = added_data + "{";
    added_data = added_data + data;
}

int main()
{
    int some_int = 100;
    float some_float = 100.0;
    string some_string = "test";
    string added_data;
    added_data.reserve(1000 * 64);

    for (int ii = 0; ii < 1000; ii++)
    {
        // variables manipulated here
        some_int = ii;
        some_float += ii;
        some_string.assign(ii % 20, 'A');
        // then we concatenate the strings!
        stringstream fragment;
        fragment << some_int << "," << some_float << "," << some_string;
        add_to_string(fragment.str(), added_data);
    }
    return 0;
}
Doing some basic profiling, I'm finding that a ton of time is being used in the for loop. Are there some things I can do that will significantly speed this up? Will it help to use c strings instead of c++ strings?
String addition is not the problem you are facing. std::stringstream is known to be slow due to its design. On every iteration of your for-loop the stringstream is responsible for at least 2 allocations and 2 deallocations. The cost of each of these 4 operations is likely more than that of the string addition.
Profile the following and measure the difference:
std::string stringBuffer;
for (int ii = 0; ii < 1000; ii++)
{
    // variables manipulated here
    some_int = ii;
    some_float += ii;
    some_string.assign(ii % 20, 'A');
    // then we concatenate the strings!
    char buffer[128];
    sprintf(buffer, "%i,%f,%s", some_int, some_float, some_string.c_str());
    stringBuffer = buffer;
    add_to_string(stringBuffer, added_data);
}
Ideally, replace sprintf with _snprintf or the equivalent supported by your compiler.
As a rule of thumb, use stringstream for formatting by default and switch to the faster and less safe functions like sprintf, itoa, etc. whenever performance matters.
Edit: that, and what didierc said: added_data += data;
You can save lots of string operations if you do not call add_to_string in your loop.
I believe this does the same (although I am not a C++ expert and do not know exactly what stringstream does):
stringstream fragment;
for (int ii = 0; ii < 1000; ii++)
{
    // variables manipulated here
    some_int = ii;
    some_float += ii;
    some_string.assign(ii % 20, 'A');
    // then we concatenate the strings!
    fragment << some_int << "," << some_float << "," << some_string;
}
// inlined add_to_string call without the if-statement ;)
added_data = "{" + fragment.str();
I see you used the reserve method on added_data, which should help by avoiding multiple reallocations of the string as it grows.
You should also use the += string operator where possible:
added_data += data;
I think that the above should save significant time by avoiding unnecessary copies of added_data back and forth through a temporary string when doing the concatenation.
This += operator is a simpler version of the string::append method, it just copies data directly at the end of added_data. Since you made the reserve, that operation alone should be very fast (almost equivalent to a strcpy).
But why go through all this, when you are already using a stringstream to handle input? Keep it all in there to begin with!
The stringstream class is indeed not very efficient.
You may have a look at the stringstream class for more information on how to use it, if necessary, but your solution of using a string as a buffer seems to avoid that class's speed issue.
At any rate, stay away from any attempt at reimplementing the speed-critical code in pure C unless you really know what you are doing. Some other SO posts support the idea of doing it, but I think it's best (read: safer) to rely as much as possible on the standard library, which will be enhanced over time and takes care of many corner cases you (or I) wouldn't think of. If your input data format is set in stone, then you might start thinking about taking that road, but otherwise it's premature optimization.
If you start added_data with a "{", you would be able to remove the if from your add_to_string method: the if gets executed exactly once, when the string is empty, so you might as well make it non-empty right away.
In addition, your add_to_string makes a copy of the data; this is not necessary, because it does not get modified. Accepting the data by const reference should speed things up for you.
Finally, changing your added_data from string to sstream should let you append to it in a loop, without the sstream intermediary that gets created, copied, and thrown away on each iteration of the loop.
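Putting the first two suggestions together, the helper shrinks to something like this (a sketch keeping your names; the caller now seeds added_data once, before the loop):

#include <string>

// data is taken by const reference, so no copy is made, and the
// per-call emptiness test is gone because added_data starts as "{".
inline void add_to_string(const std::string& data, std::string& added_data) {
    added_data += data;
}

// Before the loop:
//   std::string added_data = "{";
//   added_data.reserve(1000 * 64);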
Please have a look at Twine used in LLVM.
A Twine is a kind of rope, it represents a concatenated string using a
binary-tree, where the string is the preorder of the nodes. Since the
Twine can be efficiently rendered into a buffer when its result is used,
it avoids the cost of generating temporary values for intermediate string
results -- particularly in cases when the Twine result is never
required. By explicitly tracking the type of leaf nodes, we can also avoid
the creation of temporary strings for conversion operations (such as
appending an integer to a string).
It may be helpful in solving your problem.
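For illustration only (this is the underlying idea, not LLVM's actual Twine API): a concatenation node just remembers its two halves, and the characters are copied once, when the whole tree is rendered at the end:

#include <iostream>
#include <string>

// Toy rope node: either a leaf holding text, or a concatenation of
// two subtrees. Building the tree copies nothing; render() walks it
// in preorder into one growing buffer.
struct Rope {
    const Rope* lhs;
    const Rope* rhs;
    const char* leaf;   // non-null when this node is a leaf

    void render(std::string& out) const {
        if (leaf) { out += leaf; return; }
        if (lhs) lhs->render(out);
        if (rhs) rhs->render(out);
    }
};

int main()
{
    Rope a{nullptr, nullptr, "hello, "};
    Rope b{nullptr, nullptr, "world"};
    Rope ab{&a, &b, nullptr};   // no characters copied yet

    std::string out;
    ab.render(out);             // one pass, one buffer
    std::cout << out << '\n';
}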
How about this approach? This is from a DevPartner for MSVC 2010 report.

string newstring = stringA + stringB;

I don't think strings are slow; it's the conversions that can make them slow, and maybe your compiler checking variable types for mismatches.
C# coder here; I just wrote this simple C++ method to get text from a file:
static std::vector<std::string> readTextFile(const std::string &filePath) {
    std::string line;
    std::vector<std::string> lines;

    std::ifstream theFile(filePath.c_str());
    while (theFile.good()) {
        getline(theFile, line);
        lines.push_back(line);
    }
    theFile.close();

    return lines;
}
I know this code is not efficient; the text lines are copied once as they are read and a second time when returned by value.
Two questions:
(1) can this code leak memory ?
(2) more generally can returning containers of objects by value ever leak memory ? (assuming the objects themselves don't leak)
while (theFile.good()) {
    getline(theFile, line);
    lines.push_back(line);
}
Forget about efficiency, this code is not correct. It will not read the file correctly. See the following topic to know why:
What's preferred pattern for reading lines from a file in C++?
So the loop should be written as:
while (getline(theFile, line)) {
    lines.push_back(line);
}
Now this is correct. If you want to make it efficient, profile your application first. Try to see which part takes the most CPU cycles.
(1) can this code leak memory ?
No.
(2) more generally can returning containers of objects by value ever leak memory ?
Depends on the type of the object in the containers. In your case, the type of the object in std::vector is std::string which makes sure that no memory will be leaked.
No and no. Returning by value never leaks memory (assuming the containers and the contained objects are well written). It would be fairly useless if it was any other way.
And I second what Nawaz says, your while loop is wrong. It's frankly incredible how many times we see that, there must be an awful lot of bad advice out there.
(1) can this code leak memory ?
No
(2) more generally can returning containers of objects by value ever leak memory ?
No. You might leak memory that is stored in a container by pointer, or through objects that themselves leak, but that would not be caused by returning by value.
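For example, a container of raw pointers can still leak, but the leak comes from the pointers, not from the return (hypothetical sketch):

#include <string>
#include <vector>

std::vector<std::string*> make_leaky()
{
    std::vector<std::string*> v;
    v.push_back(new std::string("never deleted"));
    return v;   // returning by value just copies the pointers
}
// If the caller never deletes the strings, they leak -- the fix is to
// store objects (or smart pointers), not to avoid return-by-value.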
I know this code is not efficient; the text lines are copied once as they are read and a second time when returned-by-value.
Most probably not. There are two copies of the string, but not the ones that you are thinking about. The return copy will most likely be optimized away in C++03, and will either be optimized away or transformed into a move (cheap) in C++11.
The two copies are rather:
getline (theFile, line);
lines.push_back(line);
The first line copies from the file to line, and the second copies from line to the container. If you are using a C++11 compiler you can change the second line to:
lines.push_back(std::move(line));
to move the contents of the string to the container. Alternatively (and also valid in C++03), you can change the two lines with:
lines.push_back(std::string()); // In most implementations this is *cheap*
// (i.e. no memory allocation)
getline(theFile, lines.back());
And you should test the result of the read (if the read fails, in the last alternative, make sure to resize to one less element to remove the last empty string).
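Spelling that alternative out with the fix-up (a sketch; this form works in C++03 as well):

#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> lines;
    std::ifstream theFile("file.txt");

    for (;;) {
        lines.push_back(std::string());           // usually no allocation (SSO)
        if (!std::getline(theFile, lines.back())) {
            lines.pop_back();                     // drop the failed, empty read
            break;
        }
    }
}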
In C++11, you can do:
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string>
read_text_file(const std::string& path)
{
    std::string line;
    std::vector<std::string> ans;
    std::ifstream file(path.c_str());
    while (std::getline(file, line))
        ans.push_back(std::move(line));
    return ans;
}
and no extra copies are made.
In C++03, you accept the extra copies, and painfully remove them only if profiling dictates so.
Note: you don't need to close the file manually, the destructor of std::ifstream does it for you.
Note2: You can template on the char type, this may be useful in some circumstances:
template <typename C, typename T>
std::vector<std::basic_string<C, T>>
read_text_file(const char* path)
{
    std::basic_string<C, T> line;
    std::vector<std::basic_string<C, T>> ans;
    std::basic_ifstream<C, T> file(path);
    // Rest as above
}
No, returning a container by value should not leak memory. The standard library is designed not to leak memory itself in any case; it can only leak memory if there is a bug in its implementation. At least there used to be such a bug in vector of strings in an old MSVC.
In my application I'm trying to merge sorted files (keeping them sorted, of course), so I have to iterate through each element in both files to write the minimal one to the third. This is pretty slow on big files, and as far as I can see there is no other choice (the iteration has to be done), so I'm trying to optimize the file loading. I can use some amount of RAM for buffering. I mean, instead of reading 4 bytes from both files every time, I can read something like 100 MB once and work with that buffer afterwards, until there is no element left in the buffer; then I'll refill it. But I guess ifstream is already doing that; will it give me more performance, and is there any reason to do it myself? If fstream does, maybe I can change the size of that buffer?
Added: my current code looks like this (pseudocode):
// this is done in a loop
int i1 = input1.read_integer();
int i2 = input2.read_integer();
if (!input1.eof() && !input2.eof())
{
    if (i1 < i2)
    {
        output.write(i1);
        input2.seek_back(sizeof(int));
    }
    else
    {
        input1.seek_back(sizeof(int));
        output.write(i2);
    }
}
else
{
    if (input1.eof())
        output.write(i2);
    else if (input2.eof())
        output.write(i1);
}
What I don't like here is:
- seek_back: I have to seek back to the previous position, as there is no way to peek 4 bytes ahead;
- too much reading from the file;
- if one of the streams reaches EOF, the code still keeps checking that stream instead of writing the contents of the other stream directly to the output (not a big issue, though, because the chunk sizes are almost always equal).
Can you suggest improvement for that?
Thanks.
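For reference, streams do expose pubsetbuf for supplying a bigger buffer; a sketch of what I mean (the standard leaves its effect implementation-defined, and it must be called before the file is opened):

#include <fstream>
#include <vector>

int main()
{
    std::vector<char> buf(1 << 20);   // e.g. a 1 MB stream buffer
    std::ifstream input1;
    // With libstdc++/libc++ this takes effect only if done before open().
    input1.rdbuf()->pubsetbuf(buf.data(), buf.size());
    input1.open("input1.bin", std::ios::binary);
    // ... read as before; the stream now refills in 1 MB chunks
}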
Without getting into the discussion on stream buffers, you can get rid of the seek_back and generally make the code much simpler by doing:
using namespace std;

merge(istream_iterator<int>(file1), istream_iterator<int>(),
      istream_iterator<int>(file2), istream_iterator<int>(),
      ostream_iterator<int>(cout, " "));
Edit:
Added binary capability
#include <algorithm>
#include <iterator>
#include <fstream>
#include <iostream>

struct BinInt
{
    int value;
    operator int() const { return value; }
    friend std::istream& operator>>(std::istream& stream, BinInt& data)
    {
        return stream.read(reinterpret_cast<char*>(&data.value), sizeof(int));
    }
};

int main()
{
    std::ifstream file1("f1.txt", std::ios::binary);
    std::ifstream file2("f2.txt", std::ios::binary);
    std::merge(std::istream_iterator<BinInt>(file1), std::istream_iterator<BinInt>(),
               std::istream_iterator<BinInt>(file2), std::istream_iterator<BinInt>(),
               std::ostream_iterator<int>(std::cout, " "));
}
In decreasing order of performance (best first):
memory-mapped I/O
OS-specific ReadFile or read calls.
fread into a large buffer
ifstream.read into a large buffer
ifstream and extractors
A program like this should be I/O bound, meaning it should be spending at least 80% of its time waiting for completion of reading or writing a buffer, and if the buffers are reasonably big, it should be keeping the disk heads busy. That's what you want.
Don't assume it is I/O bound, without proof. A way to prove it is by taking several stackshots. If it is, most of the samples will show the program waiting for I/O completion.
It is possible that it is not I/O bound, meaning you may find other things going on in some of the samples that you never expected. If so, then you know what to fix to speed it up. I have seen some code like this spending much more time than necessary in the merge loop, testing for end-of-file, getting data to compare, etc. for example.
You can just use the read function of an ifstream to read large blocks.
http://www.cplusplus.com/reference/iostream/istream/read/
The second parameter is the number of bytes. You should make this a multiple of 4 in your case - maybe 4096? :)
Simply read a chunk at a time and work on it.
As martin-york said, this may not have any beneficial effect on your performance, but try it and find out.
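A sketch of that chunked loop, assuming the file holds raw ints (the 4096-byte block is just the size suggested above):

#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    std::ifstream input("input1.bin", std::ios::binary);
    std::vector<int> chunk(4096 / sizeof(int));   // one 4096-byte block

    while (input)
    {
        input.read(reinterpret_cast<char*>(chunk.data()),
                   chunk.size() * sizeof(int));
        std::size_t got = input.gcount() / sizeof(int);
        for (std::size_t i = 0; i < got; ++i)
        {
            // work on chunk[i] here
        }
    }
}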
I think it is very likely that you can improve performance by reading big chunks.
Try opening the file with ios::binary as an argument, then use istream::read to read the data.
If you need maximum performance, I would actually suggest skipping iostreams altogether, and using cstdio instead. But I guess this is not what you want.
Unless there is something very special about your data it is unlikely that you will improve on the buffering that is built into the std::fstream object.
The std::fstream objects are designed to be very efficient for general-purpose file access. It does not sound like you are doing anything special by accessing the data 4 bytes at a time. You can always profile your code to see where the time is actually spent.
Maybe if you share the code with us we could spot some major inefficiencies.
Edit:
I don't like your algorithm. Seeking back and forward may be hard on the stream, especially if the number lies over a buffer boundary. I would only read one number each time through the loop.
Try this:
Note: This is not optimal (and it assumes stream input of numbers, while yours looks binary), but I am sure you can use it as a starting point.
#include <fstream>
#include <iostream>

// Return the current val (that was the smaller value)
// and replace it with the next value in the stream.
int getNext(int& val, std::istream& str)
{
    int result = val;
    str >> val;
    return result;
}

int main()
{
    std::ifstream f1("f1.txt");
    std::ifstream f2("f2.txt");
    std::ofstream re("result");

    int v1;
    int v2;
    f1 >> v1;
    f2 >> v2;

    // While there are values in both streams,
    // output one value and replace it using getNext().
    while (f1 && f2)
    {
        // Note: the parentheses are required; without them operator<<
        // binds tighter than ?:, and the bool (v1 < v2) would be
        // written instead of the number. The trailing ' ' keeps the
        // output numbers separated.
        re << ((v1 < v2) ? getNext(v1, f1) : getNext(v2, f2)) << ' ';
    }

    // At this point one (or both) stream(s) is(are) empty.
    // So dump the other stream.
    for (; f1; f1 >> v1)
    {
        // Note: if the stream is at the end it will
        // never enter the loop
        re << v1 << ' ';
    }
    for (; f2; f2 >> v2)
    {
        re << v2 << ' ';
    }
}
}