C++ overhead from string concatenation

I'm reading in a text file of random ASCII from an ifstream. I need to be able to put the whole message into a string type for character parsing. My current solution works, but I think I'm murdering process time on the lengthier files by using the equivalent of this:
std::string result;
for (std::string line; std::getline(std::cin, line); )
{
    result += line;
}
I'm concerned about the overhead associated with concatenating strings like this (this is happening a few thousand times, with messages tens of thousands of characters long). I've spent the last few days browsing different potential solutions, but nothing quite fits... I don't know the length of the message ahead of time, so I don't think using a dynamically sized character array is my answer.
I read through this SO thread, which sounded almost applicable, but it still left me unsure.
Any suggestions?

The problem really is that you don't know the full size ahead of time, so you cannot allocate memory appropriately. I would expect that the performance hit you see is related to that, not to the way strings are concatenated, since concatenation itself is done efficiently by the standard library.
Thus, I would recommend deferring concatenation until you know the full size of your final string. That is, you start by storing all your strings in a big vector as in:
using namespace std;
vector<string> allLines;
size_t totalSize = 0;
// If you can have access to the total size of the data you want
// to read (size of the input file, ...) then just initialize totalSize
// and use only the second code snippet below.
for (string line; getline(cin, line); )
{
    allLines.push_back(line);
    totalSize += line.size();
}
Then, you can create your big string knowing its size in advance:
string finalString;
finalString.reserve(totalSize);
for (vector<string>::iterator itS = allLines.begin(); itS != allLines.end(); ++itS)
{
    finalString += *itS;
}
Although, I should mention that you should do this only if you experience performance issues. Don't try to optimize things that don't need it; otherwise you will complicate your program with no noticeable benefit. Places where we need to optimize are often counterintuitive and can vary from environment to environment. So do it only if your profiling tool tells you that you need to.

If you know the file size, use result's member function 'reserve()' once.
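For example, here is a minimal sketch of that idea, assuming the text really does come from a file (so its size can be queried by opening the stream at the end); the helper name readWholeFile is just for illustration:
#include <cstddef>
#include <fstream>
#include <string>

std::string readWholeFile(const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    std::string result;
    result.reserve(static_cast<std::size_t>(in.tellg()));      // one allocation for the whole file
    in.seekg(0);                                                // rewind and read normally

    for (std::string line; std::getline(in, line); )
        result += line;
    return result;
}
The reserved size includes the newline characters that getline() strips, so it over-reserves slightly, which is harmless.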

I'm too sleepy to put together any solid data for you but, ultimately, without knowing the size ahead of time you're always going to have to do something like this. And the truth is that your standard library implementation is smart enough to handle string resizing fairly well. (That's despite the fact that there's no exponential growth guarantee for std::string, the way there is for std::vector.)
So although you may see unwanted re-allocations the first fifty or so iterations, after a while, the re-allocated block becomes so large that re-allocations become rare.
If you profile and find that this is still a bottleneck, perhaps use std::string::reserve yourself with a typical quantity.

You're copying the result string for every line in the file (as you expand result). Instead, pre-allocate the result and grow it exponentially:
std::string result;
result.reserve(1024); // pre-allocate a typical size
for (std::string line; std::getline(std::cin, line); )
{
    // every time we run out of space, double the available space
    while (result.capacity() < result.length() + line.length())
        result.reserve(result.capacity() * 2);
    result += line;
}

Related

Memory and time issue when reading/writing from a file

I am trying to solve a school problem and my solution works, but it should run faster and use less memory if possible. Can you please help me achieve that?
Problem statement: Read a natural number N and a string from a file, and output in another file the same string N number of times.
Example of input file:
3
dog
Example of output file:
dog
dog
dog
Restrictions:
1 ≤ n ≤ 50, and the line to be read is at most 1,000,000 characters long
Time limit: 0.27 seconds
This is what I tried (but the run time exceeds the limit):
#include <fstream>
using namespace std;

ifstream cin("afisaren.in");
ofstream cout("afisaren.out");

short n;
char s[1000005];

int main() {
    cin >> n;
    cin >> s;
    while (n) {
        cout << s << '\n';
        n--;
    }
    cin.close();
    cout.close();
    return 0;
}
Generally, when given this type of problem, you should profile your own code to see which part of the code is consuming what amount of time. This can mostly be done by adding a few calls to a timekeeping function before and after code execution, to see how long it was executing. However, this is not so easy with your code, since one of the biggest problems (optimisation-wise) is your char s[1000005]; line. The memory will be allocated before executing your main() function, which is operating-system dependent (or rather depends on the libc and compiler used).
So first, do not use pre-allocated char arrays. You're using C++! Why not simply read the text into a std::(w)string or any of the C++ classes that do dynamic memory allocation (and will not crash your program if the line length exceeds 1,000,000)?
And second, the C++ std::streams can end up flushing to disk for every line written (certainly when std::endl is used), which is highly inefficient unless your text is exactly the same size as the block size of the underlying file system. To optimize this, create a memory object (e.g. a std::string) and copy your text into it k times, where k = fs-block-size / text-length. fs-block-size will most likely be 1024, 2048 or 4096 bytes. There are system calls to find that out, but performance will usually not be affected too much when writing twice (or 4x) the fs block size, so you can safely assume it to be 4096 for close-to-maximum performance.
Since the maximum number of repetitions is 1 ≤ n ≤ 50 and the line length is up to 1,000,000 characters (approx. 1 MiB if ASCII), the maximum size of the output will be about 50,000,000 characters. You could also build everything in memory and then write it all in one call to write(). This would probably be the most efficient way in terms of disk activity, but obviously not regarding memory consumption.
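Putting those suggestions together, a hedged sketch for the problem above: a std::string instead of the fixed char array, the whole output built in memory, and a single write() call at the end.
#include <fstream>
#include <string>

int main() {
    std::ifstream in("afisaren.in");
    std::ofstream out("afisaren.out", std::ios::binary);

    int n;
    std::string s;
    in >> n >> s;
    s += '\n';

    // Build the whole output in memory (at most ~50 MB here) ...
    std::string buffer;
    buffer.reserve(s.size() * n);
    for (int i = 0; i < n; ++i)
        buffer += s;

    // ... and hand it to the stream in one call.
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    return 0;
}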
I'm not a C++ expert, but I had a similar problem when I used C++-style file streams. After googling a bit, I tried switching to C-style file I/O and it boosted my performance a lot, because the C++ file streams copy the file contents into an internal buffer and that takes time. You can try it C-style, although using C facilities in C++ is usually not recommended.
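As a rough illustration of that C-style switch for the same problem (this keeps a fixed buffer only to mirror the original program; the field width guards against overflow):
#include <cstdio>

int main() {
    std::FILE* in  = std::fopen("afisaren.in", "r");
    std::FILE* out = std::fopen("afisaren.out", "w");

    int n;
    static char s[1000005];
    std::fscanf(in, "%d %1000000s", &n, s);   // width limit keeps the read inside the buffer

    while (n--) {
        std::fputs(s, out);
        std::fputc('\n', out);
    }

    std::fclose(in);
    std::fclose(out);
    return 0;
}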

How can I make my C++ program for Trie structure construction faster?

I'm using C++.
My program reads a 200-thousand-line text file and builds a trie structure.
Can I save the trie, or make construction faster than it is now?
Here is the code of the function that reads data from the file and builds the structure.
void buildDictionary(pTrie* root, string name) {
    wifstream r_dic;
    r_dic.imbue(locale("kor"));
    r_dic.open(name, ios::binary);
    if (r_dic.fail()) {
        cout << name << " open failed" << endl;
        exit(-1);
    }

    wchar_t wch[256];
    wstring p1, p2;
    while (r_dic >> wch >> p1 >> p2) {
        pTrie* pt = (*root).insert(splitJamo(wch).c_str(), p1 + L' ' + p2);
        pt->addArche(wch);
    }
    r_dic.close();
}
Below are results of a profiling run.
Your profile output suggests that the first area to optimize is the file reading. Specifically:
wchar_t wch[256];
wstring p1, p2;
while (r_dic >> wch >> p1 >> p2) {
    pTrie* pt = (*root).insert(splitJamo(wch).c_str(), p1 + L' ' + p2);
    pt->addArche(wch);
}
This reads three strings repeatedly. wch is read into a character array, but then passed to splitJamo() which returns a wstring, which requires memory allocation. That might be a bit slow, but I can't tell because you haven't shown the code for splitJamo().
You read p1 and p2 and immediately concatenate them with a space. This is inefficient: they were separated by whitespace in the input file, and you read them separately, allocating memory for them, but then put them back together again.
Assuming the three strings appear on each line of the input file, I'd read it like this:
wchar_t wch[256];
wstring p1p2;
while (r_dic >> wch >> std::ws && std::getline(r_dic, p1p2)) {
    // std::ws skips the separating whitespace so p1p2 does not start with a space
    pTrie* pt = root->insert(splitJamo(wch), p1p2);
    pt->addArche(wch);
}
This reads p1 and p2 together, which should be an improvement. A further improvement might be to use getline() to read the entire line at once, but we can't tell without seeing the code for splitJamo() and insert().
Also note I removed c_str() from the first argument to insert() because I assume it probably takes a wstring, so we avoid constructing a new one this way. But if it requires wchar_t*, you can put back the c_str().
A general rule about software performance: whatever you guessed to be the reason for your program's performance issues, you are probably wrong. Use a tool instead of guessing.
In the domain of performance optimization, the first tool to use is a profiler. Choose one, run your program under its control, then analyze the profiler's report on hotspots (ask on SO if the report is hard to grasp; that is expected), form a hypothesis based on the profiler's data, change your program according to the hypothesis, rerun and remeasure, and repeat until you are satisfied with the improvements.
There are a number of profilers out there, integrated into IDEs (Microsoft Visual Studio, and possibly something in Xcode), integrated into OSes (Linux perf), or standalone (Intel VTune).
As far as I can tell, you suspect I/O to be the reason for the slowness, but you are very likely to be wrong. It may be memory allocation inefficiency, locale transformations, overuse of string operations, etc. Only the hard evidence of a profiler is a safe basis for making progress with optimization.

Read a big file by lines in C++

I have a big file, nearly 800 MB, and I want to read it line by line.
At first I wrote my program in Python, using linecache.getlines:
lines = linecache.getlines(fname)
It takes about 1.2 s.
Now I want to port my program to C++.
I wrote this code:
std::ifstream DATA(fname);
std::string line;
vector<string> lines;
while (std::getline(DATA, line)) {
    lines.push_back(line);
}
But it's slow (it takes minutes). How can I improve it?
Joachim Pileborg mentioned mmap(), and on Windows CreateFileMapping() will work.
My code runs under VS2013; in "DEBUG" mode it takes 162 seconds, and in "RELEASE" mode only 7 seconds!
(Great Thanks To #DietmarKühl and #Andrew)
First of all, you should probably make sure you are compiling with optimizations enabled. This might not matter for such a simple algorithm, but that really depends on your vector/string library implementations.
As suggested by #angew, std::ios_base::sync_with_stdio(false) makes a big difference on routines like the one you have written.
Another, lesser, optimization would be to use lines.reserve() to preallocate your vector so that push_back() doesn't result in huge copy operations. However, this is most useful if you happen to know in advance approximately how many lines you are likely to receive.
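As a hedged sketch, here are those two tweaks applied to the loop from the question (the reserve count is a pure guess, and sync_with_stdio only matters if the data actually arrives via std::cin):
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::ios_base::sync_with_stdio(false);  // only relevant when reading from std::cin

    std::ifstream DATA("big.txt");          // hypothetical file name
    std::vector<std::string> lines;
    lines.reserve(8000000);                 // rough guess: ~100 bytes per line for 800 MB

    std::string line;
    while (std::getline(DATA, line))
        lines.push_back(std::move(line));   // move instead of copying each line
    return 0;
}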
Using the optimizations suggested above, I get the following results for reading an 800MB text stream:
20 seconds ## if average line length = 10 characters
3 seconds ## if average line length = 100 characters
1 second ## if average line length = 1000 characters
As you can see, the speed is dominated by per-line overhead. This overhead is primarily occurring inside the std::string class.
It is likely that any approach based on storing a large quantity of std::string will be suboptimal in terms of memory allocation overhead. On a 64-bit system, std::string will require a minimum of 16 bytes of overhead per string. In fact, it is very possible that the overhead will be significantly greater than that -- and you could find that memory allocation (inside of std::string) becomes a significant bottleneck.
For optimal memory use and performance, consider writing your own routine that reads the file in large blocks rather than using getline(). Then you could apply something similar to the flyweight pattern to manage the indexing of the individual lines using a custom string class.
P.S. Another relevant factor will be the physical disk I/O, which might or might not be bypassed by caching.
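For the block-reading and line-indexing idea, a hedged sketch that stores one big buffer plus line offsets instead of a std::string per line (a plain offset index rather than a full flyweight string class; the file name is made up):
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    // Slurp the whole file into one contiguous buffer.
    std::ifstream in("big.txt", std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    // Remember where each line starts; no per-line allocations.
    std::vector<std::size_t> lineStarts(1, 0);
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] == '\n' && i + 1 < data.size())
            lineStarts.push_back(i + 1);

    // Line k is the range [lineStarts[k], lineStarts[k + 1]) within data
    // (up to data.size() for the last line), including its trailing '\n'.
    return 0;
}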
For C++ you could try something like this:
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

void processData(const string& str)
{
    vector<string> arr;
    boost::split(arr, str, boost::is_any_of(" \n"));
    do_some_operation(arr);   // placeholder for the real processing
}

int main()
{
    const size_t read_bytes = 45 * 1024 * 1024;
    const char* fname = "input.txt";
    ifstream fin(fname, ios::in);

    vector<char> memblock(read_bytes);   // allocate the chunk buffer once
    while (fin.read(memblock.data(), static_cast<streamsize>(read_bytes)) || fin.gcount() > 0)
    {
        // construct the string from the bytes actually read, not from the whole
        // buffer (the final chunk is usually shorter than read_bytes); note that
        // a line straddling a chunk boundary will still be split across two calls
        string str(memblock.data(), static_cast<size_t>(fin.gcount()));
        processData(str);
    }
    return 0;
}

Read large amount of ASCII numbers and write in binary form

I have data files with about 1.5 GB worth of floating-point numbers stored as ASCII text separated by whitespace, e.g., 1.2334 2.3456 3.4567 and so on.
Before processing such numbers I first translate the original file to binary format. This is helpful because I can choose whether to use float or double, reduce file size (to about 800 MB for double and 400 MB for float), and read in chunks of the appropriate size once I am processing the data.
I wrote the following function to make the ASCII-to-binary translation:
template<typename RealType=float>
void ascii_to_binary(const std::string& fsrc, const std::string& fdst){
    RealType value;
    std::fstream src(fsrc.c_str(), std::fstream::in | std::fstream::binary);
    std::fstream dst(fdst.c_str(), std::fstream::out | std::fstream::binary);
    while(src >> value){
        dst.write((char*)&value, sizeof(RealType));
    }
    // RAII closes both files
}
I would like to speed up ascii_to_binary, but I can't seem to come up with anything. I tried reading the file in chunks of 8192 bytes and then processing the buffer in another subroutine. This seems very complicated, because the last few characters in the buffer may be whitespace (in which case all is good) or a truncated number (which is very bad); the logic to handle the possible truncation hardly seems worth it.
What would you do to speed up this function? I would rather rely on standard C++ (C++11 is OK) with no additional dependencies, like boost.
Thank you.
Edit:
#DavidSchwarts:
I tried to implement your suggestion as follows:
template<typename RealType=float>
void ascii_to_binary(const std::string& fsrc, const std::string& fdst){
    std::vector<RealType> buffer;
    typedef typename std::vector<RealType>::iterator VectorIterator;
    buffer.reserve(65536);
    std::fstream src(fsrc, std::fstream::in | std::fstream::binary);
    std::fstream dst(fdst, std::fstream::out | std::fstream::binary);
    while(true){
        size_t k = 0;
        while(k < 65536 && src >> buffer[k]) k++;
        dst.write((char*)&buffer[0], buffer.size());
        if(k < 65536){
            break;
        }
    }
}
But it does not seem to be writing the data! I'm working on it...
I did exactly the same thing, except that my fields were separated by tabs ('\t') and I also had to handle non-numeric comments at the end of each line and header rows interspersed with the data.
Here is the documentation for my utility.
And I also had a speed problem. Here are the things I did to improve performance by around 20x:
Replace explicit file reads with memory-mapped files. Map two blocks at once. When you are in the second block after processing a line, remap with the second and third blocks. This way a line that straddles a block boundary is still contiguous in memory. (This assumes that no line is larger than a block; you can probably increase the block size to guarantee this.)
Use SIMD instructions such as _mm_cmpeq_epi8 to search for line endings or other separator characters. In my case, any line containing an '=' character was a metadata row that needed different processing.
Use a barebones number-parsing function (I used a custom one for parsing times in HH:MM:SS format; strtod and strtol are perfect for grabbing ordinary numbers, and a sketch of that follows below). These are much faster than the istream formatted extraction functions.
Use the OS file write API instead of the standard C++ API.
If you dream of throughput in the 300,000 lines/second range, then you should consider a similar approach.
Your executable also shrinks when you don't use the C++ standard streams. Mine is 205 KB, including a graphical interface, and depends only on DLLs that ship with Windows (no MSVCRTxx.dll needed). And, looking again, I am still using C++ streams for status reporting.
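To illustrate the barebones-parsing point, a minimal sketch using strtod over an in-memory buffer; the function name and the assumption of a NUL-terminated buffer (e.g. a copy of the mapped block) are mine:
#include <cstdlib>
#include <vector>

std::vector<double> parse_numbers(const char* text)
{
    std::vector<double> values;
    const char* p = text;
    char* end = nullptr;
    for (;;) {
        double v = std::strtod(p, &end);  // skips leading whitespace, reports where it stopped
        if (end == p)                     // no further number could be parsed
            break;
        values.push_back(v);
        p = end;
    }
    return values;
}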
Aggregate the writes into a fixed buffer, using a std::vector of RealType. Your logic should work like this:
Allocate a std::vector<RealType> with 65,536 default-constructed entries.
Read up to 65,536 entries into the vector, replacing the existing entries.
Write out as many entries as you were able to read in.
If you read in exactly 65,536 entries, go to step 2.
Stop, you are done.
This will prevent you from alternating reads and writes to two different files, minimizing the seek activity significantly. It will also allow you to make far fewer write calls, reducing copying and buffering logic.
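A minimal sketch of that loop, under the same assumptions as the asker's template; the two fixes compared with the edit above are a sized vector (reserve() alone does not create elements, so operator[] is invalid) and a write() of the bytes actually read:
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

template<typename RealType = float>
void ascii_to_binary(const std::string& fsrc, const std::string& fdst)
{
    std::vector<RealType> buffer(65536);   // 65,536 default-constructed entries
    std::ifstream src(fsrc);
    std::ofstream dst(fdst, std::ios::binary);

    for (;;) {
        std::size_t k = 0;
        while (k < buffer.size() && src >> buffer[k])
            ++k;
        // write the k values that were actually read, counted in bytes
        dst.write(reinterpret_cast<const char*>(buffer.data()),
                  static_cast<std::streamsize>(k * sizeof(RealType)));
        if (k < buffer.size())             // short batch: the input is exhausted
            break;
    }
}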

How to optimize input/output in C++

I'm solving a problem which requires very fast input/output. More precisely, the input data file will be up to 15 MB. Is there a fast way to read/print integer values?
Note: I don't know if it helps, but the input file has the following form:
line 1: a number n
line 2..n+1: three numbers a,b,c;
line n+2: a number r
line n+3..n+4+r: four numbers a,b,c,d
Note 2: The input file will be stdin.
Edit: Something like the following isn't fast enough:
void fast_scan(int &n) {
    char buffer[10];
    gets(buffer);
    n = atoi(buffer);
}

void fast_scan_three(int &a, int &b, int &c) {
    char buffval[3][20], buffer[60];
    gets(buffer);
    int n = strlen(buffer);
    int buffindex = 0, curindex = 0;
    for (int i = 0; i < n; ++i) {
        if (!isdigit(buffer[i]) && !isspace(buffer[i])) break;
        if (isspace(buffer[i])) {
            buffindex++;
            curindex = 0;
        } else {
            buffval[buffindex][curindex++] = buffer[i];
        }
    }
    a = atoi(buffval[0]);
    b = atoi(buffval[1]);
    c = atoi(buffval[2]);
}
The general input/output optimization principle is to perform as few I/O operations as possible, reading/writing as much data as possible per operation.
So a performance-aware solution typically looks like this:
Read all data from the device into a buffer (using the principle mentioned above)
Process the data, generating the results into a buffer (in place or in another one)
Output the results from the buffer to the device (using the principle mentioned above)
E.g. you could use std::basic_istream::read to read data in big chunks instead of line by line. The same idea applies to output: generate a single result string, adding line-feed characters manually, and output it all at once.
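A hedged skeleton of that read-everything / process / write-everything pattern for stdin (the actual parsing is left as a comment):
#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::ios_base::sync_with_stdio(false);

    // 1. Read all of stdin in one go instead of token by token.
    std::string input((std::istreambuf_iterator<char>(std::cin)),
                      std::istreambuf_iterator<char>());

    // 2. Parse `input` and append the results to a single output buffer,
    //    adding '\n' manually.
    std::string output;

    // 3. Emit everything with one write.
    std::cout.write(output.data(), static_cast<std::streamsize>(output.size()));
    return 0;
}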
If you want to minimize the physical I/O overhead, load the whole file into memory with a technique called memory-mapped files. I doubt you'll get a noticeable performance gain, though; parsing will most likely be a lot costlier.
Consider using threads. Threading is useful for lots of things, but this is exactly the kind of problem that motivated the invention of threads.
The underlying idea is to separate the input, processing, and output, so these different operations can run in parallel. Do it right and you will see a significant speedup.
Have one thread doing close to pure input. It reads lines into a buffer of lines. Have a second thread do a quick pre-parse and organize the raw input into blocks. You have two things that need to be parsed: the line that contains the number of lines that contain triples, and the line that contains the number of lines that contain quads. This thread forms the raw input into blocks that are still mostly text. A third thread parses the triples and quads, re-forming the input into fully parsed structures. Since the data are now organized into independent blocks, you can have multiple instances of this third operation so as to better take advantage of the multiple processors on your computer. Finally, other threads will operate on these fully parsed structures. Note: it might be better to combine some of these operations, for example combining the input and pre-parsing operations into one thread.
Put several input lines in a buffer, split them, and then parse them simultaneously in different threads.
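A minimal sketch of the reader/parser split described above, with one reader and one parser thread and an unbatched line queue (real code would hand over blocks of lines and run several parser instances):
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

int main() {
    std::queue<std::string> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Reader: does nothing but pull lines off stdin and queue them.
    std::thread reader([&] {
        for (std::string line; std::getline(std::cin, line); ) {
            std::lock_guard<std::mutex> lock(m);
            q.push(std::move(line));
            cv.notify_one();
        }
        std::lock_guard<std::mutex> lock(m);
        done = true;
        cv.notify_one();
    });

    // Parser: consumes queued lines as they arrive.
    std::thread parser([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !q.empty() || done; });
            if (q.empty() && done)
                break;
            std::string line = std::move(q.front());
            q.pop();
            lock.unlock();
            // ... parse the integers in `line` here ...
        }
    });

    reader.join();
    parser.join();
    return 0;
}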
It's only 15MB. I would just slurp the whole thing into a memory buffer, then parse it.
The parsing looks something like this, approximately:
#define DIGIT(c) ((c) >= '0' && (c) <= '9')

while (*p == ' ') p++;
if (DIGIT(*p)) {
    a = 0;
    while (DIGIT(*p)) {
        a *= 10;
        a += (*p++ - '0');
    }
}
// and so on...
You should be able to write this kind of code in your sleep.
I don't know if that's any faster than atoi, but it's not fussing around figuring out where numbers begin and end. I would stay away from scanf because it goes through the whole nine yards of figuring out its format string.
If you run this whole thing in a loop 1000 times and grab some stack samples, you should see that it is spending nearly 100% of its time reading in the file and generating output (which you didn't mention).
You just can't beat that.
If you do see that noticeable time is spent in the actual parsing, it might be possible to do overlapped I/O, but the machine would have to be really slow, or the I/O really fast (like from a solid state drive) before that would make sense.