I would like to know the fastest way to write the contents of a vector<string> to a text file, as the data runs to a few million lines.
I have tried ofstream (<<) in C++ as well as fprintf in C, but when I timed how long each takes to generate the required file, there was little difference in performance between them.
vector<string> OBJdata;
OBJdata = assembleOBJ(pointer, vertexCount, facePointer);
FILE * objOutput;
objOutput = fopen("sample.obj", "wt");
for (size_t i = 0; i < OBJdata.size(); i++)
    fwrite(OBJdata[i].data(), 1, OBJdata[i].length(), objOutput); // write the characters, not the string object
There is no "best". There are only options with different advantages and disadvantages, both of which vary with your host hardware (e.g. writing to a high performance drive will be faster than writing to a slower one), file system, and device drivers (a disk driver implementation can trade off performance to increase the chances of data being correctly written to the drive).
Generally, however, manipulating data in memory is faster than transferring it to or from a device like a hard drive. There are limits to this: with virtual memory, data in physical memory may in some circumstances be paged out to disk.
So, assuming you have sufficient RAM and a fast CPU, an approach like the following may help:
// assume your_stream is an object of type derived from ostream
// THRESHOLD is a large-ish positive integer
std::string buffer;
buffer.reserve(THRESHOLD);
for (std::vector<std::string>::const_iterator i = yourvec.begin(), end = yourvec.end(); i != end; ++i)
{
    if (buffer.length() + i->length() + 1 >= THRESHOLD)
    {
        your_stream << buffer;   // flush the accumulated lines in one operation
        buffer.resize(0);
    }
    buffer.append(*i);
    buffer.append(1, '\n');
}
your_stream << buffer;           // write whatever is left
The strategy here is reducing the number of distinct operations that write to the stream. As a rule of thumb, a larger value of THRESHOLD will reduce the number of distinct output operations, but will also consume more memory, so there is usually a sweet spot somewhere in terms of performance. The problem is, that sweet spot depends on the factors I mentioned above (hardware, file system, device drivers, etc). So this approach is worth some effort to find the sweet spot only if you KNOW the exact hardware and host system configuration your program will run on (or you KNOW that the program will only be executed in a small range of configurations). It is not worth the effort if you don't know these things, since what works with one configuration will often not work for another.
Under Windows, you might want to use Win32 API functions to work with the file (CreateFile(), WriteFile(), etc) rather than C++ streams. That might give small performance gains, but I wouldn't hold my breath.
You may want to take a look at writev that allows you to write multiple elements at once - thus taking better advantage of the buffering. See: http://linux.die.net/man/2/writev
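As a rough illustration of the writev suggestion, here is a minimal sketch for POSIX systems. The helper name write_lines_writev and the batch size of 256 lines per call are assumptions made for this example, not part of any library; the batch size is chosen to stay under the IOV_MAX limit (1024 on Linux), and a short write is simply reported as failure rather than retried.

```cpp
#include <sys/uio.h>   // writev, struct iovec (POSIX)
#include <unistd.h>    // ssize_t and the usual POSIX file-descriptor calls
#include <algorithm>   // std::min
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: writes every string in `lines` to `fd`, each followed
// by '\n', gathering many strings into a single writev() call.
bool write_lines_writev(int fd, const std::vector<std::string>& lines)
{
    assert(fd >= 0);
    static char newline = '\n';
    const std::size_t lines_per_call = 256;  // 2 iovec entries per line -> 512 per call
    std::vector<iovec> iov;
    iov.reserve(2 * lines_per_call);

    for (std::size_t first = 0; first < lines.size(); first += lines_per_call) {
        const std::size_t last = std::min(first + lines_per_call, lines.size());
        iov.clear();
        std::size_t expected = 0;
        for (std::size_t i = first; i < last; ++i) {
            iovec text;
            text.iov_base = const_cast<char*>(lines[i].data());  // writev only reads from it
            text.iov_len = lines[i].size();
            iovec eol;
            eol.iov_base = &newline;
            eol.iov_len = 1;
            iov.push_back(text);
            iov.push_back(eol);
            expected += lines[i].size() + 1;
        }
        const ssize_t written = writev(fd, iov.data(), static_cast<int>(iov.size()));
        if (written < 0 || static_cast<std::size_t>(written) != expected)
            return false;  // error or short write; a robust version would retry the rest
    }
    return true;
}
```

The strings are never copied into an intermediate buffer; the kernel gathers them directly. Whether this beats a large user-space buffer is something to measure on the target system.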
I am trying to make a simple file-based hash table. Here is my insert member function:
private: std::fstream f;  // std::ios::in | std::ios::out | std::ios::binary
public: void insert(const char* this_key, long this_value) {
    char that_key[KEY_SIZE];  // buffer for the key stored in the probed block
    long that_value;
    long this_hash = std::hash<std::string>{}(this_key) % M;
    long that_hash;  // also block status
    long block = this_hash;
    long offset = block * BLOCK_SIZE;
    while (true) {
        this->f.seekg(offset);
        this->f.read((char*) &that_hash, sizeof(long));
        if (that_hash > -1) {  // -1 (by default) indicates a never allocated block
            this->f.read(that_key, KEY_SIZE);
            if (strcmp(this_key, that_key) == 0) {
                this->f.seekp(this->f.tellg());  // key found: overwrite its value
                this->f.write((char*) &this_value, sizeof(long));
                break;
            } else {
                block = (block + 1) % M;  // linear probing
                offset = block * BLOCK_SIZE;
            }
        } else {
            this->f.seekp(offset);  // free block: store status, key and value
            this->f.write((char*) &this_hash, sizeof(long));  // as block status
            this->f.write(this_key, KEY_SIZE);
            this->f.write((char*) &this_value, sizeof(long));
            break;
        }
    }
}
Tests with up to 10M key/value pairs and 50,000,017 blocks ran fine (the binary file was about 3.8GB).
However, a test with 50M keys and 250,000,013 blocks slows down dramatically (the binary file was more than 19GB in this case). 1,000 inserts usually take 4~5ms, but occasionally take more than 2,000ms. It then gets slower and slower, until 1,000 inserts take 40~150ms (x10 ~ x30 slower). I have no idea why.
What causes this exceptional slowdown in binary file I/O?
Are seekg/seekp and other I/O operations affected by file size? (I could not find any references on this question.)
How do key/value stores and databases avoid this I/O slowdown?
How can I solve this problem?
Data size
Disk block sizes are usually a power of 2, so if your data block size is also a power of 2, you can essentially eliminate the case where a data block crosses a disk block boundary.
In your case, a block size of 64 bytes (or 32 bytes if you don't really need to store the hash) might perform a bit better.
Insertion order
The other thing you could do to improve performance is to do your insertions in increasing hash order, to reduce the number of times data must be loaded from the disk.
Generally, when data is read from or written to the disk, the OS reads or writes a large chunk at a time (maybe 4k), so if your algorithm touches nearby locations close together in time, you will reduce the number of times data must actually be read from or written to the disk.
Given that you make a lot of insertions, one possibility would be to process them in batches of, say, 1000 or even 10000 key/value pairs at a time. Essentially, you would accumulate data in memory, sort it, and once you have enough items (or when you are done inserting), write the data in order.
That way, you should be able to reduce disk access, which is very slow. This is probably even more important if you are using a traditional hard drive, as moving the head is slow (in which case, it might be useful to defragment it). Also, be sure that your hard drive has more than enough free space.
In some cases, local caching (in your application) might also be helpful, particularly if you know how your data is used.
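The batching idea above could be sketched roughly as follows. BatchedInserter and its callback are hypothetical names invented for this example; the callback stands in for the existing file-based insert, and the bucket is computed with std::hash modulo the bucket count, as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: collects key/value pairs in memory and hands them to
// the real insert routine in increasing bucket order, one batch at a time.
class BatchedInserter {
public:
    using InsertFn = std::function<void(const std::string&, long)>;

    BatchedInserter(std::size_t bucket_count, std::size_t batch_size, InsertFn insert)
        : bucket_count_(bucket_count), batch_size_(batch_size), insert_(std::move(insert))
    {
        assert(bucket_count_ > 0 && batch_size_ > 0);
    }

    void add(const std::string& key, long value)
    {
        pending_.push_back(Item{bucket_of(key), key, value});
        if (pending_.size() >= batch_size_)
            flush();
    }

    // Sorts the pending items by bucket and inserts them in that order.
    // stable_sort keeps repeated keys in arrival order, so the last value wins.
    void flush()
    {
        std::stable_sort(pending_.begin(), pending_.end(),
                         [](const Item& a, const Item& b) { return a.bucket < b.bucket; });
        for (const Item& item : pending_)
            insert_(item.key, item.value);
        pending_.clear();
    }

    std::size_t bucket_of(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % bucket_count_;
    }

private:
    struct Item { std::size_t bucket; std::string key; long value; };
    std::size_t bucket_count_;
    std::size_t batch_size_;
    InsertFn insert_;
    std::vector<Item> pending_;
};
```

Call flush() once more after the last add() so that a partially filled batch is not lost. The batch size is a tuning parameter; larger batches give longer sorted runs at the cost of memory.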
File size VS collisions
When you use a hash, you want to find the sweet spot between file size and collisions. If you have too many collisions, you will waste a lot of time, and at some point performance may degenerate because it becomes hard to find a free slot for almost every insertion.
On the other hand, if your file is very large, you might end up filling your RAM with data that is mostly empty and still need to replace it with data from the disk on almost every insertion.
For example, if your data is 20GB and you can hold, say, 2GB in memory, then if inserts are truly random, about 90% of them may need a real access to the hard drive.
The options here depend on the OS and are beyond the scope of a programming forum. If you want to optimize your computer, you should look elsewhere.
It might be helpful to read about operating systems (file system, cache layer…) and algorithms (external sorting algorithms, B-Tree and other structures) to get a better understanding.
Extra RAM
Fast SSD
Multithreading (for ex. input and output threads)
Rewriting of the algorithm (for ex. to read/write a whole disk page at once)
Faster CPU / 64 bit computer
Using algorithms designed for such scenarios.
Using a database.
Profiling code
Tuning parameters
I'm trying to read the data contained in a .dat file with size ~1.1GB.
Because I'm doing this on a machine with 16GB of RAM, I thought it would not be a problem to read the whole file into memory at once and only process it afterwards.
To do this, I employed the slurp function found in this SO answer.
The problem is that the code sometimes, but not always, throws a bad_alloc exception.
Looking at the task manager I see that there are always at least 10GB of free memory available, so I don't see how memory would be an issue.
Here is the code that reproduces this error
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
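int main()
{
    ifstream file;
    file.open("data.dat");  // placeholder name: the original path is not shown
    if (!file.is_open())
        cerr << "The file was not found\n";
    stringstream sstr;
    sstr << file.rdbuf();
    string text = sstr.str();
    cout << "Successfully read file!\n";
    return 0;
}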
What could be causing this problem?
And what are the best practices to avoid it?
The fact that your system has 16GB doesn't mean any program at any time can allocate a given amount of memory. In fact, this might work on a machine that has only 512MB of physical RAM, if enough swap is available, or it might fail on an HPC node with 128GB of RAM; it's entirely up to your operating system to decide how much memory is available to you here.
I'd also argue that std::string is never the data type of choice if actually dealing with a file, possibly binary, that large.
The point here is that there is absolutely no knowing how much memory stringstream tries to allocate. A pretty reasonable algorithm would double the amount of memory allocated every time the allocated internal buffer becomes too small to contain the incoming bytes. Also, libc++/libc will probably also have their own allocators that will have some allocation overhead, here.
Note that stringstream::str() returns a copy of the data contained in the stringstream's internal state, again leaving you with at least 2.2 GB of heap used up for this task.
Really, if you need to deal with data from a large binary file as something that you can access with the index operator [], look into memory mapping your file; that way, you get a pointer to the beginning of the file, and might work with it as if it was a plain array in memory, letting your OS take care of handling the underlying memory/buffer management. It's what OSes are for!
If you didn't know Boost before, it's kind of "the extended standard library for C++" by now, and of course, it has a class abstracting memory mapping a file: mapped_file.
The file I'm reading contains a series of data in ASCII tabular form, i.e. float1,float2\nfloat3,float4\n....
I'm browsing through the various possible solutions proposed on SO to deal with this kind of problem, but I was left wondering on this (to me) peculiar behaviour. What would you recommend in these kinds of circumstances?
Depends; I actually think the fastest way of dealing with this (since file IO is much, much slower than in-memory parsing of ASCII) is to parse the file incrementally, directly into an in-memory array of float variables, possibly taking advantage of your OS's prefetching and SMP capabilities, to the point that you wouldn't even gain much speed by spawning separate threads for file reading and float conversion. std::copy, used to read from a std::ifstream into a std::vector<float>, should work fine here, provided the comma separators are skipped along the way.
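To make "parse incrementally" concrete, here is a minimal sketch, assuming the float1,float2\n layout described in the question. The function names are made up for this example. It reads with operator>> rather than std::copy over std::istream_iterator<float>, because a plain float iterator would stop at the first comma.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, used by the small text helper below
#include <string>
#include <vector>

// Reads "a,b" rows from `in` until the input ends or a row is malformed,
// appending both values of each row to one flat vector.
std::vector<float> parse_pairs(std::istream& in)
{
    std::vector<float> values;
    float first = 0.0f;
    float second = 0.0f;
    char separator = '\0';
    while (in >> first >> separator >> second && separator == ',') {
        values.push_back(first);
        values.push_back(second);
    }
    assert(values.size() % 2 == 0);  // values are only ever appended in pairs
    return values;
}

// Convenience wrapper for trying the parser on an in-memory string.
std::vector<float> parse_pairs_from_text(const std::string& text)
{
    std::istringstream in(text);
    return parse_pairs(in);
}
```

Passing a std::ifstream to parse_pairs keeps only the stream's internal buffer plus the resulting floats in memory, so the 1.1GB of text never has to exist as one contiguous string.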
I'm still not getting something: you say that file IO is much slower than in-memory parsing, and this I understand (and is the reason why I wanted to read the whole file at once). Then you say that the best way is to parse the whole file incrementally into an in-memory array of float. What exactly do you mean by this? Doesn't this mean to read the file line-by-line, resulting in a large number of file IO operations?
Yes, and no: First, of course, you will have more context switches than you'd have if you just asked for the whole file to be read at once. But those aren't that expensive -- at least, they're going to be much less expensive when you realize that most OSes and libc's know quite well how to optimize reads, and thus will fetch a whole lot of the file at once if you don't use extremely randomized read lengths. Also, you don't incur the penalty of trying to allocate a block of RAM at least 1.1GB in size -- that calls for some serious page table lookups, which aren't that fast, either.
Now, the idea is that your occasional context switch and the fact that, if you're staying single-threaded, there will be times when you don't read the file because you're still busy converting text to float will still mean less of a performance hit, because most of the time, your read will pretty much immediately return, as your OS/runtime has already prefetched a significant part of your file.
Generally, to me, you seem to be worried about all the wrong kinds of things: Performance seems to be important to you (is it really that important, here? You're using a brain-dead file format for interchanging floats, which is both bloaty, loses information, and on top of that is slow to parse), but you'd rather first read the whole file in at once and then start converting it to numbers. Frankly, if performance was of any criticality to your application, you would start to multi-thread/-process it, so that string parsing could already happen while data is still being read. Using buffers of a few kilo- to Megabytes to be read up to \n boundaries and exchanged with a thread that creates the in-memory table of floats sounds like it would basically reduce your read+parse time down to read+non-measurable without sacrificing read performance, and without the need for Gigabytes of RAM just to parse a sequential file.
By the way, to give you an impression of how bad storing floats in ASCII is:
The typical 32-bit single-precision IEEE 754 floating point number has about 6-9 significant decimal digits. Hence, you will need at least 6 characters to represent these in ASCII, one ., typically one exponential divider, e.g. E, and on average 2.5 digits of decimal exponent, plus on average half a sign character (- or not), if your numbers are uniformly chosen from all possible IEEE 754 32-bit floats:
That's an average of 11 characters.
Add one , or \n after every number.
Now, your character is 1B, meaning that you blow up your 4B of actual data by a factor of 3, still losing precision.
Now, people always come around telling me that plaintext is more usable, because if in doubt, the user can read it… I've yet to see one user that can skim through 1.1GB (according to my calculations above, that's around 90 million floating point numbers, or 45 million floating point pairs) and not go insane.
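Written out, the estimate above goes roughly like this (using only the figures quoted in this answer and the 1.1GB file size from the question):

```latex
\begin{align*}
\text{characters per value} &\approx
  \underbrace{6}_{\text{digits}} + \underbrace{1}_{\text{point}} +
  \underbrace{1}_{\text{E}} + \underbrace{2.5}_{\text{exponent}} +
  \underbrace{0.5}_{\text{sign}} = 11 \\
\text{bytes per value, with separator} &\approx 11 + 1 = 12 \\
\text{size relative to a 4-byte float} &\approx 12 / 4 = 3 \\
\text{values in a 1.1GB file} &\approx \frac{1.1 \times 10^{9}}{12} \approx 9.2 \times 10^{7}
\end{align*}
```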
In a 32-bit executable, the total memory address space is 4 GB. Of that, sometimes 1-2 GB is reserved for system use.
To allocate 1 GB, you need 1 GB of contiguous address space. To copy it, you need two 1 GB blocks. This can easily fail, unpredictably.
There are two approaches. First, switch to a 64-bit executable. This will not run on a 32-bit system.
Second, stop allocating 1 GB contiguous blocks. Once you start dealing with that much data, segmenting it and/or streaming it starts making a lot of sense. Done right, you'll also be able to start processing it before you have finished reading it.
There are many file io datastructures, from stxxl to boost, or you can roll your own.
The size of the heap (a pool of memory used for dynamic allocations) is limited independently of the amount of RAM your machine has. You should use some other memory allocation technique for such large allocations, which will probably force you to change the way you read from the file.
If you are running on a UNIX-based system you can look at the mmap function, or at the VirtualAlloc function if you are running on Windows.
I am using an extendible hash and I want to have strings as keys. The problem is that the current hash function that I am using iterates over the whole string/key and I think that this is pretty bad for the program's performance since the hash function is called multiple times especially when I am splitting buckets.
Current hash function
int hash(const string& key)
{
    int seed = 131;
    unsigned long hash = 0;
    for(unsigned i = 0; i < key.length(); i++)
    {
        hash = (hash * seed) + key[i];
    }
    return hash;
}
The keys could be as long as 40 characters.
Example of string/key
string key = "from-to condition";
I have searched over the internet for a better one but I didn't find anything to match my case. Any suggestions?
You should prefer to use std::hash unless measurement shows that you can do better. To limit the number of characters it uses, use something like:
const auto limit = std::min<std::size_t>(key.length(), 16);
for (std::size_t i = 0; i < limit; i++)
You will want to experiment to find the best value to use in place of 16.
I would actually expect the performance to get worse (because you will have more collisions). If your strings were several k, then limiting to the first 64 bytes might well be worth while.
Depending on your strings, it might be worth starting not at the beginning. For example, hashing filenames you would probably do better using the characters between 20 and 5 from the end (ignore the often constant pathname prefix, and the file extension). But you still have to measure.
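A minimal sketch of both variants, hashing only a prefix or only a window counted from the end of the string. The function names are invented for this example, and the polynomial hash with multiplier 131 is simply the one from the question, not a recommendation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Polynomial hash (multiplier 131, as in the question) over key[first, last).
std::size_t hash_range(const std::string& key, std::size_t first, std::size_t last)
{
    assert(first <= last && last <= key.length());
    std::size_t h = 0;
    for (std::size_t i = first; i < last; ++i)
        h = h * 131 + static_cast<unsigned char>(key[i]);
    return h;
}

// Hashes at most the first `limit` characters.
std::size_t hash_prefix(const std::string& key, std::size_t limit)
{
    return hash_range(key, 0, std::min(key.length(), limit));
}

// Hashes the characters that lie between `from_end_begin` and `from_end_stop`
// positions before the end of the string, e.g. (20, 5) skips a long common
// prefix and a 5-character suffix. Short keys are clamped to what exists.
std::size_t hash_tail_window(const std::string& key,
                             std::size_t from_end_begin, std::size_t from_end_stop)
{
    assert(from_end_begin >= from_end_stop);
    const std::size_t n = key.length();
    const std::size_t first = n > from_end_begin ? n - from_end_begin : 0;
    const std::size_t last = n > from_end_stop ? n - from_end_stop : 0;
    return hash_range(key, first, last);
}
```

Keys that differ only outside the hashed range collide by construction, which is exactly the trade-off described above, so any such limit has to be checked against real key data.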
I am using an extendible hash and I want to have strings as keys.
As mentioned before, use std::hash until there is a good reason not to.
The problem is that the current hash function that I am using iterates over the whole string/key and I think that this is pretty bad...
It's an understandable thought, but is actually unlikely to be a real concern.
(anticipating) why?
A quick scan over stack overflow will reveal many experienced developers talking about caches and cache lines.
(forgive me if I'm teaching my grandmother to suck eggs)
A modern CPU is incredibly quick at processing instructions and performing (very complex) arithmetic. In almost all cases, what limits its performance is having to talk to memory across a bus, which is by comparison, horribly slow.
So chip designers build in memory caches - extremely fast memory that sits in the CPU (and therefore does not have to be communicated with over a slow bus). Unhappily, there is only so much space available [plus heat constraints - a topic for another day] for this cache memory so the CPU must treat it like an OS does a disk cache, flushing memory and reading in memory as and when it needs to.
As mentioned, communicating across the bus is slow - (simply put) it requires all the electronic components on the motherboard to stop and synchronise with each other. This wastes a horrible amount of time [This would be a fantastic point to go into a discussion about the propagation of electronic signals across the motherboard being constrained by approximately half the speed of light - it's fascinating but there's only so much space here and I have only so much time]. So rather than transfer one byte, word or longword at a time, the memory is accessed in chunks - called cache lines.
It turns out that this is a good decision by chip designers because they understand that most memory is accessed sequentially - because most programs spend most of their time accessing memory linearly (such as when calculating a hash, comparing strings or objects, transforming sequences, copying and initialising sequences and so on).
What's the upshot of all this?
Well, bizarrely, if your string is not already in cache, it turns out that reading one byte of it is almost exactly as expensive as reading all of the first (say) 128 bytes of it.
Plus, because the cache circuitry assumes that memory access is linear, it will begin a fetch of the next cache line as soon as it has fetched your first one. It will do this while your CPU is performing its hash computation.
I hope you can see that in this case, even if your string was many thousands of bytes long, and you chose to only hash (say) every 128th byte, all you would be doing would be to compute a very much inferior hash while still causing the memory cache to halt the processor as it fetched large chunks of unused memory. It would take just as long - for a worse result!
Having said that, what are good reasons not to use the standard implementation?
Only when:
The users are complaining that your software is too slow to be useful, and
The program is verifiably CPU-bound (using 100% of CPU time), and
The program is not wasting any cycles by spinning, and
Careful profiling has revealed that the program's biggest bottleneck is the hash function, and
Independent analysis by another experienced developer confirms that there is no way to improve the algorithm (for example by calling hash less often).
In short, almost never.
You can directly use std::hash instead of implementing your own function.
#include <iostream>
#include <functional>
#include <string>
size_t hash(const std::string& key)
{
    std::hash<std::string> hasher;
    return hasher(key);
}
int main() {
    std::cout << hash("abc") << std::endl;
    return 0;
}
See this code here: https://ideone.com/4U89aU
I've been running into some issues with writing to a file - namely, not being able to write fast enough.
To explain, my goal is to capture a stream of data coming in over gigabit Ethernet and simply save it to a file.
The raw data is coming in at a rate of 10MS/s, and it's then saved to a buffer and subsequently written to a file.
Below is the relevant section of code:
std::string path = "Stream/raw.dat";
ofstream outFile(path, ios::out | ios::app | ios::binary);
if (outFile.is_open())
    cout << "Yes" << endl;
while (1)
{
    rxSamples = rxStream->recv(&rxBuffer[0], rxBuffer.size(), metaData);
    //Irrelevant error checking...
    //Write data to a file
    std::copy(begin(rxBuffer), end(rxBuffer), std::ostream_iterator<complex<float>>(outFile));
}
The issue I'm encountering is that it's taking too long to write the samples to a file. After a second or so, the device sending the samples reports its buffer has overflowed. After some quick profiling of the code, nearly all of the execution time is spent on std::copy(...) (99.96% of the time to be exact). If I remove this line, I can run the program for hours without encountering any overflow.
That said, I'm rather stumped as to how I can improve the write speed. I've looked through several posts on this site, and it seems like the most common suggestion (in regard to speed) is to implement file writes as I've already done - through the use of std::copy.
If it's helpful, I'm running this program on Ubuntu x86_64. Any suggestions would be appreciated.
So the main problem here is that you try to write in the same thread as you receive, which means that your recv() can only be called again after copy is complete. A few observations:
Move the writing to a different thread. This is about a USRP, so GNU Radio might really be the tool of your choice -- it's inherently multithreaded.
Your output iterator is probably not the most performant solution. A plain "write()" to a file descriptor might be better, but that is something you will have to measure yourself.
If your hard drive/file system/OS/CPU aren't up to the rates coming in from the USRP, even if decoupling receiving from writing thread-wise, then there's nothing you can do -- get a faster system.
Try writing to a RAM disk instead
In fact, I don't know how you came up with the std::copy approach. The rx_samples_to_file example that comes with UHD does this with a simple write, and you should definitely favor that over copying; file I/O can, on good OSes, often be done with one copy less, and iterating over all elements is probably very slow.
Let's do a bit of math.
Your samples are (apparently) of type std::complex<float>. Given a (typical) 32-bit float, that means each sample is 64 bits. At 10 MS/s, that means the raw data is around 80 megabytes per second--that's within what you can expect to write to a desktop (7200 RPM) hard drive, but getting fairly close to the limit (which is typically around 100 megabytes per second or so).
Unfortunately, despite the std::ios::binary, you're actually writing the data in text format (because std::ostream_iterator basically does stream << data;).
This not only loses some precision, but increases the size of the data, at least as a rule. The exact amount of increase depends on the data--a small integer value can actually decrease the quantity of data, but for arbitrary input, a size increase close to 2:1 is fairly common. With a 2:1 increase, your outgoing data is now around 160 megabytes/second--which is faster than most hard drives can handle.
The obvious starting point for an improvement would be to write the data in binary format instead:
uint32_t nItems = std::end(rxBuffer)-std::begin(rxBuffer);
outFile.write((char *)&nItems, sizeof(nItems));
outFile.write((char *)&rxBuffer[0], sizeof(rxBuffer));
For the moment I've used sizeof(rxBuffer) on the assumption that it's a real array. If it's actually a pointer or vector, you'll have to compute the correct size (what you want is the total number of bytes to be written).
I'd also note that as it stands right now, your code has an even more serious problem: since it hasn't specified a separator between elements when it writes the data, the data will be written without anything to separate one item from the next. That means if you wrote two values of (for example) 1 and 0.2, what you'd read back in would not be 1 and 0.2, but a single value of 10.2. Adding separators to your text output will add yet more overhead (figure around 15% more data) to a process that's already failing because it generates too much data.
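The separator problem is easy to reproduce in memory. The following small sketch uses plain float values rather than std::complex<float> to keep the output short; the function names are made up for this example, and the assertions record what the default stream formatting produces.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Formats the values the way std::ostream_iterator does, with the given
// separator written after every element ("" means no separator at all).
std::string to_text(const std::vector<float>& values, const char* separator)
{
    std::ostringstream out;
    std::copy(values.begin(), values.end(), std::ostream_iterator<float>(out, separator));
    return out.str();
}

// Reads back as many floats as the text contains.
std::vector<float> from_text(const std::string& text)
{
    std::istringstream in(text);
    return std::vector<float>(std::istream_iterator<float>(in), std::istream_iterator<float>());
}

// Demonstrates the effect described above for the values 1 and 0.2.
void check_separator_effect()
{
    const std::vector<float> samples = {1.0f, 0.2f};
    assert(to_text(samples, "") == "10.2");       // the two values run together
    assert(from_text("10.2").size() == 1);        // and read back as one number
    assert(to_text(samples, " ") == "1 0.2 ");    // a separator keeps them apart
    assert(from_text("1 0.2 ").size() == 2);
}
```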
Writing in binary format means each float will consume precisely 4 bytes, so delimiters are not necessary to read the data back in correctly.
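As a sketch of what a length-prefixed binary block could look like when the buffer is a std::vector rather than a real array: the names Sample, write_block and read_block are assumptions for this example, and the format is only readable on a machine with the same endianness and float representation.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>   // std::stringstream, convenient for trying this without a file
#include <vector>

using Sample = std::complex<float>;
static_assert(sizeof(Sample) == 2 * sizeof(float), "complex<float> is two packed floats");

// Writes a 32-bit element count followed by the raw bytes of the samples.
void write_block(std::ostream& out, const std::vector<Sample>& samples)
{
    assert(samples.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t count = static_cast<std::uint32_t>(samples.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size() * sizeof(Sample)));
}

// Reads one block written by write_block; returns false at end of data or on
// a truncated block.
bool read_block(std::istream& in, std::vector<Sample>& samples)
{
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;
    samples.resize(count);
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(count * sizeof(Sample)));
    return static_cast<bool>(in);
}
```

Note that the byte count is computed from samples.size() * sizeof(Sample); sizeof applied to the vector itself would only give the size of the vector object, not of its contents.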
The next step after that would be to descend to a lower-level file I/O routine. Depending on the situation, this might or might not make much difference. On Windows, you can specify FILE_FLAG_NO_BUFFERING when you open a file with CreateFile. This means that reads and writes to that file will basically bypass the cache and go directly to the disk.
In your case, that's probably a win--at 10 MS/s, you're probably going to use up the cache space quite a while before you reread the same data. In such a case, letting the data go into the cache gains you virtually nothing, but costs you some time to copy data to the cache, then somewhat later copy it out to the disk. Worse, it's likely to pollute the cache with all this data, so it's no longer storing other data that's a lot more likely to benefit from caching.
I am reading block of data from volume snapshot using CreateFile/ReadFile and buffersize of 4096 bytes.
The problem I am facing is that ReadFile is too slow: I am able to read 68439 blocks, i.e. 267 MB, in 45 seconds. How can I increase the speed? Below is the part of my code that I am using:
if(block_handle != INVALID_HANDLE_VALUE)
{
    DWORD pos = -1;
    for(ULONG i = 0; i < 68439; i++)
    {
        sectorno = (i*8);
        distance = sectorno * sectorsize;
        phyoff.QuadPart = distance;
        if(pos != phyoff.u.LowPart)
        {
            pos=SetFilePointer(block_handle, phyoff.u.LowPart,&phyoff.u.HighPart,FILE_BEGIN);
            if (phyoff.u.LowPart == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR)
            {
                printf("SetFilePointer Error: %d\n", GetLastError());
                phyoff.QuadPart = -1;
            }
        }
        ret = ReadFile(block_handle, data, 4096, &dwRead, 0);
        if(ret == FALSE)
        {
            printf("Error Read");
        }
        pos += 4096;
    }
}
Should I use OVERLAPPED structure? or what can be the possible solution?
Note: The code is not threaded.
Awaiting a positive response.
I'm not quite sure why you're using these extremely low level system functions for this.
Personally I have used C-style file operations (using fopen and fread) as well as C++-style operations (using fstream and read, see this link), to read raw binary files. From a local disk the read speed is on the order of 100MB/second.
In your case, if you don't want to use the standard C or C++ file operations, my guess is that the reason your code is slower is due to the fact that you're performing a seek after each block. Do you really need to call SetFilePointer for every block? If the blocks are sequential you shouldn't need to do this.
Also, experiment with different block sizes, don't be afraid to go up and beyond 1MB.
Your problem is the fragmented data reads. You cannot solve this by fiddling with ReadFile parameters. You need to defragment your reads. Here are three approaches:
Defragment the data on the disk
Defragment the reads. That is, collect all the reads you need, but do not read anything yet. Sort the reads into order. Read everything in order, skipping the SetFilePointer wherever possible ( i.e. sequential blocks ). This will speed the total read greatly, but introduce a lag before the first read starts.
Memory map the data. Copy ALL the data into memory and do random access reads from memory. Whether or not this is possible depends on how much data there is in total.
Also, you might want to get fancy, and experiment with caching. When you read a block of data, it might be that although the next read is not sequential, it may well have a high probability of being close by. So when you read a block, sequentially read an enormous block of nearby data into memory. Before the next read, check if the new read is already in memory - thus saving a seek and a disk access. Testing, debugging and tuning this is a lot of work, so I do not really recommend it unless this is a mission critical optimization. Also note that your OS and/or your disk hardware may already be doing something along these lines, so be prepared to see no improvement whatsoever.
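The second approach above (collect the reads, sort them, and skip the seek for sequential blocks) can be sketched as a small planning step that is independent of the actual I/O API. ReadRun and plan_reads are hypothetical names for this example; block numbers are in units of whatever block size the reader uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One seek followed by `block_count` back-to-back block reads.
struct ReadRun {
    std::uint64_t first_block;
    std::uint64_t block_count;
};

// Sorts the requested block numbers, drops duplicates, and merges
// consecutive blocks into runs so that each run needs only one seek.
std::vector<ReadRun> plan_reads(std::vector<std::uint64_t> blocks)
{
    std::sort(blocks.begin(), blocks.end());
    blocks.erase(std::unique(blocks.begin(), blocks.end()), blocks.end());

    std::vector<ReadRun> runs;
    for (std::uint64_t block : blocks) {
        if (!runs.empty() && runs.back().first_block + runs.back().block_count == block)
            ++runs.back().block_count;          // extends the current run
        else
            runs.push_back(ReadRun{block, 1});  // gap: a new seek is needed
    }

    std::uint64_t covered = 0;
    for (const ReadRun& run : runs)
        covered += run.block_count;
    assert(covered == blocks.size());           // every requested block is in exactly one run
    return runs;
}
```

Each run can then be served with a single positioning call and one large read of block_count blocks, which also fits the advice to read in larger chunks.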
If possible, read sequentially (and tell CreateFile you intend to read sequentially with FILE_FLAG_SEQUENTIAL_SCAN).
Avoid unnecessary seeks. If you're reading sequentially, you shouldn't need any seeks.
Read larger chunks (like an integer multiple of the typical cluster size). I believe Windows's own file copy uses reads on the order of 8 MB rather than 4 KB. Consider using an integer multiple of the system's allocation granularity (available from GetSystemInfo).
Read from aligned offsets (you seem to be doing this).
Read to a page-aligned buffer. Consider using VirtualAlloc to allocate the buffer.
Be aware that fragmentation of the file can cause expensive seeking. There's not much you can do about this.
Be aware that volume compression can make seeks especially expensive because it may have to decompress the file from the beginning to find the starting point in the middle of the file.
Be aware that volume encryption might slow things down. Not much you can do but be aware.
Be aware that other software, like anti-malware, may be scanning the entire file every time you touch it. Fewer operations will minimize this hit.