How can I read multiple files faster?

How can I read multiple files faster? - c++

in my program I want to read several text files (more than ~800 files), each with 256 lines and their filenames starting from 1.txt to n.txt, and store them into a database after several processing steps. My problem is the data's reading speed. I could speed the program up to about twice the speed it had before by using OpenMP multithreading for the reading loop. Is there a way to speed it up a bit more? My actual code is
std::string CCD_Folder = CCDFolder; //CCDFolder is a pointer to a char array
int b = 0;
int PosCounter = 0;
int WAVENUMBER, WAVELUT;
std::vector<std::string> tempstr;
std::string inputline;
//Input
omp_set_num_threads(YValue);
#pragma omp parallel for private(WAVENUMBER) private(WAVELUT) private(PosCounter) private(tempstr) private(inputline)
for(int i = 1; i < (CCD_Filenumbers+1); i++)
{
//std::cout << omp_get_thread_num() << ' ' << i << '\n';
//Umwandlung und Erstellung des Dateinamens, Öffnen des Lesekanals
std::string CCD_Filenumber = boost::lexical_cast<string>(i);
std::string CCD_Filename = CCD_Folder + '\\' + CCD_Filenumber + ".txt";
std::ifstream datain(CCD_Filename, std::ifstream::in);
while(!datain.eof())
{
std::getline(datain, inputline);
//Processing
};
};
All variables which are not defined here are defined somewhere else in my code, and it is working. So is there a possibility to speed this code a bit more up?
Thank you very much!

Some experiment:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>
void generateFiles(int n) {
char fileName[32];
char fileStr[1032];
for (int i=0;i<n;i++) {
sprintf( fileName, "c:\\t\\%i.txt", i );
FILE * f = fopen( fileName, "w" );
for (int j=0;j<256;j++) {
int lineLen = rand() % 1024;
memset(fileStr, 'X', lineLen );
fileStr[lineLen] = 0x0D;
fileStr[lineLen+1] = 0x0A;
fileStr[lineLen+2] = 0x00;
fwrite( fileStr, 1, lineLen+2, f );
}
fclose(f);
}
}
void readFiles(int n) {
char fileName[32];
for (int i=0;i<n;i++) {
sprintf( fileName, "c:\\t\\%i.txt", i );
FILE * f = fopen( fileName, "r" );
fseek(f, 0L, SEEK_END);
int size = ftell(f);
fseek(f, 0L, SEEK_SET);
char * data = (char*)malloc(size);
fread(data, size, 1, f);
free(data);
fclose(f);
}
}
DWORD WINAPI readInThread( LPVOID lpParam )
{
int * number = (int *)lpParam;
char fileName[32];
sprintf( fileName, "c:\\t\\%i.txt", *number );
FILE * f = fopen( fileName, "r" );
fseek(f, 0L, SEEK_END);
int size = ftell(f);
fseek(f, 0L, SEEK_SET);
char * data = (char*)malloc(size);
fread(data, size, 1, f);
free(data);
fclose(f);
return 0;
}
int main(int argc, char ** argv) {
long t1 = GetTickCount();
generateFiles(256);
printf("Write: %li ms\n", GetTickCount() - t1 );
t1 = GetTickCount();
readFiles(256);
printf("Read: %li ms\n", GetTickCount() - t1 );
t1 = GetTickCount();
const int MAX_THREADS = 256;
int pDataArray[MAX_THREADS];
DWORD dwThreadIdArray[MAX_THREADS];
HANDLE hThreadArray[MAX_THREADS];
for( int i=0; i<MAX_THREADS; i++ )
{
pDataArray[i] = (int) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY,
sizeof(int));
pDataArray[i] = i;
hThreadArray[i] = CreateThread(
NULL,
0,
readInThread,
&pDataArray[i],
0,
&dwThreadIdArray[i]);
}
WaitForMultipleObjects(MAX_THREADS, hThreadArray, TRUE, INFINITE);
printf("Read (threaded): %li ms\n", GetTickCount() - t1 );
}
first function just ugly thing to make a test dataset ( I know it can be done much better, but I honestly have no time )
1st experiment - sequential read
2nd experiment - read all in parallel
results:
256 files:
Write: 250 ms
Read: 140 ms
Read (threaded): 78 ms
1024 files:
Write: 1250 ms
Read: 547 ms
Read (threaded): 843 ms
I think second attempt clearly shows that on a long run 'dumb' thread creation just makes things even worse. Of course it needs improvements in a sense of preallocated workers, some thread pool etc, but I think with such fast operation as reading 100-200k from disk there is no really benefit of moving this functionality into thread. I have no time to write more 'clever' solution, but I have my doubts that it will be much faster because you will have to add system calls for mutexes etc...
going extreme you could think of preallocating memory pools etc.. but as being mentioned before code your posted just wrong.. it's a matter of milliseconds, but for sure not seconds
800 files (20 chars per line, 256 lines)
Write: 250 ms
Read: 63 ms
Read (threaded): 500 ms
Conclusion:
ANSWER IS:
Your reading code is wrong, you reading files so slow that there is a significant increase in speed then you make tasks runs in parallel. In the code above reading is actually faster then an expenses to spawn a thread

Your primary bottleneck is physically reading from the hard disk.
Unless you have the files on separate drives, the drive can only read data from one file at a time. Your best bet is to read each file as a whole rather read a portion of one file, tell the drive to locate to another file, read from there, and repeat. Repositioning the drive head to other locations, especially other files, is usually more expensive than letting the drive finish reading the single file.
The next bottle neck is the data channel between the processor and the hard drive. If your hard drives share any kind of communications channel, you will see a bottleneck, as data from each drive must come through the communications channel to your processor. Your processor will be sending commands to the drive(s) through this communications channel (PATA, SATA, USB, etc.).
The objective of the next steps is to reduce the overhead of the "middle men" between your program's memory and the hard drive communications interface. The most efficient is to access the controller directly; lesser efficient are using the OS functions; the "C" functions (fread and familiy) and least is the C++ streams. With increased efficiency comes tighter coupling with the platform and reduced safety (and simplicity).
I suggest the following:
Create multiple buffers in memory, large enough to save time, small
enough to prevent the OS from paging the memory to the hard drive.
Create a thread that reads the files into memory, as necessary.
Search the web for "double buffering". As long as there is space in
the buffer, this thread will read data.
Create multiple "outgoing" buffers.
Create a second thread that removes data from memory and "processes"
it, and inserts into the "outgoing" buffers.
Create a third thread that takes the data in the "outgoing" buffers
and sends to the databases.
Adjust the size of the buffers for the best efficiency within the
limitations of memory.
If you can access the DMA channels, use them to read from the hard drive into the "read buffers".
Next, you can optimize your code to efficiently use the data cache of the processor. For example, set up your "processing" so the data structures to not exceed a data line in the cache. Also, optimize your code to use registers (either specify the register keyword or use statement blocks so that the compiler knows when variables can be reused).
Other optimizations that may help:
Align data to the processors native word size, pad if necessary. For
example, prefer using 32 bytes instead of 13 or 24.
Fetch data in quantities of the processor's word size. For example,
access 4 octets (bytes) at a time on a 32-bit processor rather than 4
accesses of 1 byte.
Unroll loops - more instructions inside the loop, as branch
instructions slow down processing.

You are probably hitting the read limit of your disks, which means your options are somewhat limited. If this is a constant problem you could consider a different RAID structure, which will give you greater read throughput because more than one read head can access data at the same time.
To see if disk access really is the bottleneck, run your program with the time command:
>> /usr/bin/time -v <my program>
In the output you'll see how much CPU time you were utilizing compared to the amount of time required for things like disk access.

I would try going with C code for reading the file. I suspect that it'll be faster.
FILE* f = ::fopen( CCD_Filename.c_str(), "rb" );
if( f == NULL )
{
return;
}
::fseek( f, 0, SEEK_END );
const long lFileBytes = ::ftell( f );
::fseek( f, 0, SEEK_SET );
char* fileContents = new char[lFileBytes + 1];
const size_t numObjectsRead = ::fread( fileContents, lFileBytes, 1, f );
::fclose( f );
if( numObjectsRead < 1 )
{
delete [] fileContents;
return;
}
fileContents[lFileBytes] = '\0';
// assign char buffer of file contents here
delete [] fileContents;

Related

Fastest way to read millions of integers from stdin C++?

I am working on a sorting project and I've come to the point where a main bottleneck is reading in the data. It takes my program about 20 seconds to sort 100,000,000 integers read in from stdin using cin and std::ios::sync_with_stdio(false); but it turns out that 10 of those seconds is reading in the data to sort. We do know how many integers we will be reading in (the count is at the top of the file we need to sort).
How can I make this faster? I know it's possible because a student in a previous semester was able to do counting sort in a little over 3 seconds (and that's basically purely read time).
The program is just fed the contents of a file with integers separated by newlines like $ ./program < numstosort.txt
Thanks
Here is the relevant code:
std::ios::sync_with_stdio(false);
int max;
cin >> max;
short num;
short* a = new short[max];
int n = 0;
while(cin >> num) {
a[n] = num;
n++;
}

This will get your data into memory about as fast as possible, assuming Linux/POSIX running on commodity hardware. Note that since you apparently aren't allowed to use compiler optimizations, C++ IO is not going to be the fastest way to read data. As others have noted, without optimizations the C++ code will not run anywhere near as fast as it can.
Given that the redirected file is already open as stdin/STDIN_FILENO, use low-level system call/C-style IO. That won't need to be optimized, as it will run just about as fast as possible:
struct stat sb;
int rc = ::fstat( STDIN_FILENO, &sb );
// use C-style calloc() to get memory that's been
// set to zero as calloc() is often optimized to be
// faster than a new followed by a memset().
char *data = (char *)::calloc( 1, sb.st_size + 1 );
size_t totalRead = 0UL;
while ( totalRead < sb.st_size )
{
ssize_t bytesRead = ::read( STDIN_FILENO,
data + totalRead, sb.st_size - totalRead );
if ( bytesRead <= 0 )
{
break;
}
totalRead += bytesRead;
}
// data is now in memory - start processing it
That code will read your data into memory as one long C-style string. And the lack of compiler optimizations won't matter one bit as it's all almost bare-metal system calls.
Using fstat() to get the file size allows allocating all the needed memory at once - no realloc() or copying data around is necessary.
You'll need to add some error checking, and a more robust version of the code would check to be sure the data returned from fstat() actually is a regular file with an actual size, and not a "useless use of cat" such as cat filename | YourProgram, because in that case the fstat() call won't return a useful file size. You'll need to examine the sb.st_mode field of the struct stat after the call to see what the stdin stream really is:
::fstat( STDIN_FILENO, &sb );
...
if ( S_ISREG( sb.st_mode ) )
{
// regular file...
}
(And for really high-performance systems, it can be important to ensure that the memory pages you're reading data into are actually mapped in your process address space. Performance can really stall if data arrives faster than the kernel's memory management system can create virtual-to-physical mappings for the pages data is getting dumped into.)
To handle a large file as fast as possible, you'd want to go multithreaded, with one thread reading data and feeding one or more data processing threads so you can start processing data before you're done reading it.
Edit: parsing the data.
Again, preventing compiler optimizations probably makes the overhead of C++ operations slower than C-style processing. Based on that assumption, something simple will probably run faster.
This would probably work a lot faster in a non-optimized binary, assuming the data is in a C-style string read in as above:
char *next;
long count = ::strtol( data, &next, 0 );
long *values = new long[ count ];
for ( long ii = 0; ii < count; ii++ )
{
values[ ii ] = ::strtol( next, &next, 0 );
}
That is also very fragile. It relies on strtol() skipping over leading whitespace, meaning if there's anything other than whitespace between the numeric values it will fail. It also relies on the initial count of values being correct. Again - that code will fail if that's not true. And because it can replace the value of next before checking for errors, if it ever goes off the rails because of bad data it'll be hopelessly lost.
But it should be about as fast as possible without allowing compiler optimizations.
That's what crazy about not allowing compiler optimizations. You can write simple, robust C++ code to do all your processing, make use of a good optimizing compiler, and probably run almost as fast as the code I posted - which has no error checking and will fail spectacularly in unexpected and undefined ways if fed unexpected data.

You can make it faster if you use a SolidState hard drive. If you want to ask something about code performance, you need to post how are you doing things in the first place.

You may be able to speed up your program by reading the data into a buffer, then converting the text in the buffer to internal representation.
The thought behind this is that all stream devices like to keep streaming. Starting and stopping the stream wastes time. A block read transfers a lot of data with one transaction.
Although cin is buffered, by using cin.read and a buffer, you can make the buffer a lot bigger than cin uses.
If the data has fixed width fields, there are opportunities to speed up the input and conversion processes.
Edit 1: Example
const unsigned int BUFFER_SIZE = 65536;
char text_buffer[BUFFER_SIZE];
//...
cin.read(text_buffer, BUFFER_SIZE);
//...
int value1;
int arguments_scanned = snscanf(&text_buffer, REMAINING_BUFFER_SIZE,
"%d", &value1);
The tricky part is handling the cases where the text of a number is cut off at the end of the buffer.

Can you ran this little test in compare to your test with and without commented line?
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[20] {0,};
int t = 0;
while( std::cin.get(buffer, 20) )
{
// t = std::atoi(buffer);
std::cin.ignore(1);
}
return 0;
}
Pure read test:
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[1024*1024];
while( std::cin.read(buffer, 1024*1024) )
{
}
return 0;
}

C++ ifstream::read slow due to memcpy

Recently I decided to optimize some file reading I was doing, because as everyone says, reading a large chunk of data to a buffer and then working with it is faster than using lots of small reads. And my code certainly is much faster now, but after doing some profiling it appears memcpy is taking up a lot of time.
The gist of my code is...
ifstream file("some huge file");
char buffer[0x1000000];
for (yada yada) {
int size = some arbitrary size usually around a megabyte;
file.read(buffer, size);
//Do stuff with buffer
}
I'm using Visual Studio 11 and after profiling my code it says ifstream::read() eventually calls xsgetn() which copies from the internal buffer to my buffer. This operation takes up over 80% of the time! In second place comes uflow() which takes up 10% of the time.
Is there any way I can get around this copying? Can I somehow tell the ifstream to buffer the size I need directly into my buffer? Does the C-style FILE* also use such an internal buffer?
UPDATE: Due to people telling me to use cstdio... I have done a benchmark.
EDIT: Unfortunately the old code was full of fail (it wasn't even reading the entire file!). You can see it here: http://pastebin.com/4dGEQ6S7
Here's my new benchmark:
const int MAX = 0x10000;
char buf[MAX];
string fpath = "largefile";
int main() {
{
clock_t start = clock();
ifstream file(fpath, ios::binary);
while (!file.eof()) {
file.read(buf, MAX);
}
clock_t end = clock();
cout << end-start << endl;
}
{
clock_t start = clock();
FILE* file = fopen(fpath.c_str(), "rb");
setvbuf(file, NULL, _IOFBF, 1024);
while (!feof(file)) {
fread(buf, 0x1, MAX, file);
}
fclose(file);
clock_t end = clock();
cout << end-start << endl;
}
{
clock_t start = clock();
HANDLE file = CreateFile(fpath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_ALWAYS, NULL, NULL);
while (true) {
DWORD used;
ReadFile(file, buf, MAX, &used, NULL);
if (used < MAX) break;
}
CloseHandle(file);
clock_t end = clock();
cout << end-start << endl;
}
system("PAUSE");
}
Times are:
185
80
78
Well... looks like using the C-style fread is faster than ifstream::read. As well, using the windows ReadFile gives only a slight advantage which is negligible (I looked at the code and fread basically is a wrapper around ReadFile). Looks like I'll be switching to fread after all.
Man it is confusing to write a benchmark which actually tests this stuff correctly.
CONCLUSION: Using <cstdio> is faster than <fstream>. The reason fstream is slower is because c++ streams have their own internal buffer. This results in extra copying whenever you read/write and this copying accounts for the entire extra time taken by fstream. Even more shocking is that the extra time taken is longer than the time taken to actually read the file.

Can I somehow tell the ifstream to buffer the size I need directly
into my buffer?
Yes, this is what pubsetbuf() is for.
But if you're that concerned with copying whlie reading a file, consider memory mapping as well, boost has a portable implementation.

If you want to speed up file I/O I suggest you to use the good ol' <cstdio> because it can outperform the C++ one by a large margin.

It has been proven several times that the fastest way of reading data is mmap() on linux systems. I don't know about Windows. However it for sure will do without this buffering.
fopen(), fread(), fwrite() (FILE*) is somewhat higher-level and may induce a buffer, while open(), read(), write() functions are low level and the only buffer you may have there come from the Os kernel.

What C++ Write function should I use?

I prefer not to use XML library parser out there, so can you give me suggestion which good write function to use to write data to XML file? I will make alot of to calls to the write function so the write function should be able to keep track of the last write position and it should not take too much resource. I have two different write below but I can't keep track the last write position unless I have to read the file until end of file.
case#1
FILE *pfile = _tfopen(GetFileNameXML(), _T("w"));
if(pfile)
{
_fputts(TEXT(""), pfile);
}
if(pfile)
{
fclose(pfile);
pfile = NULL;
}
case#2
HANDLE hFile = CreateFile(GetFileNameXML(), GENERIC_READ|GENERIC_WRITE,
FILE_SHARE_WRITE|FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
WriteFile(hFile,,,,,);
}
CloseHandle(hFile);
thanks.

If all you need is to write some text files, use C++'s standard library file facilities. The samples here will be helpful: http://www.cplusplus.com/doc/tutorial/files/

First, what's your aversion to using a standard XML processing library?
Next, if you decide to roll your own, definitely don't go directly at the Win32 APIs - at least not unless you're going to write out the generated XML in large chunks, or you're going to implement your own buffering layer.
It's not going to matter for dealing with tiny files, but you specifically mention good performance and many calls to the write function. WriteFile has a fair amount of overhead, it does a lot of work and involves user->kernel->user mode switches, which are expensive. If you're dealing with "normally sized" XML files you probably won't be able to see much of a difference, but if you're generating monstrously sized dumps it's definitely something to keep in mind.
You mention tracking the last write position - first off, it should be easy... with FILE buffers you have ftell, with raw Win32 API you have SetFilePointerEx - call it with liDistanceToMove=0 and dwMoveMethod=FILE_CURRENT, and you get the current file position after a write. But why do you need this? If you're streaming out an XML file, you should generally keep on streaming until you're done writing - are you closing and re-opening the file? Or are you writing a valid XML file which you want to insert more data into later?
As for the overhead of the Win32 file functions, it may or may not be relevant in your case (depending on the size of the files you're dealing with), but with larger files it matters a lot - included below is a micro-benchmark that simpy reads a file to memory with ReadFile, letting you specify different buffer sizes from the command line. It's interesting to look at, say, Process Explorer's IO tab while running the tool. Here's some statistics from my measly laptop (Win7-SP1 x64, core2duo P7350#2.0GHz, 4GB ram, 120GB Intel-320 SSD).
Take it for what it is, a micro-benchmark. The performance might or might not matter in your particular situation, but I do believe the numbers demonstrate that there's considerable overhead to the Win32 file APIs, and that doing a little buffering of your own helps.
With a fully cached 2GB file:
BlkSz Speed
32 14.4MB/s
64 28.6MB/s
128 56MB/s
256 107MB/s
512 205MB/s
1024 350MB/s
4096 800MB/s
32768 ~2GB/s
With a "so big there will only be cache misses" 4GB file:
BlkSz Speed CPU
32 13MB/s 49%
64 26MB/s 49%
128 52MB/s 49%
256 99MB/s 49%
512 180MB/s 49%
1024 200MB/s 32%
4096 185MB/s 22%
32768 205MB/s 13%
Keep in mind that 49% CPU usage means that one CPU core is pretty much fully pegged - a single thread can't really push the machine much harder. Notice the pathological behavior of the 4kb buffer in the second table - it was reproducible, and I don't have an explanation for it.
Crappy micro-benchmark code goes here:
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <string>
#include <assert.h>
unsigned getDuration(FILETIME& timeStart, FILETIME& timeEnd)
{
// duration is in 100-nanoseconds, we want milliseconds
// 1 millisecond = 1000 microseconds = 1000000 nanoseconds
LARGE_INTEGER ts, te, res;
ts.HighPart = timeStart.dwHighDateTime; ts.LowPart = timeStart.dwLowDateTime;
te.HighPart = timeEnd.dwHighDateTime; te.LowPart = timeEnd.dwLowDateTime;
res.QuadPart = ((te.QuadPart - ts.QuadPart) / 10000);
assert(res.QuadPart < UINT_MAX);
return res.QuadPart;
}
int main(int argc, char* argv[])
{
if(argc < 2) {
puts("Syntax: ReadFile [filename] [blocksize]");
return 0;
}
char *filename= argv[1];
int blockSize = atoi(argv[2]);
if(blockSize < 1) {
puts("Please specify a blocksize larger than 0");
return 1;
}
HANDLE hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
if(INVALID_HANDLE_VALUE == hFile) {
puts("error opening input file");
return 1;
}
std::vector<char> buffer(blockSize);
LARGE_INTEGER fileSize;
if(!GetFileSizeEx(hFile, &fileSize)) {
puts("Failed getting file size.");
return 1;
}
std::cout << "File size " << fileSize.QuadPart << ", that's " << (fileSize.QuadPart / blockSize) <<
" blocks of " << blockSize << " bytes - reading..." << std::endl;
FILETIME dummy, kernelStart, userStart;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelStart, &userStart);
DWORD ticks = GetTickCount();
DWORD bytesRead = 0;
do {
if(!ReadFile(hFile, &buffer[0], blockSize, &bytesRead, 0)) {
puts("Error calling ReadFile");
return 1;
}
} while(bytesRead == blockSize);
ticks = GetTickCount() - ticks;
FILETIME kernelEnd, userEnd;
GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelEnd, &userEnd);
CloseHandle(hFile);
std::cout << "Reading with " << blockSize << " sized blocks took " << ticks << "ms, spending " <<
getDuration(kernelStart, kernelEnd) << "ms in kernel and " <<
getDuration(userStart, userEnd) << "ms in user mode. Hit enter to countinue." << std::endl;
std::string dummyString;
std::cin >> dummyString;
return 0;
}

C/C++ best way to send a number of bytes to stdout

Profiling my program and the function print is taking a lot of time to perform. How can I send "raw" byte output directly to stdout instead of using fwrite, and making it faster (need to send all 9bytes in the print() at the same time to the stdout) ?
void print(){
unsigned char temp[9];
temp[0] = matrix[0][0];
temp[1] = matrix[0][1];
temp[2] = matrix[0][2];
temp[3] = matrix[1][0];
temp[4] = matrix[1][1];
temp[5] = matrix[1][2];
temp[6] = matrix[2][0];
temp[7] = matrix[2][1];
temp[8] = matrix[2][2];
fwrite(temp,1,9,stdout);
}
Matrix is defined globally to be a unsigned char matrix[3][3];

IO is not an inexpensive operation. It is, in fact, a blocking operation, meaning that the OS can preempt your process when you call write to allow more CPU-bound processes to run, before the IO device you're writing to completes the operation.
The only lower level function you can use (if you're developing on a *nix machine), is to use the raw write function, but even then your performance will not be that much faster than it is now. Simply put: IO is expensive.

The top rated answer claims that IO is slow.
Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.
Write 10 million records from a nine byte array
Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1
340ms to /dev/null
710ms to 90MB output file
15254ms to 90MB output file in "dribs" mode
FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0
450ms to /dev/null
550ms to 90MB output file on ZFS triple mirror
1150ms to 90MB output file on FFS system drive
22154ms to 90MB output file in "dribs" mode
There's nothing slow about IO if you can afford to buffer properly.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
int main (int argc, char* argv[])
{
int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
int err;
int i;
enum { BigBuf = 4*1024*1024 };
char* outbuf = malloc (BigBuf);
assert (outbuf != NULL);
err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full line buffering
assert (err == 0);
enum { ArraySize = 9 };
char temp[ArraySize];
enum { Count = 10*1000*1000 };
for (i = 0; i < Count; ++i) {
fwrite (temp, 1, ArraySize, stdout);
if (dribs) fflush (stdout);
}
fflush (stdout); // seems to be needed after setting own buffer
fclose (stdout);
if (outbuf) { free (outbuf); outbuf = NULL; }
}

The rawest form of output you can do is the probable the write system call, like this
write (1, matrix, 9);
1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as the one reading it at the other end (i.e. the terminal, or the program you're pipeing into) which might be rather slow.
I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this
fcntl (1, F_SETFL, O_NONBLOCK);
YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.

Perhaps your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().
This all really depends on your definition of slow in this context.

All printing is fairly slow, although iostreams are really slow for printing.
Your best bet would be to use printf, something along the lines of:
printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);

As everyone has pointed out IO in tight inner loop is expensive. I have normally ended up doing conditional cout of Matrix based on some criteria when required to debug it.
If your app is console app then try redirecting it to a file, it will be lot faster than doing console refreshes. e.g app.exe > matrixDump.txt

What's wrong with:
fwrite(matrix,1,9,stdout);
both the one and the two dimensional arrays take up the same memory.

Try running the program twice. Once with output and once without. You will notice that overall, the one without the io is the fastest. Also, you could fork the process (or create a thread), one writing to a file(stdout), and one doing the operations.

So first, don't print on every entry. Basically what i am saying is do not do like that.
for(int i = 0; i<100; i++){
printf("Your stuff");
}
instead allocate a buffer either on stack or on heap, and store you infomration there and then just throw this bufffer into stdout, just liek that
char *buffer = malloc(sizeof(100));
for(int i = 100; i<100; i++){
char[i] = 1; //your 8 byte value goes here
}
//once you are done print it to a ocnsole with
write(1, buffer, 100);
but in your case, just use write(1, temp, 9);

I am pretty sure you can increase the output performance by increasing the buffer size. So you have less fwrite calls. write might be faster but I am not sure. Just try this:
❯ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s
vs
> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s
The same applies to your code. Some tests during the last days show that probably good buffer sizes are around 1 << 12 (=4096) and 1<<16 (=65535) bytes.

You can simply:
std::cout << temp;
printf is more C-Style.
Yet, IO operations are costly, so use them wisely.

Fastest way to create large file in c++?

Create a flat text file in c++ around 50 - 100 MB
with the content 'Added first line' should be inserted in to the file for 4 million times

using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file

The fastest way to create a file of a certain size is to simply create a zero-length file using creat() or open() and then change the size using chsize(). This will simply allocate blocks on the disk for the file, the contents will be whatever happened to be in those blocks. It's very fast since no buffer writing needs to take place.

Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, what about this? Fills the file with "abcdefghabc...."
#include <stdio.h>
int main ()
{
const int FILE_SiZE = 50000; //size in KB
const int BUFFER_SIZE = 1024;
char buffer [BUFFER_SIZE + 1];
int i;
for(i = 0; i < BUFFER_SIZE; i++)
buffer[i] = (char)(i%8 + 'a');
buffer[BUFFER_SIZE] = '\0';
FILE *pFile = fopen ("somefile.txt", "w");
for (i = 0; i < FILE_SIZE; i++)
fprintf(pFile, buffer);
fclose(pFile);
return 0;
}

You haven't mentioned the OS but I'll assume creat/open/close/write are available.
For truly efficient writing and assuming, say, a 4k page and disk block size and a repeated string:
open the file.
allocate 4k * number of chars in your repeated string, ideally aligned to a page boundary.
print repeated string into the memory 4k times, filling the blocks precisely.
Use write() to write out the blocks to disk as many times as necessary. You may wish to write a partial piece for the last block to get the size to come out right.
close the file.
This bypasses the buffering of fopen() and friends, which is good and bad: their buffering means that they're nice and fast, but they are still not going to be as efficient as this, which has no overhead of working with the buffer.
This can easily be written in C++ or C, but does assume that you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.

I faced the same problem, creating a ~500MB file on Windows very fast.
The larger buffer you pass to fwrite() the fastest you'll be.
int i;
FILE *fp;
fp = fopen(fname,"wb");
if (fp != NULL) {
// create big block's data
uint8_t b[278528]; // some big chunk size
for( i = 0; i < sizeof(b); i++ ) // custom initialization if != 0x00
{
b[i] = 0xFF;
}
// write all blocks to file
for( i = 0; i < TOT_BLOCKS; i++ )
fwrite(&b, sizeof(b), 1, fp);
fclose (fp);
}
Now at least on my Win7, MinGW, creates file almost instantly.
Compared to fwrite() 1 byte at time, that will complete in 10 Secs.
Passing 4k buffer will complete in 2 Secs.

Fastest way to create large file in c++?
Ok. I assume fastest way means the one that takes the smallest run time.
Create a flat text file in c++ around 50 - 100 MB with the content 'Added first line' should be inserted in to the file for 4 million times.
preallocate the file using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
create a string containing the "Added first line\n" a thousand times.
find it's length.
preallocate the file using old style file io
fopen the file for write.
fseek to the the string length * 4000
fwrite a single byte
fclose the file
open the file for read/write
loop 4000 times,
writing the string to the file.
close the file.
That's my best guess.
I'm sure there are a lot of ways to do it.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js