I have a program that basically does this:
Opens some binary file
Reads the file backwards (by backwards, I mean it starts near EOF, and ends reading at beginning of file, i.e. reads the file "right-to-left"), using 4MB chunks
Closes the file
My question is: why memory consumption looks like below, even though there are no obvious memory leaks in my attached code?
Here's the source of program that was run to obtain above image:
#include <stdio.h>
#include <string.h>
int main(void)
{
//allocate stuff
const int bufferSize = 4*1024*1024;
FILE *fileHandle = fopen("./input.txt", "rb");
if (!fileHandle)
{
fprintf(stderr, "No file for you\n");
return 1;
}
unsigned char *buffer = new unsigned char[bufferSize];
if (!buffer)
{
fprintf(stderr, "No buffer for you\n");
return 1;
}
//get file size. file can be BIG, hence the fseeko() and ftello()
//instead of fseek() and ftell().
fseeko(fileHandle, 0, SEEK_END);
off_t totalSize = ftello(fileHandle);
fseeko(fileHandle, 0, SEEK_SET);
//read the file... in reverse order. This is important.
for (off_t pos = totalSize - bufferSize, j = 0;
pos >= 0;
pos -= bufferSize, j ++)
{
if (j % 10 == 0)
{
fprintf(stderr,
"reading like crazy: %lld / %lld\n",
pos, totalSize);
}
/*
* below is the heart of the problem. see notes below
*/
//seek to desired position
fseeko(fileHandle, pos, SEEK_SET);
//read the chunk
fread(buffer, sizeof(unsigned char), bufferSize, fileHandle);
}
fclose(fileHandle);
delete []buffer;
}
I have also following observations:
Even though RAM usage jumps by 1GB, the whole program uses only 5MB thorough whole execution.
Commenting call to fread() out makes memory leak go away. This is weird, since I don't allocate anything anywhere near it, that could trigger memory leak...
Also, reading the file normally instead of backwards (= commenting call to fseeko() out), makes memory leak go away as well. This is the ultra-weird part.
Further information...
Following doesn't help:
Checking results of fread() - yields nothing out of ordinary.
Switching to normal, 32-bit fseek and ftell.
Doing stuff like setbuf(fileHandle, NULL).
Doing stuff like setvbuf(fileHandle, NULL, _IONBF, *any integer*).
Compiled with g++ 4.5.3 on Windows 7 via cygwin and mingw; without any optimalizations, just g++ test.cpp -o test. Both present such behaviour.
The file used in tests was 4GB long, full of zeros.
The weird pause in the middle of the chart could be explained with some kind of temporary I/O hangup, unrelated to this question.
Finally, if I wrap reading in infinite loop... the memory usage stops increasing after first iteration.
I think it has to do with some kind of internal cache building up till it's filled with whole file. How does it really work behind the scenes? How can I prevent that in a portable way??
I think, this is more an OS issue (or even an OS resource use reporting issue) than an issue with your program. Of course, it only uses 5 MB of memory: 1 MB for itself (libs, stack etc.) and 4 MB for the buffer. Whenever you do a fread(), the OS seems to "bind" part of the file to your process, and seems to release it not at the same speed. As memory use on your machine is low, this is not a problem: The OS just keeps the already read data "hanging around" longer than necessary, probably assuming, that your application might read it again, soon, and then it doesn't have to do that binding again.
If memory pressure was higher, than the OS is very likely to unbind the memory faster, so that jump on your memory usage history would be smaller.
I had the exact same problem, although in Java but it doesn't matter in this context. I solved it by reading much bigger chunks at a time. I also read 4Mb size chunks, but when I increased it to 100-200 Mb the problem went away. Perhaps it'll do that for you as well. I'm on Windows 7.
Related
I am working on a sorting project and I've come to the point where a main bottleneck is reading in the data. It takes my program about 20 seconds to sort 100,000,000 integers read in from stdin using cin and std::ios::sync_with_stdio(false); but it turns out that 10 of those seconds is reading in the data to sort. We do know how many integers we will be reading in (the count is at the top of the file we need to sort).
How can I make this faster? I know it's possible because a student in a previous semester was able to do counting sort in a little over 3 seconds (and that's basically purely read time).
The program is just fed the contents of a file with integers separated by newlines like $ ./program < numstosort.txt
Thanks
Here is the relevant code:
std::ios::sync_with_stdio(false);
int max;
cin >> max;
short num;
short* a = new short[max];
int n = 0;
while(cin >> num) {
a[n] = num;
n++;
}
This will get your data into memory about as fast as possible, assuming Linux/POSIX running on commodity hardware. Note that since you apparently aren't allowed to use compiler optimizations, C++ IO is not going to be the fastest way to read data. As others have noted, without optimizations the C++ code will not run anywhere near as fast as it can.
Given that the redirected file is already open as stdin/STDIN_FILENO, use low-level system call/C-style IO. That won't need to be optimized, as it will run just about as fast as possible:
struct stat sb;
int rc = ::fstat( STDIN_FILENO, &sb );
// use C-style calloc() to get memory that's been
// set to zero as calloc() is often optimized to be
// faster than a new followed by a memset().
char *data = (char *)::calloc( 1, sb.st_size + 1 );
size_t totalRead = 0UL;
while ( totalRead < sb.st_size )
{
ssize_t bytesRead = ::read( STDIN_FILENO,
data + totalRead, sb.st_size - totalRead );
if ( bytesRead <= 0 )
{
break;
}
totalRead += bytesRead;
}
// data is now in memory - start processing it
That code will read your data into memory as one long C-style string. And the lack of compiler optimizations won't matter one bit as it's all almost bare-metal system calls.
Using fstat() to get the file size allows allocating all the needed memory at once - no realloc() or copying data around is necessary.
You'll need to add some error checking, and a more robust version of the code would check to be sure the data returned from fstat() actually is a regular file with an actual size, and not a "useless use of cat" such as cat filename | YourProgram, because in that case the fstat() call won't return a useful file size. You'll need to examine the sb.st_mode field of the struct stat after the call to see what the stdin stream really is:
::fstat( STDIN_FILENO, &sb );
...
if ( S_ISREG( sb.st_mode ) )
{
// regular file...
}
(And for really high-performance systems, it can be important to ensure that the memory pages you're reading data into are actually mapped in your process address space. Performance can really stall if data arrives faster than the kernel's memory management system can create virtual-to-physical mappings for the pages data is getting dumped into.)
To handle a large file as fast as possible, you'd want to go multithreaded, with one thread reading data and feeding one or more data processing threads so you can start processing data before you're done reading it.
Edit: parsing the data.
Again, preventing compiler optimizations probably makes the overhead of C++ operations slower than C-style processing. Based on that assumption, something simple will probably run faster.
This would probably work a lot faster in a non-optimized binary, assuming the data is in a C-style string read in as above:
char *next;
long count = ::strtol( data, &next, 0 );
long *values = new long[ count ];
for ( long ii = 0; ii < count; ii++ )
{
values[ ii ] = ::strtol( next, &next, 0 );
}
That is also very fragile. It relies on strtol() skipping over leading whitespace, meaning if there's anything other than whitespace between the numeric values it will fail. It also relies on the initial count of values being correct. Again - that code will fail if that's not true. And because it can replace the value of next before checking for errors, if it ever goes off the rails because of bad data it'll be hopelessly lost.
But it should be about as fast as possible without allowing compiler optimizations.
That's what crazy about not allowing compiler optimizations. You can write simple, robust C++ code to do all your processing, make use of a good optimizing compiler, and probably run almost as fast as the code I posted - which has no error checking and will fail spectacularly in unexpected and undefined ways if fed unexpected data.
You can make it faster if you use a SolidState hard drive. If you want to ask something about code performance, you need to post how are you doing things in the first place.
You may be able to speed up your program by reading the data into a buffer, then converting the text in the buffer to internal representation.
The thought behind this is that all stream devices like to keep streaming. Starting and stopping the stream wastes time. A block read transfers a lot of data with one transaction.
Although cin is buffered, by using cin.read and a buffer, you can make the buffer a lot bigger than cin uses.
If the data has fixed width fields, there are opportunities to speed up the input and conversion processes.
Edit 1: Example
const unsigned int BUFFER_SIZE = 65536;
char text_buffer[BUFFER_SIZE];
//...
cin.read(text_buffer, BUFFER_SIZE);
//...
int value1;
int arguments_scanned = snscanf(&text_buffer, REMAINING_BUFFER_SIZE,
"%d", &value1);
The tricky part is handling the cases where the text of a number is cut off at the end of the buffer.
Can you ran this little test in compare to your test with and without commented line?
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[20] {0,};
int t = 0;
while( std::cin.get(buffer, 20) )
{
// t = std::atoi(buffer);
std::cin.ignore(1);
}
return 0;
}
Pure read test:
#include <iostream>
#include <cstdlib>
int main()
{
std::ios::sync_with_stdio(false);
char buffer[1024*1024];
while( std::cin.read(buffer, 1024*1024) )
{
}
return 0;
}
I have a complex interpreter reading in commands from (sometimes) multiples files (the exact details are out of scope) but it requires iterating over these multiple files (some could be GB is size, preventing nice buffering) multiple times.
I am looking to increase the speed of reading in each command from a file.
I have used the RDTSC (program counter) register to micro benchmark the code enough to know about >80% of the time is spent reading in from the files.
Here is the thing: the program that generates the input file is literally faster than to read in the file in my small interpreter. i.e. instead of outputting the file i could (in theory) just link the generator of the data to the interpreter and skip the file but that shouldn't be faster, right?
What am I doing wrong? Or is writing suppose to be 2x to 3x (at least) faster than reading from a file?
I have considered mmap but some of the results on http://lemire.me/blog/archives/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ appear to indicate it is no faster than ifstream. or would mmap help in this case?
details:
I have (so far) tried adding a buffer, tweaking parameters, removing the ifstream buffer (that slowed it down by 6x in my test case), i am currently at a loss for ideas after searching around.
The important section of the code is below. It does the following:
if data is left in buffer, copy form buffer to memblock (where it is then used)
if data is not left in the buffer, check to see how much data is left in the file, if more than the size of the buffer, copy a buffer sized chunk
if less than the file
//if data in buffer
if(leftInBuffer[activefile] > 0)
{
//cout <<bufferloc[activefile] <<"\n";
memcpy(memblock,(buffer[activefile])+bufferloc[activefile],16);
bufferloc[activefile]+=16;
leftInBuffer[activefile]-=16;
}
else //buffers blank
{
//read in block
long blockleft = (cfilemax -cfileplace) / 16 ;
int read=0;
/* slow block starts here */
if(blockleft >= MAXBUFELEMENTS)
{
currentFile->read((char *)(&(buffer[activefile][0])),16*MAXBUFELEMENTS);
leftInBuffer[activefile] = 16*MAXBUFELEMENTS;
bufferloc[activefile]=0;
read =16*MAXBUFELEMENTS;
}
else //read in part of the block
{
currentFile->read((char *)(&(buffer[activefile][0])),16*(blockleft));
leftInBuffer[activefile] = 16*blockleft;
bufferloc[activefile]=0;
read =16*blockleft;
}
/* slow block ends here */
memcpy(memblock,(buffer[activefile])+bufferloc[activefile],16);
bufferloc[activefile]+=16;
leftInBuffer[activefile]-=16;
}
edit: this is on a mac, osx 10.9.5, with an i7 with a SSD
Solution:
as was suggested below, mmap was able to increase the speed by about 10x.
(for anyone else who searches for this)
specifically open with:
uint8_t * openMMap(string name, long & size)
{
int m_fd;
struct stat statbuf;
uint8_t * m_ptr_begin;
if ((m_fd = open(name.c_str(), O_RDONLY)) < 0)
{
perror("can't open file for reading");
}
if (fstat(m_fd, &statbuf) < 0)
{
perror("fstat in openMMap failed");
}
if ((m_ptr_begin = (uint8_t *)mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, m_fd, 0)) == MAP_FAILED)
{
perror("mmap in openMMap failed");
}
uint8_t * m_ptr = m_ptr_begin;
size = statbuf.st_size;
return m_ptr;
}
read by:
uint8_t * mmfile = openMMap("my_file", length);
uint32_t * memblockmm;
memblockmm = (uint32_t *)mmfile; //cast file to uint32 array
uint32_t data = memblockmm[0]; //take int
mmfile +=4; //increment by 4 as I read a 32 bit entry and each entry in mmfile is 8 bits.
This should be a comment, but I don't have 50 reputation to make a comment.
What is the value of MAXBUFELEMENTS? From my experience, many smaller reads is far slower than one read of larger size. I suggest to read the entire file in if possible, some files could be GBs, but even reading in 100MB at once would perform better than reading 1 MB 100 times.
If that's still not good enough, next thing you can try is to compress(zlib) input files(may have to break them into chunks due to size), and decompress them in memory. This method is usually faster than reading in uncompressed files.
As #Tony Jiang said, try experimenting with the buffer size to see if that helps.
Try mmap to see if that helps.
I assume that currentFile is a std::ifstream? There's going to be some overhead for using iostreams (for example, an istream will do its own buffering, adding an extra layer to what you're doing); although I wouldn't expect the overhead to be huge, you can test by using open(2) and read(2) directly.
You should be able to run your code through dtruss -e to verify how long the read system calls take. If those take the bulk of your time, then you're hitting OS and hardware limits, so you can address that by piping, mmap'ing, or adjusting your buffer size. If those take less time than you expect, then look for problems in your application logic (unnecessary work on each iteration, etc.).
RESOLVED
I'm trying to make a simple file loader.
I aim to get the text from a shader file (plain text file) into a char* that I will compile later.
I've tried this function:
char* load_shader(char* pURL)
{
FILE *shaderFile;
char* pShader;
// File opening
fopen_s( &shaderFile, pURL, "r" );
if ( shaderFile == NULL )
return "FILE_ER";
// File size
fseek (shaderFile , 0 , SEEK_END);
int lSize = ftell (shaderFile);
rewind (shaderFile);
// Allocating size to store the content
pShader = (char*) malloc (sizeof(char) * lSize);
if (pShader == NULL)
{
fputs ("Memory error", stderr);
return "MEM_ER";
}
// copy the file into the buffer:
int result = fread (pShader, sizeof(char), lSize, shaderFile);
if (result != lSize)
{
// size of file 106/113
cout << "size of file " << result << "/" << lSize << endl;
fputs ("Reading error", stderr);
return "READ_ER";
}
// Terminate
fclose (shaderFile);
return 0;
}
But as you can see in the code I have a strange size difference at the end of the process which makes my function crash.
I must say I'm quite a beginner in C so I might have missed some subtilities regarding the memory allocation, types, pointers...
How can I solve this size issue?
*EDIT 1:
First, I shouldn't return 0 at the end but pShader; that seemed to be what crashed the program.
Then, I change the type of reult to size_t, and added a end character to pShader, adding pShdaer[result] = '/0'; after its declaration so I can display it correctly.
Finally, as #JamesKanze suggested, I turned fopen_s into fopen as the previous was not usefull in my case.
First, for this sort of raw access, you're probably better off
using the system level functions: CreateFile or open,
ReadFile or read and CloseHandle or close, with
GetFileSize or stat to get the size. Using FILE* or
std::filebuf will only introduce an additional level of
buffering and processing, for no gain in your case.
As to what you are seeing: there is no guarantee that an ftell
will return anything exploitable as a numeric value; it could
very well be just a magic cookie. On most current systems, it
is a byte offset into the physical file, but on any non-Unix
system, the offset into the physical file will not map directly
to the logical file you are reading unless you open the file in
binary mode. If you use "rb" to open the file, you'll
probably see the same values. (Theoretically, you could get
extra 0's at the end of the file, but practically, the OS's
where that happened are either extinct, or only used on legacy
mainframes.)
EDIT:
Since the answer stating this has been deleted: you should loop
on the fread until it returns 0 (setting errno to 0 before
each call, and checking it after the return to see whether the
function returned because of an error or because it reached the
end of file). Having said this: if you're on one of the usual
Windows or Unix systems, and the file is local to the machine,
and not too big, fread will read it all in one go. The
difference in size you are seeing (given the numerical values
you posted) is almost certainly due to the fact that the two
byte Windows line endings are being mapped to a single '\n'
character. To avoid this, you must open in binary mode;
alternatively, if you really are dealing with text (and want
this mapping), you can just ignore the extra bytes in your
buffer, setting the '\0' terminator after the last byte
actually read.
Profiling my program and the function print is taking a lot of time to perform. How can I send "raw" byte output directly to stdout instead of using fwrite, and making it faster (need to send all 9bytes in the print() at the same time to the stdout) ?
void print(){
unsigned char temp[9];
temp[0] = matrix[0][0];
temp[1] = matrix[0][1];
temp[2] = matrix[0][2];
temp[3] = matrix[1][0];
temp[4] = matrix[1][1];
temp[5] = matrix[1][2];
temp[6] = matrix[2][0];
temp[7] = matrix[2][1];
temp[8] = matrix[2][2];
fwrite(temp,1,9,stdout);
}
Matrix is defined globally to be a unsigned char matrix[3][3];
IO is not an inexpensive operation. It is, in fact, a blocking operation, meaning that the OS can preempt your process when you call write to allow more CPU-bound processes to run, before the IO device you're writing to completes the operation.
The only lower level function you can use (if you're developing on a *nix machine), is to use the raw write function, but even then your performance will not be that much faster than it is now. Simply put: IO is expensive.
The top rated answer claims that IO is slow.
Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.
Write 10 million records from a nine byte array
Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1
340ms to /dev/null
710ms to 90MB output file
15254ms to 90MB output file in "dribs" mode
FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0
450ms to /dev/null
550ms to 90MB output file on ZFS triple mirror
1150ms to 90MB output file on FFS system drive
22154ms to 90MB output file in "dribs" mode
There's nothing slow about IO if you can afford to buffer properly.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
int main (int argc, char* argv[])
{
int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
int err;
int i;
enum { BigBuf = 4*1024*1024 };
char* outbuf = malloc (BigBuf);
assert (outbuf != NULL);
err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full line buffering
assert (err == 0);
enum { ArraySize = 9 };
char temp[ArraySize];
enum { Count = 10*1000*1000 };
for (i = 0; i < Count; ++i) {
fwrite (temp, 1, ArraySize, stdout);
if (dribs) fflush (stdout);
}
fflush (stdout); // seems to be needed after setting own buffer
fclose (stdout);
if (outbuf) { free (outbuf); outbuf = NULL; }
}
The rawest form of output you can do is the probable the write system call, like this
write (1, matrix, 9);
1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as the one reading it at the other end (i.e. the terminal, or the program you're pipeing into) which might be rather slow.
I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this
fcntl (1, F_SETFL, O_NONBLOCK);
YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.
Perhaps your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().
This all really depends on your definition of slow in this context.
All printing is fairly slow, although iostreams are really slow for printing.
Your best bet would be to use printf, something along the lines of:
printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);
As everyone has pointed out IO in tight inner loop is expensive. I have normally ended up doing conditional cout of Matrix based on some criteria when required to debug it.
If your app is console app then try redirecting it to a file, it will be lot faster than doing console refreshes. e.g app.exe > matrixDump.txt
What's wrong with:
fwrite(matrix,1,9,stdout);
both the one and the two dimensional arrays take up the same memory.
Try running the program twice. Once with output and once without. You will notice that overall, the one without the io is the fastest. Also, you could fork the process (or create a thread), one writing to a file(stdout), and one doing the operations.
So first, don't print on every entry. Basically what i am saying is do not do like that.
for(int i = 0; i<100; i++){
printf("Your stuff");
}
instead allocate a buffer either on stack or on heap, and store you infomration there and then just throw this bufffer into stdout, just liek that
char *buffer = malloc(sizeof(100));
for(int i = 100; i<100; i++){
char[i] = 1; //your 8 byte value goes here
}
//once you are done print it to a ocnsole with
write(1, buffer, 100);
but in your case, just use write(1, temp, 9);
I am pretty sure you can increase the output performance by increasing the buffer size. So you have less fwrite calls. write might be faster but I am not sure. Just try this:
❯ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s
vs
> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s
The same applies to your code. Some tests during the last days show that probably good buffer sizes are around 1 << 12 (=4096) and 1<<16 (=65535) bytes.
You can simply:
std::cout << temp;
printf is more C-Style.
Yet, IO operations are costly, so use them wisely.
Create a flat text file in c++ around 50 - 100 MB
with the content 'Added first line' should be inserted in to the file for 4 million times
using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
The fastest way to create a file of a certain size is to simply create a zero-length file using creat() or open() and then change the size using chsize(). This will simply allocate blocks on the disk for the file, the contents will be whatever happened to be in those blocks. It's very fast since no buffer writing needs to take place.
Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, what about this? Fills the file with "abcdefghabc...."
#include <stdio.h>
int main ()
{
const int FILE_SiZE = 50000; //size in KB
const int BUFFER_SIZE = 1024;
char buffer [BUFFER_SIZE + 1];
int i;
for(i = 0; i < BUFFER_SIZE; i++)
buffer[i] = (char)(i%8 + 'a');
buffer[BUFFER_SIZE] = '\0';
FILE *pFile = fopen ("somefile.txt", "w");
for (i = 0; i < FILE_SIZE; i++)
fprintf(pFile, buffer);
fclose(pFile);
return 0;
}
You haven't mentioned the OS but I'll assume creat/open/close/write are available.
For truly efficient writing and assuming, say, a 4k page and disk block size and a repeated string:
open the file.
allocate 4k * number of chars in your repeated string, ideally aligned to a page boundary.
print repeated string into the memory 4k times, filling the blocks precisely.
Use write() to write out the blocks to disk as many times as necessary. You may wish to write a partial piece for the last block to get the size to come out right.
close the file.
This bypasses the buffering of fopen() and friends, which is good and bad: their buffering means that they're nice and fast, but they are still not going to be as efficient as this, which has no overhead of working with the buffer.
This can easily be written in C++ or C, but does assume that you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.
I faced the same problem, creating a ~500MB file on Windows very fast.
The larger buffer you pass to fwrite() the fastest you'll be.
int i;
FILE *fp;
fp = fopen(fname,"wb");
if (fp != NULL) {
// create big block's data
uint8_t b[278528]; // some big chunk size
for( i = 0; i < sizeof(b); i++ ) // custom initialization if != 0x00
{
b[i] = 0xFF;
}
// write all blocks to file
for( i = 0; i < TOT_BLOCKS; i++ )
fwrite(&b, sizeof(b), 1, fp);
fclose (fp);
}
Now at least on my Win7, MinGW, creates file almost instantly.
Compared to fwrite() 1 byte at time, that will complete in 10 Secs.
Passing 4k buffer will complete in 2 Secs.
Fastest way to create large file in c++?
Ok. I assume fastest way means the one that takes the smallest run time.
Create a flat text file in c++ around 50 - 100 MB with the content 'Added first line' should be inserted in to the file for 4 million times.
preallocate the file using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
create a string containing the "Added first line\n" a thousand times.
find it's length.
preallocate the file using old style file io
fopen the file for write.
fseek to the the string length * 4000
fwrite a single byte
fclose the file
open the file for read/write
loop 4000 times,
writing the string to the file.
close the file.
That's my best guess.
I'm sure there are a lot of ways to do it.