zlib gzgets extremely slow? - c++

I'm parsing huge batches of text files and was testing which input method to use.
There is not much of a difference between using C++ std::ifstream and C FILE*.
According to the zlib documentation, it supports uncompressed files and will read such a file without decompression.
I'm seeing a difference from 12 seconds without zlib to more than 4 minutes with zlib.h.
I've tested this over multiple runs, so it's not a disk-cache issue.
Am I using zlib in some wrong way?
thanks
#include <zlib.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>

#define LENS 1000000

size_t fg(const char *fname){
    fprintf(stderr,"\t-> using fgets\n");
    FILE *fp = fopen(fname,"r");
    size_t nLines = 0;
    char *buffer = new char[LENS];
    while(NULL != fgets(buffer,LENS,fp))
        nLines++;
    fclose(fp);
    delete[] buffer;
    fprintf(stderr,"%lu\n",nLines);
    return nLines;
}

size_t is(const char *fname){
    fprintf(stderr,"\t-> using ifstream\n");
    std::ifstream is(fname,std::ios::in);
    size_t nLines = 0;
    char *buffer = new char[LENS];
    while(is.getline(buffer,LENS))
        nLines++;
    delete[] buffer;
    fprintf(stderr,"%lu\n",nLines);
    return nLines;
}

size_t iz(const char *fname){
    fprintf(stderr,"\t-> using zlib\n");
    gzFile fp = gzopen(fname,"r");
    size_t nLines = 0;
    char *buffer = new char[LENS];
    while(Z_NULL != gzgets(fp,buffer,LENS))
        nLines++;
    gzclose(fp);
    delete[] buffer;
    fprintf(stderr,"%lu\n",nLines);
    return nLines;
}

int main(int argc,char**argv){
    if(atoi(argv[2])==0)
        fg(argv[1]);
    if(atoi(argv[2])==1)
        is(argv[1]);
    if(atoi(argv[2])==2)
        iz(argv[1]);
}

I guess you are using zlib-1.2.3. In that version, gzgets() effectively calls gzread() once for each byte. Calling gzread() this way has a big overhead. You can compare the CPU time of calling gzread(gzfp, buffer, 4096) once against calling gzread(gzfp, buffer, 1) 4096 times: the result is the same, but the CPU time is hugely different.
What you should do is implement buffered I/O for zlib, reading ~4KB of data in a chunk with a single gzread() call (like what fread() does on top of read()). The latest zlib-1.2.5 is said to have significantly improved gzread/gzgetc/..., so you may try that as well. As it was released very recently, I have not tried it personally.
EDIT:
I have tried zlib-1.2.5 just now. gzgetc and gzgets in 1.2.5 are much faster than those in 1.2.3.
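To illustrate the buffered approach suggested above, here is a minimal sketch of my own (not from zlib or the original post): a small reader class that pulls 4 KB from gzread() at a time and hands out lines from that buffer. The class and method names are made up.

#include <zlib.h>
#include <cstring>
#include <string>

// Hypothetical helper: read lines from a gzFile, touching zlib only once
// per 4 KB chunk instead of once per byte.
class GzLineReader {
public:
    explicit GzLineReader(const char *fname) : fp(gzopen(fname, "r")), pos(0), end(0) {}
    ~GzLineReader() { if (fp) gzclose(fp); }

    // Fill 'line' (without the trailing '\n'); return false at end of file.
    bool getline(std::string &line) {
        line.clear();
        for (;;) {
            if (pos == end) {                                   // refill the 4 KB buffer
                int n = gzread(fp, buf, sizeof(buf));
                if (n <= 0) return !line.empty();               // EOF or error
                pos = 0;
                end = static_cast<size_t>(n);
            }
            char *nl = static_cast<char*>(std::memchr(buf + pos, '\n', end - pos));
            if (nl) {                                           // a complete line is buffered
                line.append(buf + pos, nl);
                pos = static_cast<size_t>(nl - buf) + 1;
                return true;
            }
            line.append(buf + pos, buf + end);                  // partial line; read more
            pos = end;
        }
    }

private:
    gzFile fp;
    char buf[4096];
    size_t pos, end;
};

With something like this, the iz() routine above could call reader.getline(line) in its loop instead of gzgets().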

Related

Buffering putc write

I'm new to C++ and am making an app that uses putc heavily to write data to an output file. Because of the many small writes it is slow. I used to code in Delphi, so I know how I would solve it there: create a memory stream, write into it every time I need to write to the output, and once the memory stream grows larger than the buffer size I want, write it to the output file and clear the memory stream. How should I do this in C++, or is there a better solution?
putc is already buffered; 4 KB is the default. You can use setvbuf to change that value :D
setvbuf
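For completeness, a minimal sketch of enlarging the stdio buffer with setvbuf (the file name and the 1 MB size are just placeholders I picked for illustration):

#include <cstdio>
#include <vector>

int main() {
    FILE *out = fopen("output.dat", "wb");          // placeholder file name
    if (!out) return 1;

    // Replace the default stdio buffer with a 1 MB, fully buffered one.
    std::vector<char> buf(1 << 20);
    if (setvbuf(out, buf.data(), _IOFBF, buf.size()) != 0)
        return 1;

    for (int i = 0; i < 10 * 1000 * 1000; ++i)
        putc('x', out);                             // each putc now lands in the big buffer

    fclose(out);                                    // flush before 'buf' goes out of scope
    return 0;
}

setvbuf must be called before any other I/O on the stream, and the buffer has to stay alive until the file is closed.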
Writing to a file should be very quick; it is usually the emptying of the buffer that takes time. Consider using the character '\n' instead of std::endl, since std::endl flushes the stream on every call.
I think a good answer to your question is here: Writing a binary file in C++ very fast
Where the answer is:
#include <stdio.h>

const unsigned long long size = 8ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    FILE* pFile;
    pFile = fopen("file.binary", "wb");
    for (unsigned long long j = 0; j < 1024; ++j){
        //Some calculations to fill a[]
        fwrite(a, 1, size*sizeof(unsigned long long), pFile);
    }
    fclose(pFile);
    return 0;
}
The most important thing in your case is to write as much data as you can with the fewest possible I/O requests.
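If you would rather keep the "memory stream" pattern from the question, a rough sketch (names invented for illustration, not from the linked answer) is to collect bytes in a std::vector and flush them with a single fwrite once a threshold is crossed:

#include <cstdio>
#include <vector>

// Hypothetical helper: buffer single-character writes and flush in big blocks.
struct BufferedWriter {
    BufferedWriter(FILE *f, size_t limit) : out(f), max(limit) { buf.reserve(limit); }
    ~BufferedWriter() { flush(); }

    void put(char c) {
        buf.push_back(c);
        if (buf.size() >= max)
            flush();                                // one large write instead of many putc calls
    }
    void flush() {
        if (!buf.empty()) {
            fwrite(buf.data(), 1, buf.size(), out);
            buf.clear();
        }
    }

    FILE *out;
    std::vector<char> buf;
    size_t max;
};

Usage would look like BufferedWriter w(file, 1 << 20); followed by w.put(c) wherever putc was used; the destructor flushes whatever is left.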

Binary Files in C++, changing the content of raw data on an audio file

I have never worked with binary files before. I opened an .mp3 file using the ios::binary mode, read data from it, assigned 0 to each byte read, and then wrote the bytes to another file opened in ios::binary mode. I opened the output file in a media player; it sounds corrupted, but I can still hear the song. I want to know what happened, physically.
How can I access/modify the raw data (bytes) of an audio (video, image, ...) file using C++ (to practice file encryption/decryption later)?
Here is my code:
#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;

int main(){
    char buffer[256];
    ifstream inFile;
    inFile.open("Backstreet Boys - Incomplete.mp3",ios::binary);
    ofstream outFile;
    outFile.open("Output.mp3",ios::binary);
    while(!inFile.eof()){
        inFile.read(buffer,256);
        for(int i = 0; i<strlen(buffer); i++){
            buffer[i] = 0;
        }
        outFile.write(buffer,256);
    }
    inFile.close();
    outFile.close();
}
What you did has nothing to do with binary files or audio. You simply copied the file while zeroing some of the bytes. (The reason you didn't zero all of the bytes is because you use i<strlen(buffer), which simply counts up to the first zero byte rather than reporting the size of the buffer. Also you modify the buffer which means strlen(buffer) will report the length as zero after you zero the first byte.)
So the exact change in audio you get is entirely dependent on the mp3 file format and the audio compression it uses. MP3 is not an audio format that can be directly manipulated in useful ways.
If you want to manipulate digital audio, you need to learn about how raw audio is represented by computers.
It's actually not too difficult. For example, here's a program that writes out a raw audio file containing just a 400Hz tone.
#include <cmath>
#include <fstream>
#include <limits>

int main() {
    const double pi = 3.1415926535;
    double tone_frequency = 400.0;
    int samples_per_second = 44100;
    double output_duration_seconds = 5.0;
    int output_sample_count =
        static_cast<int>(output_duration_seconds * samples_per_second);

    std::ofstream out("signed-16-bit_mono-channel_44.1kHz-sample-rate.raw",
                      std::ios::binary);
    for (int sample_i = 0; sample_i < output_sample_count; ++sample_i) {
        double t = sample_i / static_cast<double>(samples_per_second);
        double sound_amplitude = std::sin(t * 2 * pi * tone_frequency);
        // encode amplitude as a 16-bit, signed integral value
        short sample_value =
            static_cast<short>(sound_amplitude * std::numeric_limits<short>::max());
        out.write(reinterpret_cast<char const *>(&sample_value),
                  sizeof sample_value);
    }
}
To play the sound you need a program that can handle raw audio, such as Audacity. After running the program to generate the audio file, you can File > Import > Raw data..., to import the data for playing.
How can I access/modify the raw data ( bytes ) of an audio ( video, images, ... ) using C++ ( to practice file encryption/decryption later )?
As pointed out earlier, the reason your existing code is not completely zeroing out the data is because you are using an incorrect buffer size: strlen(buffer). The correct size is the number of bytes read() put into the buffer, which you can get with the function gcount():
inFile.read(buffer, 256);
int buffer_size = inFile.gcount();
for (int i = 0; i < buffer_size; i++){
    buffer[i] = 0;
}
outFile.write(buffer, buffer_size);
Note: if you were to step through your program using a debugger you probably would have pretty quickly seen the problem yourself when you noticed the inner loop executing less than you expected. Debuggers are a really handy tool to learn how to use.
I notice you're using open() and close() methods here. This is sort of pointless in this program. Just open the file in the constructor, and allow the file to be automatically closed when inFile and outFile go out of scope:
{
    ifstream inFile("Backstreet Boys - Incomplete.mp3", ios::binary);
    ofstream outFile("Output.mp3", ios::binary);
    // don't bother calling .close(), it happens automatically.
}

Why are std::fstreams so slow?

I was working on a simple parser and when profiling I observed the bottleneck is in... file read! I extracted very simple test to compare the performance of fstreams and FILE* when reading a big blob of data:
#include <stdio.h>
#include <chrono>
#include <cstring>
#include <fstream>
#include <iostream>
#include <functional>
#include <string>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    function();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main(int argc, const char * argv[])
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 123, BUFFER_SIZE);

    measure("FILE* write", [buffer]()
    {
        FILE* file = fopen("test_file_write", "wb");
        fwrite(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("FILE* read", [buffer]()
    {
        FILE* file = fopen("test_file_read", "rb");
        fread(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("fstream write", [buffer]()
    {
        std::ofstream stream("test_stream_write", std::ios::binary);
        stream.write(buffer, BUFFER_SIZE);
    });
    measure("fstream read", [buffer]()
    {
        std::ifstream stream("test_stream_read", std::ios::binary);
        stream.read(buffer, BUFFER_SIZE);
    });

    delete[] buffer;
}
The results of running this code on my machine are:
FILE* write 1388.59 ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream read 3319.82 ms
fstream write/read are about 2 times slower than FILE* write/read! And this is while reading a big blob of data, without any parsing or other features of fstreams. I'm running the code on Mac OS, Intel i7 2.6 GHz, 16 GB 1600 MHz RAM, SSD drive. Please note that when running the same code again, the time for FILE* read is very low (about 200 ms), probably because the file gets cached. This is why the files opened for reading are not created by this code.
Why is reading just a blob of binary data using fstream so slow compared to FILE*?
EDIT 1: I updated the code and the times. Sorry for the delay!
EDIT 2: I added command line and new results (very similar to previous ones!)
$ clang++ main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9 ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream read 3052.56 ms
Following the results for the second run:
$ ./a.out
FILE* write 1428.98 ms
FILE* read 196.902 ms
fstream write 3343.69 ms
fstream read 2285.93 ms
It looks like the file gets cached when reading, for both FILE* and stream, as the time is reduced by the same amount for both of them.
EDIT 3: I reduced the code to this:
FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);
std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);
And started the profiler. It seems like stream spends lots of time in xsputn function, and the actual write calls have the same duration (as it should be, it's the same function...)
Running Time Self Symbol Name
3266.0ms 66.9% 0,0 std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)
3265.0ms 66.9% 2145,0 std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsputn(char const*, long)
1120.0ms 22.9% 7,0 std::__1::basic_filebuf<char, std::__1::char_traits<char> >::overflow(int)
1112.0ms 22.7% 2,0 fwrite
1127.0ms 23.0% 0,0 fwrite
EDIT 4 For some reason this question was marked as a duplicate. I want to point out that I don't use printf at all; I only use std::cout to write the time. The files used in the read part are the output from the write part, copied under a different name to avoid caching.
It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev.
I'm not sure WHY writev is so much slower than write, but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream needs to use that construct in this case.
This can easily be seen by using strace ./a.out (where a.out is the program testing this).
Output:
Fstream:
clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824}], 2) = 1073741824
close(3) = 0
clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
write(1, "fstream write 13261.8 ms\n", 25fstream write 13261.8 ms) = 25
FILE*:
clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 1073741824
clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
write(1, "FILE* write 1653.87 ms\n", 23FILE* write 1653.87 ms) = 23
I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.
As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    function();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        struct iovec vec[] =
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });
    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}
And the result is pretty much identical for both cases, and faster than both the fstream and FILE* variants in the question.
So it is the actual fstream implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.
Edit:
It would seem like, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE* - on my system, around 13 seconds to write 1GB - with old style spinning disk type drives, not SSD.
I can however write MUCH faster using this code:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    function();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        struct iovec vec[] =
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });
    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}
gives times of about 650-900 ms.
I can also edit the original program to give a time of approximately 1000ms for fwrite - simply remove the fclose.
I also added this method:
measure("fstream write (new)", [buffer]()
{
std::ofstream* stream = new std::ofstream("test", std::ios::binary);
stream->write(buffer, BUFFER_SIZE);
// Intentionally no delete.
});
and then it takes about 1000 ms here too.
So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...
TL;DR: Try adding this to your code before doing the writing:
const size_t bufsize = 256*1024;
char buf[bufsize];
mystream.rdbuf()->pubsetbuf(buf, bufsize);
When working with large files with fstream, make sure to use a stream buffer.
Counterintuitively, disabling stream buffering dramatically reduces performance. At least the MSVC implementation copies 1 char at a time to the filebuf when no buffer was set (see streambuf::xsputn()), which can make your application CPU-bound, which will result in lower I/O rates.
NB: You can find a complete sample application here.
A side note for anyone interested.
The main keywords are Windows 2016 Server / CloseHandle.
In our app we discovered a NASTY bug on Windows 2016 Server.
Our std code under EVERY other Windows version takes (ms):
time CreateFile/SetFilePointer 1, WriteFile 0, CloseHandle 0
On Windows 2016 we got:
time CreateFile/SetFilePointer 1, WriteFile 0, CloseHandle 275
And the time grows with the size of the file, which is ABSURD.
After a LOT of investigation (we first found that "CloseHandle" was the culprit...) we discovered that under Windows 2016 MS attached a "hook" to the close function that triggers Windows Defender to scan the WHOLE file and prevents it from returning until done (in other words, the scanning is synchronous, which is PURE MADNESS).
When we added an exclusion in Defender for our file, everything worked fine.
I think this is BAD design; no other antivirus stops normal file activity INSIDE program space to scan files. (MS can do it as they have the power to do so.)
Contrary to other answers, a big issue with large file reads comes from buffering by the C standard library. Try using low-level read/write calls in large chunks (1024 KB) and see the performance jump.
File buffering by the C library is useful for reading or writing small chunks of data (smaller than disk block size).
On Windows I got almost a 3x performance boost dropping file buffering when reading and writing raw video streams.
I also opened the file using native OS (win32) API calls and told the OS not to cache the file as this involves yet another copy.
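As a rough sketch of the "large chunks, low-level calls" idea, here is a POSIX version (my own illustration; the answer above used the Win32 API, where the equivalent would be CreateFile/ReadFile, optionally with FILE_FLAG_NO_BUFFERING for the uncached case):

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    std::vector<char> chunk(1024 * 1024);           // 1 MB per read() call
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, chunk.data(), chunk.size())) > 0)
        total += n;                                 // ...process the chunk here...

    close(fd);
    printf("read %lld bytes\n", total);
    return 0;
}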
The stream is somehow broken on the Mac - an old implementation or setup.
An old setup could cause the FILE to be written in the exe directory and the stream in the user directory; this shouldn't make any difference unless you have two disks or some other different setting.
On my lousy Vista I get
Normal buffer+Uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms
Normal buffer+Cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms
Large Buffer+cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms
This shows that the FILE* write is faster than the fstream write, but slower than fstream on the read ... though all numbers are within ~10% of each other.
Try adding some more buffering to your stream to see if that helps.
const int MySize = 1024*1024;
char MrBuf[MySize];
stream.rdbuf()->pubsetbuf(MrBuf, MySize);
The equivalent for FILE is
const int MySize = 1024*1024;
if (setvbuf(file, NULL, _IOFBF, MySize) != 0)
    DieInDisgrace();

Read file to memory, loop through data, then write file [duplicate]

This question already has answers here:
How to read line by line after i read a text into a buffer?
I'm trying to ask a similar question to this post:
C: read binary file to memory, alter buffer, write buffer to file
but the answers didn't help me (I'm new to c++ so I couldn't understand all of it)
How do I have a loop access the data in memory, and go through line by line so that I can write it to a file in a different format?
This is what I have:
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
using namespace std;

int main()
{
    char* buffer;
    char linearray[250];
    int lineposition;
    double filesize;
    string linedata;
    string a;

    //obtain the file
    FILE *inputfile;
    inputfile = fopen("S050508-v3.txt", "r");

    //find the filesize
    fseek(inputfile, 0, SEEK_END);
    filesize = ftell(inputfile);
    rewind(inputfile);

    //load the file into memory
    buffer = (char*) malloc (sizeof(char)*filesize); //allocate mem
    fread (buffer,filesize,1,inputfile); //read the file to the memory
    fclose(inputfile);

    //Check to see if file is correct in Memory
    cout.write(buffer,filesize);
    free(buffer);
}
I appreciate any help!
Edit (More info on the data):
My data is different files that vary between 5 and 10gb. There are about 300 million lines of data. Each line looks like
M359
T359 3520 359
M400
A3592 zng 392
Where the first element is a character, and the remaining items could be numbers or characters. I'm trying to read this into memory since it will be a lot faster to loop through line by line, than reading a line, processing, and then writing. I am compiling in 64bit linux. Let me know if I need to clarify further. Again thank you.
Edit 2
I am using a switch statement to process each line, where the first character of each line determines how to format the rest of the line. For example 'M' means millisecond, and I put the next three numbers into a structure. Each line has a different first character that I need to do something different for.
So pardon the potentially blatantly obvious, but if you want to process this line by line, then...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    // read lines one at a time
    ifstream inf("S050508-v3.txt");
    string line;
    while (getline(inf, line))
    {
        // ... process line ...
    }
    inf.close();
    return 0;
}
And just fill in the body of the while loop? Maybe I'm not seeing the real problem (a forest for the trees kinda thing).
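For the dispatch described in the question's Edit 2, the body of that while loop might look roughly like this. The field layout and types are my assumptions for illustration; only the 'M'-means-milliseconds prefix was actually spelled out in the question.

#include <sstream>
#include <string>

// ... inside: while (getline(inf, line)) { ...
if (line.empty())
    continue;
switch (line[0]) {
case 'M': {                                  // e.g. "M359": milliseconds (per Edit 2)
    int ms = std::stoi(line.substr(1));
    // store ms in your structure ...
    break;
}
case 'T': {                                  // e.g. "T359 3520 359": assumed to be three numbers
    std::istringstream ss(line.substr(1));
    int a, b, c;
    ss >> a >> b >> c;
    // store a, b, c ...
    break;
}
default:
    // handle the other leading characters ...
    break;
}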
EDIT
The OP is on board with using a custom streambuf, which may not necessarily be the most portable thing in the world, but he's more interested in avoiding flipping back and forth between input and output files. With enough RAM, this should do the trick.
#include <iostream>
#include <fstream>
#include <iterator>
#include <memory>
#include <cstdlib>
#include <string>
using namespace std;

struct membuf : public std::streambuf
{
    membuf(size_t len)
        : streambuf()
        , src(new char[len])
        , len(len)
    {
        setg(src.get(), src.get(), src.get() + len);
    }

    // direct buffer access for file load.
    char * get() { return src.get(); }
    size_t size() const { return len; }

private:
    std::unique_ptr<char[]> src;   // array form so delete[] is used
    size_t len;
};

int main(int argc, char *argv[])
{
    // open file in binary, retrieve length-by-end-seek
    ifstream inf(argv[1], ios::in|ios::binary);
    inf.seekg(0, inf.end);
    size_t len = inf.tellg();
    inf.seekg(0, inf.beg);

    // allocate a stream buffer with an internal block
    // large enough to hold the entire file.
    membuf mb(len+1);

    // use our membuf buffer for our file read-op.
    inf.read(mb.get(), len);
    mb.get()[len] = 0;

    // use iss for your nefarious purposes
    std::istream iss(&mb);
    std::string s;
    while (iss >> s)
        cout << s << endl;

    return EXIT_SUCCESS;
}
You should look into fgets and sscanf, with which you can pull out matched pieces of data so they are easier to manipulate, assuming that is what you want to do. Something like this could look like:
FILE *input = fopen("file.txt", "r");
FILE *output = fopen("out.txt","w");

const int bufferSize = 64;
char buffer[bufferSize];

while(fgets(buffer, bufferSize, input) != NULL){
    char data[16];
    sscanf(buffer, "regex", data);
    //manipulate data
    fprintf(output, "%s", data);
}
fclose(output);
fclose(input);
That would be more of the C way to do it; C++ handles things a little more elegantly by using an istream:
http://www.cplusplus.com/reference/istream/istream/
If I had to do this, I'd probably use code something like this:
std::ifstream in("S050508-v3.txt");
std::istringstream buffer;
buffer << in.rdbuf();
std::string data = buffer.str();
if (check_for_good_data(data))
std::cout << data;
This assumes you really need the entire contents of the input file in memory at once to determine whether it should be copied to output or not. If (for example) you can look at the data one byte at a time, and determine whether that byte should be copied without looking at the others, you could do something more like:
std::ifstream in(...);
std::copy_if(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>(),
             std::ostream_iterator<char>(std::cout, ""),
             is_good_char);
...where is_good_char is a function that returns a bool saying whether that char should be included in the output or not.
Edit: the size of files you're dealing with mostly rules out the first possibility I've given above. You're also correct that reading and writing large chunks of data will almost certainly improve speed over working on one line at a time.

fseek now supports large files

It appears that fseek now, at least in my implementation, supports large files naturally without fseek64, lseek or some strange compiler macro.
When did this happen?
#include <cstdio>
#include <cstdlib>

void writeF(const char *fname, size_t nItems){
    FILE *fp = NULL;
    if(NULL == (fp = fopen(fname, "w"))){
        fprintf(stderr, "\t-> problems opening file:%s\n", fname);
        exit(0);
    }
    for(size_t i = 0; i < nItems; i++)
        fwrite(&i, sizeof(size_t), 1, fp);
    fclose(fp);
}

void getIt(const char *fname, size_t offset, int whence, int nItems){
    size_t ary[nItems];
    FILE *fp = fopen(fname, "r");
    fseek(fp, offset*sizeof(size_t), whence);
    fread(ary, sizeof(size_t), nItems, fp);
    for(int i = 0; i < nItems; i++)
        fprintf(stderr, "%lu\n", ary[i]);
    fclose(fp);
}

int main(){
    const char *fname = "temp.bin";
    writeF(fname, 1000000000);              // write the file
    getIt(fname, 999999990, SEEK_SET, 10);  // get the last 10 entries, seeking from the start
    getIt(fname, -10, SEEK_END, 10);        // get the last 10 entries, seeking from the end
    return 0;
}
The code above writes a big file containing the entries 0 to 10^9-1 in binary size_t format.
It then prints the last 10 entries, first seeking from the beginning of the file and then seeking from the end of the file.
Linux x86-64 has had large file support (LFS) from pretty much day one, and doesn't require any special macros etc. to enable it - both the traditional fseek() and the LFS fseek64() already use a 64-bit off_t.
Linux i386 (32bit) typically defaults to 32-bit off_t as otherwise it would break a huge number of applications - but you can test what is defined in your environment by checking the value of the _FILE_OFFSET_BITS macro.
See http://www.suse.de/~aj/linux_lfs.html for full details on Linux large file support.
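If you want to check this on your own build, a tiny sketch (my own, under the assumption of a glibc-based toolchain) is to define _FILE_OFFSET_BITS before any header and print the resulting type sizes:

#define _FILE_OFFSET_BITS 64   // must appear before any system header
#include <cstdio>
#include <sys/types.h>

int main() {
    printf("sizeof(long) = %zu, sizeof(off_t) = %zu\n",
           sizeof(long), sizeof(off_t));
    return 0;
}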
The signature is
int fseek ( FILE * stream, long int offset, int origin );
so the range depends on the size of long.
On some systems it is 32-bit, and you have a problem with large files, on other systems it is 64-bit.
999999990 is a normal int and fits perfectly into 32 bits. I don't believe that you'd get away with this though:
getIt(fname,99999999990LL,SEEK_SET,10);
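One portable way around the long-sized offset (not mentioned above, but standard POSIX) is fseeko(), which takes an off_t. A minimal sketch, reusing the temp.bin file from the question:

#include <cstdio>
#include <sys/types.h>

int main() {
    FILE *fp = fopen("temp.bin", "rb");         // the file written by the question's code
    if (!fp) return 1;

    // off_t is 64-bit on 64-bit Linux (or on 32-bit with _FILE_OFFSET_BITS=64),
    // so offsets beyond 2 GB are fine even where long is only 32 bits.
    off_t pos = (off_t)999999990 * (off_t)sizeof(size_t);
    if (fseeko(fp, pos, SEEK_SET) != 0)
        perror("fseeko");

    fclose(fp);
    return 0;
}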