I have a buffer of 42131221 bytes (42.1MB) that I used to store some compressed data. Only the first 20MB actually hold the compressed data, and I am trying to write those bytes to a file using fwrite:
fwrite(buffer, WHAT_GOES_HERE, buffer_length, pFile);
The second parameter is the size of each element to write, but that isn't really applicable here: I just want to write out the raw bytes of the buffer, and since the data is compressed there is no natural "size of each element".
Any idea on how I can get this to work?
WHAT_GOES_HERE should be sizeof(type of the buffer). Also, buffer_length should be the number of "types" you want to write to the file. I mention this since it seems you do not want to write the entire buffer but only the first 20MB.
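For a raw byte buffer, the idiomatic choice is an element size of 1, so the count parameter is simply the number of bytes. A minimal sketch (bytes_to_write standing in for your 20MB figure):

size_t bytes_to_write = 20 * 1024 * 1024;                   // only the first 20MB hold data
size_t written = fwrite(buffer, 1, bytes_to_write, pFile);  // element size 1: count is in bytes
if (written != bytes_to_write)
{
    // handle a short write / error here
}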
fwrite works on streams, which are buffered.
write is a lower-level API based on file descriptors. It doesn't know about buffering.
A rule of thumb:
If you want to write a single large buffer, go for write.
Use fwrite if you want to write in smaller chunks.
So you could go for write here.
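For instance, a minimal sketch using the POSIX calls (the file name is a placeholder; a robust version would loop, since write may return a short count):

#include <fcntl.h>
#include <unistd.h>

int fd = open("out.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
ssize_t n = write(fd, buffer, bytes_to_write);  // one syscall, no stdio buffering layer
close(fd);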
I have read about buffers and streams and how they work with files in C++, but I don't understand why a buffer is needed if there is a stream; the stream is already there to transfer the data of a file to the program. So why do we use buffers to store data (seemingly performing the same task the stream does), and what are buffered and unbuffered streams?
Consider a stream that writes to a file. If there were no buffer, then every time your program wrote a single byte to the stream, a single byte would have to be written to the file. That's very inefficient. So streams have buffers to decouple operations on one side of the stream from operations on the other side.
OK, let's start from scratch. Suppose you want to work with files. You would have to manage how data gets into your file, whether sending data to the file succeeded, and all the other basic details. Now, you can either manage all of that yourself, which takes a lot of time and hard work, or you can use a stream.
Yes, you can use a stream for such purposes. Streams are an abstraction mechanism: as C++ programmers we don't know how they work internally, we only know that we stand at one end of the stream (our program's side), offer our data to it, and it has the responsibility of transferring the data from one end to the other (the file's side). For example:
ofstream file("abc.txt"); //Here an object of output file stream is created
file<<"Hello"; //We are just giving our data to stream and it transfers that
file.close(); //The closing of file
Now, if you work with files you should know that file operations are really expensive: it takes much longer to access a file than to access memory, and we don't want to touch the file more often than necessary. So programmers created a feature called a buffer, a part of the computer's memory that temporarily stores data for handling files.
Suppose that, instead of reading the file every time you need data, you just read a memory location into which the file's data has been temporarily copied. That is a much cheaper operation, because you are reading memory, not the file.
Streams that use a buffer in their work, i.e. they open the file and by default copy chunks of its data into the buffer, are called buffered streams, whereas streams that don't use any buffer are called unbuffered streams.
Now, if you send data to a buffered stream, that data is queued up until the stream is flushed (flushing means writing the buffer's contents out to the file). In unbuffered streams the data is not stored temporarily in a buffer but is sent to the file as it comes to the stream, so each individual operation completes against the file sooner (from the point of view of the user at one end of the stream), even though overall throughput usually suffers.
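A small sketch of the difference; note that pubsetbuf(nullptr, 0) is only a request for an unbuffered file stream, and whether it is honored is implementation-defined:

#include <fstream>

std::ofstream buffered("a.txt");   // buffered by default
buffered << "Hello";               // queued up in the buffer
buffered.flush();                  // flushing writes the buffer's contents to the file

std::ofstream unbuffered;
unbuffered.rdbuf()->pubsetbuf(nullptr, 0);  // request no buffer (must be before open)
unbuffered.open("b.txt");
unbuffered << "Hello";             // sent to the file as it comes to the stream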
A buffer and a stream are different concepts.
A buffer is a part of the memory to temporarily store data. It can be implemented and structured in various ways. For instance, if one wants to read a very large file, chunks of the file can be read and stored in the buffer. Once a certain chunk is processed the data can be discarded and the next chunk can be read. A chunk in this case could be a line of the file.
Streams are the way C++ handles input and output. Their implementation uses buffers.
I do agree that streams are probably the most poorly written and most badly understood part of the standard library. People use them every day, and many haven't the slightest clue how the constructs they use actually work. For a little fun, try asking around what std::endl is; you might find that some of the answers are funny.
At any rate, streams and streambufs have different responsibilities. Streams are supposed to provide formatted input and output, that is, translate an integer to a sequence of bytes (or the other way around), while buffers are responsible for conveying the sequence of bytes to the media.
Unfortunately, this design is not apparent from the implementation. For instance, we have all those numerous streams, file streams and string streams for example, while the only difference between them is the buffer; the stream code remains exactly the same. I believe many people would redesign streams if they had their way, but I am afraid this is not going to happen.
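To see the separation concretely: the same stream code runs no matter which buffer sits underneath, as in this sketch:

#include <fstream>
#include <sstream>

std::ostringstream string_stream;       // ostream on top of a stringbuf
std::ofstream file_stream("out.txt");   // ostream on top of a filebuf

string_stream << 42;  // identical formatting logic in both lines...
file_stream << 42;    // ...only the destination of the bytes differs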
I have some code that looks approximately like this:
boost::iostreams::filtering_istreambuf in;
in.push(Lz4DecompressionFilter());
in.push(AesDecryptionFilter());
in.push(file_source("somefile"));
I already have meta-data that stores the length of the result:
std::vector<char> buf;
buf.resize(resultLength /* retrieved from a meta-data server */);  // resize, not reserve, so the bytes actually exist
std::streamsize ret = in.sgetn(buf.data(), static_cast<std::streamsize>(buf.size()));
By adding trace-points, I observed that the Lz4 and Aes filter only get reads of 128 bytes. Also, if I replace file_source with a custom device, it only gets reads of 4096 bytes.
Since I know exactly the size the reads should have, is there a way to disable buffering in iostreams entirely and just chain the read down the filter? I know I can change the buffer sizes, but I am interested in completely disabling them.
Standard streams by definition use a buffer abstraction. This is largely because some of the functions exposed necessitate the presence of a buffer (peek/putback).
How would compression and encryption still function without buffering? Compression and block ciphers both require operating on (sometimes even fixed-size) chunks.
Re:
Also, if I replace file_source with a custom device, it only gets reads of 4096 bytes.
What behaviour would you have expected instead? Do you expect reads of unbounded size?
Using blocks of >4k is highly unusual in stream-oriented processing. In that case, did you just want to copy all input into one large buffer first (perhaps using array_sink...)?
Really, it looks like you just want to increase the buffer size.
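filtering_streambuf::push accepts an optional buffer size for each link in the chain, so a sketch of that (the 1 MiB figure is an arbitrary example):

boost::iostreams::filtering_istreambuf in;
in.push(Lz4DecompressionFilter(), 1 << 20);  // 1 MiB buffer for this link
in.push(AesDecryptionFilter(), 1 << 20);
in.push(file_source("somefile"), 1 << 20);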
I need to sequentially read a file in C++, dealing with 4 characters at a time (but it's a sliding window, so the next character is handled along with the 3 before it). I could read chunks of the file into a buffer (I know mmap() would be more efficient, but I want to stick to platform-independent plain C++), or I could read the file a character at a time using std::cin.read(). The file could be arbitrarily large, so reading the whole file is not an option.
Which approach is more efficient?
The most efficient method is to read a lot of data into memory using the fewest function calls or requests.
The objective is to keep the hard drive spinning. One bottleneck is waiting for the hard drive to spin up to speed; another is seeking to the sectors where your requested data lives; a third is contention for the disk and memory from other requests.
So I vote for reading into a buffer and searching the buffer.
Determine the largest chunk of data you can read at a time, then read the file chunk by chunk.
Say you can only deal with 2K characters at a time. Then, use:
std::ifstream in(filename);  // note: "if" is a keyword, so the stream needs another name
char chunk[2048];
while (in.read(chunk, sizeof chunk) || in.gcount() > 0)  // the || catches the final partial chunk
{
    std::streamsize nread = in.gcount();
    // Process nread characters of the chunk. For the 4-character sliding
    // window, carry the last 3 characters over to the start of the next chunk.
}
I am writing a large binary output buffer through ofstream::write(). Since I know the size of the output file, but sometimes have to write it in chunks, I thought it would be a good idea to call fallocate() (or posix_fallocate()) first to preallocate the space on disk. Those do, however, require a file descriptor, which ofstream does not provide me with.
Is there an ofstream interface for calling fallocate(), or possibly to get the underlying file descriptor so that I can call it myself? (Or is it not worth the bother?)
Since you're going to write in chunks, use fwrite.
Also see http://www.cplusplus.com/reference/cstdio/setvbuf/ to control the buffer size.
To optimize, you can make the buffer size = N * chunk size, as in the sketch below.
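A minimal sketch, with placeholder sizes; note that setvbuf must be called before the first I/O operation on the stream, and the buffer must stay alive until fclose:

#include <cstdio>

const size_t chunk_size = 64 * 1024;           // hypothetical chunk size
char iobuf[8 * chunk_size];                    // buffer size = N * chunk size, N = 8
FILE* f = std::fopen("out.bin", "wb");
std::setvbuf(f, iobuf, _IOFBF, sizeof iobuf);  // full buffering with our own buffer
// ... fwrite() the chunks here ...
std::fclose(f);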
I'm trying to find out the best way to read large text files (at least 5 MB) in C++, considering speed and efficiency. Any preferred class or function to use, and why?
By the way, I'm running specifically in a UNIX environment.
The stream classes (ifstream) actually do a good job; assuming you're not otherwise restricted, make sure to turn off stdio synchronization (std::ios_base::sync_with_stdio(false)). You can use getline() to read directly into std::strings, though from a performance perspective a fixed buffer as a char* (a vector of chars or old-school char[]) may be faster (at a higher risk/complexity).
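A minimal sketch of that approach (the file name is a placeholder):

#include <fstream>
#include <string>

std::ios_base::sync_with_stdio(false);  // drop synchronization with C stdio
std::ifstream in("big.txt");
std::string line;
while (std::getline(in, line))
{
    // process line here
}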
You can go the mmap route if you're willing to play games with page size calculations and the like. I'd probably build it out first using the stream classes and see if it's good enough.
Depending on what you're doing with each line of data, you might start finding your processing routines are the optimization point and not the I/O.
Use old-style file I/O; a full sketch follows these steps.
fopen the file for binary read
fseek to the end of the file
ftell to find out how many bytes are in the file.
malloc a chunk of memory to hold all of the bytes + 1
set the extra byte at the end of the buffer to NUL.
fread the entire file into memory.
create a vector of const char *
push_back the address of the first byte into the vector.
repeatedly
strstr - search the memory block for the line-ending character(s).
put a NUL at the found position
move past the line-ending characters
push_back that address into the vector
until all of the text in the buffer has been processed.
----------------
use the vector to find the strings,
and process as needed.
when done, delete the memory block
and the vector should self-destruct.
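A minimal sketch of the whole recipe, assuming '\n' line endings and omitting error handling for brevity (strchr is used instead of strstr since the separator here is a single character):

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

FILE* f = std::fopen("input.txt", "rb");                  // open for binary read
std::fseek(f, 0, SEEK_END);                               // seek to the end
long size = std::ftell(f);                                // bytes in the file
std::fseek(f, 0, SEEK_SET);

char* block = static_cast<char*>(std::malloc(size + 1));  // all of the bytes + 1
std::fread(block, 1, size, f);                            // read the entire file into memory
std::fclose(f);
block[size] = '\0';                                       // extra NUL at the end of the buffer

std::vector<const char*> lines;
lines.push_back(block);                                   // address of the first byte
for (char* p = block; (p = std::strchr(p, '\n')) != nullptr; )
{
    *p++ = '\0';                                          // put a NUL at the found position
    if (*p != '\0')
        lines.push_back(p);                               // push_back the next line's address
}

// ... use `lines` to find and process the strings ...

std::free(block);                                         // delete the memory block
// the vector cleans itself up when it goes out of scope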
If you are using a text file storing integers, floats, and small strings, my experience is that FILE, fopen, and fscanf are already fast enough, and you also get the numbers parsed directly. I think memory mapping is the fastest, but it requires you to write code to parse the file, which is extra work.
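For instance, a minimal sketch of pulling numbers straight out of a text file (the format string and file name are placeholders):

#include <cstdio>

FILE* f = std::fopen("data.txt", "r");
int i;
double d;
char word[64];
while (std::fscanf(f, "%d %lf %63s", &i, &d, word) == 3)
{
    // i, d and word are already parsed; no extra conversion code needed
}
std::fclose(f);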