Reading a binary file incrementally - c++

I recently asked another question about parsing a binary file, and I sort of got it to work thanks to everyone here.
https://stackoverflow.com/questions/37755225/need-help-reading-binary-file-to-a-structure?noredirect=1#comment62983158_37755225
But I now face a new challenge and I am in need of help.
My binary file looks something like this but much much longer...
ST.........¸.°Ý.ø...0.œ...........ESZ4 1975..........IYH.testDBDBDBDBST...........°Ý.ø...................DBDBDBDBST.........P.´Ý.ø...0.œ...........ESZ4 1975..........HTC.testDBDBDBDBST.........‹‚´Ý.ø...................DBDBDBDBST.........ƒD.Þ.ø...0.œ...........ESZ4 1975..........ARM.testDBDBDBDBST.........«E.Þ.ø...................DBDBDBDB
Basically, every message starts with 'ST' and ends with 'DBDBDBDB'. The goal is to parse each message and store it in a data structure. In addition, the layout of a message depends on its type, and different message types have additional members.
The problem I am having is that I have no idea how to iterate through this binary file... If it were a regular text file, I could just do while(getline(file,s)), but what about a binary file? Is there a way to, say, find the first "ST" and "DBDBDBDB" and parse the stuff in the middle, then move on to the next pair of ST and DB? Or to somehow read the file incrementally, keeping track of where I am?
I apologise ahead of time for posting so much code.
#pragma pack(push, 1)
struct Header
{
uint16_t marker;
uint8_t msg_type;
uint64_t sequence_id;
uint64_t timestamp;
uint8_t msg_direction;
uint16_t msg_len;
};
#pragma pack(pop)
struct OrderEntryMessage
{
Header header;
uint64_t price;
uint32_t qty;
char instrument[10];
uint8_t side;
uint64_t client_assigned_id;
uint8_t time_in_force;
char trader_tag[3];
uint8_t firm_id;
char firm[256] ;
char termination_string[8];
};
struct AcknowledgementMessage
{
Header header;
uint32_t order_id;
uint64_t client_id;
uint8_t order_status;
uint8_t reject_code;
char termination_string[8];
};
struct OrderFillMessage
{
Header header;
uint32_t order_id;
uint64_t fill_price;
uint32_t fill_qty;
uint8_t no_of_contras;
uint8_t firm_id;
char trader_tag[3];
uint32_t qty;
char termination_string[8];
};
void TradeDecoder::createMessage()
{
ifstream file("example_data_file.bin", std::ios::binary);
//I want to somehow Loop here to keep looking for headers ST
Header h;
file.read ((char*)&h.marker, sizeof(h.marker));
file.read ((char*)&h.msg_type, sizeof(h.msg_type));
file.read ((char*)&h.sequence_id, sizeof(h.sequence_id));
file.read ((char*)&h.timestamp, sizeof(h.timestamp));
file.read ((char*)&h.msg_direction, sizeof(h.msg_direction));
file.read ((char*)&h.msg_len, sizeof(h.msg_len));
file.close();
switch(h.sequence_id)
{
case 1:
createOrderEntryMessage(h); //this methods creates a OrderEntryMessage with the header
break;
case 2:
createOrderAckMessage(h); //same as above
break;
case 3:
createOrderFillMessage(h); //same as above
break;
default:
break;
}
}
Much much thanks.....

You can read the whole file into a buffer and then parse the buffer according to your requirements.
I would use fread to read the whole file into the buffer and then process/parse the buffer byte by byte.
This is an example:
/* fread - read an entire file to a buffer */
#include <stdio.h>
#include <stdlib.h>
int main () {
FILE * pFile;
long lSize;
char * buffer;
size_t result;
pFile = fopen ( "myfile.bin" , "rb" );
if (pFile==NULL)
{fputs ("File error",stderr); exit (-1);}
// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
if (lSize < 0) // ftell failed
{fputs ("Size error",stderr); exit (-1);}
rewind (pFile);
// allocate memory to contain the whole file:
buffer = (char*) malloc (sizeof(char)*lSize);
if (buffer == NULL) // malloc failed
{fputs ("Memory error",stderr); exit (-2);}
// copy the file into the buffer:
result = fread (buffer,1,lSize,pFile);
if (result != (size_t)lSize) // fread returns a size_t item count
{fputs ("Reading error",stderr); exit (-3);}
// the whole file is now loaded in the memory buffer.
// you can process the whole buffer now:
// for (int i=0; i<lSize;i++)
// {
// processBufferByteByByte(buffer[i]);
// }
// or
// processBuffer(buffer,lSize);
// terminate
fclose (pFile);
free (buffer);
return 0;
}
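Once the file is in memory, the messages from the question can be located by searching the buffer for the two markers. The following is only a minimal sketch: split_messages is a hypothetical helper name, and it assumes that neither "ST" nor "DBDBDBDB" ever occurs inside a message body, which binary fields such as timestamps could violate. If the msg_len field of the header holds the message length, stepping through the buffer by that length would probably be more robust.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every byte range that starts with "ST" and
// ends with "DBDBDBDB" (markers included), in the order they appear.
// Assumption: neither marker occurs inside a message body.
std::vector<std::string> split_messages(const std::vector<char>& buffer)
{
    static const std::string start_marker = "ST";
    static const std::string end_marker = "DBDBDBDB";

    std::vector<std::string> messages;
    auto pos = buffer.begin();
    while (true) {
        // Find the next start marker at or after the current position.
        auto first = std::search(pos, buffer.end(),
                                 start_marker.begin(), start_marker.end());
        if (first == buffer.end())
            break;
        // Find the terminator that follows it.
        auto last = std::search(first + start_marker.size(), buffer.end(),
                                end_marker.begin(), end_marker.end());
        if (last == buffer.end())
            break; // incomplete trailing message, stop here
        last += end_marker.size();
        messages.emplace_back(first, last);
        pos = last; // continue after this message
    }
    return messages;
}
```

Each returned string holds one complete message, so its size can be checked against sizeof(Header) before the header bytes are copied out with memcpy.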

Related

read from binary file and store to a buffer

Can somebody tell me if this is correct?
I am trying to read from a binary file line by line and store the data in a buffer. Does each new line stored in the buffer overwrite the previously stored line?
ifs.open(filename, std::ios::binary);
for (std::string line; getline(ifs, line,' '); )
{
ifs.read(reinterpret_cast<char *> (buffer), 3*h*w);
}
For some reason you are mixing getline, which is text-based reading, and read(), which is binary reading.
Also, it's completely unclear what buffer is and what its size is. So, here is a simple example for you to start with:
ifs.open(filename, std::ios::binary); // assume that everything is OK
constexpr size_t bufSize = 256;
char buffer[bufSize];
std::streamsize charsRead{ 0 };
do {
ifs.read(buffer, bufSize);
charsRead = ifs.gcount(); // read() returns the stream; gcount() gives the byte count
// if charsRead == 0, there is nothing left to process.
// do something with the first charsRead bytes of the buffer.
// Note that the last read will usually have fewer than bufSize characters,
// so query charsRead each time.
} while (charsRead == static_cast<std::streamsize>(bufSize));

Read last X bytes of a file [closed]

Could anyone tell me a simple way, how to read the last X bytes of a specific file?
If I'm right, I should use ifstream, but I'm not sure how to use it. Currently I'm learning C++ ( at least I'm trying to learn :) ).
Input file streams have the seekg() method that repositions the current position to an absolute position or a relative position. One overload takes a position type that represents an absolute value. The other takes an offset type and a direction that determines the relative position to move to. Negating the offset allows you to move backward. Specifying the end constant moves the indicator relative to the end.
file.seekg(-x, std::ios_base::end);
This is a C solution, but it works and handles errors. The trick is to pass a negative offset to fseek together with SEEK_END to "seek from EOF" (i.e. seek from the "right").
#include <stdio.h>
#define BUF_SIZE (4096)
int main(void) {
int i;
const char* fileName = "test.raw";
char buf[BUF_SIZE] = { 0 };
int bytesRead = 0;
FILE* fp; /* handle for the input file */
size_t fileSize; /* size of the input file */
int lastXBytes = 100; /* number of bytes at the end-of-file to read */
/* open file as a binary file in read-only mode */
if ((fp = fopen(fileName, "rb")) == NULL) {
printf("Could not open input file; Aborting\n");
return 1;
}
/* find out the size of the file; reset pointer to beginning of file */
fseek(fp, 0L, SEEK_END);
fileSize = ftell(fp);
fseek(fp, 0L, SEEK_SET);
/* make sure the file is big enough to read lastXBytes of data */
if (fileSize < (size_t)lastXBytes) {
printf("File too small; Aborting\n");
fclose(fp);
return 1;
} else {
/* read lastXBytes of file */
fseek(fp, -lastXBytes, SEEK_END);
bytesRead = fread(buf, sizeof(char), lastXBytes, fp);
printf("Read %d bytes from %s, expected %d\n", bytesRead, fileName, lastXBytes);
if (bytesRead > 0) {
for (i=0; i<bytesRead; i++) {
printf("%c", buf[i]);
}
}
}
fclose(fp);
return 0;
}
You need to use the seekg function and pass a negative offset from the end of the stream.
std::ifstream is("file.txt");
if (is)
{
is.seekg(-x, is.end); // x is the number of bytes to read before the end
}
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
ifstream ifs("F:\\test.data", ifstream::binary);
if(ifs.fail())
{
cout << "Error:fail to open file" << endl;
return -1;
}
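//read the last 10 bytes of file
const int X = 10;
char* buf = new char[X];
ifs.seekg(-X, std::ios::end); // seekg expects a seekdir, not the C macro SEEK_END
ifs.read(buf, X);
ifs.close();
delete[] buf; // memory from new[] must be released with delete[]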
return 0;
}
Use seekg() for relative positioning from the end of the file, then use read():
ifstream ifs("test.txt");
int x=10;
char buffer[11]={};
ifs.seekg(-x, ios_base::end);
if (!ifs.read(buffer, x))
cerr << "There's a problem !\n";
else cout <<buffer<<endl;
Note that read() just takes the x bytes from the file and puts them in the buffer, without adding a '\0' at the end. So if you expect a C string, you have to make sure that your buffer ends with a 0.
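The answers above can be combined into one small function. This is a sketch under assumptions: read_tail is a hypothetical helper name, the file is opened in binary mode, and the result is returned as a std::string so that the length travels with the data and no manual '\0' is needed. It also clamps the request so that a file shorter than the requested count does not cause a seek before the start.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns up to the last `count` bytes of the file at
// `path`, or an empty string if the file cannot be opened or is empty.
std::string read_tail(const std::string& path, std::size_t count)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};

    // Determine the file size so the seek never moves before the start.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size <= 0)
        return {};

    const std::streamoff want =
        std::min<std::streamoff>(size, static_cast<std::streamoff>(count));
    in.seekg(-want, std::ios::end);

    // std::string stores its own length, so no terminator has to be added.
    std::string tail(static_cast<std::size_t>(want), '\0');
    in.read(&tail[0], want);
    tail.resize(static_cast<std::size_t>(in.gcount()));
    return tail;
}
```

A call such as read_tail("test.txt", 10) would then return at most the last 10 bytes of that file.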

Reading and printing an entire file in binary mode using C++

A follow-up to my previous question (Reading an entire file in binary mode using C++).
After reading a jpg file in binary mode, the result of the read operation is always 4 bytes. The code is:
FILE *fd = fopen("c:\\Temp\\img.jpg", "rb");
if(fd == NULL) {
cerr << "Error opening file\n";
return;
}
fseek(fd, 0, SEEK_END);
long fileSize = ftell(fd);
int *stream = (int *)malloc(fileSize);
fseek(fd, 0, SEEK_SET);
int bytes_read = fread(stream, fileSize, 1, fd);
printf("%x\n", *stream);
fclose(fd);
The second last printf statement is always printing the first 4 bytes and not the entire file contents. How can I print the entire content of the jpg file?
Thanks.
You want it in C++? This opens a file, reads the entire contents into an array and prints the output to the screen:
#include <fstream>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdio> // printf
#include <cctype> // isprint
using namespace std;
void hexdump(void *ptr, int buflen)
{
unsigned char *buf = (unsigned char*)ptr;
int i, j;
for (i=0; i<buflen; i+=16) {
printf("%06x: ", i);
for (j=0; j<16; j++) {
if (i+j < buflen)
printf("%02x ", buf[i+j]);
else
printf("   "); // three spaces keep the ASCII column aligned on a short last row
}
printf(" ");
for (j=0; j<16; j++) {
if (i+j < buflen)
printf("%c", isprint(buf[i+j]) ? buf[i+j] : '.');
}
printf("\n");
}
}
int main()
{
ifstream in;
in.open("C:\\ISO\\ITCHOUT.txt", ios::in | ios::binary);
if(in.is_open())
{
// get the starting position
streampos start = in.tellg();
// go to the end
in.seekg(0, std::ios::end);
// get the ending position
streampos end = in.tellg();
// go back to the start
in.seekg(0, std::ios::beg);
// create a vector to hold the data that
// is resized to the total size of the file
std::vector<char> contents;
contents.resize(static_cast<size_t>(end - start));
// read it in
in.read(&contents[0], contents.size());
// print it out (for clarity)
hexdump(contents.data(), contents.size());
}
}
stream is a pointer to an int (the first element of the array you allocated [1]). *stream dereferences that pointer and gives you the first int.
A pointer is not an array. A pointer is not a buffer. Therefore, it carries no information about the size of the array it points to. There is no way you can print the entire array by providing only a pointer to the first element.
Whatever method you use to print that out, you'll need to provide the size information along with the pointer.
C++ happens to have a pointer + size package in its standard library: std::vector. I would recommend using that. Alternatively, you can just loop through the array yourself (which means using the size information) and print all its elements.
[1] Make sure the size of the file is a multiple of sizeof(int)!
Something like the following should do it. fread() returns the number of blocks read; in your case the block size is the file size, so at most one block can be read. Passing a block size of 1 makes the return value a byte count instead.
You should use a for loop to print the whole file. You're only printing the first int that the pointer refers to.
char *stream = (char *)malloc(fileSize);
fseek(fd, 0, SEEK_SET);
size_t bytes_read = fread(stream, 1, fileSize, fd);
for(size_t i=0; i<bytes_read; i++){
printf("%d ", stream[i]);
}
I print the chars as numbers as binary data is not readable in the console. I don't know how you wanted the data to be formatted.
This is just meant as reference to your sample. You should really consider using Chad's sample. This is a far worse solution (as mixing C/C++ far too much) just for sake of completeness.
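For readers who want to stay with fread but keep the size attached to the data, the two ideas above (a byte-sized item in fread, and std::vector as the pointer + size package) can be sketched together. The names load_file and print_hex are hypothetical, and the example makes no assumption about the file being a jpg.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: loads a whole file with fread and returns its bytes.
// Returns an empty vector if the file cannot be opened or sized.
std::vector<unsigned char> load_file(const char* path)
{
    std::vector<unsigned char> data;
    std::FILE* fd = std::fopen(path, "rb");
    if (fd == nullptr)
        return data;

    if (std::fseek(fd, 0, SEEK_END) == 0) {
        const long file_size = std::ftell(fd);
        std::rewind(fd);
        if (file_size > 0) {
            data.resize(static_cast<std::size_t>(file_size));
            // Item size 1, item count file_size: the return value is a byte count.
            const std::size_t bytes_read =
                std::fread(data.data(), 1, data.size(), fd);
            data.resize(bytes_read);
        }
    }
    std::fclose(fd);
    return data;
}

// Prints every byte as two hex digits, using the size stored in the vector.
void print_hex(const std::vector<unsigned char>& data)
{
    for (unsigned char byte : data)
        std::printf("%02x ", byte);
    std::printf("\n");
}
```

Calling print_hex(load_file("c:\\Temp\\img.jpg")) would then print the whole file rather than only its first four bytes.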

How to compress a buffer with zlib?

There is a usage example at the zlib website: http://www.zlib.net/zlib_how.html
However, in the example they are compressing a file. I would like to compress binary data stored in a buffer in memory. I don't want to save the compressed buffer to disk either.
Basically here is my buffer:
fIplImageHeader->imageData = (char*)imageIn->getFrame();
How can I compress it with zlib?
I would appreciate some code example of how to do that.
zlib.h has all the functions you need: compress (or compress2) and uncompress. See the source code of zlib for an answer.
ZEXTERN int ZEXPORT compress OF((Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen));
/*
Compresses the source buffer into the destination buffer. sourceLen is
the byte length of the source buffer. Upon entry, destLen is the total size
of the destination buffer, which must be at least the value returned by
compressBound(sourceLen). Upon exit, destLen is the actual size of the
compressed buffer.
compress returns Z_OK if success, Z_MEM_ERROR if there was not
enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer.
*/
ZEXTERN int ZEXPORT uncompress OF((Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen));
/*
Decompresses the source buffer into the destination buffer. sourceLen is
the byte length of the source buffer. Upon entry, destLen is the total size
of the destination buffer, which must be large enough to hold the entire
uncompressed data. (The size of the uncompressed data must have been saved
previously by the compressor and transmitted to the decompressor by some
mechanism outside the scope of this compression library.) Upon exit, destLen
is the actual size of the uncompressed buffer.
uncompress returns Z_OK if success, Z_MEM_ERROR if there was not
enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete. In
the case where there is not enough room, uncompress() will fill the output
buffer with the uncompressed data up to that point.
*/
This is an example to pack a buffer with zlib and save the compressed contents in a vector.
void compress_memory(void *in_data, size_t in_data_size, std::vector<uint8_t> &out_data)
{
std::vector<uint8_t> buffer;
const size_t BUFSIZE = 128 * 1024;
uint8_t temp_buffer[BUFSIZE];
z_stream strm;
strm.zalloc = 0;
strm.zfree = 0;
strm.opaque = 0; // zlib requires zalloc, zfree and opaque to be set before deflateInit
strm.next_in = reinterpret_cast<uint8_t *>(in_data);
strm.avail_in = in_data_size;
strm.next_out = temp_buffer;
strm.avail_out = BUFSIZE;
deflateInit(&strm, Z_BEST_COMPRESSION);
while (strm.avail_in != 0)
{
int res = deflate(&strm, Z_NO_FLUSH);
assert(res == Z_OK);
if (strm.avail_out == 0)
{
buffer.insert(buffer.end(), temp_buffer, temp_buffer + BUFSIZE);
strm.next_out = temp_buffer;
strm.avail_out = BUFSIZE;
}
}
int deflate_res = Z_OK;
while (deflate_res == Z_OK)
{
if (strm.avail_out == 0)
{
buffer.insert(buffer.end(), temp_buffer, temp_buffer + BUFSIZE);
strm.next_out = temp_buffer;
strm.avail_out = BUFSIZE;
}
deflate_res = deflate(&strm, Z_FINISH);
}
assert(deflate_res == Z_STREAM_END);
buffer.insert(buffer.end(), temp_buffer, temp_buffer + BUFSIZE - strm.avail_out);
deflateEnd(&strm);
out_data.swap(buffer);
}
You can easily adapt the example by replacing fread() and fwrite() calls with direct pointers to your data. For zlib compression (referred to as deflate as you "take out all the air of your data") you allocate z_stream structure, call deflateInit() and then:
1. fill next_in with the next chunk of data you want to compress
2. set avail_in to the number of bytes available in next_in
3. set next_out to where the compressed data should be written, which should usually be a pointer inside your buffer that advances as you go along
4. set avail_out to the number of bytes available in next_out
5. call deflate
6. repeat steps 3-5 until avail_out is non-zero (i.e. there's more room in the output buffer than zlib needs - no more data to write)
7. repeat steps 1-6 while you have data to compress
Eventually you call deflateEnd() and you're done.
You're basically feeding it chunks of input and output until you're out of input and it is out of output.
The classic way, made more convenient with C++ features
Here's a full example which demonstrates compression and decompression using C++ std::vector objects:
#include <cstdio>
#include <cstdlib> // malloc, free, EXIT_SUCCESS
#include <iosfwd>
#include <iostream>
#include <vector>
#include <zconf.h>
#include <zlib.h>
#include <iomanip>
#include <cassert>
void add_buffer_to_vector(std::vector<char> &vector, const char *buffer, uLongf length) {
for (int character_index = 0; character_index < length; character_index++) {
char current_character = buffer[character_index];
vector.push_back(current_character);
}
}
int compress_vector(std::vector<char> source, std::vector<char> &destination) {
unsigned long source_length = source.size();
uLongf destination_length = compressBound(source_length);
char *destination_data = (char *) malloc(destination_length);
if (destination_data == nullptr) {
return Z_MEM_ERROR;
}
Bytef *source_data = (Bytef *) source.data();
int return_value = compress2((Bytef *) destination_data, &destination_length, source_data, source_length,
Z_BEST_COMPRESSION);
add_buffer_to_vector(destination, destination_data, destination_length);
free(destination_data);
return return_value;
}
int decompress_vector(std::vector<char> source, std::vector<char> &destination) {
unsigned long source_length = source.size();
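// Note: compressBound() is only a guess here. The real uncompressed size must be
// known from elsewhere; if it exceeds this guess, uncompress() returns Z_BUF_ERROR.
uLongf destination_length = compressBound(source_length);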
char *destination_data = (char *) malloc(destination_length);
if (destination_data == nullptr) {
return Z_MEM_ERROR;
}
Bytef *source_data = (Bytef *) source.data();
int return_value = uncompress((Bytef *) destination_data, &destination_length, source_data, source.size());
add_buffer_to_vector(destination, destination_data, destination_length);
free(destination_data);
return return_value;
}
void add_string_to_vector(std::vector<char> &uncompressed_data,
const char *my_string) {
int character_index = 0;
while (true) {
char current_character = my_string[character_index];
uncompressed_data.push_back(current_character);
if (current_character == '\00') {
break;
}
character_index++;
}
}
// https://stackoverflow.com/a/27173017/3764804
void print_bytes(std::ostream &stream, const unsigned char *data, size_t data_length, bool format = true) {
stream << std::setfill('0');
for (size_t data_index = 0; data_index < data_length; ++data_index) {
stream << std::hex << std::setw(2) << (int) data[data_index];
if (format) {
stream << (((data_index + 1) % 16 == 0) ? "\n" : " ");
}
}
stream << std::endl;
}
void test_compression() {
std::vector<char> uncompressed(0);
auto *my_string = (char *) "Hello, world!";
add_string_to_vector(uncompressed, my_string);
std::vector<char> compressed(0);
int compression_result = compress_vector(uncompressed, compressed);
assert(compression_result == Z_OK);
std::vector<char> decompressed(0);
int decompression_result = decompress_vector(compressed, decompressed);
assert(decompression_result == Z_OK);
printf("Uncompressed: %s\n", uncompressed.data());
printf("Compressed: ");
std::ostream &standard_output = std::cout;
print_bytes(standard_output, (const unsigned char *) compressed.data(), compressed.size(), false);
printf("Decompressed: %s\n", decompressed.data());
}
In your main.cpp simply call:
int main(int argc, char *argv[]) {
test_compression();
return EXIT_SUCCESS;
}
The output produced:
Uncompressed: Hello, world!
Compressed: 78daf348cdc9c9d75128cf2fca495164000024e8048a
Decompressed: Hello, world!
The Boost way
#include <iostream>
#include <sstream>
#include <string>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/zlib.hpp>
std::string compress(const std::string &data) {
boost::iostreams::filtering_streambuf<boost::iostreams::output> output_stream;
output_stream.push(boost::iostreams::zlib_compressor());
std::stringstream string_stream;
output_stream.push(string_stream);
boost::iostreams::copy(boost::iostreams::basic_array_source<char>(data.c_str(),
data.size()), output_stream);
return string_stream.str();
}
std::string decompress(const std::string &cipher_text) {
std::stringstream string_stream;
string_stream << cipher_text;
boost::iostreams::filtering_streambuf<boost::iostreams::input> input_stream;
input_stream.push(boost::iostreams::zlib_decompressor());
input_stream.push(string_stream);
std::stringstream unpacked_text;
boost::iostreams::copy(input_stream, unpacked_text);
return unpacked_text.str();
}
TEST_CASE("zlib") {
std::string plain_text = "Hello, world!";
const auto cipher_text = compress(plain_text);
const auto decompressed_plain_text = decompress(cipher_text);
REQUIRE(plain_text == decompressed_plain_text);
}
This is not a direct answer to your question about the zlib API, but you may be interested in the boost::iostreams library paired with zlib.
It lets you use zlib-driven packing algorithms through the basic "stream" operations notation, so your data can be compressed by opening a memory stream and applying the << operator to it.
With boost::iostreams, this automatically invokes the corresponding packing filter on all data that passes through the stream.

How do you determine the amount of Linux system RAM in C++?

I just wrote the following C++ function to programmatically determine how much RAM a system has installed. It works, but it seems to me that there should be a simpler way to do this. Am I missing something?
std::string getRAM()
{
FILE* stream = popen("head -n1 /proc/meminfo", "r");
std::ostringstream output;
int bufsize = 128;
while( !feof(stream) && !ferror(stream))
{
char buf[bufsize];
int bytesRead = fread(buf, 1, bufsize, stream);
output.write(buf, bytesRead);
}
std::string result = output.str();
std::string label, ram;
std::istringstream iss(result);
iss >> label;
iss >> ram;
return ram;
}
First, I'm using popen("head -n1 /proc/meminfo") to get the first line of the meminfo file from the system. The output of that command looks like
MemTotal: 775280 kB
Once I've got that output in an istringstream, it's simple to tokenize it to get at the information I want. Is there a simpler way to read in the output of this command? Is there a standard C++ library call to read in the amount of system RAM?
On Linux, you can use the function sysinfo which sets values in the following struct:
#include <sys/sysinfo.h>
int sysinfo(struct sysinfo *info);
struct sysinfo {
long uptime; /* Seconds since boot */
unsigned long loads[3]; /* 1, 5, and 15 minute load averages */
unsigned long totalram; /* Total usable main memory size */
unsigned long freeram; /* Available memory size */
unsigned long sharedram; /* Amount of shared memory */
unsigned long bufferram; /* Memory used by buffers */
unsigned long totalswap; /* Total swap space size */
unsigned long freeswap; /* swap space still available */
unsigned short procs; /* Number of current processes */
unsigned long totalhigh; /* Total high memory size */
unsigned long freehigh; /* Available high memory size */
unsigned int mem_unit; /* Memory unit size in bytes */
char _f[20-2*sizeof(long)-sizeof(int)]; /* Padding for libc5 */
};
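A call to it could look like the following minimal sketch. The function name total_ram_bytes is hypothetical, and the code is Linux-specific. Since totalram is expressed in units of mem_unit bytes, the two fields are multiplied to get a byte count.

```cpp
#include <sys/sysinfo.h>

// Hypothetical helper: total usable RAM in bytes, or 0 if sysinfo() fails.
unsigned long long total_ram_bytes()
{
    struct sysinfo info; // "struct" is needed because sysinfo is also a function
    if (sysinfo(&info) != 0)
        return 0;
    // totalram is counted in units of mem_unit bytes.
    return static_cast<unsigned long long>(info.totalram) * info.mem_unit;
}
```

Dividing the result by 1024 gives a value in kB that should be close to the MemTotal line of /proc/meminfo.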
If you want to do it solely using functions of C++ (I would stick to sysinfo), I recommend taking a C++ approach using std::ifstream and std::string:
unsigned long get_mem_total() {
std::string token;
std::ifstream file("/proc/meminfo");
while(file >> token) {
if(token == "MemTotal:") {
unsigned long mem;
if(file >> mem) {
return mem;
} else {
return 0;
}
}
// Ignore the rest of the line
file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
return 0; // Nothing found
}
There isn't any need to use popen(). You can just read the file yourself.
Also, if their first line isn't what you're looking for, you'll fail, since head -n1 only reads the first line and then exits. I'm not sure why you're mixing C and C++ I/O like that; it's perfectly OK, but you should probably opt to go all C or all C++. I'd probably do it something like this:
int GetRamInKB(void)
{
FILE *meminfo = fopen("/proc/meminfo", "r");
if(meminfo == NULL)
... // handle error
char line[256];
while(fgets(line, sizeof(line), meminfo))
{
int ram;
if(sscanf(line, "MemTotal: %d kB", &ram) == 1)
{
fclose(meminfo);
return ram;
}
}
// If we got here, then we couldn't find the proper line in the meminfo file:
// do something appropriate like return an error code, throw an exception, etc.
fclose(meminfo);
return -1;
}
Remember /proc/meminfo is just a file. Open the file, read the first line, and close the file. Voilà!
Even top (from procps) parses /proc/meminfo.