Reading data from a Binary/Random Access File - c++

I have a file in binary format having a large amount of data.
If I have knowledge of the file structure, how do I read information from the binary file, and populate a record of these structures?
The data is complex.
I'd like to do it with Qt, but plain C++ is fine too, if required.
Thanks for your help.

If the binary file is really large, it's better to load it into a (char*) array with the low-level read function, provided enough RAM is available: http://crasseux.com/books/ctutorial/Reading-files-at-a-low-level.html
Then you can parse it from memory.
But this only helps you load large files; it does not help you parse complex structures.
Not sure, but you could also take a look at yacc.
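A minimal C++ sketch of the "load the whole file first, parse later" idea. Note that it swaps in std::ifstream for the low-level read() call on the linked page; the function name readWholeFile is hypothetical, and it assumes the file fits in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read an entire binary file into memory in one call.
// Assumes the whole file fits in RAM.
std::vector<char> readWholeFile(const std::string& path) {
    // Open positioned at the end (ios::ate) so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamsize size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    std::vector<char> buffer(static_cast<std::size_t>(size));

    // Rewind and read everything into the buffer.
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(buffer.data(), size))
        throw std::runtime_error("short read on " + path);
    return buffer;
}
```

Parsing then works on the in-memory buffer, so there are no further disk reads.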

This doesn't sound like a job for yacc: he isn't trying to parse the file as a language, he wants to read binary-formatted data into a data structure.
You can read the data in and then map it onto a struct that matches the format. If the data is complex, you may need to lay structs over it in several ways, depending on how the data is laid out. So basically: read the file into a char* buffer, find the offset where your struct starts, cast the address of that element to a pointer to your struct, and then access its members. Without more detail it's impossible to be more specific than this.
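The overlay idea above can be sketched as follows. This version copies the bytes out with std::memcpy instead of casting the char* to a struct pointer: a direct cast can yield a misaligned pointer and breaks C++'s aliasing rules, while memcpy gives the same result without undefined behavior. The Record layout and the readAt helper are hypothetical, and the approach only works if the file was written with the same field sizes, padding and byte order your compiler uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical on-disk record layout; replace it with the real format.
// The fields are ordered so this layout typically has no padding.
struct Record {
    std::uint32_t id;
    std::uint16_t kind;
    std::uint16_t flags;
    float value;
};

// Copy a T out of the buffer starting at byte `offset`.
// memcpy avoids misaligned access and strict-aliasing problems
// that a reinterpret_cast<const T*> could cause.
template <typename T>
T readAt(const std::vector<char>& buf, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    if (offset > buf.size() || buf.size() - offset < sizeof(T))
        throw std::out_of_range("record extends past end of buffer");
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}
```

For example, readAt<Record>(buffer, 128) would decode the record that starts 128 bytes into the file.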

http://courses.cs.vt.edu/~cs2604/fall00/binio.html should help you; I learned from it. (Hint: always cast your data to (char*) when passing it to read and write.)
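For reference, the cast recommended in that hint looks like this with std::ofstream::write and std::ifstream::read, which both take a char pointer. Point, writePoint and readPoint are hypothetical names; this only suits plain structs without pointers, and the file is only portable between machines with the same layout and endianness.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-size record containing no pointers.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// write() takes a const char*, so the struct's address is cast to one.
bool writePoint(const std::string& path, const Point& p) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&p), sizeof p);
    return static_cast<bool>(out.flush());
}

// read() takes a char*, so the destination is cast the same way.
bool readPoint(const std::string& path, Point& p) {
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(&p), sizeof p);
    return static_cast<bool>(in);
}
```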

Related

Decompressing a byte array with zlib to a byte array

Context: I'm using a .mbtiles file, a geomapping file format, which is a sqlite database file containing vector tiles.
Those vector tiles are packed using protocol buffer and then gzipped.
I'm using C++, and I'm currently reading the zlib usage example for decompression, but I am not sure how to handle chunks and the end-of-stream event.
SQLite gives me a void* pointer and a length.
I quote the page:
For applications where zlib streams are embedded in other data, this
routine would need to be modified to return the unused data, or at
least indicate how much of the input data was not used, so the
application would know where to pick up after the zlib stream.
The protocol buffer class methods either take void* or std::string. I guess I should go with void*.
I'm not sure how those events work, and the example doesn't seem to cover byte arrays. How should I change the code to avoid errors?
It sounds like SQLite is giving you a zlib stream without anything after it. If so, then that comment doesn't apply.
In any case, you are looking at the right page. (You didn't say what "the page" is, but I recognize the quote, since I wrote it.) That shows in general how to use the zlib functions. You should be able to figure out how to apply it to a byte array instead of file input.
If the data is really "gzipped", then you will need to use inflateInit2() instead of inflateInit(). Read the zlib documentation in zlib.h.

Reading data from file - store in variables or read again and again

I have an XML file that contains a lot of data needed by a program. The data x, y, z are only needed by function 'a', while p, q, r are needed throughout the project. Some data items in the file can be very large (e.g. a float vector with 50,000 items).
Is it better to read this data once and store it in variables, or to read it only when the method is called, which will cause the file to be opened twice? (I'm using pugixml to read the data.)
Thanks.
I think that depends on your requirements, which will determine the choice.
For example, consider the program's performance and memory use: if reading all the data at once would need too much memory, read it only when it is needed. If not, read it once and keep it, because frequent I/O is not a good choice.
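One way to get both behaviours is lazy loading: read the large data the first time function 'a' needs it, keep it for later calls, and release it when memory matters. LazyVector is a hypothetical class, and the loader function passed to it stands in for your pugixml parsing code; the small items p, q, r that are used everywhere could simply be read once at startup.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Load-on-first-use cache for one large data item.
class LazyVector {
public:
    // Hypothetical loader: whatever code parses x, y, z out of the XML.
    using Loader = std::function<std::vector<float>()>;

    explicit LazyVector(Loader loader) : loader_(std::move(loader)) {}

    // Only the first call (or the first call after release()) hits the file.
    const std::vector<float>& get() {
        if (!loaded_) {
            data_ = loader_();
            loaded_ = true;
        }
        return data_;
    }

    // Free the memory; the next get() will reload from the file.
    void release() {
        std::vector<float>().swap(data_);  // swap also frees the capacity
        loaded_ = false;
    }

    bool loaded() const { return loaded_; }

private:
    Loader loader_;
    std::vector<float> data_;
    bool loaded_ = false;
};
```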

How to read info from a binary data file into an array of structures C++

I am a beginning C++ student. I have a structure array that holds employee info.
I can put values into the structure, write those values to a binary .dat file, and read the values back into the program so that they can be displayed on the console.
Here is my problem: once I close the program and run it again, I can't read the data from the file back into memory; instead it reads "garbage."
I tried a few things and then read this in my book:
NOTE: Structures containing pointers cannot be correctly stored to
disk using the techniques of this section. This is because if the
structure is read into memory on a subsequent run of the program, it
cannot be guaranteed that all program variables will be at the same
memory locations.
I am pretty sure this is what is going on when I try to open a .dat file with previously stored information and try to read it into a structure array.
I can send my code examples if that would help clarify my question.
Any suggestions would be appreciated.
Speaking generally (since I don't have your code), there are two reasons you shouldn't just write the bytes of a struct or class to a file:
1. As your book mentioned, writing a pointer to disk is pointless, since you're just storing an essentially random address and not the data at that address. You need to write out the data itself.
2. You especially should not attempt to write a struct/class all at once with something like fwrite(&myStruct, sizeof(myStruct), 1, file). Compilers often insert unused bytes between the members of a struct so that the processor can access them faster; this is called padding. Using a different compiler, or compiling for a different computer architecture, can pad structures differently, so a file that opens correctly on one computer might not open correctly on another.
There are many ways to write data to a file, whether in a binary format or in a human-readable format like XML. Whichever method you choose (each has strengths and weaknesses), every one of them involves writing each piece of data you care about one at a time, then reading them back in one at a time. Higher-level languages like Java or C# can do this automagically by storing metadata about objects, but this comes at the cost of more memory use and slower program execution.

Store and retrieve a Data structure

I'm parsing an XML file in C++ (using libxml), and from the extracted info I'm building a data structure on the fly and modifying it as more info is extracted from the parsed file. Now I need to save the data structure as it is to secondary memory, and later load it back so that I can continue working without having to rebuild it. Can someone please help me with how to do this?
I suggest using a library like Boost.Serialization for this.
I assume that 'secondary memory' means a hard drive or similar. In that case, use fwrite and fread, or in C++ overload << and >> if you prefer. How you do this depends on your data structure; if it has members that are pointers, things get a bit more complicated.
There's not enough info in your question to really help you.
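A minimal sketch of the overload-<< and >> approach for a tree, a common shape for data built from XML. Node is a hypothetical stand-in for the real data structure; its children are held by value, so saving recurses into them instead of writing pointers. The text format uses std::quoted (C++14) so names containing spaces or quotes survive the round trip.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical tree node standing in for the structure built from the XML.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Format: quoted name, child count, then each child recursively.
std::ostream& operator<<(std::ostream& out, const Node& n) {
    out << std::quoted(n.name) << ' ' << n.children.size() << '\n';
    for (const Node& c : n.children) out << c;
    return out;
}

// Reads back exactly what operator<< wrote; stops on the first error.
std::istream& operator>>(std::istream& in, Node& n) {
    std::size_t count = 0;
    if (!(in >> std::quoted(n.name) >> count)) return in;
    n.children.assign(count, Node{});
    for (Node& c : n.children)
        if (!(in >> c)) break;
    return in;
}
```

Saving is then just file << root; and loading is file >> root; on an std::ofstream / std::ifstream.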

interactive binary decoder to structs

I have looked around a bit but was unable to find what I figured might already have been created.
I am looking for an application that would read in a binary file, let me enter in some way the patterns/rules the data is expected to follow (for example, a set of messages, each consisting of a header plus data), and then deserialize the data into a text format based on those patterns/rules. For example, the binary file might be a set of M messages, each with a header that contains the type of struct and the number of bytes the struct's serialization takes up, followed by the struct serialized directly to the file.
Specifically, let's say I know ahead of time that I will have a file containing a sequence of serialized C structs (or C++ classes), each prepended by a header indicating which struct is serialized in the next N bytes (where N is contained in the header).
I know how to write C/C++ code to go through and deserialize the data (provided I know all the types ahead of time), but I am wondering whether there exists some type of application that would help facilitate this process if you were not entirely sure of the format/structs ahead of time (other than a hex editor). Something graphical where you could see the effect of changing the structs/rules/patterns dynamically would be ideal, if it exists.
boost::serialization already does something quite similar to this, without you having to get your hands quite as dirty in the details. It supports various archive formats, including XML, text and binary ones, is very extensible, and can cope with smart pointers, containers, etc.
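For comparison, the hand-written decoder mentioned in the question can stay quite small when the framing is header + N bytes. This sketch assumes a hypothetical header of a 2-byte type followed by a 4-byte payload length, both little-endian, with no padding in the file. It reads the header fields byte by byte rather than overlaying a struct, and only splits the stream into messages; a real decoder would then dispatch on the type to parse each payload, and unknown types can be skipped because the header gives their length.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// One framed message: its type tag and raw payload bytes.
struct Message {
    std::uint16_t type;
    std::vector<unsigned char> payload;
};

// Hypothetical on-disk header size: 2-byte type + 4-byte length.
const std::size_t kHeaderSize = 6;

// Decode an unsigned little-endian integer of `nbytes` bytes.
std::uint32_t readLE(const unsigned char* p, int nbytes) {
    std::uint32_t v = 0;
    for (int i = nbytes - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Walk the buffer header by header, collecting each message's payload.
std::vector<Message> splitMessages(const std::vector<unsigned char>& buf) {
    std::vector<Message> out;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        if (buf.size() - pos < kHeaderSize)
            throw std::runtime_error("truncated header");
        Message m;
        m.type = static_cast<std::uint16_t>(readLE(&buf[pos], 2));
        const std::uint32_t length = readLE(&buf[pos + 2], 4);
        pos += kHeaderSize;
        if (buf.size() - pos < length)
            throw std::runtime_error("truncated payload");
        const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
        m.payload.assign(first, first + static_cast<std::ptrdiff_t>(length));
        out.push_back(std::move(m));
        pos += length;
    }
    return out;
}
```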