write data structure into a file using binary mode - c++

code looks like this:
#include <cstdio>
#include <string>
using std::string;

struct Dog {
    string name;
    unsigned int age;
};

int main()
{
    Dog d = {.name = "Lion", .age = 3}; // designated initializers must follow declaration order
    FILE *fp = fopen("dog.txt", "wb");
    fwrite(&d, sizeof(d), 1, fp); // write d into dog.txt
    fclose(fp);
}
My problem is: what's the point of writing a data object or structure into a binary file? I assume it is for making the data generated in a running program persistent, right? If so, then how can I get the data back? Using fread?
This makes me think of database-like systems; does a database write data to disk the same way?

You can do it, but you will have a number of issues to take care of:
structure types: all of your data really needs to live inside the struct, or you will just be writing a pointer to some other place.
structure changes: if you ever need to change your structure, you will need to write a converter that reads the old struct and writes the new one.
language interoperability: it will be hard to access the data from another language.
It was a common practice in the early days, before relational databases became popular. You can make index files pointing to a record number.
However, nowadays I would advise you to use serialization and write strings instead of raw binary.
NOTE:
If string is something like char[40], your code may well survive... but if your question is about C++ and string is a class, then kill your child before it grows up! The string object's characters are not inside your struct but on the heap.
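To make that concrete, here is a minimal sketch of a struct that does survive fwrite/fread because all of its bytes live inside the struct itself (the fixed 40-character name is an assumption for illustration):

#include <cstdio>
#include <cstring>

struct DogRecord {
    char name[40];    // fixed-size buffer: the characters live inside the struct
    unsigned int age;
};

int main()
{
    DogRecord d = {};
    std::strncpy(d.name, "Lion", sizeof(d.name) - 1);
    d.age = 3;

    // write the whole record as one block of raw bytes
    FILE *fp = std::fopen("dog.bin", "wb");
    std::fwrite(&d, sizeof(d), 1, fp);
    std::fclose(fp);

    // read it back into a fresh record: same size, same layout
    DogRecord e = {};
    fp = std::fopen("dog.bin", "rb");
    std::fread(&e, sizeof(e), 1, fp);
    std::fclose(fp);
    std::printf("%s is %u\n", e.name, e.age);
}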

Writing data in binary is extremely useful and much faster than reading/writing in text. Take video games, for instance (although not every video game does this): when the game is saved, all of the necessary structures/classes and other data are written into a save file in binary.
That is just one use of binary, but the major reason for doing it is speed.
To read the data back, you need to know the format you saved it in. As a simple example, if I saved an integer, a char array of size n, and a boolean, I would need to read the binary file back in as an integer, a char array of size n, and a boolean. Otherwise the data is read improperly and will not be very useful at all.
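As a sketch of that idea (the field names and sizes here are invented for illustration), writing and reading must use exactly the same order and sizes:

#include <cstdio>

int main()
{
    int score = 42;
    char name[16] = "player1";
    bool alive = true;

    // write the fields in a fixed, known order
    FILE *out = std::fopen("save.bin", "wb");
    std::fwrite(&score, sizeof(score), 1, out);
    std::fwrite(name, sizeof(name), 1, out);
    std::fwrite(&alive, sizeof(alive), 1, out);
    std::fclose(out);

    // read them back in exactly the same order, with the same sizes
    int score2;
    char name2[16];
    bool alive2;
    FILE *in = std::fopen("save.bin", "rb");
    std::fread(&score2, sizeof(score2), 1, in);
    std::fread(name2, sizeof(name2), 1, in);
    std::fread(&alive2, sizeof(alive2), 1, in);
    std::fclose(in);
}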

Be careful. The type of the field 'name' in your structure is 'string'. This class contains dynamically allocated data, so writing 'string' data into a file this way will only write pointers, not the data itself.

The C++ Middleware Writer supports binary serialization to/from files.
From a marshalling perspective the "unsigned int age" member of your struct is a potential problem. I'd consider changing the type to uint32_t.

Related

What is the C++ equivalent of binary.write in Golang?

I am working on a project in C++ that adopts many ideas from a Golang project.
From the documentation I don't properly understand how binary.Write works, or how I can replicate it in C++. I am stuck at this line in my project:
binary.Write(e.offsets, nativeEndian, e.offset)
The type of e.offsets is *bytes.Buffer and e.offset is uint64
In C++ standard libs, it is generally up to you to deal with endian concerns. So let's skip that for the time being. If you just want to write binary data to a stream such as a file, you can do something like this:
uint64_t value = 0xfeedfacedeadbeef;
std::ofstream file("output.bin", std::ios::binary);
file.write(reinterpret_cast<char*>(&value), sizeof(value));
The cast is necessary because the file stream deals with char*, but you can write whatever byte streams to it you like.
You can write entire structures this way as well so long as they are "Plain Old Data" (POD). For example:
struct T {
    uint32_t a;
    uint16_t b;
};
T value2 = { 123, 45 };
std::ofstream file("output.bin", std::ios::binary);
file.write(reinterpret_cast<char*>(&value2), sizeof(value2));
Reading these things back is similar using file.read, but as mentioned, if you REALLY do care about endianness, then you need to take care of that yourself.
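For instance, a minimal read-back of the uint64_t written above might look like this:

std::ifstream in("output.bin", std::ios::binary);
uint64_t readValue = 0;
in.read(reinterpret_cast<char*>(&readValue), sizeof(readValue));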
If you are dealing with non-POD types (such as std::string), then you will need to deal with a more involved data serialization system. There are numerous options to deal with this if needed.
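For example, one common scheme for std::string is to write its length first and then its characters, reversing the steps when reading. A minimal sketch (the helper names are my own):

#include <cstdint>
#include <fstream>
#include <string>

void writeString(std::ofstream& out, const std::string& s)
{
    uint32_t len = static_cast<uint32_t>(s.size());
    out.write(reinterpret_cast<char*>(&len), sizeof(len)); // length prefix
    out.write(s.data(), len);                              // character bytes
}

std::string readString(std::ifstream& in)
{
    uint32_t len = 0;
    in.read(reinterpret_cast<char*>(&len), sizeof(len));
    std::string s(len, '\0');
    in.read(&s[0], len);
    return s;
}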

Create vector from CSV at compile time in C++

I'm trying to create a lookup table for my Xilinx Zynq SoC (the ARM Cortex).
I have a CSV file with 1330 entries which I cannot read or parse at runtime; the latest I can do that is at compile time. I have read it is possible to embed a file into an executable so it can be used during runtime.
In other words, I want to read and parse the CSV file at runtime without the original file actually being on any filesystem, since it's an embedded device. So I would need to somehow embed the CSV file into the executable. How would I achieve something like this?
The CSV file looks like this (full file is here):
0,0,48,112,160,208,272,320,368,....,65440,65488
You asked for a vector but I'm not sure why you'd necessarily want that. The data will unavoidably occupy space in the application's read-only (".text" or ".rodata" or something like that) section, and while you can convert it into a vector if necessary (which will consume heap space and require runtime construction and initialization from the data in the read-only .text/.rodata section), you might as well just use it as a POD array since I doubt you'll be changing the data at runtime. So to create a const POD array of the data you could just do something like this....
const int myArray[] =
{
#include "myCsvFile.csv"
};
If the number of elements is not fixed, your program can determine the number with sizeof(myArray)/sizeof(myArray[0]). Even if it is a fixed size, this technique is probably best anyway. And of course if all of your entries are unsigned and can fit within 16 bits (a cursory examination suggested so), instead of defining it as an array of int, you can define it as an array of unsigned short or uint16_t to save space.
I should also mention that the const keyword is important here: Without that, your array will occupy twice as much memory: first, it will occupy space in the read-only section (.text or .rodata or whatever), and during application initialization, the runtime will make a read/writable copy of the read-only data in the read/write data section (.data probably), where myArray is allocated. To avoid that, define it as const and then the address of myArray will be in the read-only section, and it won't be copied to the read/write data section.
As your data is a plain array of unsigned integers, you can use the preprocessor.
Assuming your CSV data is in the file data.csv.
Then in your .cpp file you can simply do the following:
#include <iostream>

const unsigned int k_Data[] = {
#include "data.csv" // << Preprocessor will replace this line with the contents of data.csv
};

int main()
{
    std::cout << k_Data[3];
}

Output: 112
For the specific type of CSV data you have, e.g.
0,0,48,112,160,208,272,320,368,432,480,512,576,640,704,752,800,848,896,......
which is basically just a bunch of numbers, you should be able to include them using an actual #include statement, so:
const unsigned short myCSV[] ={
#include "./theCSV.data"
};
I'm using unsigned short since the largest number looks to be under 64k, which saves you some space -- but you may want to use int instead if you believe the numbers can be larger than 64k.

Three nested Object to file *.txt in C++

I want to write an object to a file (*.txt).
class MyCircle {
    double x;
    double y;
    double radius;
    char index;
    int check;
};

class Question {
    int index;
    int quantityOfAnswers;
    std::vector<MyCircle> arrCircles; // std::vector is the C++ counterpart of a Java array
    char solution;
};

class answersSheet {
    int answersSheetID;
    std::string answersSheetName;
    int quantityOfCandidateID;
    int quantityOfCodeExamination;
    int quantityOfQuestion;
    std::vector<Question> arrCandidateIDs;
    std::vector<Question> arrCodeExaminations;
    std::vector<Question> arrQuestions;
};
I want to write and read class answersSheet to a file (*.txt), but C++ doesn't have built-in object writing like Java :(
I really don't understand the problem; there are many existing solutions on the internet. Search for "c++ object serialization".
Binary or Textual representation in file.
Before you start writing an object to a file, you need to decide whether you are writing binary or textual representations.
A binary representation mirrors the bits and bytes of the platform's representation. Numbers and other structures may not be easily readable when viewed in a text editor. This is usually the most efficient, but least portable, solution. Pointers don't port well, and neither may multi-byte integers (research "endianness" or "little endian"). Variable-length fields require more structure in the binary file (such as a length field or a sentinel value).
A textual representation is the human readable form. The textual form is more portable and easier to read by humans. This form is slower to read and write (due to the translations to and from textual format). Also, pointer values do not port with this format (unless you use numeric link values).
Writing textual representation to file.
There are a plethora of examples on the internet. I will summarize and say that this functionality usually involves overloading operator << and operator >>. Which means adding functionality to your classes and structures (or you could make a free standing function).
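As a brief sketch of those overloads (the Point type here is invented for illustration):

#include <iostream>

struct Point {
    double x, y;
};

// write a human-readable textual representation
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << p.x << ' ' << p.y;
}

// read the object back from the same textual form
std::istream& operator>>(std::istream& is, Point& p)
{
    return is >> p.x >> p.y;
}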
Serialization Libraries
Many people use these. I cannot recommend or not recommend these because I haven't used them. In summary, these libraries simplify the reading and writing of objects from and to a file.
Writing Binary Format
Again, this requires adding functionality to your classes and structures (or making free standing functions). I like to augment the typical case of using the std::istream::read and std::ostream::write methods.
I use a technique of (a sketch follows the list):
1. Asking each object for the size that the object would occupy in a buffer, and summing the values.
1.1. This is recursive; each object asks each of its members for its size.
2. Allocating a buffer of uint8_t of that size.
3. Passing a pointer into the buffer to each object and telling each object to write its contents at the location pointed to by the buffer pointer (the objects increment the pointer accordingly).
4. Writing the buffer as one big block, using binary translation, to the file.
5. Reading is similar, in a different order.
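A minimal sketch of that technique (the Record type and its members are invented for illustration):

#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

struct Record {
    uint32_t id;
    double value;

    // step 1: report how many bytes this object occupies in a buffer
    size_t byteSize() const { return sizeof(id) + sizeof(value); }

    // step 3: write contents at p, advancing the pointer accordingly
    void writeTo(uint8_t*& p) const
    {
        std::memcpy(p, &id, sizeof(id));
        p += sizeof(id);
        std::memcpy(p, &value, sizeof(value));
        p += sizeof(value);
    }
};

void save(const std::vector<Record>& records, std::ofstream& out)
{
    size_t total = 0;
    for (const Record& r : records)
        total += r.byteSize();          // step 1: sum the sizes

    std::vector<uint8_t> buffer(total); // step 2: allocate the buffer
    uint8_t* p = buffer.data();
    for (const Record& r : records)
        r.writeTo(p);                   // step 3: each object writes itself

    // step 4: write the buffer as one big block
    out.write(reinterpret_cast<const char*>(buffer.data()),
              static_cast<std::streamsize>(buffer.size()));
}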

Compressing/decompressing data on the go

I'm running a physics simulation and I'd like to improve the way it handles its data. I'm saving and reading files that contain one float, then two ints, followed by 512*512 = 262144 values of +1 or -1, which comes to 595 kB per data file. All these numbers are separated by a single space.
I'm saving hundreds of thousands of these files, so it quickly adds up to gigabytes of storage. I'd like to know if there is a quick (hopefully light, CPU-effort-wise) way of compressing and decompressing this kind of data on the go (I mean without tarring/untarring before/after use).
How much could I expect saving in the end?
If you want relatively fast read-write, you would probably want to store and read them in "binary" format, i.e. native to the way they are internally stored in bytes. A float uses 4-bytes of data, and you do not need any kind of "separator" when storing a large sequence of them.
To do this you might consider boost's "serialize" library.
Note that using data compression methods (zlib etc) will save you on bytes stored but will be relatively slow to zip and unzip them for usage.
Storing in binary format will not only use less disk storage (than storing in text format) but should also be more performant, not just because there is less file I/O but also because there is no string writing/parsing going on.
Note that when you input/output to binary_iarchive or binary_oarchive you pass in an underlying istream or ostream and if this is a file, you need to open it with ios::binary flag because of the issue of line-endings being potentially converted.
Even if you do decide that data-compression (zlib or some other library) is the way to go, it is still worth using boost::serialize to get your data into a "blob" to compress. In that case you would probably use std::ostringstream as your output stream to create the blob.
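A minimal sketch of that blob idea, assuming Boost.Serialization is available (the MyData type is invented to mirror the float-plus-two-ints header from the question):

#include <boost/archive/binary_oarchive.hpp>
#include <sstream>
#include <string>

struct MyData {
    float a;
    int b, c;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & a;
        ar & b;
        ar & c;
    }
};

std::string makeBlob(const MyData& d)
{
    std::ostringstream blob(std::ios::out | std::ios::binary); // in-memory "file"
    boost::archive::binary_oarchive oa(blob);
    oa << d;           // serialized bytes accumulate in the string stream
    return blob.str(); // this blob can then be handed to zlib or similar
}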
Incidentally, if you have 2^18 "boolean" values that can only be 1 or -1, you only need 1 bit for each one, (they would be physically stored as 1 or 0 but you would logically translate that). That would come to 2^15 bytes which is 32K not 595K
Given the extra info about the valid data, define your class like this:-
class Data
{
    float m_float_value;
    int m_int_value_1, m_int_value_2;
    unsigned m_weights [8192];
};
Then use binary file IO to stream this class to and from a file, don't convert to text!
The weights are stored as Boolean values, packed into unsigned integers.
To get the weight, add an accessor:-
int Data::GetWeight (size_t index)
{
    return m_weights [index >> 5] & (1 << (index & 31)) ? 1 : -1;
}
This gives you a data file size of 32780 bytes (5.4%) if there's no packing in the class data.
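If you also need to change weights at runtime, a matching mutator could pack values the same way (a sketch using the same indexing scheme):

void Data::SetWeight (size_t index, int weight)
{
    if (weight == 1)
        m_weights [index >> 5] |= 1u << (index & 31);    // bit set means +1
    else
        m_weights [index >> 5] &= ~(1u << (index & 31)); // bit clear means -1
}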
I would suggest that if you are concerned about size, a binary format would be the most useful way to "compress" your data. It sounds like you are dealing with something like the following:

#include <fstream>

struct data {
    float a;
    int b, c;
    signed char d[512][512];
};

void someFunc() {
    data* someData = new data;
    std::ifstream inFile("inputData.bin", std::ifstream::binary);
    std::ofstream outFile("outputData.bin", std::ofstream::binary);
    // Read from file
    inFile.read(reinterpret_cast<char*>(someData), sizeof(data));
    inFile.close();
    // Write to file
    outFile.write(reinterpret_cast<char*>(someData), sizeof(data));
    outFile.close();
    delete someData;
}
I should also mention that if you encode your +1/-1 values as bits you should get a lot of space savings (another factor of 8 on top of what I'm showing here).
For that amount of data, anything homemade isn't going to perform anywhere near as well as good-quality open-source binary-storage libraries. Try boost serialize or - for this type of storage requirement - HDF5. I've used HDF5 successfully on a few projects with very, very large amounts of double, float, long and int data. I found it useful that one can control the compression rate vs. CPU effort on the fly per "file". Also useful is storing millions of "files" in a hierarchically structured single "disk" file. NASA - probably ripping my style ;) - also uses it.

Reading Superblock into a C Structure

I have a disk image containing a standard filesystem image, accessed using FUSE. The superblock contains the following, and I have a function read_superblock(*buf) that returns the raw data below:
Bytes 0-3: Magic Number (0xC0000112)
4-7: Block Size (1024)
8-11: Total file system size (in blocks)
12-15: FAT length (in blocks)
16-19: Root Directory (block number)
20-1023: NOT USED
I am very new to C, and to get me started on this project I am curious: what is a simple way to read this into a structure or some variables, and simply print them out to the screen using printf for debugging?
I was initially thinking of doing something like the following, figuring I could see the raw data, but I think that is not the case: there is no structure, and I am trying to read it in as a string, which also seems terribly wrong. Is there a way for me to specify the structure and define the number of bytes in each variable?
char *buf;
read_superblock(*buf);
printf("%s", buf);
Yes, I think you'd be better off reading this into a structure. The fields containing useful data are all 32-bit integers, so you could define a structure that looks like this (using the types defined in the standard header file stdint.h):
typedef struct SuperBlock_Struct {
    uint32_t magic_number;
    uint32_t block_size;
    uint32_t fs_size;
    uint32_t fat_length;
    uint32_t root_dir;
} SuperBlock_t;
You can cast the structure to a char* when calling read_superblock, like this:
SuperBlock_t sb;
read_superblock((char*) &sb);
Now to print out your data, you can make a call like the following:
printf("%d %d %d %d\n",
sb.magic_number,
sb.block_size,
sb.fs_size,
sb.fat_length,
sb.root_dir);
Note that you need to be aware of your platform's endianness when using a technique like this, since you're reading integer data (i.e., you may need to swap bytes when reading your data). You should be able to determine that quickly using the magic number in the first field.
Note that it's usually preferable to pass a structure like this without casting it; this allows you to take advantage of the compiler's type-checking and eliminates potential problems that casting may hide. However, that would entail changing your implementation of read_superblock to read data directly into a structure. This is not difficult and can be done using the standard C runtime function fread (assuming your data is in a file, as hinted at in your question), like so:
fread(&sb.magic_number, sizeof(sb.magic_number), 1, fp);
fread(&sb.block_size, sizeof(sb.block_size), 1, fp);
...
Two things to add here:
It's a good idea, when pulling raw data into a struct, to set the struct to have no padding, even if it's entirely composed of 32-bit unsigned integers. In gcc you do this with #pragma pack(1) before the struct definition and #pragma pack() after it.
For dealing with potential endianness issues, two calls to look at are ntohs() and ntohl(), for 16- and 32-bit values respectively. Note that these swap from network byte order to host byte order; if these are the same (which they aren't on x86-based platforms), they do nothing. You go from host to network byte order with htons() and htonl(). However, since this data is coming from your filesystem and not the network, I don't know if endianness is an issue. It should be easy enough to figure out by comparing the values you expect (e.g. the block size) with the values you get, in hex.
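For example, a minimal sketch of that check using the magic number from the layout above (assuming the POSIX arpa/inet.h header for ntohl):

#include <arpa/inet.h> /* ntohl() */

#define SUPERBLOCK_MAGIC 0xC0000112u

void fix_endianness(SuperBlock_t *sb)
{
    if (sb->magic_number == SUPERBLOCK_MAGIC)
        return; /* byte order already matches the host */
    if (ntohl(sb->magic_number) == SUPERBLOCK_MAGIC) {
        /* the data is in network (big-endian) order: convert each field */
        sb->magic_number = ntohl(sb->magic_number);
        sb->block_size   = ntohl(sb->block_size);
        sb->fs_size      = ntohl(sb->fs_size);
        sb->fat_length   = ntohl(sb->fat_length);
        sb->root_dir     = ntohl(sb->root_dir);
    }
}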
It's not difficult to print the data after you have successfully copied it into the structure Emerick proposed. Suppose the instance of the structure you use to hold the data is named SuperBlock_t_Instance.
Then you can print its fields like this:
printf("Magic Number:\t%u\nBlock Size:\t%u\n etc",
SuperBlock_t_Instance.magic_number,
SuperBlock_t_Instance.block_size);