Consider the following code:
#pragma pack(2)
struct file_data {
    uint8_t data0;
    uint16_t data1;
    uint8_t data2;
};
#pragma pack()

file_data readFile(const std::string& pathFile) {
    file_data result;
    std::ifstream file(pathFile, std::ios::in | std::ios::binary);
    if (!file.is_open()) return file_data();
    file.read(reinterpret_cast<char*>(&result), sizeof(file_data));
    file.close();
    return result;
}

int main(int argc, char* argv[]) {
    file_data data = readFile("the/path/to/the/file");
    return 0;
}
In plain English, the code reads the file into the variable result, of type struct file_data, and returns it.
However, say I have already read the bytes of a file and stored them in a std::vector<int8_t>. I want to then write that data to an instance of file_data. I do not want to do this field by field, as the structure may change.
This has been my workaround:
file_data readData(const std::vector<int8_t>& bytes) {
    std::stringstream ss;
    for (int8_t byte : bytes)
        ss.write(reinterpret_cast<const char*>(&byte), sizeof(int8_t));
    file_data result;
    ss.read(reinterpret_cast<char*>(&result), sizeof(file_data));
    return result;
}
So, first the vector is written back to a stream, then read into the struct.
I feel like this is highly inefficient, and there is some knowledge I am missing here. Can someone please provide a better solution to my readData() method?
Aside: int8_t is not necessarily char, so you should use char instead.
file_data readData(const std::vector<char>& bytes) {
    file_data result;
    std::copy_n(bytes.data(), sizeof(file_data), reinterpret_cast<char*>(&result));
    return result;
}
This works similarly with std::string and std::array<char, N> (for N >= sizeof(file_data))
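If you want to keep the std::vector<int8_t> signature, a minimal sketch using std::memcpy (with an added size check, and assuming file_data is trivially copyable) avoids the stream round-trip entirely:

#include <cstdint>
#include <cstring>    // std::memcpy
#include <stdexcept>
#include <vector>

file_data readData(const std::vector<int8_t>& bytes) {
    if (bytes.size() < sizeof(file_data))
        throw std::runtime_error("buffer too small for file_data");
    file_data result;
    // Bytewise copy into the struct; fine because file_data is trivially copyable.
    std::memcpy(&result, bytes.data(), sizeof(file_data));
    return result;
}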
Related
struct MyStruct {
    Items item[100][60];
    string Something;
    int X;
    int Y;
};
I have this struct MyStruct with a 2D array of 100 * 60.
If I want to save the struct to a JSON array (for item[100][60]),
how can I do it using nlohmann json?
Could anyone help me please?
Or, if there is a way to save it as a binary file without using Boost, I'll take that too.
void Save(std::string name, MyStruct test) {
    std::string filename = name + ".dat";
    std::ofstream out(filename, std::ios::binary);  // binary archives need binary mode
    boost::archive::binary_oarchive binary_output_archive(out);
    binary_output_archive << test;
    out.close();
}

void Read(std::string filename) {
    std::ifstream in(filename + ".dat", std::ios::binary);
    boost::archive::binary_iarchive binary_input_archive(in);
    MyStruct test;
    binary_input_archive >> test;
    in.close();
}
I tried this, but it also crashes sometimes, so I want a better way.
void Save(const std::string& name, const MyStruct& test) {
    auto result = nlohmann::json{
        {"item", nlohmann::json::array()},
        {"Something", test.Something},
        {"X", test.X},
        {"Y", test.Y},
    };
    auto& outer = result["item"];
    for (auto i = 0u; i < 100u; ++i) {
        for (auto j = 0u; j < 60u; ++j) {
            // You'll need to convert whatever Items is into a JSON type first
            outer[i].push_back(test.item[i][j]);
        }
    }
    std::ofstream out(name + ".dat");
    out << result;
}
Something like this will suffice for saving; you can work out the deserialisation from this and the docs.
I strongly advise that you do not use Items item[100][60]: the stack is not for objects that large. Use a vector of std::arrays to put the data on the heap while keeping the same row layout, as sketched below.
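A minimal sketch of that change (assuming Items is default-constructible):

#include <array>
#include <string>
#include <vector>

struct MyStruct {
    // 100 rows of 60 Items, heap-allocated; each row stays contiguous.
    std::vector<std::array<Items, 60>> item = std::vector<std::array<Items, 60>>(100);
    std::string Something;
    int X;
    int Y;
};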
Before I start, consider this code:
One data transfer object ObjectDTO
class ObjectDTO {
public:
    int id;
    string string1;
    string string2;
    string string3;
    int code1;
    vector<string> stringList1;

private:
    friend class boost::serialization::access;

    template<class Archive>
    void serialize(Archive &archive, const unsigned int version) {
        archive & id;
        archive & string1;
        archive & string2;
        archive & string3;
        archive & code1;
        archive & stringList1;
    }
};
Serialization
void OutputStreamService::writeReportsToFile(vector<ObjectDTO> objects, int filename) {
    ofstream outputFileStream(to_string(filename), std::ios::binary);
    boost::archive::binary_oarchive outputArchive(outputFileStream);
    outputArchive << objects;
}
Deserialization
vector<ObjectDTO> InputStreamService::readObjects() {
    ifstream inputFileStream(to_string(fileNumber++), std::ios::binary);
    boost::archive::binary_iarchive inputArchive(inputFileStream);
    vector<ObjectDTO> objects;
    inputArchive >> objects;
    return objects;
}
I am using the Boost Serialization C++ library to serialize a vector of ObjectDTOs and read it back later.
Suppose I generated 30 GB of random ObjectDTOs and saved them all to the same file.
How can I read back only some of them, to avoid hitting the memory limit?
I am using Boost Serialization because it was the simplest way I found to solve the first problem, but I can change to any other approach if necessary!
Use Google Protocol Buffers instead; there is a CodedOutputStream class for serialization and a CodedInputStream for deserialization.
One of CodedOutputStream's methods is WriteVarint32, which allows you to write a number that can be used as a length prefix in the stream.
In CodedInputStream there is a corresponding ReadVarint32 method, e.g.
Serialization:
char text[] = "Hello world!";
coded_output->WriteVarint32(strlen(text));
coded_output->WriteRaw(text, strlen(text));
Deserialization:
uint32 size;
coded_input->ReadVarint32(&size);
char* text = new char[size + 1];
coded_input->ReadRaw(text, size);
text[size] = '\0';
The ReadRaw call reads exactly size bytes of serialized content, so you always know where in the stream the next record starts.
Here are my two methods to serialize/deserialize streams with given length at the start.
template <class T>
void TProtoBufSerializer::SerializeImplementation(const T& protoBuf, std::vector<char>& buffer)
{
    int bufLength = protoBuf.ByteSize() +
                    google::protobuf::io::CodedOutputStream::VarintSize32(protoBuf.ByteSize());
    buffer.resize(bufLength);

    google::protobuf::io::ArrayOutputStream arrayOutput(&buffer[0], bufLength);
    google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);

    codedOutput.WriteVarint32(protoBuf.ByteSize());
    protoBuf.SerializeToCodedStream(&codedOutput);
}
template <class T>
bool TProtoBufSerializer::DeSerializeImplementation(std::vector<char>& buffer, T& protoBuf)
{
    bool deserialized = false;

    google::protobuf::io::ArrayInputStream arrayInput(&buffer[0], buffer.size());
    google::protobuf::io::CodedInputStream codedInput(&arrayInput);

    unsigned int object_size;
    bool header_read = codedInput.ReadVarint32(&object_size);

    if (header_read && object_size > 0)
    {
        if (buffer.size() >= codedInput.CurrentPosition() + object_size)
        {
            google::protobuf::io::CodedInputStream::Limit limit = codedInput.PushLimit(object_size);
            if (protoBuf.ParseFromCodedStream(&codedInput))
            {
                // Remove the consumed bytes from the front of the buffer.
                std::vector<char>::iterator it = buffer.begin();
                std::advance(it, codedInput.CurrentPosition());
                std::move(it, buffer.end(), buffer.begin());
                buffer.resize(buffer.size() - codedInput.CurrentPosition());
                deserialized = true;
            }
            else
            {
                throw TProtoBufSerializerPayloadException();
            }
            codedInput.PopLimit(limit);
        }
    }
    else
    {
        // A varint32 header is at most 5 bytes long; if the given buffer is
        // 5 bytes or longer and the header still cannot be decoded, raise an exception.
        if (buffer.size() >= 5)
        {
            throw TProtoBufSerializerHeaderException();
        }
    }

    return deserialized;
}
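If all the records live in one big file, you don't need to slurp everything into one buffer. A sketch along these lines reads one length-prefixed record at a time, so memory use stays bounded; ObjectDTOProto is a hypothetical protobuf message mirroring ObjectDTO:

#include <fstream>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

// Read a single length-prefixed message; returns false on clean EOF.
template <class T>
bool ReadOneLengthPrefixed(google::protobuf::io::CodedInputStream& input, T& message) {
    google::protobuf::uint32 size;
    if (!input.ReadVarint32(&size)) return false;  // no more records
    auto limit = input.PushLimit(size);            // confine the parse to one record
    bool ok = message.ParseFromCodedStream(&input) && input.ConsumedEntireMessage();
    input.PopLimit(limit);
    return ok;
}

// Usage sketch:
// std::ifstream file("objects.bin", std::ios::binary);
// google::protobuf::io::IstreamInputStream raw(&file);
// google::protobuf::io::CodedInputStream coded(&raw);
// ObjectDTOProto obj;
// while (ReadOneLengthPrefixed(coded, obj)) { /* process obj, then reuse it */ }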
I solved the problem by discarding Boost Serialization and vectors in favor of arrays, with plain old C++ write and read on ofstream and ifstream respectively.
My OutputStreamService::writeObjectsToFile ended up like this:
void OutputStreamService::writeObjectsToFile(ObjectDTO* objects, int count, int filename) {
    ofstream outputFileStream(to_string(filename), std::ios::binary);
    // Write the array contents, not the pointer: sizeof(objects) on a pointer
    // would write only the pointer's own bytes. A count parameter is needed
    // because a pointer does not carry the array length. This also assumes
    // ObjectDTO is now trivially copyable (no std::string or vector members).
    outputFileStream.write((char*)objects, sizeof(ObjectDTO) * count);
}
And InputStreamService with readObjects:
ObjectDTO * InputStreamService::readObjects() {
ifstream inputFileStream(to_string(fileNumber++), std::ios::binary);
ObjectDTO objects[10];
inputFileStream.read((char *)&objects, sizeof(objects));
return objects;
}
This way I can define 10, or any other integer, as the number of objects I want to read in.
To solve the main problem, I can now calculate the approximate number of objects my memory can handle and then limit the number of reads!
Thanks!
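For the plain-stream route, a hedged sketch of the chunked-read idea (readChunk is an illustrative helper, not from the original post, and again assumes ObjectDTO is trivially copyable):

#include <cstddef>
#include <fstream>

// Read up to maxCount records into out; returns how many were actually read.
std::size_t readChunk(std::ifstream& in, ObjectDTO* out, std::size_t maxCount) {
    in.read(reinterpret_cast<char*>(out),
            static_cast<std::streamsize>(maxCount * sizeof(ObjectDTO)));
    return static_cast<std::size_t>(in.gcount()) / sizeof(ObjectDTO);
}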
I cannot work out why this isn't working. From what I can tell, it doesn't appear to be reading the whole image file, though I cannot tell for sure. I basically have some raw image data that I'd like to read onto the heap.
unsigned char* ReadImageFromFile(const char* FILENAME, unsigned int SIZE_BYTES)
{
    unsigned char *data = (unsigned char*) malloc(SIZE_BYTES);
    std::ifstream image(FILENAME);
    image.read((char*) data, SIZE_BYTES);
    image.close();
    return data;
}
1) open the file in binary mode
2) don't return a raw pointer that needs to be freed
std::string readImageFromFile(const char* filename)
{
    std::ifstream image(filename, std::ios::binary);
    std::ostringstream data;
    data << image.rdbuf();
    return data.str();
}
Or if you prefer to write error-prone code (it seems to be popular with the embedded crowd), you could do it this way:
char* readImageFromFile(const char* filename)
{
    std::ifstream image(filename, std::ios::binary);
    std::ostrstream data;
    data << image.rdbuf();
    data.freeze();
    return data.str();
}
Of course there's a good reason strstreams are deprecated.
Try std::ifstream image(FILENAME, std::ios_base::binary); (note the second argument to ifstream constructor).
I am trying to read a binary file into memory, and then use it like so:
struct myStruct {
    std::string mystring; // is 40 bytes long
    uint myint1;          // is 4 bytes long
};
typedef unsigned char byte;
byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????
I need a way of accessing the binary file with an offset, and getting a certain length at that offset.
This is easy if I store the binary file data in a std::string (filedata.substr(offset, len)), but I figured that a std::string is not a good way to store binary data.
Reasonably extensive (IMO) searching hasn't turned up anything relevant; any ideas? I am willing to change the storage type (e.g. to std::vector) if you think it is necessary.
If you're not going to use a serialization library, then I suggest adding serialization support to each class:
struct My_Struct
{
    std::string my_string;
    unsigned int my_int;

    void Load_From_Buffer(unsigned char const *& p_buffer)
    {
        my_string = std::string(reinterpret_cast<char const *>(p_buffer));
        p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
        my_int = *((unsigned int *) p_buffer);
        p_buffer += sizeof(my_int);
    }
};
unsigned char * const buffer = ReadFile(filename);
unsigned char * p_buffer = buffer;
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);
Some other useful interface methods:
unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.
With templates you can extend the serialization functionality:
void Load_From_Buffer(std::string& s, unsigned char const *& p_buffer)
{
    s = std::string(reinterpret_cast<char const *>(p_buffer));
    p_buffer += s.length() + 1;
}

template<class T>
void Load_From_Buffer(T& object, unsigned char const *& p_buffer)
{
    object.Load_From_Buffer(p_buffer);
}
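A short usage sketch of these overloads (ReadFile is the question's helper, assumed to return the file contents as a byte buffer):

unsigned char * const buffer = ReadFile(filename);
unsigned char const * p = buffer;

My_Struct my_variable;
Load_From_Buffer(my_variable, p);  // dispatches to My_Struct::Load_From_Buffer

std::string extra;
Load_From_Buffer(extra, p);        // uses the std::string overload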
Edit 1: Reason not to write structure directly
In C and C++, the size of a structure may not be equal to the sum of the size of its members.
Compilers are allowed to insert padding, or unused space, between members so that the members are aligned on an address.
For example, a 32-bit processor likes to fetch things on 4 byte boundaries. Having one char in a structure followed by an int would make the int on relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up on relative address 4.
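For instance, a quick check (exact sizes are implementation-defined; 8 is merely the typical result):

#include <cstdio>

struct Padded {
    char c;   // relative address 0
    int  i;   // compiler pads so this lands on a 4-byte boundary
};

int main() {
    std::printf("%zu\n", sizeof(Padded)); // commonly prints 8, not 5
}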
Structures may contain pointers or items that contain pointers.
For example, the std::string type may have a size of 40, although the string may contain 3 characters or 300. It has a pointer to the actual data.
Endianness.
With multibyte integers, some processors want the Most Significant Byte (MSB) first, a.k.a. Big Endian (the way humans read numbers), while others want the Least Significant Byte (LSB) first, a.k.a. Little Endian. The Little Endian format takes less circuitry to read than Big Endian.
Edit 2: Variant records
When outputting things like arrays and containers, you must decide whether you want to output the full container (including unused slots) or only the items actually in the container. Outputting only the items in the container uses a variant record technique.
There are two techniques for outputting variant records: quantity followed by items, or items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.
The other technique is to output the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5
In a matching Store_To_Buffer method, I would create a temporary to hold the quantity, write that out, then write each item from the container; Load_From_Buffer reads the quantity first and then each item, as sketched below.
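Here is a minimal sketch of the quantity-followed-by-items technique in the same style: a hypothetical Load_From_Buffer overload for std::vector<int>, assuming the writer stored a 32-bit count first:

#include <cstdint>
#include <cstring>
#include <vector>

void Load_From_Buffer(std::vector<int>& v, unsigned char const *& p_buffer)
{
    std::uint32_t count;
    std::memcpy(&count, p_buffer, sizeof(count)); // the quantity comes first
    p_buffer += sizeof(count);

    v.resize(count);
    std::memcpy(v.data(), p_buffer, count * sizeof(int)); // then the items
    p_buffer += count * sizeof(int);
}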
You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:
struct Record {
    std::string name;
    int value;
};

std::istream& operator>>(std::istream& in, Record& record) {
    char name[40] = { 0 };
    int32_t value(0);
    in.read(name, 40);
    in.read(reinterpret_cast<char*>(&value), 4);
    record.name.assign(name, 40);
    record.value = value;
    return in;
}

std::ostream& operator<<(std::ostream& out, const Record& record) {
    std::string name(record.name);
    name.resize(40, '\0');
    out.write(name.c_str(), 40);
    out.write(reinterpret_cast<const char*>(&record.value), 4);
    return out;
}
int main(int argc, char **argv) {
    const char* filename("records");
    Record r[] = {{"zero", 0}, {"one", 1}, {"two", 2}};
    int n(sizeof(r) / sizeof(r[0]));

    std::ofstream out(filename, std::ios::binary);
    for (int i = 0; i < n; ++i) {
        out << r[i];
    }
    out.close();

    std::ifstream in(filename, std::ios::binary);
    std::vector<Record> rIn;
    Record record;
    while (in >> record) {
        rIn.push_back(record);
    }

    for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i) {
        std::cout << "name: " << i->name << ", value: " << i->value << std::endl;
    }
    return 0;
}
I am having problems trying to serialise a std::vector into a binary format and then correctly deserialise it and read the data back. This is my first time using a binary format (I was using ASCII, but that has become too hard to use now), so I am starting simple with just a vector of ints.
Whenever I read the data back, the vector always has the right length, but the data is either 0, undefined, or random.
class Example
{
public:
    std::vector<int> val;
};
WRITE:
Example example = Example();
example.val.push_back(10);
size_t size = sizeof(Example) + (sizeof(int) * example.val.size());
std::fstream file("Levels/example.sld", std::ios::out | std::ios::binary);
if (file.is_open())
{
    file.seekg(0);
    file.write((char*)&example, size);
    file.close();
}
READ:
Example example = Example();
std::ifstream::pos_type size;
std::ifstream file("Levels/example.sld", std::ios::in | std::ios::binary | std::ios::ate);
if (file.is_open())
{
    size = file.tellg();
    file.seekg(0, std::ios::beg);
    file.read((char*)&example, size);
    file.close();
}
Does anyone know what I am doing wrong, or can anyone point me in the right direction?
You can't unserialise a non-POD class by overwriting an existing instance as you seem to be trying to do - you need to give the class a constructor that reads the data from the stream and constructs a new instance of the class with it.
In outline, given something like this:
class A {
public:
    A();
    A( istream & is );
    void serialise( ostream & os );
    vector <int> v;
};
then serialise() would write the length of the vector followed by the vector contents. The constructor would read the vector length, resize the vector using the length, then read the vector contents:
void A :: serialise( ostream & os ) {
    size_t vsize = v.size();
    os.write((char*)&vsize, sizeof(vsize));
    os.write((char*)&v[0], vsize * sizeof(int));
}

A :: A( istream & is ) {
    size_t vsize;
    is.read((char*)&vsize, sizeof(vsize));
    v.resize( vsize );
    is.read((char*)&v[0], vsize * sizeof(int));
}
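A quick usage sketch, assuming the outline's constructors are defined and the members are public (the file name is illustrative):

#include <fstream>

int main() {
    {
        A a;
        a.v = std::vector<int>{1, 2, 3};
        std::ofstream os("a.bin", std::ios::binary);
        a.serialise(os);   // writes the length, then the contents
    }
    {
        std::ifstream is("a.bin", std::ios::binary);
        A fromFile(is);    // vector restored to {1, 2, 3}
    }
    return 0;
}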
You're using the address of the vector. What you need is the address of the data held by the vector. Writing, for example, would be something like:
size = example.size();
file.write((char *)&size, sizeof(size));
file.write((char *)&example[0], sizeof(example[0]) * size);
I would write in network byte order to ensure the file can be written and read on any platform. So:
#include <fstream>
#include <iostream>
#include <iomanip>
#include <vector>
#include <arpa/inet.h>
int main(void) {
    std::vector<int32_t> v = std::vector<int32_t>();
    v.push_back(111);
    v.push_back(222);
    v.push_back(333);

    {
        std::ofstream ofs;
        ofs.open("vecdmp.bin", std::ios::out | std::ios::binary);

        uint32_t sz = htonl(v.size());
        ofs.write((const char*)&sz, sizeof(uint32_t));

        for (uint32_t i = 0, end_i = v.size(); i < end_i; ++i) {
            int32_t val = htonl(v[i]);
            ofs.write((const char*)&val, sizeof(int32_t));
        }
        ofs.close();
    }

    {
        std::ifstream ifs;
        ifs.open("vecdmp.bin", std::ios::in | std::ios::binary);

        uint32_t sz = 0;
        ifs.read((char*)&sz, sizeof(uint32_t));
        sz = ntohl(sz);

        for (uint32_t i = 0; i < sz; ++i) {
            int32_t val = 0;
            ifs.read((char*)&val, sizeof(int32_t));
            val = ntohl(val);
            std::cout << i << '=' << val << '\n';
        }
    }
    return 0;
}
Read the other answers to see how you should read and write a binary structure.
I am adding this one because I believe your motivation for using a binary format is mistaken. A binary format won't be easier than an ASCII one; it's usually the other way around.
You have many options for saving and reading data for long-term use (ORMs, databases, structured formats, configuration files, etc.). A flat binary file is usually the worst and the hardest to maintain, except for very simple structures.