Writing/reading raw objects to file

Writing/reading raw objects to file - c++

Problem: I need to write/read objects from a file.This because I need to write/read a std::list to file, but in whatever case.Not only with T=int (this would be simple), but with whatever parameter.
In Java with OutputFileStream and InputFileStream this was possibile, but I suppose it's just a JVM feature.
However I am trying to read/write objects to a file:
template <class T>
bool write_object(std::fstream& out, const T& object)
{
bool result=false;
char* ptr;
const unsigned long size=sizeof(object);
if(out.good())
{
result=true;
ptr=(char*)&object;
for(unsigned int i=0;i<size;i++)
{
out << *ptr;
ptr++;
}
}
return result;
}
template <class T>
bool read_object(std::fstream& in, T& object)
{
bool result=false;
T* ptr;
T swap_temp;
const unsigned long size=sizeof(object);
char temp[size];
std::streampos pos;
if(in.good())
{
pos=in.tellg();
if(pos!=-1)
{
result=true;
for(unsigned long i=0; i<size; i++)
{
if(!in.good())
{
result=false;
break;
}
else
{
in >> temp[i];
}
}
}
}
if(result==false)
{
in.seekg(pos);
}
else
{
ptr=(T*)temp;
swap_temp=*ptr;
object=swap_temp;
}
return result;
}
But I have encountered the following problems:
-sizeof operator just returns the size of all fields, it does not consider also the data pointed by internal fields;
-If in the class there is a pointer, then this pointer could point to a "wrong" memory address, (e.g.) if I use a pointer which points to a C-style string in memory, once the program ends the string is deallocated.When the instance of the program runs again,this area of memory could be anywhere.
This method is wrong because for example sizeof(string) with my compiler returns 4.
So I suppose it uses a char pointer (I am on a 32-bit machine) to point to the C-style string allocated.Probably it does not even keep trace of the length.
So if the string has 32 characters I don't notice it, it just copies the value of the pointer.

Your approach can't work since C++ doesn't know java-like techniques like reflection so you can't distinguish between pointers and other members.
What you want is called serialisazion and you can use it with libraries like Boost.Serialization (Demo).
But even then, you can't write a general function, you have to define it specifically for each object.

Related

Implementing a String class with implicit conversion to char* (C++)

It might not be advisable according to what I have read at a couple of places (and that's probably the reason std::string doesn't do it already), but in a controlled environment and with careful usage, I think it might be ok to write a string class which can be implicitly converted to a proper writable char buffer when needed by third party library methods (which take only char* as an argument), and still behave like a modern string having methods like Find(), Split(), SubString() etc. While I can try to implement the usual other string manipulation methods later, I first wanted to ask about the efficient and safe way to do this main task. Currently, we have to allocate a char array of roughly the maximum size of the char* output that is expected from the third party method, pass it there, then convert the return char* to a std::string to be able to use the convenient methods it allows, then again pass its (const char*) result to another method using string.c_str(). This is both lengthy and makes the code look a little messy.
Here is my very initial implementation so far:
MyString.h
#pragma once
#include<string>
using namespace std;
class MyString
{
private:
bool mBufferInitialized;
size_t mAllocSize;
string mString;
char *mBuffer;
public:
MyString(size_t size);
MyString(const char* cstr);
MyString();
~MyString();
operator char*() { return GetBuffer(); }
operator const char*() { return GetAsConstChar(); }
const char* GetAsConstChar() { InvalidateBuffer(); return mString.c_str(); }
private:
char* GetBuffer();
void InvalidateBuffer();
};
MyString.cpp
#include "MyString.h"
MyString::MyString(size_t size)
:mAllocSize(size)
,mBufferInitialized(false)
,mBuffer(nullptr)
{
mString.reserve(size);
}
MyString::MyString(const char * cstr)
:MyString()
{
mString.assign(cstr);
}
MyString::MyString()
:MyString((size_t)1024)
{
}
MyString::~MyString()
{
if (mBufferInitialized)
delete[] mBuffer;
}
char * MyString::GetBuffer()
{
if (!mBufferInitialized)
{
mBuffer = new char[mAllocSize]{ '\0' };
mBufferInitialized = true;
}
if (mString.length() > 0)
memcpy(mBuffer, mString.c_str(), mString.length());
return mBuffer;
}
void MyString::InvalidateBuffer()
{
if (mBufferInitialized && mBuffer && strlen(mBuffer) > 0)
{
mString.assign(mBuffer);
mBuffer[0] = '\0';
}
}
Sample usage (main.cpp)
#include "MyString.h"
#include <iostream>
void testSetChars(char * name)
{
if (!name)
return;
//This length is not known to us, but the maximum
//return length is known for each function.
char str[] = "random random name";
strcpy_s(name, strlen(str) + 1, str);
}
int main(int, char*)
{
MyString cs("test initializer");
cout << cs.GetAsConstChar() << '\n';
testSetChars(cs);
cout << cs.GetAsConstChar() << '\n';
getchar();
return 0;
}
Now, I plan to call the InvalidateBuffer() in almost all the methods before doing anything else. Now some of my questions are :
Is there a better way to do it in terms of memory/performance and/or safety, especially in C++ 11 (apart from the usual move constructor/assignment operators which I plan to add to it soon)?
I had initially implemented the 'buffer' using a std::vector of chars, which was easier to implement and more C++ like, but was concerned about performance. So the GetBuffer() method would just return the beginning pointer of the resized vector of . Do you think there are any major pros/cons of using a vector instead of char* here?
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
Is it too much overkill rather than just doing the usual way of passing char array pointer, converting to a std::string and doing our work with it. The third party function calls expecting char* arguments are used heavily in the code and I plan to completely replace both char* and std::string with this new string if it works.
Thank you for your patience and help!

If I understood you correctly, you want this to work:
mystring foo;
c_function(foo);
// use the filled foo
with a c_function like ...
void c_function(char * dest) {
strcpy(dest, "FOOOOO");
}
Instead, I propose this (ideone example):
template<std::size_t max>
struct string_filler {
char data[max+1];
std::string & destination;
string_filler(std::string & d) : destination(d) {
data[0] = '\0'; // paranoia
}
~string_filler() {
destination = data;
}
operator char *() {
return data;
}
};
and using it like:
std::string foo;
c_function(string_filler<80>{foo});
This way you provide a "normal" buffer to the C function with a maximum that you specify (which you should know either way ... otherwise calling the function would be unsafe). On destruction of the temporary (which, according to the standard, must happen after that expression with the function call) the string is copied (using std::string assignment operator) into a buffer managed by the std::string.
Addressing your questions:
Do you think there are any major pros/cons of using a vector instead of char* here?
Yes: Using a vector frees your from manual memory management. This is a huge pro.
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
A union is a bad idea. How do you know which member is currently active? You need a flag outside of the union. Do you really want every string to carry that around? Instead look what the standard library is doing: It's using templates to provide this abstraction.
Is it too much overkill [..]
Writing a string class? Yes, way too much.

What you want to do already exists. For example with this plain old C function:
/**
* Write n characters into buffer.
* n cann't be more than size
* Return number of written characters
*/
ssize_t fillString(char * buffer, ssize_t size);
Since C++11:
std::string str;
// Resize string to be sure to have memory
str.resize(80);
auto newSize = fillSrting(&str[0], str.size());
str.resize(newSize);
or without first resizing:
std::string str;
if (!str.empty()) // To avoid UB
{
auto newSize = fillSrting(&str[0], str.size());
str.resize(newSize);
}
But before C++11, std::string isn't guaranteed to be stored in a single chunk of contiguous memory. So you have to pass through a std::vector<char> before;
std::vector<char> v;
// Resize string to be sure to have memor
v.resize(80);
ssize_t newSize = fillSrting(&v[0], v.size());
std::string str(v.begin(), v.begin() + newSize);
You can use it easily with something like Daniel's proposition

C++, array of objects, customize where they are stored in memory

Currently I working on a existing project (DLL ) which I have to extend.
For the transport through the DLL I have a struct for example 'ExternEntry'
and a struct which passes a array of it.
struct ExternEntry
{
unsigned int MyInt;
const wchar_t* Text;
}
struct ExternEntries
{
const ExternEntry* Data;
const unsigned int Length;
ExternEntries(const ExternEntry* ptr, const unsigned int size)
: Data(ptr)
, Length(size);
{
}
}
In the existing project architecture, it will be the first time that a array is passed to the DLL callers. So the existing architecture doesn't allow arrays and if a struct is passed to a caller, normally there is a wrapper-struct for it (because of their str pointers).
Inside the DLL I need to wrap the ExternEntry so have a valid Text pointer.
struct InternEntry
{
ExternEntry Data;
std::wstring Text;
inline const ExternEntry* operator&() const { return& Data }
UpdateText() { Data.Text = Text.c_str(); }
}
struct InternEntries
{
std::vector<InternEntry> Data;
operator ExternEntries() const
{
return ExternEntries(Data.data()->operator&(), Data.size());
}
}
So the problem is, when the Caller received the ExternEntries and created a vector again:
auto container = DllFuncReturnInternEntries(); // returns ExternEntries
std::vector<ExternEntry> v(container.Data, container.Data + container.Length);
The first element is valid. All other elements are pointing to the wrong memory because in memory the InternEntry (with the wstring Text) is stored between the next InternEntry.
Maybe I'm wrong with the reason why this can't work.
[Data][std::wstring][Data][std::wstring][Data][std::wstring]
Caller knows just about the size of the [Data]
So the vector is doing the following:
[Data][std::wstring][Data][std::wstring][Data][std::wstring] 
  |       |       |
 Get     Get     Get
instead of
[Data][std::wstring][Data][std::wstring][Data][std::wstring]
  |                   |                   |
 Get                 Get                 Get
Do I have any possibilities to customize how the vector stores InternEntry objects in memory?
like Data,Data,Data ..anywhere else wstring,wstring,wstring
I hope I have explained my problem well

What's the most simple way to read and write data from a struct to and from a file in c++ without serialization library?

I am writing a program to that regularly stores and reads structs in the form below.
struct Node {
int leftChild = 0;
int rightChild = 0;
std::string value;
int count = 1;
int balanceFactor = 0;
};
How would I read and write nodes to a file? I would like to use the fstream class with seekg and seekp to do the serialization manually but I'm not sure how it works based off of the documentation and am struggling with finding decent examples.
[edit] specified that i do not want to use a serialization library.

This problem is known as serialization. Use a serializing library like e.g. Google's Protocol Buffers or Flatbuffers.

To serialize objects, you will need to stick to the concept that the object is writing its members to the stream and reading members from the stream. Also, member objects should write themselves to the stream (as well as read).
I implemented a scheme using three member functions, and a buffer:
void load_from_buffer(uint8_t * & buffer_pointer);
void store_to_buffer(uint8_t * & buffer_pointer) const;
unsigned int size_on_stream() const;
The size_on_stream would be called first in order to determine the buffer size for the object (or how much space it occupied in the buffer).
The load_from_buffer function loads the object's members from a buffer using the given pointer. The function also increments the pointer appropriately.
The store_to_buffer function stores the objects's members to a buffer using the given pointer. The function also increments the pointer appropriately.
This can be applied to POD types by using templates and template specializations.
These functions also allow you to pack the output into the buffer, and load from a packed format.
The reason for I/O to the buffer is so you can use the more efficient block stream methods, such as write and read.
Edit 1: Writing a node to a stream
The problem with writing or serializing a node (such a linked list or tree node) is that pointers don't translate to a file. There is no guarantee that the OS will place your program in the same memory location or give you the same area of memory each time.
You have two options: 1) Only store the data. 2) Convert the pointers to file offsets. Option 2) is very complicated as it may require repositioning the file pointer because file offsets may not be known ahead of time.
Also, be aware of variable length records like strings. You can't directly write a string object to a file. Unless you use a fixed string width, the string size will change. You will either need to prefix the string with the string length (preferred) or use some kind of terminating character, such as '\0'. The string length first is preferred because you don't have to search for the end of the string; you can use a block read to read in the text.

If you replace the std::string by a char buffer, you can use fwrite and fread to write/read your structure to and from disk as a fixed size block of information. Within a single program that should work ok.
The big bug-a-boo is the fact that compilers will insert padding between fields in order to keep the data aligned. That makes the code less portable as if a module is compiled with different alignment requirements the structure literally can be a different size, throwing your fixed size assumption out the door.
I would lean toward a well worn in serialization library of some sort.

Another approach would be to overload the operator<< and operator>> for the structure so that it knows how to save/load itself. That would reduce the problem to knowing where to read/write the node. In theory, your left and right child fields could be seek addresses to where the nodes actually reside, while a new field could hold the seek location of the current node.

When implementing your own serialization method, the first decision you'll have to make is whether you want the data on disk to be in binary format or textual format.
I find it easier to implement the ability to save to a binary format. The number of functions needed to implement that is small. You need to implement functions that can write the fundamental types, arrays of known size at compile time, dynamic arrays and strings. Everything else can be built on top of those.
Here's something very close to what I recently put into production code.
#include <cstring>
#include <fstream>
#include <cstddef>
#include <stdexcept>
// Class to write to a stream
struct Writer
{
std::ostream& out_;
Writer(std::ostream& out) : out_(out) {}
// Write the fundamental types
template <typename T>
void write(T number)
{
out_.write(reinterpret_cast<char const*>(&number), sizeof(number));
if (!out_ )
{
throw std::runtime_error("Unable to write a number");
}
}
// Write arrays whose size is known at compile time
template <typename T, uint64_t N>
void write(T (&array)[N])
{
for(uint64_t i = 0; i < N; ++i )
{
write(array[i]);
}
}
// Write dynamic arrays
template <typename T>
void write(T array[], uint64_t size)
{
write(size);
for(uint64_t i = 0; i < size; ++i )
{
write(array[i]);
}
}
// Write strings
void write(std::string const& str)
{
write(str.c_str(), str.size());
}
void write(char const* str)
{
write(str, std::strlen(str));
}
};
// Class to read from a stream
struct Reader
{
std::ifstream& in_;
Reader(std::ifstream& in) : in_(in) {}
template <typename T>
void read(T& number)
{
in_.read(reinterpret_cast<char*>(&number), sizeof(number));
if (!in_ )
{
throw std::runtime_error("Unable to read a number.");
}
}
template <typename T, uint64_t N>
void read(T (&array)[N])
{
for(uint64_t i = 0; i < N; ++i )
{
read(array[i]);
}
}
template <typename T>
void read(T*& array)
{
uint64_t size;
read(size);
array = new T[size];
for(uint64_t i = 0; i < size; ++i )
{
read(array[i]);
}
}
void read(std::string& str)
{
char* s;
read(s);
str = s;
delete [] s;
}
};
// Test the code.
#include <iostream>
void writeData(std::string const& file)
{
std::ofstream out(file);
Writer w(out);
w.write(10);
w.write(20.f);
w.write(200.456);
w.write("Test String");
}
void readData(std::string const& file)
{
std::ifstream in(file);
Reader r(in);
int i;
r.read(i);
std::cout << "i: " << i << std::endl;
float f;
r.read(f);
std::cout << "f: " << f << std::endl;
double d;
r.read(d);
std::cout << "d: " << d << std::endl;
std::string s;
r.read(s);
std::cout << "s: " << s << std::endl;
}
void testWriteAndRead(std::string const& file)
{
writeData(file);
readData(file);
}
int main()
{
testWriteAndRead("test.bin");
return 0;
}
Output:
i: 10
f: 20
d: 200.456
s: Test String
The ability to write and read a Node is very easily implemented.
void write(Writer& w, Node const& n)
{
w.write(n.leftChild);
w.write(n.rightChild);
w.write(n.value);
w.write(n.count);
w.write(n.balanceFactor);
}
void read(Reader& r, Node& n)
{
r.read(n.leftChild);
r.read(n.rightChild);
r.read(n.value);
r.read(n.count);
r.read(n.balanceFactor);
}

The process you are referring to are known as serialization. I'd recommend Cereal at http://uscilab.github.io/cereal/
It supports both json, xml and binary serialization and is very easy to use (with good examples).
(Unfortunately it does not support my favourite format yaml)

Define custom comparator for use with priority_queue

I want to implement a minHeap in c++ for char[] buffers and am facing some problems with the implementation. My declaration of the priority queue is as follows (I am not sure if this will give me a maxHeap or a minHeap):
priority_queue<char[], vector<char[]>, comparePacketContents> receiveBuffer;
where comparePacketContents is:
struct comparePacketContents {
bool operator()(char lhs[], char rhs[]) const {
return atoi(TcpPacket::getBytes(lhs, 0, SEQUENCE_SIZE)) < atoi(TcpPacket::getBytes(rhs, 0, SEQUENCE_SIZE));
}
};
and TcpPacket::getBytes is:
char* TcpPacket::getBytes(char* buf, int start, int size) {
char* ans = (char *) malloc(sizeof(char)*size);
for (int i = 0; i < size; i++) {
*(ans + i) = *(buf + start + i);
}
return ans;
}
Basically I intend to get the first SEQUENCE_SIZE characters of the received packet and then create a heap ordered upon the value of the sequence number.
However, when I try to push a packet into this heap using:
receiveBuffer.push(buf);
It gives me the following error:
no instance of overloaded function "std::priority_queue<_Ty, _Container, _Pr>::push [with _Ty=char [], _Container=std::vector<char [], std::allocator<char []>>, _Pr=comparePacketContents]" matches the argument list
argument types are: (char [2048])
object type is: std::priority_queue<char [], std::vector<char [], std::allocator<char []>>, comparePacketContents>
What should I do to resolve this error?

You can probably "fix" the compilation error by doing push(&buf) to push a pointer to the beginning of the array explicitly. Otherwise the compiler thinks you want to push the entire array, while the container holds pointers (char[] is like char*).
But probably that's not sufficient to fix all the problems you have, because it seems you are storing raw pointers to C-style strings without managing those allocations correctly. Instead, consider writing a class to hold your packets:
class Packet {
public:
Packet(const char* data); // takes ownership of data
uint32_t seqnum() const; // similar to existing implementation
// ...
private:
std::shared_ptr<char> m_data;
};
Packet::Packet(const char* data) : m_data(data, free) {
}
bool operator<(const Packet& lhs, const Packet& rhs) {
return lhs.seqnum() < rhs.seqnum();
}
priority_queue<Packet> receiveBuffer;
In my example I assume you release packet buffers using the C free() function, but you can use any "deleter" in the C++ shared_ptr constructor, including one you write yourself.

C++ Binary file reading

I'm trying to keep objects including vectors of objects in a binary file.
Here's a bit of the load from file code:
template <class T> void read(T* obj,std::ifstream * file) {
file->read((char*)(obj),sizeof(*obj));
file->seekg(int(file->tellg())+sizeof(*obj));
}
void read_db(DB* obj,std::ifstream * file) {
read<DB>(obj,file);
for(int index = 0;index < obj->Arrays.size();index++) {
std::cin.get(); //debugging
obj->Arrays[0].Name = "hi"; //debugging
std::cin.get(); //debugging
std::cout << obj->Arrays[0].Name;
read<DB_ARRAY>(&obj->Arrays[index],file);
for(int row_index = 0;row_index < obj->Arrays[index].Rows.size();row_index++) {
read<DB_ROW>(&obj->Arrays[index].Rows[row_index],file);
for(int int_index = 0;int_index < obj->Arrays[index].Rows[row_index].i_Values.size();int_index++) {
read<DB_VALUE<int>>(&obj->Arrays[index].Rows[row_index].i_Values[int_index],file);
}
}
}
}
And here's the DB/DB_ARRAY classes
class DB {
public:
std::string Name;
std::vector<DB_ARRAY> Arrays;
DB_ARRAY * operator[](std::string);
DB_ARRAY * Create(std::string);
};
class DB_ARRAY {
public:
DB* Parent;
std::string Name;
std::vector<DB_ROW> Rows;
DB_ROW * operator[](int);
DB_ROW * Create();
DB_ARRAY(DB*,std::string);
DB_ARRAY();
};
So now the first argument to the read_db function would have correct values, and the vector Arrays on the object has the correct size, However if I index any value of any object from obj->Arrays it's going to throw the access violation exception.
std::cout << obj->Arrays[0].Name; // error
std::cout << &obj->Arrays[0]; // no error
The later always prints the same address, so when I save an object casted to char* does it save the address of it too?

As various commenters pointed out, you cannot simply serialize a (non-POD) object by saving / restoring it's memory.
The usual way to implement serialization is to implement a serialization interface on the classes. Something like this:
struct ISerializable {
virtual std::ostream& save(std::ostream& os) const = 0;
virtual std::istream& load(std::istream& is) = 0;
};
You then implement this interface in your serializable classes, recursively calling save and load on any members referencing other serializable classes, and writing out any POD members. E.g.:
class DB_ARRAY : public ISerializable {
public:
DB* Parent;
std::string Name;
std::vector<DB_ROW> Rows;
DB_ROW * operator[](int);
DB_ROW * Create();
DB_ARRAY(DB*,std::string);
DB_ARRAY();
virtual std::ostream& save(std::ostream& os) const
{
// serialize out members
return os;
}
virtual std::istream& load(std::istream& is)
{
// unserialize members
return os;
}
};
As count0 pointed out, boost::serialization is also a great starting point.

What is the format of the binary data in the file? Until you specify
that, we can't tell you how to write it. Basically, you have to specify
a format for all of your data types (except char), then write the code
to write out that format, byte by byte (or generate it into a buffer);
and on the other side, to read it in byte by byte, and reconstruct it.
The C++ standard says nothing (or very little) about the size and
representation of the data types, except that sizeof(char) must be
1, and that unsigned char must be a pure binary representation over
all of the bits. And on the machines I have access today (Sun Sparc and
PC's), only the character types have a common representation. As for
the more complex types, the memory used in the value representation
might not even be contiguous: the bitwise representation of an
std::vector, for example, is usually three pointers, with the actual
values in the vector being found somewhere else entirely.
The functions istream::read and ostream::write are
designed for reading data into a buffer for manual parsing, and writing
a pre-formatted buffer. The fact that you need to use a
reinterpret_cast to use them otherwise should be a good indication
that it won't work.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Writing/reading raw objects to file - c++

Related

Implementing a String class with implicit conversion to char* (C++)

C++, array of objects, customize where they are stored in memory

What's the most simple way to read and write data from a struct to and from a file in c++ without serialization library?

Define custom comparator for use with priority_queue

C++ Binary file reading

Categories

Resources