I would like to write a struct to a file as binary. The struct has two members: one is a POD type, but the problem is that the second member is a string:
struct ToWrite
{
std::string _str;
PODType _pod;
};
If I were simply writing the POD type as binary I would do:
file.write((char*)&_pod, sizeof(_pod));
and to read back:
const PODType& pod = *reinterpret_cast<const PODType*>(&bytes[position]);
However, I'm aware the string is more complicated because you need to record its size. If I were to add a third class member, an int containing the size of the string, how would you write and read the struct?
You need to do three things:
Define a storage format for the structure as a stream of bytes.
Write code to convert the structure into an array of bytes in the format you defined in step 1.
Write code to parse the array of bytes you defined in step 1 and fill in the structure.
If you want to find more information, the best search keyword to use is probably "serialization". There are lots of serialization libraries that you can use to save you from having to go through this complexity every time you need to serialize/deserialize a data structure. I personally like protocol buffers and boost serialization, but there are lots of options.
The Beast websocket example stores the data in a multibuffer:
The implementation uses a sequence of one or more character arrays
of varying sizes. Additional character array objects are appended to
the sequence to accommodate changes in the size of the character
sequence.
When looking at the interface it is not completely clear to me how it works. Reading the description, it can be seen as an array of buffers, but it seems the output is only a single chunk of data. Does this mean the "one or more arrays" are only applicable to the internal structure?
In the example code the data is read into the buffer as follows: m_websocketStream.async_read(m_buffer.....
Does each async_read operation create a new internal buffer?
If this is the case, how do I interpret it at the other end, e.g. how do I read it into a std::string or std::vector?
When looking into the sources data() returns const_buffer_type, which is a forward declaration.
For the data member the help information provides the following info, which is not of much help:
The type used to represent the input sequence as a list of buffers.
using const_buffers_type = implementation_defined;
The definition seems to come from the header file boost/asio/buffer.hpp which is included as well. The overall structure however is somewhat obfuscating to me.
I am just trying to understand how to handle the data as bytes or convert it to a std::string.
I tried the following, but this is also not allowed:
std::string( boost::asio::buffer_cast<const char*>(m_buffer.data())
,boost::asio::buffer_size(m_buffer.data()) );
Can anyone enlighten me a little?
data() returns an object meeting the requirements of ConstBufferSequence (http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/reference/ConstBufferSequence.html). prepare() returns an object meeting the requirements of MutableBufferSequence (http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/reference/MutableBufferSequence.html)
All of the dynamic buffers in beast meet the requirements of DynamicBuffer, described in http://www.boost.org/doc/libs/develop/libs/beast/doc/html/beast/concepts/DynamicBuffer.html
If you want to convert a buffer sequence into a string you need to loop over each element and append it to the string individually. Such a function might look like this:
template<class ConstBufferSequence>
std::string
to_string(ConstBufferSequence const& buffers)
{
    std::string s;
    s.reserve(boost::asio::buffer_size(buffers));
    for(boost::asio::const_buffer b : buffers)
        s.append(boost::asio::buffer_cast<char const*>(b),
                 boost::asio::buffer_size(b));
    return s;
}
Alternatively, if you want to avoid the buffer copy you can use something like beast::flat_buffer which guarantees that all the buffer sequences will have length one. Something like this:
inline
std::string
to_string(beast::flat_buffer const& buffer)
{
    return std::string(boost::asio::buffer_cast<char const*>(
            beast::buffers_front(buffer.data())),
        boost::asio::buffer_size(buffer.data()));
}
For more information on buffers, see http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/overview/core/buffers.html
In the latest versions of Beast, there is now the function buffers_to_string which will do this for you in a single function call.
I'm working on copying the following structure to a byte array to send over a named pipe. I've run into problems since switching from a statically defined byte array to a vector, which I need because my host name will be of varying length.
Here is the outline of my structure:
USHORT version; // Header Version
USHORT type; // IPVersion
USHORT count; // Number of IP addresses of remote system
USHORT length; // Header Length (1)
BYTE SysConfigLocIP[4];
BYTE SysConfigRemoteIP[4];
USHORT lengthHost;
std::vector<BYTE>HostName;
Later, after filling the structure, I copy it to a byte array like so:
BYTE Response[sizeof(aMsg)];
memcpy(Response, &aMsg, sizeof(aMsg));
I find that my vector is holding the correct information for the host when I inspect the container during a debug session. However, after the copy to the Response byte array, I'm finding the copied data is drastically different. Is this a valid operation? If so, how can I correctly copy the data from my vector to the BYTE array? If not, what other strategies can I use to dynamically size the structure to send the host names? Thank you for taking a moment to read my question; I appreciate any feedback.
I'm working to copy the following structure to a byte array to send
over a named pipe.
A named pipe (or other form of inter-process or inter-processor communication) does not understand your struct, nor does it understand a vector. It just operates on the concept of bytes in, bytes out. It is up to you, the programmer, to assign meaning to those bytes.
As suggested, please read up on serialization. Try starting at http://en.wikipedia.org/wiki/Serialization. If permitted you can use the Boost solution, http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html, but I would still encourage you to understand the basics first.
As an exercise, first try transferring a vector<int> from sender to receiver. The number of elements in the vector must not be implicitly known by the receiver. Once you achieve that, migrating from int to your struct would be trivial.
That memcpy will only work for POD (plain old data) types. A vector is not POD. Instead, write code to put each byte in the buffer exactly where it needs to be. Don't rely on "magic".
99% of the time in C++ there is no reason to use memcpy. It breaks classes. Learn about copy constructors and std::copy and use them instead.
I have a structure, that contain string. Something like that:
struct Chunk {
int a;
string b;
int c;
};
So I suppose that I cannot write and read this structure to and from a file using the fread and fwrite functions, because the string may manage its memory dynamically.
But code like this appears to work correctly:
Chunk var;
fwrite(&var, sizeof(Chunk), 1, file);
fread(&var, sizeof(Chunk), 1, file);
Are there really problems with it?
You are justified in doubting this. You should only stream POD types with fwrite and fread and string is not POD.
You shouldn't do it like this, because different implementations use different structures of std::string.
In general you should only serialize integral types, the boolean type, binary data (if you can call it serializing). Make sure to use one endian-ness if you are thinking of sharing serialized data between platforms.
Watch out with floats, doubles and pointers. They can become very pesky.
You'll have to watch out with C/C++ structs too because they can contain unpredictable amounts of padding.
You should serialize data.
You might like to do this manually; when it comes to std::string, check out:
const charT* std::string::c_str() const
const charT* std::string::data() const
When it comes to more complex objects, you might be interested in things like Google Protocol Buffers and/or Thrift.
Now I have a database in which one field's type is an array of bytes.
I have a piece of memory, or an object. How do I convert this piece of memory, or even the object, to a byte array so that I can store it in the database?
Suppose the object is
Foo foo
The memory is
buf (actually, I don't know how to declare it yet)
The database field is
byte data[256]
Only hex values like x'1' can be inserted into the field.
Thanks so much!
There are two methods.
One is simple but has serious limitations. You can write the memory image of the Foo object. The drawback is that if you ever change the compiler or the structure of Foo then your data may no longer be loadable (because the image no longer matches the object). To do this, simply use the address of the object, cast to a byte pointer:
(const char*)&foo
as the byte array.
The other method is called 'serialization'. It can still be used if the object changes,
but it adds space to encode the information. If you only have 256 bytes then you
need to be watchful that serialization doesn't create a string too large to save.
Boost has a serialization library you may want to look at, though you'll need to careful about the size of the objects created. If you're only doing this with a small set of classes, you may want to write the marshalling and unmarshalling functions yourself.
From the documentation:
"Here, we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. "
I'm creating my first real binary parser (a tiff reader) and have a question regarding how to allocate memory. I want to create a struct within my TiffSpec class for the IFD entries. These entries will always be 12 bytes, but depending upon the type specified in that particular entry, the values at the end could be of different types (or maybe just an address of another location in the file). What would be the best way to go about casting this sort of data? The smallest unit of memory I believe I would be dealing with is 1 byte.
In C++ you should use a union.
This is a mechanism by which you can define several, overlapping data types, possibly with a common header.
See this article for how to use unions for exactly your problem -- a common header with different data underneath.