Access field value raw bytes from a constructed Flatbuffer - c++

Say I have the following FlatBuffers schema compiled to C++.
table ParentObj { // this is the root table
timestamp:uint64;
child:ChildObj;
}
table ChildObj {
...some fields...
}
I build a ParentObj (which includes the ChildObj) with Flatbuffer builder and send the final bytes over the network to another party.
Is there a way the receiver can access the raw bytes making up the ChildObj inside the received buffer? I can access individual fields in the child obj via the Flatbuffer generated C++ code interface. But can I get the buffer offset and length of the bytes making up the entire ChildObj object? I need this to generate a cryptographic signature of the ChildObj bytes.

No, because ChildObj is not necessarily contiguous in the buffer. It refers to a vtable (which may or may not be shared), and any sub-string/vector/table is an offset that may point to a non-adjacent part of the buffer.
Typically, you use nested_flatbuffer to store children that need to be treated as their own isolated buffer further down the line: child:[ubyte] (nested_flatbuffer: ChildObj).
However, computing a cryptographic signature over this is still a bad idea, since depending on how it was serialized (which implementation), the bytes may differ subtly due to differences in alignment and in the ordering of fields/objects. To reliably get a hash out of this, you'd need to hash only the actual data bytes, and in a fixed field order.
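A minimal sketch of the "hash only the data bytes, in a fixed field order" idea. The ChildObj fields are elided in the question, so the ChildFields struct and its two members here are hypothetical placeholders, as are the helper names. The output bytes would then be fed to whatever signature or hash routine you already use.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plain view of the ChildObj fields (the real ones are not shown).
struct ChildFields {
    std::uint32_t id;
    std::string name;
};

// Append a 32-bit value in little-endian order, independent of host endianness.
inline void put_u32_le(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

// Canonical encoding: fields in a fixed order, no padding, strings length-prefixed.
inline std::vector<std::uint8_t> canonical_bytes(const ChildFields& c) {
    std::vector<std::uint8_t> out;
    put_u32_le(out, c.id);
    put_u32_le(out, static_cast<std::uint32_t>(c.name.size()));
    for (char ch : c.name) {
        out.push_back(static_cast<std::uint8_t>(ch));
    }
    return out;
}
```

Because the encoding depends only on the field values, both sides get the same bytes regardless of which FlatBuffers implementation built the buffer.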

Related

Traversing byte string through uint16_t pointer

I have a list of uint16_t's that has been packed into a protobuf message that looks like:
bytes values = 1;
The generated stubs for this message in C++ allow me to set the field with some code like:
protobufMessage.set_values(uint16ptr, sizeof(uint16_t) * amount);
In the above example, uint16ptr is a uint16_t* to the start of the value list and amount is the number of elements in that list.
Now, since I know the data is in the field and I want to be as efficient as possible, I don't want to memcpy; I want to access that memory directly. I also don't want to iterate through the values one by one, as cases with a large amount value would be slow. So I tried something like:
uint16_t *ptr = (uint16_t*) some_string.c_str();
This works "fine", however I don't get the same values I originally packed in. I think it might be because I am not traversing the data correctly. How should I do this properly?
Protobuf encodes your message so you can't simply read the values back from a string. But a "repeated" uint16_t should be a big blob somewhere in the message. If you knew the offset you could access the data there.
But that is still UB since the uint16_t in the protobuf message are not aligned. So on some CPUs this will be just slow, on others it will trap. The only safe way is to memcpy the data. Use the function provided by protobuf to extract the uint16_t's.
Note: Strictly speaking it's even worse, since there are no objects of type uint16_t living in the message, so any access would be UB. But uint16_t being a fundamental type makes it ok-ish, if it weren't for the alignment.
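The memcpy approach could look roughly like the sketch below. It assumes the field contents are available as a std::string and that sender and receiver share the same byte order; pack_u16 and unpack_u16 are hypothetical helper names, not part of protobuf.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Pack values into a byte string, the same bytes set_values(ptr, size) would receive.
inline std::string pack_u16(const std::vector<std::uint16_t>& values) {
    std::string bytes(values.size() * sizeof(std::uint16_t), '\0');
    if (!values.empty()) {
        std::memcpy(&bytes[0], values.data(), bytes.size());
    }
    return bytes;
}

// Copy the bytes back into properly aligned uint16_t storage.
inline std::vector<std::uint16_t> unpack_u16(const std::string& bytes) {
    assert(bytes.size() % sizeof(std::uint16_t) == 0);
    std::vector<std::uint16_t> values(bytes.size() / sizeof(std::uint16_t));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

A single memcpy over the whole blob is one bulk copy, not a per-element loop, so it is usually cheap even for large amount values.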

Why Serialization when a class object in memory is already binary (C/C++)?

My guess is that data is scattered in physical memory (even if the data of a class object is sequential in virtual memory), so in order to send the data correctly it needs to be reassembled. To send it over the network, one additional step is the transformation from host byte order to network byte order. Is that correct?
Proper serialization can be used to send data to arbitrary systems, that might not work under the same architecture as the source host.
Even an object that only consists of native types can be troublesome to share between two systems because of the extra padding that might exist in between and after members, among other things. Sharing raw memory dumps of objects between programs compiled for the same architecture but with different compiler versions can also turn into a big hassle. There is no guarantee how a variable of type T is actually stored in memory.
If you are not working with pointers (references included), and the data is meant to be read by the same binary as it's dumped from, it's usually safe just to dump a raw struct to disk, but when sending data to another host.. drum roll serialization is the way to go.
I've heard developers talking about ntohl / htonl / ntohs / htons as methods of serializing/deserializing integers, and when you think about it, saying that isn't that far from the truth.
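As a rough illustration of what htonl/ntohl accomplish, the sketch below writes a 32-bit integer in network (big-endian) byte order using shifts, so it does not depend on the host's endianness or on platform socket headers. The function names are made up for this example.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value in network (big-endian) byte order.
inline std::array<std::uint8_t, 4> encode_u32_be(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v >> 24),
             static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v)}};
}

// Decode four big-endian bytes back into a host-order 32-bit value.
inline std::uint32_t decode_u32_be(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}
```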
The word "serialization" is often used to describe this "complicated method of storing data in a generic way", but then again; your first programming assignment where you were asked to save information about Dogs to file (hopefully*) made use of serialization, in some way or another.
* "hopefully" meaning that you didn't dump the raw memory representation of your Dog object to disk
Pointers!
If you've allocated memory on the heap you'll just end up with a serialised pointer pointing to an arbitrary area of memory. If you just have a few ints and chars then yes you can just write it out directly to a file, but that then becomes platform dependent because of the byte ordering that you mentioned.
Pointer and data pack(data align)
If you memcpy your object's memory, there is a danger of copying a wild pointer value instead of the data it points to. There is another risk: if the sender and receiver use different data packing (data alignment), you will get rubbish after decoding.
Binary representations may be different between different architectures, compilers and even different versions of the same compiler. There's no guarantee that what system A sees as a signed integer will be seen as the same on system B. Byte ordering, word lengths, struct padding etc. will become hard-to-debug problems if you don't properly define the protocol or file format for exchanging the data.
A class (when we speak of C++) with virtual functions also includes virtual method pointers, and they must be reconstructed on the receiving end.

Memory padding issue

I am working on a sample application in which I serialize some data. In the client application I read the serialized data back. While doing this I observed some strange behavior.
In the sample application the size of the object is different from the size of the data in the client. I think this is because of memory padding. My problem is that I am trying to write “BRUSHOBJ” to a file. This structure is defined by Microsoft, so I can't change its declaration. Please let me know how to solve this problem.
Please let me know how to apply memory padding to a standard data type.
It sounds like you're trying to just cast the address of a struct to
char*, and use ostream::write on it. This simply doesn't work.
There's padding, but there's also the size of different types (which
varies from one platform to the next), byte order, and on some more
exotic platforms (including most mainframes) data representation itself.
Generally, you need a specification of what the output data should look
like, byte by byte, and you have to then write each byte with the
required value.
And this is just for simple types. A quick glance at BRUSHOBJ shows
that it contains a pointer, which you'll probably have to
follow—you'll certainly have to do something with it, since the
receiving end won't be able to do anything with a pointer into your
data. (I suspect, given the description, that you'll have to convert it
into some sort of identifier, and also transmit a dictionary mapping
such identifiers to objects. But I don't know enough about how this
structure is used to be sure.)
You have 2 options:
serializing data
modify memory padding via #pragma pack
Serializing data has no relation to memory padding; you are just defining a way to write/read back memory to/from a memory location (the memory stream).
I see that the _BRUSHOBJ struct has the following definition:
typedef struct _BRUSHOBJ {
ULONG iSolidColor;
PVOID pvRbrush;
FLONG flColorType;
} BRUSHOBJ;
Please note that sending a pointer across processes is nonsense. Serializing a pointer should be done by writing the size of the memory it points to and then the memory itself. In any case, if you pass such a BRUSHOBJ to a Windows function you can get undefined behavior. It's not a supported/documented way of passing a BRUSHOBJ across processes.
Memory padding can be applied like this:
#pragma pack(push)
#pragma pack(4)
struct myStruct
{
char Char1;
int Int1;
};
#pragma pack(pop)
If you want to modify padding, you should do it for a structure that is written by you.
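To see what packing actually changes, a small sketch comparing a default layout with a pack(1) layout is shown below. It assumes a compiler that supports #pragma pack (MSVC, GCC and Clang do; it is not standard C++), and the struct names are made up. Note that with a 4-byte int, pack(4) as in the snippet above would typically leave a char-plus-int struct unchanged, since int is already 4-byte aligned.

```cpp
#include <cstddef>

// Default layout: the compiler may insert padding after c so that i is aligned.
struct PaddedPair {
    char c;
    int i;
};

#pragma pack(push, 1)  // compiler extension: members aligned to 1 byte
struct PackedPair {
    char c;
    int i;
};
#pragma pack(pop)

static_assert(sizeof(PackedPair) == sizeof(char) + sizeof(int),
              "pack(1) removes the padding between c and i");
static_assert(sizeof(PaddedPair) >= sizeof(PackedPair),
              "the default layout is never smaller than the packed one");
static_assert(offsetof(PackedPair, i) == 1,
              "i directly follows c when packed");
```

Packing only removes padding. It does nothing about byte order, type sizes or pointers, so it is not a substitute for serialization.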

How to convert memory to a byte array?

I have a database in which one field's type is an array of bytes.
I also have a piece of memory, or an object. How do I convert this piece of memory, or even an object, to a byte array so that I can store the byte array in the database?
Suppose the object is
Foo foo
The memory is
buf (actually, don't know how to declare it yet)
The database field is
byte data[256]
Only hex values like x'1' can be inserted into the field.
Thanks so much!
There are two methods.
One is simple but has serious limitations. You can write the memory image of the Foo object. The drawback is that if you ever change the compiler or the structure of Foo then all your data may no longer be loadable (because the image no longer matches the object). To do this simply use
&foo
as the byte array.
The other method is called 'serialization'. It can be used if the object changes
but adds a lot of space to encode the information. If you only have 256 bytes then you
need to be watchful serialization doesn't create a string too large to save.
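Whichever method produces the bytes, they still have to be turned into the hex form the question mentions. A possible helper is sketched below; to_hex_literal is a hypothetical name, and the exact literal syntax (x'...') is taken from the question, not from any particular database's documentation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Format bytes as a hex literal of the form x'01ABFF', two hex digits per byte.
inline std::string to_hex_literal(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "x'";
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    out.push_back('\'');
    return out;
}
```

Checking bytes.size() against the 256-byte field limit before building the literal would catch oversized serialized objects early.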
Boost has a serialization library you may want to look at, though you'll need to be careful about the size of the objects created. If you're only doing this with a small set of classes, you may want to write the marshalling and unmarshalling functions yourself.
From the documentation:
"Here, we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. "

Read data with varying formats in C++

I'm creating my first real binary parser (a tiff reader) and have a question regarding how to allocate memory. I want to create a struct within my TiffSpec class for the IFD entries. These entries will always be 12 bytes, but depending upon the type specified in that particular entry, the values at the end could be of different types (or maybe just an address to another location in the file). What would be the best way to go about casting this sort of data? The smallest size memory I believe I would be dealing with would be 1 byte.
In C++ you should use a union.
This is a mechanism by which you can define several, overlapping data types, possibly with a common header.
See this article for how to use unions for exactly your problem -- a common header with different data underneath.
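As one possible shape for this, the sketch below parses a 12-byte IFD entry (2-byte tag, 2-byte type, 4-byte count, 4-byte value-or-offset, per the TIFF 6.0 layout) and decides whether the last four bytes hold the values themselves or a file offset. It keeps those four bytes raw and decodes them on demand, which covers the same ground as a union over the field without reading an inactive union member. All names are made up for the example, and only the basic field types 1 to 5 are handled.

```cpp
#include <cstddef>
#include <cstdint>

// One 12-byte IFD entry: tag (2), type (2), count (4), value-or-offset (4).
struct IfdEntry {
    std::uint16_t tag;
    std::uint16_t type;
    std::uint32_t count;
    std::uint8_t value_bytes[4];  // raw; meaning depends on type and count
};

inline std::uint16_t read_u16(const std::uint8_t* p, bool little_endian) {
    return little_endian ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
                         : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_u32(const std::uint8_t* p, bool little_endian) {
    const std::uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return little_endian ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
                         : ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
}

// Size in bytes of one value for the basic field types; 0 if not handled here.
inline std::size_t type_size(std::uint16_t type) {
    switch (type) {
        case 1: case 2: return 1;  // BYTE, ASCII
        case 3: return 2;          // SHORT
        case 4: return 4;          // LONG
        case 5: return 8;          // RATIONAL
        default: return 0;
    }
}

// Decode the fixed part of an entry; raw must point at 12 readable bytes.
inline IfdEntry parse_entry(const std::uint8_t* raw, bool little_endian) {
    IfdEntry e{};
    e.tag = read_u16(raw, little_endian);
    e.type = read_u16(raw + 2, little_endian);
    e.count = read_u32(raw + 4, little_endian);
    for (int i = 0; i < 4; ++i) e.value_bytes[i] = raw[8 + i];
    return e;
}

// True when all values fit in the 4-byte field; otherwise it holds a file offset.
inline bool value_is_inline(const IfdEntry& e) {
    const std::size_t size = type_size(e.type);
    return size != 0 && static_cast<std::uint64_t>(e.count) * size <= 4;
}
```

When value_is_inline returns false, read_u32(e.value_bytes, little_endian) gives the offset in the file where the actual values start.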