Traversing byte string through uint16_t pointer - c++

I have a list of uint16_t's that has been packed into a protobuf message that looks like:
bytes values = 1;
The generated stubs for this message in C++ allow me to set the field with code like:
protobufMessage.set_values(uint16ptr, sizeof(uint16_t) * amount);
In the above example, uint16ptr is a uint16_t* to the start of the value list and amount is the number of elements in that list.
Now, since I know the message is in the field and I want to be as efficient as possible, I don't want to memcpy; I want to access that memory directly. I also don't want to iterate through the values one by one, since that would be slow when amount is large. So I tried something like:
uint16_t *ptr = (uint16_t*) some_string.c_str();
This works "fine", however I don't get the same values I originally packed in. I think it might be because I am not traversing the data correctly. How should I do this properly?

Protobuf encodes your message so you can't simply read the values back from a string. But a "repeated" uint16_t should be a big blob somewhere in the message. If you knew the offset you could access the data there.
But that is still UB, since the uint16_t values in the protobuf message are not guaranteed to be aligned. On some CPUs an unaligned access is merely slow; on others it traps. The only safe way is to memcpy the data. Use the function provided by protobuf to extract the uint16_t's.
Note: Strictly speaking it's even worse, since there are no objects of type uint16_t living in the message, so any access would be UB. But uint16_t being a fundamental type makes it ok-ish, if it weren't for the alignment.
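The memcpy route recommended above can be sketched as follows. This is a minimal illustration, assuming the field's payload is handed back as a std::string (as in the question) and that sender and receiver use the same byte order; unpack_u16 is a hypothetical helper name, not part of protobuf.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw payload of a `bytes` field into a
// properly aligned vector of uint16_t. Assumes sender and receiver use
// the same byte order.
std::vector<std::uint16_t> unpack_u16(const std::string& raw) {
    // The payload must hold a whole number of 16-bit values.
    assert(raw.size() % sizeof(std::uint16_t) == 0);
    std::vector<std::uint16_t> out(raw.size() / sizeof(std::uint16_t));
    if (!out.empty()) {
        // One bulk copy; no per-element loop and no unaligned reads.
        std::memcpy(out.data(), raw.data(), raw.size());
    }
    return out;
}
```

The copy is a single bulk operation, so its cost is usually small compared with parsing the message itself.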

Related

Access field value raw bytes from a constructed Flatbuffer

Say I have following Flatbuffer definition compiled to C++.
table ParentObj { // this is the root table
timestamp:uint64;
child:ChildObj;
}
table ChildObj {
...some fields...
}
I build a ParentObj (which includes the ChildObj) with Flatbuffer builder and send the final bytes over the network to another party.
Is there a way the receiver can access the raw bytes making up the ChildObj inside the received buffer? I can access individual fields in the child obj via the Flatbuffer generated C++ code interface. But can I get the buffer offset and length of the bytes making up the entire ChildObj object? I need this to generate a cryptographic signature of the ChildObj bytes.
No, because ChildObj is not necessarily contiguous in the buffer. It refers to a vtable (which may or may not be shared), and any sub-string/vector/table is an offset that may point to a non-adjacent part of the buffer.
Typically, you use nested_flatbuffer to store children that need to be treated as their own isolated buffer further down the line: child:[ubyte] (nested_flatbuffer: ChildObj).
However, getting a cryptographic signature from this is still a bad idea, since depending on how it was serialized (which implementation) the bytes may differ subtly, due to differences in alignment and ordering of fields/objects. To reliably get a hash out of this, you'd need to hash only the actual data bytes, and in a fixed field order.
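One way to picture "only the actual data bytes, in a fixed field order" is to copy the field values into a canonical byte sequence and sign that instead of the buffer. The sketch below is an assumption-laden illustration: ChildFields, its members, and canonical_bytes are hypothetical, since the real fields of ChildObj are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the child's fields, filled in from the
// generated accessors after the buffer has been received.
struct ChildFields {
    std::uint32_t id;
    std::string label;
};

// Append a 32-bit value in big-endian order, independent of the host CPU.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(v >> shift));
    }
}

// Emit the fields in one fixed order with explicit lengths, so the same
// logical object always yields the same bytes to hash or sign.
std::vector<std::uint8_t> canonical_bytes(const ChildFields& c) {
    assert(c.label.size() <= 0xFFFFFFFFu);
    std::vector<std::uint8_t> out;
    put_u32(out, c.id);
    put_u32(out, static_cast<std::uint32_t>(c.label.size()));
    out.insert(out.end(), c.label.begin(), c.label.end());
    return out;
}
```

The resulting vector is what would be passed to whichever signature function is in use; both sides must agree on the field order and integer encoding.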

Sending a structure through send

Before anything, I want to say the title is not the question.
My question is: why can't you just send all the bytes of the structure and then cast them back into that structure (given that you have the structure defined on both sides, which it makes sense to have)?
Thank you!
EDIT: Here's my current structure:
struct COMPUTER_INFO
{
const char* Name;
int Brightness;
int Volume;
}
I was thinking that you can easily calculate the size of all that and then send it through send().
Name is a pointer (contains an address) that only makes sense to your program on your computer. If you sent this structure as bytes the receiving program would receive just the address not the characters that comprise the name. The received address would also not point to a valid location in memory in the receiving computer.
Brightness and Volume are ints. An int does not have a fixed size; it is the "natural" word size of the computer (the standard does impose some restrictions), so sizeof(int) on the sending and receiving computers may differ. There may also be encoding differences, e.g. big vs little endian. See: https://en.wikipedia.org/wiki/Endianness
In general you are right. You can, of course, send the raw byte stream. But how should the receiver recognize the positions of the members of your structure in this byte stream? In particular, the string behind the char* in your example is of variable size, so a fixed layout will not work.
I recommend using the Boost serialization package. The Boost documentation includes a detailed tutorial. The package will take care of "serializing" (packaging your object into a byte stream) and "deserializing" (constructing your object from the byte stream). It is convenient for nearly all STL containers and can easily be extended to custom types.
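For a struct this small, the points raised above (the pointer, the width of int, and byte order) can also be handled by hand. The following is a minimal sketch rather than a recommended wire format: ComputerInfo, put_u32, get_u32, serialize, and deserialize are hypothetical names, and the layout (big-endian 32-bit fields, length-prefixed name) is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Wire-friendly variant of the struct: the name is owned rather than
// pointed to, and the integers have a fixed width.
struct ComputerInfo {
    std::string name;
    std::int32_t brightness;
    std::int32_t volume;
};

// Append a 32-bit value in big-endian order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read a big-endian 32-bit value starting at pos.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (std::uint32_t{in[pos]} << 24) | (std::uint32_t{in[pos + 1]} << 16) |
           (std::uint32_t{in[pos + 2]} << 8) | std::uint32_t{in[pos + 3]};
}

// Layout: name length, name characters, brightness, volume.
std::vector<std::uint8_t> serialize(const ComputerInfo& c) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(c.name.size()));
    out.insert(out.end(), c.name.begin(), c.name.end());
    put_u32(out, static_cast<std::uint32_t>(c.brightness));
    put_u32(out, static_cast<std::uint32_t>(c.volume));
    return out;
}

// Rebuild the struct; the characters are copied, so no sender-side
// address ever reaches the receiver.
ComputerInfo deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t len = get_u32(in, 0);
    assert(4 + len + 8 <= in.size());
    ComputerInfo c;
    c.name.assign(in.begin() + 4, in.begin() + 4 + len);
    c.brightness = static_cast<std::int32_t>(get_u32(in, 4 + len));
    c.volume = static_cast<std::int32_t>(get_u32(in, 8 + len));
    return c;
}
```

The bytes returned by serialize are what would be handed to send(); the receiver calls deserialize once it has the whole message.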

Memcpy and Structures

I'm working to copy the following structure to a byte array to send over a named pipe. I recently switched the host name from a byte array with a static size to a vector, because host names vary in length, and since then the copy no longer gives the expected result.
Here is the outline of my structure:
USHORT version; // Header Version
USHORT type; // IPVersion
USHORT count; // Number of IP addresses of remote system
USHORT length; // Header Length (1)
BYTE SysConfigLocIP[4];
BYTE SysConfigRemoteIP[4];
USHORT lengthHost;
std::vector<BYTE>HostName;
Later, after filling the structure, I copy it to a byte array like so:
BYTE Response[sizeof(aMsg)];
memcpy(Response, &aMsg, sizeof(aMsg));
When I inspect the container in the debugger, the vector holds the correct host information. However, after the copy to the Response byte array, the copied data is drastically different. Is this a valid operation? If so, how can I correctly copy the data from my vector to the BYTE array? If not, what other strategies can I use to dynamically size the structure to send the host names? Thank you for taking the time to read my question; I appreciate any feedback.
I'm working to copy the following structure to a byte array to send
over a named pipe.
A named pipe (or any other form of inter-process or inter-processor communication) does not understand your struct, nor does it understand vector. Such channels just operate on the concept of bytes in, bytes out. It is up to you, the programmer, to assign meaning to those bytes.
As suggested, please read on serialization. Try starting at http://en.wikipedia.org/wiki/Serialization. If permitted you can use the Boost solution, http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html, but I would still encourage you to understand the basics first.
As an exercise, first try transferring a vector<int> from sender to receiver. The number of elements in the vector must not be implicitly known by the receiver. Once you achieve that, migrating from int to your struct would be trivial.
That memcpy will only work for POD (plain old data) types. A vector is not POD. Instead, write code to put each byte in the buffer exactly where it needs to be. Don't rely on "magic".
99% of the time in C++ there is no reason to use memcpy. It breaks classes. Learn about copy constructors and std::copy and use them instead.
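"Put each byte in the buffer exactly where it needs to be" might look like the sketch below for the structure in the question. It is an illustration under assumptions: u16 and u8 stand in for the Windows USHORT and BYTE typedefs, Msg, put_u16, and to_bytes are hypothetical names, and the low-byte-first encoding is an arbitrary choice that the receiver must share.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-ins for the Windows USHORT and BYTE typedefs.
using u16 = std::uint16_t;
using u8 = std::uint8_t;

struct Msg {
    u16 version, type, count, length;
    u8 localIp[4];
    u8 remoteIp[4];
    std::vector<u8> hostName;  // lengthHost is derived from hostName.size()
};

// Append a 16-bit value, low byte first (an arbitrary but fixed choice).
void put_u16(std::vector<u8>& out, u16 v) {
    out.push_back(static_cast<u8>(v & 0xFF));
    out.push_back(static_cast<u8>(v >> 8));
}

// Write every field explicitly. The vector's contents are copied, not the
// internal pointers that a memcpy of the whole struct would pick up.
std::vector<u8> to_bytes(const Msg& m) {
    assert(m.hostName.size() <= 0xFFFF);
    std::vector<u8> out;
    put_u16(out, m.version);
    put_u16(out, m.type);
    put_u16(out, m.count);
    put_u16(out, m.length);
    out.insert(out.end(), m.localIp, m.localIp + 4);
    out.insert(out.end(), m.remoteIp, m.remoteIp + 4);
    put_u16(out, static_cast<u16>(m.hostName.size()));
    out.insert(out.end(), m.hostName.begin(), m.hostName.end());
    return out;
}
```

The returned vector is contiguous, so its data() and size() can be passed straight to the pipe write call.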

Sending variable length arrays over a network?

In the game I'm making, I need to be able to send std::vectors of integers over a network.
A packet seems to be made up entirely of a string. Since enet, the network library I'm using, takes care of endianness, my first idea for solving this is to send a message where the first byte is the message id, as usual, the next 4 bytes are an integer indicating the length of the array, and all subsequent bytes are the ints in the array. On the client side I can then push these back into a vector.
Is this how it is usually done or am I missing something critical? Is there a better way to do it?
Thanks
In general, there are two approaches to solving this problem which can be combined. One is to put the length of the array before the actual array. The other involves including some framing to mark the end (or beginning) of your message. There are advantages and disadvantages to each.
Putting the length before the array is simplest. However, if there should ever be a bug where the length does not match the number of integers, there is no way to detect this or recover from it.
Using framing byte(s) to mark the end of the message has the advantage of more robustness and the ability to recover from an improperly formatted message, at the cost of complexity. The complexity comes from the fact that if your framing bytes appear in your array of integers, you must escape them (i.e. prepend an escape character). Also, your code to read messages from the network becomes a little more complicated.
Of course, this all assumes that you are dealing with a stream and not a datagram. If your messages are clearly packetized already, then the length should be enough.
There are two ways to send variable length data on a stream: by prefixing the data with the length, or by suffixing it with a delimiter.
With a delimiter, the data can theoretically be of unlimited size, but the delimiter must not appear in the data. This approach can be used for strings, with a NUL ('\0') character as the delimiter.
When dealing with binary data, you don't have any choice but to prefix the length. The size of the data is then limited to what the prefix can represent, which is rarely a problem with a 4 byte prefix (because otherwise it means you're sending more than 4 gigabytes of data).
So, it all depends on the data being sent.
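The length-prefix layout described in the question (message id, 4-byte count, then the elements) could be sketched like this. It is a minimal example with assumptions: the values are unsigned 32-bit integers written big-endian by hand, and Bytes, put_u32, get_u32, encode, and decode are hypothetical names. The decoder rejects packets whose declared count disagrees with their actual size, which covers the mismatch case mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value in big-endian order.
void put_u32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read a big-endian 32-bit value starting at pos.
std::uint32_t get_u32(const Bytes& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (std::uint32_t{in[pos]} << 24) | (std::uint32_t{in[pos + 1]} << 16) |
           (std::uint32_t{in[pos + 2]} << 8) | std::uint32_t{in[pos + 3]};
}

// Packet layout: 1-byte message id, 4-byte element count, then the elements.
Bytes encode(std::uint8_t id, const std::vector<std::uint32_t>& values) {
    Bytes out;
    out.push_back(id);
    put_u32(out, static_cast<std::uint32_t>(values.size()));
    for (std::uint32_t v : values) put_u32(out, v);
    return out;
}

// Returns false when the declared count does not match the packet size.
bool decode(const Bytes& in, std::uint8_t& id, std::vector<std::uint32_t>& values) {
    if (in.size() < 5) return false;
    std::size_t payload = in.size() - 5;
    std::size_t count = get_u32(in, 1);
    if (payload % 4 != 0 || payload / 4 != count) return false;
    id = in[0];
    values.clear();
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(get_u32(in, 5 + 4 * i));
    return true;
}
```

On a datagram-style transport the packet size is known on receipt, so the size check in decode is all the framing that is needed.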
Appending that header information to your packet is a good approach to take. Another option you can do on the receive side, if the data is stored in some unsigned char* buffer of memory, is create a structure like so:
typedef struct network_packet
{
char id;
int message_size;
int data[];
} __attribute__((packed)) network_packet;
You can then simply "overlay" this structure on-top of your received buffer like so:
unsigned char* buffer;
//...fill the buffer and make sure it's the right size -- endian is taken care of
//via library
network_packet* packet_ptr = (network_packet*)buffer;
//access the 10th integer in the packet if the packet_ptr->message_size
//is long enough
if (packet_ptr->message_size >= 10) {
    int tenth_int = packet_ptr->data[9];
    // use tenth_int here
}
This avoids the expense of copying all the data, which already exists in a buffer, into another std::vector on the receive side.
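Note that the overlay relies on compiler extensions (a flexible array member in C++ and __attribute__((packed))). A more portable variation, sketched here, reads individual values straight out of the buffer with memcpy, which still avoids copying the whole array. The names read_i32, read_element, kSizeOffset, and kDataOffset are hypothetical; the sketch assumes the same packed layout as above, values already in host byte order, and that message_size counts integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets in the packed layout: 1-byte id, 4-byte message_size, then the ints.
constexpr std::size_t kSizeOffset = 1;
constexpr std::size_t kDataOffset = 5;

// Copy one 32-bit value out of the buffer. memcpy has no alignment
// requirement, so this is safe at any offset.
std::int32_t read_i32(const unsigned char* buffer, std::size_t offset) {
    std::int32_t v;
    std::memcpy(&v, buffer + offset, sizeof v);
    return v;
}

// Read element `index`, checking it against the packet's own element count.
std::int32_t read_element(const unsigned char* buffer, std::size_t index) {
    assert(static_cast<std::int64_t>(index) < read_i32(buffer, kSizeOffset));
    return read_i32(buffer, kDataOffset + index * sizeof(std::int32_t));
}
```

The tenth integer from the example above would then be read_element(buffer, 9).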

Memory padding issue

I am working on a sample application in which I serialize some data. In a client application I read the serialized data back. While doing this I observed some strange behavior.
The size of the object in the sample application differs from the size of the data in the client. I think this is because of memory padding. My problem is that I am trying to write “BRUSHOBJ” to a file. This structure is defined by Microsoft, so I cannot change its declaration. Please let me know how to solve this problem.
Please let me know how to apply memory padding to a standard data type.
It sounds like you're trying to just cast the address of a struct to
char*, and use ostream::write on it. This simply doesn't work.
There's padding, but there's also the size of different types (which
varies from one platform to the next), byte order, and on some more
exotic platforms (including most mainframes) data representation itself.
Generally, you need a specification of what the output data should look
like, byte by byte, and you have to then write each byte with the
required value.
And this is just for simple types. A quick glance at BRUSHOBJ shows
that it contains a pointer, which you'll probably have to
follow—you'll certainly have to do something with it, since the
receiving end won't be able to do anything with a pointer into your
data. (I suspect, given the description, that you'll have to convert it
into some sort of identifier, and also transmit a dictionary mapping
such identifiers to objects. But I don't know enough about how this
structure is used to be sure.)
You have 2 options:
1. serializing the data
2. modifying memory padding via #pragma pack
Serializing data has no relation to memory padding; you are just defining a way to write memory to, and read it back from, a memory location (the memory stream).
I see that the _BRUSHOBJ struct has the following definition:
typedef struct _BRUSHOBJ {
ULONG iSolidColor;
PVOID pvRbrush;
FLONG flColorType;
} BRUSHOBJ;
Please note that sending a pointer across processes is nonsense. Serializing a pointer should be done by writing the size of the memory it points to and then the memory itself. In any case, if you pass such a BRUSHOBJ to a Windows function you can get undefined behavior. It's not a supported/documented way of passing a BRUSHOBJ across processes.
Memory padding can be applied like this:
#pragma pack(push)
#pragma pack(4)
struct myStruct
{
char Char1;
int Int1;
};
#pragma pack(pop)
If you want to modify padding, you should do it for a structure that you wrote yourself.
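The effect of #pragma pack on layout can be checked at compile time. The sketch below uses a packing of 1 rather than 4 so that the difference is visible for a char followed by an int; Padded and Packed are hypothetical names, and the exact amount of default padding depends on the platform.

```cpp
#include <cassert>
#include <cstddef>

// Default layout: the compiler may insert padding after c so that i is aligned.
struct Padded {
    char c;
    int i;
};

#pragma pack(push, 1)
// Packed layout: members are placed back to back, with no padding.
struct Packed {
    char c;
    int i;
};
#pragma pack(pop)

// These checks fail the build if the layout is not what the wire format expects.
static_assert(sizeof(Packed) == sizeof(char) + sizeof(int), "no padding expected");
static_assert(offsetof(Packed, i) == 1, "i directly follows c");
static_assert(sizeof(Padded) >= sizeof(Packed), "padding can only add bytes");
```

Such assertions document the assumed layout next to the struct, which helps when the writer and the reader are built with different compilers or settings.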