I am working on a sample application that serializes some of its data. A client application reads the serialized data back. While doing this I observed some strange behavior.
In the sample application the size of the object is different from the size of the data in the client. I think this is because of memory padding. My problem is that I am trying to write a “BRUSHOBJ” to a file. This structure is defined by Microsoft, so I cannot change its declaration. Please let me know how to solve this problem.
Please let me know how to apply memory padding to a standard data type.
It sounds like you're trying to just cast the address of a struct to
char*, and use ostream::write on it. This simply doesn't work.
There's padding, but there's also the size of different types (which
varies from one platform to the next), byte order, and on some more
exotic platforms (including most mainframes) data representation itself.
Generally, you need a specification of what the output data should look
like, byte by byte, and you have to then write each byte with the
required value.
And this is just for simple types. A quick glance at BRUSHOBJ shows
that it contains a pointer, which you'll probably have to
follow—you'll certainly have to do something with it, since the
receiving end won't be able to do anything with a pointer into your
data. (I suspect, given the description, that you'll have to convert it
into some sort of identifier, and also transmit a dictionary mapping
such identifiers to objects. But I don't know enough about how this
structure is used to be sure.)
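To make "write each byte with the required value" concrete, here is a minimal sketch. It assumes a format that stores 32-bit integers most significant byte first; the helper names put_u32_be and get_u32_be are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit value to the buffer, most significant byte first.
// The byte order is chosen by the format, not by the host CPU.
void put_u32_be(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a 32-bit value back from four bytes stored in the same order.
std::uint32_t get_u32_be(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Self-check: the encoded bytes do not depend on padding or host byte order.
void check_u32_round_trip() {
    std::vector<std::uint8_t> buf;
    put_u32_be(buf, 0x11223344u);
    assert(buf.size() == 4);
    assert(buf[0] == 0x11 && buf[3] == 0x44);
    assert(get_u32_be(buf.data()) == 0x11223344u);
}
```

Because every byte is produced by a shift, the output is the same whatever the compiler's padding or the machine's endianness.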
You have two options:
serializing the data
modifying memory padding via #pragma pack
Serializing data is unrelated to memory padding: you are just defining a way to write memory to, and read it back from, a memory location (the memory stream).
I see that the _BRUSHOBJ struct has the following definition:
typedef struct _BRUSHOBJ {
ULONG iSolidColor;
PVOID pvRbrush;
FLONG flColorType;
} BRUSHOBJ;
Please note that sending a pointer across processes is nonsense. A pointer should be serialized by writing the size of the memory it points to and then the memory itself. In any case, if you pass this BRUSHOBJ to a Windows function you can get undefined behavior: this is not a supported or documented way of passing a BRUSHOBJ across processes.
Memory padding can be applied like this:
#pragma pack(push)
#pragma pack(4)
struct myStruct
{
char Char1;
int Int1;
};
#pragma pack(pop)
If you want to modify padding, you should do it only for a structure that you wrote yourself.
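A quick way to see what a pack pragma actually does is to compare sizeof and offsetof for two otherwise identical structs. The sketch below uses a packing of 1 rather than 4, on the assumption that int is already 4-byte aligned on your platform, in which case a packing of 4 would leave this particular struct unchanged. The struct names are invented for the example, and #pragma pack is a compiler extension (MSVC, GCC and Clang accept it), not standard C++.

```cpp
#include <cassert>
#include <cstddef>

struct DefaultLayout {      // the compiler chooses the padding
    char Char1;
    int  Int1;
};

#pragma pack(push, 1)       // save the current packing, then pack to 1 byte
struct PackedLayout {
    char Char1;
    int  Int1;
};
#pragma pack(pop)           // restore the previous packing

void check_layouts() {
    // With 1-byte packing there is no gap between the members.
    assert(offsetof(PackedLayout, Int1) == sizeof(char));
    assert(sizeof(PackedLayout) == sizeof(char) + sizeof(int));
    // The default layout is never smaller than the packed one.
    assert(sizeof(DefaultLayout) >= sizeof(PackedLayout));
}
```

Keep in mind that a packed layout still says nothing about byte order or about the size of int on the other side.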
Before anything, I want to say the title is not the question.
My question is: why can't you just send all the bytes of the structure and then cast them back into that structure (given that you have the structure defined on both sides, which you presumably do)?
Thank you!
EDIT: Here's my current structure:
struct COMPUTER_INFO
{
const char* Name;
int Brightness;
int Volume;
};
I was thinking that you can easily calculate the size of all that and then send it through send().
Name is a pointer (contains an address) that only makes sense to your program on your computer. If you sent this structure as bytes the receiving program would receive just the address not the characters that comprise the name. The received address would also not point to a valid location in memory in the receiving computer.
Brightness and Volume are ints. An int does not have a fixed size: it is the "natural" word size of the computer (the standard does impose some restrictions). So sizeof(int) on the sending and receiving computers may be different. There may also be encoding differences, e.g. big vs little endian. See: https://en.wikipedia.org/wiki/Endianness
In general you are right. You can, of course, send the raw byte stream. But how is the receiver supposed to recognize the positions of the members of your structure in this byte stream? In particular, the string behind the char* in your example is of variable size, so this will not be possible.
I recommend using the Boost serialization package. You can find a detailed tutorial here. The package will take care of "serializing" (packaging your object into a byte stream) and "deserializing" (constructing your object from the byte stream). It is absolutely convenient for nearly all STL containers and can easily be expanded for custom types.
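If you would rather see what such a library does for you, here is a hand-rolled sketch (not Boost, and not the only possible layout). It assumes a wire format of: name length, name bytes, brightness, volume, with every 32-bit value stored most significant byte first. ComputerInfo, put_u32, get_u32, serialize and deserialize are names I made up; the struct is a variant of COMPUTER_INFO that owns its name as a std::string and uses fixed-width integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ComputerInfo {       // hypothetical variant of COMPUTER_INFO
    std::string  name;
    std::int32_t brightness;
    std::int32_t volume;
};

using Bytes = std::vector<std::uint8_t>;

// Store a 32-bit value most significant byte first (assumed wire order).
void put_u32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read a 32-bit value and advance pos; at() throws on truncated input.
std::uint32_t get_u32(const Bytes& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | in.at(pos++);
    return v;
}

Bytes serialize(const ComputerInfo& ci) {
    Bytes out;
    put_u32(out, static_cast<std::uint32_t>(ci.name.size()));  // length prefix
    out.insert(out.end(), ci.name.begin(), ci.name.end());     // the characters
    put_u32(out, static_cast<std::uint32_t>(ci.brightness));
    put_u32(out, static_cast<std::uint32_t>(ci.volume));
    return out;
}

ComputerInfo deserialize(const Bytes& in) {
    std::size_t pos = 0;
    ComputerInfo ci;
    const std::uint32_t len = get_u32(in, pos);
    for (std::uint32_t i = 0; i < len; ++i)
        ci.name.push_back(static_cast<char>(in.at(pos++)));
    ci.brightness = static_cast<std::int32_t>(get_u32(in, pos));
    ci.volume     = static_cast<std::int32_t>(get_u32(in, pos));
    return ci;
}
```

The result of serialize() is what you would hand to send(); the receiver rebuilds the object with deserialize() instead of casting the bytes. The length prefix is what lets the receiver find where the name ends.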
I'm writing a logger in C++, and I've come to the part where I'd like to take a log record and write it to a file.
I have created a LogRecord struct, and would like to serialize it and write it to a file in binary mode.
I have read some posts about serialization in C++, and one of the answers included this following snippet:
reinterpret_cast<char*>(&logRec)
I've tried reading about reinterpret_cast and what it does, but I couldn't fully understand what's really happening in the background.
From what I understand, it takes a pointer to my struct, and turns it into a pointer to a char, so it thinks that the chunk of memory that holds my struct is actually a string, is that true? How can that work?
A memory address is just a memory address. Memory isn't inherently special - it's just a huge array of bytes, for all we care. What gives memory its meaning is what we do with it, and the lenses through which we view it.
A pointer to a struct is just an integer that specifies some offset into memory - surely you can treat one integer in any way you want, in your case, as a pointer to some arbitrary number of bytes (chars).
reinterpret_cast() doesn't do anything special except allow you to convert one view of a memory address into another view of a memory address. It's still up to you to treat that memory address correctly.
For instance, char* is the conventional way to refer to a string of characters in C++ - but the type char* literally means "a pointer to a single char". How does it come to mean a pointer to a null-terminated string of characters? By convention, that's how. We treat the type differently depending on the context, but it's up to us to make sure we do so correctly.
For instance, how do you know how many bytes to read through your char* pointer to your struct? The type itself gives you zero information - it's up to you to know that you've really got a byte-oriented pointer to a struct of fixed length.
Remember, under the hood, the machine has no types. A piece of paper doesn't care if you write an essay on each line, or if you scribble all over the thing. It's how we treat it - and how the tools we use (C++) treat it.
Binary-wise, it does nothing at all. This casting is a higher-level concept that has no bearing on any actual machine instructions.
At a low level, a pointer is just a numeric value that holds a memory address. There is nothing to be done in telling the compiler "although you thought the destination memory contained a struct, now please think that it contains a char". The actual address itself doesn't change in any way.
From what I understand, it takes a pointer to my struct, and turns it into a pointer to a char, so it thinks that the chunk of memory that holds my struct is actually a string, is that true?
Yes.
How can that work?
A string is just a sequence of bytes, and your object is just a sequence of bytes, so that's how it works.
But it won't if your object is logically more than just a sequence of bytes. Any indirection, and you're hosed. Furthermore, any implementation-defined padding or representation/endianness and your data is non-portable. This might be acceptable; it really depends on your requirements.
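Here is a small sketch of what the cast does and does not do. LogRecord below is a stand-in I invented, since the asker's struct isn't shown; the static_assert expresses the "just a sequence of bytes" condition using the standard std::is_trivially_copyable trait.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct LogRecord {          // hypothetical stand-in for the asker's struct
    int  level;
    char text[16];
};

// Raw byte copies are only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<LogRecord>::value,
              "LogRecord must be trivially copyable");

void check_byte_view() {
    LogRecord rec{};                       // zero-initialize all members
    rec.level = 3;
    std::strcpy(rec.text, "started");

    // Same address, different static type: no bytes are moved or changed.
    const char* bytes = reinterpret_cast<const char*>(&rec);
    assert(static_cast<const void*>(bytes) == static_cast<const void*>(&rec));

    // Copy the object representation out and back in; the value survives.
    char buffer[sizeof(LogRecord)];
    std::memcpy(buffer, bytes, sizeof(LogRecord));
    LogRecord copy;
    std::memcpy(&copy, buffer, sizeof(LogRecord));
    assert(copy.level == 3);
    assert(std::strcmp(copy.text, "started") == 0);
}
```

This round trip only stays within one process on one machine, which is exactly the case where padding and endianness cannot bite.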
Casting a struct into an array of bytes (chars) is a classic low impact method of binary serialization. This is based on the assumption that the content of the struct exists contiguously in memory. The casting allows us to write this data to a file or socket using the normal APIs.
This only works though if the data is contiguous. This is true for C style structs or PODs in C++ terminology. It will not work with complex C++ objects or any struct with pointers to storage outside the struct. For text data you will need to use fixed size character arrays.
struct {
int num;
char name[50];
};
will serialize correctly.
struct {
int num;
char* name;
};
will not serialize correctly, since the data for the string is stored outside the struct.
If you are sending data across a network you will also need to ensure that the struct is packed, or at least of known alignment, and that integers are converted to a consistent endianness (network byte order is normally big endian).
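As a sketch of the fixed-size-array case, the code below writes such a struct to a stream and reads it back. I've given the struct the name Record (the original is anonymous) and used a std::stringstream in place of a file or socket; write_record and read_record are illustrative helpers. It deliberately stays on one machine and one build, so packing and endianness do not come into play.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

struct Record {             // hypothetical name for the struct shown above
    int  num;
    char name[50];
};

// Write the object representation to any std::ostream.
void write_record(std::ostream& os, const Record& r) {
    os.write(reinterpret_cast<const char*>(&r), sizeof r);
}

// Read it back; returns false if fewer than sizeof(Record) bytes arrived.
bool read_record(std::istream& is, Record& r) {
    is.read(reinterpret_cast<char*>(&r), sizeof r);
    return is.gcount() == static_cast<std::streamsize>(sizeof r);
}

void check_record_round_trip() {
    Record out{};                          // zero-fill so unused name bytes are defined
    out.num = 7;
    std::strncpy(out.name, "alpha", sizeof out.name - 1);

    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_record(ss, out);
    assert(ss.str().size() == sizeof(Record));

    Record in{};
    const bool ok = read_record(ss, in);
    assert(ok);
    (void)ok;                              // silence unused warning when NDEBUG is set
    assert(in.num == 7 && std::strcmp(in.name, "alpha") == 0);
}
```

With a char* member instead of the array, the same code would write the address, not the text.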
Suppose I have the following struct:
typedef struct {
int mID;
struct in_addr mIP;
size_t dataSize;
// Another structure
fairness_structure str;
bool ack;
bool stability;
bool stop_message;
}HeaderType;
As you know, the size of a struct would vary due to its alignment. How to fill in the padding between fields with some data, say with zeros?
Just initialize the structure with memset, and the padding will be filled as well.
memset(&mystruct, 0, sizeof(HeaderType));
If you really want to fill only the padding, you can cast the pointer to char* and do the pointer arithmetic. But in this case you MUST know how the compiler padded the structure, or enforce it yourself with #pragma pack.
You can use offsetof() macro to get the offset of struct members.
char *off = (char *)&mystruct + offsetof(HeaderType, ack);
char *pad_start = off + sizeof(mystruct.ack);
char *pad_end = (char *)&mystruct + offsetof(HeaderType, stability);
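For a version you can compile and poke at, here is a sketch on a smaller struct where a gap is likely. Sample, gap_after_tag and clear_gap are names I made up, and how large the gap is (possibly zero) depends on your compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Sample {             // hypothetical struct, not the HeaderType above
    char tag;
    int  value;
};

// Number of padding bytes the compiler placed between tag and value.
constexpr std::size_t gap_after_tag =
    offsetof(Sample, value) - (offsetof(Sample, tag) + sizeof(char));

// Zero only the gap between tag and value, leaving both members untouched.
void clear_gap(Sample& s) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&s);
    std::memset(base + offsetof(Sample, tag) + sizeof(char), 0, gap_after_tag);
}

void check_gap() {
    Sample s;
    std::memset(&s, 0xFF, sizeof s);       // fill everything, padding included
    s.tag = 'x';
    s.value = 42;
    clear_gap(s);
    assert(s.tag == 'x' && s.value == 42); // members are not disturbed
    assert(offsetof(Sample, tag) + sizeof(char) + gap_after_tag
           == offsetof(Sample, value));
}
```

Trailing padding after the last member would need the same treatment, using sizeof(Sample) as the end of the range.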
Bedtime reading: The Lost Art of C Structure Packing
Controlling the contents of padding bits and bytes does not seem very useful. But if you write the contents of a structure to a file with a single write or fwrite call, you probably care about the padding and may want to make sure it has consistent values, preferably 0, at all times. Not that it matters when you read the contents back from the file, but it does matter if the file contents are to be predictable and reproducible. Some development tools are known to produce unpredictable contents in object or executable files exactly for this reason, making it very difficult to rebuild from source and check signatures.
So if you really need this, you want a simple and portable method.
The bad news is the C Standard does not have a generic solution for this.
The only guarantee the standard makes about the contents of padding bytes and bits is for uninitialized structures with static storage duration. Padding is guaranteed to be zero in this case (in a hosted environment). In practice, this is also true of initialized structures because it is simple enough for compiler writers to do so.
What about local structures with automatic storage? If they are not initialized, both field and padding contents are indeterminate. If you just clear the bytes with a memset(&s, 0, sizeof(s)), the padding will be cleared and you can start modifying struct members... Bad news again: the C standard lists as unspecified behaviour "the value of padding bytes when storing values in structures or unions" (6.2.6.1).
In other words, storing values in structure members can have side effects on the contents of padding bits and bytes. The compiler is allowed to generate code that does that and it may be more efficient to do so.
The method described by Marek beyond the simple memset is very cumbersome to use, especially if you have bitfields. In practice, clearing the structures before you initialize the fields manually seems the simplest way to achieve the purpose, and I have not seen a compiler that takes advantage of the Standard's leniency concerning the padding bytes. If you pass the structures by value, all bets are off, as the compiler may generate code that does not copy the padding.
As a conclusion: if you use local structures, clear them with memset before use and do not pass them by value. There is no guarantee that the padding will keep a 0 value, but that's the best you can do.
I'm working to copy the following structure to a byte array to send over a named pipe. I've run into a problem since switching the host name from a statically sized byte array to a vector, which I did because the host name will be of varying length.
Here is the outline of my structure:
USHORT version; // Header Version
USHORT type; // IPVersion
USHORT count; // Number of IP addresses of remote system
USHORT length; // Header Length (1)
BYTE SysConfigLocIP[4];
BYTE SysConfigRemoteIP[4];
USHORT lengthHost;
std::vector<BYTE>HostName;
Later, after filling the structure, I copy it to a byte array like so:
BYTE Response[sizeof(aMsg)];
memcpy(Response, &aMsg, sizeof(aMsg));
I find that my vector is holding the correct information for the host when I inspect the container in the debugger. However, after the copy to the Response byte array, the data that has been copied is drastically different. Is this a valid operation? If so, what can I do to correctly copy the data from my vector to the BYTE array? If not, what other strategies can I use to dynamically size the structure to send the host names? Thank you for taking the time to read my question; I appreciate any feedback.
I'm working to copy the following structure to a byte array to send
over a named pipe.
A named pipe (or any other form of inter-process or inter-processor communication) does not understand your struct, nor does it understand vector. It just operates on the concept of byte-in-byte-out. It is up to you, the programmer, to assign meaning to those bytes.
As suggested, please read on serialization. Try starting at http://en.wikipedia.org/wiki/Serialization. If permitted you can use the Boost solution, http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html, but I would still encourage you to understand the basics first.
As an exercise, first try transferring a vector<int> from sender to receiver. The number of elements in the vector must not be implicitly known by the receiver. Once you achieve that, migrating from int to your struct would be trivial.
That memcpy will only work for POD (plain old data) types. A vector is not POD. Instead, write code to put each byte in the buffer exactly where it needs to be. Don't rely on "magic".
99% of the time in C++ there is no reason to use memcpy. It breaks classes. Learn about copy constructors and std::copy and use them instead.
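Putting each byte where it needs to be could look roughly like the sketch below. It assumes the receiver expects 16-bit fields high byte first and the host name preceded by its length; Message, put_u16 and to_bytes are hypothetical names, and std::uint16_t / std::uint8_t stand in for the Windows USHORT / BYTE typedefs so the code builds anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Message {            // hypothetical name for the struct in the question
    std::uint16_t version;
    std::uint16_t type;
    std::uint16_t count;
    std::uint16_t length;
    std::uint8_t  SysConfigLocIP[4];
    std::uint8_t  SysConfigRemoteIP[4];
    std::vector<std::uint8_t> HostName;    // lengthHost is derived from its size
};

// Append a 16-bit value, high byte first (assumed wire order).
void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// Build the byte array field by field; the vector's contents are copied,
// not the vector object itself.
std::vector<std::uint8_t> to_bytes(const Message& m) {
    std::vector<std::uint8_t> out;
    put_u16(out, m.version);
    put_u16(out, m.type);
    put_u16(out, m.count);
    put_u16(out, m.length);
    out.insert(out.end(), m.SysConfigLocIP, m.SysConfigLocIP + 4);
    out.insert(out.end(), m.SysConfigRemoteIP, m.SysConfigRemoteIP + 4);
    put_u16(out, static_cast<std::uint16_t>(m.HostName.size()));
    out.insert(out.end(), m.HostName.begin(), m.HostName.end());
    return out;
}

void check_message_bytes() {
    Message m{};
    m.version = 1;
    const std::string host = "pc1";
    m.HostName.assign(host.begin(), host.end());
    const std::vector<std::uint8_t> bytes = to_bytes(m);
    assert(bytes.size() == 8 + 8 + 2 + host.size());
    assert(bytes[0] == 0 && bytes[1] == 1);      // version
    assert(bytes[16] == 0 && bytes[17] == 3);    // host name length
    assert(bytes[18] == 'p');
}
```

The resulting vector's data() and size() are what you would write to the pipe, and its size grows with the host name automatically.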
My guess is that the data is scattered in physical memory (even if the data of a class object is sequential in virtual memory), so in order to send the data correctly it needs to be reassembled. To be able to send it over the network, one additional step is the transformation from host byte order to network byte order. Is that correct?
Proper serialization can be used to send data to arbitrary systems, that might not work under the same architecture as the source host.
Even an object that only consists of native types can be troublesome to share between two systems because of the extra padding that might exist between and after members, among other things. Sharing raw memory dumps of objects between programs compiled for the same architecture but with different compiler versions can also turn into a big hassle. There is no guarantee how a variable of type T is actually stored in memory.
If you are not working with pointers (references included), and the data is meant to be read by the same binary it was dumped from, it's usually safe just to dump a raw struct to disk. But when sending data to another host... (drum roll) serialization is the way to go.
I've heard developers talking about ntohl / htonl / ntohs / htons as methods of serializing/deserializing integers, and when you think about it, that isn't far from the truth.
The word "serialization" is often used to describe this "complicated method of storing data in a generic way", but then again; your first programming assignment where you were asked to save information about Dogs to file (hopefully*) made use of serialization, in some way or another.
* "hopefully" meaning that you didn't dump the raw memory representation of your Dog object to disk
Pointers!
If you've allocated memory on the heap you'll just end up with a serialised pointer pointing to an arbitrary area of memory. If you just have a few ints and chars then yes you can just write it out directly to a file, but that then becomes platform dependent because of the byte ordering that you mentioned.
Pointer and data pack(data align)
If you memcpy your object's memory, there is a danger of copying a wild pointer value instead of the data it points to. There is another risk: if the sender and receiver use different data packing (data alignment), you will get rubbish after decoding.
Binary representations may be different between different architectures, compilers and even different versions of the same compiler. There's no guarantee that what system A sees as a signed integer will be seen as the same on system B. Byte ordering, word lengths, struct padding etc. will become hard-to-debug problems if you don't properly define the protocol or file format for exchanging the data.
Class (when we speak of C++) also includes virtual method pointers - and they must be reconstructed on receiving end.
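You can ask the compiler whether a type is safe to copy as raw bytes. The sketch below uses the standard type traits from <type_traits>; PlainPoint and Shape are example types I invented for the illustration.

```cpp
#include <cassert>
#include <type_traits>

struct PlainPoint {         // hypothetical example types
    int x;
    int y;
};

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const { return 0.0; }
    double scale = 1.0;
};

// Plain aggregates may be copied byte by byte; polymorphic classes may not,
// because their objects typically carry a hidden pointer to a virtual table.
static_assert(std::is_trivially_copyable<PlainPoint>::value, "safe to memcpy");
static_assert(!std::is_trivially_copyable<Shape>::value, "not safe to memcpy");
static_assert(std::is_polymorphic<Shape>::value, "has virtual functions");
```

Putting a static_assert like this next to any code that dumps raw bytes turns a silent runtime bug into a compile error.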