Passing data between DLL boundaries safely in C++

Let there be a structure
struct MyDataStructure
{
int a;
int b;
std::string c;
};
Let there be a function in the interface exposed by a dll.
class IDllInterface
{
public:
virtual void getData(MyDataStructure&) = 0;
};
From a client exe which loads the dll, would the following code be safe?
...
IDllInterface* dll = DllFactory::getInterface(); // Imagine this exists
MyDataStructure data;
dll->getData(data);
...
Assume, of course, that MyDataStructure is known to both the client and the dll. Also, as I understand it, since the code is compiled separately for the dll and the exe, the layout of MyDataStructure could differ between compilers/compiler versions. Is my understanding correct?
If so, how can you pass data between the dll boundaries safely when working with different compilers/compiler versions.

You could use a "protocol" approach. For this, you could use a memory buffer to transfer the data and both sides just have to agree on the buffer layout.
The protocol agreement could be something like:
We don't use a struct, we just use a memory buffer (pass me a pointer, or whatever means your toolkit allows for sharing a memory buffer).
We clear the buffer to 0s before setting any data in it.
All ints use 4 bytes in the buffer. This means each side uses whatever int type under their compiler is 4 bytes e.g. int/long.
For the particular case of two ints, the first 8 bytes hold the ints and after that comes the string data.
#define MAX_STRING_SIZE_I_NEED 128
// 8 bytes for ints.
#define DATA_SIZE (MAX_STRING_SIZE_I_NEED + 8)
char xferBuf[DATA_SIZE];
So the DLL sets the ints etc., e.g.:
void GetData(void* p)
{
// "int" is whatever type is known to use 4 bytes
char* q = (char*) p;
*(int*) q = intA_ValueImSending;
*(int*) (q + 4) = intB_ValueImSending;
strcpy(q + 8, stringBuf_ImSending);
}
On the receiving end it's easy enough to place the buffered values in the struct:
char buf[DATA_SIZE];
char* p = buf;
theDll.GetData(p);
theStructInstance.intA = *(int*) p;
theStructInstance.intB = *(int*) (p + 4);
...
If you want you could even agree on the endianness of the bytes per integer and set each of the 4 bytes of each integer in the buffer - but you probably wouldn't need to go to that extent.
For more general purposes, both sides could agree on "markers" in the buffer. The buffer would look like this:
<marker>
<data>
<marker>
<data>
<marker>
<data>
...
Marker: 1st byte indicates the data type, the 2nd byte indicates the length (very much like a network protocol).

If you want to pass a string in COM, you normally want to use a COM BSTR object. You can create one with SysAllocString. This is defined to be neutral between compilers, versions, languages, etc. Contrary to popular belief, COM does directly support the int type--but from its perspective, int is always a 32-bit type. If you want a 64-bit integer, that's a Hyper, in COM-speak.
Of course you could use some other format that both sides of your connection know/understand/agree upon. Unless you have an extremely good reason to do this, it's almost certain to be a poor idea. One of the major strengths of COM is exactly the sort of interoperation you seem to want, and inventing your own string format would limit that substantially.

Using JSON for communication.
I think I have found an easier way to do it, hence answering my own question. As suggested in the answer by @Greg, one has to make sure that the data representation follows a protocol, e.g. a network protocol. This makes sure that the object representation between different binary components (exe and dll here) becomes irrelevant. If we think about it again, this is the same problem that JSON solves by defining a simple object representation protocol.
So a simple yet powerful solution according to me would be to construct a JSON object from your object in the exe, serialise it, pass it across the dll boundary as bytes and deserialise it in the dll. The only agreement between the dll and exe would be that both use the same string encoding (e.g. UTF-8).
https://en.wikibooks.org/wiki/JsonCpp
One can use the above JsonCpp library. Strings are UTF-8 encoded by default in JsonCpp, which is convenient as well :-)

Related

C/C++ Little/Big Endian handler

There are two systems that communicate via TCP. One uses little endian and the second one big endian. The ICD between the systems contains a lot of structs (fields). Swapping the bytes of each field individually doesn't look like the best solution.
Is there any generic solution/practice for handling communication between systems with different endianness?
Each system may have a different architecture, but endianness should be defined by the communication protocol. If the protocol says "data must be sent as big endian", then that's how the system sends it and how the other system receives it.
I am guessing the reason why you're asking is because you would like to cast a struct pointer to a char* and just send it over the wire, and this won't work.
That is generally a bad idea. It's far better to create an actual serializer, so that your internal data is decoupled from the actual protocol, which also means you can easily add support for different protocols in the future, or different versions of the protocols. You also don't have to worry about struct padding, aliasing, or any implementation-defined issues that casting brings along.
(update)
So generally, you would have something like:
void Serialize(const struct SomeStruct *s, struct BufferBuilder *bb)
{
BufferBuilder_append_u16_le(bb, s->SomeField);
BufferBuilder_append_s32_le(bb, s->SomeOther);
...
BufferBuilder_append_u08(bb, s->SomeOther);
}
Where you would already have all these methods written in advance, like
// append unsigned 16-bit value, little endian
void BufferBuilder_append_u16_le(struct BufferBuilder *bb, uint16_t value)
{
if (bb->remaining < sizeof(value))
{
return; // or some error handling, whatever
}
// note: on a big-endian host the value would need a byte swap before
// the copy for the buffer to actually end up little endian
memcpy(bb->buffer, &value, sizeof(value));
bb->buffer += sizeof(value); // advance the write cursor
bb->remaining -= sizeof(value);
}
We use this approach because it's simpler to unit test these "appending" methods in isolation, and writing (de)serializers is then a matter of just calling them in succession.
But of course, if you can pick any protocol and implement both systems, then you could simply use protobuf and avoid doing a bunch of plumbing.
Generally speaking, values transmitted over a network should be in network byte order, i.e. big endian. So values should be converted from host byte order to network byte order for transmission and converted back when received.
The functions htons and ntohs do this for 16 bit integer values and htonl and ntohl do this for 32 bit integer values. On little endian systems these functions essentially reverse the bytes, while on big endian systems they're a no-op.
So for example if you have the following struct:
struct mystruct {
char f1[10];
uint32_t f2;
uint16_t f3;
};
Then you would serialize the data like this:
// s points to the struct to serialize
// p should be large enough to hold the serialized struct
void serialize(struct mystruct *s, unsigned char *p)
{
memcpy(p, s->f1, sizeof(s->f1));
p += sizeof(s->f1);
uint32_t f2_tmp = htonl(s->f2);
memcpy(p, &f2_tmp, sizeof(f2_tmp));
p += sizeof(s->f2);
uint16_t f3_tmp = htons(s->f3);
memcpy(p, &f3_tmp, sizeof(f3_tmp));
}
And deserialize it like this:
// s points to a struct which will store the deserialized data
// p points to the buffer received from the network
void deserialize(struct mystruct *s, unsigned char *p)
{
memcpy(s->f1, p, sizeof(s->f1));
p += sizeof(s->f1);
uint32_t f2_tmp;
memcpy(&f2_tmp, p, sizeof(f2_tmp));
s->f2 = ntohl(f2_tmp);
p += sizeof(s->f2);
uint16_t f3_tmp;
memcpy(&f3_tmp, p, sizeof(f3_tmp));
s->f3 = ntohs(f3_tmp);
}
While you could use compiler specific flags to pack the struct so that it has a known size, allowing you to memcpy the whole struct and just convert the integer fields, doing so means that certain fields may not be aligned properly which can be a problem on some architectures. The above will work regardless of the overall size of the struct.
You mention one problem with struct fields. Transmitting structs also requires taking care of the alignment of fields (which causes gaps between fields), typically controlled with compiler flags or pragmas.
For binary data one can use Abstract Syntax Notation One (ASN.1), where you define the data format. There are some alternatives, like Protocol Buffers.
In C one can use macros to determine endianness and field offsets inside a struct, and hence use such a struct description as the basis for a generic bytes-to-struct conversion. This would work independent of endianness and alignment.
You would need to create such a descriptor for every struct.
Alternatively a parser might generate code for bytes-to-struct conversion.
But then again you could use a language-neutral solution like ASN.1.
C and C++ of course have no introspection/reflection capabilities like Java has, so those are the only solutions.
The fastest and most portable way is to use bit shifts.
These have the big advantage that you only need to know the network endianness, never the CPU endianness.
Example:
uint8_t buf[4] = { MS_BYTE, ... LS_BYTE}; // some buffer from TCP/IP = Big Endian
uint32_t my_u32 = ((uint32_t)buf[0] << 24) |
((uint32_t)buf[1] << 16) |
((uint32_t)buf[2] << 8) |
((uint32_t)buf[3] << 0) ;
Do not use (bit-field) structs/type punning directly on the input. They are poorly standardized, may involve padding/alignment requirements, and depend on endianness. It is fine to use structs if you have proper serialization/deserialization routines in between; a deserialization routine may contain the above bit shifts, for example.
Do not use pointer arithmetic to iterate across the input, or plain memcpy(). Neither of these solves the endianness issue.
Do not use htons etc bloat libs. Because they are non-portable. But more importantly because anyone who can't write a simple bit shift like above without having some lib function holding their hand, should probably stick to writing high level code in a more family-friendly programming language.
There is no point in writing code in C if you don't have a clue about how to do efficient, close to the hardware programming, also known as the very reason you picked C for the task to begin with.
EDIT
Helping hand for people who are confused over how C code gets translated to asm: https://godbolt.org/z/TT1MP7oc4. As we can see, the machine code is identical on x86 Linux. The htonl won't compile on a number of embedded targets, nor on MSVC, while leading to worse performance on Mips64.

Can we typecast buffer into C++ structure on client when server is sending data as c structure?

I have server and client processes written in C, named NetworkServer.c and NetworkClient.c, which communicate using Linux sockets. When the client sends a request as below to get ethernet statistics,
// rxbuf - character array of 128K
// ETHERNET_DIAGNOSTIC_INFO - structure typedefed
recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0)
the server fills the data into rxbuf (as ETHERNET_DIAGNOSTIC_INFO, because the server also uses the same copy of the header file where this structure is defined) and sends the data. Once the client receives it, it typecasts as below to get the data.
ETHERNET_DIAGNOSTIC_INFO *info = (ETHERNET_DIAGNOSTIC_INFO *) rxbuf;
the structure is defined in NetworkDiag.h as below.
#ifdef __cplusplus
extern "C" {
#endif
typedef struct ETHERNET_DIAGNOSTIC_INFO
{
uint32_t cmdId;
unsigned long RxCount[MAX_SAMPLES];
unsigned long TxCount[MAX_SAMPLES];
time_t TimeStamp[MAX_SAMPLES] ;
char LanIpAddress[20];
char LanMacAddress[20];
char WanIpAddress[20];
char LanDefaultGateway[20];
char LanSubnetMask[20];
char LanLease[5000];
}ETHERNET_DIAGNOSTIC_INFO;
#ifdef __cplusplus
}
#endif
This is working fine.
Now there is a requirement that I need to create a C++ file which should work as the client (I removed the C client file; the server remains a C file). I defined a header file with the structure definition as below.
struct ETHERNET_DIAGNOSTIC_INFO
{
uint32_t cmdId;
unsigned long RxCount[MAX_SAMPLES];
unsigned long TxCount[MAX_SAMPLES];
time_t TimeStamp[MAX_SAMPLES] ;
char LanIpAddress[20];
char LanMacAddress[20];
char WanIpAddress[20];
char LanDefaultGateway[20];
char LanSubnetMask[20];
char LanLease[5000];
};
Basically I removed the C++ guard and the typedef, and I am using the code below in client.cpp to get the result from the server.
if(recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0) > 0)
{
ETHERNET_DIAGNOSTIC_INFO *info = reinterpret_cast<ETHERNET_DIAGNOSTIC_INFO *> (rxbuf);
}
I am not getting the correct results. The values in the structure are misplaced (some values are correct, but many are misplaced). I tried a C-style cast as well, but no use.
I suspect we cannot typecast the buffer into a C++ structure on the client when the server is sending the data as a C structure. Is that correct? Can anyone please let me know how to solve this issue?
There are multiple problems with this approach:
Endianness might be different between server and client machine
You then need to deserialize numbers and time_t values.
Structure packing might be different between the code compiled for the server (C) and for the client (C++)
You then need to use a protocol to send data, like binary ASN, protobuf or many others.
if(recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0) > 0)
there is no guarantee recv will read exactly sizeof(ETHERNET_DIAGNOSTIC_INFO) bytes.
You need to wrap this in a while loop (the code is a sample and might not compile):
int left = sizeof(ETHERNET_DIAGNOSTIC_INFO);
char *ptr = rxbuf;
int rd;
while(left>0)
{
rd=recv(sockfd, ptr, left, 0);
if(rd==0)
{
if(left>0) return SOCKET_CLOSED_PREMATURELY;
else return SOCKET_DONE;
} else if(rd==-1 && errno==EAGAIN) {
//do again
continue;
} else if(rd==-1 && errno!=EAGAIN) {
return SOCKET_ERROR;
}
left = left - rd;
ptr=ptr+rd;
}
The proper way to send binary data is to use protobuf or Apache Thrift, or ASN.1, or to invent something yourself.
You can probably do it, but you are likely to run into serious issues in trying:
Different compilers and compiler settings will pack and align structures differently in order to optimise for the particular processor architecture. There is absolutely no guarantee that the members of a structure will lay out exactly next to each other unless you play with pragmas.
Different processors will use different byte orders for things like integers and floating point values. If you are going to exchange data between a client and server (or vice versa) it behooves you to explicitly define the byte order and then make both sides conform to that definition regardless of the native order.
Values like unsigned long will have different sizes based upon the processor architecture targeted by the compiler. In order to reliably exchange data, you will need to explicitly define the size of the values that will be transferred.
For these reasons, I prefer to write functions (or methods) that explicitly pack and unpack messages as they are exchanged. By doing so, you will be subject to far fewer seemingly mysterious errors.
A number of possible explanations spring to mind:
Different packing of ETHERNET_DIAGNOSTIC_INFO between a C struct and a C++ struct.
(less likely) Different alignments of rxbuf (you don't show where this pointer comes from). There are no guarantees in C or C++ that reading an int or long that does not lie on a natural boundary (e.g. 4-byte aligned) yields correct results.
That your C and C++ compilers are compiling against different ABIs (e.g. 32-bit and 64-bit respectively). Note that sizeof(time_t) == 4 on a 32-bit platform and 8 on many 64-bit platforms.
All of these issues point in the same direction: Mapping a struct onto a wire data layout like this is really non-portable and problematic.
If you really insist on doing it you'll need to do the following:
Use #pragma pack directives (or, with GCC/Clang, the __attribute__((__packed__)) extension). Even then, you can get surprises.
Decide which byte ordering you intend to use and byte-swap all multi-byte values with htons() and friends. The convention is for multi-byte quantities to be big endian over TCP/IP.
Ensure the buffer you call recv() with is aligned - probably to a 4-byte boundary.
A more robust approach is to read the input buffer as a stream of bytes, reconstructing any multi-byte fields as required.
Yes, you can as the buffer is just the byte representation of the struct sent by the other side. After you have handled the byte order, you can just cast the buffer pointer to a pointer of the type of your struct.
In C++ you can write for example ETHERNET_DIAGNOSTIC_INFO* NewPtr = reinterpret_cast<ETHERNET_DIAGNOSTIC_INFO*>(buffer);
This will do what you want on any reasonably modern C++ compiler. However, depending on your compiler, the error might arise from padding of the data.
If you define bit fields and pack the struct on both sides you will be fine though. Ask if you need help, but google is your friend.
I doubt that we can not typecast buffer into C++ structure on client when server is sending data as c structure. Is it correct?
EDIT:
You can cast any binary data generated by any programming language into a readable piece of code in your program. After all, it is all about bits and bytes. So you can cast any data from any program to any data in any other program. Could you quickly print the sizeof(ETHERNET_DIAGNOSTIC_INFO) on both sides and see if they match?

Can I make a single binary write for a C++ struct which contains a vector

I am trying to build and write a binary request and have an "is this possible" type question. It might be important to mention that the recipient of the request is not aware of the data structure I have included below; it's just expecting a sequence of bytes. But using a struct seemed like a handy way to prepare the pieces of the request, then write them easily.
Writing the header and footer is fine as they are fixed size, but I'm running into problems with the struct Details because of the vector. For now I'm writing to a file so I can check the request is to spec, but the intention is to eventually write to a PLC using a Boost.Asio serial port.
I can use syntax like so to write a struct, but that writes pointer addresses rather than values when it gets to the vector
myFile.write((char*) &myDataRequest, drSize);
I can use this syntax to write a vector by itself, but I must include the indexer at 0 to write the values
myFile.write((char*) &myVector[0], vectorSize);
Is there an elegant way to binary-write a struct containing a vector (or other suitable collection), doing it in one go? Say, for example, if I declared the vector differently; or am I resigned to making multiple writes for the content inside the struct? If I replace the vector with an array I can send the struct in one go (without needing to include any indexer), but I don't know the required size until run time, so I don't think it is suitable.
My Struct
struct Header
{ ... };
struct DataRequest
{
short numAddresses; // Number of operands to be read Bytes 0-1
unsigned char operandType; // Byte 2
unsigned char Reserved1; // Should be 0xFF Byte 3
std::vector<short> addressList; // either a starting address (for a sequence), or a list of addresses (for non-sequential)
};
struct Details
{
std::vector<DataRequest> DRList;
};
struct Footer
{ ... };
It's not possible, because the std::vector object doesn't actually contain an array but rather a pointer to a heap-allocated block of memory. However, I'm tempted to claim that being able to write a raw struct like that is not desirable anyway:
By treating a struct as a block of memory you may end up sending padding bytes, which is not desirable.
Depending on what you write to, you may find that writes are buffered anyway, so multiple write calls aren't actually less efficient.
Chances are that you want to do something with the fields being sent over, in particular with the numeric values. This requires enforcing a byte order which both sides of the transmission agree on. To be portable, you should explicitly convert the byte order where required.
To make a long story short: I suspect writing out each field one by one is not less efficient, it also is more correct.
This is not really a good strategy: even if you could do this, you would be copying memory content directly to a file. If you change the architecture/processor, your client will get different data. If instead you write a method that takes your struct and a filename, writes the struct's values individually, and iterates over the vector writing out its content, you will have full control over the binary format your client expects and won't depend on the compiler's current memory representation.
If you want convenience for marshalling/unmarshalling you should take a look at the boost::serialization library. They do offer a binary archive (besides text and xml) but it has its own format (e.g. it has a version number, which serialization lib was used to dump the data) so it is probably not what your client wants.
What exactly is the format expected at the other end? You have to write that, period. You can't just write any random bytes. The probability that just writing an std::vector like you're doing will work is about as close to 0 as you can get. But the probability that writing a struct with only int will work is still less than 50%. If the other side is expecting a specific sequence of bytes, then you have to write that sequence, byte by byte. To write an int, for example, you must still write four (or whatever the protocol requires) bytes, something like:
byte[0] = (value >> 24) & 0xFF;
byte[1] = (value >> 16) & 0xFF;
byte[2] = (value >> 8) & 0xFF;
byte[3] = (value ) & 0xFF;
(Even here, I'm supposing that your internal representation of negative numbers corresponds to that of the protocol. Usually the case, but not always.)
Typically, of course, you build your buffer in a std::vector<char>, and then write &buffer[0], buffer.size(). (The fact that you need a reinterpret_cast for the buffer pointer should signal that your approach is wrong.)

pack/unpack functions for C++

NOTE: I know that this has been asked many times before, but none of the questions have had a link to a concrete, portable, maintained library for this.
I need a C or C++ library that implements Python/Ruby/Perl like pack/unpack functions. Does such a library exist?
EDIT: Because the data I am sending is simple, I have decided to just use memcpy, pointers, and the hton* functions. Do I need to manipulate a char in any way to send it over the network in a platform agnostic manner? (the char is only used as a byte, not as a character).
In C/C++ usually you would just write a struct with the various members in the correct order (correct packing may require compiler-specific pragmas) and dump/read it to/from file with a raw fwrite/fread (or read/write when dealing with C++ streams). Actually, pack and unpack were born to read stuff generated with this method.
If you instead need the result in a buffer instead of a file it's even easier, just copy the structure to your buffer with a memcpy.
If the representation must be portable, your main concerns are byte ordering and field packing; the first problem can be solved with the various hton* functions, and the second with compiler-specific directives.
In particular, many compilers support the #pragma pack directive (see here for VC++, here for gcc), that allows you to manage the (unwanted) padding that the compiler may insert in the struct to have its fields aligned on convenient boundaries.
Keep in mind, however, that on some architectures it's not allowed to access fields of particular types if they are not aligned on their natural boundaries, so in these cases you would probably need to do some manual memcpys to copy the raw bytes to variables that are properly aligned.
Why not boost serialization or protocol buffers?
Yes: Use std::copy from <algorithm> to operate on the byte representation of a variable. Every variable T x; can be accessed as a byte array via char * p = reinterpret_cast<char*>(&x); and p can be treated like a pointer to the first element of an array char[sizeof(T)]. For example:
char buf[100];
double q = get_value();
char const * const p = reinterpret_cast<char const *>(&q);
std::copy(p, p + sizeof(double), buf);
// more stuff like that
some_stream.write(buf, sizeof(double)); //... etc.
And to go back:
double r;
std::copy(data, data + sizeof(double), reinterpret_cast<char *>(&r));
In short, you don't need a dedicated pack/unpack in C++, because the language already allows you access to its variables' binary representation as a standard part of the language.

Reading Superblock into a C Structure

I have a disk image which contains a standard image using fuse. The Superblock contains the following, and I have a function read_superblock(*buf) that returns the following raw data:
Bytes 0-3: Magic Number (0xC0000112)
4-7: Block Size (1024)
8-11: Total file system size (in blocks)
12-15: FAT length (in blocks)
16-19: Root Directory (block number)
20-1023: NOT USED
I am very new to C and to get me started on this project I am curious what is a simple way to read this into a structure or some variables and simply print them out to the screen using printf for debugging.
I was initially thinking of doing something like the following, thinking I could see the raw data, but that is not the case: there is no structure, and I am trying to read it in as a string, which also seems terribly wrong. Is there a way for me to specify the structure and define the number of bytes in each variable?
char buf[1024]; /* superblock is 1024 bytes */
read_superblock(buf);
printf("%s", buf); /* wrong: buf holds raw binary data, not a string */
Yes, I think you'd be better off reading this into a structure. The fields containing useful data are all 32-bit integers, so you could define a structure that looks like this (using the types defined in the standard header file stdint.h):
typedef struct SuperBlock_Struct {
uint32_t magic_number;
uint32_t block_size;
uint32_t fs_size;
uint32_t fat_length;
uint32_t root_dir;
} SuperBlock_t;
You can cast a pointer to the structure to char* when calling read_superblock, like this:
SuperBlock_t sb;
read_superblock((char*) &sb);
Now to print out your data, you can make a call like the following:
printf("%u %u %u %u %u\n",
sb.magic_number,
sb.block_size,
sb.fs_size,
sb.fat_length,
sb.root_dir);
Note that you need to be aware of your platform's endianness when using a technique like this, since you're reading integer data (i.e., you may need to swap bytes when reading your data). You should be able to determine that quickly using the magic number in the first field.
Note that it's usually preferable to pass a structure like this without casting it; this allows you to take advantage of the compiler's type-checking and eliminates potential problems that casting may hide. However, that would entail changing your implementation of read_superblock to read data directly into a structure. This is not difficult and can be done using the standard C runtime function fread (assuming your data is in a file, as hinted at in your question), like so:
fread(&sb.magic_number, sizeof(sb.magic_number), 1, fp);
fread(&sb.block_size, sizeof(sb.block_size), 1, fp);
...
Two things to add here:
It's a good idea, when pulling raw data into a struct, to set the struct to have no padding, even if it's entirely composed of 32-bit unsigned integers. In gcc you do this with #pragma pack(1) before the struct definition and #pragma pack() after it.
For dealing with potential endianness issues, two calls to look at are ntohs() and ntohl(), for 16- and 32-bit values respectively. Note that these swap from network byte order to host byte order; if these are the same (which they aren't on x86-based platforms), they do nothing. You go from host to network byte order with htons() and htonl(). However, since this data is coming from your filesystem and not the network, I don't know if endianness is an issue. It should be easy enough to figure out by comparing the values you expect (e.g. the block size) with the values you get, in hex.
It's not difficult to print the data once you have successfully copied it into the structure Emerick proposed. Suppose the instance of the structure you use to hold the data is named SuperBlock_t_Instance.
Then you can print its fields like this:
printf("Magic Number:\t%u\nBlock Size:\t%u\n etc",
SuperBlock_t_Instance.magic_number,
SuperBlock_t_Instance.block_size);