c++ best way to compare byte array to struct - c++

I need help. I have an unsigned char * and say I have a struct
struct{
int a=3;
char b='d';
double c=3.14;
char d='e';
} cmp;
unsigned char input[1000];
l= recv(sockfd,input , sizeof(cmp),0);
I want to compare cmp and input. What is the fastest way?
Thanks a lot in advance.

If the compiler guarantees that there are no gaps between fields in the struct (gaps usually appear because of alignment padding), or you can use a #pragma to remove any such gaps, then you can compare by either:
memcmp(&cmp, input, sizeof(struct TheStruct));
Or, my preferred:
cmp == *(struct TheStruct *)input // provided the struct defines operator== and doesn't contain pointers.
But a much safer way is to compare field by field. Better still, write dedicated functions for extracting ints, floats, etc. from the raw input. For example, extracting an int at index n may be as simple as
*(int *)&input[n]
But it might be more complicated, such as assembling the value from individual chars shifted by 8, 16, and 24 bits.
In short, communication data must be accessed in the most robust way possible: check every basic element and assume nothing.
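A minimal sketch of such extraction helpers. It assumes the sender transmits a 32-bit integer in little-endian order; that byte order and the names read_u32_le and read_int_native are assumptions for illustration, not something given in the question. The shift version does not depend on the host's byte order, and std::memcpy avoids the alignment trouble a pointer cast can run into:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit unsigned integer stored in little-endian order at input[n].
// Works regardless of the host's own byte order or alignment rules.
// (Assumption: the wire format is little-endian.)
inline std::uint32_t read_u32_le(const unsigned char* input, std::size_t n) {
    return  static_cast<std::uint32_t>(input[n])
         | (static_cast<std::uint32_t>(input[n + 1]) << 8)
         | (static_cast<std::uint32_t>(input[n + 2]) << 16)
         | (static_cast<std::uint32_t>(input[n + 3]) << 24);
}

// Reads an int in the host's native representation (same-machine case only).
// memcpy is safe even when input + n is not suitably aligned for an int.
inline int read_int_native(const unsigned char* input, std::size_t n) {
    int value;
    std::memcpy(&value, input + n, sizeof value);
    return value;
}
```

The caller is still responsible for checking that at least n + 4 (or n + sizeof(int)) bytes were actually received.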

Give reinterpret_cast a try. This will allow you to arbitrarily cast the char * to a cmp *
http://msdn.microsoft.com/en-us/library/e0w9f63b.aspx

In the general case James Kantzes comment is correct: you can't compare like that. This is, among other things, due to byte padding.
However, a direct comparison can work in the specific case where all of the following assumptions hold:
The sender is on the same cpu architecture as the receiver
The sender is using the same compiler and linker as the receiver
The applications are compiled with the same compiler/linker flags
...other things...you get the gist.
The sender is sending it straight from the struct.
cmp c{ ...set variables... };
send(sockfd, (char*)&c, sizeof(c), 0);
So in short, this is a very brittle way of transporting structs and you shouldn't do it for anything except simple tests or quick hacks.
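For the quick-hack case only, here is a sketch of how the comparison could look when every assumption above holds. The struct name Msg and the function name matches are hypothetical. The received bytes are copied into a real struct and compared member by member, so indeterminate padding bytes never take part in the comparison:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical named version of the struct from the question.
struct Msg {
    int a;
    char b;
    double c;
    char d;
};

// Returns true if the first sizeof(Msg) bytes of input hold a Msg equal to
// expected. Only meaningful when sender and receiver share the same layout.
inline bool matches(const Msg& expected, const unsigned char* input, std::size_t len) {
    if (len < sizeof(Msg)) return false;             // short read: nothing to compare
    Msg received;
    std::memcpy(&received, input, sizeof received);  // no aliasing or alignment issues
    return received.a == expected.a && received.b == expected.b
        && received.c == expected.c && received.d == expected.d;
}
```

Here len would be the value returned by recv, which can legitimately be smaller than sizeof(Msg).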

Related

C++ casting a struct to std::vector<char> memory alignment

I'm trying to cast a struct into a char vector.
I want to send my struct, cast into a std::vector, through a UDP socket and cast it back on the other side. Here is my struct with the PACK attribute.
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )
PACK(struct Inputs
{
uint8_t structureHeader;
int16_t x;
int16_t y;
Key inputs[8];
});
Here is test code:
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
//send and receive via udp
Inputs* my_struct = reinterpret_cast<Inputs*>(&buffer[0]);
The issue is:
Everything works fine except my uint8_t or int8_t members.
I don't know why, but whenever and wherever I put a 1-byte value in the struct,
when I cast it back the value is not readable (but the others are).
I tried using only 16-bit values and it works just fine, even with the
maximum values, so all bits are OK.
I think this has something to do with the alignment of the bytes in memory, but I can't figure out how to make it work.
Thank you.
I'm trying to cast a struct into a char vector.
You cannot cast an arbitrary object to a vector. You can cast your object to an array of char and then copy that array into a vector (which is actually what your code is doing).
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
That second line defines a new vector and initializes it by copying the bytes that represent your object into it. This is reasonable, but it's distinct from what you said you were trying to do.
I think this is something with the alignment of the bytes in the memory
This is good intuition. If you hadn't told the compiler to pack the struct, it would have inserted padding bytes to ensure each field starts at its natural alignment. The fact that the operation isn't reversible suggests that somehow the receiving end isn't packed exactly the same way. Are you sure the receiving program has exactly the same packing directive and struct layout?
On x86, you can get by with unaligned data, but you may pay a large performance cost whenever you access an unaligned member variable. With the packing set to one, and the first field being odd-sized, you've guaranteed that the next fields will be unaligned. I'd urge you to reconsider this. Design the struct so that all the fields fall at their natural alignment boundaries and that you don't need to adjust the packing. This may make your struct a little bigger, but it will avoid all the alignment and performance problems.
If you want to omit the padding bytes in your wire format, you'll have to copy the relevant fields byte by byte into the wire format and then copy them back out on the receiving end.
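A rough sketch of that field-by-field copy. It assumes that Key fits in one byte and that 16-bit fields travel in big-endian order; both are assumptions, since the question shows neither the Key type nor a wire format. PlainInputs, serialize and deserialize are hypothetical names:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacked in-memory struct; the compiler may pad it however it likes.
// (Assumption: each Key is represented here as a single byte.)
struct PlainInputs {
    std::uint8_t structureHeader;
    std::int16_t x;
    std::int16_t y;
    std::array<std::uint8_t, 8> inputs;
};

constexpr std::size_t kWireSize = 1 + 2 + 2 + 8;  // no padding on the wire

inline void put_i16(std::vector<std::uint8_t>& out, std::int16_t v) {
    const auto u = static_cast<std::uint16_t>(v);
    out.push_back(static_cast<std::uint8_t>(u >> 8));    // high byte first
    out.push_back(static_cast<std::uint8_t>(u & 0xFF));
}

inline std::int16_t get_i16(const std::uint8_t* p) {
    const auto u = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    return static_cast<std::int16_t>(u);  // relies on two's complement
}

inline std::vector<std::uint8_t> serialize(const PlainInputs& in) {
    std::vector<std::uint8_t> out;
    out.reserve(kWireSize);
    out.push_back(in.structureHeader);
    put_i16(out, in.x);
    put_i16(out, in.y);
    out.insert(out.end(), in.inputs.begin(), in.inputs.end());
    return out;
}

// Returns false if the buffer is too short to hold one record.
inline bool deserialize(const std::vector<std::uint8_t>& buf, PlainInputs& out) {
    if (buf.size() < kWireSize) return false;
    out.structureHeader = buf[0];
    out.x = get_i16(&buf[1]);
    out.y = get_i16(&buf[3]);
    for (std::size_t i = 0; i < out.inputs.size(); ++i) out.inputs[i] = buf[5 + i];
    return true;
}
```

With this approach neither side needs a packing directive, and the wire layout no longer depends on the compiler.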
An aside regarding:
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )
Identifiers that begin with underscore and a capital letter or with two underscores are reserved for "the implementation," so you probably shouldn't use __Declaration__ as the macro's parameter name. ("The implementation" refers to the compiler, the standard library, and any other runtime bits the compiler requires.)
1. The vector class has dynamically allocated memory and uses pointers internally, so you can't send the vector object itself (but you can send the underlying array).
2. SFML has a great class for doing this called sf::Packet. It's free, open source, and cross-platform.
I was recently working on a personal cross-platform socket library for use in other personal projects, and I eventually dropped it in favor of SFML. There's just TOO much to test; I was spending all my time making sure stuff worked and not getting any work done on the actual projects I wanted to do.
3. memcpy is your best friend. It is designed to be portable, and you can use that to your advantage.
You can use it to debug: memcpy the thing you want to see into a char array and check that it matches what you expect.
4. To save yourself from having to do tons of robustness testing, limit yourself to only chars, 32-bit integers, and 64-bit doubles. Are you using different compilers? Struct packing is compiler and architecture dependent. If you have to use a packed struct, you need to guarantee that the packing works as expected on all platforms you will be using, and that all platforms have the same endianness. Obviously, that's what you're having trouble with, and I'm sorry I can't help you more with that. I would recommend regular serialization and would definitely avoid struct packing if I were trying to make portable sockets.
If you can make those guarantees that I mentioned, sending is really easy on LINUX.
// POSIX
void send(int fd, Inputs& input)
{
int error = sendto(fd, &input, sizeof(input), ..., ..., ...);
...
}
winsock2 uses a char* instead of a void* :(
void send(int fd, Inputs& input)
{
char buf[sizeof(input)];
memcpy(buf, &input, sizeof(input));
int error = sendto(fd, buf, sizeof(input), ..., ..., ...);
...
}
Did you try the simplest approach:
unsigned char *pBuff = (unsigned char*)&in;
for (unsigned int i = 0; i < sizeof(Inputs); i++) {
vecBuffer.push_back(*pBuff);
pBuff++;
}
This works for both packed and unpacked structs, since you iterate over sizeof(Inputs) bytes.

How to read and write data in 8 bit integers unit form by c++ file functions

Is it possible to store data as integers from 0 to 255 rather than as 8-bit characters? Although both are the same thing, how can we do it, for example, with the write() function?
Is it ok to directly cast any integer to char and vice versa? Does something like
{
int a[1]=213;
write((char*)a,1);
}
and
{
int a[1];
read((char*)a,1);
cout<<a;
}
work to get 213 from the same location in the file? It may work on that computer but is it portable, in other words, is it suitable for cross-platform projects in that way? If I create a file format for each game level(which will store objects' coordinates in the current level's file) using this principle, will it work on other computers/systems/platforms in order to have loaded same level?
The code you show would write the first (lowest-address) byte of a[0]'s object representation, which may or may not be the byte with the value 213. The particular object representation of an int is implementation-defined.
The portable way of writing one byte with the value of 213 would be
unsigned char c = a[0];
write(&c, 1);
You have the right idea, but it could use a bit of refinement.
{
int intToWrite = 213;
unsigned char byteToWrite = 0;
if ( intToWrite > 255 || intToWrite < 0 )
{
doError();
return;
}
// since your range is 0-255, you really want the low order byte of the int.
// Just reading the 1st byte may or may not work for your architecture. I
// prefer to let the compiler handle the conversion via casting.
byteToWrite = (unsigned char) intToWrite;
write( &byteToWrite, sizeof(byteToWrite) );
// you can hard code the size, but I try to be in the habit of using sizeof
// since it is better when dealing with multibyte types
}
{
int a = 0;
unsigned char toRead = 0;
// just like the write, the byte ordering of the int will depend on your
// architecture. You could write code to explicitly handle this, but it's
// easier to let the compiler figure it out via implicit conversions
read( &toRead, sizeof(toRead) );
a = toRead;
cout<<a;
}
If you need to minimize space or otherwise can't afford the extra char sitting around, then it's definitely possible to read/write a particular location in your integer. However, that may require pulling in additional headers (e.g. to use htons/ntohs) or resorting to annoying platform-specific #defines.
It will work, with some caveats:
Use reinterpret_cast<char*>(x) instead of (char*)x to be explicit that you’re performing a cast that’s ordinarily unsafe.
sizeof(int) varies between platforms, so you may wish to use a fixed-size integer type from <cstdint> such as int32_t.
Endianness can also differ between platforms, so you should be aware of the platform byte order and swap byte orders to a consistent format when writing the file. You can detect endianness at runtime and swap bytes manually, or use htonl and ntohl to convert between host and network (big-endian) byte order.
Also, as a practical matter, I recommend you prefer text-based formats—they’re less compact, but far easier to debug when things go wrong, since you can examine them in any text editor. If you determine that loading and parsing these files is too slow, then consider moving to a binary format.
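If a binary format is chosen, the fixed-size and fixed-byte-order advice above could be sketched roughly like this. Big-endian is picked arbitrarily as the file byte order, and the helper names write_u32_be and read_u32_be are hypothetical. The shifts produce the same bytes on every platform, so no endianness detection is needed:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage described below

// Writes value as four bytes, most significant first (big-endian),
// independent of the host's byte order.
inline void write_u32_be(std::ostream& out, std::uint32_t value) {
    const char bytes[4] = {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF)};
    out.write(bytes, 4);
}

// Reads four big-endian bytes; returns false if the stream ran short.
inline bool read_u32_be(std::istream& in, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), 4)) return false;
    value = (static_cast<std::uint32_t>(bytes[0]) << 24)
          | (static_cast<std::uint32_t>(bytes[1]) << 16)
          | (static_cast<std::uint32_t>(bytes[2]) << 8)
          |  static_cast<std::uint32_t>(bytes[3]);
    return true;
}
```

The same functions work with a std::stringstream for testing or a std::fstream opened with std::ios::binary for real level files.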

Reversibly Combining Two uint32_t's Without Changing Datatype

Here's my issue: I need to pass back two uint32_t's via a single uint32_t (because of how the API is set up...). I can hard code whatever other values I need to reverse the operation, but the parameter passed between functions needs to stay a single uint32_t.
This would be trivial if I could just bit-shift the two 32-bit ints into a single 64-bit int (like what was explained here), but the compiler wouldn't like that. I've also seen mathematical pairing functions, but I'm not sure if that's what I need in this case.
I've thought of setting up a simple cipher: the uint32_t could be the cipher text, and I could just hard code the key. This is an example, but that seems like overkill.
Is this even possible?
It is not possible to store more than 32 bits of information using only 32 bits. This is a basic result of information theory.
If you know that you're only using the low-order 16 bits of each value, you could shift one left 16 bits and combine them that way. But there's absolutely no way to get 64 bits worth of information (or even 33 bits) into 32 bits, period.
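A small sketch of the 16-bit case, valid only if both values are known to fit in 16 bits; the function names are hypothetical:

```cpp
#include <cstdint>

// Packs two values that are known to fit in 16 bits into one uint32_t.
// Any bits above the low 16 of either input are discarded, so the
// operation is only reversible for inputs in the range 0..65535.
inline std::uint32_t pack16(std::uint32_t high, std::uint32_t low) {
    return ((high & 0xFFFFu) << 16) | (low & 0xFFFFu);
}

// Recover the two halves.
inline std::uint32_t unpack_high(std::uint32_t packed) { return packed >> 16; }
inline std::uint32_t unpack_low(std::uint32_t packed) { return packed & 0xFFFFu; }
```

If either input can exceed 65535, information is lost and one of the lookup-based approaches is needed instead.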
Depending on how much trouble this is really worth, you could:
create a global array or vector of std::pair<uint32_t,uint32_t>
pass an index into the function, then your "reverse" function just looks up the result in the array.
write some code to decide which index to use when you have a pair to pass. The index needs to not be in use by anyone else, and since the array is global there may be thread-safety issues. Essentially what you are writing is a simple memory allocator.
As a special case, on a machine with 32 bit data pointers you could allocate the struct and reinterpret_cast the pointer to and from uint32_t. So you don't need any globals.
Beware that you need to know whether or not the function you pass the value into might store the value somewhere to be "decoded" later, in which case you have a more difficult resource-management problem than if the function is certain to have finished using it by the time it returns.
In the easy case, and if the code you're writing doesn't need to be re-entrant at all, then you only need to use one index at a time. That means you don't need an array, just one pair. You could pass 0 to the function regardless of the values, and have the decoder ignore its input and look in the global location.
If both special cases apply (32 bit and no retaining of the value), then you can put the pair on the stack, and use no globals and no dynamic allocation even if your code does need to be re-entrant.
None of this is really recommended, but it could solve the problem you have.
You can use an intermediate global data structure to store the pair of uint32_t on it, using your only uint32_t parameter as the index on the structure:
struct my_pair {
    uint32_t a, b;
};
std::map<uint32_t, my_pair> global_pair_map;
uint32_t register_new_pair(uint32_t a, uint32_t b) {
    // Add the pair (a, b) to global_pair_map under a new key, and return the new key.
    // Picking one past the largest key in use is just one possible scheme.
    uint32_t key = global_pair_map.empty() ? 0 : global_pair_map.rbegin()->first + 1;
    global_pair_map[key] = my_pair{a, b};
    return key;
}
void release_pair(uint32_t key) {
    // Remove the key from global_pair_map.
    global_pair_map.erase(key);
}
void callback(uint32_t user_data) {
    my_pair& p = global_pair_map[user_data];
    // Use your pair of uint32_t with p.a and p.b.
}
int main() {
    uint32_t key = register_new_pair(number1, number2);
    register_callback(callback, key);
}

C++: the fastest way to access specific octet of int

Assuming we have 32bit integer, 8bit char, gcc compiler and Intel architecture:
What would be the fastest way (with no assembler usage) to extract, say, third octet of integer variable? To store it to a char of some specific place of char[] for example?
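For the 3rd octet, counting from the least significant one (a shift operates on the value, so it does not depend on endianness):
unsigned int i = 0xdeadbeef;
unsigned char c = (unsigned char) (i >> 16); // c = 0xad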
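Use a union (reading a member other than the one last written is formally undefined in C++, although gcc documents it as supported; which member holds the 3rd octet depends on endianness):
union myCharredInt
{
    int myInt;
    struct {
        char char1;
        char char2;
        char char3;
        char char4;
    };
};
myCharredInt a;
a.myInt = 5;
char c = a.char3;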
shift the octet to the least significant octet and store it
somewhat like this but it depends exactly what you mean by 3rd octet, as the majority of my experience has been in big-endian architecture
char *ptr;
....
*ptr = val >> 8;
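To sidestep the ambiguity about what "3rd octet" means, a shift-based helper can take the octet index explicitly. This is only a sketch; the names octet and store_octet are hypothetical, and n = 0 is taken to mean the least significant octet:

```cpp
#include <cstddef>
#include <cstdint>

// Returns octet n of value, where n = 0 is the least significant octet.
// n must be in the range 0..3 for a 32-bit value.
inline unsigned char octet(std::uint32_t value, unsigned n) {
    return static_cast<unsigned char>((value >> (8 * n)) & 0xFFu);
}

// Stores octet n of value at position pos in buffer.
// The caller guarantees that pos is within the buffer.
inline void store_octet(unsigned char* buffer, std::size_t pos,
                        std::uint32_t value, unsigned n) {
    buffer[pos] = octet(value, n);
}
```

Compilers typically reduce this to a shift and a byte store, so it is a reasonable baseline to profile the other variants against.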
Whenever you are looking for the "fastest" or "best" way to do something in very particular circumstances, the answer almost always will be: experiment, and find out.
While there are rules of thumb to follow, they will not conclusively give you the best answer for your particular system, architecture, compiler, etc.
You will notice there are a few different answers to your question already, using different techniques.
How will you know which is best?
Answer: Try them out. Profile them.
N.b.: I'm being a little facetious. I suspect what you really want to know is how to do this at all, and not how to do it fastest.

A Better Way To Build a Packet - Byte by Byte?

This is related to my question asked here today on SO. Is there a better way to build a packet to send over serial rather than doing this:
unsigned char buff[255];
buff[0] = 0x02;
buff[1] = 0x01;
buff[2] = 0x03;
WriteFile(.., buff,3, &dwBytesWrite,..);
Note: I have about twenty commands to send, so if there was a better way to send these bytes to the serial device in a more concise manner rather than having to specify each byte, it would be great. Each byte is hexadecimal, with the last byte being the checksum. I should clarify that I know I will have to specify each byte to build the commands, but is there a better way than having to specify each array position?
You can initialize static buffers like so:
const unsigned char command[] = {0x13, 0x37, 0xf0, 0x0d};
You could even use these to initialize non-const buffers and then replace only changing bytes by index.
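A possible sketch of that template-and-patch idea. The name build_command and its parameters are hypothetical; the byte values are the ones from the line above:

```cpp
#include <cstddef>
#include <cstring>

// Constant command template (byte values taken from the answer above).
const unsigned char kCommandTemplate[] = {0x13, 0x37, 0xf0, 0x0d};

// Copies the template into out and overwrites the single byte that changes.
// Returns the number of bytes to send, or 0 if out is too small or index
// is outside the command.
inline std::size_t build_command(unsigned char* out, std::size_t out_size,
                                 std::size_t index, unsigned char value) {
    const std::size_t n = sizeof kCommandTemplate;
    if (out_size < n || index >= n) return 0;
    std::memcpy(out, kCommandTemplate, n);
    out[index] = value;
    return n;
}
```

The returned length can then be passed on as the byte count for WriteFile. Any checksum byte would have to be recomputed after patching, which this sketch does not attempt.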
Not sure what you're asking. If you are asking about the problem of setting the bytes one by one and messing up the data, this is usually done with a packed struct whose members have meaningful names. Like:
#pragma pack(push)
#pragma pack(1)
struct FooHeader {
uint someField;
byte someFlag;
dword someStatus;
};
#pragma pack(pop)
FooHeader hdr;
hdr.someField = 2;
hdr.someFlag = 3;
hdr.someStatus = 4;
WriteFile(..., &hdr, sizeof(hdr), ...);
Is there a better way to build a packet than assembling it byte by byte?
Yes, but it will require some thought and some careful engineering. Many of the other answers tell you other mechanisms by which you can put together a sequence of bytes in C++. But I suggest you design an abstraction that represents a part of a packet:
class PacketField {
public:
    virtual ~PacketField() = default;
    virtual void add_to_packet(Packet& p) const = 0;
};
Then you can define various subclasses:
Add a single byte to the packet
Add a 16-bit integer in big-endian order. Another for little-endian. Other widths besides 16.
Add a string to the packet; code the string by inserting the length and then the bytes.
You also can define a higher-order version:
PacketField sequence(PacketField first, PacketField second);
Returns a field that consists of the two arguments in sequence. If you like operator overloading you could overload this as + or <<.
Your underlying Packet abstraction will just be an extensible sequence of bytes (dynamic array) with some kind of write method.
If you wind up programming a lot of network protocols, you'll find this sort of design pays off big time.
Edit: The point of the PacketField class is composability and reuse:
By composing packet fields you can create more complex packet fields. For example, you could define "add a TCP header" as a function from PacketFields to PacketFields.
With luck you build up a library of PacketFields that are specific to your application or protocol family or whatever. Then you reuse the fields in the library.
You can create subclasses of PacketField that take extra parameters.
It's quite possible that you can do something equally nice without this extra level of indirection; I'm recommending it because I've seen it used effectively in other applications. You are decoupling the knowledge of how to build a packet (which can be applied to any packet, any time) from the act of actually building a particular packet. Separating concerns like this can help reuse.
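To make the idea concrete, here is one way the abstraction might be sketched. The concrete class names (ByteField, U16BigEndianField, SequenceField), the FieldPtr alias, and the choice of std::vector for Packet are all assumptions for illustration; the answer only describes the design in outline:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>;  // extensible sequence of bytes

class PacketField {
public:
    virtual ~PacketField() = default;
    virtual void add_to_packet(Packet& p) const = 0;
};

using FieldPtr = std::shared_ptr<const PacketField>;

// Adds a single byte to the packet.
class ByteField : public PacketField {
public:
    explicit ByteField(std::uint8_t b) : b_(b) {}
    void add_to_packet(Packet& p) const override { p.push_back(b_); }
private:
    std::uint8_t b_;
};

// Adds a 16-bit integer in big-endian order.
class U16BigEndianField : public PacketField {
public:
    explicit U16BigEndianField(std::uint16_t v) : v_(v) {}
    void add_to_packet(Packet& p) const override {
        p.push_back(static_cast<std::uint8_t>(v_ >> 8));
        p.push_back(static_cast<std::uint8_t>(v_ & 0xFF));
    }
private:
    std::uint16_t v_;
};

// Higher-order field: emits first, then second.
class SequenceField : public PacketField {
public:
    SequenceField(FieldPtr first, FieldPtr second)
        : first_(std::move(first)), second_(std::move(second)) {}
    void add_to_packet(Packet& p) const override {
        first_->add_to_packet(p);
        second_->add_to_packet(p);
    }
private:
    FieldPtr first_, second_;
};

inline FieldPtr sequence(FieldPtr first, FieldPtr second) {
    return std::make_shared<SequenceField>(std::move(first), std::move(second));
}
```

A three-byte command could then be described as sequence(std::make_shared<ByteField>(0x02), std::make_shared<U16BigEndianField>(0x0103)) and written into any Packet with add_to_packet.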
Yes, there is a better method. Have your classes read from and write to a packed buffer. You could even implement this as an interface. Templates would help too.
An example of writing:
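#include <cstring>
#include <iostream>
template <typename Member_Type>
void Store_Value_In_Buffer(const Member_Type& member,
                           unsigned char *& p_buffer)
{
    // memcpy avoids misaligned writes through a cast pointer.
    std::memcpy(p_buffer, &member, sizeof(Member_Type));
    p_buffer += sizeof(Member_Type);
}
struct My_Class
{
    unsigned int datum;
    void store_to_buffer(unsigned char *& p_buffer)
    {
        Store_Value_In_Buffer(datum, p_buffer);
    }
};
//...
unsigned char buffer[256];
unsigned char * p_buffer(buffer);
My_Class object;
object.datum = 5;
object.store_to_buffer(p_buffer);
// p_buffer now points just past the stored data, so write from the start
// of buffer and only as many bytes as were stored.
std::cout.write(reinterpret_cast<const char *>(buffer), p_buffer - buffer);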
Part of the interface is also to query the objects for the size that they would occupy in the buffer, say a method size_in_buffer. This is left as an exercise for the reader. :-)
There is a much better way, which is using structs to set the structures. This is usually how network packets are built on a low level.
For example, say you have packets which have an id, length, flag byte, and data, you'd do something like this:
struct packet_header {
int id;
byte length;
byte flags;
};
byte *my_packet = new byte[100];
packet_header *header = (packet_header *)my_packet;
header->id = 20;
header->length = 10; // This can be set automatically by a function, maybe?
// etc.
header++; // Header now points to the data section.
Do note that you're going to have to make sure that the structures are "packed", i.e. when you write byte length, it really takes up a byte. Usually, you'd achieve this using something like #pragma pack or similar (you'll have to read about your compiler's pragma settings).
Also, note that you should probably use functions to do common operations. For example, create a function which gets as input the size, data to send, and other information, and fills out the packet header and data for you. This way, you can perform calculations about the actual size you want to write in the length field, you can calculate the CRC inside the function, etc.
Edit: This is a C-centric way of doing things, which is the style of a lot of networking code. A more C++-centric (object oriented) approach could also work, but I'm less familiar with them.
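A rough sketch of such a helper function, assuming a compiler that accepts #pragma pack(push, 1) (GCC, Clang and MSVC do). The name build_packet and its parameters are hypothetical, fixed-width types stand in for int and byte, and CRC calculation is left out because the answer does not specify one:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct packet_header {
    std::int32_t id;
    std::uint8_t length;
    std::uint8_t flags;
};
#pragma pack(pop)

static_assert(sizeof(packet_header) == 6, "packet_header must be packed");

// Fills out with the header followed by the payload.
// Returns the total number of bytes written, or 0 if the payload does not
// fit in the one-byte length field or the output buffer is too small.
inline std::size_t build_packet(unsigned char* out, std::size_t out_size,
                                std::int32_t id, std::uint8_t flags,
                                const void* data, std::size_t data_size) {
    if (data_size > 255 || out_size < sizeof(packet_header) + data_size) return 0;
    packet_header header;
    header.id = id;
    header.length = static_cast<std::uint8_t>(data_size);  // set automatically
    header.flags = flags;
    std::memcpy(out, &header, sizeof header);  // id stays in host byte order
    if (data_size != 0) std::memcpy(out + sizeof header, data, data_size);
    return sizeof header + data_size;
}
```

Because the length field is derived from data_size inside the function, callers cannot set it inconsistently with the payload.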
const char *c = "\x02\x02\x03";