Python and C++ Sockets: converting packet data

First of all, to clarify my goal: there exist two programs written in C in our laboratory. I am working on a bidirectional proxy server for them (which will also manipulate the data), and I want to write that proxy server in Python. It is important to know that I know close to nothing about these two programs; I only know the definition file of the packets.
Now: assuming a packet definition in one of the C++ programs reads like this:
unsigned char Packet[0x32]; // Packet[Length]
int z=0;
Packet[0]=0x00; // Spare
Packet[1]=0x32; // Length
Packet[2]=0x01; // Source
Packet[3]=0x02; // Destination
Packet[4]=0x01; // ID
Packet[5]=0x00; // Spare
for(z=0;z<=24;z+=8)
{
    Packet[9-z/8]=((int)(720000+armcontrolpacket->dof0_rot*1000)/(int)pow((double)2,(double)z));
    Packet[13-z/8]=((int)(720000+armcontrolpacket->dof0_speed*1000)/(int)pow((double)2,(double)z));
    Packet[17-z/8]=((int)(720000+armcontrolpacket->dof1_rot*1000)/(int)pow((double)2,(double)z));
    Packet[21-z/8]=((int)(720000+armcontrolpacket->dof1_speed*1000)/(int)pow((double)2,(double)z));
    Packet[25-z/8]=((int)(720000+armcontrolpacket->dof2_rot*1000)/(int)pow((double)2,(double)z));
    Packet[29-z/8]=((int)(720000+armcontrolpacket->dof2_speed*1000)/(int)pow((double)2,(double)z));
    Packet[33-z/8]=((int)(720000+armcontrolpacket->dof3_rot*1000)/(int)pow((double)2,(double)z));
    Packet[37-z/8]=((int)(720000+armcontrolpacket->dof3_speed*1000)/(int)pow((double)2,(double)z));
    Packet[41-z/8]=((int)(720000+armcontrolpacket->dof4_rot*1000)/(int)pow((double)2,(double)z));
    Packet[45-z/8]=((int)(720000+armcontrolpacket->dof4_speed*1000)/(int)pow((double)2,(double)z));
    Packet[49-z/8]=((int)armcontrolpacket->timestamp/(int)pow(2.0,(double)z));
}
if(SendPacket(sock,(char*)&Packet,sizeof(Packet)))
    return 1;
return 0;
What would be the easiest way to receive that data, convert it into a readable Python format, manipulate it, and send it on to the receiver?

You can receive the packet's 50 bytes with a .recv call on a properly connected socket (it might actually take more than one call in the unlikely event the data arrives split across TCP segments, so check the incoming length and keep reading until you have exactly 50 bytes in hand;-).
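A minimal sketch of that receive loop (assuming sock is the already-connected socket on the proxy's side):
def recv_exactly(sock, n):
    # Keep calling recv until exactly n bytes are in hand (or the peer closes).
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed after %d of %d bytes" % (n - remaining, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

packet = recv_exactly(sock, 50)  # the fixed 50-byte packet described above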
After that, the hard part is making sense of that puzzling C code. The assignments of ints (presumably 4 bytes each) to Packet[9], Packet[13], etc., give the impression that the intention is to set 4 bytes at a time within Packet, but that's not what happens: each assignment sets exactly one byte in the packet, taken from the lowest byte of the int on the right-hand side. Over the loop's four iterations, though, those single-byte writes do lay down the four bytes of (int)(720000+armcontrolpacket->dof0_rot*1000) and so on, most significant byte at the lowest index, which is why a big-endian format fits below.
So must those last 44 bytes of the packet be interpreted as 11 4-byte integers (signed? unsigned?) or 44 independent values? I'll guess the former, and do...:
import struct
f = '>x4Bx11i'  # skip spare byte 0, four unsigned header bytes, skip spare byte 5, then 11 big-endian ints
values = struct.unpack(f, packet)
The format f indicates: big-endian; four unsigned byte values surrounded by two ignored "spare" bytes; then 11 signed 4-byte integers. The tuple values ends up holding 15 items: the four single bytes (50, 1, 2, 1 in your example), then the 11 signed integers. You can use the same format string to pack a modified version of the tuple back into a 50-byte packet to resend.
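As a sketch of that round trip, continuing from the snippet above (the meaning of the fields is my guess from the C code, and forward_sock is a stand-in for whatever socket leads to the real receiver):
fields = list(struct.unpack(f, packet))
# fields[0:4] are length, source, destination, ID; fields[4:] are the 11 big-endian ints.
# Example manipulation: the first int appears to encode dof0_rot as 720000 + dof0_rot*1000.
dof0_rot = (fields[4] - 720000) / 1000.0
fields[4] = 720000 + int(dof0_rot * 1000)    # re-encode after changing dof0_rot as needed
forward_sock.sendall(struct.pack(f, *fields))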
Since you explicitly place the length in the packet, it may be that different packets have different lengths (though that's incompatible with the fixed-length declaration in your C sample), in which case you need to be a bit more careful in receiving and unpacking it; however, such details depend on information you haven't given, so I'll stop trying to guess;-).

Take a look at the struct module, specifically the pack and unpack functions. They work with format strings that allow you to specify what types you want to write or read and what endianness and alignment you want to use.

Related

Need help identifying a bit manipulation technique

I need help identifying the following technique. It's a lengthy read, so please try to follow. My question is whether this is a known standard: does it have a name, has anyone seen it before, and what is the benefit? Also, in case you wonder, this relates to a packet captured from a long-forgotten online PS2 game, and I am part of a team that is trying to bring it back.
Note that this is not the size described by the IP protocol; this size representation is within the actual payload, and it is there for client and server consumption.
The following describes how the size of the message is represented.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The true packet length is 94 bytes long.
These are bytes 5-6 [CF E0] of the payload data, after all of the IP protocol stuff.
Also, note that we must interpret these two bytes as being in little endian format. Thus, we should think of these two bytes as being
[E0 CF]
We determine the Packet Class from these two bytes by taking the first nibble (4 bits) of the first byte. In this particular case, this is just 0xE, so we identify this packet as having a packet class of 0xE. This was identified as a Session Initiator Packet Class.
Now, to determine the packet length from the remaining nibble and the second byte: first we convert the second byte to decimal, giving 0xCF = 207. The difference between this value and the actual length is 207 - 94 = 113 bytes. Originally I knew this byte was proportional to the packet length but had some offset; I wasn't sure where this offset came from. Additionally, this offset seemed to change for different packets. More study was required.
Eventually, I found out that each packet class had a different offset, so I needed to examine only packets in the same packet class to figure out the offset for that class. In doing this, I made a table of all the reported lengths (in byte 5) and compared them to the actual packet lengths. What I discovered is that:
1. almost all of the reported packet lengths in byte 5 were greater than 0x80 = 128;
2. the second nibble of the other byte worked as a kind of multiplier for the packet length;
3. each packet class had an associated minimum and maximum packet length that could be represented. For the 0xC packet class I was examining, the minimum packet size was 18 bytes and the maximum was approximately 10*128 + 17 = 1297 bytes.
This led to the following approach to extract the packet length from the fifth and sixth bytes of the packet header. First note that we have previously determined the packet class to be 0xE, and that the minimum packet size associated with this packet class is 15 bytes. Now, take the second nibble of the first byte [0xE0] = 0 in this case and multiply it by 128 bytes: 0*128 = 0 bytes. Now add this to the second byte [0xCF] = 207 in this case and subtract 128: 0 + 207 - 128 = 79. Now we need to add in the minimum packet size for this packet class (0xE = 15-byte minimum packet size). So (0*128) + (207 - 128) + 15 = 94. This is the reported true packet size.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This formula was tested on 20,000 subsequent packets and it works. But why go through all that trouble just to indicate the size of the message that follows? I thought it was a form of encryption, but the rest of the message is not encrypted at all. The formula is understood, but I don't see the benefit. I am thinking that maybe it is a way to optimize the size of the packet by passing a number greater than 255 using only one byte, but that saves exactly one byte; adding another byte would yield a max value of 65,535, so why not just throw another byte into the byte stream? I am sure one extra byte is not going to have a great impact on the network, so what could be the purpose? I thought that maybe someone else would recognize this as some kind of documented standard, protocol, pattern or technique.
Also, I do not take credit for figuring out the formula above; that was done by another team member.
My best guess is that the receiver uses some form of variable-length base128 encoding, like LEB128.
But in this case the sender, knowing the actual max size fits in 11 bits, forces the encoding to use 2 bytes and overloads the high nibble for the "class". This makes the header size and construction time constant, and the receiver side can just mask out the class and run it through a standard decoder.
Send:
    len -= minlen[class];
    byte[5] = (len & 0x7F) | 0x80;        // low 7 bits of len, continuation bit set
    byte[6] = (len >> 7) | (class << 4);  // remaining bits of len, class in the high nibble
Receive:
    class = byte[6] >> 4;
    byte[6] &= 0xF;
    len = decode(&byte[5]) + minlen[class];
where:
int decode(unsigned char* data) {
    int v = *data & 0x7F;
    int shift = 7;
    while (*data & 0x80) {              // continuation bit: another length byte follows
        data++;
        v |= (*data & 0x7F) << shift;   // base-128: each extra byte supplies 7 higher bits
        shift += 7;
    }
    return v;
}
One other possibility is that byte[5] is signed, and length is reconstructed by
(int8_t)byte[5] + 128*((byte[6]&0xF)+1) + minlen[byte[6]>>4];
But I can't think of any reason to construct it this way.
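For what it's worth, the base-128 reading of the scheme checks out on the example bytes from the question; here is a quick Python sanity check (the 15-byte minimum for class 0xE is the value the question reports):
raw5, raw6 = 0xCF, 0xE0                       # bytes 5-6 of the example payload
pkt_class = raw6 >> 4                         # 0xE, the "Session Initiator" class
length = (raw5 & 0x7F) | ((raw6 & 0x0F) << 7)
length += 15                                  # minimum packet size for class 0xE
print(pkt_class, length)                      # -> 14 94, the true packet length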

Send array of integers (winsock)

I have a client/server program I'm writing, and I need to send information from client to server using the function send(SOCKET s, const char *buf, int len, int flags);
but apparently this function is made to send a string (an array of characters), while what I'm sending is an encrypted message whose values are too large for the char type.
Is there another function that allows me to do so? I looked at the reference on the Microsoft website but didn't understand the other functions.
If there's another function, I hope you can show me how to use it or give me hints, or if there's another way to do it then so be it.
Notes: I'm working with C++ under Windows 8, using Borland 6.
This might be tricky to explain.
Your issue isn't in the function you're using, but in the concept you're trying to apply.
First of all, if your data is intended to be transmitted over the network, you must assume that the destination endpoint's endianness may differ from that of the transmitting endpoint.
With that in mind, it's advisable to convert the eligible data types prone to endianness interpretation to network byte order before transmitting any data. Take a look at the htons(), htonl(), ntohs() and ntohl() functions.
As you must deal with known data sizes, instead of declaring your array as int[], you should declare it through a stdint.h type, such as int16_t, int32_t, uint16_t, etc.
So, let's assume you have the following:
uint32_t a[4] = { 1, 2, 3, 4 };
If you want to transmit this array in a portable way, you should first convert its contents to network byte order:
uint32_t a_converted[4];
for (size_t i = 0; i < sizeof(a) / sizeof(a[0]); i++)   // iterate over elements, not bytes
    a_converted[i] = htonl(a[i]);
Now, if you want to transmit this array, you can do it using:
send(s, (char *) a_converted, sizeof(a_converted), flags);
Just remember that the code receiving this data should convert it from network byte order to host byte order using, in this case, ntohl() for each element received.
Hope this gives you some clues for further research.
Well doodleboodle, guess what: if you read the TCP RFC, you might understand that the TCP protocol only transfers OCTET STREAMS and, if you need to transfer anything more complex than one byte, you need a protocol on top of TCP that defines your Application Protocol Unit message type.
send(SOCKET s, const char *buf, int len, int flags); is basically the way to do it.
It uses binary data in bytes to send the data. So if you want to send a complex structure/object, you'll need to serialize it to a byte array first.
In your case with the integers it's quite simple: just convert the integer array to a byte array. (keep track of the length though).
Of course it's more appropriate to build an abstraction layer on top of your TCP layer so it's easier to send/receive different kinds of data.
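For what it's worth, since the top of this thread is about bridging to Python: the same "integer array to byte array with an explicit byte order" step looks like this with Python's struct module (the '!' prefix means network byte order, the counterpart of htonl):
import struct

ints = [1, 2, 3, 4]
payload = struct.pack('!%di' % len(ints), *ints)   # 16 bytes, big-endian, ready to hand to send()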

C++ - Creating an integer of bits and nibbles

For a full background (you don't really need to understand this to understand the problem but it may help) I am writing a CLI program that sends data over Ethernet and I wish to add VLAN tags and priority tags to the Ethernet headers.
The problem I am facing is that I have a single 16-bit integer value that is built from three smaller values: PCP is 3 bits long (so 0 to 7), DEI is 1 bit, and VLANID is 12 bits long (0-4095). PCP and DEI together form the first nibble, 4 bits from VLANID complete the first byte, and the remaining 8 bits of VLANID form the second byte of the integer.
11123333 33333333
1 == PCP bits, 2 == DEI bit, 3 == VLANID bits
Let's pretend PCP == 5, which in binary is 101, DEI == 0, and VLANID == 164, which in binary is 0000 1010 0100. Firstly I need to combine these values to form the following:
10100000 10100100
The problem I face is that when I copy this integer into a buffer to be encoded onto the wire (Ethernet medium), the bit ordering changes as follows (I am printing out my integer in binary before it gets copied to the wire and using Wireshark to capture it on the wire to compare):
Bit order in memory: abcdefgh 87654321
Bit order on the wire: 87654321 abcdefgh
I have two problems here really:
The first is creating the 2 byte integer by "sticking" the three smaller ones together
The second is ensuring the order of bits is that which will be encoded correctly onto the wire (so the bytes aren't in the reverse order)
Obviously I have made an attempt at this to get this far, but I'm really out of my depth and would like to see someone's suggestion from scratch, rather than posting what I have done so far and having someone suggest changes to it, which could end up hard to read and long-winded.
The issue is byte ordering, rather than bit ordering. Bits in memory don't really have an order because they are not individually addressable, and the transmission medium is responsible for ensuring that the discrete entities transmitted, octets in this case, arrive in the same shape they were sent in.
Bytes, on the other hand, are addressable and the transmission medium has no idea whether you're sending a byte string which requires that no reordering be done, or a four byte integer, which may require one byte ordering on the receiver's end and another on the sender's.
For this reason, network protocols have a declared 'byte ordering' to and from which all senders and receivers should convert their data. This way data can be sent and retrieved transparently by network hosts of differing native byte orderings.
POSIX defines some functions for doing the required conversions:
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
'n' and 'h' stand for 'network' and 'host'. So htonl converts a 32-bit quantity from the host's in-memory byte ordering to the network interface's byte ordering.
Whenever you're preparing a buffer to be sent across the network you should convert each value in it from the host's byte ordering to the network's byte ordering, and any time you're processing a buffer of received data you should convert the data in it from the network's ordering to the host's.
struct { uint32_t i; int8_t a, b; uint16_t s; } sent_data = {100000, 'a', 'b', 500};
sent_data.i = htonl(sent_data.i);
sent_data.s = htons(sent_data.s);
write(fd, &sent_data, sizeof sent_data);
// ---
struct { uint32_t i; int8_t a, b; uint16_t s; } received_data;
read(fd, &received_data, sizeof received_data);
received_data.i = ntohl(received_data.i);
received_data.s = ntohs(received_data.s);
assert(100000 == received_data.i && 'a' == received_data.a &&
       'b' == received_data.b && 500 == received_data.s);
Although the above code still makes some assumptions, such as that both the sender and receiver use compatible char encodings (e.g., that they both use ASCII), that they both use 8-bit bytes, that they have compatible number representations after accounting for byte ordering, etc.
Programs that do not care about portability and inter-operate only with themselves on remote hosts may skip byte ordering in order to avoid the performance cost. Since all hosts will share the same byte ordering they don't need to convert at all. Of course if a program does this and then later needs to be ported to a platform with a different byte ordering then either the network protocol has to change or the program will have to handle a byte ordering that is neither the network ordering nor the host's ordering.
Today the only common byte orderings are simply reversals of each other, meaning that hton and ntoh both do the same thing and one could just as well use hton both for sending and receiving. However one should still use the proper conversion simply to communicate the intent of the code. And, who knows, maybe someday your code will run on a PDP-11 where hton and ntoh are not interchangeable.
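As for the question's first sub-problem (sticking PCP, DEI and VLANID together into one 16-bit value), the shifts are the same in any language; here is a sketch in Python, using struct's '!' to take care of the byte-order half as well (values taken from the question's example):
import struct

def build_tci(pcp, dei, vid):
    # PCP in the top 3 bits, DEI next, VLAN ID in the low 12 bits
    assert 0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 4095
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack('!H', tci)        # network (big-endian) byte order

print(build_tci(5, 0, 164).hex())        # 'a0a4', i.e. 10100000 10100100 on the wire
In C++ the equivalent is shifting the three fields into a uint16_t the same way and passing it through htons() before copying it into the frame buffer.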

Can I make a single binary write for a C++ struct which contains a vector

I am trying to build and write a binary request and have an "is this possible" type question. It might be important for me to mention that the recipient of the request is not aware of the data structure I have included below; it's just expecting a sequence of bytes. But using a struct seemed like a handy way to prepare the pieces of the request and then write them easily.
Writing the header and footer is fine as they are fixed size, but I'm running into problems with the struct "Details" because of the vector. For now I'm writing to a file so I can check the request is to spec, but the intention is eventually to write to a PLC using a Boost.Asio serial port.
I can use syntax like the following to write a struct, but it writes pointer addresses rather than values when it gets to the vector:
myFile.write((char*) &myDataRequest, drSize);
I can use this syntax to write a vector by itself, but I must include the indexer at 0 to write the values:
myFile.write((char*) &myVector[0], vectorSize);
Is there an elegant way to binary-write a struct containing a vector (or another suitable collection), doing it in one go? Say, for example, if I declared the vector differently, or am I resigned to making multiple writes for the content inside the struct? If I replace the vector with an array I can send the struct in one go (without needing to include any indexer), but I don't know the required size until run time, so I don't think it is suitable.
My structs:
struct Header
{ ... };
struct DataRequest   // declared before Details so its use in the vector below compiles
{
    short numAddresses;             // Number of operands to be read (bytes 0-1)
    unsigned char operandType;      // Byte 2
    unsigned char Reserved1;        // Should be 0xFF (byte 3)
    std::vector<short> addressList; // Either a starting address (sequential) or a list of addresses (non-sequential)
};
struct Details
{
    std::vector<DataRequest> DRList;
};
struct Footer
{ ... };
It's not possible, because the std::vector object doesn't actually contain an array but rather a pointer to a block of memory. However, I'm tempted to claim that being able to write a raw struct like that is not desirable anyway:
- By treating a struct as a block of memory you may end up sending padding bytes, which is not desirable.
- Depending on what you write to, you may find that writes are buffered anyway, so multiple write calls aren't actually less efficient.
- Chances are that you want to do something with the fields being sent over, in particular the numeric values. This requires enforcing a byte order which both sides of the transmission agree on, so to be portable you should explicitly convert the byte order.
To make a long story short: I suspect writing each field out one by one is not less efficient, and it is also more correct.
This is not really a good strategy, since even if you could do it you would be copying memory content directly to the file; if you change the architecture/processor, your client will get different data. If instead you write a method that takes your struct and a filename, writes the struct's values individually, and iterates over the vector writing out its content, you'll have full control over the binary format your client expects and won't depend on the compiler's current memory representation.
If you want convenience for marshalling/unmarshalling you should take a look at the boost::serialization library. It does offer a binary archive (besides text and XML), but the archive has its own format (e.g. it records which version of the serialization lib was used to dump the data), so it is probably not what your client wants.
What exactly is the format expected at the other end? You have to write that, period. You can't just write any random bytes. The probability that just writing an std::vector like you're doing will work is about as close to 0 as you can get. But the probability that writing a struct with only int will work is still less than 50%. If the other side is expecting a specific sequence of bytes, then you have to write that sequence, byte by byte. To write an int, for example, you must still write four (or whatever the protocol requires) bytes, something like:
byte[0] = (value >> 24) & 0xFF;
byte[1] = (value >> 16) & 0xFF;
byte[2] = (value >> 8) & 0xFF;
byte[3] = (value ) & 0xFF;
(Even here, I'm supposing that your internal representation of negative numbers corresponds to that of the protocol. Usually the case, but not always.)
Typically, of course, you build your buffer in a std::vector<char>, and then write &buffer[0], buffer.size(). (The fact that you need a reinterpret_cast for the buffer pointer should signal that your approach is wrong.)
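To make the byte-by-byte advice concrete against the structs in the question, here is a sketch of an explicit layout for one DataRequest, expressed with Python's struct notation for brevity (the field order comes from the struct above; the big-endian '>' and the example values are my assumptions, since the PLC protocol's actual byte order isn't stated):
import struct

def pack_data_request(operand_type, address_list):
    # numAddresses (2 bytes), operandType (1 byte), Reserved1 = 0xFF (1 byte),
    # then one 2-byte entry per address -- every field is written explicitly.
    return struct.pack('>hBB%dh' % len(address_list),
                       len(address_list), operand_type, 0xFF, *address_list)

payload = pack_data_request(0x05, [0x0100, 0x0101, 0x0102])   # 4 + 3*2 = 10 bytes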

C++ Byte order in socket programming

In C++ we send data over the network using sockets. I am aware that we need to use the htons()/ntohs() functions to maintain byte order between big-endian and little-endian machines.
Suppose we have the following data to be sent:
int roll;
int id;
char name[100];
This can also be wrapped in a struct.
My confusion here is: for roll and id, we can use the htons() function. But for the string name, what should we do, and how? Do we need to use any such function? Will it work on every machine (Mac, Intel and others) over the network?
I want to send all three fields in one packet.
You'd use htonl for int, not htons.
The name doesn't need to be reordered, since the bytes of the array correspond directly to bytes on the network.
The issue of byte-order only arises with words larger than a byte, since different architectures choose different ends at which to place the least-significant byte.
For char arrays this conversion is not necessary, since they do not have a network byte order but are transmitted sequentially. The reason that ntohs and htons exist is that some data types consist of less and more significant bytes, which are laid out differently on different architectures. This is not the case for strings.
To add to helpful comments here - if your structs get much more complex you could be better off considering a serialization library like Boost.Serialization or Google Protocol Buffers, which handle endianness for you under the covers.
When encoding the string, make sure you send a length (probably a short, handled using htons) before the string itself; don't just send 100 chars every time.
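Putting the answers above together, the whole packet could be laid out like this from the Python side (a sketch only; the '!iih' layout, the 16-bit length prefix and the ASCII encoding are assumptions, not part of the question):
import struct

def build_packet(roll, id_, name):
    name_bytes = name.encode('ascii')
    # two big-endian 32-bit ints, a 16-bit length prefix, then the raw name bytes
    return struct.pack('!iih', roll, id_, len(name_bytes)) + name_bytes

pkt = build_packet(42, 7, "alice")            # 4 + 4 + 2 + 5 = 15 bytes on the wire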