Working with individual bytes using unsigned char arrays

I've searched through many sites and cannot seem to find anything relevant.
I would like to take the individual bytes of each of the built-in data types, such as short, unsigned short, int, unsigned int, float and double, and store each byte (its binary contents) in consecutive indices of an unsigned char array. How can this be achieved?
For example:
int main() {
short sVal = 1;
unsigned short usVal = 2;
int iVal = 3;
unsigned int uiVal = 4;
float fVal = 5.0f;
double dVal = 6.0;
const unsigned int uiLengthOfShort = sizeof(short);
const unsigned int uiLengthOfUShort = sizeof(unsigned short);
const unsigned int uiLengthOfInt = sizeof(int);
const unsigned int uiLengthOfUInt = sizeof(unsigned int);
const unsigned int uiLengthOfFloat = sizeof(float);
const unsigned int uiLengthOfDouble = sizeof(double);
unsigned char ucShort[uiLengthOfShort];
unsigned char ucUShort[uiLengthOfUShort];
unsigned char ucInt[uiLengthOfInt];
unsigned char ucUInt[uiLengthOfUInt];
unsigned char ucFloat[uiLengthOfFloat];
unsigned char ucDouble[uiLengthOfDouble];
// Above I declared a variable val for each data type to work with
// Next I created a const unsigned int of each type's size.
// Then I created unsigned char[] using each data types size respectively
// Now I would like to take each individual byte of the above val's
// and store them into the indexed location of each unsigned char array.
// For Example: - I'll not use int here since the int is
// machine and OS dependent.
// I will use a data type that is common across almost all machines.
// Here I will use the short as my example
// We know that a short is 2-bytes or has 16 bits encoded
// I would like to take the 1st byte of this short:
// (the first 8 bit sequence) and to store it into the first index of my unsigned char[].
// Then I would like to take the 2nd byte of this short:
// (the second 8 bit sequence) and store it into the second index of my unsigned char[].
// How would this be achieved for any of the data types?
// A short in memory is 2 bytes; here is a bit representation of an
// arbitrary short in memory { 0101 1101, 0011 1010 }
// I would like ucShort[0] = sVal's { 0101 1101 } &
// ucShort[1] = sVal's { 0011 1010 }
// ucShort[0] = sVal's first byte (8-bit sequence)
// ucShort[1] = sVal's second byte (8-bit sequence)
// ... and so on for each data type.
return 0;
}

Ok, so first, don't do that if you can avoid it. It's dangerous and can be extremely dependent on architecture.
The commenters above are correct: a union is the safest way to do it. You still have the endianness problem, yes, but at least you don't have the stack alignment problem (I assume this is for network code, so stack alignment is another potential architecture problem).
This is what I've found to be the most straightforward way to do this:
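uint32_t example_int = 0x12345678; // needs #include <cstdint>
unsigned char array[4];
//No endian switch: copy the bytes in the host's native order
array[0] = ((unsigned char*) &example_int)[0];
array[1] = ((unsigned char*) &example_int)[1];
array[2] = ((unsigned char*) &example_int)[2];
array[3] = ((unsigned char*) &example_int)[3];
//Endian switch: copy the bytes in reverse order
array[0] = ((unsigned char*) &example_int)[3];
array[1] = ((unsigned char*) &example_int)[2];
array[2] = ((unsigned char*) &example_int)[1];
array[3] = ((unsigned char*) &example_int)[0];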
If you're trying to write cross-architecture code, you will need to deal with endian problems one way or another. My suggestion is to construct a short endian test and build functions to "pack" and "unpack" byte arrays based on the above method. It should be noted that to "unpack" a byte array, simply reverse the above assignment statements.
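As a rough sketch of the "pack"/"unpack" idea above, the endian test can probe the first byte of a known 16-bit value, and a swap flag decides whether the bytes are copied in native or reversed order. The names host_is_little_endian, pack_u32 and unpack_u32 are hypothetical, and only 32-bit unsigned values are handled:

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uint32_t) == 4, "this sketch assumes 8-bit bytes");

// Endian test: store 1 in a 16-bit value and look at its first byte.
// A little-endian host stores the least significant byte first.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// "Pack": copy the bytes of value into out, reversing them if swap is true.
void pack_u32(std::uint32_t value, unsigned char out[4], bool swap) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = swap ? p[3 - i] : p[i];
}

// "Unpack": the reverse assignments, rebuilding the value from in.
std::uint32_t unpack_u32(const unsigned char in[4], bool swap) {
    std::uint32_t value = 0;
    unsigned char* p = reinterpret_cast<unsigned char*>(&value);
    for (std::size_t i = 0; i < 4; ++i)
        p[i] = swap ? in[3 - i] : in[i];
    return value;
}
```

For example, passing swap = host_is_little_endian() to pack_u32 produces big-endian (network) byte order whatever the host is.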

The simplest correct way is:
// needs #include <cstring>
static_assert(sizeof ucShort == sizeof sVal, "array size must match the value's size");
std::memcpy(ucShort, &sVal, sizeof ucShort);
The stuff you write in comments is not correct; all types have machine-dependent sizes, other than character types.
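To avoid repeating this for every type, the same memcpy call can be wrapped in a function template. This is a sketch with hypothetical names to_bytes and from_bytes; it assumes the value is trivially copyable, and the bytes come out in the host's native order, so endianness still has to be handled separately when data crosses machines:

```cpp
#include <array>
#include <cstring>
#include <type_traits>

// Copy the object representation of any trivially copyable value
// (short, unsigned int, float, double, ...) into an array of bytes.
template <typename T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "to_bytes requires a trivially copyable type");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Rebuild a value from bytes produced by to_bytes on the same platform.
// T cannot be deduced here, so call it as from_bytes<double>(bytes).
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "from_bytes requires a trivially copyable type");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```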

With the help of Raw N, who pointed me to a website, I searched for byte manipulation and found this thread - http://www.cplusplus.com/forum/articles/12/ - which presents a solution similar to what I am looking for. However, I would have to repeat the process for every built-in data type.

After doing some testing, this is what I have come up with so far. It depends on the machine architecture, but the concept is the same on other machines.
typedef struct packed_2bytes {
unsigned char c0;
unsigned char c1;
} packed_2bytes;
typedef struct packed_4bytes {
unsigned char c0;
unsigned char c1;
unsigned char c2;
unsigned char c3;
} packed_4bytes;
typedef struct packed_8bytes {
unsigned char c0;
unsigned char c1;
unsigned char c2;
unsigned char c3;
unsigned char c4;
unsigned char c5;
unsigned char c6;
unsigned char c7;
} packed_8bytes;
typedef union {
short s;
packed_2bytes bytes;
} packed_short;
typedef union {
unsigned short us;
packed_2bytes bytes;
} packed_ushort;
typedef union { // 32bit machine, os, compiler only
int i;
packed_4bytes bytes;
} packed_int;
typedef union { // 32 bit machine, os, compiler only
unsigned int ui;
packed_4bytes bytes;
} packed_uint;
typedef union {
float f;
packed_4bytes bytes;
} packed_float;
typedef union {
double d;
packed_8bytes bytes;
} packed_double;
These are only the declarations of the types; there is no usage example. I do think they should record whichever endianness is being used, but the person using them has to know this ahead of time, just as they have to know the size of each built-in type on the target architecture. I am not sure whether signed int would be a problem due to ones' complement, two's complement or sign-magnitude representations, but it could also be something to consider.

Related

C++ , create structure with an optional value

I need to create a structure with an optional value:
typedef struct pkt_header{
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
unsigned short Protected_Payload_Length; // optional (present/not present)
unsigned short Version;
} PKT_HEADER;
How can I sometimes use pkt_header->Protected_Payload_Length, and sometimes leave it out of the struct when the field is not present?
My first idea is to declare unsigned char * Protected_Payload_Length, pass NULL when I don't use the field, and use the unsigned char* to store my unsigned short value.
typedef struct pkt_header{
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
unsigned char * Protected_Payload_Length; // optional
unsigned short Version;
} PKT_HEADER;
I prepare my packet like this (and send it):
PKT_HEADER header;
header.Packet_Type = 0x0001;
header.Unprotected_Payload_Length = 0x0b00;
header.Protected_Payload_Length = NULL;
header.Version = 0x0000;
I receive the response and do this:
PKT_HEADER * header= (PKT_HEADER*)recvbuf;
printf("Packet_Type : %04x\n", header->Packet_Type);
printf("Unprotected_Payload_Length : %04x\n", header->Unprotected_Payload_Length);
printf("Version : %04x\n", header->Version);
But in this case, if I understand correctly, unsigned char * Protected_Payload_Length holds a pointer that is 4 bytes long, so header->Protected_Payload_Length occupies 4 bytes, but I need it to occupy 0 bytes because the field is not present in this particular case.
Do I have to declare a separate structure for each data format, or is there some other way to play with the structures?
Thanks for your help.

Beware. Structs can have padding, so members are not necessarily adjacent in memory. Moreover, reinterpreting something as a PKT_HEADER when that something is not a PKT_HEADER object is not allowed. Instead of casting:
PKT_HEADER * header= (PKT_HEADER*)recvbuf;
you should probably use memcpy. Having said this, on to your actual question...
If you rely on members having a specific order in the struct, then inheritance is not an option. In memory the base subobject comes first, followed by the derived members; you cannot interleave them. For example
struct foo {
int x;
};
struct bar : foo {
int y;
int z;
};
Then a bar object will have in memory
| x | optional padding | y | optional padding | z | optional padding |
There is no simple way to get | y | x | z |.
If you want two different types the easiest is to define two different types:
struct PKT_HEADER_A {
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
unsigned short Protected_Payload_Length; // present
unsigned short Version;
};
struct PKT_HEADER_B {
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
//unsigned short Protected_Payload_Length; // not present
unsigned short Version;
};
Note that your way to typedef the struct is a C-ism. It is not necessary (and not recommended) in C++.
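Returning to the memcpy suggestion at the start of this answer, a minimal sketch might look like the following. It assumes recvbuf holds at least sizeof(PKT_HEADER) bytes and that sender and receiver agree on the struct's layout (padding) and byte order; read_header is a hypothetical name:

```cpp
#include <cstring>

struct PKT_HEADER {
    unsigned short Packet_Type;
    unsigned short Unprotected_Payload_Length;
    unsigned short Protected_Payload_Length;
    unsigned short Version;
};

// Copy the received bytes into a real PKT_HEADER object instead of
// casting the buffer pointer; this avoids aliasing and alignment problems,
// but the layout and byte order must still match what was sent.
PKT_HEADER read_header(const char* recvbuf) {
    PKT_HEADER header;
    std::memcpy(&header, recvbuf, sizeof header);
    return header;
}
```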

You should probably take a look at the packing done by nanopb or Protocol Buffers (protobuf), because it sounds like you have a packing problem. Data should be pieced together before sending, and Packet_Type would encode which header to decode/encode with.
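To illustrate piecing the data together by hand, here is a sketch (C++17, for std::optional) that writes and reads the header field by field in big-endian order, so an absent Protected_Payload_Length takes no bytes on the wire. PacketHeader, encode_header and decode_header are hypothetical names, and the has_protected flag stands in for whatever rule the real protocol uses, for example a check on Packet_Type:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// In-memory header: the optional field is empty when not present.
struct PacketHeader {
    std::uint16_t packet_type;
    std::uint16_t unprotected_payload_length;
    std::optional<std::uint16_t> protected_payload_length;
    std::uint16_t version;
};

// Append a 16-bit value in big-endian (network) order.
static void put_u16(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v & 0xFF));
}

// Read a big-endian 16-bit value at pos, then advance pos.
static std::uint16_t get_u16(const std::vector<unsigned char>& in, std::size_t& pos) {
    if (in.size() - pos < 2) throw std::out_of_range("truncated header");
    const auto v = static_cast<std::uint16_t>((in[pos] << 8) | in[pos + 1]);
    pos += 2;
    return v;
}

std::vector<unsigned char> encode_header(const PacketHeader& h) {
    std::vector<unsigned char> out;
    put_u16(out, h.packet_type);
    put_u16(out, h.unprotected_payload_length);
    if (h.protected_payload_length) put_u16(out, *h.protected_payload_length);
    put_u16(out, h.version);
    return out;
}

// has_protected must come from the protocol itself (assumption).
PacketHeader decode_header(const std::vector<unsigned char>& in, bool has_protected) {
    std::size_t pos = 0;
    PacketHeader h;
    h.packet_type = get_u16(in, pos);
    h.unprotected_payload_length = get_u16(in, pos);
    if (has_protected) h.protected_payload_length = get_u16(in, pos);
    h.version = get_u16(in, pos);
    return h;
}
```

With the field absent the header is 6 bytes on the wire; with it present, 8.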
If you can't properly pack/unpack, an alternative is to create both
typedef struct {
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
unsigned short Protected_Payload_Length;
unsigned short Version;
} PKT_HEADER_FULL;
typedef struct {
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
unsigned short Version;
} PKT_HEADER_SHORT;
then create your packet header
typedef union u{
PKT_HEADER_FULL full;
PKT_HEADER_SHORT concat;
}PKT_HEADER;
// or as this
typedef struct{
unsigned short Protected_Payload_Length;
unsigned short version;
}longform;
typedef struct{
unsigned short Packet_Type;
unsigned short Unprotected_Payload_Length;
union {
longform l;
unsigned short version;
} u;
} PKT_HEADER;
Then the data coming in could be decoded either way (again, depending on Packet_Type), and the remaining space can be ignored. A caveat to this method is that you can't use sizeof(PKT_HEADER) as the size on the wire, because the size of the union is always that of its larger member.

Put data from a QByteArray in a Struct

I'm working with a serial device that returns a byte array.
In this array are values that are stored in unsigned shorts and unsigned chars.
I have the following structure:
typedef struct {
unsigned short RPM; //0
unsigned short Intakepress; //1
unsigned short PressureV; //2
unsigned short ThrottleV; //3
unsigned short Primaryinp; //4
unsigned short Fuelc; //5
unsigned char Leadingign; //6
unsigned char Trailingign; //7
unsigned char Fueltemp; //8
unsigned char Moilp; //9
unsigned char Boosttp; //10
unsigned char Boostwg; //11
unsigned char Watertemp; //12
unsigned char Intaketemp; //13
unsigned char Knock; //14
unsigned char BatteryV; //15
unsigned short Speed; //16
unsigned short Iscvduty; //17
unsigned char O2volt; //18
unsigned char na1; //19
unsigned short Secinjpulse; //20
unsigned char na2; //21
} fc_adv_info_t;
What's the best way to map the array to this structure? The order of the values in the array received from the serial device matches the structure.

First of all, your description of the type of data in the structure using C-like syntax is ambiguous. It tells us nothing about the size of a short or char type, nor about the endianness of the data! A short int doesn't have to be 16 bits wide, neither is char always 8 bits! At the very least, you should use the fixed width integer types, or their Qt equivalents, and specify their endianness.
Also, typedef struct is a C-ism, unnecessary in C++. Drop the typedef.
Assuming a big endian packet, unsigned short to mean uint16_t and unsigned char to mean uint8_t, here is how you could do it:
struct FcAdvInfo { // this structure shouldn't be packed or anything like that!
quint16 RPM;
quint16 IntakePress;
...
quint8 LeadingIgn;
...
static FcAdvInfo parse(const QByteArray &);
};
FcAdvInfo FcAdvInfo::parse(const QByteArray & src) {
FcAdvInfo p;
QDataStream ds(src);
ds.setByteOrder(QDataStream::BigEndian);
ds
>> p.RPM
>> p.IntakePress
...
>> p.LeadingIgn
...
;
return p;
}
Finally, if your struct comes from some C code, you must understand that it's not portable, and even on the same CPU, if you upgrade the compiler, the packing and the size of structure types can and will change! So don't do it. A C/C++ struct declaration implies nothing about how the data is arranged in memory, other than the chosen arrangement doesn't lead to undefined behavior, and must agree with other requirements of the standard (there are just a few). That's all, pretty much.
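For comparison, the same field-by-field idea can be sketched in standard C++ without Qt. BigEndianReader, FcAdvInfoHead and parse_head are hypothetical names; like the answer above, it assumes big-endian data with unsigned short meaning a 16-bit value, and for brevity it reads only the first seven fields:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Reads big-endian values sequentially from a byte buffer.
class BigEndianReader {
public:
    BigEndianReader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    std::uint8_t u8() {
        require(1);
        return data_[pos_++];
    }

    std::uint16_t u16() {
        require(2);
        const auto v = static_cast<std::uint16_t>((data_[pos_] << 8) | data_[pos_ + 1]);
        pos_ += 2;
        return v;
    }

private:
    // Throw instead of reading past the end of a short packet.
    void require(std::size_t n) const {
        if (size_ - pos_ < n) throw std::out_of_range("packet too short");
    }

    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Holds only the first seven fields of fc_adv_info_t.
struct FcAdvInfoHead {
    std::uint16_t RPM, IntakePress, PressureV, ThrottleV, PrimaryInp, FuelC;
    std::uint8_t LeadingIgn;
};

FcAdvInfoHead parse_head(const unsigned char* data, std::size_t size) {
    BigEndianReader r(data, size);
    FcAdvInfoHead p;
    p.RPM = r.u16();  // same order as the device sends them
    p.IntakePress = r.u16();
    p.PressureV = r.u16();
    p.ThrottleV = r.u16();
    p.PrimaryInp = r.u16();
    p.FuelC = r.u16();
    p.LeadingIgn = r.u8();
    return p;
}
```

The remaining fields would be read the same way, in the order the device sends them.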

First, I would say that it is not safe to put the unsigned short type in structures that you are going to serialize/deserialize and exchange with other devices: unsigned short is usually 16-bit, but you can't take that as guaranteed; it is platform dependent.
It is even worse if struct members are not aligned, so that the compiler inserts padding into the struct.
If the binary data received from the serial port is kept in a QByteArray, and the byte order and "unsigned short" types are OK, then to map the data in the QByteArray onto the struct you can use the code below. Note that it is only correct if your struct is packed and has no padding gaps within it; use the struct packing technique for your compiler (see Structure padding and packing).
QByteArray bArr;
bArr.resize(sizeof(fc_adv_info_t));
// do something to fill bArr with received data
fc_adv_info_t* info=reinterpret_cast<fc_adv_info_t*>(bArr.data());

Deserializing a DNS packet object in C++11

I have a DNS packet class which looks like this (I am pasting only part of it):
class DNSPacket {
public:
struct DNSHeader {
unsigned int ID :16;
unsigned int QR :1;
unsigned int OPCODE :4;
unsigned int AA :1;
unsigned int TC :1;
unsigned int RD :1;
unsigned int RA :1;
unsigned int Z :3;
unsigned int RCODE :4;
unsigned int QDCOUNT :16;
unsigned int ANCOUNT :16;
unsigned int NSCOUNT :16;
unsigned int ARCOUNT :16;
};
private:
DNSHeader header;
std::vector<DNSQuestion> questions;
std::vector<DNSAnswer> answers;
std::vector<DNSAnswer> nameservers; // TODO: DNSAnswer?
std::vector<DNSAnswer> add_records; // TODO: DNSAnswer?
};
What would be the right way to deserialize a char array into this object? The options I have are: overloading operator>>, adding a separate deserializer class, reading and deserializing the data byte after byte, or using reinterpret_cast.
I want to create a fast, modern implementation in C++11. Which way should I choose? Also, how should I deserialize the bitfields - should I stick with bitwise operations?

The way I would do it would be to treat word 2 (bits 16 to 31) as a single unsigned 16-bit integer (e.g. uint16_t) and simply get the bits by using bit-wise AND and SHIFT operations. All the other words can be read the same way, but used more or less as-is (after converting from network byte order to host byte order of course).
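To get word X as a uint16_t, copy its two bytes out of the array (a reinterpret_cast to uint16_t* can break alignment and strict-aliasing rules):
uint16_t wordX;
std::memcpy(&wordX, &your_array[2 * (X - 1)], sizeof wordX); // word X (counting from 1) starts at byte 2*(X-1); needs <cstring>
wordX = ntohs(wordX); // network to host byte order; <arpa/inet.h> on POSIX
Note that the second word also has to be converted from network byte order before you apply the masks and shifts; otherwise, on a little-endian host, its two bytes end up swapped and the bit positions no longer match the specification.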
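Putting the two suggestions together, here is a sketch of reading the 12-byte header described in RFC 1035 into plain integer fields. Each 16-bit word is built from its two bytes, which performs the network-to-host conversion implicitly, and the flags are extracted with shifts and masks. DnsHeaderFields and parse_dns_header are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Plain-field version of the header (no bit-fields, so no layout surprises).
struct DnsHeaderFields {
    std::uint16_t id;
    bool qr, aa, tc, rd, ra;
    std::uint8_t opcode, z, rcode;
    std::uint16_t qdcount, ancount, nscount, arcount;
};

// Build a 16-bit value from two bytes in network (big-endian) order.
static std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

DnsHeaderFields parse_dns_header(const unsigned char* buf, std::size_t len) {
    if (len < 12) throw std::invalid_argument("a DNS header needs 12 bytes");
    DnsHeaderFields h{};
    h.id = read_be16(buf);
    const std::uint16_t flags = read_be16(buf + 2);  // word 2: bits 16 to 31
    h.qr     = (flags >> 15) & 0x1;
    h.opcode = static_cast<std::uint8_t>((flags >> 11) & 0xF);
    h.aa     = (flags >> 10) & 0x1;
    h.tc     = (flags >> 9) & 0x1;
    h.rd     = (flags >> 8) & 0x1;
    h.ra     = (flags >> 7) & 0x1;
    h.z      = static_cast<std::uint8_t>((flags >> 4) & 0x7);
    h.rcode  = static_cast<std::uint8_t>(flags & 0xF);
    h.qdcount = read_be16(buf + 4);
    h.ancount = read_be16(buf + 6);
    h.nscount = read_be16(buf + 8);
    h.arcount = read_be16(buf + 10);
    return h;
}
```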

reverse a number's bits

Here is a C++ class for reversing bits, from LeetCode Discuss: https://leetcode.com/discuss/29324/c-solution-9ms-without-loop-without-calculation
For example, given input 43261596 (represented in binary as 00000010100101000001111010011100), return 964176192 (represented in binary as 00111001011110000010100101000000).
Can anyone explain it? Thank you so very much!!
class Solution {
public:
uint32_t reverseBits(uint32_t n) {
struct bs
{
unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
unsigned int _08:1; unsigned int _09:1; unsigned int _10:1; unsigned int _11:1;
unsigned int _12:1; unsigned int _13:1; unsigned int _14:1; unsigned int _15:1;
unsigned int _16:1; unsigned int _17:1; unsigned int _18:1; unsigned int _19:1;
unsigned int _20:1; unsigned int _21:1; unsigned int _22:1; unsigned int _23:1;
unsigned int _24:1; unsigned int _25:1; unsigned int _26:1; unsigned int _27:1;
unsigned int _28:1; unsigned int _29:1; unsigned int _30:1; unsigned int _31:1;
} *b = (bs*)&n,
c =
{
b->_31, b->_30, b->_29, b->_28
, b->_27, b->_26, b->_25, b->_24
, b->_23, b->_22, b->_21, b->_20
, b->_19, b->_18, b->_17, b->_16
, b->_15, b->_14, b->_13, b->_12
, b->_11, b->_10, b->_09, b->_08
, b->_07, b->_06, b->_05, b->_04
, b->_03, b->_02, b->_01, b->_00
};
return *(unsigned int *)&c;
}
};

Consider casting as providing a different layout stencil on memory.
Using this stencil picture, the code is a layout of a stencil of 32-bits on an unsigned integer memory location.
So instead of treating the memory as a uint32_t, it is treating the memory as 32 bits.
A pointer to the 32-bit structure is created.
The pointer is assigned to the same memory location as the uint32_t variable.
The pointer will allow different treatment of the memory location.
A temporary variable, of 32-bits (using the structure), is created.
The variable is initialized using an initialization list.
The bit fields in the initialization list are from the original variable, listed in reverse order.
So, in the list:
new bit 0 <-- old bit 31
new bit 1 <-- old bit 30
The foundation of this approach relies on initialization lists.
The author is letting the compiler reverse the bits.

The solution uses brute force to reverse the bits.
It declares a bit-field structure (that's when the members are followed by :1) with 32 bit-fields of one bit each.
The 32-bit input is then seen as such a structure, by casting the address of the input to a pointer to the structure. Then c is declared as a variable of that type, which is initialized by reversing the order of the bits.
Finally, the bit-field represented by c is reinterpreted as an integer and you're done.
The generated assembly is not very interesting, as the gcc explorer (Compiler Explorer) shows:
https://goo.gl/KYHDY6

It doesn't convert per se; it just looks at the same memory address differently. It takes the address of the uint32_t n, typecasts that pointer, and that way you can interpret the number as a struct of 32 individual bits. So through the struct pointer b you have access to the individual bits of the number.
Then, of a new struct c, each bit is bluntly set by putting bit 31 of the number in bit 0 of the output struct c, bit 30 in bit 1, etcetera.
After that, the value at the memory location of the struct is returned.

First of all, the posted code has a small bug. The line
return *(unsigned int *)&c;
will not return an accurate number if sizeof(unsigned int) is not equal to sizeof(uint32_t).
That line should be
return *(uint32_t*)&c;
Coming to the question of how it works, I will try to explain it with a smaller type, a uint8_t.
The function
uint8_t reverseBits(uint8_t n) {
struct bs
{
unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
} *b = (bs*)&n,
c =
{
b->_07, b->_06, b->_05, b->_04
, b->_03, b->_02, b->_01, b->_00
};
return *(uint8_t *)&c;
}
uses a local struct. The local struct is defined as:
struct bs
{
unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
};
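That struct has eight members. Each member of the struct is a bit-field of width 1. Together the bit-fields need 8 bits, although sizeof(bs) is typically the size of an unsigned int, because the bit-fields are declared with type unsigned int.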
If you separate the definition of the struct and the variables of that type, the function will be:
uint8_t reverseBits(uint8_t n) {
struct bs
{
unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
};
bs *b = (bs*)&n;
bs c =
{
b->_07, b->_06, b->_05, b->_04
, b->_03, b->_02, b->_01, b->_00
};
return *(uint8_t *)&c;
}
Now, let's say the input to the function is 0xB7, which is 1011 0111 in binary. The line
bs *b = (bs*)&n;
says:
Take the address of n ( &n )
Treat it like it is a pointer of type bs* ( (bs*)&n )
Assign the pointer to a variable. (bs *b =)
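By doing that, we are able to pick each bit of n and get its value through the members of b. Bit-field allocation order is implementation-defined; assuming _00 maps to the most significant bit (many x86 compilers start from the least significant bit instead, which changes the listing below but not the final result), at the end of that line,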
The value of b->_00 is 1
The value of b->_01 is 0
The value of b->_02 is 1
The value of b->_03 is 1
The value of b->_04 is 0
The value of b->_05 is 1
The value of b->_06 is 1
The value of b->_07 is 1
The statement
bs c =
{
b->_07, b->_06, b->_05, b->_04
, b->_03, b->_02, b->_01, b->_00
};
simply creates c such that the bits of c are reversed from the bits of *b.
The line
return *(uint8_t *)&c;
says:
Take the address of c, whose bits form the pattern 1110 1101 (0xED).
Treat it like it is a pointer of type uint8_t*.
Dereference the pointer and return the resulting uint8_t.
That returns a uint8_t whose value is bitwise reversed from the input argument.
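For contrast, here is a portable alternative: a plain shift-and-mask loop, a different technique from the bit-field trick in the question. It does not depend on how the compiler lays out bit-fields or on reinterpreting one type as another (reverse_bits is a hypothetical name):

```cpp
#include <cstdint>

// Shift bits out of n from the right and into the result from the left,
// so bit 0 of n ends up as bit 31 of the result, bit 1 as bit 30, etc.
std::uint32_t reverse_bits(std::uint32_t n) {
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (n & 1u);
        n >>= 1;
    }
    return result;
}
```

With the question's example, reverse_bits(43261596) returns 964176192.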

This isn't exactly obfuscated but a comment or two would assist the innocent. The key is in the middle of the variable declarations, and the first step is to recognize that there is only one line of 'code' here, everything else is variable declarations and initialization.
Between declaration and initialization we find:
} *b = (bs*)&n,
c =
{
This declares a variable 'b' which is a pointer (*) to the struct "bs" just defined. It then casts the address of the function argument 'n', a uint32_t, to the type pointer-to-bs, and assigns it to 'b', effectively overlaying the bit array bs on the uint32_t, much like a union.
A second variable, an actual struct bs, named "c", is then declared, and it is initialized through the pointer 'b'. b->_31 initializes c._00, and so on.
So after "b" and "c" are created, in that order, there's nothing left to do but return the value of "c".
The author of the code, and the compiler, know that after a struct definition ends, variables of that type or related to that type can be declared before the ";", and that's why Thomas Matthews closes his answer with, "The author is letting the compiler reverse the bits."

C struct sizes inconsistence [duplicate]

Possible Duplicate:
How do I find the size of a struct?
Struct varies in memory size?
I am using the following struct for network communication. It creates lots of unnecessary bytes in between,
so its size differs from the expected 8 bytes.
struct HttpPacket {
unsigned char x1;
union {
struct {
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
unsigned char bytes[7];
unsigned long num;
};
};
And the following gives a different size, even though I only removed a field from the union:
struct HttpPacket {
unsigned char x1;
union {
struct {
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
unsigned long num;
};
};
Also, a clearer example:
struct {
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
And it gives a size of 8, instead of 7.
And if I add one more field, it still gives the same size:
struct {
unsigned char EXTRAADDEDFIELD;
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
Can someone please help me resolve this issue?
UPDATE: I need the format to be preserved while transmitting the packet, so I want to avoid this padding.

C makes no guarantees about the size of a struct. Members stay in declaration order, but the compiler is free to insert padding between them and after the last one. Usually, as in this case, it pads so that each member is suitably aligned, since that's fastest on most machines.

Ever heard of alignment and padding?
Basically, to ensure fast access, certain types have to be on certain bounds of memory addresses.
This is called alignment.
To do so, the compiler is allowed to insert unused bytes into your data structure.
This is called padding.
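A small sketch that makes the padding visible with offsetof, using the struct from the question's "clearer example" (the tag name Packet and the function print_layout are added for illustration). On common platforms where unsigned short is 2 bytes with 2-byte alignment, it should show host at offset 2, one padding byte after len, and a total size of 8:

```cpp
#include <cstddef>
#include <cstdio>

struct Packet {
    unsigned char len;
    unsigned short host;
    unsigned char content[4];
};

// Print where the compiler placed each member; any gap between the end
// of one member and the start of the next is padding.
void print_layout() {
    std::printf("len     at offset %zu\n", offsetof(Packet, len));
    std::printf("host    at offset %zu\n", offsetof(Packet, host));
    std::printf("content at offset %zu\n", offsetof(Packet, content));
    std::printf("sizeof(Packet) = %zu\n", sizeof(Packet));
}
```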

By default, structure fields are aligned on natural boundaries. For example, a 4-byte field will start on a 4-byte boundary. The compiler inserts pad bytes to achieve this. You can avoid the padding by using #pragma pack(1) (or #pragma pack(push, 1) ... #pragma pack(pop)) or other similar compiler directives.
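For example, with the push/pop form of the pragma (a compiler extension supported by GCC, Clang and MSVC, not standard C++), the same members pack into 7 bytes when unsigned short is 2 bytes. PackedPacket is an illustrative name:

```cpp
#include <cstddef>

// 1-byte packing: no padding is inserted between or after the members.
#pragma pack(push, 1)
struct PackedPacket {
    unsigned char len;
    unsigned short host;
    unsigned char content[4];
};
#pragma pack(pop)

static_assert(sizeof(PackedPacket) == 1 + sizeof(unsigned short) + 4,
              "PackedPacket should contain no padding");
```

Accessing the members of a packed struct by name is fine, but avoid taking pointers to misaligned members such as host, and remember that the bytes of host are still in the host machine's byte order.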

If you have a C99 compiler and can use the "new" fixed-width types, make an array of uint8_t and do the separation into members yourself:
uint8_t data[8];
x1 = data[0];
len = data[1];
host = data[2] * 256 + data[3]; /* big endian */
content[0] = data[4];
content[1] = data[5];
content[2] = data[6];
content[3] = data[7];
/* ... */
You can follow the same procedure in C89 if you can rely on CHAR_BIT being 8.
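The same approach in both directions, as a sketch in C++: HttpPacketFields, pack_packet and unpack_packet are hypothetical names, and host is sent big-endian as in the snippet above. Because the wire format is produced byte by byte, the padding inside the in-memory struct no longer matters:

```cpp
#include <cstdint>

struct HttpPacketFields {
    std::uint8_t x1;
    std::uint8_t len;
    std::uint16_t host;
    std::uint8_t content[4];
};

// Write the fields into exactly 8 bytes, host in big-endian order.
void pack_packet(const HttpPacketFields& p, std::uint8_t data[8]) {
    data[0] = p.x1;
    data[1] = p.len;
    data[2] = static_cast<std::uint8_t>(p.host >> 8);    // high byte first
    data[3] = static_cast<std::uint8_t>(p.host & 0xFF);  // then low byte
    for (int i = 0; i < 4; ++i) data[4 + i] = p.content[i];
}

// Read the fields back from 8 bytes produced by pack_packet.
HttpPacketFields unpack_packet(const std::uint8_t data[8]) {
    HttpPacketFields p;
    p.x1 = data[0];
    p.len = data[1];
    p.host = static_cast<std::uint16_t>(data[2] * 256 + data[3]);  // big endian
    for (int i = 0; i < 4; ++i) p.content[i] = data[4 + i];
    return p;
}
```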
