How to pack data in binary format in C++

Say I have a binary protocol where the first 4 bits represent a numeric value which can be less than or equal to 10 (ten in decimal).
In C++, the smallest data type available to me is char, which is 8 bits long. So, within my application, I can hold the value represented by 4 bits in a char variable. My question is: if I have to pack the char's value back into 4 bits for network transmission, how do I do that?

You do bitwise operations on the char;
so
unsigned char packedvalue = 0;
packedvalue |= 0xF0 & (7 << 4);
packedvalue |= 0x0F & 10;
This sets the upper 4 bits to 7 and the lower 4 bits to 10.
Unpack them again as:
int upper, lower;
upper = (packedvalue & 0xF0) >> 4;
lower = packedvalue & 0x0F;
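Putting the two halves together, a minimal self-contained sketch of the round trip (same example values as above, assuming the usual 8-bit char):
#include <cassert>

int main()
{
unsigned char packedvalue = 0;
packedvalue |= 0xF0 & (7 << 4); // upper nibble = 7
packedvalue |= 0x0F & 10;       // lower nibble = 10

int upper = (packedvalue & 0xF0) >> 4;
int lower = packedvalue & 0x0F;
assert(upper == 7 && lower == 10);
}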
As an extra answer to the question -- you may also want to look at protocol buffers for a way of encoding and decoding data for binary transfers.

Sure, just use one char for your value:
std::ofstream outfile("thefile.bin", std::ios::binary);
unsigned int n = 10; // at most 10!
char c = n << 4; // fits in the upper nibble
outfile.write(&c, 1); // we wrote the value 10
The lower 4 bits will be left at zero. If they're also used for something, you'll have to populate c fully before writing it. To read:
std::ifstream infile("thefile.bin", std::ios::binary);
char c;
infile.read(&c, 1);
unsigned int n = static_cast<unsigned char>(c) >> 4; // go through unsigned char so the shift is well-behaved

Well, there are the popular but non-portable "bit fields". They're standard-compliant, but may create a different packing order on different platforms. So don't use them.
Then, there are the highly portable bit shifting and bitwise AND and OR operators, which you should prefer. Essentially, you work on a larger field (usually 32 bits, for TCP/IP protocols) and extract or replace subsequences of bits. See Martin's link and Soren's answer for those.
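For illustration, here is a minimal sketch of what extracting and replacing a subsequence of bits in a 32-bit field can look like; get_bits and set_bits are made-up helper names, and the code assumes pos + width <= 32 with width < 32:
#include <cstdint>

// Read 'width' bits starting at bit 'pos' (bit 0 = least significant).
uint32_t get_bits(uint32_t word, unsigned pos, unsigned width)
{
return (word >> pos) & ((uint32_t(1) << width) - 1);
}

// Overwrite those bits with 'value'.
uint32_t set_bits(uint32_t word, unsigned pos, unsigned width, uint32_t value)
{
uint32_t mask = ((uint32_t(1) << width) - 1) << pos;
return (word & ~mask) | ((value << pos) & mask);
}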

Are you familiar with C's bitfields? You simply write
struct my_bits {
unsigned v1 : 4;
...
};
Be warned, various operations are slower on bitfields because the compiler must unpack them for things like addition. I'd imagine unpacking a bitfield will still be faster than the addition operation itself, even though it requires multiple instructions, but it's still overhead. Bitwise operations should remain quite fast. Equality too.
You must also take care with endianness and threads (see the wikipedia article I linked for details, but the issues are kinda obvious). You should learn about endianness anyway, since you said "binary protocol" (see this previous question).
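If you want to see what your compiler actually does with the packing order, a small sketch like this can help (the output is implementation-defined, which is exactly the point):
#include <cstdio>
#include <cstring>

struct my_bits {
unsigned v1 : 4;
unsigned v2 : 4;
};

int main()
{
my_bits b = {0x1, 0x2};
unsigned char raw[sizeof b];
std::memcpy(raw, &b, sizeof b);
// Whether the first byte is 0x21 or 0x12 depends on how your
// compiler allocates bit-fields within the allocation unit.
std::printf("%02x\n", raw[0]);
}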

Related

How to copy part of int64_t to char[4] in C++?

I have a variable:
int64_t label : 40
I want to take the 32 lower bits and put them in a variable of type:
char nol[4]
How can I do that in C++?
Depends on what you mean by "lower" bits. The word "lower" normally implies lower memory address. But that's rarely useful. You may be thinking of least significant instead, which is more commonly useful.
You must also consider what order you want the bytes to be in the array. When copying the lower bytes, you typically want to keep the bytes in the same order as in the integer i.e. native endianness. When copying least significant bytes, you typically want a specific order which may differ from the native endianness i.e. either big or little endian. Big endian is conventionally used in network communication.
If the number of bits to copy is not a multiple of byte size, then copying the incomplete byte adds some complexity.
Copying the lower bytes in native order is very simple:
char nol[32 / CHAR_BIT];
std::memcpy(nol, &label, sizeof nol);
Here is an example of copying least significant bytes in big endian order:
for (std::size_t i = 0; i < sizeof nol; i++) {
nol[sizeof nol - 1 - i] = label >> CHAR_BIT * i & UCHAR_MAX;
}
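Putting both variants into one self-contained sketch (the label value is made up for illustration; variant 2 prints 34 56 78 9a on any host, since it produces a fixed big-endian order):
#include <climits>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
int64_t label = 0x123456789A; // only the low 40 bits are meaningful
char nol[32 / CHAR_BIT];

// Variant 1: lower bytes in native order.
std::memcpy(nol, &label, sizeof nol);

// Variant 2: least significant bytes in big-endian order.
for (unsigned i = 0; i < sizeof nol; i++)
nol[sizeof nol - 1 - i] = label >> CHAR_BIT * i & UCHAR_MAX;

for (unsigned i = 0; i < sizeof nol; i++)
std::printf("%02x ", (unsigned char)nol[i]);
std::printf("\n");
}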

Lower 25 bits of uint64_t

I am trying to extract the lower 25 bits of a uint64_t into a uint32_t. This solution shows how to extract the lower 16 bits from a uint32_t, but I am not able to figure it out for uint64_t. Any help would be appreciated.
See How do you set, clear, and toggle a single bit? for bit operations.
To answer your question:
uint64_t lower25Bits = inputValue & (uint64_t)0x1FFFFFF;
Just mask with a mask that leaves just the bits you care about.
uint32_t out = input & ((1UL<<25)-1);
The idea here is: 1UL<<25 provides an (unsigned long, which is guaranteed to be at least 32-bit wide) integer with only bit 25 set, i.e.
00000010000000000000000000000000
the -1 makes it become a value with all the bits below it set, i.e.:
00000001111111111111111111111111
the AND "lets through" only the bits that in the mask correspond to one.
Another way is to throw away those bits with a double shift:
uint32_t out = (((uint32_t)input)<<7)>>7;
The cast to uint32_t makes sure we are dealing with a 32-bit wide unsigned integer; the unsigned part is important to get well-defined results with shifts (and bitwise operations in general), the 32 bit-wide part because we need a type with known size for this trick to work.
Let's say that (uint32_t)input is
11111111111111111111111111111111
we left shift it by 32-25=7; this throws away the top 7 bits
11111111111111111111111110000000
and we right-shift it back in place:
00000001111111111111111111111111
and there we go, we got just the bottom 25 bits.
Notice that the first uint32_t cast wouldn't be strictly necessary because you already have a known-size unsigned value; you could just do (input<<39)>>39, but (1) I prefer to be sure - what if tomorrow input becomes a type with another size/signedness? and (2) in general current CPUs are more efficient working with 32 bit integers than 64 bit integers.
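A quick sanity-check sketch showing that the mask and the double shift agree:
#include <cassert>
#include <cstdint>

int main()
{
uint64_t input = 0xFFFFFFFFFFFFFFFFull;
uint32_t masked = input & ((1UL << 25) - 1);
uint32_t shifted = (((uint32_t)input) << 7) >> 7;
assert(masked == shifted && masked == 0x1FFFFFF); // just the bottom 25 bits
}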

How can I use 6 bits to store a value?

The data unit (a network packet header) I am currently working on has 2 flags in its definition, stored in a byte field and accessed via bitwise operators. Unfortunately, I need only 2 bits, so I am wondering what I can do with the other 6 bits. Can I use them to store a number, or some internal state code (with a value range smaller than char), rather than just wasting them?
Is there any data type smaller than a byte, and how can I use it in C++? If not, should I waste those bits and leave them without meaning?
You could use a bit field, as described here.
Adapted from that page:
#include <iostream>
struct S {
// 6-bit unsigned field,
// allowed values are 0...63
unsigned int b : 6;
};
int main()
{
S s = {7};
++s.b;
std::cout << s.b << '\n'; // output: 8
}
In C++, there is no datatype smaller than a char, which is - by definition - one byte. However, you do not need a dedicated datatype to access the bits of a value. Bitwise logic and Bitwise Shift operators are sufficient.
If you cannot live with sacrificing 6 bits (this is assuming 8-bit bytes) you might want to consider the std::vector<bool> specialization. Note, though, that there are a number of restrictions and differences to a regular std::vector.
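For illustration, a minimal sketch of std::vector<bool> usage (it packs the bits for you, at the cost of the proxy-reference quirks mentioned above):
#include <cassert>
#include <vector>

int main()
{
std::vector<bool> flags(8); // typically packed into a single byte internally
flags[0] = true;
flags[5] = true;
assert(flags[0] && !flags[1] && flags[5]);
}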
Another option to make individual (consecutive) bits of a datatype accessible by name is to use bit fields:
struct S {
unsigned char flags : 2;
unsigned char state : 6;
};
static_assert( sizeof( S ) == 1, "Packing is implementation-defined." );
This declares a structure that can hold two pieces of information: flags and state, which occupy 2 and 6 bits, respectively. Adjacent bit fields are usually packed together (although this behavior is implementation-defined). Note the unsigned char field type: with unsigned int, the allocation unit on common ABIs is 4 bytes, which would trip the static_assert.
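Usage is then plain member access. A short self-contained sketch (repeating the struct so it compiles on its own):
#include <iostream>

struct S {
unsigned char flags : 2;
unsigned char state : 6;
};

int main()
{
S s = {};
s.flags = 3; // 0...3
s.state = 42; // 0...63
std::cout << (int)s.flags << ' ' << (int)s.state << '\n'; // output: 3 42
}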

Dealing with endianness in C++

I am working on translating a system from Python to C++. I need to be able to perform actions in C++ that are generally performed by using Python's struct.unpack (interpreting binary strings as numerical values). For integer values, I am able to get this to (sort of) work, using the data types in stdint.h:
struct.unpack("i", str) ==> *(int32_t*) str; //str is a char* containing the data
This works properly for little-endian binary strings, but fails on big-endian binary strings. Basically, I need an equivalent to using the > tag in struct.unpack:
struct.unpack(">i", str) ==> ???
Please note, if there is a better way to do this, I am all ears. However, I cannot use C++11, nor any 3rd-party libraries other than Boost. I will also need to be able to interpret floats and doubles, as in struct.unpack(">f", str) and struct.unpack(">d", str), but I'll get to that when I solve this.
NOTE I should point out that the endianness of my machine is irrelevant in this case. I know that the bitstream I receive in my code will ALWAYS be big-endian, and that's why I need a solution that will always cover the big-endian case. The article pointed out by BoBTFish in the comments seems to offer a solution.
For 32 and 16-bit values:
This is exactly the problem you have for network data, which is big-endian. You can use ntohl to turn a 32-bit value into host order, little-endian in your case.
The ntohl() function converts the unsigned integer netlong from network byte order to
host byte order.
int res = ntohl(*((int32_t *) str));
This will also take care of the case where your host is big-endian, in which case it won't do anything.
For 64-bit values
Non-standardly on linux/BSD you can take a look at 64 bit ntohl() in C++?, which points to htobe64
These functions convert the byte encoding of integer values from the byte order that
the current CPU (the "host") uses, to and from little-endian and big-endian byte
order.
For windows try: How do I convert between big-endian and little-endian values in C++?
Which points to _byteswap_uint64, as well as a 16- and 32-bit solution and the gcc-specific __builtin_bswap32/__builtin_bswap64 calls.
Other Sizes
Most systems don't have values that aren't 16/32/64 bits long. At that point I might try to store it in a 64-bit value, shift it and then translate. I'd write some good tests. I suspect it is an uncommon situation and more details would help.
Unpack the string one byte at a time.
unsigned char *str;
unsigned int result;
result = (unsigned int)*str++ << 24; // cast so the byte isn't shifted into the sign bit of an int
result |= (unsigned int)*str++ << 16;
result |= (unsigned int)*str++ << 8;
result |= *str++;
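Since the question also asks about struct.unpack(">f"), the same byte-at-a-time idea extends to floats: assemble the big-endian bytes into a uint32_t, then memcpy the bit pattern into a float. This is a sketch assuming IEEE-754 floats on both ends; unpack_be_float is a made-up name:
#include <stdint.h>
#include <cstring>

float unpack_be_float(const unsigned char *str)
{
uint32_t bits = (uint32_t)str[0] << 24 | (uint32_t)str[1] << 16 | (uint32_t)str[2] << 8 | (uint32_t)str[3];
float f; // assumes sizeof(float) == 4 and IEEE-754 representation
std::memcpy(&f, &bits, sizeof f); // reinterpret the bit pattern
return f;
}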
First, the cast you're doing:
char *str = ...;
int32_t i = *(int32_t*)str;
results in undefined behavior due to the strict aliasing rule (unless str is initialized with something like int32_t x; char *str = (char*)&x;). In practical terms that cast can result in an unaligned read which causes a bus error (a crash) on some platforms and slow performance on others.
Instead you should be doing something like:
int32_t i;
std::memcpy(&i, str, sizeof(i));
There are a number of functions for swapping bytes between the host's native byte ordering and a host-independent ordering: ntoh*(), hton*(), where * is nothing, l, or s for the different types supported. Since different hosts may have different byte orderings, this may be what you want to use if the data you're reading uses a consistent serialized form on all platforms.
i = ntohl(i);
You can also manually move bytes around in str before copying it into the integer.
std::swap(str[0],str[3]);
std::swap(str[1],str[2]);
std::memcpy(&i,str,sizeof(i));
Or you can manually manipulate the integer's value using shifts and bitwise operators.
std::memcpy(&i,str,sizeof(i));
i = (i&0xFFFF0000)>>16 | (i&0x0000FFFF)<<16;
i = (i&0xFF00FF00)>>8 | (i&0x00FF00FF)<<8;
This falls in the realm of bit twiddling.
for (i=0;i<sizeof(struct foo);i++) dst[i] = src[i ^ mask];
where mask == (sizeof type -1) if the stored and native endianness differ.
With this technique one can convert a struct to bit masks:
struct foo {
byte a,b; // mask = 0,0
short e; // mask = 1,1
int g; // mask = 3,3,3,3,
double i; // mask = 7,7,7,7,7,7,7,7
} s; // notice that all members must be aligned according to their native size
Again these masks can be encoded with two bits per symbol: (1<<n)-1, meaning that in 64-bit machines one can encode necessary masks of a 32 byte sized struct in a single constant (with 1,2,4 and 8 byte alignments).
unsigned int mask = 0xffffaa50; // or zero if the endianness matches
for (i = 0; i < 16; i++) {
dst[i] = src[i ^ ((1 << (mask & 3)) - 1)]; mask >>= 2;
}
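As a concrete, self-contained instance of the technique, byte-swapping a single 4-byte integer (mask == sizeof(value) - 1 == 3, i.e. a full byte swap):
#include <cstdio>
#include <cstring>

int main()
{
unsigned int value = 0x11223344;
unsigned char src[sizeof value], dst[sizeof value];
std::memcpy(src, &value, sizeof value);

const unsigned mask = sizeof value - 1; // 3: reverse all four bytes
for (unsigned i = 0; i < sizeof value; i++)
dst[i] = src[i ^ mask];

unsigned int swapped;
std::memcpy(&swapped, dst, sizeof swapped);
std::printf("%08x\n", swapped); // prints 44332211 on any host
}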
If your received values are truly strings (char* or std::string) and you know their format, sscanf() and atoi() (well, really the whole ato*() family) will be your friends. They take well-formatted strings and convert them per passed-in formats (kind of a reverse printf).

How can I have exactly 2 bits in memory?

I should be able to store a value in a data structure that can go from 0 to 3, so I need 2 bits. This data structure will have 2^16 locations, so I want 2^16 * 2 bits in total. What do you use in C++ to have exactly 2 bits in memory?
You need two bits per unit (not three), so you can pack four units into one byte, or 16 units into one 32-bit integer.
So you will need a std::array<uint32_t, 4096> to accommodate 2^16 units of 2-bit values.
You access the nth value as follows:
unsigned int get(std::size_t n, std::array<uint32_t, 4096> const & arr)
{
const uint32_t u = arr[n / 16];
return (u >> (2 * (n % 16))) & 0x3;
}
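A matching setter under the same layout assumptions (set is a made-up name, like get above):
void set(std::size_t n, unsigned int value, std::array<uint32_t, 4096> & arr)
{
const unsigned shift = 2 * (n % 16); // same position math as in get
arr[n / 16] = (arr[n / 16] & ~(uint32_t(0x3) << shift)) | (uint32_t(value & 0x3) << shift);
}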
Alternatively, you could go with a bitfield:
struct BF32 {
uint32_t u0 : 2;
uint32_t u1 : 2;
//...
uint32_t uF : 2;
};
And then make an std::array<BF32, 4096>.
You cannot allocate a single object that is less than 1 byte (because 1 byte is the smallest addressable unit in the system).
You can, however, have portions of a structure that are smaller than a byte by using bitfields. You could create one of these to hold 8 of your values; its size is exactly 3 bytes:
#pragma pack(1) // MSVC requires this
struct three_by_eight {
unsigned value1 : 3;
unsigned value2 : 3;
unsigned value3 : 3;
unsigned value4 : 3;
unsigned value5 : 3;
unsigned value6 : 3;
unsigned value7 : 3;
unsigned value8 : 3;
}
__attribute__ ((packed)) // GCC requires this
;
These can be clumsy to work with since they can't be accessed using []. Your best bet would be to create your own class that works similar to a bitset but works on 3 bits instead of 1.
If you are not working on an embedded system and resources are sufficient, you can have a look at std::bitset<> which will make your job as a programmer easier.
But if you are working on an embedded system, the bitset is probably not good for you (your compiler probably doesn't even support templates). There are a number of techniques for manipulating bits, each with its own quirks; here's an article that might help you:
> http://www.atmel.com/dyn/resources/prod_documents/avr_3_04.pdf
0 to 3 has 4 possible values. Since log2(4) == 2, or because 2^2 == 4, you need two bits, not three.
You might want to use bit fields
There was a discussion on the size allocated to bit-field structs last night. A struct cannot be smaller than a byte, and on most machines and compilers it will be either 2 or 4 bytes, depending on the compiler and word size. So, no, you can't get a 3-bit struct (or the 2-bit one you actually need). You can, however, pack bits yourself into an array of, say, uint64_ts. Or you could make a struct with 16 2-bit members and see if gcc makes that a 4-byte struct, then use an array of those.
There is a very old trick to sneak a couple of bits around if you already have some data structures. It is quite nasty, and unless you have extremely good reasons, it is most likely not a good idea. I'm just pointing it out in case you really, really need to save a couple of bits.
Due to alignment, pointers on x86 or x64 are often multiples of 4, hence the two least significant bits of such pointers (e.g. pointers to int) are always 0. You can exploit this and sneak your two bits in there, but you have to make sure to remove them when accessing those pointers (what happens otherwise depends on the architecture, I'm not sure here).
Again, this is nasty, dangerous and pretty much undefined behavior, but perhaps it is worth it in your case.
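For what it's worth, a minimal sketch of the trick; it assumes the pointee is at least 4-byte aligned, which the assert checks:
#include <cassert>
#include <cstdint>

int main()
{
static int target; // int objects are (at least) 4-byte aligned on x86/x64
uintptr_t raw = (uintptr_t)&target;
assert((raw & 0x3) == 0); // the two low bits are free

uintptr_t tagged = raw | 0x2; // sneak a 2-bit value (here: 2) into the pointer
unsigned bits = tagged & 0x3; // recover the 2 bits
int *clean = (int *)(tagged & ~(uintptr_t)0x3); // strip the tag before any dereference
assert(bits == 2 && clean == &target);
}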
3^5 = 243
so five base-3 entries fit in 8 bits (note this only helps if your values actually need just three states, not four: 4^5 = 1024 does not fit in a byte). You spend about 20% less space storing lots of data this way. All you need is a lookup table for lookups and manipulations in both directions.
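If three states per entry really are enough, here is a sketch of the base-3 packing; pack3/unpack3 are made-up names, and a real implementation would use the lookup tables suggested above:
#include <cassert>

unsigned char pack3(const unsigned char v[5]) // each v[i] in 0..2
{
unsigned char b = 0;
for (int i = 4; i >= 0; i--)
b = b * 3 + v[i]; // at most 242, fits in a byte
return b;
}

void unpack3(unsigned char b, unsigned char v[5])
{
for (int i = 0; i < 5; i++) { v[i] = b % 3; b /= 3; }
}

int main()
{
unsigned char in[5] = {2, 0, 1, 2, 1}, out[5];
unpack3(pack3(in), out);
for (int i = 0; i < 5; i++) assert(in[i] == out[i]);
}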