C/C++ packing signed char into int - c++

I need to pack four signed bytes into a 32-bit integral type.
This is what I came up with:
int32_t byte(int8_t c) { return (unsigned char)c; }
int pack(char c0, char c1, ...) {
return byte(c0) | byte(c1) << 8 | ...;
}
Is this a good solution? Is it portable (not in the communication sense)?
Is there a ready-made solution, perhaps in Boost?
The issue I am mostly concerned about is what happens to the bits of a negative value when converting from char to int; I do not know what the correct behavior should be.
Thanks

char isn't guaranteed to be signed or unsigned (on PowerPC Linux, char defaults to unsigned). Spread the word!
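If you want to check at build time which signedness your platform actually picked, a quick probe (my addition, not part of the original answer) is:
#include <limits.h>
#if CHAR_MIN == 0
/* char is unsigned on this platform */
#else
/* char is signed on this platform */
#endif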
What you want is something like this macro:
#include <stdint.h> /* Needed for uint32_t and uint8_t */
#define PACK(c0, c1, c2, c3) \
(((uint32_t)(uint8_t)(c0) << 24) | \
((uint32_t)(uint8_t)(c1) << 16) | \
((uint32_t)(uint8_t)(c2) << 8) | \
((uint32_t)(uint8_t)(c3)))
It's ugly mainly because it doesn't play well with C's order of operations. Also, the backslash-returns are there so this macro doesn't have to be one big long line.
Also, the reason we cast to uint8_t before casting to uint32_t is to prevent unwanted sign extension.
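For instance (a hypothetical illustration, assuming char is signed on your platform), packing the value -1 without the intermediate uint8_t cast would sign-extend and clobber the other bytes:
uint32_t bad  = (uint32_t)(char)-1;           /* 0xFFFFFFFF: sign extension spills into the high bytes */
uint32_t good = (uint32_t)(uint8_t)(char)-1;  /* 0x000000FF: only the low byte survives */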

I liked Joey Adams' answer except for the fact that it is written with macros (which cause real pain in many situations) and the compiler will not give you a warning if 'char' isn't 8 bits wide. This is my solution (based on Joey's).
#include <cstdint>

inline uint32_t PACK(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3;
}
inline uint32_t PACK(int8_t c0, int8_t c1, int8_t c2, int8_t c3) {
    return PACK((uint8_t)c0, (uint8_t)c1, (uint8_t)c2, (uint8_t)c3);
}
I've omitted casting c0-c3 to uint32_t, as the compiler will promote them for you when shifting, and I used C-style casts as they work for either C or C++ (the OP tagged both). One caveat: the promotion is only to int, so strictly speaking c0 << 24 can overflow a 32-bit int when c0 >= 128; adding an explicit (uint32_t) cast on c0 avoids that.
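A quick usage sketch (my own, not part of the answer), assuming both overloads above are in scope:
int8_t a = 7, b = 5, c = 6, d = -7;
uint32_t p = PACK(a, b, c, d);   // the int8_t overload converts -7 to 0xF9 before packing
// p == 0x070506F9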

You can avoid casts with implicit conversions:
uint32_t pack_helper(uint32_t c0, uint32_t c1, uint32_t c2, uint32_t c3) {
return c0 | (c1 << 8) | (c2 << 16) | (c3 << 24);
}
uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
return pack_helper(c0, c1, c2, c3);
}
The idea is that you see "convert all the parameters correctly. Shift and combine them", rather than "for each parameter, convert it correctly, shift and combine it". Not much in it, though.
Then:
template <int N>
uint8_t unpack_u(uint32_t packed) {
// cast to avoid potential warnings for implicit narrowing conversion
return static_cast<uint8_t>(packed >> (N*8));
}
template <int N>
int8_t unpack_s(uint32_t packed) {
uint8_t r = unpack_u<N>(packed);
return (r <= 127 ? r : r - 256); // thanks to caf
}
int main() {
uint32_t x = pack(4,5,6,-7);
std::cout << (int)unpack_u<0>(x) << "\n";
std::cout << (int)unpack_s<1>(x) << "\n";
std::cout << (int)unpack_u<3>(x) << "\n";
std::cout << (int)unpack_s<3>(x) << "\n";
}
Output:
4
5
249
-7
This is as portable as the uint32_t, uint8_t and int8_t types. None of them is required in C99, and the header stdint.h isn't defined in C++ or C89. If the types exist and meet the C99 requirements, though, the code will work. Of course in C the unpack functions would need a function parameter instead of a template parameter. You might prefer that in C++ too if you want to write short loops for unpacking.
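In C (or if you simply prefer loops), a runtime-index variant along the same lines might look like this (my sketch, not part of the original answer):
uint8_t unpack_u(uint32_t packed, int n) {   /* n in 0..3 */
    return (uint8_t)(packed >> (n * 8));
}
int8_t unpack_s(uint32_t packed, int n) {
    uint8_t r = unpack_u(packed, n);
    return (int8_t)(r <= 127 ? r : r - 256);
}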
To address the fact that the types are optional, you could use uint_least32_t, which is required in C99. Similarly uint_least8_t and int_least8_t. You would have to change the code of pack_helper and unpack_u:
uint_least32_t mask(uint_least32_t x) { return x & 0xFF; }
uint_least32_t pack_helper(uint_least32_t c0, uint_least32_t c1, uint_least32_t c2, uint_least32_t c3) {
return mask(c0) | (mask(c1) << 8) | (mask(c2) << 16) | (mask(c3) << 24);
}
template <int N>
uint_least8_t unpack_u(uint_least32_t packed) {
// cast to avoid potential warnings for implicit narrowing conversion
return static_cast<uint_least8_t>(mask(packed >> (N*8)));
}
To be honest this is unlikely to be worth it - chances are the rest of your application is written on the assumption that int8_t etc do exist. It's a rare implementation that doesn't have an 8 bit and a 32 bit 2's complement type.

"Goodness"
IMHO, this is the best solution you're going to get for this. EDIT: though I'd use static_cast<unsigned int> instead of the C-style cast, and I'd probably not use a separate method to hide the cast....
Portability:
There is going to be no portable way to do this because nothing says char has to be eight bits, and nothing says unsigned int needs to be 4 bytes wide.
Furthermore, if you then exchange the packed value as raw bytes, you're relying on endianness: data packed on one architecture will not be directly usable on one with the opposite byte order.
is there a ready-made solution, perhaps boost?
None that I am aware of.

This is based on Grant Peters and Joey Adams' answers, extended to show how to unpack the signed values (the unpack functions rely on the modulo rules of unsigned values in C):
(As Steve Jessop noted in comments, there is no need for separate pack_s and pack_u functions).
inline uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3)
{
return ((uint32_t)c0 << 24) | ((uint32_t)c1 << 16) |
((uint32_t)c2 << 8) | (uint32_t)c3;
}
inline uint8_t unpack_c3_u(uint32_t p)
{
return p >> 24;
}
inline uint8_t unpack_c2_u(uint32_t p)
{
return p >> 16;
}
inline uint8_t unpack_c1_u(uint32_t p)
{
return p >> 8;
}
inline uint8_t unpack_c0_u(uint32_t p)
{
return p;
}
inline int8_t unpack_c3_s(uint32_t p)
{
    int t = unpack_c3_u(p);
    return t <= 127 ? t : t - 256;
}
inline int8_t unpack_c2_s(uint32_t p)
{
    int t = unpack_c2_u(p);
    return t <= 127 ? t : t - 256;
}
inline int8_t unpack_c1_s(uint32_t p)
{
    int t = unpack_c1_u(p);
    return t <= 127 ? t : t - 256;
}
inline int8_t unpack_c0_s(uint32_t p)
{
    int t = unpack_c0_u(p);
    return t <= 127 ? t : t - 256;
}
(These are necessary rather than simply casting back to int8_t, because the latter may cause an implementation-defined signal to be raised if the value is over 127, so it's not strictly portable).
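A small round-trip check using the functions above (my own illustration, not from the answer):
uint32_t p = pack(0x12, 0x34, 0xFA, 0x80);   /* c0 = 0x12 ends up in the top byte */
/* unpack_c3_u(p) == 0x12, unpack_c0_u(p) == 0x80 */
/* unpack_c1_s(p) == -6 (0xFA), unpack_c0_s(p) == -128 (0x80) */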

You could also let the compiler do the work for you.
union packedchars {
    struct {
        char v1, v2, v3, v4;
    };
    int data;
};
packedchars value;
value.data = 0;
value.v1 = 'a';
value.v2 = 'b';
Etc.
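A fuller sketch of that idea (mine, not the answerer's): note that reading a union member other than the one last written is technically undefined behaviour in C++ (most compilers support it as an extension), and the byte order inside data depends on the machine's endianness. I've named the inner struct to stay within standard C++:
#include <cstdint>

union packedchars {
    struct { char v1, v2, v3, v4; } bytes;
    int32_t data;
};

int main() {
    packedchars value;
    value.data = 0;
    value.bytes.v1 = 'a';
    value.bytes.v2 = 'b';
    // value.data now holds 'a' and 'b' in the two lowest-addressed bytes;
    // which end of the integer they occupy depends on endianness.
}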

Related

Type-pun uint64_t as two uint32_t in C++20

This code to read a uint64_t as two uint32_t is UB due to the strict aliasing rule:
uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];
Likewise, this code to write the upper and lower part of an uint64_t is UB due to the same reason:
uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;
*lower = 1;
*upper = 1;
How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?
Using std::bit_cast:
#include <bit>
#include <array>
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
std::cout << std::hex << v[0] << " " << v[1] << std::endl;
// Convert two u32 -> one u64
auto y = std::bit_cast<uint64_t>(v);
std::cout << std::hex << y << std::endl;
}
Output:
87654321 12345678
1234567887654321
std::bit_cast is available only in C++20. Prior to C++20 you can implement it manually on top of std::memcpy, with the caveat that such an implementation is not constexpr, unlike the C++20 version:
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_constructible_v

template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    //return std::bit_cast<To>(src);
    static_assert(std::is_trivially_constructible_v<To>,
        "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}
For this specific case of integers, plain shift/or arithmetic to convert one u64 into two u32 and back again is close to optimal. std::bit_cast is more generic, supporting any trivially constructible type, but on modern compilers at high optimization levels the std::bit_cast solution should be just as efficient as the bit arithmetic.
One extra benefit of the bit arithmetic is that it is endianness independent, unlike std::bit_cast.
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
std::cout << std::hex << lo << " " << hi << std::endl;
// Convert two u32 -> one u64
uint64_t y = (uint64_t(hi) << 32) | lo;
std::cout << std::hex << y << std::endl;
}
Output:
87654321 12345678
1234567887654321
Notice! As #Jarod42 points out, the bit-shifting solution is not equivalent to the memcpy/bit_cast solution; whether they agree depends on endianness. On a little-endian CPU, memcpy/bit_cast puts the least significant half (lo) in array element v[0] and the most significant half (hi) in v[1], while on big endian the least significant half goes to v[1] and the most significant to v[0]. The bit-shifting solution is endianness independent: on all systems it gives the most significant half as uint32_t(num_64 >> 32) and the least significant half as uint32_t(num_64).
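If you do use std::bit_cast and want lo/hi regardless of byte order, a small endian check restores portability (my addition; it assumes the platform is either little or big endian):
auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
uint32_t lo = (std::endian::native == std::endian::little) ? v[0] : v[1];
uint32_t hi = (std::endian::native == std::endian::little) ? v[1] : v[0];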
in a safe and clean way
Do not use reinterpret_cast. Do not depend on unclear code whose meaning hinges on specific compiler settings and murky, uncertain behavior. Use exact arithmetic operations with a well-defined result. Classes and operator overloads are all there waiting for you. For example, some global functions:
#include <iostream>
struct UpperUint64Ref {
uint64_t &v;
UpperUint64Ref(uint64_t &v) : v(v) {}
UpperUint64Ref operator=(uint32_t a) {
v &= 0x00000000ffffffffull;
v |= (uint64_t)a << 32;
return *this;
}
operator uint64_t() {
return v;
}
};
struct LowerUint64Ref {
uint64_t &v;
LowerUint64Ref(uint64_t &v) : v(v) {}
/* as above */
};
UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }
int main() {
uint64_t v;
upper(v) = 1;
}
Or interface object:
#include <iostream>
struct Uint64Ref {
uint64_t &v;
Uint64Ref(uint64_t &v) : v(v) {}
struct UpperReference {
uint64_t &v;
UpperReference(uint64_t &v) : v(v) {}
UpperReference operator=(uint32_t a) {
    v &= 0x00000000ffffffffull;
    v |= (uint64_t)a << 32u;
    return *this;
}
};
UpperReference upper() {
return v;
}
struct LowerReference {
uint64_t &v;
LowerReference(uint64_t &v) : v(v) {}
};
LowerReference lower() { return v; }
};
int main() {
uint64_t v;
Uint64Ref r{v};
r.upper() = 1;
}
Using std::memcpy
#include <cstdint>
#include <cstring>
void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
sizeof(low_val));
std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
&high_val, sizeof(high_val));
}
int main() {
uint64_t v = 0;
foo(v, 1, 2);
}
With O1, the compiler reduces foo to:
mov DWORD PTR [rdi], esi
mov DWORD PTR [rdi+4], edx
ret
Meaning there are no extra copies made, std::memcpy just serves as a hint to the compiler.
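Reading the halves back can be done the same way (my sketch, with a hypothetical function name; note that, unlike the shift approach, which half sits at which offset depends on the machine's byte order):
void read_halves(const uint64_t& v, uint32_t& low_val, uint32_t& high_val) {
    std::memcpy(&low_val, &v, sizeof(low_val));
    std::memcpy(&high_val, reinterpret_cast<const unsigned char*>(&v) + sizeof(low_val),
                sizeof(high_val));
}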
std::bit_cast alone is not enough since results will vary by the endian of the system.
Fortunately <bit> also contains std::endian.
Keeping in mind that optimizers generally compile-time resolve ifs that are always true or false, we can test endianness and act accordingly.
We only know beforehand how to handle big or little-endian. If it is not one of those, bit_cast results are not decodable.
Another factor that can spoil things is padding. Using bit_cast assumes 0 padding between array elements.
So we can check if there is no padding and the endianness is big or little to see if it is castable.
If it is not castable, we do a bunch of shifts as per the old method.
(this can be slow)
If the endianness is big -- return the results of bit_cast.
If the endianness is little -- reverse the order. Not the same as c++23 byteswap, as we swap elements.
I arbitrarily decided that big-endian has the correct order with the high bits at x[0].
#include <bit>
#include <array>
#include <cstdint>
#include <climits>
#include <concepts>
template <std::integral F, std::integral T>
requires (sizeof(F) >= sizeof(T))
constexpr auto split(F x) {
enum consts {
FBITS=sizeof(F)*CHAR_BIT,
TBITS=sizeof(T)*CHAR_BIT,
ELEM=sizeof(F)/sizeof(T),
BASE=FBITS-TBITS,
MASK=~0ULL >> BASE
};
using split=std::array<T, ELEM>;
const bool is_big=std::endian::native==std::endian::big;
const bool is_little=std::endian::native==std::endian::little;
const bool can_cast=((is_big || is_little)
&& (sizeof(F) == sizeof(split)));
// All the following `if`s should be eliminated at compile time
// since they are always true or always false
if (!can_cast)
{
split ret;
for (int e = 0; e < ELEM; ++e)
{
ret[e]=(x>>(BASE-e*TBITS)) & MASK;
}
return ret;
}
split tmp=std::bit_cast<split>(x);
if (is_big)
{
return tmp;
}
split ret;
for (int e=0; e < ELEM; ++e)
{
ret[e]=tmp[ELEM-(e+1)];
}
return ret;
}
auto tst(uint64_t x, int y)
{
return split<decltype(x), uint32_t>(x)[y];
}
I believe this should be defined behavior.
EDIT: changed uint64 base to template parameter and minor edit tweaks
Don't bother, because arithmetic is faster anyway:
uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;
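And the reverse direction (my addition) is just as direct:
v = (uint64_t(upper) << 32) | lower;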

Is memcpy the standard way to pack float into uint32?

Is the following the best way to pack a float's bits into a uint32? This might be a fast and easy yes, but I want to make sure there's no better way, or that exchanging the value between processes doesn't introduce a weird wrinkle.
"Best" in my case, is that it won't ever break on a compliant C++ compiler (given the static assert), can be packed and unpacked between two processes on the same computer, and is as fast as copying a uint32 into another uint32.
Process A:
static_assert(sizeof(float) == sizeof(uint32) && alignof(float) == alignof(uint32), "no");
...
float f = 0.5f;
uint32 buffer[128];
memcpy(buffer + 41, &f, sizeof(uint32)); // packing
Process B:
uint32 * buffer = thisUint32Is_ReadFromProcessA(); // reads "buffer" from process A
...
memcpy(&f, buffer + 41, sizeof(uint32)); // unpacking
assert(f == 0.5f);
Yes, this is the standard way to do type punning. Cppreference's page on memcpy even includes an example showing how you can use it to reinterpret a double as an int64_t.
#include <iostream>
#include <cstdint>
#include <cstring>
int main()
{
// simple usage
char source[] = "once upon a midnight dreary...", dest[4];
std::memcpy(dest, source, sizeof dest);
for (char c : dest)
std::cout << c << '\n';
// reinterpreting
double d = 0.1;
// std::int64_t n = *reinterpret_cast<std::int64_t*>(&d); // aliasing violation
std::int64_t n;
std::memcpy(&n, &d, sizeof d); // OK
std::cout << std::hexfloat << d << " is " << std::hex << n
<< " as an std::int64_t\n";
}
Output:
o
n
c
e
0x1.999999999999ap-4 is 3fb999999999999a as an std::int64_t
As long as the asserts pass (you are writing and reading the correct number of bytes) then the operation is safe. You can't pack a 64-bit object into a 32-bit object, but you can pack one 32-bit object into another 32-bit object, as long as they are trivially copyable.
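In C++20 the same reinterpretation can also be written with std::bit_cast, which is constexpr-friendly and arguably clearer about intent (my note, not part of the answer):
#include <bit>
#include <cstdint>

static_assert(sizeof(float) == sizeof(std::uint32_t));

std::uint32_t pack_bits(float f)           { return std::bit_cast<std::uint32_t>(f); }
float         unpack_bits(std::uint32_t u) { return std::bit_cast<float>(u); }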
Or this:
union TheUnion {
uint32 theInt;
float theFloat;
};
TheUnion converter;
converter.theFloat = myFloatValue;
uint32 myIntRep = converter.theInt;
I don't know if this is better, but it's a different way to look at it.

Creating constructor for a struct(union) in C++

What is the best way to create a constructor for a struct (which has a union member; does it matter?) to convert a uint8_t into the struct?
Here is my example to clarify more:
struct twoSixByte
{
union {
uint8_t fullByte;
struct
{
uint8_t twoPart : 2;
uint8_t sixPart : 6;
} bits;
};
};
uint32_t extractByte(twoSixByte mixedByte){
return mixedByte.bits.twoPart * mixedByte.bits.sixPart;
}
uint8_t tnum = 182;
print(extractByte(tnum)); // must print 2 * 54 = 108
P.S.
From the comments and answers, I found that type punning through unions is not allowed in C++.
The solutions given are a little complicated, especially where there are lots of these structures in the code. There are even situations where a byte is divided into multiple bit parts (more than two). So giving up unions and instead using bitsets and shifting bits adds a lot of burden to the code.
Instead, I managed for a much simpler solution. I just converted the type before passing it to the function. Here is the fixed code:
struct twoSixByte
{
union {
uint8_t fullByte;
struct
{
uint8_t twoPart : 2;
uint8_t sixPart : 6;
} bits;
};
};
uint32_t extractByte(twoSixByte mixedByte){
return mixedByte.bits.twoPart * mixedByte.bits.sixPart;
}
uint8_t tnum = 182;
twoSixByte mixedType;
mixedType.fullByte = tnum;
print(extractByte(mixedType)); // must print 2 * 54 = 108
Unless there is a pressing need for you to use a union, don't use it. Simplify your class to:
struct twoSixByte
{
twoSixByte(uint8_t in) : twoPart((in & 0xC0) >> 6), sixPart(in & 0x3F) {}
uint8_t twoPart : 2;
uint8_t sixPart : 6;
};
If there is a need to get the full byte, you can use:
uint8_t fullByte(twoSixByte mixedByte)
{
return ((mixedByte.twoPart << 6) | mixedByte.sixPart);
}
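Usage then looks like this (my illustration, assuming the definitions above):
twoSixByte t(182);                    // 182 == 0b10110110
uint32_t r = t.twoPart * t.sixPart;   // 2 * 54 == 108
uint8_t back = fullByte(t);           // 182 again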
You could avoid the union and type punning and use a struct with the relevant member function. Note that we don't need a constructor if the struct is regarded as an aggregate to be initialized:
#include <cstdint>
struct twoSixByte {
uint8_t fullByte; // no constructor needed, initializing as an aggregate
uint32_t extractByte(){
return ((fullByte & 0b1100'0000) >> 6) * (fullByte & 0b0011'1111);
}
};
int main()
{
twoSixByte tnum{182};
auto test = tnum.extractByte(); // test == 2 * 54 == 108
}

C/C++ efficient bit array

Can you recommend efficient/clean way to manipulate arbitrary length bit array?
Right now I am using regular int/char bitmask, but those are not very clean when array length is greater than datatype length.
std::vector<bool> is not available for me.
Since you mention C as well as C++, I'll assume that a C++-oriented solution like boost::dynamic_bitset might not be applicable, and talk about a low-level C implementation instead. Note that if something like boost::dynamic_bitset works for you, or there's a pre-existing C library you can find, then using them can be better than rolling your own.
Warning: None of the following code has been tested or even compiled, but it should be very close to what you'd need.
To start, assume you have a fixed bitset size N. Then something like the following works:
typedef uint32_t word_t;
enum { WORD_SIZE = sizeof(word_t) * 8 };
word_t data[N / 32 + 1];
inline int bindex(int b) { return b / WORD_SIZE; }
inline int boffset(int b) { return b % WORD_SIZE; }
void set_bit(int b) {
data[bindex(b)] |= 1 << (boffset(b));
}
void clear_bit(int b) {
data[bindex(b)] &= ~(1 << (boffset(b)));
}
int get_bit(int b) {
    return data[bindex(b)] & (1 << boffset(b));
}
void clear_all() { /* set all elements of data to zero */ }
void set_all() { /* set all elements of data to one */ }
As written, this is a bit crude since it implements only a single global bitset with a fixed size. To address these problems, you want to start with a data structure something like the following:
struct bitset { word_t *words; int nwords; };
and then write functions to create and destroy these bitsets.
struct bitset *bitset_alloc(int nbits) {
struct bitset *bitset = malloc(sizeof(*bitset));
bitset->nwords = (nbits / WORD_SIZE + 1);
bitset->words = malloc(sizeof(*bitset->words) * bitset->nwords);
bitset_clear(bitset);
return bitset;
}
void bitset_free(struct bitset *bitset) {
free(bitset->words);
free(bitset);
}
Now, it's relatively straightforward to modify the previous functions to take a struct bitset * parameter. There's still no way to re-size a bitset during its lifetime, nor is there any bounds checking, but neither would be hard to add at this point.
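For completeness, here is roughly what the accessors look like once they take the struct (my sketch, untested like the rest of the answer, and with hypothetical names):
void bitset_set_bit(struct bitset *bs, int b) {
    bs->words[bindex(b)] |= (word_t)1 << boffset(b);
}
void bitset_clear_bit(struct bitset *bs, int b) {
    bs->words[bindex(b)] &= ~((word_t)1 << boffset(b));
}
int bitset_get_bit(struct bitset *bs, int b) {
    return (bs->words[bindex(b)] >> boffset(b)) & 1;
}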
boost::dynamic_bitset if the length is only known in run time.
std::bitset if the length is known in compile time (although arbitrary).
I've written a working implementation based off Dale Hagglund's response to provide a bit array in C (BSD license).
https://github.com/noporpoise/BitArray/
Please let me know what you think / give suggestions. I hope people looking for a response to this question find it useful.
This posting is rather old, but there is an efficient bit array suite in C in my ALFLB library.
For many microcontrollers without a hardware-division opcode, this library is EFFICIENT because it doesn't use division: instead, masking and bit-shifting are used. (Yes, I know some compilers will convert division by 8 to a shift, but this varies from compiler to compiler.)
It has been tested on arrays up to 2^32-2 bits (about 4 billion bits stored in 536 MBytes), although last 2 bits should be accessible if not used in a for-loop in your application.
See below for an extract from the doco. Doco is http://alfredo4570.net/src/alflb_doco/alflb.pdf, library is http://alfredo4570.net/src/alflb.zip
Enjoy,
Alf
//------------------------------------------------------------------
BM_DECLARE( arrayName, bitmax);
Macro to instantiate an array to hold bitmax bits.
//------------------------------------------------------------------
UCHAR *BM_ALLOC( BM_SIZE_T bitmax);
mallocs an array (of unsigned char) to hold bitmax bits.
Returns: NULL if memory could not be allocated.
//------------------------------------------------------------------
void BM_SET( UCHAR *bit_array, BM_SIZE_T bit_index);
Sets a bit to 1.
//------------------------------------------------------------------
void BM_CLR( UCHAR *bit_array, BM_SIZE_T bit_index);
Clears a bit to 0.
//------------------------------------------------------------------
int BM_TEST( UCHAR *bit_array, BM_SIZE_T bit_index);
Returns: TRUE (1) or FALSE (0) depending on a bit.
//------------------------------------------------------------------
int BM_ANY( UCHAR *bit_array, int value, BM_SIZE_T bitmax);
Returns: TRUE (1) if array contains the requested value (i.e. 0 or 1).
//------------------------------------------------------------------
UCHAR *BM_ALL( UCHAR *bit_array, int value, BM_SIZE_T bitmax);
Sets or clears all elements of a bit array to your value. Typically used after a BM_ALLOC.
Returns: Copy of address of bit array
//------------------------------------------------------------------
void BM_ASSIGN( UCHAR *bit_array, int value, BM_SIZE_T bit_index);
Sets or clears one element of your bit array to your value.
//------------------------------------------------------------------
BM_MAX_BYTES( int bit_max);
Utility macro to calculate the number of bytes to store bitmax bits.
Returns: A number specifying the number of bytes required to hold bitmax bits.
//------------------------------------------------------------------
You can use std::bitset
#include <bitset>
#include <iostream>
using namespace std;

int main() {
const bitset<12> mask(2730ul);
cout << "mask = " << mask << endl;
bitset<12> x;
cout << "Enter a 12-bit bitset in binary: " << flush;
if (cin >> x) {
cout << "x = " << x << endl;
cout << "As ulong: " << x.to_ulong() << endl;
cout << "And with mask: " << (x & mask) << endl;
cout << "Or with mask: " << (x | mask) << endl;
}
}
I know it's an old post but I came here to find a simple C bitset implementation and none of the answers quite matched what I was looking for, so I implemented my own based on Dale Hagglund's answer. Here it is :)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef uint32_t word_t;
enum { BITS_PER_WORD = 32 };
struct bitv { word_t *words; int nwords; int nbits; };
struct bitv* bitv_alloc(int bits) {
struct bitv *b = malloc(sizeof(struct bitv));
if (b == NULL) {
fprintf(stderr, "Failed to alloc bitv\n");
exit(1);
}
b->nwords = (bits >> 5) + 1;
b->nbits = bits;
b->words = malloc(sizeof(*b->words) * b->nwords);
if (b->words == NULL) {
fprintf(stderr, "Failed to alloc bitv->words\n");
exit(1);
}
memset(b->words, 0, sizeof(*b->words) * b->nwords);
return b;
}
static inline void check_bounds(struct bitv *b, int bit) {
if (b->nbits < bit) {
fprintf(stderr, "Attempted to access a bit out of range\n");
exit(1);
}
}
void bitv_set(struct bitv *b, int bit) {
check_bounds(b, bit);
b->words[bit >> 5] |= 1 << (bit % BITS_PER_WORD);
}
void bitv_clear(struct bitv *b, int bit) {
check_bounds(b, bit);
b->words[bit >> 5] &= ~(1 << (bit % BITS_PER_WORD));
}
int bitv_test(struct bitv *b, int bit) {
check_bounds(b, bit);
return b->words[bit >> 5] & (1 << (bit % BITS_PER_WORD));
}
void bitv_free(struct bitv *b) {
if (b != NULL) {
if (b->words != NULL) free(b->words);
free(b);
}
}
void bitv_dump(struct bitv *b) {
if (b == NULL) return;
for(int i = 0; i < b->nwords; i++) {
word_t w = b->words[i];
for (int j = 0; j < BITS_PER_WORD; j++) {
printf("%d", w & 1);
w >>= 1;
}
printf(" ");
}
printf("\n");
}
void test(struct bitv *b, int bit) {
if (bitv_test(b, bit)) printf("Bit %d is set!\n", bit);
else printf("Bit %d is not set!\n", bit);
}
int main(int argc, char *argv[]) {
struct bitv *b = bitv_alloc(32);
bitv_set(b, 1);
bitv_set(b, 3);
bitv_set(b, 5);
bitv_set(b, 7);
bitv_set(b, 9);
bitv_set(b, 32);
bitv_dump(b);
bitv_free(b);
return 0;
}
I use this one:
//#include <bitset>
#include <iostream>
//source http://stackoverflow.com/questions/47981/how-do-you-set-clear-and-toggle-a-single-bit-in-c
#define BIT_SET(a,b) ((a) |= (1<<(b)))
#define BIT_CLEAR(a,b) ((a) &= ~(1<<(b)))
#define BIT_FLIP(a,b) ((a) ^= (1<<(b)))
#define BIT_CHECK(a,b) ((a) & (1<<(b)))
/* x=target variable, y=mask */
#define BITMASK_SET(x,y) ((x) |= (y))
#define BITMASK_CLEAR(x,y) ((x) &= (~(y)))
#define BITMASK_FLIP(x,y) ((x) ^= (y))
#define BITMASK_CHECK(x,y) ((x) & (y))
I have recently released BITSCAN, a C++ bit string library which is specifically oriented towards fast bit scanning operations. BITSCAN is available here. It is in alpha but still pretty well tested since I have used it in recent years for research in combinatorial optimization (e.g. in BBMC, a state of the art exact maximum clique algorithm). A comparison with other well known C++ implementations (STL or BOOST) may be found here.
I hope you find it useful. Any feedback is welcome.
In microcontroller development, we sometimes need a 2-dimensional array (matrix) whose elements hold only the values 0 or 1. Using one byte per element wastes memory greatly (the memory of a microcontroller is very limited). The proposed solution is to use a 1-bit matrix (the element type is 1 bit).
http://htvdanh.blogspot.com/2016/09/one-bit-matrix-for-cc-programming.html
I recently implemented a small header-only library called BitContainer just for this purpose.
It focuses on expressiveness and compile-time abilities and can be found here:
https://github.com/EddyXorb/BitContainer
It is for sure not the classical way to look at bitarrays but can come in handy for strong-typing purposes and memory efficient representation of named properties.
Example:
constexpr Props props(Prop::isHigh(), Prop::isLow()); // initialize a BitContainer of type Props with the strong type Prop
constexpr bool result1 = props.contains(Prop::isTiny()); // false
constexpr bool result2 = props.contains(Prop::isLow());  // true

Binary literals?

In code, I sometimes see people specify constants in hex format like this:
const int has_nukes = 0x0001;
const int has_bio_weapons = 0x0002;
const int has_chem_weapons = 0x0004;
// ...
int arsenal = has_nukes | has_bio_weapons | has_chem_weapons; // all of them
if (arsenal & has_bio_weapons) {
    std::cout << "BIO!!";
}
But it doesn't make sense to me to use the hex format here. Is there a way to do it directly in binary? Something like this:
const int has_nukes = 0b00000000000000000000000000000001;
const int has_bio_weapons = 0b00000000000000000000000000000010;
const int has_chem_weapons = 0b00000000000000000000000000000100;
// ...
I know the C/C++ compilers won't compile this, but there must be a workaround? Is it possible in other languages like Java?
In C++14 you will be able to use binary literals with the following syntax:
0b010101010 /* more zeros and ones */
This feature is already implemented in the latest clang and gcc. You can try it if you run those compilers with the -std=c++1y option.
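With that flag, the constants from the question can be written directly, for example:
const int has_nukes        = 0b0001;
const int has_bio_weapons  = 0b0010;
const int has_chem_weapons = 0b0100;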
I'd use a bit shift operator:
const int has_nukes = 1<<0;
const int has_bio_weapons = 1<<1;
const int has_chem_weapons = 1<<2;
// ...
int dangerous_mask = has_nukes | has_bio_weapons | has_chem_weapons;
bool is_dangerous = (country->flags & dangerous_mask) == dangerous_mask;
It is even better than flood of 0's.
By the way, the next C++ version will support user-defined literals. They are already included in the working draft. This allows that sort of stuff (let's hope I don't have too many errors in it):
template<char... digits>
constexpr int operator "" _b() {
return conv2bin<digits...>::value;
}
int main() {
int const v = 110110110_b;
}
conv2bin would be a template like this:
template<char... digits>
struct conv2bin;
template<char high, char... digits>
struct conv2bin<high, digits...> {
static_assert(high == '0' || high == '1', "no bin num!");
static int const value = (high - '0') * (1 << sizeof...(digits)) +
conv2bin<digits...>::value;
};
template<char high>
struct conv2bin<high> {
static_assert(high == '0' || high == '1', "no bin num!");
static int const value = (high - '0');
};
Well, what we get are binary literals that evaluate fully at compile time already, because of the "constexpr" above. The above uses a hard-coded int return type. I think one could even make it depend on the length of the binary string. It's using the following features, for anyone interested:
Generalized Constant Expressions.
Variadic Templates. A brief introduction can be found here
Static Assertions (static_assert)
User defined Literals
Actually, current GCC trunk already implements variadic templates and static assertions. Let's hope it will support the other two soon. I think C++1x will rock the house.
The C++ Standard Library is your friend:
#include <bitset>
const std::bitset <32> has_nukes( "00000000000000000000000000000001" );
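These bitsets can then be combined and tested with the usual operators, for example (my illustration):
const std::bitset<32> has_bio_weapons( "00000000000000000000000000000010" );
std::bitset<32> arsenal = has_nukes | has_bio_weapons;
if ((arsenal & has_bio_weapons).any()) { /* BIO!! */ }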
GCC supports binary constants as an extension since 4.3. See the announcement (look at the section "New Languages and Language specific improvements").
You can use << if you like.
int hasNukes = 1;
int hasBioWeapons = 1 << 1;
int hasChemWeapons = 1 << 2;
This discussion may be interesting... Might have been, as the link is dead unfortunately. It described a template based approach similar to other answers here.
And also there is a thing called BOOST_BINARY.
The term you want is binary literals
Ruby has them with the syntax you give.
One alternative is to define helper macros to convert for you. I found the following code at http://bytes.com/groups/c/219656-literal-binary
/* Binary constant generator macro
* By Tom Torfs - donated to the public domain
*/
/* All macro's evaluate to compile-time constants */
/* *** helper macros *** */
/* turn a numeric literal into a hex constant
* (avoids problems with leading zeroes)
* 8-bit constants max value 0x11111111, always fits in unsigned long
*/
#define HEX_(n) 0x##n##LU
/* 8-bit conversion function */
#define B8_(x) ((x & 0x0000000FLU) ? 1:0) \
| ((x & 0x000000F0LU) ? 2:0) \
| ((x & 0x00000F00LU) ? 4:0) \
| ((x & 0x0000F000LU) ? 8:0) \
| ((x & 0x000F0000LU) ? 16:0) \
| ((x & 0x00F00000LU) ? 32:0) \
| ((x & 0x0F000000LU) ? 64:0) \
| ((x & 0xF0000000LU) ? 128:0)
/* *** user macros *** */
/* for upto 8-bit binary constants */
#define B8(d) ((unsigned char) B8_(HEX_(d)))
/* for upto 16-bit binary constants, MSB first */
#define B16(dmsb, dlsb) (((unsigned short) B8(dmsb) << 8) \
| B8(dlsb))
/* for upto 32-bit binary constants, MSB first */
#define B32(dmsb, db2, db3, dlsb) (((unsigned long) B8(dmsb) << 24) \
| ((unsigned long) B8( db2) << 16) \
| ((unsigned long) B8( db3) << 8) \
| B8(dlsb))
/* Sample usage:
* B8(01010101) = 85
* B16(10101010,01010101) = 43605
* B32(10000000,11111111,10101010,01010101) = 2164238933
*/
The next version of C++, C++0x, will introduce user defined literals. I'm not sure if binary numbers will be part of the standard but at the worst you'll be able to enable it yourself:
int operator "" _B(int i);
assert( 1010_B == 10);
I write binary literals like this:
const int has_nukes = 0x0001;
const int has_bio_weapons = 0x0002;
const int has_chem_weapons = 0x0004;
It's more compact than your suggested notation, and easier to read. For example:
const int upper_bit = 0b0001000000000000000;
versus:
const int upper_bit = 0x04000;
Did you notice that the binary version wasn't an even multiple of 4 bits? Did you think it was 0x10000?
With a little practice hex or octal are easier for a human than binary. And, in my opinion, easier to read that using shift operators. But I'll concede that my years of assembly language work may bias me on that point.
If you want to use bitset, auto, variadic templates, user-defined literals, static_assert, constexpr, and noexcept try this:
#include <bitset>
#include <iostream>

template<char... Bits>
struct __checkbits
{
static const bool valid = false;
};
template<char High, char... Bits>
struct __checkbits<High, Bits...>
{
static const bool valid = (High == '0' || High == '1')
&& __checkbits<Bits...>::valid;
};
template<char High>
struct __checkbits<High>
{
static const bool valid = (High == '0' || High == '1');
};
template<char... Bits>
inline constexpr std::bitset<sizeof...(Bits)>
operator"" bits() noexcept
{
static_assert(__checkbits<Bits...>::valid, "invalid digit in binary string");
return std::bitset<sizeof...(Bits)>((char []){Bits..., '\0'});
}
Use it like this:
int
main()
{
auto bits = 0101010101010101010101010101010101010101010101010101010101010101bits;
std::cout << bits << std::endl;
std::cout << "size = " << bits.size() << std::endl;
std::cout << "count = " << bits.count() << std::endl;
std::cout << "value = " << bits.to_ullong() << std::endl;
// This triggers the static_assert at compile-time.
auto badbits = 2101010101010101010101010101010101010101010101010101010101010101bits;
// This throws at run-time.
std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101bits");
}
Thanks to #johannes-schaub-litb
Java doesn't support binary literals either, unfortunately. However, it has enums which can be used with an EnumSet. An EnumSet represents enum values internally with bit fields, and presents a Set interface for manipulating these flags.
Alternatively, you could use bit offsets (in decimal) when defining your values:
const int HAS_NUKES = 0x1 << 0;
const int HAS_BIO_WEAPONS = 0x1 << 1;
const int HAS_CHEM_WEAPONS = 0x1 << 2;
There's no syntax for literal binary constants in C++ the way there is for hexadecimal and octal. The closest thing for what it looks like you're trying to do would probably be to learn and use bitset.
As an aside:
Especially if you're dealing with a large set, instead of going through the [minor] mental effort of writing a sequence of shift amounts, you can make each constant depend on the previously defined constant:
const int has_nukes = 1;
const int has_bio_weapons = has_nukes << 1;
const int has_chem_weapons = has_bio_weapons << 1;
const int has_nunchuks = has_chem_weapons << 1;
// ...
Looks a bit redundant, but it's less typo-prone. Also, you can simply insert a new constant in the middle without having to touch any other line except the one immediately following it:
const int has_nukes = 1;
const int has_gravity_gun = has_nukes << 1; // added
const int has_bio_weapons = has_gravity_gun << 1; // changed
const int has_chem_weapons = has_bio_weapons << 1; // unaffected from here on
const int has_nunchuks = has_chem_weapons << 1;
// ...
Compare to:
const int has_nukes = 1 << 0;
const int has_bio_weapons = 1 << 1;
const int has_chem_weapons = 1 << 2;
const int has_nunchuks = 1 << 3;
// ...
const int has_scimatar = 1 << 28;
const int has_rapier = 1 << 28; // good luck spotting this typo!
const int has_katana = 1 << 30;
And:
const int has_nukes = 1 << 0;
const int has_gravity_gun = 1 << 1; // added
const int has_bio_weapons = 1 << 2; // changed
const int has_chem_weapons = 1 << 3; // changed
const int has_nunchuks = 1 << 4; // changed
// ... // changed all the way
const int has_scimatar = 1 << 29; // changed *sigh*
const int has_rapier = 1 << 30; // changed *sigh*
const int has_katana = 1 << 31; // changed *sigh*
As an aside to my aside, it's probably equally hard to spot a typo like this:
const int has_nukes = 1;
const int has_gravity_gun = has_nukes << 1;
const int has_bio_weapons = has_gravity_gun << 1;
const int has_chem_weapons = has_gravity_gun << 1; // oops!
const int has_nunchuks = has_chem_weapons << 1;
So, I think the main advantage of this cascading syntax is when dealing with insertions and deletions of constants.
Another method:
template<unsigned int N>
class b
{
private:
    template<unsigned int N2>
    struct b_: public b<N << 4 | N2> {};
public:
    static unsigned int const x = N;
    typedef b_<0> _0000;
    typedef b_<1> _0001;
    typedef b_<2> _0010;
    typedef b_<3> _0011;
    typedef b_<4> _0100;
    typedef b_<5> _0101;
    typedef b_<6> _0110;
    typedef b_<7> _0111;
    typedef b_<8> _1000;
    typedef b_<9> _1001;
    typedef b_<10> _1010;
    typedef b_<11> _1011;
    typedef b_<12> _1100;
    typedef b_<13> _1101;
    typedef b_<14> _1110;
    typedef b_<15> _1111;
};
typedef b<0> _0000;
typedef b<1> _0001;
typedef b<2> _0010;
typedef b<3> _0011;
typedef b<4> _0100;
typedef b<5> _0101;
typedef b<6> _0110;
typedef b<7> _0111;
typedef b<8> _1000;
typedef b<9> _1001;
typedef b<10> _1010;
typedef b<11> _1011;
typedef b<12> _1100;
typedef b<13> _1101;
typedef b<14> _1110;
typedef b<15> _1111;
Usage:
std::cout << _1101::_1001::_1101::_1101::x;
Implemented in CityLizard++ (citylizard/binary/b.hpp).
I agree that it's useful to have an option for binary literals, and they are present in many programming languages. In C, I've decided to use a macro like this:
#define bitseq(a00,a01,a02,a03,a04,a05,a06,a07,a08,a09,a10,a11,a12,a13,a14,a15, \
a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31) \
(a31|a30<< 1|a29<< 2|a28<< 3|a27<< 4|a26<< 5|a25<< 6|a24<< 7| \
a23<< 8|a22<< 9|a21<<10|a20<<11|a19<<12|a18<<13|a17<<14|a16<<15| \
a15<<16|a14<<17|a13<<18|a12<<19|a11<<20|a10<<21|a09<<22|a08<<23| \
a07<<24|a06<<25|a05<<26|a04<<27|a03<<28|a02<<29|a01<<30|(unsigned)a00<<31)
The usage is pretty much straightforward =)
One, slightly horrible way you could do it is by generating a .h file with lots of #defines...
#define b00000000 0
#define b00000001 1
#define b00000010 2
#define b00000011 3
#define b00000100 4
etc.
This might make sense for 8-bit numbers, but probably not for 16-bit or larger.
Alternatively, do this (similar to Zach Scrivena's answer):
#define bit(x) (1<<x)
int HAS_NUKES = bit(HAS_NUKES_OFFSET);
int HAS_BIO_WEAPONS = bit(HAS_BIO_WEAPONS_OFFSET);
Binary literals have been part of the C++ language since C++14. They are literals that start with 0b or 0B.
Maybe less relevant to binary literals, but this just looks as if it can be solved better with a bit field.
struct DangerCollection {
    uint32_t has_nukes : 1;
    uint32_t has_bio_weapons : 1;
    uint32_t has_chem_weapons : 1;
    // .....
};
DangerCollection arsenal{
.has_nukes = true,
.has_bio_weapons = true,
.has_chem_weapons = true,
// ...
};
if(arsenal.has_bio_weapons){
std::cout << "BIO!!"
}
You would still be able to fill it with binary data, since its binary footprint is just a uint32. This is often used in combination with a union, for compact binary serialisation:
union DangerCollectionUnion {
DangerCollection collection;
uint8_t data[sizeof(DangerCollection)];
};
DangerCollectionUnion dc;
std::memcpy(dc.data, bitsIGotFromSomewhere, sizeof(DangerCollection));
if (dc.collection.has_bio_weapons) {
    // ....
}
In my experience this is less error prone and makes it easy to understand what's going on.