Is there a portable alternative to C++ bitfields?

There are many situations (especially in low-level programming), where the binary layout of the data is important. For example: hardware/driver manipulation, network protocols, etc.
In C++ I can read/write arbitrary binary structures using char* and bitwise operations (masks and shifts), but that's tedious and error-prone. Obviously, I try to limit the scope of these operations and encapsulate them in higher-level APIs, but it's still a pain.
C++ bitfields seem to offer a developer-friendly solution to this problem, but unfortunately their storage is implementation specific.
NathanOliver mentioned std::bitset, which basically allows you to access individual bits of an integer with a nice operator[] but lacks accessors for multi-bit fields.
Using meta-programming and/or macros, it's possible to abstract the bitwise operations in a library. Since I don't want to reinvent the wheel, I'm looking for a (preferably STL or boost) library that does that.
For the record, I'm looking into this for a DNS resolver, but the problem and its solution should be generic.
Edit: short answer: it turns out bitfields' storage is reliable in practice (even if it's not mandated by the standard), since system/network libraries use them and yield well-behaved programs when compiled with mainstream compilers.

From the C++14 standard (N3797 draft), section 9.6 [class.bit], paragraph 1:
Allocation of bit-fields within a class object is implementation-defined.
Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
[ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]
Although notes are non-normative, every implementation I'm aware of uses one of two layouts: either big-endian or little-endian bit order.
Note that:
You must specify padding manually. This implies that you must know the size of your types (e.g. by using <cstdint>).
You must use unsigned types.
The preprocessor macros for detecting the bit order are implementation-dependent.
Usually the bit-order endianness is the same as the byte-order endianness. I believe some compilers have a flag to override it, but I can't find it.
For examples, look in netinet/tcp.h and other nearby headers.
Edit by OP: for example tcp.h defines
struct
{
    u_int16_t th_sport;  /* source port */
    u_int16_t th_dport;  /* destination port */
    tcp_seq th_seq;      /* sequence number */
    tcp_seq th_ack;      /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;    /* (unused) */
    u_int8_t th_off:4;   /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;   /* data offset */
    u_int8_t th_x2:4;    /* (unused) */
# endif
    // ...
}
And since it works with mainstream compilers, it means bitfields' memory layout is reliable in practice.
Edit:
This is portable within one endianness:
struct Foo {
    uint16_t x: 10;
    uint16_t y: 6;
};

But this may not be, because it straddles a 16-bit unit:

struct Foo {
    uint16_t x: 10;
    uint16_t y: 12;
    uint16_t z: 10;
};

And this may not be, because it has implicit padding:

struct Foo {
    uint16_t x: 10;
};
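One way to catch the unportable cases at compile time is to assert the packed size; here is a minimal sketch, assuming you expect the two fields to share one 16-bit unit:

#include <cstdint>

struct Foo {
    uint16_t x: 10;
    uint16_t y: 6;
};

// Fails to compile if the implementation added padding or split the unit:
static_assert(sizeof(Foo) == 2, "Foo is not packed into a single 16-bit unit");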

We have this in production code where we had to port MIPS code to x86-64
https://codereview.stackexchange.com/questions/54342/template-for-endianness-free-code-data-always-packed-as-big-endian
Works well for us.
It's basically a template without any storage, the template arguments specify the position of the relevant bits.
If you need multiple fields, you put multiple specializations of the template together in a union, together with an array of bytes to provide storage.
The template has overloads for assignment of value and a conversion operator to unsigned for reading the value.
In addition, if the fields are larger than a byte, they are stored in big-endian byte order, which is sometimes useful when implementing cross-platform protocols.
Here's a usage example:

#include <cstdlib>  // rand
#include <cstring>  // memset

union header
{
    unsigned char arr[2];     // space allocation, 2 bytes (16 bits)
    BitFieldMember<0, 4> m1;  // first 4 bits
    BitFieldMember<4, 5> m2;  // the following 5 bits
    BitFieldMember<9, 6> m3;  // the following 6 bits, total 16 bits
};

int main()
{
    header a;
    memset(a.arr, 0, sizeof(a.arr));
    a.m1 = rand();
    a.m3 = a.m1;
    a.m2 = ~a.m1;
    return 0;
}
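For illustration only, here is a simplified sketch of how such a storage-less member template could be written; this is not the linked implementation (which handles big-endian multi-byte fields and assignment proxies more carefully), just the core idea. Bits are numbered from the most significant bit of the first byte, and all access is byte-by-byte, so the layout does not depend on the host's byte order:

// Simplified sketch, not the linked implementation.
// Note: reading one union member after writing another is technically UB
// in C++, but it is the usual idiom here and works on mainstream compilers.
template <int Offset, int Bits>
struct BitFieldMember
{
    unsigned char arr[(Offset + Bits + 7) / 8];  // overlays the union's storage

    operator unsigned() const
    {
        unsigned v = 0;
        for (int i = 0; i < Bits; ++i) {
            int bit = Offset + i;
            v = (v << 1) | ((arr[bit / 8] >> (7 - bit % 8)) & 1u);
        }
        return v;
    }

    BitFieldMember &operator=(unsigned v)
    {
        for (int i = Bits - 1; i >= 0; --i) {       // field's LSB is its last bit
            int bit = Offset + i;
            unsigned char mask = static_cast<unsigned char>(1u << (7 - bit % 8));
            if (v & 1u) arr[bit / 8] |= mask; else arr[bit / 8] &= ~mask;
            v >>= 1;
        }
        return *this;
    }
};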

It's simple to implement bit fields with known positions with C++:
template<typename T, int POS, int SIZE>
struct BitField {
    T *data;

    BitField(T *data) : data(data) {}

    operator int() const {
        return ((*data) >> POS) & ((1ULL << SIZE) - 1);
    }

    BitField& operator=(int x) {
        T mask( ((1ULL << SIZE) - 1) << POS );
        // cast x to T before shifting so positions past the width of int work
        *data = (*data & ~mask) | ((T(x) << POS) & mask);
        return *this;
    }
};
The above toy implementation allows, for example, defining a 12-bit field in an unsigned long long variable with
unsigned long long var;
BitField<unsigned long long, 7, 12> muxno(&var);
and the generated code to access the field value is just
0000000000000020 <_Z6getMuxv>:
20: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax ; Get &var
27: 48 8b 00 mov (%rax),%rax ; Get content
2a: 48 c1 e8 07 shr $0x7,%rax ; >> 7
2e: 25 ff 0f 00 00 and $0xfff,%eax ; keep 12 bits
33: c3 retq
Basically what you'd have to write by hand.

I have written an implementation of bit fields in C++ as a library header file. An example I give in the documentation is that, instead of writing this:
struct A
{
    union
    {
        struct
        {
            unsigned x : 5;
            unsigned a0 : 2;
            unsigned a1 : 2;
            unsigned a2 : 2;
        } u;
        struct
        {
            unsigned x : 5;
            unsigned all_a : 6;
        } v;
    };
};

// …

A x;
x.v.all_a = 0x3f;
x.u.a1 = 0;
you can write:
typedef Bitfield<Bitfield_traits_default<> > Bf;

struct A : private Bitfield_fmt
{
    F<5> x;
    F<2> a[3];
};

typedef Bitfield_w_fmt<Bf, A> Bwf;

// …

Bwf::Format::Define::T x;
BITF(Bwf, x, a) = 0x3f;
BITF(Bwf, x, a[1]) = 0;

There's an alternative interface, under which the last two lines of the above would change to:

#define BITF_U_X_BWF Bwf
#define BITF_U_X_BASE x
BITF(X, a) = 0x3f;
BITF(X, a[1]) = 0;
Using this implementation of bit fields, the traits template parameter gives the programmer a lot of flexibility. Memory is just processor memory by default, or it can be an abstraction, with the programmer providing functions to perform "memory" reads and writes. The abstracted memory is a sequence of elements of any unsigned integral type (chosen by the programmer). Fields can be laid out either from least-to-most or most-to-least significance. The layout of fields in memory can be the reverse of what they are in the format structure.
The implementation is located at: https://github.com/wkaras/C-plus-plus-library-bit-fields
(As you can see, I unfortunately was not able to fully avoid use of macros.)

I have created a library for that:
Portable Bitfields
It works similarly to the solution provided by @CpusPuzzle.
Basic example:
enum class Id
{
    f1, f2, f3
};

using namespace jungles;

using Register = Bitfields<
    uint16_t,
    Field{.id = Id::f1, .size = 3},
    Field{.id = Id::f2, .size = 9},
    Field{.id = Id::f3, .size = 4}>;

Register r;  // declaration added for completeness
r.at<Id::f1>() = 0b101;
r.at<Id::f2>() = 0b001111100;
r.at<Id::f3>() = 0b0110;

ASSERT(r.extract<Id::f1>() == 0b1010000000000000);
ASSERT(r.extract<Id::f2>() == 0b0000011111000000);
ASSERT(r.extract<Id::f3>() == 0b0000000000000110);
ASSERT(r.serialize() == 0b1010011111000110);
Deserialization:
Register r{0b0101110001110110};
// XXXYYYYYYYYYZZZZ
ASSERT(r.at<Id::f1>() == 0b010);
ASSERT(r.at<Id::f2>() == 0b111000111);
ASSERT(r.at<Id::f3>() == 0b0110);

C is designed for low-level bit manipulation. It's easy enough to declare a buffer of unsigned chars and set it to any bit pattern you want, especially if your bit strings are short enough to fit into one of the integral types.
One potential problem is byte endianness. C can't "see" this at all, but just as integers have an endianness, so too do bytes, when serialised. Another is the very small number of machines whose bytes are not octets. C guarantees a byte shall be at least an octet, but CHAR_BIT values of 32 and 9 exist in real-world implementations. In those circumstances, you have to take a decision whether to simply ignore the upper bits (in which case naive code should work), or treat them as part of the bitstream (in which case you've got to be careful to fold CHAR_BIT into your calculations). It's also hard to test the code, as you are unlikely to find it easy to get your hands on a CHAR_BIT 32 machine.
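For illustration, a sketch of a CHAR_BIT-aware getter over a byte buffer; positions and lengths are in bits, and the buffer is treated as a big-endian bit stream (an assumption here, not the only possible convention):

#include <climits>  // CHAR_BIT
#include <cstddef>

// Extract `len` bits starting at absolute bit position `pos`.
// Assumes len fits in unsigned long; bytes may be wider than 8 bits.
unsigned long get_bits(const unsigned char *buf, std::size_t pos, unsigned len)
{
    unsigned long v = 0;
    for (unsigned i = 0; i < len; ++i) {
        std::size_t bit = pos + i;
        std::size_t shift = CHAR_BIT - 1 - (bit % CHAR_BIT);  // MSB-first in each byte
        v = (v << 1) | ((buf[bit / CHAR_BIT] >> shift) & 1u);
    }
    return v;
}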

Related

If a 32-bit integer overflows, can we use a 40-bit structure instead of a 64-bit long one?

If, say, a 32-bit integer is overflowing, instead of upgrading int to long, can we make use of some 40-bit type if we need a range only within 2^40, so that we save 24 (64-40) bits for every integer?
If so, how?
I have to deal with billions and space is a bigger constraint.
Yes, but...
It is certainly possible, but it is usually nonsensical (for any program that doesn't use billions of these numbers):
#include <stdint.h> // don't want to rely on something like long long

struct bad_idea
{
    uint64_t var : 40;
};
Here, var will indeed have a width of 40 bits at the expense of much less efficient code generated (it turns out that "much" is very much wrong -- the measured overhead is a mere 1-2%, see timings below), and usually to no avail. Unless you have need for another 24-bit value (or an 8 and 16 bit value) which you wish to pack into the same structure, alignment will forfeit anything that you may gain.
In any case, unless you have billions of these, the effective difference in memory consumption will not be noticeable (but the extra code needed to manage the bit field will be noticeable!).
Note:
The question has in the meantime been updated to reflect that indeed billions of numbers are needed, so this may be a viable thing to do, presumed that you take measures not to lose the gains due to structure alignment and padding, i.e. either by storing something else in the remaining 24 bits or by storing your 40-bit values in structures of 8 each or multiples thereof.
Saving three bytes a billion times is worthwhile, as it will require noticeably fewer memory pages and thus cause fewer cache and TLB misses, and above all page faults (a single page fault weighing in at tens of millions of instructions).
While the above snippet does not make use of the remaining 24 bits (it merely demonstrates the "use 40 bits" part), something akin to the following will be necessary to really make the approach useful in a sense of preserving memory -- presumed that you indeed have other "useful" data to put in the holes:
struct using_gaps
{
    uint64_t var : 40;
    uint64_t useful_uint16 : 16;
    uint64_t char_or_bool : 8;
};
Structure size and alignment will be equal to a 64-bit integer, so nothing is wasted if you make e.g. an array of a billion such structures (even without using compiler-specific extensions). If you don't have use for an 8-bit value, you could also use a 48-bit and a 16-bit value (giving a bigger overflow margin).
Alternatively you could, at the expense of usability, put 8 40-bit values into a structure (least common multiple of 40 and 64 being 320 = 8*40). Of course then your code which accesses elements in the array of structures will become much more complicated (though one could probably implement an operator[] that restores the linear array functionality and hides the structure complexity).
Update:
Wrote a quick test suite, just to see what overhead the bitfields (and operator overloading with bitfield refs) would have. Posted code (due to length) at gcc.godbolt.org, test output from my Win7-64 machine is:
Running test for array size = 1048576
what alloc seq(w) seq(r) rand(w) rand(r) free
-----------------------------------------------------------
uint32_t 0 2 1 35 35 1
uint64_t 0 3 3 35 35 1
bad40_t 0 5 3 35 35 1
packed40_t 0 7 4 48 49 1
Running test for array size = 16777216
what alloc seq(w) seq(r) rand(w) rand(r) free
-----------------------------------------------------------
uint32_t 0 38 14 560 555 8
uint64_t 0 81 22 565 554 17
bad40_t 0 85 25 565 561 16
packed40_t 0 151 75 765 774 16
Running test for array size = 134217728
what alloc seq(w) seq(r) rand(w) rand(r) free
-----------------------------------------------------------
uint32_t 0 312 100 4480 4441 65
uint64_t 0 648 172 4482 4490 130
bad40_t 0 682 193 4573 4492 130
packed40_t 0 1164 552 6181 6176 130
What one can see is that the extra overhead of bitfields is negligible, but the operator overloading with bitfield reference as a convenience thing is rather drastic (about a 3x increase) when accessing data linearly in a cache-friendly manner. On the other hand, on random access it barely even matters.
These timings suggest that simply using 64-bit integers would be better since they are still faster overall than bitfields (despite touching more memory), but of course they do not take into account the cost of page faults with much bigger datasets. It might look very different once you run out of physical RAM (I didn't test that).
You can quite effectively pack 4 40-bit integers into a 160-bit struct like this:

struct Val4 {
    char hi[4];
    unsigned int low[4];
};

long getLong( const Val4 &pack, int ix ) {
    int hi = pack.hi[ix];  // preserve sign into 32 bit
    return long( (((unsigned long)hi) << 32) + (unsigned long)pack.low[ix] );
}

void setLong( Val4 &pack, int ix, long val ) {
    pack.low[ix] = (unsigned)val;
    pack.hi[ix] = (char)(val >> 32);
}

These again can be used like this:

Val4 vals[SIZE];

long getLong( int ix ) {
    return getLong( vals[ix >> 2], ix & 0x3 );
}

void setLong( int ix, long val ) {
    setLong( vals[ix >> 2], ix & 0x3, val );
}
You might want to consider Variable-Length Encoding (VLE).
Presumably, you have to store a lot of those numbers somewhere (in RAM, on disk, send them over the network, etc.), and then take them one by one and do some processing.
One approach would be to encode them using VLE.
From Google's protobuf documentation (CreativeCommons licence)
Varints are a method of serializing integers using
one or more bytes. Smaller numbers take a smaller number of bytes.
Each byte in a varint, except the last byte, has the most significant
bit (msb) set – this indicates that there are further bytes to come.
The lower 7 bits of each byte are used to store the two's complement
representation of the number in groups of 7 bits, least significant
group first.
So, for example, here is the number 1 – it's a single byte, so the msb
is not set:
0000 0001
And here is 300 – this is a bit more complicated:
1010 1100 0000 0010
How do you figure out that this is 300? First you drop the msb from
each byte, as this is just there to tell us whether we've reached the
end of the number (as you can see, it's set in the first byte as there
is more than one byte in the varint)
Pros
If you have lots of small numbers, you'll probably use fewer than 40 bits per integer, on average. Possibly much fewer.
You are able to store bigger numbers (with more than 40 bits) in the future, without having to pay a penalty for the small ones
Cons
You pay an extra bit for each 7 significant bits of your numbers. That means a number with 40 significant bits will need 6 bytes. If most of your numbers have 40 significant bits, you are better off with a bit field approach.
You will lose the ability to easily jump to a number given its index (you have to at least partially parse all previous elements in an array in order to access the current one).
You will need some form of decoding before doing anything useful with the numbers (although that is true for other approaches as well, like bit fields)
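To make the scheme concrete, here is a minimal varint sketch following the description quoted above (unsigned values only; handling of malformed input omitted):

#include <cstdint>
#include <vector>

// Encode: emit 7 bits per byte, least significant group first;
// the msb of every byte except the last is set.
void varint_encode(uint64_t v, std::vector<uint8_t> &out)
{
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>(v) | 0x80);
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// Decode: accumulate 7-bit groups until a byte with a clear msb.
// Assumes well-formed input; `p` is advanced past the varint.
uint64_t varint_decode(const uint8_t *&p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*p & 0x80) {
        v |= uint64_t(*p++ & 0x7F) << shift;
        shift += 7;
    }
    v |= uint64_t(*p++) << shift;
    return v;
}

Encoding 300 with this sketch yields the two bytes 1010 1100 0000 0010 from the quoted example.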
(Edit: First of all, what you want is possible and makes sense in some cases; I have had to do similar things when I tried to do something for the Netflix challenge and only had 1GB of memory. Second, it is probably best to use a char array for the 40-bit storage to avoid any alignment issues and the need to mess with struct packing pragmas. Third, this design assumes that you're OK with 64-bit arithmetic for intermediate results; it is only for large array storage that you would use Int40. Fourth, I don't get all the suggestions that this is a bad idea; just read up on what people go through to pack mesh data structures and this looks like child's play by comparison.)
What you want is a struct that is only used for storing data as 40-bit ints but implicitly converts to int64_t for arithmetic. The only trick is doing the sign extension from 40 to 64 bits right. If you're fine with unsigned ints, the code can be even simpler. This should be able to get you started.
#include <cstdint>
#include <iostream>

// Only intended for storage, automatically promotes to 64-bit for evaluation
struct Int40
{
    Int40(int64_t x) { set(static_cast<uint64_t>(x)); }  // implicit constructor
    operator int64_t() const { return get(); }           // implicit conversion to 64-bit

private:
    void set(uint64_t x)
    {
        setb<0>(x); setb<1>(x); setb<2>(x); setb<3>(x); setb<4>(x);
    }

    int64_t get() const
    {
        return static_cast<int64_t>(getb<0>() | getb<1>() | getb<2>() | getb<3>() | getb<4>() | signx());
    }

    uint64_t signx() const
    {
        return (data[4] >> 7) * (uint64_t(((1 << 25) - 1)) << 39);
    }

    template <int idx> uint64_t getb() const
    {
        return static_cast<uint64_t>(data[idx]) << (8 * idx);
    }

    template <int idx> void setb(uint64_t x)
    {
        data[idx] = (x >> (8 * idx)) & 0xFF;
    }

    unsigned char data[5];
};

int main()
{
    Int40 a = -1;
    Int40 b = -2;
    Int40 c = 1 << 16;
    std::cout << "sizeof(Int40) = " << sizeof(Int40) << std::endl;
    std::cout << a << "+" << b << "=" << (a + b) << std::endl;
    std::cout << c << "*" << c << "=" << (c * c) << std::endl;
}
Here is the link to try it live: http://rextester.com/QWKQU25252
You can use a bit-field structure, but it's not going to save you any memory:
struct my_struct
{
    unsigned long long a : 40;
    unsigned long long b : 24;
};
You can squeeze any multiple of 8 such 40-bit variables into one structure:
struct bits_16_16_8
{
    unsigned short x : 16;
    unsigned short y : 16;
    unsigned short z : 8;
};

struct bits_8_16_16
{
    unsigned short x : 8;
    unsigned short y : 16;
    unsigned short z : 16;
};

struct my_struct
{
    struct bits_16_16_8 a1;
    struct bits_8_16_16 a2;
    struct bits_16_16_8 a3;
    struct bits_8_16_16 a4;
    struct bits_16_16_8 a5;
    struct bits_8_16_16 a6;
    struct bits_16_16_8 a7;
    struct bits_8_16_16 a8;
};
This will save you some memory (in comparison with using 8 "standard" 64-bit variables), but you will have to split every operation (and in particular arithmetic ones) on each of these variables into several operations.
So the memory-optimization will be "traded" for runtime-performance.
As the comments suggest, this is quite a task.
Probably an unnecessary hassle unless you want to save a lot of RAM - then it makes much more sense. (The RAM saving would be the sum total of bits saved across millions of long values stored in RAM.)
I would consider using an array of 5 bytes/chars (5 * 8 bits = 40 bits). Then you will need to shift bits from your (overflowed int - hence a long) value into the array of bytes to store them.
To use the values, then shift the bits back out into a long and you can use the value.
Then your RAM and file storage of the value will be 40 bits (5 bytes), BUT you must consider data alignment if you plan to use a struct to hold the 5 bytes. Let me know if you need elaboration on the bit shifting and the data-alignment implications.
Similarly, you could use the 64 bit long, and hide other values (3 chars perhaps) in the residual 24 bits that you do not want to use. Again - using bit shifting to add and remove the 24 bit values.
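As a sketch of that second idea (all names hypothetical): the value lives in the low 40 bits and three one-byte tags are hidden in the top 24:

#include <cstdint>

const uint64_t MASK40 = (uint64_t(1) << 40) - 1;

// Pack a 40-bit value plus three spare bytes into one 64-bit word.
uint64_t pack(uint64_t val40, uint8_t t0, uint8_t t1, uint8_t t2)
{
    return (val40 & MASK40)
         | (uint64_t(t0) << 40) | (uint64_t(t1) << 48) | (uint64_t(t2) << 56);
}

uint64_t value40(uint64_t packed) { return packed & MASK40; }
uint8_t tag(uint64_t packed, int i) { return uint8_t(packed >> (40 + 8 * i)); }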
Another variation that may be helpful would be to use a structure:
typedef struct {
    uint32_t low[3];
    uint8_t hi[3];
    uint8_t padding;
} TRIPLE_40;
Such a structure would take 16 bytes and, if 16-byte aligned, would fit entirely within a single cache line. While identifying which of the parts of the structure to use may be more expensive than it would be if the structure held four elements instead of three, accessing one cache line may be much cheaper than accessing two. If performance is important, one should use some benchmarks, since some machines may perform a divmod-3 operation cheaply and have a high cost per cache-line fetch, while others might have cheaper memory access and more expensive divmod-3.
If you have to deal with billions of integers, I'd try to encapsulate arrays of 40-bit numbers instead of single 40-bit numbers. That way, you can test different array implementations (e.g. an implementation that compresses data on the fly, or maybe one that stores less-used data to disk) without changing the rest of your code.
Here's a sample implementation (http://rextester.com/SVITH57679):
#include <cstdint>
#include <cstdlib>  // malloc/free
#include <cstring>  // memcpy

class Int64Array
{
    char *buffer;

public:
    static const int BYTE_PER_ITEM = 5;

    Int64Array(size_t s)
    {
        buffer = (char *)malloc(s * BYTE_PER_ITEM);
    }

    ~Int64Array()
    {
        free(buffer);
    }

    class Item
    {
        char *dataPtr;

    public:
        Item(char *dataPtr) : dataPtr(dataPtr) {}

        inline operator int64_t()
        {
            int64_t value = 0;
            memcpy(&value, dataPtr, BYTE_PER_ITEM);  // Assumes little endian byte order!
            return value;
        }

        inline Item &operator=(int64_t value)
        {
            memcpy(dataPtr, &value, BYTE_PER_ITEM);  // Assumes little endian byte order!
            return *this;
        }
    };

    inline Item operator[](size_t index)
    {
        return Item(buffer + index * BYTE_PER_ITEM);
    }
};
Note: The memcpy-conversion from 40-bit to 64-bit is basically undefined behavior, as it assumes little-endianness. It should work on x86 platforms, though.
Note 2: Obviously, this is proof-of-concept code, not production-ready code. To use it in real projects, you'd have to add (among other things):
error handling (malloc can fail!)
copy constructor (e.g. by copying the data, by adding reference counting, or by making the copy constructor private)
move constructor
const overloads
STL-compatible iterators
bounds checks for indices (in debug build)
range checks for values (in debug build)
asserts for the implicit assumptions (little-endianness)
As it is, Item has reference semantics, not value semantics, which is unusual for operator[]; You could probably work around that with some clever C++ type conversion tricks
All of those should be straightforward for a C++ programmer, but they would make the sample code much longer without making it clearer, so I've decided to omit them.
I'll assume that
this is C, and
you need a single, large array of 40 bit numbers, and
you are on a machine that is little-endian, and
your machine is smart enough to handle alignment
you have defined size to be the number of 40-bit numbers you need
unsigned char hugearray[5*size+3];  // +3 avoids overfetch of last element

__int64 get_huge(unsigned index)
{
    __int64 t;
    t = *(__int64 *)(&hugearray[index*5]);
    if (t & 0x0000008000000000LL)
        t |= 0xffffff0000000000LL;
    else
        t &= 0x000000ffffffffffLL;
    return t;
}

void set_huge(unsigned index, __int64 value)
{
    unsigned char *p = &hugearray[index*5];
    *(long *)p = value;  // NB: assumes a 32-bit long (as on MSVC); stores the low 4 bytes
    p[4] = (value >> 32);
}
It may be faster to handle the get with two shifts.
__int64 get_huge(unsigned index)
{
    return (((*(__int64 *)(&hugearray[index*5])) << 24) >> 24);
}
For the case of storing some billions of 40-bit signed integers, and assuming 8-bit bytes, you can pack 8 40-bit signed integers in a struct (in the code below using an array of bytes to do that), and, since this struct is ordinarily aligned, you can then create a logical array of such packed groups, and provide ordinary sequential indexing of that:
#include <limits.h>   // CHAR_BIT
#include <stdint.h>   // int64_t
#include <stdlib.h>   // div, div_t, ptrdiff_t
#include <vector>     // std::vector

#define STATIC_ASSERT( e ) static_assert( e, #e )

namespace cppx {
    using Byte = unsigned char;
    using Index = ptrdiff_t;
    using Size = Index;

    // For non-negative values:
    auto roundup_div( const int64_t a, const int64_t b )
        -> int64_t
    { return (a + b - 1)/b; }
}  // namespace cppx

namespace int40 {
    using cppx::Byte;
    using cppx::Index;
    using cppx::Size;
    using cppx::roundup_div;
    using std::vector;

    STATIC_ASSERT( CHAR_BIT == 8 );
    STATIC_ASSERT( sizeof( int64_t ) == 8 );

    const int bits_per_value = 40;
    const int bytes_per_value = bits_per_value/8;

    struct Packed_values
    {
        enum{ n = sizeof( int64_t ) };
        Byte bytes[n*bytes_per_value];

        auto value( const int i ) const
            -> int64_t
        {
            int64_t result = 0;
            for( int j = bytes_per_value - 1; j >= 0; --j )
            {
                result = (result << 8) | bytes[i*bytes_per_value + j];
            }
            const int64_t first_negative = int64_t( 1 ) << (bits_per_value - 1);
            if( result >= first_negative )
            {
                result = (int64_t( -1 ) << bits_per_value) | result;
            }
            return result;
        }

        void set_value( const int i, int64_t value )
        {
            for( int j = 0; j < bytes_per_value; ++j )
            {
                bytes[i*bytes_per_value + j] = value & 0xFF;
                value >>= 8;
            }
        }
    };
    STATIC_ASSERT( sizeof( Packed_values ) == bytes_per_value*Packed_values::n );

    class Packed_vector
    {
    private:
        Size size_;
        vector<Packed_values> data_;

    public:
        auto size() const -> Size { return size_; }

        auto value( const Index i ) const
            -> int64_t
        {
            const auto where = div( i, Packed_values::n );
            return data_[where.quot].value( where.rem );
        }

        void set_value( const Index i, const int64_t value )
        {
            const auto where = div( i, Packed_values::n );
            data_[where.quot].set_value( where.rem, value );
        }

        Packed_vector( const Size size )
            : size_( size )
            , data_( roundup_div( size, Packed_values::n ) )
        {}
    };
}  // namespace int40

#include <iostream>

auto main() -> int
{
    using namespace std;
    cout << "Size of struct is " << sizeof( int40::Packed_values ) << endl;
    int40::Packed_vector values( 25 );
    for( int i = 0; i < values.size(); ++i )
    {
        values.set_value( i, i - 10 );
    }
    for( int i = 0; i < values.size(); ++i )
    {
        cout << values.value( i ) << " ";
    }
    cout << endl;
}
Yes, you can do that, and it will save some space for large quantities of numbers.
You need a class that contains a std::vector of an unsigned integer type.
You will need member functions to store and to retrieve an integer. For example, if you want to store 64 integers of 40 bits each, use a vector of 40 integers of 64 bits each. Then you need a method that stores an integer with index in [0, 64) and a method to retrieve such an integer.
These methods will execute some shift operations, and also some binary | and &.
I am not adding any more details here yet because your question is not very specific. Do you know how many integers you want to store? Do you know it during compile time? Do you know it when the program starts? How should the integers be organized? Like an array? Like a map? You should know all this before trying to squeeze the integers into less storage.
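With those caveats, here is a minimal sketch of the idea, assuming unsigned 40-bit values, an array-like organisation, and a count known at construction:

#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch: a flat array of 40-bit slots packed into 64-bit words.
// No bounds checking, unsigned values only.
class Packed40
{
    static const unsigned BITS = 40;
    std::vector<uint64_t> words_;

public:
    explicit Packed40(std::size_t n) : words_((n * BITS + 63) / 64) {}

    uint64_t get(std::size_t i) const
    {
        std::size_t bit = i * BITS, w = bit / 64, off = bit % 64;
        uint64_t v = words_[w] >> off;
        if (off + BITS > 64)                      // value straddles two words
            v |= words_[w + 1] << (64 - off);
        return v & ((uint64_t(1) << BITS) - 1);
    }

    void set(std::size_t i, uint64_t v)
    {
        const uint64_t mask = (uint64_t(1) << BITS) - 1;
        v &= mask;
        std::size_t bit = i * BITS, w = bit / 64, off = bit % 64;
        words_[w] = (words_[w] & ~(mask << off)) | (v << off);
        if (off + BITS > 64) {                    // spill the high bits
            unsigned hi = unsigned(off) + BITS - 64;
            uint64_t himask = (uint64_t(1) << hi) - 1;
            words_[w + 1] = (words_[w + 1] & ~himask) | (v >> (64 - off));
        }
    }
};

get(i) and set(i, v) then behave like indexing into a linear array of 40-bit slots.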
There are quite a few answers here covering implementation, so I'd like to talk about architecture.
We usually expand 32-bit values to 64-bit values to avoid overflowing because our architectures are designed to handle 64-bit values.
Most architectures are designed to work with integers whose size is a power of 2 because this makes the hardware vastly simpler. Tasks such as caching are much simpler this way: there are a large number of divisions and modulus operations which can be replaced with bit masking and shifts if you stick to powers of 2.
As an example of just how much this matters, The C++11 specification defines multithreading race-cases based on "memory locations." A memory location is defined in 1.7.3:
A memory location is either an object of scalar type or a maximal
sequence of adjacent bit-fields all having non-zero width.
In other words, if you use C++'s bitfields, you have to do all of your multithreading carefully. Two adjacent bitfields must be treated as the same memory location, even if you wish computations across them could be spread across multiple threads. This is very unusual for C++, so likely to cause developer frustration if you have to worry about it.
Most processors have a memory architecture which fetches 32-bit or 64-bit blocks of memory at a time. Thus use of 40-bit values will have a surprising number of extra memory accesses, dramatically affecting run-time. Consider the alignment issues:
40-bit word to access: 32-bit accesses 64bit-accesses
word 0: [0,40) 2 1
word 1: [40,80) 2 2
word 2: [80,120) 2 2
word 3: [120,160) 2 2
word 4: [160,200) 2 2
word 5: [200,240) 2 2
word 6: [240,280) 2 2
word 7: [280,320) 2 1
On a 64 bit architecture, one out of every 4 words will be "normal speed." The rest will require fetching twice as much data. If you get a lot of cache misses, this could destroy performance. Even if you get cache hits, you are going to have to unpack the data and repack it into a 64-bit register to use it (which might even involve a difficult to predict branch).
It is entirely possible this is worth the cost
There are situations where these penalties are acceptable. If you have a large amount of memory-resident data which is well indexed, you may find the memory savings worth the performance penalty. If you do a large amount of computation on each value, you may find the costs are minimal. If so, feel free to implement one of the above solutions. However, here are a few recommendations.
Do not use bitfields unless you are ready to pay their cost. For example, if you have an array of bitfields, and wish to divide it up for processing across multiple threads, you're stuck. By the rules of C++11, the bitfields all form one memory location, so may only be accessed by one thread at a time (this is because the method of packing the bitfields is implementation defined, so C++11 can't help you distribute them in a non-implementation defined manner)
Do not use a structure containing a 32-bit integer and a char to make 40 bits. Most processors will enforce alignment and you won't save a single byte.
Do use homogeneous data structures, such as an array of chars or an array of 64-bit integers. It is far easier to get the alignment correct. (And you also retain control of the packing, which means you can divide an array up amongst several threads for computation if you are careful.)
Do design separate solutions for 32-bit and 64-bit processors, if you have to support both platforms. Because you are doing something very low level and very ill-supported, you'll need to custom tailor each algorithm to its memory architecture.
Do remember that multiplication of 40-bit numbers is different from multiplication of 64-bit expansions of 40-bit numbers reduced back to 40-bits. Just like when dealing with the x87 FPU, you have to remember that marshalling your data between bit-sizes changes your result.
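As a concrete instance of that last point: if 40-bit wraparound semantics are what you actually want, the 64-bit intermediate result has to be masked back down, e.g.:

#include <cstdint>

// 64-bit multiply truncated back to 40 bits; without the mask, the
// result silently carries 64-bit semantics.
uint64_t mul40(uint64_t a, uint64_t b)
{
    const uint64_t mask40 = (uint64_t(1) << 40) - 1;
    return (a * b) & mask40;
}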
This begs for streaming in-memory lossless compression. If this is for a Big Data application, dense packing tricks are tactical solutions at best for what seems to require fairly decent middleware or system-level support. They'd need thorough testing to make sure one is able to recover all the bits unharmed. And the performance implications are highly non-trivial and very hardware-dependent because of interference with the CPU caching architecture (e.g. cache lines vs packing structure). Someone mentioned complex meshing structures : these are often fine-tuned to cooperate with particular caching architectures.
It's not clear from the requirements whether the OP needs random access. Given the size of the data it's more likely one would only need local random access on relatively small chunks, organised hierarchically for retrieval. Even the hardware does this at large memory sizes (NUMA). Like lossless movie formats show, it should be possible to get random access in chunks ('frames') without having to load the whole dataset into hot memory (from the compressed in-memory backing store).
I know of one fast database system (kdb from KX Systems, to name one, but I know there are others) that can handle extremely large datasets by seamlessly memory-mapping large datasets from backing store. It has the option to transparently compress and expand the data on-the-fly.
If what you really want is an array of 40 bit integers (which obviously you can't have), I'd just combine one array of 32 bit and one array of 8 bit integers.
To read a value x at index i:
uint64_t x = (((uint64_t) array8 [i]) << 32) + array32 [i];
To write a value x to index i:
array8 [i] = x >> 32; array32 [i] = x;
Obviously nicely encapsulated into a class using inline functions for maximum speed.
There is one situation where this is suboptimal, and that is when you do truly random access to many items, so that each access to an int array would be a cache miss - here you would get two cache misses every time. To avoid this, define a 32-byte struct containing an array of six uint32_t, an array of six uint8_t, and two unused bytes (42 2/3 bits per number); the code to access an item is slightly more complicated, but both components of the item are in the same cache line.
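A sketch of that struct (the field names are made up; the static_assert guards the 32-byte assumption):

#include <cstdint>

// Six 40-bit values per 32-byte block: the low and high parts of each
// value always sit in the same cache line.
struct Block40 {
    uint32_t low[6];  // lower 32 bits of values 0..5
    uint8_t  hi[6];   // upper 8 bits of values 0..5
    uint8_t  pad[2];  // unused (the 2/3 bit per value of overhead)
};
static_assert(sizeof(Block40) == 32, "expected a 32-byte block");

uint64_t get(const Block40 &b, int i)
{
    return (uint64_t(b.hi[i]) << 32) | b.low[i];
}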

Concise bit-manipulation for 64bit integer handle type

I have a 64bit integer that is used as a handle. The 64bits must be sliced into the following fields, to be accessed individually:
size : 30 bits
offset : 30 bits
invalid flag : 1 bit
immutable flag : 1 bit
type flag : 1 bit
mapped flag : 1 bit
The two ways I can think of to achieve this are:
1) Traditional bit operations (& | << >>), etc. But I find this a bit cryptic.
2) Use a bitfield struct:
#pragma pack(push, 1)
struct Handle {
    uint32_t size : 30;
    uint32_t offset : 30;
    uint8_t invalid : 1;
    uint8_t immutable : 1;
    uint8_t type : 1;
    uint8_t mapped : 1;
};
#pragma pack(pop)
Then accessing a field becomes very clear:
handle.invalid = 1;
But I understand bitfields are quite problematic and non-portable.
I'm looking for ways to implement this bit manipulation with the object of maximizing code clarity and readability. Which approach should I take?
Side notes:
The handle size must not exceed 64bits;
The order these fields are laid in memory is irrelevant, as long as each field size is respected;
The handles are not saved/loaded to file, so I don't have to worry about endianness.
I would go for the bitfields solution.
Bitfields are only "non-portable" if you want to store them in binary form and later read the bitfield using a different compiler or, more commonly, on a different machine architecture. This is mainly because field order is not defined by the standard.
Using bitfields within your application will be fine, and as long as you have no requirement for "binary portability" (storing your Handle in a file and reading it on a different system with code compiled by a different compiler or different processor type), it will work just fine.
Obviously, you need to do some checking, e.g. sizeof(Handle) == 8 should be done somewhere, to ensure that you get the size right, and compiler hasn't decided to put your two 30-bit values in separate 32-bit words. To improve the chances of success on multiple architectures, I'd probably define the type as:
struct Handle {
    uint64_t size : 30;
    uint64_t offset : 30;
    uint64_t invalid : 1;
    uint64_t immutable : 1;
    uint64_t type : 1;
    uint64_t mapped : 1;
};
There is some rule that the compiler should not "split elements", and if you define something as uint32_t, and there are only two bits left in the field, the whole 30 bits move to the next 32-bit element. [It probably works in most compilers, but just in case, using the same 64-bit type throughout is a better choice]
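The sizeof check mentioned above can be made a compile-time guard rather than a runtime one, e.g.:

static_assert(sizeof(Handle) == 8,
              "compiler did not pack Handle into a single 64-bit word");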
I recommend bit operations. Of course you should hide all those operations inside a class. Provide member functions to perform set/get operations. Judicious use of constants inside the class will make most of the operations fairly transparent. For example:
bool Handle::isMutable() const {
    return bits & MUTABLE;
}

void Handle::setMutable(bool f) {
    if (f)
        bits |= MUTABLE;
    else
        bits &= ~MUTABLE;
}
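Extended to the full handle from the question, a sketch might look like this (the field positions are arbitrary choices, since the question says the in-memory order doesn't matter; only the widths are fixed):

#include <cstdint>

class Handle {
    uint64_t bits = 0;

    static const uint64_t FIELD30   = (uint64_t(1) << 30) - 1;  // 30-bit mask
    static const int      OFFSET_SH = 30;                       // offset starts at bit 30
    static const uint64_t INVALID   = uint64_t(1) << 60;
    static const uint64_t IMMUTABLE = uint64_t(1) << 61;
    static const uint64_t TYPE      = uint64_t(1) << 62;
    static const uint64_t MAPPED    = uint64_t(1) << 63;

public:
    uint32_t size() const { return uint32_t(bits & FIELD30); }
    void setSize(uint32_t s) { bits = (bits & ~FIELD30) | (s & FIELD30); }

    uint32_t offset() const { return uint32_t((bits >> OFFSET_SH) & FIELD30); }
    void setOffset(uint32_t o)
    {
        bits = (bits & ~(FIELD30 << OFFSET_SH)) | (uint64_t(o & FIELD30) << OFFSET_SH);
    }

    bool isInvalid() const { return bits & INVALID; }
    void setInvalid(bool f) { if (f) bits |= INVALID; else bits &= ~INVALID; }
    // ...repeat the flag pattern for IMMUTABLE, TYPE and MAPPED
};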

2 bits size variable

I need to define a struct which has data members of size 2 bits and 6 bits.
Should I use a char type for each member? Or, in order not to waste memory, can I use something like :2 / :6 notation?
How can I do that?
Can I define a typedef for 2- or 6-bit types?
You can use something like:
typedef struct {
    unsigned char SixBits : 6;
    unsigned char TwoBits : 2;
} tEightBits;

and then use:

tEightBits eight;
eight.SixBits = 31;
eight.TwoBits = 3;
But, to be honest, unless you're having to comply with packed data external to your application, or you're in a very memory constrained situation, this sort of memory saving is not usually worth it. You'll find your code is a lot faster if it's not having to pack and unpack data all the time with bitwise and bitshift operations.
Also keep in mind that use of any bit-field type other than _Bool, signed int or unsigned int is implementation-defined. Specifically, unsigned char may not work everywhere.
It's probably best to use uint8_t for something like this. And yes, use bit fields:
struct tiny_fields
{
    uint8_t twobits : 2;
    uint8_t sixbits : 6;
};
I don't think you can be sure that the compiler will pack this into a single byte, though. Also, you can't know how the bits are ordered within the byte(s) that values of the struct type occupy. It's often better to use explicit masks, if you want more control.
Personally I prefer shift operators and some macros over bit fields, so there's no "magic" left to the compiler. It is usual practice in the embedded world.
#define SET_VAL2BIT(_var, _val) ( (_var) = ((_var) & ~3u) | ((_val) & 3u) )
#define SET_VAL6BIT(_var, _val) ( (_var) = ((_var) & ~(63u << 2)) | (((_val) & 63u) << 2) )
#define GET_VAL2BIT(_var) ( (_var) & 3u )
#define GET_VAL6BIT(_var) ( ((_var) >> 2) & 63u )
static uint8_t my_var;
<...>
SET_VAL2BIT(my_var, 1);
SET_VAL6BIT(my_var, 5);
int a = GET_VAL2BIT(my_var); /* a == 1 */
int b = GET_VAL6BIT(my_var); /* b == 5 */

How to use an int as an array of ints/bools?

I noticed while making a program that a lot of my int type variables never went above ten. I figure that, because an int is 2 bytes at the shortest (1 if you count char), I should be able to store 4 unsigned ints with a max value of 15 in a short int, and I know I can access each one individually using >> and <<:

short unsigned int SLWD = 11434;
S is (SLWD >> 12), L is ((SLWD << 4) >> 12),
W is ((SLWD << 8) >> 12), and D is ((SLWD << 12) >> 12)

However, I have no idea how to encompass this in a function or class, since any type of GetVal() function would have to be of type int, which defeats the purpose of isolating the bits in the first place.
First, remember the Rules of Optimization. But this is possible in C or C++ using bitfields:
struct mystruct {
    unsigned int smallint1 : 3; /* 3 bits wide, values 0 -- 7 */
    signed int smallint2 : 4;   /* 4 bits wide, values -8 -- 7 */
    unsigned int boolean : 1;   /* 1 bit wide, values 0 -- 1 */
};
It's worth noting that while you gain by not requiring so much storage, you lose because it becomes more costly to access everything, since each read or write now has a bunch of bit twiddling mechanics associated with it. Given that storage is cheap, it's probably not worth it.
Edit: You can also use vector<bool> to store 1-bit bools; but beware of it, because it doesn't act like a normal vector! In particular, it doesn't store actual bools, and its operator[] returns a proxy object rather than a bool&. It's sufficiently different that it's fair to say a vector<bool> is not actually a vector. Scott Meyers wrote very clearly on this topic in 'Effective STL'.
In C, and for the sole purpose of saving space, you can reinterpret the unsigned short as a structure with bitfields (or use such structure without messing with reinterpretations):
#include <stdio.h>

typedef struct bf_
{
    unsigned x : 4;
    unsigned y : 4;
    unsigned z : 4;
    unsigned w : 4;
} bf;

int main(void)
{
    unsigned short i = 5;
    bf *bitfields = (bf *) &i;
    bitfields->w = 12;
    printf("%d\n", bitfields->x);
    // etc..
    return 0;
}
That's a very common technique. You usually allocate an array of the larger primitive type (e.g., ints or longs), and have some abstraction to deal with the mapping. If you're using an OO language, it's usually a good idea to actually define some sort of BitArray or SmartArray or something like that, and implement a getVal() that takes an index. The important thing is to make sure you hide the details of the internal representation (e.g., for when you move between platforms).
That being said, most mainstream languages already have this functionality available.
If you just want bits, WikiPedia has a good list.
If you want more than bits, you can still find something, or implement it yourself with a similar interface. Take a look at the Java BitSet for reference

force a bit field read to 32 bits

I am trying to perform a less-than-32bit read over the PCI bus to a VME-bridge chip (Tundra Universe II), which will then go onto the VME bus and be picked up by the target.
The target VME application only accepts D32 (a data width read of 32bits) and will ignore anything else.
If I use a bit field structure mapped over a VME window (mmap'd into main memory) I CAN read bit fields >24 bits, but anything less fails, i.e.:
struct works {
    unsigned int a:24;
};

struct fails {
    unsigned int a:1;
    unsigned int b:1;
    unsigned int c:1;
};

struct main {
    works work;
    fails fail;
};

volatile struct main *reg = function_that_creates_and_maps_the_vme_windows_returns_address();
This shows that the works struct is read as 32 bits, but a read via the fails struct (e.g. reg->fail.a) is getting factored down to an X-bit read (where X might be 16 or 8?).
So the questions are :
a) Where is this scaled down? Compiler? OS? or the Tundra chip?
b) What is the actual size of the read operation performed?
I basically want to rule out everything but the chip. Documentation on that is on the web, but if it can be proved that the data width requested over the PCI bus is 32 bits, then the problem can be blamed on the Tundra chip!
edit:-
Concrete example, code was:-
struct SVersion
{
    unsigned title : 8;
    unsigned pecversion : 8;
    unsigned majorversion : 8;
    unsigned minorversion : 8;
} Version;
So now I have changed it to this :-
union UPECVersion
{
    struct SVersion
    {
        unsigned title : 8;
        unsigned pecversion : 8;
        unsigned majorversion : 8;
        unsigned minorversion : 8;
    } Version;
    unsigned int dummy;
};

And the base main struct :-

typedef struct SEPUMap
{
    ...
    ...
    UPECVersion PECVersion;
};
So I still have to change all my baseline code
// perform dummy 32bit read
pEpuMap->PECVersion.dummy;
// get the bits out
x = pEpuMap->PECVersion.Version.minorversion;
And how do I know that the second read won't actually do a real read again, as my original code did? (Instead of using the already-read bits via the union!)
Your compiler is adjusting the size of your struct to a multiple of its memory alignment setting. Almost all modern compilers do this. On some processors, variables and instructions have to begin on memory addresses that are multiples of some memory alignment value (often 32-bits or 64-bits, but the alignment depends on the processor architecture). Most modern processors don't require memory alignment anymore - but almost all of them see substantial performance benefit from it. So the compilers align your data for you for the performance boost.
However, in many cases (such as yours) this isn't the behavior you want. The size of your structure, for various reasons, can turn out to be extremely important. In those cases, there are various ways around the problem.
One option is to force the compiler to use different alignment settings. The options for doing this vary from compiler to compiler, so you'll have to check your documentation. It's usually a #pragma of some sort. On some compilers (the Microsoft compilers, for instance) it's possible to change the memory alignment for only a very small section of code. For example (in VC++):
#pragma pack(push) // save the current alignment
#pragma pack(1) // set the alignment to one byte
// Define variables that are alignment sensitive
#pragma pack(pop) // restore the alignment
Another option is to define your variables in other ways. Intrinsic types are not resized based on alignment, so instead of your 24-bit bitfield, another approach is to define your variable as an array of bytes.
Finally, you can just let the compilers make the structs whatever size they want and manually record the size that you need to read/write. As long as you're not concatenating structures together, this should work fine. Remember, however, that the compiler is giving you padded structs under the hood, so if you make a larger struct that includes, say, a works and a fails struct, there will be padded bits in between them that could cause you problems.
On most compilers, it's going to be darn near impossible to create a data type smaller than 8 bits. Most architectures just don't think that way. This shouldn't be a huge problem because most hardware devices that use datatypes of smaller than 8-bits end up arranging their packets in such a way that they still come in 8-bit multiples, so you can do the bit manipulations to extract or encode the values on the data stream as it leaves or comes in.
For all of the reasons listed above, a lot of code that works with hardware devices like this work with raw byte arrays and just encode the data within the arrays. Despite losing a lot of the conveniences of modern language constructs, it ends up just being easier.
I am wondering about the value of sizeof(struct fails). Is it 1? In this case, if you perform the read by dereferencing a pointer to a struct fails, it looks correct to issue a D8 read on the VME bus.
You can try to add a field unsigned int unused:29; to your struct fails.
The size of a struct is not equal to the sum of the size of its fields, including bit fields. Compilers are allowed, by the C and C++ language specifications, to insert padding between fields in a struct. Padding is often inserted for alignment purposes.
The common method in embedded systems programming is to read the data as an unsigned integer then use bit masking to retrieve the interesting bits. This is due to the above rule that I stated and the fact that there is no standard compiler parameter for "packing" fields in a structure.
I suggest creating an object ( class or struct) for interfacing with the hardware. Let the object read the data, then extract the bits as bool members. This puts the implementation as close to the hardware. The remaining software should not care how the bits are implemented.
When defining bit field positions / named constants, I suggest this format:
#define VALUE (1 << BIT_POSITION)
// OR
const unsigned int VALUE = 1 << BIT_POSITION;
This format is more readable and has the compiler perform the arithmetic. The calculation takes place during compilation and has no impact during run-time.
As an example, the Linux kernel has inline functions that explicitly handle memory-mapped IO reads and writes. In newer kernels it's a big macro wrapper that boils down to an inline assembly movl instruction, but it older kernels it was defined like this:
#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) ((*(volatile unsigned int *) (addr)) = (b))
Ian - if you want to be sure of the size of things you're reading/writing, I'd suggest not using structs like this to do it - it's possible the sizeof of the fails struct is just 1 byte, since the compiler is free to decide what it should be based on optimizations etc. I'd suggest reading/writing explicitly using ints, or generally the types whose sizes you need to be sure of, and then converting to a union/struct where you don't have those limitations.
It is the compiler that decides what size read to issue. To force a 32 bit read, you could use a union:
union dev_word {
    struct dev_reg {
        unsigned int a:1;
        unsigned int b:1;
        unsigned int c:1;
    } fail;
    uint32_t dummy;
};

volatile union dev_word *vme_map_window();
volatile union dev_word *vme_map_window();
If reading the union through a volatile-qualified pointer isn't enough to force a read of the whole union (I would think it would be - but that could be compiler-dependent), then you could use a function to provide the required indirection:
volatile union dev_word *real_reg; /* Initialised with vme_map_window() */

union dev_word * const *reg_func(void)
{
    static union dev_word local_copy;
    static union dev_word * const static_ptr = &local_copy;

    local_copy = *real_reg;
    return &static_ptr;
}

#define reg (*reg_func())
...then (for compatibility with the existing code) your accesses are done as:
reg->fail.a
The method described earlier of using the gcc flag -fstrict-volatile-bitfields and defining bitfield variables as volatile u32 works, but the total number of bits defined must be greater than 16.
For example:
typedef union {
    vu32 Word;
    struct {
        vu32 LATENCY : 3;
        vu32 HLFCYA : 1;
        vu32 PRFTBE : 1;
        vu32 PRFTBS : 1;
    };
} tFlashACR;
.
tFLASH* const pFLASH = (tFLASH*)FLASH_BASE;
#define FLASH_LATENCY pFLASH->ACR.LATENCY
.
FLASH_LATENCY = Latency;
causes gcc to generate code
.
ldrb r1, [r3, #0]
.
which is a byte read. However, changing the typedef to
typedef union {
    vu32 Word;
    struct {
        vu32 LATENCY : 3;
        vu32 HLFCYA : 1;
        vu32 PRFTBE : 1;
        vu32 PRFTBS : 1;
        vu32 : 2;
        vu32 DUMMY1 : 8;
        vu32 DUMMY2 : 8;
    };
} tFlashACR;
changes the resultant code to
.
ldr r3, [r2, #0]
.
I believe the only solution is to:
1) edit/create my main struct as all 32-bit ints (unsigned longs)
2) keep my original bit-field structs
3) for each access I require:
3.1) read the struct member as a 32-bit word, and cast it into the bit-field struct,
3.2) read the bit-field element I require (and for writes, set this bit-field, and write the word back!)
(1) is a shame, because then I lose the intrinsic types that each member of the "main/SEPUMap" struct has.
End solution :-
Instead of :-
printf("FirmwareVersionMinor: 0x%x\n", pEpuMap->PECVersion);
This :-
SPECVersion ver = *(SPECVersion*)&pEpuMap->PECVersion;
printf("FirmwareVersionMinor: 0x%x\n", ver.minorversion);
Only problem I have is writing! (Writes are now Read/Modify/Writes!)
// Read - Get current
_HVPSUControl temp = *(_HVPSUControl*)&pEpuMap->HVPSUControl;
// Modify - set to new value
temp.OperationalRequestPort = true;
// Write
volatile unsigned int *addr = reinterpret_cast<volatile unsigned int*>(&pEpuMap->HVPSUControl);
*addr = *reinterpret_cast<volatile unsigned int*>(&temp);
Just have to tidy that code up into a method!
#define writel(addr, data) ( *(volatile unsigned long*)(&addr) = (*(volatile unsigned long*)(&data)) )
I had the same problem on ARM using the GCC compiler, where a write into memory went out as bytes rather than as a 32-bit word.
The solution is to define bit-fields using volatile uint32_t (or required size to write):
union {
    volatile uint32_t XY;
    struct {
        volatile uint32_t XY_A : 4;
        volatile uint32_t XY_B : 12;
    };
};
but while compiling you need to add this parameter to gcc or g++:

-fstrict-volatile-bitfields

More in the gcc documentation.