Two values in one byte - c++

In a single nibble (0-F) I can store one number from 0 to 15. In one byte, I can store a single number from 0 to 255 (00 - FF).
Can I use a byte (00-FF) to store two different numbers each in the range 0-127 (00 - 7F)?

The answer to your question is NO. You can split a single byte into two numbers, but the sum of the bits in the two numbers must be <= 8. Since, the range 0-127 requires 7 bits, the other number in the byte can only be 1 bit, i.e. 0-1.

For obvious cardinality reasons, you cannot store two small integers in the 0 ... 127 range in one byte of 0 ... 255 range. In other words the cartesian product [0;127]×[0;127] has 214 elements which is bigger than 28 (the cardinal of the [0;255] interval, for bytes)
(If you can afford losing precision - which you didn't tell - you could, e.g. by storing only the highest bits ...)
Perhaps your question is: could I store two small integers from [0;15] in a byte? Then of course you could:
typedef unsigned unibble_t; // unsigned nibble in [0;15]
uint8_t make_from_two_nibbles(unibble_t l, unibble_t r) {
assert(l<=15);
assert(r<=15);
return (l<<4) | r;
}
unibble_t left_nible (uint8_t x) { return x >> 4; }
unibble_t right_nibble (uint8_t) { return x & 0xf; }
But I don't think you always should do that. First, you might use bit fields in struct. Then (and most importantly) dealing with nibbles that way might be more inefficient and make less readable code than using bytes.
And updating a single nibble, e.g. with
void update_left_nibble (uint8_t*p, unibble_t l) {
assert (p);
assert (l<=15);
*p = ((l<<4) | ((*p) & 0xf));
}
is sometimes expensive (it involves a memory load and a memory store, so uses the CPU cache and cache coherence machinery), and most importantly is generally a non-atomic operation (what would happen if two different threads are calling simultaneously update_left_nibble on the same address p -i.e. with pointer aliasing- is undefined behavior).
As a rule of thumb, avoid packing more than one data item in a byte unless you are sure it is worthwhile (e.g. you have a billion of such data items).

One byte is not enough for two values in 0…127, because each of those values needs log2(128) = 7 bits, for a total of 14, but a byte is only 8 bits.
You can declare variables with bit-packed storage using the C and C++ bitfield syntax:
struct packed_values {
uint8_t first : 7;
uint8_t second : 7;
uint8_t third : 2;
};
In this example, sizeof(packed_values) should equal 2 because only 16 bits were used, despite having three fields.
This is simpler than using bitwise arithmetic with << and & operators, but it's still not quite the same as ordinary variables: bit-fields have no addresses, so you can't have a pointer (or C++ reference) to one.

Can I use a byte to store two numbers in the range 0-127?
Of course you can:
uint8_t storeTwoNumbers(unsigned a, unsigned b) {
return ((a >> 4) & 0x0f) | (b & 0xf0);
}
uint8_t retrieveTwoNumbers(uint8_t byte, unsigned *a, unsigned *b) {
*b = byte & 0xf0;
*a = (byte & 0x0f) << 4;
}
Numbers are still in range 0...127 (0...255, actually). You just loose some precision, similar to floating point types. Their values increment in steps of 16.

You can store two data in range 0-15 in a single byte, but you should not (one var = one data is a better design).
If you must, you can use bit-masks and bit-shifts to access to the two data in your variable.
uint8_t var; /* range 0-255 */
data1 = (var & 0x0F); /* range 0-15 */
data2 = (var & 0xF0) >> 4; /* range 0-15 */

Related

How to build N bits variables in C++?

I am dealing with very large list of booleans in C++, around 2^N items of N booleans each. Because memory is critical in such situation, i.e. an exponential growth, I would like to build a N-bits long variable to store each element.
For small N, for example 24, I am just using unsigned long int. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with build-in variable is unsigned long long int, but it takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much.
With a 36-bits variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.
I tried std::bitset, but std::bitset< N > creates a element of at least 8B.
So a list of std::bitset< 1 > is much greater than a list of unsigned long int.
It is because the std::bitset just change the representation, not the container.
I also tried boost::dynamic_bitset<> from Boost, but the result is even worst (at least 32B!), for the same reason.
I know an option is to write all elements as one chain of booleans, 2473901162496 (2^36*36), then to store then in 38654705664 (2473901162496/64) unsigned long long int, which gives 288GB (38654705664*64/8/1024/1024/1024). Then to access an element is just a game of finding in which elements the 36 bits are stored (can be either one or two). But it is a lot of rewriting of the existing code (3000 lines) because mapping becomes impossible and because adding and deleting items during the execution in some functions will be surely complicated, confusing, challenging, and the result will be most likely not efficient.
How to build a N-bits variable in C++?
How about a struct with 5 chars (and perhaps some fancy operator overloading as needed to keep it compatible to the existing code)? A struct with a long and a char probably won't work because of padding / alignment...
Basically your own mini BitSet optimized for size:
struct Bitset40 {
unsigned char data[5];
bool getBit(int index) {
return (data[index / 8] & (1 << (index % 8))) != 0;
}
bool setBit(int index, bool newVal) {
if (newVal) {
data[index / 8] |= (1 << (index % 8));
} else {
data[index / 8] &= ~(1 << (index % 8));
}
}
};
Edit: As geza has also pointed out int he comments, the "trick" here is to get as close as possible to the minimum number of bytes needed (without wasting memory by triggering alignment losses, padding or pointer indirection, see http://www.catb.org/esr/structure-packing/).
Edit 2: If you feel adventurous, you could also try a bit field (and please let us know how much space it actually consumes):
struct Bitset36 {
unsigned long long data:36;
}
I'm not an expert, but this is what I would "try". Find the bytes for the smallest type your compiler supports (should be char). You can check with sizeof and you should get 1. That means 1 byte, so 8 bits.
So if you wanted a 24 bit type...you would need 3 chars. For 36 you would need 5 char array and you would have 4 bits of wasted padding on the end. This could easily be accounted for.
i.e.
char typeSize[3] = {0}; // should hold 24 bits
Now make a bit mask to access each position of typeSize.
const unsigned char one = 0b0000'0001;
const unsigned char two = 0b0000'0010;
const unsigned char three = 0b0000'0100;
const unsigned char four = 0b0000'1000;
const unsigned char five = 0b0001'0000;
const unsigned char six = 0b0010'0000;
const unsigned char seven = 0b0100'0000;
const unsigned char eight = 0b1000'0000;
Now you can use the bit-wise or to set the values to 1 where needed..
typeSize[1] |= four;
*typeSize[0] |= (four | five);
To turn off bits use the & operator..
typeSize[0] &= ~four;
typeSize[2] &= ~(four| five);
You can read the position of each bit with the & operator.
typeSize[0] & four
Bear in mind, I don't have a compiler handy to try this out so hopefully this is a useful approach to your problem.
Good luck ;-)
You can use array of unsigned long int and store and retrieve needed bit chains with bitwise operations. This approach excludes space overhead.
Simplified example for unsigned byte array B[] and 12-bit variables V (represented as ushort):
Set V[0]:
B[0] = V & 0xFF; //low byte
B[1] = B[1] & 0xF0; // clear low nibble
B[1] = B[1] | (V >> 8); //fill low nibble of the second byte with the highest nibble of V

Use bit manipulation to convert a bit from each byte in an 8-byte number to a single byte

I have a 64-bit unsigned integer. I want to check the 6th bit of each byte and return a single byte representing those 6th bits.
The obvious, "brute force" solution is:
inline const unsigned char Get6thBits(unsigned long long num) {
unsigned char byte(0);
for (int i = 7; i >= 0; --i) {
byte <<= 1;
byte |= bool((0x20 << 8 * i) & num);
}
return byte;
}
I could unroll the loop into a bunch of concatenated | statements to avoid the int allocation, but that's still pretty ugly.
Is there a faster, more clever way to do it? Maybe use a bitmask to get the 6th bits, 0x2020202020202020 and then somehow convert that to a byte?
If _pext_u64 is a possibility (this will work on Haswell and newer, it's very slow on Ryzen though), you could write this:
int extracted = _pext_u64(num, 0x2020202020202020);
This is a really literal way to implement it. pext takes a value (first argument) and a mask (second argument), at every position that the mask has a set bit it takes the corresponding bit from the value, and all bits are concatenated.
_mm_movemask_epi8 is more widely usable, you could use it like this:
__m128i n = _mm_set_epi64x(0, num);
int extracted = _mm_movemask_epi8(_mm_slli_epi64(n, 2));
pmovmskb takes the high bit of every byte in its input vector and concatenates them. The bits we want are not the high bit of every byte, so I move them up two positions with psllq (of course you could shift num directly). The _mm_set_epi64x is just some way to get num into a vector.
Don't forget to #include <intrin.h>, and none of this was tested.
Codegen seems reasonable enough
A weirder option is gathering the bits with a multiplication: (only slightly tested)
int extracted = (num & 0x2020202020202020) * 0x08102040810204 >> 56;
The idea here is that num & 0x2020202020202020 only has very few bits set, so we can arrange a product that never carries into bits that we need (or indeed at all). The multiplier is constructed to do this:
a0000000b0000000c0000000d0000000e0000000f0000000g0000000h0000000 +
0b0000000c0000000d0000000e0000000f0000000g0000000h00000000000000 +
00c0000000d0000000e0000000f0000000g0000000h000000000000000000000 etc..
Then the top byte will have all the bits "compacted" together. The lower bytes actually have something like that too, but they're missing bits that would have to come from "higher" (bits can only move to the left in a multiplication).

How can i store 2 numbers in a 1 byte char?

I have the question of the title, but If not, how could I get away with using only 4 bits to represent an integer?
EDIT really my question is how. I am aware that there are 1 byte data structures in a language like c, but how could I use something like a char to store two integers?
In C or C++ you can use a struct to allocate the required number of bits to a variable as given below:
#include <stdio.h>
struct packed {
unsigned char a:4, b:4;
};
int main() {
struct packed p;
p.a = 10;
p.b = 20;
printf("p.a %d p.b %d size %ld\n", p.a, p.b, sizeof(struct packed));
return 0;
}
The output is p.a 10 p.b 4 size 1, showing that p takes only 1 byte to store, and that numbers with more than 4 bits (larger than 15) get truncated, so 20 (0x14) becomes 4. This is simpler to use than the manual bitshifting and masking used in the other answer, but it is probably not any faster.
You can store two 4-bit numbers in one byte (call it b which is an unsigned char).
Using hex is easy to see that: in b=0xAE the two numbers are A and E.
Use a mask to isolate them:
a = (b & 0xF0) >> 4
and
e = b & 0x0F
You can easily define functions to set/get both numbers in the proper portion of the byte.
Note: if the 4-bit numbers need to have a sign, things can become a tad more complicated since the sign must be extended correctly when packing/unpacking.

Extremely fast hash function with collisions allowed

My key is a 64 bit address and the output is a 1 byte number (0-255). Collisions are allowed but the probability of them occurring should be low. Also, assume that number of elements to be inserted are low, lets say not more than 255, as to minimize the pigeon hole effect.
The addresses are addresses of the functions in the program.
uint64_t addr = ...
uint8_t hash = addr & 0xFF;
I think that meets all of your requirements.
I would XOR together the 2 LSB (least significant bytes), if this distribues badly, then add a 3rd one, and so forth
The rationale behind this is the following: function addresses do not distribute uniformly. The problem normally lies in the lower (lsb) bits. Functions usually need to begin in addresses divisible by 4/8/16 so the 2-4 lsb are probably meaningless. By XORing with the next byte, you should get rid of most of these problems and it's still pretty fast.
Function addresses are, I think, quite likely to be aligned (see this question, for instance). That seems to indicate that you want to skip least significant bits, depending on the alignment.
So, perhaps take the 8 bits starting from bit 3, i.e. skipping the least significant 3 bits (bits 0 through 2):
const uint8_t hash = (address >> 3);
This should be obvious from inspection of your set of addresses. In hex, watch the rightmost digit.
How about:
uint64_t data = 0x12131212121211B12;
uint32_t d1 = (data >> 32) ^ (uint32_t)(data);
uint16_t d2 = (d1 >> 16) ^ (uint16_t)(d1);
uint8_t d3 = (d2 >> 8) ^ (uint8_t)(d2);
return d3;
It combined all bits of your 8 bytes with 3 shifts and three xor instructions.

How to create a byte out of 8 bool values (and vice versa)?

I have 8 bool variables, and I want to "merge" them into a byte.
Is there an easy/preferred method to do this?
How about the other way around, decoding a byte into 8 separate boolean values?
I come in assuming it's not an unreasonable question, but since I couldn't find relevant documentation via Google, it's probably another one of those "nonono all your intuition is wrong" cases.
The hard way:
unsigned char ToByte(bool b[8])
{
unsigned char c = 0;
for (int i=0; i < 8; ++i)
if (b[i])
c |= 1 << i;
return c;
}
And:
void FromByte(unsigned char c, bool b[8])
{
for (int i=0; i < 8; ++i)
b[i] = (c & (1<<i)) != 0;
}
Or the cool way:
struct Bits
{
unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
union CBits
{
Bits bits;
unsigned char byte;
};
Then you can assign to one member of the union and read from another. But note that the order of the bits in Bits is implementation defined.
Note that reading one union member after writing another is well-defined in ISO C99, and as an extension in several major C++ implementations (including MSVC and GNU-compatible C++ compilers), but is Undefined Behaviour in ISO C++. memcpy or C++20 std::bit_cast are the safe ways to type-pun in portable C++.
(Also, the bit-order of bitfields within a char is implementation defined, as is possible padding between bitfield members.)
You might want to look into std::bitset. It allows you to compactly store booleans as bits, with all of the operators you would expect.
No point fooling around with bit-flipping and whatnot when you can abstract away.
The cool way (using the multiplication technique)
inline uint8_t pack8bools(bool* a)
{
uint64_t t;
memcpy(&t, a, sizeof t); // strict-aliasing & alignment safe load
return 0x8040201008040201ULL*t >> 56;
// bit order: a[0]<<7 | a[1]<<6 | ... | a[7]<<0 on little-endian
// for a[0] => LSB, use 0x0102040810204080ULL on little-endian
}
void unpack8bools(uint8_t b, bool* a)
{
// on little-endian, a[0] = (b>>7) & 1 like printing order
auto MAGIC = 0x8040201008040201ULL; // for opposite order, byte-reverse this
auto MASK = 0x8080808080808080ULL;
uint64_t t = ((MAGIC*b) & MASK) >> 7;
memcpy(a, &t, sizeof t); // store 8 bytes without UB
}
Assuming sizeof(bool) == 1
To portably do LSB <-> a[0] (like the pext/pdep version below) instead of using the opposite of host endianness, use htole64(0x0102040810204080ULL) as the magic multiplier in both versions. (htole64 is from BSD / GNU <endian.h>). That arranges the multiplier bytes to match little-endian order for the bool array. htobe64 with the same constant gives the other order, MSB-first like you'd use for printing a number in base 2.
You may want to make sure that the bool array is 8-byte aligned (alignas(8)) for performance, and that the compiler knows this. memcpy is always safe for any alignment, but on ISAs that require alignment, a compiler can only inline memcpy as a single load or store instruction if it knows the pointer is sufficiently aligned. *(uint64_t*)a would promise alignment, but also violate the strict-aliasing rule. Even on ISAs that allow unaligned loads, they can be faster when naturally aligned. But the compiler can still inline memcpy without seeing that guarantee at compile time.
How they work
Suppose we have 8 bools b[0] to b[7] whose least significant bits are named a-h respectively that we want to pack into a single byte. Treating those 8 consecutive bools as one 64-bit word and load them we'll get the bits in reversed order in a little-endian machine. Now we'll do a multiplication (here dots are zero bits)
| b7 || b6 || b4 || b4 || b3 || b2 || b1 || b0 |
.......h.......g.......f.......e.......d.......c.......b.......a
× 1000000001000000001000000001000000001000000001000000001000000001
────────────────────────────────────────────────────────────────
↑......h.↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
↑....f...↑...e....↑..d.....↑.c......↑b.......a
+ ↑...e....↑..d.....↑.c......↑b.......a
↑..d.....↑.c......↑b.......a
↑.c......↑b.......a
↑b.......a
a
────────────────────────────────────────────────────────────────
= abcdefghxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The arrows are added so it's easier to see the position of the set bits in the magic number. At this point 8 least significant bits has been put in the top byte, we'll just need to mask the remaining bits out
So the magic number for packing would be 0b1000000001000000001000000001000000001000000001000000001000000001 or 0x8040201008040201. If you're on a big endian machine you'll need to use the magic number 0x0102040810204080 which is calculated in a similar manner
For unpacking we can do a similar multiplication
| b7 || b6 || b4 || b4 || b3 || b2 || b1 || b0 |
abcdefgh
× 1000000001000000001000000001000000001000000001000000001000000001
────────────────────────────────────────────────────────────────
= h0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh
& 1000000010000000100000001000000010000000100000001000000010000000
────────────────────────────────────────────────────────────────
= h0000000g0000000f0000000e0000000d0000000c0000000b0000000a0000000
After multiplying we have the needed bits at the most significant positions, so we need to mask out irrelevant bits and shift the remaining ones to the least significant positions. The output will be the bytes contain a to h in little endian.
The efficient way
On newer x86 CPUs with BMI2 there are PEXT and PDEP instructions for this purpose. The pack8bools function above can be replaced with
_pext_u64(*((uint64_t*)a), 0x0101010101010101ULL);
And the unpack8bools function can be implemented as
_pdep_u64(b, 0x0101010101010101ULL);
(This maps LSB -> LSB, like a 0x0102040810204080ULL multiplier constant, opposite of 0x8040201008040201ULL. x86 is little-endian: a[0] = (b>>0) & 1; after memcpy.)
Unfortunately those instructions are very slow on AMD before Zen 3 so you may need to compare with the multiplication method above to see which is better
The other fast way is SSE2
x86 SIMD has an operation that takes the high bit of every byte (or float or double) in a vector register, and gives it to you as an integer. The instruction for bytes is pmovmskb. This can of course do 16 bytes at a time with the same number of instructions, so it gets better than the multiply trick if you have lots of this to do.
#include <immintrin.h>
inline uint8_t pack8bools_SSE2(const bool* a)
{
__m128i v = _mm_loadl_epi64( (const __m128i*)a ); // 8-byte load, despite the pointer type.
// __m128 v = _mm_cvtsi64_si128( uint64 ); // alternative if you already have an 8-byte integer
v = _mm_slli_epi32(v, 7); // low bit of each byte becomes the highest
return _mm_movemask_epi8(v);
}
There isn't a single instruction to unpack until AVX-512, which has mask-to-vector instructions. It is doable with SIMD, but likely not as efficiently as the multiply trick. See Convert 16 bits mask to 16 bytes mask and more generally is there an inverse instruction to the movemask instruction in intel avx2? for unpacking bitmaps to other element sizes.
How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD has some answers specifically for 8-bits -> 8-bytes, but if you can't do 16 bits at a time for that direction, the multiply trick is probably better, and pext certainly is (except on CPUs where it's disastrously slow, like AMD before Zen 3).
#include <stdint.h> // to get the uint8_t type
uint8_t GetByteFromBools(const bool eightBools[8])
{
uint8_t ret = 0;
for (int i=0; i<8; i++) if (eightBools[i] == true) ret |= (1<<i);
return ret;
}
void DecodeByteIntoEightBools(uint8_t theByte, bool eightBools[8])
{
for (int i=0; i<8; i++) eightBools[i] = ((theByte & (1<<i)) != 0);
}
bool a,b,c,d,e,f,g,h;
//do stuff
char y= a<<7 | b<<6 | c<<5 | d<<4 | e <<3 | f<<2 | g<<1 | h;//merge
although you are probably better off using a bitset
http://www.cplusplus.com/reference/stl/bitset/bitset/
I'd like to note that type punning through unions is UB in C++ (as rodrigo does in his answer. The safest way to do that is memcpy()
struct Bits
{
unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
unsigned char toByte(Bits b){
unsigned char ret;
memcpy(&ret, &b, 1);
return ret;
}
As others have said, the compiler is smart enough to optimize out memcpy().
BTW, this is the way that Boost does type punning.
There is no way to pack 8 bool variables into one byte. There is a way packing 8 logical true/false states in a single byte using Bitmasking.
You would use the bitwise shift operation and casting to archive it. a function could work like this:
unsigned char toByte(bool *bools)
{
unsigned char byte = \0;
for(int i = 0; i < 8; ++i) byte |= ((unsigned char) bools[i]) << i;
return byte;
}
Thanks Christian Rau for the correction s!