Reading binary integers - C++

const unsigned char* p;
int64_t u = ...; // ??
What's the recommended way to read a 64-bit binary little endian integer from the 8 bytes pointed to by p?
On x64 a single machine instruction should do, but on big-endian hardware swaps are needed.
How does one do this both optimally and portably?
Carl's solution is good and portable enough, but not optimal. Which raises the question: why doesn't C/C++ provide a better, standardized way to do this? It's not an uncommon construct.

The commonly seen:
u = (int64_t)(((uint64_t)p[0] << 0)
+ ((uint64_t)p[1] << 8)
+ ((uint64_t)p[2] << 16)
+ ((uint64_t)p[3] << 24)
+ ((uint64_t)p[4] << 32)
+ ((uint64_t)p[5] << 40)
+ ((uint64_t)p[6] << 48)
+ ((uint64_t)p[7] << 56));
Is pretty much the only game in town for portability - it's otherwise tough to avoid potential alignment problems.
This answer does assume an 8-bit char. If you might need to support different sized chars, you'll need a preprocessor definition that checks CHAR_BIT and does the right thing for each.
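For reference, a memcpy-based sketch of the same read, assuming C++23 is available for std::byteswap (from <bit>); compilers typically fold this into a single load, plus a swap on big-endian targets:
#include <bit>      // std::endian (C++20), std::byteswap (C++23)
#include <cstdint>
#include <cstring>

inline int64_t read_le64(const unsigned char* p)
{
    uint64_t u;
    std::memcpy(&u, p, sizeof u);                      // no alignment or aliasing issues
    if constexpr (std::endian::native == std::endian::big)
        u = std::byteswap(u);                          // bytes are little-endian in the buffer
    return static_cast<int64_t>(u);
}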

Carl Norum is right - if you want to be readable as well, you can write a loop (the compiler will unroll it anyway). This will also nicely deal with non-8-bit chars.
u = 0;
const int n = 64 / CHAR_BIT + !!(64 % CHAR_BIT);
for (int i = 0; i < n; i++) {
u += (uint64_t)p[i] << (i * CHAR_BIT);
}

I used the following code to reverse the byte order of any variable; it's handy for converting between different endianness.
// Reverses the order of bytes in the specified data
void ReverseBytes(LPBYTE pData, int nSize)
{
int i, j;
for (i = 0, j = nSize - 1; i < j; i++, j--)
{
BYTE nTemp = pData[i];
pData[i] = pData[j];
pData[j] = nTemp;
}
}
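A quick usage sketch (hypothetical value, flipped in place):
uint64_t v = 0x0102030405060708ULL;
ReverseBytes((LPBYTE)&v, sizeof(v));   // v is now 0x0807060504030201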


Convert uint64_t to uint8_t[8]

How can I convert uint64_t to uint8_t[8] without losing information in C++?
I tried the following:
uint64_t number = 23425432542254234532;
uint8_t result[8];
for(int i = 0; i < 8; i++) {
std::memcpy(result[i], number, 1);
}
You are almost there. Firstly, the literal 23425432542254234532 is too big to fit in uint64_t.
Secondly, as you can see from the documentation, std::memcpy has the following declaration:
void * memcpy ( void * destination, const void * source, size_t num );
As you can see, it takes pointers (addresses) as arguments. Not uint64_t, nor uint8_t. You can easily get the address of the integer using the address-of operator.
Thirdly, you are only copying the first byte of the integer into each array element. You would need to increment the input pointer in every iteration. But the loop is unnecessary. You can copy all bytes in one go like this:
std::memcpy(result, &number, sizeof number);
Do realize that the order of the bytes depends on the endianness of the CPU.
First, do you want the conversion to be big-endian, or little-endian? Most of the previous answers are going to start giving you the bytes in the opposite order and break your program as soon as you switch architectures.
If you need to get consistent results, you would want to convert your 64-bit input into big-endian (network) byte order, or perhaps to little-endian. For example, on GNU glib, the function is GUINT64_TO_BE(), but there is an equivalent built-in function for most compilers.
Having done that, there are several alternatives:
Copy with memcpy() or memmove()
This is the method that the language standard guarantees will work, although here I use one function from a third-party library (to convert the argument to big-endian byte order on all platforms). For example:
#include <stdint.h>
#include <stdlib.h>
#include <string.h> /* for memcpy */
#include <glib.h>
union eight_bytes {
uint64_t u64;
uint8_t b8[sizeof(uint64_t)];
};
eight_bytes u64_to_eight_bytes( const uint64_t input )
{
eight_bytes result;
const uint64_t big_endian = (uint64_t)GUINT64_TO_BE((guint64)input);
memcpy( &result.b8, &big_endian, sizeof(big_endian) );
return result;
}
On Linux x86_64 with clang++ -std=c++17 -O, this compiles to essentially the instructions:
bswapq %rdi
movq %rdi, %rax
retq
If you wanted the results in little-endian order on all platforms, you could replace GUINT64_TO_BE() with GUINT64_TO_LE() and remove the first instruction, then declare the function inline to remove the third instruction. (Or, if you’re certain that cross-platform compatibility does not matter, you might risk just omitting the normalization.)
So, on a modern, 64-bit compiler, this code is just as efficient as anything else. On another target, it might not be.
Type-Punning
The common way to write this in C would be to declare the union as before, set its uint64_t member, and then read its uint8_t[8] member. This is legal in C.
I personally like it because it allows me to express the entire operation as static single assignments.
However, in C++, it is formally undefined behavior. In practice, all C++ compilers I’m aware of support it for Plain Old Data (the formal term in the language standard), of the same size, with no padding bits, but not for more complicated classes that have virtual function tables and the like. It seems more likely to me that a future version of the Standard will officially support type-punning on POD than that any important compiler will ever break it silently.
The C++ Guidelines Way
Bjarne Stroustrup recommended that, if you are going to type-pun instead of copying, you use reinterpret_cast, such as
uint8_t (&array_of_bytes)[sizeof(uint64_t)] =
*reinterpret_cast<uint8_t(*)[sizeof(uint64_t)]>(
&proper_endian_uint64);
His reasoning was that both an explicit cast and type-punning through a union are undefined behavior, but the cast makes it blatant and unmistakable that you are shooting yourself in the foot on purpose, whereas reading a different union member than the active one can be a very subtle bug.
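For completeness: if C++20 is available, std::bit_cast avoids the undefined-behavior question entirely. A minimal sketch, reusing the proper_endian_uint64 name from above:
#include <array>
#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

// Well-defined in C++20: copies the object representation into a byte array.
auto array_of_bytes =
    std::bit_cast<std::array<uint8_t, sizeof(uint64_t)>>(proper_endian_uint64);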
If I understand correctly, you can do it this way, for instance:
uint64_t number = 0x2342543254225423; // trimmed to fit in 64 bits
uint8_t *p = (uint8_t *)&number;
//if you need a copy
uint8_t result[8];
for(int i = 0; i < 8; i++) {
result[i] = p[i];
}
When copying memory around between incompatible types, the first thing to be aware of is strict aliasing - you don't want to alias pointers incorrectly. Alignment is also to be considered.
You were almost there; the for loop is not needed.
uint64_t number = 0x2342543254225423; // trimmed to fit
uint8_t result[sizeof(number)];
std::memcpy(result, &number, sizeof(number));
Note: be aware of the endianness of the platform as well.
Either use a union, or do it with bitwise operations; memcpy is for blocks of memory and might not be the best option here.
uint64_t number = 0x2342543254225423; // trimmed to fit in 64 bits
uint8_t result[8];
for(int i = 0; i < 8; i++) {
result[i] = uint8_t((number >> 8*(7 - i)) & 0xFF);
}
Or, although I'm told this breaks the rules, it works on my compiler:
union
{
uint64_t a;
uint8_t b[8];
};
a = 0x2342543254225423; // trimmed to fit in 64 bits
//Can now read off the value of b
uint8_t copy[8];
for(int i = 0; i < 8; i++)
{
copy[i]= b[i];
}
The packing and unpacking can be done with masks. One more thing to worry about is the byte order: packing and unpacking should use the same byte order. Beware: this is not a super-efficient implementation and comes with no guarantees on small CPUs that are not natively 64-bit.
void unpack_uint64(uint64_t number, uint8_t *result) {
result[0] = number & 0x00000000000000FF ; number = number >> 8 ;
result[1] = number & 0x00000000000000FF ; number = number >> 8 ;
result[2] = number & 0x00000000000000FF ; number = number >> 8 ;
result[3] = number & 0x00000000000000FF ; number = number >> 8 ;
result[4] = number & 0x00000000000000FF ; number = number >> 8 ;
result[5] = number & 0x00000000000000FF ; number = number >> 8 ;
result[6] = number & 0x00000000000000FF ; number = number >> 8 ;
result[7] = number & 0x00000000000000FF ;
}
uint64_t pack_uint64(uint8_t *buffer) {
uint64_t value ;
value = buffer[7] ;
value = (value << 8 ) + buffer[6] ;
value = (value << 8 ) + buffer[5] ;
value = (value << 8 ) + buffer[4] ;
value = (value << 8 ) + buffer[3] ;
value = (value << 8 ) + buffer[2] ;
value = (value << 8 ) + buffer[1] ;
value = (value << 8 ) + buffer[0] ;
return value ;
}
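A hypothetical round trip with the two helpers above:
uint64_t original = 0x1122334455667788ULL;
uint8_t buffer[8];
unpack_uint64(original, buffer);            // buffer[0] == 0x88 ... buffer[7] == 0x11
uint64_t restored = pack_uint64(buffer);    // restored == original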
#include<cstdint>
#include<cstdlib>
#include<iostream>
struct ByteArray
{
uint8_t b[8] = { 0,0,0,0,0,0,0,0 };
};
ByteArray split(uint64_t x)
{
ByteArray pack;
const uint8_t MASK = 0xFF;
for (auto i = 0; i < 8; ++i) // all 8 bytes, not just 7
{
pack.b[i] = x & MASK;
x = x >> 8;
}
return pack;
}
int main()
{
uint64_t val_64 = UINT64_MAX;
auto pack = split(val_64);
for (auto i = 0; i < 8; ++i)
{
std::cout << (uint32_t)pack.b[i] << std::endl;
}
system("Pause");
return 0;
}
The union approach addressed by Straw1239 is better and cleaner, though. Please do take care about compiler/platform compatibility with endianness.

Get exact bit representation of a double, in C++

Let us say that we have a double, say, x = 4.3241;
Quite simply, I would like to know how, in C++, one can retrieve an int for each bit in the representation of a number.
I have seen other questions and read the page on bitset, but I'm afraid I still do not understand how to retrieve those bits.
So, for example, I would like the input to be x = 4.53, and if the bit representation was 10010101, then I would like 8 ints, each one representing each 1 or 0.
Something like:
double doubleValue = ...whatever...;
uint8_t *bytePointer = (uint8_t *)&doubleValue;
for(size_t index = 0; index < sizeof(double); index++)
{
uint8_t byte = bytePointer[index];
for(int bit = 0; bit < 8; bit++)
{
printf("%d", byte&1);
byte >>= 1;
}
}
... will print the bits out, ordered from least significant to most significant within bytes and reading the bytes from first to last. Depending on your machine architecture that means the bytes may or may not be in order of significance. Intel is strictly little endian so you should get all bits from least significant to most; most CPUs use the same endianness for floating point numbers as for integers but even that's not guaranteed.
Just allocate an array and store the bits instead of printing them.
(an assumption made: that there are eight bits in a byte; not technically guaranteed in C but fairly reliable on any hardware you're likely to encounter nowadays)
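One way to store them instead of printing, under the same assumptions (8-bit bytes, least significant bit of each byte first):
int bits[sizeof(double) * 8];
double doubleValue = 4.53;
uint8_t *bytePointer = (uint8_t *)&doubleValue;
for(size_t index = 0; index < sizeof(double); index++)
{
    uint8_t byte = bytePointer[index];
    for(int bit = 0; bit < 8; bit++)
    {
        bits[index * 8 + bit] = byte & 1;   // one int per bit, as requested
        byte >>= 1;
    }
}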
This is extremely architecture-dependent. After gathering the following information
The endianness of your target architecture
The floating point representation (e.g. IEEE754)
The size of your double type
you should be able to get the bit representation you're searching for. An example tested on a x86_64 system
#include <iostream>
#include <climits>
int main()
{
double v = 72.4;
// Boilerplate to circumvent the fact bitwise operators can't be applied to double
union {
double value;
char array[sizeof(double)];
};
value = v;
for (int i = 0; i < sizeof(double) * CHAR_BIT; ++i) {
int relativeToByte = i % CHAR_BIT;
bool isBitSet = (array[sizeof(double) - 1 - i / CHAR_BIT] &
(1 << (CHAR_BIT - relativeToByte - 1))) == (1 << (CHAR_BIT - relativeToByte - 1));
std::cout << (isBitSet ? "1" : "0");
}
return 0;
}
The output is
0100000001010010000110011001100110011001100110011001100110011010
which, split into sign, exponent and significand (or mantissa), is
0 10000000101 (1.)0010000110011001100110011001100110011001100110011010
(IEEE 754 double-precision layout diagram from Wikipedia.)
Anyway you're required to know how your target representation works, otherwise these numbers will pretty much be useless to you.
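For instance, if you do know the target uses IEEE 754 binary64, a sketch like this pulls the three fields out directly (memcpy avoids any aliasing questions; needs <cstdint> and <cstring>):
double v = 72.4;
uint64_t bits;
std::memcpy(&bits, &v, sizeof bits);                 // grab the raw 64 bits
unsigned sign     = unsigned(bits >> 63);            // 1 sign bit
unsigned exponent = unsigned((bits >> 52) & 0x7FF);  // 11 exponent bits (bias 1023)
uint64_t mantissa = bits & ((1ULL << 52) - 1);       // 52 significand bits
std::cout << sign << " " << exponent << " " << std::hex << mantissa << std::endl;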
Since your question is unclear whether you want those integers to be in the order that makes sense with regard to the internal representation of your number, or simply to dump out the bytes at that address as you encounter them, I'm adding another, easier method to just dump out every byte at that address (and showing another way of dealing with bit operators and double):
double v = 72.4;
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
uint8_t byte = array[i];
for (int bit = CHAR_BIT - 1; bit >= 0; --bit) // Print each byte
std::cout << ((byte & (1 << bit)) == (1 << bit));
}
The above code will simply print each byte from the one at lower address to the one with higher address.
Edit: since it seems you're just interested in how many 1s and 0s are there (i.e. the order totally doesn't matter), in this specific instance I agree with the other answers and I would also just go for a counting solution
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
uint8_t byte = array[i];
for (int j = 0; j < CHAR_BIT; ++j) {
std::cout << (byte & 0x1);
byte >>= 1;
}
}

Bitwise shift of buffer in CUDA [duplicate]

What is the best way to implement a bitwise memmove? The method should take an additional destination and source bit-offset and the count should be in bits too.
I saw that ARM provides a non-standard _membitmove, which does exactly what I need, but I couldn't find its source.
BIND's bitstring includes isc_bitstring_copy, but it's not efficient.
I'm aware that the C standard library doesn't provide such a method, but I also couldn't find any third-party code providing a similar method.
Assuming "best" means "easiest", you can copy bits one by one. Conceptually, an address of a bit is an object (struct) that has a pointer to a byte in memory and an index of a bit in the byte.
struct pointer_to_bit
{
uint8_t* p;
int b;
};
void membitmovebl(
void *dest,
const void *src,
int dest_offset,
int src_offset,
size_t nbits)
{
// Create pointers to bits
struct pointer_to_bit d = {dest, dest_offset};
struct pointer_to_bit s = {src, src_offset};
// Bring the bit offsets to range (0...7)
d.p += d.b / 8; // replace division by right-shift if bit offset can be negative
d.b %= 8; // replace "%=8" by "&=7" if bit offset can be negative
s.p += s.b / 8;
s.b %= 8;
// Determine whether it's OK to loop forward
if (d.p < s.p || (d.p == s.p && d.b <= s.b))
{
// Copy bits one by one
for (size_t i = 0; i < nbits; i++)
{
// Read 1 bit
int bit = (*s.p >> s.b) & 1;
// Write 1 bit
*d.p &= ~(1 << d.b);
*d.p |= bit << d.b;
// Advance pointers
if (++s.b == 8)
{
s.b = 0;
++s.p;
}
if (++d.b == 8)
{
d.b = 0;
++d.p;
}
}
}
else
{
// Copy stuff backwards - essentially the same code but ++ replaced by --
}
}
If you want to write a version optimized for speed, you will have to do copying by bytes (or, better, words), unroll loops, and handle a number of special cases (memmove does that; you will have to do more because your function is more complicated).
P.S. Oh, seeing that you call isc_bitstring_copy inefficient, you probably want the speed optimization. You can use the following idea:
Start copying bits individually until the destination is byte-aligned (d.b == 0). Then, it is easy to copy 8 bits at once, doing some bit twiddling. Do this until there are less than 8 bits left to copy; then continue copying bits one by one.
// Copy 8 bits from s to d and advance pointers
*d.p = *s.p++ >> s.b;
*d.p++ |= *s.p << (8 - s.b);
P.P.S Oh, and seeing your comment on what you are going to use the code for, you don't really need to implement all the versions (byte/halfword/word, big/little-endian); you only want the easiest one - the one working with words (uint32_t).
Here is a partial implementation (not tested). There are obvious efficiency and usability improvements.
Copy n bytes from src to dest (not overlapping src), and shift bits at dest rightwards by bit bits, 0 <= bit <= 7. This assumes that the least significant bits are at the right of the bytes.
void memcpy_with_bitshift(unsigned char *dest, unsigned char *src, size_t n, int bit)
{
int i;
memcpy(dest, src, n);
for (i = 0; i < n; i++) {
dest[i] >>= bit;
}
for (i = 0; i < n; i++) {
dest[i+1] |= (src[i] << (8 - bit));
}
}
Some improvements to be made:
Don't overwrite first bit bits at beginning of dest.
Merge loops
Have a way to copy a number of bits not divisible by 8
Fix for >8 bits in a char

Gather bits at specific positions into a new value

I have a bit-mask of N chars in size, which is statically known (i.e. can be calculated at compile time, but it's not a single constant, so I can't just write it down), with bits set to 1 denoting the "wanted" bits. And I have a value of the same size, which is only known at runtime. I want to collect the "wanted" bits from that value, in order, into the beginning of a new value. For simplicity's sake let's assume the number of wanted bits is <= 32.
Completely unoptimized reference code which hopefully has the correct behaviour:
template<int N, const char mask[N]>
unsigned gather_bits(const char* val)
{
unsigned result = 0;
char* result_p = (char*)&result;
int pos = 0;
for (int i = 0; i < N * CHAR_BIT; i++)
{
if (mask[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
{
if (val[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
{
if (pos < sizeof(unsigned) * CHAR_BIT)
{
result_p[pos/CHAR_BIT] |= 1 << (pos % CHAR_BIT);
}
else
{
abort();
}
}
pos += 1;
}
}
return result;
}
Although I'm not sure whether that formulation actually allows access to the contents of the mask at compile time. But in any case, it's available for use, maybe a constexpr function or something would be a better idea. I'm not looking here for the necessary C++ wizardry (I'll figure that out), just the algorithm.
An example of input/output, with 16-bit values and imaginary binary notation for clarity:
mask = 0b0011011100100110
val = 0b0101000101110011
--
wanted = 0b__01_001__1__01_ // retain only those bits which are set in the mask
result = 0b0000000001001101 // bring them to the front
^ gathered bits begin here
My questions are:
What's the most performant way to do this? (Are there any hardware instructions that can help?)
What if both the mask and the value are restricted to be unsigned, so a single word, instead of an unbounded char array? Can it then be done with a fixed, short sequence of instructions?
There will be pext (parallel bit extract), which does exactly what you want, in Intel Haswell. I don't know what the performance of that instruction will be, but it's probably better than the alternatives. This operation is also known as "compress-right" or simply "compress"; the implementation from Hacker's Delight is this:
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel prefix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
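On a BMI2-capable target (Haswell or later) the whole loop collapses into the intrinsic; a sketch, assuming you compile with -mbmi2 or equivalent:
#include <immintrin.h>   // _pext_u32 (BMI2)

unsigned compress_pext(unsigned x, unsigned m)
{
    return _pext_u32(x, m);   // gathers the bits of x selected by m into the low bits
}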

Bit packing of array of integers

I have an array of integers, let's assume they are of type int64_t. Now, I know that only the first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).
What is the most efficient way to convert the array in the way that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on) ?
I would like it to be as general as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n, like powers of 2 or such.
Of course I know that I can just iterate value over value, I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as little space as possible. I just need to "cut" n bits from every integer and, given the array, I know the exact number of bits I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate e.g. a uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
| b0 | b1 | b2 |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
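To illustrate the addressing only (this is a sketch of the idea, not the PackedArray API): item i lives at bit offset i * bitsPerItem and may straddle two uint32_t cells.
#include <cstdint>

// Sketch: assumes bitsPerItem <= 32 and one spare trailing cell in buf so the
// two-cell read below never runs past the end of the buffer.
uint32_t packed_get(const uint32_t* buf, uint32_t i, uint32_t bitsPerItem)
{
    uint64_t bitpos = (uint64_t)i * bitsPerItem;
    uint32_t cell   = (uint32_t)(bitpos / 32);
    uint32_t offset = (uint32_t)(bitpos % 32);
    // Fetch the item together with any spill-over into the next cell.
    uint64_t two    = buf[cell] | ((uint64_t)buf[cell + 1] << 32);
    uint64_t mask   = (bitsPerItem == 32) ? 0xFFFFFFFFull
                                          : ((1ull << bitsPerItem) - 1);
    return (uint32_t)((two >> offset) & mask);
}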
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <sys/types.h>
#include <stdint.h>
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
int64_t inmask = 0;
unsigned char* pout = (unsigned char*)output;
int obit = 0;
int nout = 0;
*pout = 0;
for(int i=0; i<nin; i++)
{
inmask = (int64_t)1 << (n-1);
for(int k=0; k<n; k++)
{
if(obit>7)
{
obit = 0;
pout++;
*pout = 0;
}
*pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
inmask >>= 1;
obit++;
nout++;
}
}
return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
unsigned char* pin = (unsigned char*)input;
int64_t* pout = output;
int nbits = nbitsin;
unsigned char inmask = 0x80;
int inbit = 0;
int nout = 0;
while(nbits > 0)
{
*pout = 0;
for(int i=0; i<n; i++)
{
if(inbit > 7)
{
pin++;
inbit = 0;
}
*pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
inbit++;
}
pout++;
nbits -= n;
nout++;
}
return nout;
}
int main()
{
int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int64_t output[21];
unsigned char compressed[21*8];
int n = 5;
int nbits = pack(input, 21, compressed, n);
int nout = unpack(compressed, nbits, output, n);
for(int i=0; i<=20; i++)
printf("input: %lld output: %lld\n", input[i], output[i]);
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have not tested this either with a wide range of values, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit-blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and goes to the highest. This only makes it harder to read with a binary dump, like Linux's xxd -b. As a detail, int* can be trivially changed to int64_t*, and it should preferably be unsigned. I have already tested this version with a few million arrays and it seems solid, so I'll share it with the rest:
#include <algorithm> // for std::min

int pack2(int *input, int nin, unsigned char* output, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
if(nin>0) output[0] = 0;
for(int i=0; i<nin; i++)
{
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
obit += ibite - ibit;
nout += obit >> 3;
if(obit & 8) output[nout] = 0;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
for(int i=0; i<nin; i++)
{
oinput[i] = 0;
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
obit += ibite - ibit;
nout += obit >> 3;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
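A hypothetical round trip with the pair above, using n = 5 bits per value:
int in[4]  = {3, 17, 30, 9};   // every value fits in 5 bits
int out[4] = {0};
unsigned char buf[4];          // 4 * 5 = 20 bits -> 3 bytes used
pack2(in, 4, buf, 5);
unpack2(out, 4, buf, 5);       // out[] now matches in[]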
I know this might seem like the obvious thing to say, as I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255) or uint16_t (max 65535)? I'm sure you could bit-manipulate an int64_t using defined values, AND/OR operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38-bit rather than 64, you can build structures using bit-field specifications. Assuming you also have smaller elements to fit in the remaining space:
struct example {
/* 64bit number cut into 3 different sized sections */
uint64_t big_num:38;
uint64_t small_num:16;
uint64_t itty_num:10;
/* 8 bit number cut in two */
uint8_t nibble_A:4;
uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so it can only be used within a program rather than in an exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
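A quick usage sketch of the struct above:
struct example e = {0};
e.big_num   = (1ULL << 37) | 5;   // fits in 38 bits
e.small_num = 0xBEEF;             // 16 bits
e.itty_num  = 1023;               // maximum for 10 bits
e.nibble_A  = 0xA;
e.nibble_B  = 0xB;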
I don't think you can avoid iterating across the elements.
AFAIK, Huffman encoding requires the frequencies of the "symbols", which, unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).