C++ BigInt multiplication conceptual problem - c++

I'm building a small BigInt library in C++ for use in my programming language.
The structure is like the following:
short digits[ 1000 ];
int len;
I have a function that converts a string into a bigint by splitting it up into single chars and putting them into digits.
The numbers in digits are all reversed, so the number 123 would look like the following:
digits[0]=3 digits[1]=2 digits[2]=1
I have already managed to code the adding function, which works perfectly.
It works somewhat like this:
overflow = 0
for i ++ until length of both numbers exceeded:
    add numberA[ i ] to numberB[ i ]
    add overflow to the result
    set overflow to 0
    if the result is bigger than 9:
        subtract 10 from the result
        overflow = 1
    put the result into numberReturn[ i ]
(Overflow is in this case what happens when I add 1 to 9: subtract 10 from 10, set overflow to 1, overflow gets added to the next digit)
So think of how two numbers are stored, like those:
0 | 1 | 2
---------
A 2 - -
B 0 0 1
The above represents the digits of the bigints 2 (A) and 100 (B).
- means uninitialized digits, they aren't accessed.
So adding the above number works fine: start at 0, add 2 + 0, go to 1, add 0, go to 2, add 1
But:
When I want to do multiplication with the above structure, my program ends up doing the following:
Start at 0, multiply 2 with 0 (eek), go to 1, ...
So it is obvious that, for multiplication, I have to get an order like this:
0 | 1 | 2
---------
A - - 2
B 0 0 1
Then, everything would be clear: Start at 0, multiply 0 with 0, go to 1, multiply 0 with 0, go to 2, multiply 1 with 2
How can I manage to get digits into the correct form for multiplication?
I don't want to do any array moving/flipping - I need performance!

Why are you using short to store digits in the range [0..9]? A char would suffice.
You're thinking incorrectly about the multiplication. In the case of multiplication you need a double for loop that multiplies B with each digit in A and sums the results, each shifted by the correct power of ten.
EDIT: Since someone anonymously downvoted this without a comment, this is basically the multiplication algorithm:
bigint prod = 0
for i in A
prod += B * A[i] * (10 ^ i)
The multiplication of B with A[i] is done by an extra for loop where you also keep track of the carry. The (10 ^ i) is achieved by offsetting the destination indices, since the bigint is in base 10.

Your example in the question is over-engineering at its best, in my opinion. Your approach will end up even slower than normal long multiplication because of the sheer number of multiplications and additions involved. Don't limit yourself to working on one base-10 digit at a time when you can multiply approximately 9 at a time! Convert the base-10 string to a HugeVal, and then do operations on it. Don't do operations directly on the string. You will go crazy. Here is some code which demonstrates addition and multiplication. Increase M to hold bigger values. You could also use std::vector, but then you miss out on some optimizations.
#include <iostream>
#include <string>
#include <algorithm>
#include <sstream>
#include <cstdlib>
#include <cstdio>
#include <iomanip>
#ifdef _DEBUG
#include <assert.h>
#define ASSERT(x) assert(x)
#else
#define ASSERT(x)
#endif
namespace Arithmetic
{
const int M = 64;
const int B = (M-1)*32;
struct Flags
{
Flags() : C(false),Z(false),V(false),N(false){}
void Clear()
{
C = false;
Z = false;
V = false;
N = false;
}
bool C,Z,V,N;
};
static unsigned int hvAdd(unsigned int a, unsigned int b, Flags& f)
{
unsigned int c;
f.Clear();
//b = -(signed)b;
c = a + b;
f.N = (c >> 31UL) & 0x1;
f.C = (c < a) && (c < b);
f.Z = !c;
f.V = (((signed)a < (signed)b) != f.N);
return c;
}
static unsigned int hvSub(unsigned int a, unsigned int b, Flags& f)
{
unsigned int c;
f.Clear();
c = a - b;
//f.N = ((signed)c < 0);
f.N = (c >> 31UL) & 0x1;
f.C = (c < a) && (c < b);
f.Z = !c;
f.V = (((signed)a < (signed)b) != f.N);
return c;
}
struct HugeVal
{
HugeVal()
{
std::fill(part, part + M, 0);
}
HugeVal(const HugeVal& h)
{
std::copy(h.part, h.part + M, part);
}
HugeVal(const std::string& str)
{
Flags f;
unsigned int tmp = 0;
std::fill(part, part + M, 0);
for(unsigned int i=0; i < str.length(); ++i){
unsigned int digit = (unsigned int)str[i] - 48UL;
unsigned int carry_last = 0;
unsigned int carry_next = 0;
for(int i=0; i<M; ++i){
tmp = part[i]; //the value *before* the carry add
part[i] = hvAdd(part[i], carry_last, f);
carry_last = 0;
if(f.C)
++carry_last;
for(int j=1; j<10; ++j){
part[i] = hvAdd(part[i], tmp, f);
if(f.C)
++carry_last;
}
}
part[0] = hvAdd(part[0], digit, f);
int index = 1;
while(f.C && index < M){
part[index] = hvAdd(part[index], 1, f);
++index;
}
}
}
/*
HugeVal operator= (const HugeVal& h)
{
*this = HugeVal(h);
}
*/
HugeVal operator+ (const HugeVal& h) const
{
HugeVal tmp;
Flags f;
int index = 0;
unsigned int carry_last = 0;
for(int j=0; j<M; ++j){
if(carry_last){
tmp.part[j] = hvAdd(tmp.part[j], carry_last, f);
carry_last = 0;
}
tmp.part[j] = hvAdd(tmp.part[j], part[j], f);
if(f.C)
++carry_last;
tmp.part[j] = hvAdd(tmp.part[j], h.part[j], f);
if(f.C)
++carry_last;
}
return tmp;
}
HugeVal operator* (const HugeVal& h) const
{
HugeVal tmp;
for(int j=0; j<M; ++j){
unsigned int carry_next = 0;
for(int i=0;i<M; ++i){
Flags f;
unsigned int accum1 = 0;
unsigned int accum2 = 0;
unsigned int accum3 = 0;
unsigned int accum4 = 0;
/* Split into 16-bit values */
unsigned int j_LO = part[j]&0xFFFF;
unsigned int j_HI = part[j]>>16;
unsigned int i_LO = h.part[i]&0xFFFF;
unsigned int i_HI = h.part[i]>>16;
size_t index = i+j;
size_t index2 = index+1;
/* These multiplications are safe now. Can't overflow */
accum1 = j_LO * i_LO;
accum2 = j_LO * i_HI;
accum3 = j_HI * i_LO;
accum4 = j_HI * i_HI;
if(carry_next){ //carry from last iteration
accum1 = hvAdd(accum1, carry_next, f); //add to LSB
carry_next = 0;
if(f.C) //LSB produced carry
++carry_next;
}
/* Add the lower 16-bit parts of accum2 and accum3 to accum1 */
accum1 = hvAdd(accum1, (accum2 << 16), f);
if(f.C)
++carry_next;
accum1 = hvAdd(accum1, (accum3 << 16), f);
if(f.C)
++carry_next;
if(carry_next){ //carry from LSB
accum4 = hvAdd(accum4, carry_next, f); //add to MSB
carry_next = 0;
ASSERT(f.C == false);
}
/* Add the higher 16-bit parts of accum2 and accum3 to accum4 */
/* Can't overflow */
accum4 = hvAdd(accum4, (accum2 >> 16), f);
ASSERT(f.C == false);
accum4 = hvAdd(accum4, (accum3 >> 16), f);
ASSERT(f.C == false);
if(index < M){
tmp.part[index] = hvAdd(tmp.part[index], accum1, f);
if(f.C)
++carry_next;
}
carry_next += accum4;
}
}
return tmp;
}
void Print() const
{
for(int i=(M-1); i>=0; --i){
printf("%.8X", part[i]);
}
printf("\n");
}
unsigned int part[M];
};
}
int main(int argc, char* argv[])
{
std::string a1("273847238974823947823941");
std::string a2("324230432432895745949");
Arithmetic::HugeVal a = a1;
Arithmetic::HugeVal b = a2;
Arithmetic::HugeVal d = a + b;
Arithmetic::HugeVal e = a * b;
a.Print();
b.Print();
d.Print();
e.Print();
system("pause");
}

Andreas is right that you have to multiply one number by each digit of the other and sum the results accordingly. I think it is better to multiply the longer number by the digits of the shorter one. If you store decimal digits in your array, char would indeed suffice, but if you want performance, maybe you should consider a bigger type. I don't know what your platform is, but on x86, for example, you can use 32-bit ints and the hardware support for producing a 64-bit result from a 32-bit multiplication.

Alright, seeing that this question was answered almost 11 years ago, I figure I'll provide some pointers for anyone who is writing their own BigInt library.
First off, if what you want is purely performance rather than learning how to write performant code yourself, please just learn how to use GMP or OpenSSL. There is a really steep learning curve to reach the level of GMP's performance.
Ok, let's get right into it.
Don't use base 10 when you can use a bigger base.
CPUs are god-level good at addition, subtraction, multiplication, and division, so take advantage of them.
Suppose you have two BigInt
a = {9,7,4,2,6,1,6,8} // length 8
b = {3,6,7,2,4,6,7,8} // length 8
// Frustrating writing for-loops to calculate a*b
Don't make the CPU do 64 single-digit multiplications in base 10 when it could do a single multiplication in base 2^32:
a = {97426168}
b = {36724678}
// Literally only need to type a*b
If the biggest number your computer can represent is 2^64-1, use 2^32 as the base for your BigInt (so each digit is at most 2^32-1); the product of two digits then always fits in 64 bits, which solves the problem of overflow during multiplication.
Use a structure that supports dynamic memory. Scaling your program to handle the multiplication of two 1-million-digit numbers would probably break your program, since there isn't enough memory on the stack. Use a std::vector instead of a std::array or a raw int[] so the digits live on the heap.
Learn about SIMD to give your calculation a boost in performance. Typical beginner loops can't process multiple data items at the same time. Learning this should speed things up by a factor of 3 to 12.
Learn how to write your own memory allocators. If you use std::vector to store your unsigned integers, chances are you'll suffer performance problems later on, as std::vector is a general-purpose container. Try to tailor your allocator to your own needs to avoid allocating and reallocating every time a calculation is performed.
Learn about your computer's architecture and memory layout. Writing your own assembly code to fit a certain CPU architecture would certainly boost your performance. This helps with writing your own memory allocator and SIMD code too.
Algorithms. For small BigInts you can rely on grade school multiplication, but as the input grows, take a good look at Karatsuba, Toom-Cook, and finally FFT to implement in your library.
If you're stuck, please visit my BigInt library. It doesn't have a custom allocator, SIMD code or custom assembly code, but for BigInteger beginners it should be enough.

I'm building a small BigInt library in C++ for use in my programming language.
Why? There are some excellent existing bigint libraries out there (e.g., gmp, tommath) that you can just use without having to write your own from scratch. Making your own is a lot of work, and the result is unlikely to be as good in performance terms. (In particular, writing fast code to perform multiplies and divides is quite a lot trickier than it appears at first glance.)

Related

Big integer numbers & C

I am writing a program which generates big integer numbers, saves them in an array, and does some basic operations such as multiply or add.
I'm really worried about the performance of the actual code and would like tips or improvements to make it faster. Any suggestion is welcome, even if it changes my whole program or data types.
I have added some code below so that you can see the structures I am using and how I'm trying to deal with these big integer numbers:
unsigned int seed;
void initCharArray( char *L, unsigned N )
{
for ( int i=0; i< N; i++ )
{
L[i] = i%10; // each element must be a single decimal digit (0..9) for Addition() to work
}
}
char Addition( char *Vin1, char *Vin2, char *Vout, unsigned N )
{
char CARRY = 0;
for ( int i=0; i< N; i++ )
{
char R = Vin1[i] + Vin2[i] + CARRY;
if ( R <= 9 )
{
Vout[i] = R; CARRY = 0;
}
else
{
Vout[i] = R-10; CARRY = 1;
}
}
return CARRY;
}
int main(int argc, char **argv)
{
    int N=10000;
    // allocate char arrays to match the char* parameters of the functions above
    char *V1=new char[N];
    char *V2=new char[N];
    char *V3=new char[N];
    initCharArray(V1,N); initCharArray(V2,N);
    Addition(V1,V2,V3,N);
    delete[] V1; delete[] V2; delete[] V3;
}
Since modern processors are highly efficient when dealing with fixed-bit-length numbers, why not use an array of them?
Suppose you use unsigned long long. It should be 64 bits wide, so the maximum possible unsigned long long should be 2^64 - 1. Let's represent any number as a collection of numbers:
-big_num = ( n_s, n_0, n_1, ...)
-n_s takes only the values 0 and 1 to represent the + and - sign
-n_0 holds the lowest chunk, a number between 0 and 10^a -1 (exponent a to be determined)
-n_1 holds the next chunk, also between 0 and 10^a -1, weighted by 10^a
and so on, and so on ...
DETERMINING a:
All n_ MUST be bounded by 10^a-1. Thus, when adding two big_num values, we need to add the n_ as follows:
// A + B = ( (to be determined later),
// bound(n_A_1 + n_B_1) and carry to next,
// bound(n_A_2 + n_B_2 + carry) and carry to next,
// ...)
The bounding can be done as:
bound(n_A_i + n_B_i + carry) = (n_A_i + n_B_i + carry)%(10^a)
Therefore the carry to i+1 is determined as:
// carry (to be used in i+1) = (n_A_i + n_B_i + carry)/(10^a)
// (division of unsigned in c++ will floor the result by construction)
For addition the carry is in fact at most 1, but bounding it very conservatively by 10^a -1 gives the worst-case addition (n_A_i + n_B_i + carry) as:
(worst case) (10^a-1) + (10^a-1) + (10^a-1) = 3*(10^a-1)
Since the type is unsigned long long, if we don't want overflow in this addition we must bound our exponent a such that:
// 3*(10^a-1) <= 2^64 - 1, and a is a positive integer
// => a <= floor( Log10((2^64 - 1)/3 + 1) )
// => a <= 18
So this fixes our maximum possible a=18, and thus the biggest possible n_ represented with unsigned long long is 10^18 -1 = 999,999,999,999,999,999. With this basic setup, let's now get to some actual code. For now I will use std::vector to hold the big_num we discussed, but this can change:
// Example code with unsigned long long
#include <algorithm> // std::max
#include <cstdlib>
#include <vector>
//
// FOR NOW BigNum WILL BE REPRESENTED
// BY std::vector. YOU CAN CHANGE THIS LATER
// DEPENDING ON WHAT OPTIMIZATIONS YOU WANT
//
using BigNum = std::vector<unsigned long long>;
// suffix ULL guarantees the number is interpreted as unsigned long long
#define MAX_BASE_10 999999999999999999ULL
// randomly generate a big number
void randomize_BigNum(BigNum &a){
    // assuming MyRandom() returns a random number
    // of type unsigned long long
    for(size_t i=1; i<a.size(); i++)
        a[i] = MyRandom()%(MAX_BASE_10+1); // cap the numbers
}
// wrapper functions
void add(const BigNum &a, const BigNum &b, BigNum &c); // c = a + b
void add(const BigNum &a, BigNum &b); // b = a + b
// actual work done here
void add_equal_size(const BigNum &a, const BigNum &b, BigNum &c, const size_t &N);
void add_equal_size(const BigNum &a, BigNum &b, const size_t &N);
void blindly_add_one(BigNum &c);
// Missing cases
// void add_equal_size(BigNum &a, BigNum &b, BigNum &c, size_t &Na, size_t &Nb);
// void add_equal_size(BigNum &a, BigNum &b, size_t &Na, size_t &Nb);
int main(){
    size_t n=10;
    BigNum a(n), b(n), c(n);
    randomize_BigNum(a);
    randomize_BigNum(b);
    add(a,b,c);
    return 0;
}
The wrapper functions should look as follows. They will safeguard against calls with mismatched array sizes:
// To do: add support for when size of a,b,c not equal
// c = a + b
void add(const BigNum &a, const BigNum &b, BigNum &c){
    c.resize(std::max(a.size(),b.size()));
    if(a.size()==b.size())
        add_equal_size(a,b,c,a.size());
    // else: To do: add_unequal_size(a,b,c,a.size(),b.size());
}
// b = a + b (b is modified, so it cannot be const)
void add(const BigNum &a, BigNum &b){
    if(a.size()==b.size())
        add_equal_size(a,b,a.size());
    else{
        b.resize(a.size());
        // To do: add_unequal_size(a,b,a.size());
    }
}
The bulk of the work will be done here (which you can call directly and skip a function call, if you know what you are doing):
// If a,b,c same size array
// c = a + b
void add_equal_size(const BigNum &a, const BigNum &b, BigNum &c, const size_t &N){
    // start with sign of c is sign of a
    // Specific details follow on whether I need to flip the
    // sign or not
    c[0] = a[0];
    unsigned long long carry=0;
    // DISTINGUISH TWO GRAND CASES:
    //
    // a and b have the same sign
    // a and b have opposite sign
    // no need to check which has which sign (details follow)
    //
    if(a[0]==b[0]){// if a and b have the same sign
        //
        // This means that either +a+b or -a-b=-(a+b)
        // In both cases I just need to add two numbers a and b
        // and I already got the sign of the result c correct from the
        // start
        //
        for(size_t i=1; i<N;i++){
            // take the carry from the full sum, before it is reduced
            const unsigned long long sum = a[i] + b[i] + carry;
            c[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){// if carry>0 then I need to extend my array to fit the final carry
            c.resize(N+1);
            c[N]=carry;
        }
    }
    else{// if a and b have opposite sign
        //
        // If I have opposite sign then I am subtracting the
        // numbers. The following is inspired by how
        // you can subtract two numbers with bitwise operations.
        for(size_t i=1; i<N;i++){
            const unsigned long long sum = a[i] + (MAX_BASE_10 - b[i]) + carry;
            c[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){ // I carried? then I got the sign right from the start
            // just add 1 and I am done
            blindly_add_one(c);
        }
        else{ // I didn't carry? then I got the sign wrong from the start
            // flip the sign
            c[0] ^= 1ULL;
            // and take the complement
            for(size_t i=1; i<N;i++)
                c[i] = MAX_BASE_10 - c[i];
        }
    }
}
A few details about the "if a and b have opposite sign" case follow:
Let's work in base 10. Let's say we are subtracting a - b. Let's convert this to an addition. Define the following operation:
Let's name the base 10 digits of a number di. Then any number is n = d1 + 10*d2 + 10*10*d3... The complement of a digit will now be defined as:
complement(d1) = 9-d1
Then the complement of a number n is:
complement(n) = complement(d1)
+ 10*complement(d2)
+ 10*10*complement(d3)
...
Consider two cases, a>b and a<b:
EXAMPLE OF a>b: let's say a=830 and b=126. Do the following: 830 - 126 -> 830 + complement(126) = 830 + 873 = 1703. OK, so if a>b, I drop the leading 1 and add 1: the result is 704!
EXAMPLE OF a<b: let's say a=126 and b=830. Do the following: 126 - 830 -> 126 + complement(830) = 126 + 169 = 295 ...? Well, what if I complement it? complement(295) = 704 !!! So if a<b I already have the result... with opposite sign.
Going to our case, since each number in the array is bounded by MAX_BASE_10, the complement of our numbers is
complement(n) = MAX_BASE_10 - n
So using this complement to convert subtraction to addition
I only need to pay attention to whether I carried an extra 1 at
the end of the addition (the a>b case). The algorithm now is
FOR EACH ARRAY subtraction (ith iteration):
na_i - nb_i + carry(i-1)
convert -> na_i + complement(nb_i) + carry(i-1)
bound the result -> (na_i + complement(nb_i) + carry(i-1))%(MAX_BASE_10+1)
find the carry -> (na_i + complement(nb_i) + carry(i-1))/(MAX_BASE_10+1)
keep on adding the array numbers...
At the end of the array if I carried, forget the carry
and add 1. Else take the complement of the result
This "and add one" is done by yet another function:
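// Just add 1, no matter the sign of c
void blindly_add_one(BigNum &c){
    const size_t N = c.size();
    unsigned long long carry=1;
    // entry 0 is the sign, so start at 1; stop as soon as the carry is absorbed
    for(size_t i=1; i<N && carry; i++){
        const unsigned long long sum = c[i] + carry;
        c[i] = sum%(MAX_BASE_10+1);
        carry = sum/(MAX_BASE_10+1);
    }
    if(carry){ // if carry>0 then I need to extend my basis to fit the number
        c.resize(N+1);
        c[N]=carry;
    }
}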
Good up to here. Specifically in this code, don't forget that at the start of the function we set the sign of c to the sign of a. So if I carry at the end, that means I had |a|>|b| and I did either +a-b>0 or -a+b=-(a-b)<0. In either case setting the sign of the result c to the sign of a was correct. If I don't carry, I had |a|<|b| with either +a-b<0 or -a+b=-(a-b)>0. In either case setting the sign of the result c to the sign of a was INCORRECT, so I need to flip the sign if I don't carry.
The following function operates the same way as the one above, only rather than c = a + b it does b = a + b
// same logic as above, only b = a + b
void add_equal_size(const BigNum &a, BigNum &b, const size_t &N){
    unsigned long long carry=0;
    if(a[0]==b[0]){// if a and b have the same sign
        for(size_t i=1; i<N;i++){
            // take the carry from the full sum, before it is reduced
            const unsigned long long sum = a[i] + b[i] + carry;
            b[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){// if carry>0 then I need to extend my basis to fit the number
            b.resize(N+1);
            b[N]=carry;
        }
    }
    else{ // if a and b have opposite sign
        b[0] = a[0];
        for(size_t i=1; i<N;i++){
            const unsigned long long sum = a[i] + (MAX_BASE_10 - b[i]) + carry;
            b[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){
            blindly_add_one(b);
        }
        else{
            b[0] ^= 1ULL;
            for(size_t i=1; i<N;i++)
                b[i] = MAX_BASE_10 - b[i];
        }
    }
}
And that is a basic set up on how you could use unsigned numbers in arrays to represent very large integers.
WHERE TO GO FROM HERE
There are many things to do from here on to optimise the code. I will mention a few I could think of:
-Try to replace addition of arrays with possible BLAS calls
-Make sure you are taking advantage of vectorization. Depending on how you write your loops you may or may not be generating vectorized code. If your arrays become big you may benefit from this.
-In the spirit of the above, make sure your arrays are properly aligned in memory to actually take advantage of vectorization. From my understanding std::vector does not guarantee the wider alignment that vectorized loads may want. Neither does a plain malloc. I think the Boost libraries have a vector version where you can declare a fixed alignment, in which case you can ask for a 64bit aligned array for your unsigned long long array. Another option is to have your own class that manages a raw pointer and does aligned allocations with a custom allocator. Borrowing aligned_malloc and aligned_free from https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/ you could have a class like this to replace std::vector:
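// aligned_malloc and aligned_free from:
// https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/
// (assumed here to have the signatures
//  void* aligned_malloc(size_t align, size_t size) and void aligned_free(void* ptr))
// wrapping in an absolutely minimal class to handle
// memory allocation and freeing
#include <cstddef> // size_t
#include <new>     // std::bad_alloc
class BigNum{
private:
    static const size_t ALIGNMENT = 64; // in bytes; an assumption, pick what your target needs
    unsigned long long *ptr;
    size_t len; // renamed from size so it does not clash with size()
public:
    BigNum() : ptr(nullptr), len(0) {}
    BigNum(const size_t &N) : ptr(nullptr), len(0) { resize(N); }
    // Copying would free ptr twice, so forbid it. Make your own copy/move operations if you need them
    BigNum(const BigNum&) = delete;
    BigNum& operator=(const BigNum&) = delete;
    ~BigNum(){
        if(ptr) aligned_free(ptr);
    }
    // Access an element in aligned storage
    const unsigned long long& operator[](std::size_t pos) const{ return ptr[pos]; }
    unsigned long long& operator[](std::size_t pos){ return ptr[pos]; }
    // return my size
    size_t size() const{ return len; }
    // resize memory allocation (old contents are discarded, not copied)
    void resize(const size_t &N){
        if(ptr) aligned_free(ptr); // release the previous block, if any
        ptr = nullptr;
        len = 0;
        if(N){
            // N+1, always keep first entry for the sign of BigNum
            void* temp = aligned_malloc(ALIGNMENT, (N+1)*sizeof(unsigned long long));
            if(temp==nullptr)
                throw std::bad_alloc();
            ptr = static_cast<unsigned long long*>(temp);
            len = N;
        }
    }
};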

Processing of integers on different CPUs

My task is to design a function that fulfils those requirements:
Function shall sum members of given one-dimensional array. However, it should sum only members whose number of ones in the binary representation is higher than defined threshold (e.g. if the threshold is 4, number 255 will be counted and 15 will not)
The array length is arbitrary
The function shall utilize as little memory as possible and shall be written in an efficient way
The production function code (‘sum_filtered(){..}’) shall not use any standard C library functions (or any other libraries)
The function shall return 0 on success and error code on error
The array elements are of a type 16-bit signed integer and an overflow during calculation shall be regarded as a failure
Use data types that ensure portability between different CPUs (so the calculations will be the same on 8/16/32-bit MCU)
The function code should contain a reasonable amount of comments in doxygen annotation
Here is my solution:
#include <iostream>
using namespace std;
int sum_filtered(short array[], int treshold)
{
// return 1 if invalid input parameters
if((treshold < 0) || (treshold > 16)){return(1);}
int sum = 0;
int bitcnt = 0;
for(int i=0; i < sizeof(array); i++)
{
// Count one bits of integer
bitcnt = 0;
for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}
// Add integer to sum if bitcnt>treshold
if(bitcnt>treshold){sum += array[i];}
}
return(0);
}
int main()
{
short array[5] = {15, 2652, 14, 1562, -115324};
int result = sum_filtered(array, 14);
cout << result << endl;
short array2[5] = {15, 2652, 14, 1562, 15324};
result = sum_filtered(array2, -2);
cout << result << endl;
}
However I'm not sure whether this code is portable between different CPUs.
I also don't know how an overflow can occur during the calculation, or what other errors can occur while processing arrays with this function.
Can somebody more experienced give me his opinion?
Well, I can foresee one problem:
for(int i=0; i < sizeof(array); i++)
array in this context is a pointer, so sizeof(array) will likely be 4 on 32-bit systems, or 8 on 64-bit systems. You really do want to pass a count variable (in this case 5) into the sum_filtered function (the caller can then compute the count as sizeof(array) / sizeof(short)).
Anyhow, this code:
// Count one bits of integer
bitcnt = 0;
for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}
Effectively you are doing a popcount here (which can be done using __builtin_popcount on gcc/clang, or __popcnt on MSVC; they are compiler specific, but usually boil down to a single popcount CPU instruction on most CPUs).
If you do want to do this the slow way, then an efficient approach is to treat the computation as a form of bitwise SIMD operation:
#include <cstdint> // or stdint.h if you have a rubbish compiler :)
uint16_t popcount(uint16_t s)
{
// perform 8x 1bit adds
uint16_t a0 = s & 0x5555;
uint16_t b0 = (s >> 1) & 0x5555;
uint16_t s0 = a0 + b0;
// perform 4x 2bit adds
uint16_t a1 = s0 & 0x3333;
uint16_t b1 = (s0 >> 2) & 0x3333;
uint16_t s1 = a1 + b1;
// perform 2x 4bit adds
uint16_t a2 = s1 & 0x0F0F;
uint16_t b2 = (s1 >> 4) & 0x0F0F;
uint16_t s2 = a2 + b2;
// perform 1x 8bit adds
uint16_t a3 = s2 & 0x00FF;
uint16_t b3 = (s2 >> 8) & 0x00FF;
return a3 + b3;
}
I know it says you can't use stdlib functions (your 4th point), but that shouldn't apply to the standardised integer types surely? (e.g. uint16_t) If it does, well then there is no way to guarantee portability across platforms. You're out of luck.
Personally I'd just use a 64-bit integer for the sum. That should reduce the risk of any overflows (i.e. if the threshold is zero and all the values are -32768, the most negative int16_t, you would only overflow once the array size exceeded 0x1000000000000 elements (281,474,976,710,656 in decimal)).
#include <cstdint>
int64_t sum_filtered(int16_t array[], uint16_t threshold, size_t array_length)
{
// changing the type on threshold to be unsigned means we don't need to test
// for negative numbers.
if(threshold > 16) { return 1; }
int64_t sum = 0;
for(size_t i=0; i < array_length; i++)
{
if (popcount(array[i]) > threshold)
{
sum += array[i];
}
}
return sum;
}

print fibo big numbers in c++ or c language

I wrote this code to show the Fibonacci series using recursion, but it does not show correct results for n>43 (e.g. for n=100 it shows -980107325).
#include<stdio.h>
#include<conio.h>
void fibonacciSeries(int);
void fibonacciSeries(int n)
{
static long d = 0, e = 1;
long c;
if (n>1)
{
c = d + e;
d = e;
e = c;
printf("%d \n", c);
fibonacciSeries(n - 1);
}
}
int main()
{
long a, n;
long long i = 0, j = 1, f;
printf("How many number you want to print in the fibonnaci series :\n");
scanf("%d", &n);
printf("\nFibonacci Series: ");
printf("%d", 0);
fibonacciSeries(n);
_getch();
return 0;
}
The value of fib(100) is so large that it will overflow even a 64 bit number. To operate on such large values, you need to do arbitrary-precision arithmetic. Arbitrary-precision arithmetic is not provided by C nor C++ standard libraries, so you'll need to either implement it yourself or use a library written by someone else.
For smaller values that do fit your long long, your problem is that you use the wrong printf format specifier. To print a long long, you need to use %lld.
The code overflows the range of the integer type used, long.
It could use long long, but even that cannot handle Fib(100), which needs at least 69 bits.
It could use long double if 1.0/LDBL_EPSILON > 3.6e20.
Various libraries exist to handle very large integers.
For this task, all that is needed is a way to add two large integers. Consider using a string. An inefficient but simple string addition follows. No contingencies for buffer overflow.
#include <stdio.h>
#include <string.h>
#include <assert.h>
char *str_revese_inplace(char *s) {
char *left = s;
char *right = s + strlen(s);
while (right > left) {
right--;
char t = *right;
*right = *left;
*left = t;
left++;
}
return s;
}
char *str_add(char *ssum, const char *sa, const char *sb) {
const char *pa = sa + strlen(sa);
const char *pb = sb + strlen(sb);
char *psum = ssum;
int carry = 0;
while (pa > sa || pb > sb || carry) {
int sum = carry;
if (pa > sa) sum += *(--pa) - '0';
if (pb > sb) sum += *(--pb) - '0';
*psum++ = sum % 10 + '0';
carry = sum / 10;
}
*psum = '\0';
return str_revese_inplace(ssum);
}
int main(void) {
char fib[3][300];
strcpy(fib[0], "0");
strcpy(fib[1], "1");
int i;
for (i = 2; i <= 1000; i++) {
printf("Fib(%3d) %s.\n", i, str_add(fib[2], fib[1], fib[0]));
strcpy(fib[0], fib[1]);
strcpy(fib[1], fib[2]);
}
return 0;
}
Output
Fib( 2) 1.
Fib( 3) 2.
Fib( 4) 3.
Fib( 5) 5.
Fib( 6) 8.
...
Fib(100) 3542248xxxxxxxxxx5075. // Some xx left in for a bit of mystery.
Fib(1000) --> 43466...about 200 more digits...8875
You can print some large Fibonacci numbers using only char, int and <stdio.h> in C.
Here are some headers:
#include <stdio.h>
#define B_SIZE 10000 // max number of digits
typedef int positive_number;
struct buffer {
size_t index;
char data[B_SIZE];
};
Also some functions :
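void init_buffer(struct buffer *buffer, positive_number n) {
    /* zero every digit first, so later additions never read uninitialized chars */
    for (size_t i = 0; i < B_SIZE; ++i) buffer->data[i] = 0;
    for (buffer->index = B_SIZE; n; buffer->data[--buffer->index] = (char) (n % 10), n /= 10);
}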
void print_buffer(const struct buffer *buffer) {
for (size_t i = buffer->index; i < B_SIZE; ++i) putchar('0' + buffer->data[i]);
}
void fly_add_buffer(struct buffer *buffer, const struct buffer *client) {
positive_number a = 0;
size_t i = (B_SIZE - 1);
for (; i >= client->index; --i) {
buffer->data[i] = (char) (buffer->data[i] + client->data[i] + a);
buffer->data[i] = (char) (buffer->data[i] - (a = buffer->data[i] > 9) * 10);
}
for (; a; buffer->data[i] = (char) (buffer->data[i] + a), a = buffer->data[i] > 9, buffer->data[i] = (char) (buffer->data[i] - a * 10), --i);
if (++i < buffer->index) buffer->index = i;
}
Example usage :
int main() {
struct buffer number_1, number_2, number_3;
init_buffer(&number_1, 0);
init_buffer(&number_2, 1);
for (int i = 0; i < 2500; ++i) {
number_3 = number_1;
fly_add_buffer(&number_1, &number_2);
number_2 = number_3;
}
print_buffer(&number_1);
}
// print 131709051675194962952276308712 ... 935714056959634778700594751875
Best C type is still char ? The given code is printing f(2500), a 523 digits number.
Info : f(2e5) has 41,798 digits, see also Factorial(10000) and Fibonacci(1000000).
Well, you could want to try implementing BigInt in C++ or C.
Useful Material:
How to implement big int in C++
For this purpose you need to implement a BigInteger. There is no such built-in support in current C++. You can find some advice on Stack Overflow
Or you can also use a library like GMP
Also here are some implementations:
E-maxx (description in Russian).
Or find some open implementation on GitHub
Try to use a different format in printf, and use unsigned to get a wider range of digits.
If you use unsigned long long you can go up to 18 446 744 073 709 551 615, so up to the 93rd Fibonacci number, 12200160415121876738; after that you will get incorrect results because the 94th number, 19740274219868223167, is too big for unsigned long long.
Keep in mind that the n-th Fibonacci number is (approximately) ((1 + sqrt(5))/2)^n / sqrt(5).
This lets you work out the largest n whose result still fits in a 32-bit or 64-bit unsigned integer. For signed types, remember that you lose one bit.

factorial of big numbers with strings in c++

I am writing a factorial program with strings because I need the factorial of numbers greater than 250.
My attempt:
string factorial(int n){
string fact="1";
for(int i=2; i<=n; i++){
b=atoi(fact)*n;
}
}
But the problem is that atoi does not work. How can I convert my string to an integer?
And most importantly, I want to know whether a program written this way will work for the factorial of 400, for example.
Not sure why you are trying to use a string. Probably to save some space by not using an integer vector? This is my solution, using an integer vector to store the factorial and print it. It works well with 400 or any large number for that matter!
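// Factorial of a big number
#include <iostream>
#include <vector>
using namespace std;
int main() {
    int num;
    cout << "Enter the number :";
    cin >> num;
    // res holds the decimal digits of the running product, least significant first
    vector<int> res;
    res.push_back(1);
    int carry = 0;
    for (int i = 2; i <= num; i++) {
        // multiply every digit by i, propagating the carry upward
        for (size_t j = 0; j < res.size(); j++) {
            int tmp = res[j] * i;
            res[j] = (tmp + carry) % 10;
            carry = (tmp + carry) / 10;
        }
        // append whatever carry is left as new high-order digits
        while (carry != 0) {
            res.push_back(carry % 10);
            carry = carry / 10;
        }
    }
    cout << "Factorial of " << num << " :";
    for (int i = (int)res.size() - 1; i >= 0; i--) cout << res[i];
    cout << endl;
    return 0;
}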
Enter the number :400
Factorial of 400 :64034522846623895262347970319503005850702583026002959458684445942802397169186831436278478647463264676294350575035856810848298162883517435228961988646802997937341654150838162426461942352307046244325015114448670890662773914918117331955996440709549671345290477020322434911210797593280795101545372667251627877890009349763765710326350331533965349868386831339352024373788157786791506311858702618270169819740062983025308591298346162272304558339520759611505302236086810433297255194852674432232438669948422404232599805551610635942376961399231917134063858996537970147827206606320217379472010321356624613809077942304597360699567595836096158715129913822286578579549361617654480453222007825818400848436415591229454275384803558374518022675900061399560145595206127211192918105032491008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
There's a web site that will calculate factorials for you: http://www.nitrxgen.net/factorialcalc.php. It reports:
The resulting factorial of 250! is 493 digits long.
The result also contains 62 trailing zeroes (which constitutes to 12.58% of the whole number)
3232856260909107732320814552024368470994843717673780666747942427112823747555111209488817915371028199450928507353189432926730931712808990822791030279071281921676527240189264733218041186261006832925365133678939089569935713530175040513178760077247933065402339006164825552248819436572586057399222641254832982204849137721776650641276858807153128978777672951913990844377478702589172973255150283241787320658188482062478582659808848825548800000000000000000000000000000000000000000000000000000000000000
Many systems using C++ double only work up to 1E+308 or thereabouts; the value of 250! is too large to store in such numbers.
Consequently, you'll need to use some sort of multi-precision arithmetic library, either of your own devising using C++ string values, or using some other widely-used multi-precision library (GNU GMP for example).
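The code below uses unsigned long long for small inputs and an array of decimal digits for larger ones.
#include <iostream>
using namespace std;
int main()
{
    long k = 1;
    while (k != 0)
    {
        cout << "\nLarge Factorial Calculator\n\n";
        cout << "Enter a number be calculated:";
        if (!(cin >> k)) break;   // stop on end of input or a non-numeric entry
        if (k <= 20)              // 20! is the largest factorial that fits in 64 bits
        {
            unsigned long long fact = 1;
            for (long b = k; b >= 1; b--)
            {
                fact = fact * b;
            }
            cout << "\nThe factorial of " << k << " is " << fact << "\n";
        }
        else
        {
            // One decimal digit per element, most significant digit first.
            // 10000 digits is enough for inputs up to roughly 3200.
            const int SIZE = 10000;
            static int numArr[SIZE];
            int total, rem, count, i;
            for (i = 0; i < SIZE; i++)
                numArr[i] = 0;
            numArr[SIZE - 1] = 1;  // start from 1 in the last (least significant) slot
            for (count = 2; count <= k; count++)
            {
                rem = 0;
                // multiply every digit by count, carrying from right to left
                for (i = SIZE - 1; i >= 0; i--)
                {
                    total = numArr[i] * count + rem;
                    numArr[i] = total % 10;
                    rem = total / 10;
                }
            }
            cout << "The factorial of " << k << " is \n\n";
            bool started = false;  // becomes true at the first nonzero digit
            for (i = 0; i < SIZE; i++)
            {
                if (numArr[i] != 0)
                    started = true;
                if (started)
                    cout << numArr[i];
            }
            cout << endl;
        }
        cout << "\n\n";
    }//while
    return 0;
}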
Output:
Large Factorial Calculator
Enter a number be calculated:250
The factorial of 250 is
32328562609091077323208145520243684709948437176737806667479424271128237475551112
09488817915371028199450928507353189432926730931712808990822791030279071281921676
52724018926473321804118626100683292536513367893908956993571353017504051317876007
72479330654023390061648255522488194365725860573992226412548329822048491377217766
50641276858807153128978777672951913990844377478702589172973255150283241787320658
18848206247858265980884882554880000000000000000000000000000000000000000000000000
000000000000
You can make atoi compile by adding c_str(), but that is still a long way from a working factorial. Currently b is not declared anywhere. And even if it were, you would still be multiplying int by int, so even if you eventually converted the result to a string before returning, your range would still be limited. Until you actually do the multiplication on the ASCII digits or use a bignum library, there is no point in having a string around.
Your factorial depends on conversion to int, which will overflow pretty fast, so you won't be able to compute large factorials that way. To properly implement computation on big numbers you need to implement the same logic as for computation on paper, the rules you were taught in primary school, but treat long long ints as "atoms", not individual digits. And don't do it on strings; it would be painfully slow and full of nasty conversions.
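A minimal sketch of that "bigger atoms" idea, with some assumptions of mine: each atom (limb) is a 32-bit value holding 9 decimal digits (base 10^9), and a 64-bit intermediate is used for the product so it cannot overflow. The name factorial_limbs is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: returns n! as a decimal string, using base-1e9 limbs.
std::string factorial_limbs(unsigned n) {
    const std::uint64_t BASE = 1000000000ULL;   // 9 decimal digits per limb
    std::vector<std::uint32_t> limbs{1};        // least significant limb first
    for (unsigned i = 2; i <= n; ++i) {
        std::uint64_t carry = 0;
        for (std::uint32_t &limb : limbs) {
            // limb < 1e9 and i < 2^32, so the product fits in 64 bits
            const std::uint64_t cur = static_cast<std::uint64_t>(limb) * i + carry;
            limb = static_cast<std::uint32_t>(cur % BASE);
            carry = cur / BASE;
        }
        while (carry != 0) {                    // grow the number as needed
            limbs.push_back(static_cast<std::uint32_t>(carry % BASE));
            carry /= BASE;
        }
    }
    assert(!limbs.empty());
    // The top limb is printed as-is, every lower limb zero-padded to 9 digits.
    std::string out = std::to_string(limbs.back());
    char buf[16];
    for (std::size_t k = limbs.size() - 1; k-- > 0;) {
        std::snprintf(buf, sizeof buf, "%09u", static_cast<unsigned>(limbs[k]));
        out += buf;
    }
    return out;
}
```

Compared with one digit per element, this does roughly nine times fewer inner-loop steps per multiplication; the price is the zero-padding when printing.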
If you are going to solve factorial for numbers larger than around 12, you need a different approach than using atoi, since that just gives you a 32-bit integer, and no matter what you do, you are not going to get more than 2 billion (give or take) out of that. Even if you double the size of the number, you'll only get to about 20 or 21.
It's not that hard (relatively speaking) to write a string multiplication routine that takes a small(ish) number, multiplies each digit by it and ripples the results through the number (start from the back of the number, and fill it up).
Here's my obfuscated code - it is intentionally written such that you can't just take it and hand in as school homework, but it appears to work (matches the number in Jonathan Leffler's answer), and works up to (at least) 20000! [subject to enough memory].
std::string operator*(const std::string &s, int x)
{
int l = (int)s.length();
std::string r;
r.resize(l);
std::fill(r.begin(), r.end(), '0');
int b = 0;
int e = ~b;
const int c = 10;
for(int i = l+e; i != e;)
{
int d = (s[i]-0x30) * x, p = i + b;
while (d && p > e)
{
int t = r[p] - 0x30 + (d % c);
r[p] = (t % c) + 0x30;
d = t / c + d / c;
p--;
}
while (d)
{
r = static_cast<char>((d % c) +0x30)+r;
d /= c;
b++;
}
i--;
}
return r;
}
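For comparison, here is a plain (non-obfuscated) sketch of the same idea: multiply a decimal string by a small int, rippling the carry from the last digit to the first. This is not the code above rewritten line by line; multiply_decimal and factorial_string are names made up for this example, and it assumes the string holds only the characters '0' to '9'.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: multiplies a non-negative decimal string by a
// small non-negative int and returns the product as a decimal string.
std::string multiply_decimal(const std::string &s, int x) {
    assert(!s.empty() && x >= 0);
    std::string r;                       // result digits, least significant first
    long long carry = 0;
    for (auto it = s.rbegin(); it != s.rend(); ++it) {
        const long long cur = static_cast<long long>(*it - '0') * x + carry;
        r.push_back(static_cast<char>('0' + cur % 10));
        carry = cur / 10;
    }
    while (carry > 0) {                  // leftover carry becomes new leading digits
        r.push_back(static_cast<char>('0' + carry % 10));
        carry /= 10;
    }
    while (r.size() > 1 && r.back() == '0') {
        r.pop_back();                    // drop leading zeros, e.g. after x == 0
    }
    std::reverse(r.begin(), r.end());
    return r;
}

// Hypothetical helper: n! built by repeated string-by-int multiplication.
std::string factorial_string(int n) {
    std::string f = "1";
    for (int i = 2; i <= n; ++i) {
        f = multiply_decimal(f, i);
    }
    return f;
}
```

Building the result in reverse and flipping it once at the end avoids prepending to the string on every digit.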
In C++, the largest standard integer type is 'long long', which typically holds 64 bits, so obviously you can't store 250! in an integer type. It is a clever idea to use strings, but what you are basically doing with your code is (I have never used the atoi() function, so I don't know if it even works with strings longer than 1 character, but it doesn't matter):
convert the string to an integer (a string that, if this code worked, would at some point contain the value of 249!)
multiply the value of the string
So, after you are done multiplying, you don't even convert the integer back to a string. And even if you did, at the moment you convert the string back to an integer your program will fail, because the integer won't be able to hold the value of the string.
My suggestion is to use a class for big integers. Unfortunately, there isn't one in the C++ standard library, so you'll have to code it yourself or find one on the internet. But don't worry: even if you code it yourself, if you think a little, you'll see it's not that hard. You can even use your idea with the strings, which, even though it is not the best approach, will still yield the results for this problem in reasonable time without using too much memory.
This is a typical high-precision problem.
You can use an array of unsigned long long instead of a string,
like this:
struct node
{
unsigned long long digit[100000];
};
It should be faster than a string.
But you can still use a string unless you are in a hurry.
It may take you a few days to calculate 10000!.
I like to use strings because they are easy to write.
#include <bits/stdc++.h>
#pragma GCC optimize (2)
using namespace std;
const int MAXN = 90;
int n, m;
int a[MAXN];
string base[MAXN], f[MAXN][MAXN];
string sum, ans;
template <typename _T>
void Swap(_T &a, _T &b)
{
_T temp;
temp = a;
a = b;
b = temp;
}
string operator + (string s1, string s2)
{
string ret;
int digit, up = 0;
int len1 = s1.length(), len2 = s2.length();
if (len1 < len2) Swap(s1, s2), Swap(len1, len2);
while(len2 < len1) s2 = '0' + s2, len2++;
for (int i = len1 - 1; i >= 0; i--)
{
digit = s1[i] + s2[i] - '0' - '0' + up; up = 0;
if (digit >= 10) up = digit / 10, digit %= 10;
ret = char(digit + '0') + ret;
}
if (up) ret = char(up + '0') + ret;
return ret;
}
string operator * (string str, int p)
{
string ret = "0", f; int digit, mul;
int len = str.length();
for (int i = len - 1; i >= 0; i--)
{
f = "";
digit = str[i] - '0';
mul = p * digit;
while(mul)
{
digit = mul % 10 , mul /= 10;
f = char(digit + '0') + f;
}
for (int j = 1; j < len - i; j++) f = f + '0';
ret = ret + f;
}
return ret;
}
int main()
{
freopen("factorial.out", "w", stdout);
string ans = "1";
for (int i = 1; i <= 5000; i++)
{
ans = ans * i;
cout << i << "! = " << ans << endl;
}
return 0;
}
Actually, I know where the problem arises: at the point where we multiply, as the numbers get bigger and bigger.
This code is tested and gives the correct result.
#include <bits/stdc++.h>
using namespace std;
#define mod 72057594037927936 // 2^56 (17 digits)
// #define mod 18446744073709551616 // 2^64 (20 digits) Not supported
long long int prod_uint64(long long int x, long long int y)
{
return x * y % mod;
}
int main()
{
long long int n=14, s = 1;
while (n != 1)
{
s = prod_uint64(s , n) ;
n--;
}
cout << s << endl; // print the result (taken modulo 2^56)
return 0;
}
Expected output for 14! = 87178291200
The logic should be:
unsigned int factorial(int n)
{
unsigned int b=1;
for(int i=2; i<=n; i++){
b=b*i; // multiply by the loop counter i, not by n
}
return b;
}
However, b may overflow, so you may want to use a bigger integral type.
Or you can use a floating-point type, which is inaccurate but can hold much bigger numbers.
But it seems none of the built-in types are big enough.
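To get a feel for how far out of reach the built-in types are, floating point can at least estimate the size of n! without computing it. The sketch below uses std::lgamma, relying on lgamma(n + 1) = ln(n!); the function name factorial_digit_count is made up for this example, and since it works in double precision the count could be off by one when n! lies extremely close to a power of ten.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: estimates the number of decimal digits of n!
// as floor(log10(n!)) + 1, with log10(n!) = lgamma(n + 1) / ln(10).
int factorial_digit_count(int n) {
    assert(n >= 0);
    const double log10_fact = std::lgamma(n + 1.0) / std::log(10.0);
    return static_cast<int>(std::floor(log10_fact)) + 1;
}
```

For n = 250 this should give 493, the digit count reported for 250! earlier in this thread, which is far beyond the 20 digits an unsigned 64-bit integer can hold.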

Bit packing of array of integers

I have an array of integers; let's assume they are of type int64_t. Now, I know that only the first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).
What is the most efficient way to convert the array so that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on)?
I would like it to be as general as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n, like powers of 2 or something.
Of course I know that I can just iterate value by value; I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as little space as possible. I just need to "cut" n bits from every integer, and given the array I know the exact number n of bits I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate e.g. a uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
|       b0       |       b1       |       b2      |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
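To illustrate the principle (this is not the PackedArray API, just a minimal sketch with made-up names), here is a tiny container that packs items of bitsPerItem <= 32 bits into uint32_t cells. It handles items that span two cells by treating two adjacent cells as one 64-bit window, and it allocates one spare cell so the window never runs off the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustration only: PackedBits, set and get are hypothetical names.
class PackedBits {
public:
    PackedBits(unsigned bitsPerItem, std::size_t count)
        : bits_(bitsPerItem),
          cells_(static_cast<std::size_t>(
                     (static_cast<std::uint64_t>(count) * bitsPerItem + 31) / 32 + 1),
                 std::uint32_t{0}) {
        assert(bitsPerItem >= 1 && bitsPerItem <= 32);
    }

    // Stores the low bits_ bits of value at position index.
    void set(std::size_t index, std::uint32_t value) {
        const std::uint64_t bitPos = static_cast<std::uint64_t>(index) * bits_;
        const std::size_t cell = static_cast<std::size_t>(bitPos / 32);
        const unsigned offset = static_cast<unsigned>(bitPos % 32);
        const std::uint64_t mask = lowMask() << offset;
        std::uint64_t window = load(cell);
        window = (window & ~mask) | ((static_cast<std::uint64_t>(value) << offset) & mask);
        cells_[cell] = static_cast<std::uint32_t>(window);
        cells_[cell + 1] = static_cast<std::uint32_t>(window >> 32);
    }

    std::uint32_t get(std::size_t index) const {
        const std::uint64_t bitPos = static_cast<std::uint64_t>(index) * bits_;
        const std::size_t cell = static_cast<std::size_t>(bitPos / 32);
        const unsigned offset = static_cast<unsigned>(bitPos % 32);
        return static_cast<std::uint32_t>((load(cell) >> offset) & lowMask());
    }

private:
    // Two adjacent cells viewed as one 64-bit value (low cell in the low half).
    std::uint64_t load(std::size_t cell) const {
        return cells_[cell] | (static_cast<std::uint64_t>(cells_[cell + 1]) << 32);
    }
    std::uint64_t lowMask() const { return (std::uint64_t{1} << bits_) - 1; }

    unsigned bits_;
    std::vector<std::uint32_t> cells_;
};
```

Unlike the library described above, this sketch simply masks off extra high bits in set instead of leaving that case undefined.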
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <stdint.h>
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
int64_t inmask = 0;
unsigned char* pout = (unsigned char*)output;
int obit = 0;
int nout = 0;
*pout = 0;
for(int i=0; i<nin; i++)
{
inmask = (int64_t)1 << (n-1);
for(int k=0; k<n; k++)
{
if(obit>7)
{
obit = 0;
pout++;
*pout = 0;
}
*pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
inmask >>= 1;
obit++;
nout++;
}
}
return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
unsigned char* pin = (unsigned char*)input;
int64_t* pout = output;
int nbits = nbitsin;
unsigned char inmask = 0x80;
int inbit = 0;
int nout = 0;
while(nbits > 0)
{
*pout = 0;
for(int i=0; i<n; i++)
{
if(inbit > 7)
{
pin++;
inbit = 0;
}
*pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
inbit++;
}
pout++;
nbits -= n;
nout++;
}
return nout;
}
int main()
{
int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int64_t output[21];
unsigned char compressed[21*8];
int n = 5;
int nbits = pack(input, 21, compressed, n);
int nout = unpack(compressed, nbits, output, n);
for(int i=0; i<=20; i++)
printf("input: %lld output: %lld\n", (long long)input[i], (long long)output[i]);
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have not tested this with a wide range of values either, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version, which processes bit-blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and goes to the highest. This only makes it harder to read with a binary dump, like Linux xxd -b. As a detail, int* can be trivially changed to int64_t*, and it would be even better unsigned. The code needs #include <algorithm> for std::min. I have already tested this version with a few million arrays and it seems solid, so I will share it with the rest:
int pack2(int *input, int nin, unsigned char* output, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
if(nin>0) output[0] = 0;
for(int i=0; i<nin; i++)
{
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
obit += ibite - ibit;
nout += obit >> 3;
if(obit & 8) output[nout] = 0;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
for(int i=0; i<nin; i++)
{
oinput[i] = 0;
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
obit += ibite - ibit;
nout += obit >> 3;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
I know this might seem like the obvious thing to say as I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255)? or uint16_t (max 65535)?. I'm sure you could bit-manipulate on an int64_t using defined values and or operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38 bits rather than 64, you can build structures using bit-field specifications, assuming you also have smaller elements to fit in the remaining space.
struct example {
/* 64bit number cut into 3 different sized sections */
uint64_t big_num:38;
uint64_t small_num:16;
uint64_t itty_num:10;
/* 8 bit number cut in two */
uint8_t nibble_A:4;
uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so it can only be used within a program rather than in an exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).