I'm trying to perform modular exponentiation for large values (up to 64 bits) and I wrote this function for it:
uint64_t modularExp(uint64_t num, uint64_t exp, uint64_t mod)
{
string expBits = bitset<64>(exp).to_string();
expBits = expBits.substr(expBits.find("1")+1);
string operations = "";
uint64_t result = num;
for (int i = 0; i < expBits.length(); ++i)
{
result = (uint64_t)pow(result, 2) % mod;
if (expBits[i] == '1')
result = (result * num) % mod;
}
return result;
}
This works well with small numbers (8 digits or less), but for large numbers, even though they're in the 64-bit range, the result comes out wrong.
Additionally, when the value of mod exceeds 4294967296 (2^32, one past the maximum 32-bit value), the result just comes out as zero. I suspect the pow function plays a role in this issue, but I can't figure it out for sure.
Any advice would be greatly appreciated.
First of all, some general advice:
It's better not to use strings when working with integers, as operations with strings are much slower and might become a bottleneck for performance. It's also less clear what is actually being done when strings are involved.
You shouldn't use std::pow with integers, because it operates on floating-point numbers and loses precision.
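To make the precision point concrete, here is a minimal sketch. The helper names square_via_pow and square_exact are made up for illustration, and it assumes double is the usual IEEE 754 binary64 type with a 53-bit significand:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Squares x by going through double, the way the question's code does.
std::uint64_t square_via_pow(std::uint64_t x) {
    return static_cast<std::uint64_t>(std::pow(static_cast<double>(x), 2));
}

// Squares x with integer arithmetic only (exact only while x < 2^32).
std::uint64_t square_exact(std::uint64_t x) {
    assert(x < (1ULL << 32));
    return x * x;
}
```

For x = 94906267 the exact square is odd and larger than 2^53, so no double can hold it and the two functions cannot agree.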
For the main question, as a workaround, you can use this O(log^2(n)) solution, which should work for arguments up to 63 bits (since it only ever uses addition and multiplication by 2). Note how all that string magic is unnecessary if you just iterate over the bits in small-to-large order:
#include <cstdint>
uint64_t modular_mul(uint64_t a, uint64_t b, uint64_t mod) {
uint64_t result = 0;
for (uint64_t current_term = a; b; b >>= 1) {
if (b & 1) {
result = (result + current_term) % mod;
}
current_term = 2 * current_term % mod;
}
return result;
}
uint64_t modular_pow(uint64_t base, uint64_t exp, uint64_t mod) {
uint64_t result = 1;
for (uint64_t current_factor = base; exp; exp >>= 1) {
if (exp & 1) {
result = modular_mul(result, current_factor, mod);
}
current_factor = modular_mul(current_factor, current_factor, mod);
}
return result;
}
Also, GCC provides a (non-standard) __uint128_t type on some targets, which can be used to replace modular_mul with an ordinary multiplication.
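A minimal sketch of that variant, assuming a compiler that provides the non-standard unsigned __int128 type (GCC and Clang do on common 64-bit targets). The names mulmod128 and powmod128 are mine, not part of any library:

```cpp
#include <cassert>
#include <cstdint>

// (a * b) % mod using a 128-bit intermediate, so the product cannot overflow.
// Assumption: unsigned __int128 is available (non-standard extension).
std::uint64_t mulmod128(std::uint64_t a, std::uint64_t b, std::uint64_t mod) {
    assert(mod != 0);
    return static_cast<std::uint64_t>(static_cast<unsigned __int128>(a) * b % mod);
}

// Square-and-multiply, iterating over the exponent bits from low to high.
std::uint64_t powmod128(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod != 0);
    std::uint64_t result = 1 % mod;  // gives 0 when mod == 1
    base %= mod;
    for (; exp; exp >>= 1) {
        if (exp & 1) result = mulmod128(result, base, mod);
        base = mulmod128(base, base, mod);
    }
    return result;
}
```

With this version the full 64-bit range of mod should work, at the cost of portability.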
I am writing a program which generates big integer numbers, saves them in an array, and does some basic operations such as multiply or add.
I'm really worried about the performance of the actual code and would like tips or improvements to make it faster. Any suggestion is welcome, even if it changes my whole program or data types.
I will add some of my code below so that you can see the structures I am using and how I'm trying to deal with these big integer numbers:
unsigned int seed;
void initCharArray( char *L, unsigned N )
{
for ( int i=0; i< N; i++ )
{
L[i] = i%10; // keep every entry a single decimal digit, as Addition expects
}
}
char Addition( char *Vin1, char *Vin2, char *Vout, unsigned N )
{
char CARRY = 0;
for ( int i=0; i< N; i++ )
{
char R = Vin1[i] + Vin2[i] + CARRY;
if ( R <= 9 )
{
Vout[i] = R; CARRY = 0;
}
else
{
Vout[i] = R-10; CARRY = 1;
}
}
return CARRY;
}
int main(int argc, char **argv)
{
int N=10000;
char *V1=new char[N];
char *V2=new char[N];
char *V3=new char[N];
initCharArray(V1,N); initCharArray(V2,N);
Addition(V1,V2,V3,N);
}
Since modern processors are highly efficient when dealing with fixed-bit-length numbers, why don't you use an array of them?
Suppose you use unsigned long long. It is at least 64 bits wide; assuming exactly 64 bits, the maximum possible unsigned long long is 2^64 - 1. Let's represent any number as a collection of numbers:
-big_num = ( n_s, n_0, n_1, ...)
-n_s takes only the values 0 and 1 to represent the + and - sign
-n_0 is a number between 0 and 10^a - 1 (the exponent a is to be determined)
-n_1 is also a number between 0 and 10^a - 1, but it is weighted by 10^a, so it covers the part of the value between 10^a and 10^(2a) - 1
and so on ...
DETERMINING a:
All n_ MUST be bounded by 10^a-1. Thus, when adding two big_num, we need to add the n_ as follows:
// A + B = ( (to be determined later),
// bound(n_A_1 + n_B_1) and carry to next,
// bound(n_A_2 + n_B_2 + carry) and carry to next,
// ...)
The bounding can be done as:
bound(n_A_i + n_B_i + carry) = (n_A_i + n_B_i + carry)%(10^a)
Therefore the carry to i+1 is determined as:
// carry (to be used in i+1) = (n_A_i + n_B_i + carry)/(10^a)
// (division of unsigned in c++ will floor the result by construction)
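This tells us that a safe upper bound for the carry is 10^a - 1 (the carry out of an addition is in fact at most 1, so this bound is pessimistic, but the tighter bound leads to the same a), and thus the worst addition (n_A_i + n_B_i + carry) is: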
(worst case) (10^a-1) + (10^a-1) + (10^a-1) = 3*(10^a-1)
Since the type is unsigned long long, if we don't want this addition to overflow we must bound our exponent a such that:
// 3*(10^a-1) <= 2^64 - 1, with a a positive integer
// => a <= floor( Log10((2^64 - 1)/3 + 1) )
// => a <= 18
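The same bound can be double-checked by the compiler. This is only a sketch: the function name max_limb_exponent is made up, and it assumes C++14 or later (for the loop inside a constexpr function):

```cpp
#include <cstdint>

// Largest a such that 3 * (10^a - 1) still fits in an unsigned 64-bit integer.
constexpr int max_limb_exponent() {
    const std::uint64_t limit = UINT64_MAX / 3;  // 10^a - 1 must not exceed this
    std::uint64_t pow10 = 1;                     // holds 10^a
    int a = 0;
    // The first test keeps pow10 * 10 from overflowing, the second is the bound.
    while (pow10 <= UINT64_MAX / 10 && pow10 * 10 - 1 <= limit) {
        pow10 *= 10;
        ++a;
    }
    return a;
}

static_assert(max_limb_exponent() == 18, "matches the hand derivation");
```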
So this has now fixed our maximum possible a=18, and thus the biggest possible n_ represented with unsigned long long is 10^18 - 1 = 999,999,999,999,999,999. With this basic setup, let's now get to some actual code. For now I will use std::vector to hold the big_num we discussed, but this can change:
// Example code with unsigned long long
#include <algorithm>
#include <cstdlib>
#include <vector>
//
// FOR NOW BigNum WILL BE REPRESENTED
// BY std::vector. YOU CAN CHANGE THIS LATER
// DEPENDING ON WHAT OPTIMIZATIONS YOU WANT
//
using BigNum = std::vector<unsigned long long>;
// suffix ULL guarantees the number is interpreted as unsigned long long
#define MAX_BASE_10 999999999999999999ULL
// assumed to be provided elsewhere: returns a random number
// of type unsigned long long
unsigned long long MyRandom();
// randomly generate a big number (entry 0 is the sign and is left untouched)
void randomize_BigNum(BigNum &a){
    for(size_t i=1; i<a.size(); i++)
        a[i] = MyRandom()%(MAX_BASE_10+1); // cap the numbers
}
// wrapper functions
void add(const BigNum &a, const BigNum &b, BigNum &c); // c = a + b
void add(const BigNum &a, BigNum &b); // b = a + b
// actual work done here
void add_equal_size(const BigNum &a, const BigNum &b, BigNum &c, const size_t &N);
void add_equal_size(const BigNum &a, BigNum &b, const size_t &N);
void blindly_add_one(BigNum &c);
// Missing cases
// void add_equal_size(BigNum &a, BigNum &b, BigNum &c, size_t &Na, size_t &Nb);
// void add_equal_size(BigNum &a, BigNum &b, size_t &Na, size_t &Nb);
int main(){
    size_t n=10;
    BigNum a(n), b(n), c(n);
    randomize_BigNum(a);
    randomize_BigNum(b);
    add(a,b,c);
    return 0;
}
The wrapper functions should look as follows. They safeguard against calls with arrays of mismatched size:
// To do: add support for when the sizes of a, b, c are not equal
// c = a + b
void add(const BigNum &a, const BigNum &b, BigNum &c){
    c.resize(std::max(a.size(),b.size()));
    if(a.size()==b.size())
        add_equal_size(a,b,c,a.size());
    // else: To do: add_unequal_size(a,b,c,a.size(),b.size());
}
// b = a + b (b is modified, so it cannot be const)
void add(const BigNum &a, BigNum &b){
    if(a.size()==b.size())
        add_equal_size(a,b,a.size());
    else{
        b.resize(a.size());
        // To do: add_unequal_size(a,b,a.size());
    }
}
The main grunt of the work will be done here (which you can call directly and skip a function call, if you know what you are doing):
// If a,b,c same size array
// c = a + b
void add_equal_size(const BigNum &a, const BigNum &b, BigNum &c, const size_t &N){
// start with sign of c is sign of a
// Specific details follow on whether I need to flip the
// sign or not
c[0] = a[0];
unsigned long long carry=0;
// DISTINGUISH TWO GRAND CASES:
//
// a and b have the same sign
// a and b have opposite sign
// no need to check which has which sign (details follow)
//
if(a[0]==b[0]){// if a and b have the same sign
//
// This means that either +a+b or -a-b=-(a+b)
// In both cases I just need to add two numbers a and b
// and I already got the sign of the result c correct from the
// start
//
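for(size_t i=1; i<N;i++){
unsigned long long sum = a[i] + b[i] + carry; // full sum, cannot overflow since a<=18
c[i] = sum%(MAX_BASE_10+1);
carry = sum/(MAX_BASE_10+1); // take the carry from the sum, not from the already reduced c[i]
}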
if(carry){// if carry>0 then I need to extend my array to fit the final carry
c.resize(N+1);
c[N]=carry;
}
}
else{// if a and b have opposite sign
//
// If I have opposite sign then I am subtracting the
// numbers. The following is inspired by how
// you can subtract two numbers with bitwise operations.
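for(size_t i=1; i<N;i++){
unsigned long long sum = a[i] + (MAX_BASE_10 - b[i]) + carry; // a[i] + complement(b[i]) + carry
c[i] = sum%(MAX_BASE_10+1);
carry = sum/(MAX_BASE_10+1); // take the carry from the sum, not from the already reduced c[i]
}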
if(carry){ // I carried? then I got the sign right from the start
// just add 1 and I am done
blindly_add_one(c);
}
else{ // I didn't carry? then I got the sign wrong from the start
// flip the sign
c[0] ^= 1ULL;
// and take the complement
for(size_t i=1; i<N;i++)
c[i] = MAX_BASE_10 - c[i];
}
}
return;
};
A few details about the // if a and b have opposite sign case follow:
Let's work in base 10. Let's say we are subtracting a - b, and let's convert this to an addition. Define the following operation:
Let's name the base 10 digits of a number d1, d2, d3, ... Then any number is n = d1 + 10*d2 + 10*10*d3 + ... The complement of a digit is defined as:
complement(d1) = 9-d1
Then the complement of a number n is:
complement(n) = complement(d1)
+ 10*complement(d2)
+ 10*10*complement(d3)
...
Consider two cases, a>b and a<b:
EXAMPLE OF a>b: let's say a=830 and b=126. Do the following: 830 - 126 -> 830 + complement(126) = 830 + 873 = 1703. So if a>b, I drop the leading 1 (giving 703) and add 1: the result is 704!
EXAMPLE OF a<b: let's say a=126 and b=830. Do the following: 126 - 830 -> 126 + complement(830) = 126 + 169 = 295 ...? Well, what if I complement it? complement(295) = 704 !!! So if a<b I already have the result... with the opposite sign.
Going back to our case, since each number in the array is bounded by MAX_BASE_10, the complement of our numbers is
complement(n) = MAX_BASE_10 - n
So, using this complement to convert subtraction to addition,
I only need to pay attention to whether I carried an extra 1 at
the end of the addition (the a>b case). The algorithm now is
FOR EACH ARRAY subtraction (ith iteration):
na_i - nb_i + carry(i-1)
convert -> na_i + complement(nb_i) + carry(i-1)
bound the result -> (na_i + complement(nb_i) + carry(i-1))%(MAX_BASE_10+1)
find the carry -> (na_i + complement(nb_i) + carry(i-1))/(MAX_BASE_10+1)
keep on adding the array numbers...
At the end of the array, if I carried, forget the carry
and add 1. Else take the complement of the result
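To see the trick in isolation, here is a small sketch on plain integers with a fixed number of decimal digits. The function name subtract_by_complement and its digits parameter are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cstdint>

// Computes a - b using the nines' complement trick on numbers with
// `digits` decimal digits (both inputs must be below 10^digits).
std::int64_t subtract_by_complement(std::uint64_t a, std::uint64_t b, int digits) {
    std::uint64_t base = 1;
    for (int i = 0; i < digits; ++i) base *= 10;    // base = 10^digits
    assert(a < base && b < base);
    const std::uint64_t max_value = base - 1;       // 99...9
    const std::uint64_t sum = a + (max_value - b);  // a + complement(b)
    if (sum >= base) {
        // Carried out of the top digit: drop the carry and add 1.
        return static_cast<std::int64_t>(sum - base + 1);
    }
    // No carry: the complement of the sum is the magnitude, the sign is negative.
    return -static_cast<std::int64_t>(max_value - sum);
}
```

Calling it with (830, 126, 3) and (126, 830, 3) reproduces the two worked examples, 704 and -704.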
This "and add one" is done by yet another function:
// Just add 1, no matter the sign of c
void blindly_add_one(BigNum &c){
    const size_t N = c.size();
    unsigned long long carry=1;
    for(size_t i=1; i<N && carry; i++){
        unsigned long long sum = c[i] + carry; // add the carry to the existing entry
        c[i] = sum%(MAX_BASE_10+1);
        carry = sum/(MAX_BASE_10+1);
    }
    if(carry){ // if carry>0 then I need to extend my array to fit the number
        c.resize(N+1);
        c[N]=carry;
    }
}
Good up to here. Specifically, in this code don't forget that at the start of the function we set the sign of c to the sign of a. So if I carry at the end, that means I had |a|>|b| and I did either +a-b>0 or -a+b=-(a-b)<0. In either case, setting the sign of the result c to the sign of a was correct. If I don't carry, I had |a|<|b| with either +a-b<0 or -a+b=-(a-b)>0. In either case, setting the sign of c to the sign of a was INCORRECT, so I need to flip the sign if I don't carry.
The following function operates the same way as the one above, only rather than computing c = a + b it does b = a + b:
// same logic as above, only b = a + b
void add_equal_size(const BigNum &a, BigNum &b, const size_t &N){
    unsigned long long carry=0;
    if(a[0]==b[0]){// if a and b have the same sign
        for(size_t i=1; i<N;i++){
            unsigned long long sum = a[i] + b[i] + carry;
            b[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){// if carry>0 then I need to extend my array to fit the number
            b.resize(N+1);
            b[N]=carry;
        }
    }
    else{ // if a and b have opposite sign
        b[0] = a[0];
        for(size_t i=1; i<N;i++){
            unsigned long long sum = a[i] + (MAX_BASE_10 - b[i]) + carry;
            b[i] = sum%(MAX_BASE_10+1);
            carry = sum/(MAX_BASE_10+1);
        }
        if(carry){
            blindly_add_one(b);
        }
        else{
            b[0] ^= 1ULL;
            for(size_t i=1; i<N;i++)
                b[i] = MAX_BASE_10 - b[i];
        }
    }
}
And that is a basic set up on how you could use unsigned numbers in arrays to represent very large integers.
WHERE TO GO FROM HERE
There are many things to do from here on out to optimise the code. I will mention a few I could think of:
-Try to replace the addition of arrays with possible BLAS calls
-Make sure you are taking advantage of vectorization. Depending on how you write your loops, you may or may not be generating vectorized code. If your arrays become big, you may benefit from this.
-In the spirit of the above, make sure you have properly aligned arrays in memory to actually take advantage of vectorization. From my understanding std::vector does not guarantee alignment. Neither does a blind malloc. I think the Boost libraries have a vector version where you can declare a fixed alignment, in which case you can ask for a 64bit aligned array for your unsigned long long array. Another option is to have your own class that manages a raw pointer and does aligned allocations with a custom allocator. Borrowing aligned_malloc and aligned_free from https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/ you could have a class like this to replace std::vector:
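// aligned_malloc and aligned_free from:
// https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/
// The two declarations below are the signatures I assume they have.
// wrapping in an absolutely minimal class to handle
// memory allocation and freeing
#include <cstddef>
#include <new>
void *aligned_malloc(size_t align, size_t size); // assumed signature
void aligned_free(void *ptr);                    // assumed signature
class BigNum{
private:
    unsigned long long *ptr;
    size_t len; // renamed from size so it does not clash with size()
public:
    BigNum() : ptr(nullptr), len(0) {}
    explicit BigNum(size_t N) : ptr(nullptr), len(0) { resize(N); }
    // The implicitly generated copies would free ptr twice, so copying is
    // disabled here. Make your own copy/move operations if you need them.
    BigNum(const BigNum&) = delete;
    BigNum& operator=(const BigNum&) = delete;
    ~BigNum(){
        if(ptr) aligned_free(ptr);
    }
    // Access an object in aligned storage
    const unsigned long long& operator[](std::size_t pos) const{ return ptr[pos]; }
    unsigned long long& operator[](std::size_t pos){ return ptr[pos]; }
    // return my size
    size_t size() const{
        return len;
    }
    // resize memory allocation (old contents are not preserved)
    void resize(size_t N){
        if(ptr){ aligned_free(ptr); ptr = nullptr; }
        len = N;
        if(N){
            // N+1 entries, always keep first entry for the sign of BigNum.
            // The 64-byte alignment is an assumption; use what your SIMD width needs.
            void* temp = aligned_malloc(64, (N+1)*sizeof(unsigned long long));
            if(temp==nullptr)
                throw std::bad_alloc();
            ptr = static_cast<unsigned long long*>(temp);
        }
    }
};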
I have written an attempt at my own RSA algorithm, but the encryption portion isn't quite working when I use fairly large numbers (nothing like the size which should be used for RSA) and I'm not sure why.
It works in the following way:
The input is a list of characters, for this example "abc"
This is converted to an array: [10,11,12]. (I have chosen 10 - 35 for lower case letters so that they are all 2 digit numbers just to make it easier)
The numbers are combined to form 121110 (using 12*100^2 + 11*100^1 + 10*100^0)
Apply the algorithm: m^e (mod n)
This is simplified using a^b (mod n) = a^c (mod n) * a^d (mod n), where b = c + d
This works for small values in that it can be deciphered using the decryption program which I have written.
When using larger values the output is always 1844674407188030241. With a little bit of research I found that this is roughly 2^64 (to 10 significant figures; it has been pointed out that odd numbers can't be powers of two, oops). I am sure that there is something that I have overlooked and I apologise for what (I really hope) will be a trivial question with an easy answer. Why is the output value always 2^64 and what can I change to fix it? Thank you very much for any help, here is my code:
#include <iostream>
#include <string>
#include <math.h>
int returnVal (char x)
{
return (int) x;
}
unsigned long long modExp(unsigned long long b, unsigned long long e, unsigned long long m)
{
unsigned long long remainder;
int x = 1;
while (e != 0)
{
remainder = e % 2;
e= e/2;
if (remainder == 1)
x = (x * b) % m;
b= (b * b) % m;
}
return x;
}
unsigned mysteryFunction(const std::string& input)
{
unsigned result = 0;
unsigned factor = 1;
for (size_t i = 0; i < input.size(); ++i)
{
result += factor * (input[i] - 87);
factor *= 100;
}
return result;
}
int main()
{
unsigned long long p = 70021;
unsigned long long q = 80001;
int e = 7;
unsigned long long n = p * q;
std::string foo = "ab";
for (int i = 0; i < foo.length(); i++);
{
std::cout << modExp (mysteryFunction(foo), e, n);
}
}
Your code has several problems.
Problem 1: Inconsistent use of unsigned long long.
int x = 1;
Changing this declaration in modExp to unsigned long long causes the program to give a more reasonable-looking result. I don't know whether it's the correct result, but it's less than n, at least. I'm still not sure what the exact mechanism of the error was. I can see ways it would have screwed things up, but none that could have caused an output of 1844674407188030241.
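One candidate mechanism, offered as a guess rather than a diagnosis: if x ends up holding a negative int, `return x;` converts it to the unsigned 64-bit return type by wrapping around 2^64. The sketch below only demonstrates that conversion. The helper name as_returned is made up, and the link to the reported number rests on the assumption that the value quoted in the question lost its last digit when it was copied:

```cpp
// Mimics `return x;` in modExp: an int converted to unsigned long long.
// Converting a negative int adds 2^64, which is well defined in C++.
unsigned long long as_returned(int x) {
    return static_cast<unsigned long long>(x);
}
```

For x = -1829249203 this gives 18446744071880302413, whose first 19 digits are 1844674407188030241.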
Problem 2: Composite "primes".
For RSA, p and q both need to be prime. Neither p nor q is prime in your code.
70021 = 7^2 * 1429
80001 = 3^3 * 2963
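For numbers this small, a trial-division check is enough to catch the problem before using them. This is a minimal sketch with a made-up helper name (is_prime). It is far too slow for real RSA key sizes, where a probabilistic test would be used instead:

```cpp
#include <cstdint>

// Trial division: returns true only if n has no divisor d with 2 <= d <= sqrt(n).
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d <= n / d; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}
```

Both is_prime(70021) and is_prime(80001) return false.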
In mysteryFunction, you subtract 87, which corresponds to 'W', from the input characters. You probably want to subtract 97 instead, which corresponds to 'a'.
How would you compute a combination such as (100,000 choose 50,000)?
I have tried three different approaches thus far but for obvious reasons each has failed:
1) Dynamic Programming- The size of the array just gets to be so ridiculous it seg faults
unsigned long long int grid[p+1][q+1];
//Initialise x boundary conditions
for (long int i = 0; i < q; ++i) {
grid[p][i] = 1;
}
//Initialise y boundary conditions
for (long int i = 0; i < p; ++i) {
grid[i][q] = 1;
}
for (long int i = p - 1; i >= 0; --i) {
for (long int j = q - 1; j >= 0; --j) {
grid[i][j] = grid[i+1][j] + grid[i][j+1];
}
}
2) Brute Force - Obviously calculating even 100! isn't realistic
unsigned long long int factorial(long int n)
{
return (n == 1 || n == 0) ? 1 : factorial(n - 1) * n;
}
3) Multiplicative Formula- I'm unable to store the values they are just so large
const int gridSize = 100000; //say 100,000
unsigned long long int paths = 1;
for (int i = 0; i < gridSize; i++) {
paths *= (2 * gridSize) - i;
paths /= i + 1;
}
// code from (http://www.mathblog.dk/project-euler-15/)
If it helps for context, the aim of this is to solve the "How many routes are there through an m×n grid" problem for large inputs. Maybe I am attacking the problem the wrong way?
C(100000, 50000) is a huge number with 30101 decimal digits: http://www.wolframalpha.com/input/?i=C%28100000%2C+50000%29
Obviously unsigned long long will not be enough to store it. You need an arbitrary-precision integer library, like GMP: http://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library
Otherwise, the multiplicative formula should be good enough.
"How would you compute ..." depends very much on the desired accuracy. Precise results can only be computed with arbitrary-precision numbers (e.g. GMP), but it is rather likely that you don't really need the exact result.
In that case I would use the Stirling approximation for factorials ( http://en.wikipedia.org/wiki/Stirling%27s_approximation ) and calculate with doubles. The number of summands in the expansion can be used to regulate the error. The Wikipedia page will also give you an error estimate.
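As a rough sketch of the approximate route: instead of a hand-written Stirling series, the code below uses std::lgamma from the standard library, which returns ln(Gamma(x)), so ln(n!) = lgamma(n + 1). The function names are made up for illustration, and the result is only as accurate as double arithmetic allows:

```cpp
#include <cassert>
#include <cmath>

// log10 of C(n, k), computed from log-gamma values.
double log10_binomial(double n, double k) {
    assert(k >= 0 && k <= n);
    const double ln_value =
        std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
    return ln_value / std::log(10.0);
}

// Number of decimal digits of C(n, k), derived from the estimate above.
long binomial_digit_count(double n, double k) {
    return static_cast<long>(std::floor(log10_binomial(n, k))) + 1;
}
```

binomial_digit_count(100000, 50000) should reproduce the 30101 digits mentioned in the other answer.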
Here is a recursive formula that might help:
NCk = (N-1)C(k-1)*N/k
Use a recursive call for (N-1)C(k-1) first, then evaluate NCk on the result.
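A small sketch of that recurrence, unrolled into a loop and using plain 64-bit integers. The name binomial_small is made up, and the function is only valid while the intermediate product fits in 64 bits. For the sizes in the question the same loop would have to run on a big-integer type:

```cpp
#include <cstdint>

// C(n, k) via C(m, i) = C(m-1, i-1) * m / i, applied for i = 1..k.
std::uint64_t binomial_small(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // C(n, k) == C(n, n-k), fewer iterations
    std::uint64_t result = 1;  // C(n-k, 0)
    for (unsigned i = 1; i <= k; ++i) {
        // result holds C(n-k+i-1, i-1); multiplying first keeps the division exact.
        result = result * (n - k + i) / i;
    }
    return result;
}
```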
As your numbers will be very large, use one of the following alternatives:
GMP
Your own implementation, where you store numbers as a sequence of binary bits in an array and use Booth's algorithm for multiplication
and shift & subtract for division.
I'm building a small BigInt library in C++ for use in my programming language.
The structure is like the following:
short digits[ 1000 ];
int len;
I have a function that converts a string into a bigint by splitting it up into single chars and putting them into digits.
The numbers in digits are all reversed, so the number 123 would look like the following:
digits[0]=3 digits[1]=2 digits[2]=1
I have already managed to code the adding function, which works perfectly.
It works somewhat like this:
overflow = 0
for i ++ until length of both numbers exceeded:
add numberA[ i ] to numberB[ i ]
add overflow to the result
set overflow to 0
if the result is 10 or bigger:
subtract 10 from the result
overflow = 1
put the result into numberReturn[ i ]
(Overflow is in this case what happens when I add 1 to 9: subtract 10 from 10, set overflow to 1, and overflow gets added to the next digit)
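For reference, the pseudocode above could look like this in C++. This is a sketch using std::vector<int> rather than the fixed short digits[1000] array from my structure, and add_digits is just an illustrative name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Adds two numbers stored as little-endian decimal digits (index 0 is the ones place).
std::vector<int> add_digits(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    int overflow = 0;
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        int sum = overflow;
        if (i < a.size()) sum += a[i];  // digits past the end count as 0
        if (i < b.size()) sum += b[i];
        overflow = sum >= 10 ? 1 : 0;
        out.push_back(sum - 10 * overflow);
    }
    if (overflow) out.push_back(overflow);  // the result grew by one digit
    return out;
}
```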
So think of how two numbers are stored, like those:
0 | 1 | 2
---------
A 2 - -
B 0 0 1
The above represents the digits of the bigints 2 (A) and 100 (B).
- means uninitialized digits, they aren't accessed.
So adding the above number works fine: start at 0, add 2 + 0, go to 1, add 0, go to 2, add 1
But:
When I want to do multiplication with the above structure, my program ends up doing the following:
Start at 0, multiply 2 with 0 (eek), go to 1, ...
So it is obvious that, for multiplication, I have to get an order like this:
0 | 1 | 2
---------
A - - 2
B 0 0 1
Then, everything would be clear: Start at 0, multiply 0 with 0, go to 1, multiply 0 with 0, go to 2, multiply 1 with 2
How can I manage to get digits into the correct form for multiplication?
I don't want to do any array moving/flipping - I need performance!
Why are you using short to store digits in the range [0..9]? A char would suffice.
You're thinking incorrectly about the multiplication. In the case of multiplication you need a double for loop that multiplies B with each digit in A and sums them up shifted with the correct power of ten.
EDIT: Since someone anonymously downvoted this without a comment, this is basically the multiplication algorithm:
bigint prod = 0
for i in A
prod += B * A[i] * (10 ^ i)
The multiplication of B with A[i] is done by an extra for loop where you also keep track of the carry. The (10 ^ i) is achieved by offsetting the destination indices, since bigint is in base 10.
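Spelled out in C++, the double loop might look like the sketch below. It assumes the digits are stored least significant first, as in the question, and uses std::vector<int> with a made-up function name (multiply_digits) instead of the asker's struct:

```cpp
#include <cstddef>
#include <vector>

// Schoolbook product of two little-endian decimal digit arrays.
std::vector<int> multiply_digits(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.empty() || b.empty()) return {0};
    std::vector<int> prod(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        int carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // Writing to index i + j is the "times 10^i" shift.
            int cur = prod[i + j] + a[i] * b[j] + carry;
            prod[i + j] = cur % 10;
            carry = cur / 10;
        }
        prod[i + b.size()] += carry;  // this slot has not been written yet
    }
    while (prod.size() > 1 && prod.back() == 0) prod.pop_back();  // trim leading zeros
    return prod;
}
```

No flipping of the arrays is needed: 2 times 100 comes out as {0, 0, 2}.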
Your example in the question is over-engineering at its best, in my opinion. Your approach will end up even slower than normal long multiplication because of the sheer number of multiplications and additions involved. Don't limit yourself to working on one base-10 digit at a time when you can multiply approximately 9 at a time! Convert the base-10 string to a hugeval, and then do operations on it. Don't do operations directly on the string. You will go crazy. Here is some code which demonstrates addition and multiplication. Change M to use a bigger type. You could also use std::vector, but then you miss out on some optimizations.
#include <iostream>
#include <string>
#include <algorithm>
#include <sstream>
#include <cstdlib>
#include <cstdio>
#include <iomanip>
#ifdef _DEBUG
#include <assert.h>
#define ASSERT(x) assert(x)
#else
#define ASSERT(x)
#endif
namespace Arithmetic
{
const int M = 64;
const int B = (M-1)*32;
struct Flags
{
Flags() : C(false),Z(false),V(false),N(false){}
void Clear()
{
C = false;
Z = false;
V = false;
N = false;
}
bool C,Z,V,N;
};
static unsigned int hvAdd(unsigned int a, unsigned int b, Flags& f)
{
unsigned int c;
f.Clear();
//b = -(signed)b;
c = a + b;
f.N = (c >> 31UL) & 0x1;
f.C = (c < a) && (c < b);
f.Z = !c;
f.V = (((signed)a < (signed)b) != f.N);
return c;
}
static unsigned int hvSub(unsigned int a, unsigned int b, Flags& f)
{
unsigned int c;
f.Clear();
c = a - b;
//f.N = ((signed)c < 0);
f.N = (c >> 31UL) & 0x1;
f.C = (c < a) && (c < b);
f.Z = !c;
f.V = (((signed)a < (signed)b) != f.N);
return c;
}
struct HugeVal
{
HugeVal()
{
std::fill(part, part + M, 0);
}
HugeVal(const HugeVal& h)
{
std::copy(h.part, h.part + M, part);
}
HugeVal(const std::string& str)
{
Flags f;
unsigned int tmp = 0;
std::fill(part, part + M, 0);
for(unsigned int i=0; i < str.length(); ++i){
unsigned int digit = (unsigned int)str[i] - 48UL;
unsigned int carry_last = 0;
unsigned int carry_next = 0;
for(int i=0; i<M; ++i){
tmp = part[i]; //the value *before* the carry add
part[i] = hvAdd(part[i], carry_last, f);
carry_last = 0;
if(f.C)
++carry_last;
for(int j=1; j<10; ++j){
part[i] = hvAdd(part[i], tmp, f);
if(f.C)
++carry_last;
}
}
part[0] = hvAdd(part[0], digit, f);
int index = 1;
while(f.C && index < M){
part[index] = hvAdd(part[index], 1, f);
++index;
}
}
}
/*
HugeVal operator= (const HugeVal& h)
{
*this = HugeVal(h);
}
*/
HugeVal operator+ (const HugeVal& h) const
{
HugeVal tmp;
Flags f;
int index = 0;
unsigned int carry_last = 0;
for(int j=0; j<M; ++j){
if(carry_last){
tmp.part[j] = hvAdd(tmp.part[j], carry_last, f);
carry_last = 0;
}
tmp.part[j] = hvAdd(tmp.part[j], part[j], f);
if(f.C)
++carry_last;
tmp.part[j] = hvAdd(tmp.part[j], h.part[j], f);
if(f.C)
++carry_last;
}
return tmp;
}
HugeVal operator* (const HugeVal& h) const
{
HugeVal tmp;
for(int j=0; j<M; ++j){
unsigned int carry_next = 0;
for(int i=0;i<M; ++i){
Flags f;
unsigned int accum1 = 0;
unsigned int accum2 = 0;
unsigned int accum3 = 0;
unsigned int accum4 = 0;
/* Split into 16-bit values */
unsigned int j_LO = part[j]&0xFFFF;
unsigned int j_HI = part[j]>>16;
unsigned int i_LO = h.part[i]&0xFFFF;
unsigned int i_HI = h.part[i]>>16;
size_t index = i+j;
size_t index2 = index+1;
/* These multiplications are safe now. Can't overflow */
accum1 = j_LO * i_LO;
accum2 = j_LO * i_HI;
accum3 = j_HI * i_LO;
accum4 = j_HI * i_HI;
if(carry_next){ //carry from last iteration
accum1 = hvAdd(accum1, carry_next, f); //add to LSB
carry_next = 0;
if(f.C) //LSB produced carry
++carry_next;
}
/* Add the lower 16-bit parts of accum2 and accum3 to accum1 */
accum1 = hvAdd(accum1, (accum2 << 16), f);
if(f.C)
++carry_next;
accum1 = hvAdd(accum1, (accum3 << 16), f);
if(f.C)
++carry_next;
if(carry_next){ //carry from LSB
accum4 = hvAdd(accum4, carry_next, f); //add to MSB
carry_next = 0;
ASSERT(f.C == false);
}
/* Add the higher 16-bit parts of accum2 and accum3 to accum4 */
/* Can't overflow */
accum4 = hvAdd(accum4, (accum2 >> 16), f);
ASSERT(f.C == false);
accum4 = hvAdd(accum4, (accum3 >> 16), f);
ASSERT(f.C == false);
if(index < M){
tmp.part[index] = hvAdd(tmp.part[index], accum1, f);
if(f.C)
++carry_next;
}
carry_next += accum4;
}
}
return tmp;
}
void Print() const
{
for(int i=(M-1); i>=0; --i){
printf("%.8X", part[i]);
}
printf("\n");
}
unsigned int part[M];
};
}
int main(int argc, char* argv[])
{
std::string a1("273847238974823947823941");
std::string a2("324230432432895745949");
Arithmetic::HugeVal a = a1;
Arithmetic::HugeVal b = a2;
Arithmetic::HugeVal d = a + b;
Arithmetic::HugeVal e = a * b;
a.Print();
b.Print();
d.Print();
e.Print();
system("pause");
}
Andreas is right that you have to multiply one number by each digit of the other and sum the results accordingly. It is better to multiply the longer number by the digits of the shorter one, I think. If you store decimal digits in your array, char would indeed suffice, but if you want performance, maybe you should consider a bigger type. I don't know what your platform is, but on x86, for example, you can use 32-bit ints and the hardware support for producing a 64-bit result from a 32-bit multiplication.
Alright, seeing that this question was answered almost 11 years ago, I figure I'll provide some pointers for anyone who is writing their own BigInt library.
First off, if what you want is purely performance instead of studying how to actually write performant code, please just learn how to use GMP or OpenSSL. There is a really really steep learning curve to reach the level of GMP's performance.
Ok, let's get right into it.
Don't use base 10 when you can use a bigger base.
CPUs are god-level good at addition, subtraction, multiplication, and division, so take advantage of them.
Suppose you have two BigInt
a = {9,7,4,2,6,1,6,8} // length 8
b = {3,6,7,2,4,6,7,8} // length 8
// Frustrating writing for-loops to calculate a*b
Don't make them do 50 calculations in base 10 when they could do 1 calculation in base 2^32:
a = {97426168}
b = {36724678}
// Literally only need to type a*b
If the biggest number your computer can represent is 2^64-1, use 2^32 as the base for your BigInt (so each digit is at most 2^32-1), as this solves the problem of overflow when multiplying two digits.
Use a structure that supports dynamic memory. Scaling your program to handle the multiplication of two 1-million-digit numbers would probably break your program, since it doesn't have enough memory on the stack. Use a std::vector instead of std::array or a raw int[] as in C to make use of your memory.
Learn about SIMD to give your calculation a boost in performance. Typical loops in beginners' code can't process multiple data at the same time. Learning this should speed things up from 3 to 12 times.
Learn how to write your own memory allocators. If you use std::vector to store your unsigned integers, chances are, later on, you'll suffer performance problems, as std::vector is meant for general-purpose use. Try to tailor your allocator to your own needs to avoid allocating and reallocating every time a calculation is performed.
Learn about your computer's architecture and memory layout. Writing your own assembly code to fit certain CPU architecture would certainly boost your performance. This helps with writing your own memory allocator and SIMD too.
Algorithms. For small BigInt you can rely on your grade school multiplication but as the input grows, certainly take a good look at Karatsuba, Toom-Cook, and finally FFT to implement in your library.
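To illustrate the first tip, here is a minimal sketch of grade school multiplication with base-2^32 digits held in a std::vector. The name multiply_limbs is made up, and this is not the code from my library:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Schoolbook product of two little-endian arrays of base-2^32 digits.
std::vector<std::uint32_t> multiply_limbs(const std::vector<std::uint32_t>& a,
                                          const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> prod(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // Worst case (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so this cannot overflow.
            std::uint64_t cur =
                static_cast<std::uint64_t>(a[i]) * b[j] + prod[i + j] + carry;
            prod[i + j] = static_cast<std::uint32_t>(cur);  // low 32 bits
            carry = cur >> 32;                              // high 32 bits
        }
        prod[i + b.size()] = static_cast<std::uint32_t>(carry);
    }
    return prod;
}
```

The result always has a.size() + b.size() digits, so the top digit may be zero.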
If you're stuck, please visit my BigInt library. It doesn't have custom allocator, SIMD code or custom assembly code, but for starters of BigInteger it should be enough.
I'm building a small BigInt library in C++ for use in my programming language.
Why? There are some excellent existing bigint libraries out there (e.g., gmp, tommath) that you can just use without having to write your own from scratch. Making your own is a lot of work, and the result is unlikely to be as good in performance terms. (In particular, writing fast code to perform multiplies and divides is quite a lot trickier than it appears at first glance.)