Represent Integers with 2000 or more digits [duplicate] - c++

This question already has answers here:
Handling large numbers in C++?
(10 answers)
Closed 7 years ago.
I would like to write a program which can compute integers having more than 2000 or 20000 digits (for Pi's decimals). I would like to do it in C++, without any libraries! (No big integer, boost,...). Can anyone suggest a way of doing it? Here are my thoughts:
using const char*, for holding the integer's digits;
representing the number like
( (1 * 10 + x) * 10 + x )...

The obvious answer works along these lines:
class integer {
bool negative;
std::vector<std::uint64_t> data;
};
Where the number is represented as a sign bit and a (unsigned) base 2**64 value.
This means the absolute value of your number is:
data[0] + (data[1] << 64) + (data[2] << 128) + ....
Or, in other terms you represent your number as a little-endian bitstring with words as large as your target machine can reasonably work with. I chose 64 bit integers, as you can minimize the number of individual word operations this way (on a x64 machine).
To implement Addition, you use a concept you have learned in elementary school:
a b
+ x y
------------------
(a+x+carry) (b+y reduced to one digit length)
The reduction (modulo 2**64) happens automatically, and the carry can only ever be either zero or one. All that remains is to detect a carry, which is simple:
bool next_carry = false;
if((x += y) < y) next_carry = true;
if(prev_carry && !++x) next_carry = true;
Subtraction can be implemented similarly using a borrow instead.
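The carry and borrow handling described above can be sketched over whole vectors as follows. This is a minimal illustration, not the answer's own code: the names Limbs, add_magnitudes and sub_magnitudes are assumptions, only magnitudes are handled (the sign flag is left out), and sub_magnitudes assumes a >= b with no leading zero words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian magnitude: element 0 is the least significant 64-bit word.
using Limbs = std::vector<std::uint64_t>;

// Returns a + b (hypothetical helper name).
Limbs add_magnitudes(const Limbs& a, const Limbs& b) {
    const Limbs& longer  = a.size() >= b.size() ? a : b;
    const Limbs& shorter = a.size() >= b.size() ? b : a;
    Limbs result;
    result.reserve(longer.size() + 1);
    bool carry = false;
    for (std::size_t i = 0; i < longer.size(); ++i) {
        std::uint64_t x = longer[i];
        const std::uint64_t y = i < shorter.size() ? shorter[i] : 0;
        bool next_carry = false;
        x += y;                                    // wraps modulo 2^64
        if (x < y) next_carry = true;              // wrapped, so a carry was produced
        if (carry && ++x == 0) next_carry = true;  // adding the incoming carry wrapped
        result.push_back(x);
        carry = next_carry;
    }
    if (carry) result.push_back(1);
    return result;
}

// Returns a - b; requires a >= b (hypothetical helper name).
Limbs sub_magnitudes(const Limbs& a, const Limbs& b) {
    assert(a.size() >= b.size());
    Limbs result;
    result.reserve(a.size());
    bool borrow = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t x = a[i];
        const std::uint64_t y = i < b.size() ? b[i] : 0;
        std::uint64_t d = x - y;                   // wraps modulo 2^64
        bool next_borrow = x < y;
        if (borrow) {
            if (d == 0) next_borrow = true;        // 0 - 1 wraps, so borrow again
            --d;
        }
        result.push_back(d);
        borrow = next_borrow;
    }
    assert(!borrow);                               // a < b would leave a borrow
    while (result.size() > 1 && result.back() == 0) result.pop_back();
    return result;
}
```

A signed type built on top would compare magnitudes first and then call one of the two helpers, depending on the signs of the operands.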
Note that getting anywhere close to the performance of e.g. libgmp is... unlikely.

A long integer is usually represented by a sequence of digits (see positional notation). For convenience, use little endian convention: A[0] is the lowest digit, A[n-1] is the highest one. In general case your number is equal to sum(A[i] * base^i) for some value of base.
The simplest value for base is ten, but it is not efficient. If you want to print your answer to user often, you'd better use power-of-ten as base. For instance, you can use base = 10^9 and store all digits in int32 type. If you want maximal speed, then better use power-of-two bases. For instance, base = 2^32 is the best possible base for 32-bit compiler (however, you'll need assembly to make it work optimally).
There are two ways to represent negative integers. The first one is to store the integer as a sign plus a digit sequence. In this case you'll have to handle all cases with different signs yourself. The other option is to use complement form. It can be used for both power-of-two and power-of-ten bases.
Since the length of the sequence may be different, you'd better store digit sequence in std::vector. Do not forget to remove leading zeroes in this case. An alternative solution would be to store fixed number of digits always (fixed-size array).
The operations are implemented in pretty straightforward way: just as you did them in school =)
P.S. Alternatively, each integer (of bounded length) can be represented by its remainders for a set of different prime moduli, thanks to the CRT (Chinese remainder theorem). Such a representation supports only a limited set of operations, and requires a nontrivial conversion if you want to print it.
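As a rough sketch of the power-of-ten idea from this answer, the following stores little-endian digits in base 10^9 and shows why printing is cheap: each stored digit maps to exactly nine decimal characters. The names Digits, kBase, from_decimal, add and to_decimal are made up for the example, and only non-negative numbers are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Little-endian digits in base 10^9; each element is in [0, 999999999].
using Digits = std::vector<std::uint32_t>;
constexpr std::uint32_t kBase = 1000000000;

// Parses a non-empty string of decimal digits (no sign).
Digits from_decimal(const std::string& text) {
    assert(!text.empty());
    Digits digits;
    // Consume the string from the right, nine characters at a time.
    for (std::size_t end = text.size(); end > 0; ) {
        const std::size_t begin = end >= 9 ? end - 9 : 0;
        digits.push_back(static_cast<std::uint32_t>(
            std::stoul(text.substr(begin, end - begin))));
        end = begin;
    }
    while (digits.size() > 1 && digits.back() == 0) digits.pop_back();  // drop leading zeroes
    return digits;
}

Digits add(const Digits& a, const Digits& b) {
    Digits sum;
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
        // At most 1 + 2 * 999999999 = 1999999999, which still fits in 32 bits.
        std::uint32_t s = carry + (i < a.size() ? a[i] : 0) + (i < b.size() ? b[i] : 0);
        carry = s >= kBase;
        if (carry) s -= kBase;
        sum.push_back(s);
    }
    return sum;
}

std::string to_decimal(const Digits& digits) {
    assert(!digits.empty());
    std::string text = std::to_string(digits.back());  // top digit is not padded
    char buffer[10];
    for (std::size_t i = digits.size() - 1; i-- > 0; ) {
        std::snprintf(buffer, sizeof buffer, "%09u", static_cast<unsigned>(digits[i]));
        text += buffer;                                 // lower digits are zero-padded to 9 chars
    }
    return text;
}
```

With a power-of-two base the addition loop looks the same, but to_decimal would need repeated division instead of simple padding.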

Related

Store one decimal digit

I have a problem which concern with large number of small integers (actually decimal digits). What is the space efficient way to store such a data?
Is it good idea to use std::bitset<4> to store one decimal digit?
Depending on how space-efficient it has to be and how efficient the retrieval should be, I see two possibilities:
Since a vector of std::bitset<4> is (as far as I know) stored in an unpacked setting (each bitset is stored in a memory word, either 32 or 64 bit), you should probably at least use a packed representation like using a 64 bit word to store 16 digits:
store (if the digit was not stored before):
block |= uint64_t(digit) << 4 * index
load:
digit = (block >> 4 * index) & 0xF
reset:
block &= ~(uint64_t(0xF) << 4 * index);
A vector of these 64 bit words (uint64_t) together with some access methods should be easy to implement.
If your space requirements are even tighter, you could e.g. try packing 3 digits in 10 bits (at most 1024) using divisions and modulo, which would be a lot less time-efficient. Also the alignment with 64 bit words is much more difficult, so I would only recommend this if you need to get the final 16% improvement, at most you can get something like 3.3 bits per digit.
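The packed 64-bit layout from the first possibility might be wrapped in a small class like the one below. It is only a sketch: PackedDigits and its member names are assumptions, and set() always resets the nibble first so that a digit can be overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores decimal digits packed 16 per 64-bit word (4 bits each).
class PackedDigits {
public:
    explicit PackedDigits(std::size_t count)
        : blocks_((count + 15) / 16, 0), count_(count) {}

    std::size_t size() const { return count_; }

    unsigned get(std::size_t i) const {
        assert(i < count_);
        return static_cast<unsigned>((blocks_[i / 16] >> shift(i)) & 0xF);
    }

    void set(std::size_t i, unsigned digit) {
        assert(i < count_ && digit < 10);
        std::uint64_t& block = blocks_[i / 16];
        block &= ~(std::uint64_t{0xF} << shift(i));              // reset the nibble
        block |= static_cast<std::uint64_t>(digit) << shift(i);  // store the digit
    }

private:
    // Bit offset of digit i inside its 64-bit word.
    static unsigned shift(std::size_t i) { return 4 * static_cast<unsigned>(i % 16); }

    std::vector<std::uint64_t> blocks_;
    std::size_t count_;
};
```

The mask is built from a 64-bit value on purpose: shifting a plain int 0xF by 32 or more bits would be undefined behavior.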
If you want a very compact way, then no, using bitset<4> is a bad idea, because bitset<4> will use at least one byte, instead of 4 bits.
I'd recommend using std::vector<std::uint32_t>
You can store multiple digits in an uint32_t. Two usual ways:
Use 4 bits for each digit, and use bit operations. This way you can store 8 digits in 4 bytes. Here, set/get operations are pretty fast. Efficiency: 4bit/digit
Use base 10 encoding. uint32_t max value is 256^4-1, which is large enough to store 9 digits in 4 bytes. Efficiency: 3.55bit/digit. Here, if you need to set/get all the 9 digits, then it is almost as fast as the previous version (as division by 10 will be optimized by a good compiler, no actual division will be done by the CPU). If you need random access, then set/get will be slower than the previous version (you can speed it up with libdivide).
If you use uint64_t instead of uint32_t, then you can store 16 digits with the first way (same 4bit/digit efficiency), and 19 digits with the second way: 3.36bit/digit efficiency, which is pretty close to the theoretical minimum: ~3.3219bit/digit
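For the base 10 encoding, random access to one digit could look roughly like this. The helper names kPow10, get_digit and set_digit are invented for the sketch, and the word is assumed to hold a value below 10^9 (nine digits in a uint32_t).

```cpp
#include <cassert>
#include <cstdint>

// kPow10[i] is the weight of decimal digit i (0 = least significant).
constexpr std::uint32_t kPow10[9] = {
    1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000};

// Reads decimal digit `index` of a word holding 9 digits.
unsigned get_digit(std::uint32_t word, unsigned index) {
    assert(index < 9);
    return (word / kPow10[index]) % 10;
}

// Returns `word` with decimal digit `index` replaced by `digit`.
std::uint32_t set_digit(std::uint32_t word, unsigned index, unsigned digit) {
    assert(index < 9 && digit < 10);
    const unsigned old = get_digit(word, index);
    return word - old * kPow10[index] + digit * kPow10[index];
}
```

Both helpers need a division by a table entry, which is why random access is slower here than with the 4-bit layout, where a shift and a mask are enough.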
Is it good idea to use std::bitset<4> to store one decimal digit?
Yes, in principle that's a good idea. It's a well known optimization and called BCD encoding.
(actually decimal digits). What is the space efficient way to store such a data?
You can compact the decimal digit representation by using one nibble of the occupied byte per digit. Arithmetic can also be implemented more efficiently on this form than on an ASCII representation of the digits.
The std::bitset<4> won't serve that well for compacting the data.
std::bitset<4> will still occupy a full byte.
An alternative data structure I could think of is a bitfield
// Maybe #pragma pack(push, 1)
struct TwoBCDDecimalDigits {
uint8_t digit1 : 4;
uint8_t digit2 : 4;
};
// Maybe #pragma pack(pop)
There is even a library available, to convert this format to a normalized numerical format supported at your target CPU architecture:
XBCD_Math
Another way I could think of is to write your own class:
class BCDEncodedNumber {
enum class Sign_t : char {
plus = '+' ,
minus = '-'
};
std::vector<uint8_t> doubleDigitsArray;
public:
BCDEncodedNumber() = default;
BCDEncodedNumber(int num) {
AddDigits(num); // Implements math operation + against the
// current BCD representation stored in
// doubleDigitsArray.
}
};

Own Big Float in C++

I want to write my own variable "type" as a homework in C++. It should be an arbitrarily long float. I was thinking of structure like...
Code:
class bigFloat
{
public:
bigFloat(arguments);
~bigFloat();
private:
std::vector<char> before; // numbers before decimal point
std::vector<char> after; // numbers after decimal point
int pos; // position of decimal point
};
Where if i have number like: 3.1415
Before = '3'; after = '1415'; pos = 1;
If that makes sense to you... BUT the assignment wants me to save some memory, which I don't, because every digit I store takes about 1 byte, which is too much I guess.
Question:
How would you represent those arbitrarily long numbers?
(Sorry for my bad english, I hope the post makes sense)
If you need to preserve memory, all that means is that you need to use memory as efficiently as possible. In other words, given the value you're storing, you shouldn't waste bytes.
Example:
255 doesn't need 32 bits
I think your vector of chars is fine. If you're allowed to use a C++11 compiler, I'd probably change that to a vector of uint8_t and make sure when I'm storing the value that I can store a value from 0 to 255 in a vector of size 1.
However, that's not the end of it. From the sounds of it, what you're after is an arbitrary number of significant digits. However, for a true float representation, you also need to allocate storage for the base and exponent, after deciding what the base will be for your type. There is also the question of whether you want your exponent to be arbitrarily long too. Let's assume so.
So, I'd probably use something like this for members of your class:
//Assuming a base of 10.
static uint8_t const base = 10;
std::vector<uint8_t> digits_before_decimal;
std::vector<uint8_t> digits_after_decimal;
std::vector<uint8_t> exponent;
std::bitset<1> sign;
It is then a matter of implementing the various operators for your type and testing various scenarios to make sure your solution works.
If you really want to be thorough, you could use a simple testing framework to make sure that problems you fix along the way, stay fixed.
In memory, it will essentially look like a binary representation of the number.
For example:
65535 will be: before_decimal =<0xff,0xff>, after_decimal vector is empty
255.255 will be: before_decimal =<0xff>, after_decimal=<0xff>
255255 will be: before_decimal =<0x03,0xe5,0x17>, after_decimal vector is empty
255255.0 will be: before_decimal =<0x03,0xe5,0x17>, after_decimal: <0>
As others have mentioned, you don't really need two vectors for before and after the decimal. However, I'm using two in my answer because it makes it easier to understand and you don't have to keep track of the decimal. The memory requirements of two vs one vector really aren't that different when you're dealing with a long string of digits.
I should also note that using an integer to record position of the decimal point limits your number of digits to 2 billion, which is not an arbitrarily long number.
UPDATE: If this actually is homework, I would check with whoever has given you the homework if you need to support any floating point special cases, the simplest of which would be NaNs. There are other special cases too, but trying to implement all of them will very quickly turn this from a homework assignment into a thesis. Good luck :)
Don't use two separate vectors before and after. You need whole mantissa to make arithmetic operations.
Actually your pos is exponent. Name it accordingly. Exponent is signed btw.
You need sign of mantissa.
I recommend to store mantissa as rational fraction. You need two numbers: numerator and denominator. Then you can make division without round-off.
It's better to store numbers as ints with arbitrary length instead of arrays of digits.
PS. I made such calculator long time ago. To illustrate my answer, I give you declaration of class for number:
class CNumber
{
// ctors, methods....
char cSign; // sign of mantissa
CString strNumer; // numerator of mantissa
CString strDenom; // denominator of mantissa
char cExpSign; // sign of exponent
CString strExp; // exponent
};
I used MFC. CString is standard string there.
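The layout recommended in this answer (one mantissa, a signed exponent, a separate sign) might look like the sketch below when written with standard containers instead of CString. All names (BigDecimalFloat, parse) are hypothetical, the mantissa keeps one decimal digit per byte purely for readability, and the parser assumes well-formed input with an optional leading '-' and at most one '.'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// value = (-1)^negative * mantissa * 10^exponent
struct BigDecimalFloat {
    bool negative = false;
    std::vector<std::uint8_t> mantissa;  // decimal digits, least significant first
    std::int64_t exponent = 0;           // signed power of ten
};

// Parses text such as "-3.1415" into sign, integer mantissa and exponent.
BigDecimalFloat parse(const std::string& text) {
    BigDecimalFloat value;
    std::size_t i = 0;
    if (i < text.size() && text[i] == '-') { value.negative = true; ++i; }
    bool seen_point = false;
    std::vector<std::uint8_t> digits;    // most significant first while reading
    for (; i < text.size(); ++i) {
        if (text[i] == '.') { assert(!seen_point); seen_point = true; continue; }
        assert(text[i] >= '0' && text[i] <= '9');
        digits.push_back(static_cast<std::uint8_t>(text[i] - '0'));
        if (seen_point) --value.exponent;  // each fractional digit scales by 10^-1
    }
    value.mantissa.assign(digits.rbegin(), digits.rend());
    return value;
}
```

For example, "3.1415" becomes the integer mantissa 31415 with exponent -4, so all arithmetic can run on a single digit sequence; a denser base such as 10^9 per element would address the memory concern from the question.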

big integer addition without carry flag

In assembly languages, there is usually an instruction that adds two operands and a carry. If you want to implement big integer additions, you simply add the lowest integers without a carry and the next integers with a carry. How would I do that efficiently in C or C++ where I don't have access to the carry flag? It should work on several compilers and architectures, so I cannot simply use inline assembly or such.
You can use "nails" (a term from GMP): rather than using all 64 bits of a uint64_t when representing a number, use only 63 of them, with the top bit zero. That way you can detect overflow with a simple bit-shift. You may even want less than 63.
Or, you can do half-word arithmetic. If you can do 64-bit arithmetic, represent your number as an array of uint32_ts (or equivalently, split 64-bit words into upper and lower 32-bit chunks). Then, when doing arithmetic operations on these 32-bit integers, you can first promote to 64 bits, do the arithmetic there, then convert back. This lets you detect carry, and it's also good for multiplication if you don't have a "multiply hi" instruction.
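A minimal sketch of the half-word approach, assuming little-endian 32-bit digits in a std::vector; the names Limbs32, add_in_place and mul_small_in_place are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Limbs32 = std::vector<std::uint32_t>;  // little-endian, base 2^32

// Adds b into a in place, widening each step to 64 bits to capture the carry.
void add_in_place(Limbs32& a, const Limbs32& b) {
    if (a.size() < b.size()) a.resize(b.size(), 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t wide =
            std::uint64_t{a[i]} + (i < b.size() ? b[i] : 0) + carry;
        a[i]  = static_cast<std::uint32_t>(wide);  // low 32 bits are the digit
        carry = wide >> 32;                        // high bits are the carry (0 or 1)
    }
    if (carry) a.push_back(static_cast<std::uint32_t>(carry));
}

// Multiplies a by one 32-bit factor in place. The 64-bit value
// limb * factor + carry cannot overflow, so no "multiply hi" is needed.
void mul_small_in_place(Limbs32& a, std::uint32_t factor) {
    std::uint64_t carry = 0;
    for (std::uint32_t& limb : a) {
        const std::uint64_t wide = std::uint64_t{limb} * factor + carry;
        limb  = static_cast<std::uint32_t>(wide);
        carry = wide >> 32;
    }
    if (carry) a.push_back(static_cast<std::uint32_t>(carry));
}
```

The price is twice as many digits as a 64-bit representation, in exchange for code that relies only on standard unsigned arithmetic.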
As the other answer indicates, you can detect overflow in an unsigned addition by:
uint64_t sum = a + b;
uint64_t carry = sum < a;
As an aside, while in practice this will also work in signed arithmetic, you have two issues:
It's more complex
Technically, overflowing a signed integer is undefined behavior
so you're usually better off sticking to unsigned numbers.
You can figure out the carry by virtue of the fact that, if you overflow by adding two numbers, the result will always be less than either of those other two values.
In other words, if a + b is less than a, it overflowed. That's for positive values of a and b of course but that's what you'd almost certainly be using for a bignum library.
Unfortunately, a carry introduces an extra complication in that adding the largest possible value plus a carry of one will give you the same value you started with. Hence, you have to handle that as a special case.
Something like:
carry = 0
for i = 7 to 0:
    if a[i] > b[i]:
        small = b[i], large = a[i]
    else:
        small = a[i], large = b[i]
    if carry is 1 and large is maxvalue:
        c[i] = small
        carry = 1
    else:
        c[i] = large + small + carry
        if c[i] < large:
            carry = 1
        else:
            carry = 0
In reality, you may also want to consider not using all the bits in your array elements.
I've implemented libraries in the past, where the maximum "digit" is less than or equal to the square root of the highest value it can hold. So for 8-bit (octet) digits, you store values from 0 through 15 - that way, multiplying two digits and adding the maximum carry will always fit within an octet, making overflow detection moot, though at the cost of some storage.
Similarly, 16-bit digits would have the range 0 through 255 so that it won't overflow at 65536.
In fact, I've sometimes limited it to more than that, ensuring the artificial wrap value is a power of ten (so an octet would hold 0 through 9, 16-bit digits would be 0 through 99, 32-bit digits from 0 through 9999, and so on).
That's a bit more wasteful on space but makes conversion to and from text (such as printing your numbers) incredibly easy.
You can check for a carry with unsigned types by testing whether the result is less than an operand (any operand will do).
Just start the thing with carry 0.
If I understand you correctly, you want to write your own addition for your own big integer type.
You can do this with a simple function. No need to worry about the carry flag in the first run. Just go from right to left, add digit by digit and the carry flag (internally in that function), starting with a carry of 0, and set the result to (a+b+carry) %10 and the carry to (a+b+carry) / 10.
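The digit-by-digit rule from this answer, (a+b+carry) % 10 for the result and (a+b+carry) / 10 for the next carry, can be sketched directly on decimal strings. The function name add_decimal_strings is an assumption, and the inputs are assumed to contain only the characters '0' to '9'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Adds two non-negative decimal numbers given as digit strings.
std::string add_decimal_strings(const std::string& a, const std::string& b) {
    std::string result;                       // built least significant digit first
    int carry = 0;
    std::size_t i = a.size(), j = b.size();
    while (i > 0 || j > 0 || carry != 0) {    // go from right to left
        int sum = carry;
        if (i > 0) sum += a[--i] - '0';
        if (j > 0) sum += b[--j] - '0';
        result.push_back(static_cast<char>('0' + sum % 10));
        carry = sum / 10;
    }
    if (result.empty()) result = "0";
    std::reverse(result.begin(), result.end());
    return result;
}
```

This is the slowest of the representations discussed in this thread, but it is portable and needs no overflow detection at all.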
this SO could be relevant:
how to implement big int in c

Best way to get individual digits from int for radix sort in C/C++

What is the best way to get individual digits from an int with n number of digits for use in a radix sort algorithm? I'm wondering if there is a particularly good way to do it in C/C++, if not what is the general best solution?
edit: just to clarify, i was looking for a solution other than converting it to a string and treating it like an array of digits.
Use digits of size 2^k. To extract the nth digit:
#define BASE (1<<k)
#define MASK (BASE-1)
inline unsigned get_digit(unsigned word, int n) {
return (word >> (n*k)) & MASK;
}
Using the shift and mask (enabled by base being a power of 2) avoids expensive integer-divide instructions.
After that, choosing the best base is an experimental question (time/space tradeoff for your particular hardware). Probably k==3 (base 8) works well and limits the number of buckets, but k==4 (base 16) looks more attractive because it divides the word size. However, there is really nothing wrong with a base that does not divide the word size, and you might find that base 32 or base 64 perform better. It's an experimental question and may likely differ by hardware, according to how the cache behaves and how many elements there are in your array.
Final note: if you are sorting signed integers life is a much bigger pain, because you want to treat the most significant bit as signed. I recommend treating everything as unsigned, and then if you really need signed, in the last step of your radix sort you will swap the buckets, so that buckets with a most significant 1 come before a most significant 0. This problem is definitely easier if k divides the word size.
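To show where the shift-and-mask digit extraction fits, here is a sketch of a least-significant-digit radix sort for unsigned 32-bit keys. The choice k = 8 and the names kBits, kBuckets, kMask and radix_sort are assumptions made for the example; as the answer says, the best k should be found by measurement. Signed keys are not handled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kBits = 8;                // k: bits per digit
constexpr unsigned kBuckets = 1u << kBits;   // base 2^k
constexpr std::uint32_t kMask = kBuckets - 1;

inline unsigned get_digit(std::uint32_t word, unsigned n) {
    return (word >> (n * kBits)) & kMask;
}

// Sorts keys in ascending order, one k-bit digit per pass.
void radix_sort(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (unsigned n = 0; n * kBits < 32; ++n) {
        std::array<std::size_t, kBuckets + 1> start{};  // zero-initialized counts
        for (std::uint32_t key : keys) ++start[get_digit(key, n) + 1];
        // Prefix sums: start[b] becomes the first output slot of bucket b.
        for (unsigned b = 0; b < kBuckets; ++b) start[b + 1] += start[b];
        // Stable scatter into the scratch buffer.
        for (std::uint32_t key : keys) scratch[start[get_digit(key, n)]++] = key;
        keys.swap(scratch);
    }
}
```

Because k divides the word size here, every pass extracts a full digit and the number of passes is exactly 32 / k.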
Don't use base 10, use base 16.
for (int i = 0; i < 8; i++) {
printf("%d\n", (n >> (i*4)) & 0xf);
}
Since integers are stored internally in binary, this will be more efficient than dividing by 10 to determine decimal digits.

How to implement big int in C++

I'd like to implement a big int class in C++ as a programming exercise—a class that can handle numbers bigger than a long int. I know that there are several open source implementations out there already, but I'd like to write my own. I'm trying to get a feel for what the right approach is.
I understand that the general strategy is get the number as a string, and then break it up into smaller numbers (single digits for example), and place them in an array. At this point it should be relatively simple to implement the various comparison operators. My main concern is how I would implement things like addition and multiplication.
I'm looking for a general approach and advice as opposed to actual working code.
A fun challenge. :)
I assume that you want integers of arbitrary length. I suggest the following approach:
Consider the binary nature of the datatype "int". Think about using simple binary operations to emulate what the circuits in your CPU do when they add things. In case you are interested more in-depth, consider reading this wikipedia article on half-adders and full-adders. You'll be doing something similar to that. You can go down to as low a level as that, but being lazy, I thought I'd just forgo that and find an even simpler solution.
But before going into any algorithmic details about adding, subtracting, multiplying, let's find some data structure. A simple way, is of course, to store things in a std::vector.
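#include <algorithm>  // std::max
#include <cstddef>    // std::size_t
#include <vector>
template< class BaseType >
class BigInt
{
    typedef BaseType BT;
public:
    BigInt& operator += (BigInt const& operand);
protected:
    std::vector< BaseType > value_;
};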
You might want to consider if you want to make the vector of a fixed size and if to preallocate it. Reason being that for diverse operations, you will have to go through each element of the vector - O(n). You might want to know offhand how complex an operation is going to be and a fixed n does just that.
But now to some algorithms on operating on the numbers. You could do it on a logic-level, but we'll use that magic CPU power to calculate results. But what we'll take over from the logic-illustration of Half- and FullAdders is the way it deals with carries. As an example, consider how you'd implement the += operator. For each number in BigInt<>::value_, you'd add those and see if the result produces some form of carry. We won't be doing it bit-wise, but rely on the nature of our BaseType (be it long or int or short or whatever): it overflows.
Surely, if you add two numbers, the result must be greater than the greater one of those numbers, right? If it's not, then the result overflowed.
template< class BaseType >
BigInt< BaseType >& BigInt< BaseType >::operator += (BigInt< BaseType > const& operand)
{
    BT carry = 0;
    std::size_t const n = std::max(value_.size(), operand.value_.size());
    value_.resize(n, 0);
    for (std::size_t count = 0; count < n; count++)
    {
        BT op0 = value_[count],
           op1 = count < operand.value_.size() ? operand.value_[count] : 0;
        BT digits_result = op0 + op1;            // wraps around on overflow
        BT carry_new = digits_result < op0;      // smaller than an operand: it overflowed
        digits_result += carry;
        carry_new |= digits_result < carry;      // adding the old carry overflowed
        value_[count] = digits_result;
        carry = carry_new;
    }
    if (carry) value_.push_back(carry);          // the number grew by one digit
    return *this;
}
// NOTE 1: This relies on wrap-around, so BaseType must be an unsigned type.
//         Alternatively, restrict BaseType to be the second biggest type
//         available, i.e. a 32-bit int when you have a 64-bit long. Then use
//         a temporary or a cast to the mightier type and retrieve the upper bits.
//         Or you do it bitwise. ;-)
The other arithmetic operations work analogously. Heck, you could even use the STL functors std::plus, std::minus, std::multiplies and std::divides, ..., but mind the carry. :) You can also implement multiplication and division by using your plus and minus operators, but that's very slow, because that would recalculate results you already calculated in prior calls to plus and minus in each iteration. There are a lot of good algorithms out there for this simple task, use wikipedia or the web.
And of course, you should implement standard operators such as operator<< (just shift each value in value_ to the left for n bits, starting at the value_.size()-1... oh and remember the carry :), operator< - you can even optimize a little here, checking the rough number of digits with size() first. And so on. Then make your class useful, by befriending std::ostream operator<<.
Hope this approach is helpful!
Things to consider for a big int class:
Mathematical operators: +, -, /, *, %. Don't forget that your class may be on either side of the operator, that the operators can be chained, and that one of the operands could be an int, float, double, etc.
I/O operators: >>, <<. This is where you figure out how to properly create your class from user input, and how to format it for output as well.
Conversions/Casts: Figure out what types/classes your big int class should be convertible to, and how to properly handle the conversion. A quick list would include double and float, and may include int (with proper bounds checking) and complex (assuming it can handle the range).
There's a complete section on this: [The Art of Computer Programming, vol.2: Seminumerical Algorithms, section 4.3 Multiple Precision Arithmetic, pp. 265-318 (ed.3)]. You may find other interesting material in Chapter 4, Arithmetic.
If you really don't want to look at another implementation, have you considered what it is you are out to learn? There are innumerable mistakes to be made and uncovering those is instructive and also dangerous. There are also challenges in identifying important computational economies and having appropriate storage structures for avoiding serious performance problems.
A Challenge Question for you: How do you intend to test your implementation and how do you propose to demonstrate that its arithmetic is correct?
You might want another implementation to test against (without looking at how it does it), but it will take more than that to be able to generalize without expecting an excruciating level of testing. Don't forget to consider failure modes (out of memory problems, out of stack, running too long, etc.).
Have fun!
addition would probably have to be done in the standard linear time algorithm
but for multiplication you could try http://en.wikipedia.org/wiki/Karatsuba_algorithm
Once you have the digits of the number in an array, you can do addition and multiplication exactly as you would do them longhand.
Don't forget that you don't need to restrict yourself to 0-9 as digits, i.e. use bytes as digits (0-255) and you can still do long hand arithmetic the same as you would for decimal digits. You could even use an array of long.
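Longhand multiplication with larger digits might be sketched as below, here with 32-bit digits and a 64-bit intermediate so that a digit product plus carries always fits. Limbs32 and multiply are illustrative names, and the numbers are assumed to be non-negative and stored least significant digit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Limbs32 = std::vector<std::uint32_t>;  // little-endian digits in base 2^32

// Longhand multiplication: every digit of a times every digit of b.
Limbs32 multiply(const Limbs32& a, const Limbs32& b) {
    if (a.empty() || b.empty()) return {};
    Limbs32 product(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so this never overflows 64 bits.
            const std::uint64_t wide =
                std::uint64_t{a[i]} * b[j] + product[i + j] + carry;
            product[i + j] = static_cast<std::uint32_t>(wide);
            carry = wide >> 32;
        }
        product[i + b.size()] = static_cast<std::uint32_t>(carry);
    }
    while (product.size() > 1 && product.back() == 0) product.pop_back();  // trim leading zeroes
    return product;
}
```

This is the quadratic schoolbook method; algorithms such as Karatsuba only start to pay off for fairly long operands.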
I'm not convinced using a string is the right way to go -- though I've never written code myself, I think that using an array of a base numeric type might be a better solution. The idea is that you'd simply extend what you've already got the same way the CPU extends a single bit into an integer.
For example, if you have a structure
typedef struct {
int high, low;
} BiggerInt;
You can then manually perform native operations on each of the "digits" (high and low, in this case), being mindful of overflow conditions:
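/* Needs <limits.h> for INT_MAX. Assumes high and low are non-negative,
   i.e. each is one "digit" in base INT_MAX + 1. */
BiggerInt add( const BiggerInt *lhs, const BiggerInt *rhs ) {
    BiggerInt ret;
    /* Ideally, you'd want a better way to check for overflow conditions */
    if ( rhs->high > INT_MAX - lhs->high ) {
        /* The high digits overflow. With a variable-length (a real) BigInt, you'd
           allocate some more room here; this fixed-size example cannot represent
           the result, and the signed addition below is undefined in that case. */
    }
    ret.high = lhs->high + rhs->high;
    if ( rhs->low <= INT_MAX - lhs->low ) {
        /* No overflow */
        ret.low = lhs->low + rhs->low;
    }
    else {
        /* Overflow: carry one into the high digit, keep the sum modulo INT_MAX + 1 */
        ret.high += 1;
        ret.low = lhs->low - ( INT_MAX - rhs->low ) - 1;
    }
    return ret;
}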
It's a bit of a simplistic example, but it should be fairly obvious how to extend to a structure that had a variable number of whatever base numeric class you're using.
Use the algorithms you learned in 1st through 4th grade.
Start with the ones column, then the tens, and so forth.
Like others said, do it the old-fashioned long-hand way, but stay away from doing this all in base 10. I'd suggest doing it all in base 65536, and storing things in an array of longs.
If your target architecture supports BCD (binary coded decimal) representation of numbers, you can get some hardware support for the longhand multiplication/addition that you need to do. Getting the compiler to emit BCD instruction is something you'll have to read up on...
The Motorola 68K series chips had this. Not that I'm bitter or anything.
My start would be to have an arbitrary sized array of integers, using 31 bits and the 32nd as overflow.
The starter op would be ADD, and then, MAKE-NEGATIVE, using 2's complement. After that, subtraction flows trivially, and once you have add/sub, everything else is doable.
There are probably more sophisticated approaches. But this would be the naive approach from digital logic.
Could try implementing something like this:
http://www.docjar.org/html/api/java/math/BigInteger.java.html
You'd only need 4 bits for a single digit 0 - 9
So an Int Value would allow up to 8 digits each. I decided i'd stick with an array of chars so i use double the memory but for me it's only being used 1 time.
Also when storing all the digits in a single int it over-complicates it and if anything it may even slow it down.
I don't have any speed tests but looking at the java version of BigInteger it seems like it's doing an awful lot of work.
For me I do the below
//Number = 100,000.00, Number Digits = 32, Decimal Digits = 2.
BigDecimal *decimal = new BigDecimal("100000.00", 32, 2);
*decimal += "1000.99";
cout << decimal->GetValue(0x1 | 0x2) << endl; //Format and show decimals.
//Prints: 101,000.99
The computer hardware provides facility of storing integers and doing basic arithmetic over them; generally this is limited to integers in a range (e.g. up to 2^{64}-1). But larger integers can be supported via programs; below is one such method.
Using Positional Numeral System (e.g. the popular base-10 numeral system), any arbitrarily large integer can be represented as a sequence of digits in base B. So, such integers can be stored as an array of 32-bit integers, where each array-element is a digit in base B=2^{32}.
We already know how to represent integers using numeral-system with base B=10, and also how to perform basic arithmetic (add, subtract, multiply, divide etc) within this system. The algorithms for doing these operations are sometimes known as Schoolbook algorithms. We can apply (with some adjustments) these Schoolbook algorithms to any base B, and so can implement the same operations for our large integers in base B.
To apply these algorithms for any base B, we will need to understand them further and handle concerns like:
what is the range of various intermediate values produced during these algorithms.
what is the maximum carry produced by the iterative addition and multiplication.
how to estimate the next quotient-digit in long-division.
(Of course, there can be alternate algorithms for doing these operations).
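Full long division needs the quotient-digit estimation mentioned above, but the special case of dividing by a single digit is much simpler and is already enough to print a base 2^32 number in decimal. The sketch below assumes little-endian 32-bit digits; Limbs32, div_small_in_place and to_decimal are names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Limbs32 = std::vector<std::uint32_t>;  // little-endian digits in base 2^32

// Divides `value` in place by a non-zero 32-bit divisor and returns the remainder.
std::uint32_t div_small_in_place(Limbs32& value, std::uint32_t divisor) {
    assert(divisor != 0);
    std::uint64_t remainder = 0;
    for (std::size_t i = value.size(); i-- > 0; ) {   // most significant digit first
        const std::uint64_t current = (remainder << 32) | value[i];
        // remainder < divisor, so the quotient digit always fits in 32 bits.
        value[i]  = static_cast<std::uint32_t>(current / divisor);
        remainder = current % divisor;
    }
    while (!value.empty() && value.back() == 0) value.pop_back();
    return static_cast<std::uint32_t>(remainder);
}

// Converts to a decimal string by repeatedly dividing by 10.
std::string to_decimal(Limbs32 value) {
    std::string text;
    while (!value.empty()) {
        text.push_back(static_cast<char>('0' + div_small_in_place(value, 10)));
    }
    if (text.empty()) return "0";
    std::reverse(text.begin(), text.end());           // digits were produced lowest first
    return text;
}
```

Dividing by 10^9 instead of 10 would produce nine decimal digits per pass and cut the number of passes accordingly.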
Some algorithm/implementation details can be found here (initial chapters), here (written by me) and here.
Subtract 48 (the ASCII code of '0') from each character of your number string to get the numeric value of each digit of the large number.
Then perform the basic mathematical operations on those digits.
Otherwise I will provide a complete solution.