C++ bitwise addition: calculating the final number of representative bits

I am currently developing a utility that handles all arithmetic operations on bitsets.
The bitset can auto-resize to fit any number, so it can perform addition, subtraction, division, multiplication and modulo on very big bitsets (I have gone as far as loading a 700 MB movie into one and treating it as a primitive integer).
I'm facing one problem, though: after an addition I need to resize my bitset to fit exactly the number of bits needed, but I couldn't come up with an absolute rule to know exactly how many bits would be needed to store everything, knowing only the number of bits that both numbers are handling (whether the representation is positive or negative, it doesn't matter).
I can share the whole code with you to point out the problem if my question is not clear enough.
Thanks in advance.
jav974

but I couldn't come up with an absolute rule to know exactly how many bits would be needed to store everything, knowing only the number of bits that both numbers are handling (whether the representation is positive or negative, it doesn't matter)
Nor will you: there's no way to do it given "only the number of bits that both numbers are handling".
In the case of same-signed numbers, you may need one extra bit - you can start at the most significant bit of the smaller number, and scan for 0s that would absorb the impact of a carry. For example:
1010111011101 +
..10111010101
..^ start here
As both numbers have a 1 here you need to scan left until you hit a 0 (in which case the result has the same number of digits as the larger input), or until you reach the most significant bit of the larger number (in which case there's one more digit in the result).
1001111011101 +
..10111010101
..^ start here
In this case where the longer input has a 0 at the starting location, you first need to do a right-moving scan to establish whether there'll be a carry from the right of that starting position before launching into the left-moving scan above.
When signs differ:
if one value has 2 or more digits fewer than the other, then the number of digits required in the result will be either the same as, or one less than, the number of digits in the larger input
otherwise, you'll have to do more of the work for an addition just to work out how many digits the result needs.
This is assuming the sign bit is separate from the count of magnitude bits.
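As a sketch of the same-signed case, assuming the magnitudes are stored LSB-first in a std::vector<bool> (a hypothetical layout, not necessarily the asker's class), the right-moving and left-moving scans above amount to walking the carry chain:

    #include <cstddef>
    #include <vector>

    // Returns the exact bit count of |a| + |b| for two same-signed magnitudes.
    std::size_t result_bits_same_sign(const std::vector<bool>& a,
                                      const std::vector<bool>& b)
    {
        const std::vector<bool>& big   = a.size() >= b.size() ? a : b;
        const std::vector<bool>& small = a.size() >= b.size() ? b : a;

        // Right-moving part: does a carry come out of the smaller number's span?
        // A carry is generated where both bits are 1 and absorbed where both are 0.
        bool carry = false;
        for (std::size_t i = 0; i < small.size(); ++i)
            carry = (big[i] && small[i]) || ((big[i] || small[i]) && carry);

        // Left-moving part: a 1 in the larger number propagates the carry,
        // a 0 absorbs it.
        for (std::size_t i = small.size(); carry && i < big.size(); ++i)
            carry = big[i];

        // One extra digit only if the carry survives past the larger number's MSB.
        return big.size() + (carry ? 1 : 0);
    }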

Finally, the number of representative bits after an addition is at most the number of bits of the wider operand, plus 1.
Here is an explanation, using an unsigned char:
For max unsigned char :
11111111 (255)
+ 11111111 (255)
= 111111110 (510)
Naturally, if max + max needs (bits of max) + 1 bits, then for any x and y between 0 and max the result needs at most (bits of max) + 1 bits (the very maximum).
This works the same way with signed integers.
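As a sketch, the safe pre-allocation this bound gives you (worst_case_sum_bits is a hypothetical helper, not the asker's API):

    #include <algorithm>
    #include <cstddef>

    // Upper bound from the answer above: the sum of two magnitudes needs at
    // most one bit more than the wider operand. Resize to this first, then
    // trim leading zero bits to get the exact count.
    std::size_t worst_case_sum_bits(std::size_t bits_a, std::size_t bits_b)
    {
        return std::max(bits_a, bits_b) + 1;  // e.g. 8 and 8 -> 9 (255 + 255 = 510)
    }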


How do I subtract large numbers with an equal number of digits?

I have subtracted large numbers whose digit counts are unequal, but I can't subtract numbers which are equal in length. I take two strings as input from the user, which are numbers, and convert them into integer arrays using str[i]-'0'. So far I have swapped the values so that the smaller-length number is always subtracted from the bigger-length one. I have to do subtraction for 50-digit numbers, and I can't use the atoi function. I can do subtraction of unequal-length strings, but in the case of equal-length numbers I am unable to do it. What I have done is convert the strings to integer arrays, and then I do the subtraction using the logic in sub_logic.
Here is my logic for subtraction of equal-digit numbers.
Semi-answer, because I can't think of a good reason to debug the asker's algorithm when a much simpler approach is viable.
This is your great opportunity to act like a child: do the subtraction the way you learned it in grade school.
Leave the numbers as strings.¹
Make them both the same size by prepending zeros to the shortest.
If the number being subtracted (the subtrahend) is the larger number, reverse the two numbers so you are always subtracting the smaller number from the larger. Make a note that you reversed the order of the operands.
Working right to left, subtract the digits and track any borrowing from the larger digits as required.
If you reversed the operand order, mark the result as negative.
¹ You do not have to parse the characters into numbers, because no sane character encoding scrambles the ordering or positioning of the digits; the C++ standard [lex.charset] requires '0' through '9' to be contiguous and ascending.
However, tracking borrowing may force you into wider storage, as you may find yourself with a digit value as high as 18, which the C++ standard does not guarantee a character can store. Overshooting what you can store in a digit and counting on another character being there will not work if the digits sit at the end of the encoding. This is not a problem with any character encoding I know of, but it is not guaranteed.
You can most likely (assuming ASCII here) get away with
    if (a[index] < b[index])
    {
        a[index - 1]--; // a > b as per step 3 above, so this can't happen at the leading digit
        a[index] += 10;
    }
    result[index] = '0' + a[index] - b[index];
for step 4. I believe this to be a fair assumption for a school assignment, but in production code I'd be more careful and make sure a[index] += 10; can't overflow a char.
The borrowed digits will wind up sitting on top of the characters just past '9' (':' through 'B' in ASCII), and no one will care in terms of the math. It's destructive, though: a is damaged as a result.
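Putting the five steps together, here is a minimal, non-destructive sketch (assuming ASCII and plain unsigned decimal inputs; subtract is a hypothetical name, and it works on copies so the caller's strings are not damaged):

    #include <algorithm>
    #include <string>

    std::string subtract(std::string a, std::string b)
    {
        // Step 1: pad the shorter operand with leading zeros.
        if (a.size() < b.size()) a.insert(0, b.size() - a.size(), '0');
        else                     b.insert(0, a.size() - b.size(), '0');

        // Step 2: if the subtrahend is larger, swap and remember the sign.
        bool negative = a < b;   // same length, so lexicographic == numeric
        if (negative) std::swap(a, b);

        // Step 3: subtract right to left, borrowing as required.
        std::string result(a.size(), '0');
        for (std::size_t i = a.size(); i-- > 0; )
        {
            if (a[i] < b[i])
            {
                --a[i - 1];      // borrow; safe: a >= b, so the leading digit never borrows
                a[i] += 10;
            }
            result[i] = '0' + (a[i] - b[i]);
        }

        // Trim leading zeros, keeping at least one digit.
        std::size_t first = result.find_first_not_of('0');
        result = (first == std::string::npos) ? "0" : result.substr(first);

        return negative ? "-" + result : result;
    }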

How can 8 bytes hold 302 decimal digits? (Euler challenge 16)

C++ pow(2,1000) is normally too big for a double, but it's working. Why?
So I've been learning C++ for a couple of weeks, but the datatypes still confuse me.
One small thing first: the code that 0xbadc0de posted in the other thread is not working for me.
First of all, pow(2,1000) gives me this: more than one instance of overloaded function "pow" matches the argument list.
I fixed it by changing pow(2,1000) to pow(2.0,1000).
Seems fine; I run it and get this:
http://i.stack.imgur.com/bbRat.png
Instead of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
it is missing a lot of the digits. What might cause that?
But now for the real problem.
I'm wondering how a number 302 digits long can fit in a double (8 bytes)?
0xFFFFFFFFFFFFFFFF = 18446744073709551615, so how can the number be larger than that?
I think it has something to do with the floating point number encoding.
Also, what is the largest number that can possibly be stored in 8 bytes, if it's not 0xFFFFFFFFFFFFFFFF?
Eight bytes contain 64 bits of information, so you can store 2^64 ≈ 1.8 * 10^19 unique items using those bits. Those items can easily be interpreted as the integers from 0 to 2^64 - 1. So you cannot store 302 decimal digits in 8 bytes; most numbers between 0 and 10^302 - 1 cannot be so represented.
Floating point numbers can hold approximations to numbers with 302 decimal digits; this is because they store the mantissa and the exponent separately. Numbers in this representation store a certain number of significant digits (15-16 for doubles, if I recall correctly) and an exponent (which can go into the hundreds, if memory serves). However, if a double is X bytes long, then it can only distinguish between 2^(8X) different values, which is nowhere near enough to exactly represent all integers with 302 decimal digits.
To represent such numbers exactly, you must use many more bits: 302 digits at log2(10) ≈ 3.32 bits per digit is about 1000 bits, or 125 bytes.
It's called 'floating point' for a reason. The datatype contains a number in the usual sense, plus an exponent which says where the point belongs. That's why pow(2.0, 1000) works, and it's why you see a lot of zeroes. A floating point number (a double is just a bigger float) contains a fixed number of digits of precision; all the remaining digits end up being zero. Try pow(2.0, -1000) and you'll see the same situation in reverse.
The number of decimal digits of precision in a float (32 bits) is about 7, and for a double (64 bits) it's about 16 decimal digits.
Most systems nowadays use IEEE floating point, and I just linked to a really good description of it. Also, the article on the specific standard IEEE 754-1985 gives a detailed description of the bit layouts of various sizes of floating point number.
Mathematically, 2.0^1000 has an exact decimal (non-floating) value. IEEE floating point numbers, in your case doubles (as the pow function takes doubles and outputs a double), have 52 of the 64 bits allocated to the mantissa, plus one implicit leading bit, for 53 bits of significance. If you do the math, 2^53 = 9,007,199,254,740,992. Notice there are 16 digits; that is why there are about 16 quality (meaningful) digits in the output you see.
Essentially, the pow function performs the exponentiation, but once the result moves past ~2^53 it begins losing precision. Ultimately it holds precision for the top ~16 decimal digits, but all digits to the right of those are not guaranteed.
Thus it is a floating point precision / rounding problem.
If you were strictly in unsigned integer land, the computation would wrap around (overflow) past 2^64 - 1 = 18,446,744,073,709,551,615. Overflowing means you would never actually see the number go any higher than that; in fact, the answer from this operation would be 0: once the running product reaches 2^64 it is congruent to 0 modulo 2^64, and every doubling after that stays 0.
The exact answer (as you show) can be obtained on an ordinary computer using a multi-precision library. These emulate a machine with larger words by concatenating several of the smaller data types and using algorithms to convert and print on the fly. Mathematica is one example of a math engine that implements an arbitrary-precision math calculation library.
Floating point types can cover a much larger range than integer types of the same size, but with less precision.
They represent a number as:
a sign bit s to indicate positive or negative;
a mantissa m, a value between 1 and 2, giving a certain number of bits of precision;
an exponent e to indicate the scale of the number.
The value itself is calculated as m * pow(2,e), negated if the sign bit is set.
A standard double has a 53-bit mantissa, which gives about 16 decimal digits of precision.
So, if you need to represent an integer with more than (say) 64 bits of precision, then neither a 64-bit integer nor a 64-bit floating-point type will work. You will need either a large integer type, with as many bits as necessary to represent the values you're using, or (depending on the problem you're solving) some other representation such as a prime factorisation. No such type is available in standard C++, so you'll need to make your own.
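A quick way to see this decomposition in action is std::frexp, which splits a double into a mantissa and a power-of-two exponent; note that frexp normalizes the mantissa into [0.5, 1) rather than the [1, 2) convention used above:

    #include <cmath>
    #include <cstdio>

    int main()
    {
        int e = 0;
        double m = std::frexp(std::pow(2.0, 1000.0), &e);
        // value == m * 2^e exactly; prints m = 0.5, e = 1001
        std::printf("m = %.17g, e = %d\n", m, e);
        return 0;
    }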
If you want to calculate the range of values that n bits can hold as a signed integer, it is -(2^(n-1)) to (2^(n-1) - 1).
This is because the leftmost bit of the variable represents the sign (+ or -).
So the negative side of the range reaches -(2^(n-1)),
and the positive side reaches (2^(n-1) - 1).
There is a -1 on the positive side because of 0 (to avoid counting 0 on both sides).
For example, for 64 bits the range is approximately [-9.223372e+18] to [+9.223372e+18].
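The exact endpoints can be read off std::numeric_limits rather than computed by hand:

    #include <cstdint>
    #include <iostream>
    #include <limits>

    int main()
    {
        // -9223372036854775808 and 9223372036854775807 for a 64-bit signed type
        std::cout << std::numeric_limits<std::int64_t>::min() << '\n'
                  << std::numeric_limits<std::int64_t>::max() << '\n';
        return 0;
    }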

Conversion Big Integer <-> double in C++

I am writing my own long-arithmetic library in C++ for fun, and it is already pretty complete; I have even implemented several cryptographic algorithms with it. But one important thing is still missing: I want to convert doubles (and floats/long doubles) into my numbers and vice versa. My numbers are represented as a variable-sized array of unsigned long ints plus a sign bit.
I tried to find the answer with Google, but the problem is that people rarely implement such things themselves, so I only find material about how to use Java's BigInteger and the like.
Conceptually, it is rather easy: I take the mantissa, shift it by the number of bits dictated by the exponent, and set the sign. In the other direction I truncate the number so that it fits into the mantissa and set the exponent using my log2 function.
But I am having a hard time figuring out the details. I could play around with some bit patterns and cast them to a double, but I haven't found an elegant way to do that; or I could "calculate" the result by starting with 2, exponentiating, multiplying, etc., but that doesn't seem very efficient.
I would appreciate a solution that doesn't use any library calls, because I am trying to avoid libraries in this project (otherwise I could just have used GMP). Furthermore, on several other occasions I keep two implementations, one using inline assembler (efficient) and one that is more platform-independent, so either kind of answer is useful for me.
edit: I use uint64_t for my parts, but I would like to be able to change that depending on the machine; I am willing to write different implementations with some #ifdefs to achieve that.
I'm going to make a non-portable assumption here: namely, that unsigned long long has more digits of accuracy than double. (This is true on all modern desktop systems that I know of.)
First, convert the most significant integer(s) into an unsigned long long. Then convert that to a double S. Let M be the number of integers less significant than those used in the first step. Multiply S by (1ull << (sizeof(unsigned)*CHAR_BIT*M)). (If you would be shifting by more than 63 bits, you will have to split this into separate shifts and do some arithmetic.) Finally, if the original number was negative, multiply the result by -1.
This rounds a lot, but even with this rounding, due to the above assumption, no digits are lost that wouldn't be lost anyway in the conversion to a double. I think this is a similar process to what Mark Ransom described, but I'm not certain.
For converting from a double to a big integer, first separate the mantissa into a double M and the exponent into an int E, using frexp. Multiply M by 2^(sizeof(unsigned)*CHAR_BIT) (one more than UINT_MAX) and store the result in an unsigned R. If std::numeric_limits<double>::radix is 2 (it is on x86/x64 with IEEE doubles), you can simply shift R left by E - sizeof(unsigned)*CHAR_BIT bits and you're done. Otherwise the result will instead be R*(E**(sizeof(unsigned)*CHAR_BIT)) (where ** means "to the power of").
If performance is a concern, you can add an overload to your bignum class for multiplying by std::integral_constant<unsigned, 10>, which simply returns (LHS<<3)+(LHS<<1). You can similarly optimize for other constants if you wish.
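Spelled out, the shift identity behind that overload (times_ten is a hypothetical helper):

    #include <cstdint>

    // 10*x == 8*x + 2*x, so a multiply by ten is two shifts and an add.
    constexpr std::uint64_t times_ten(std::uint64_t x)
    {
        return (x << 3) + (x << 1);
    }
    static_assert(times_ten(123) == 1230, "10*x == (x<<3) + (x<<1)");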
This blog post might help you: Clarifying and optimizing Integer>>asFloat.
Otherwise, you can get an idea of an algorithm from this SO question: Converting from unsigned long long to float with round to nearest even.
You don't say explicitly, but I assume your library is integer only and the unsigned longs are 32 bit and binary (not decimal). The conversion to double is simple, so I'll tackle that first.
Start with a multiplier; if the number is positive it will be 1.0, if negative -1.0. For each of the unsigned long ints in your bignum, from least significant to most significant, multiply it by the current multiplier and add the product to the result, then multiply your multiplier by pow(2.0, 32) (4294967296.0) for 32-bit parts or pow(2.0, 64) (18446744073709551616.0) for 64-bit parts.
You can optimize this process by working with only the 2 most significant values. You need 2 even when the number of bits in your integer type is larger than the precision of a double, since the number of used bits in the most significant value might be as low as 1. You can generate the multiplier by raising 2 to the number of skipped bits, e.g. pow(2.0, most_significant_count*sizeof(bit_array[0])*8). You can't use a bit shift, as suggested in another answer, because it will overflow after the first value.
To convert from double, you can get the exponent and mantissa separated from each other with the frexp function. The mantissa will come as a floating point value between 0.5 and 1.0 so you'll want to multiply it by pow(2.0, 32) or pow(2.0, 64) to convert it to an integer, then adjust the exponent by -32 or -64 to compensate.
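As a sketch of that loop, assuming a hypothetical layout of LSB-first 32-bit words plus a sign flag (bignum_to_double is a hypothetical name):

    #include <cstdint>
    #include <vector>

    double bignum_to_double(const std::vector<std::uint32_t>& words, bool negative)
    {
        double result = 0.0;
        double multiplier = negative ? -1.0 : 1.0;
        for (std::uint32_t w : words)        // least significant word first
        {
            result += multiplier * static_cast<double>(w);
            multiplier *= 4294967296.0;      // pow(2.0, 32): move to the next word
        }
        return result;
    }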
To go from a big integer to a double, just do it the same way you parse numbers. For example, you parse the number "531" as "1 + (3 * 10) + (5 * 100)". Compute each portion using doubles, starting with the least significant portion.
To go from a double to a big integer, do it the same way but in reverse starting with the most significant portion. So, to convert 531, you first see that it's more than 100 but less than 1000. You find the first digit by dividing by 100. Then you subtract to get the remainder of 31. Then find the next digit by dividing by 10. And so on.
Of course, you won't be using tens (unless you store your big integers as decimal digits). Exactly how you break the number apart depends on how your big integer class is constructed. For example, if it uses 64-bit units, then you'll use powers of 2^64 instead of powers of 10.
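And a sketch of the reverse direction for a finite, non-negative, integer-valued double; this variant peels the least significant base-2^64 "digit" first with fmod, which yields the same words as the most-significant-first walk described above (double_to_bignum is a hypothetical name):

    #include <cmath>
    #include <cstdint>
    #include <vector>

    std::vector<std::uint64_t> double_to_bignum(double x)
    {
        const double base = 18446744073709551616.0;   // 2^64
        std::vector<std::uint64_t> words;             // LSB-first 64-bit words
        x = std::floor(x);
        while (x > 0.0)
        {
            double digit = std::fmod(x, base);        // exact: fmod does not round
            words.push_back(static_cast<std::uint64_t>(digit));
            x = (x - digit) / base;                   // exact: the subtraction and the
        }                                             // division by 2^64 lose no bits
        return words;
    }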

Algorithm for dividing very large numbers

I need to write an algorithm (I cannot use any third-party library, because this is an assignment) to divide (integer division; fractional parts are not important) very large numbers, around 100-1000 digits. I found the Fourier division algorithm (http://en.wikipedia.org/wiki/Fourier_division), but I don't know whether it's the right way to go. Do you have any suggestions?
1) Check that the divisor is not greater than the dividend; otherwise the quotient is zero (because it is integer division).
2) Start from the left of the dividend.
3) Take a portion of digits from the dividend equal in length to the divisor.
4) If the divisor is still bigger than this portion, extend the portion by one more digit of the dividend.
5) Multiply the divisor by 1-9 in a loop.
6) When the product exceeds the dividend portion, the previous multiplier is the next digit of the answer.
7) Repeat steps 3 to 6 until you reach the end of the dividend.
I'd imagine that dividing the 'long' way, like in grade school, is a potential route. I'm assuming you receive the original number as a string, so you parse each digit. Example:
Step 0:
/-----------------
13 | 453453453435....
Step 1: "How many times does 13 go into 4?" 0
0
/-----------------
13 | 453453453435....
Step 2: "How many times does 13 go into 45?" 3
03
/-----------------
13 | 453453453435....
- 39
--
6
Step 3: "How many times does 13 go into 63?" 4
etc., etc. With this strategy, you can handle any number length and only really have to hold enough digits in memory for an int (the divisor) and the current portion of the dividend. (Assuming I got those terms right.) You append each digit to your result string as you produce it.
When you hit the point where no digits remain and the divisor won't go in one or more times, you return your result, which is already formatted as a string (because it could be larger than any int).
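As a sketch, the whole procedure for a divisor that fits in an unsigned takes only a few lines (divide_string_by_int is a hypothetical name; assumes a non-empty, non-negative decimal string and a non-zero divisor):

    #include <string>

    std::string divide_string_by_int(const std::string& number, unsigned divisor)
    {
        std::string quotient;
        unsigned long long remainder = 0;
        for (char c : number)
        {
            // Carry the running remainder left to right, one digit at a time.
            remainder = remainder * 10 + (c - '0');
            quotient += char('0' + remainder / divisor);
            remainder %= divisor;
        }
        // Strip leading zeros, keeping at least one digit.
        std::size_t first = quotient.find_first_not_of('0');
        return first == std::string::npos ? "0" : quotient.substr(first);
    }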
The easiest division algorithm to implement for large numbers is shift and subtract.
if numerator is less than denominator then finish
shift denominator as far left as possible while it is still smaller than numerator
set bit in quotient for amount shifted
subtract shifted denominator from numerator
repeat
the numerator is now the remainder
The shifting need not be literal. For example, you can write an algorithm to subtract a left shifted value from another value, instead of actually shifting the whole value left before subtracting. The same goes for comparison.
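Here is the same loop as a sketch on uint64_t (assuming C++20 for std::countl_zero); a bignum version is identical, with the shifts and comparisons done limb by limb:

    #include <bit>
    #include <cstdint>

    struct DivResult { std::uint64_t quotient, remainder; };

    DivResult shift_subtract_divide(std::uint64_t num, std::uint64_t den)
    {
        DivResult res{0, num};
        if (den == 0 || num < den) return res;   // quotient stays 0

        // Align the denominator's top set bit with the numerator's; this is
        // the largest shift for which den << shift cannot exceed 64 bits.
        int shift = std::countl_zero(den) - std::countl_zero(num);

        for (; shift >= 0; --shift)
        {
            if ((den << shift) <= res.remainder)
            {
                res.remainder -= den << shift;              // subtract shifted denominator
                res.quotient  |= std::uint64_t{1} << shift; // set the quotient bit
            }
        }
        return res;   // e.g. 13 / 4: quotient 3, remainder 1
    }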
Long division is difficult to implement because one of the steps in long division is long division. If the divisor is an int, then you can do long division fairly easily.
Knuth, Donald, The Art of Computer Programming, ISBN 0-201-89684-2, Volume 2: Seminumerical Algorithms, Section 4.3.1: The Classical Algorithms
You should probably try something like long division, but using computer words instead of digits.
In a high-level language, it will be most convenient to consider your "digit" to be half the size of your largest fixed-precision integer. For the long division method, you will need to handle the case where your partial intermediate result may be off by one, since your fixed-precision division can only handle the most-significant part of your arbitrary-precision divisor.
There are faster and more complicated means of doing arbitrary-precision arithmetic. Check out the appropriate wikipedia page. In particular, the Newton-Raphson method, when implemented carefully, can ensure that the time performance of your division is within a constant factor of your arbitrary-precision multiplication.
Unless part of your assignment is to be completely original, I would go with the algorithm I (and, I assume, you) was taught in grade school for doing long division by hand.

I don't get Golomb / Rice coding: it makes more bits than the input, doesn't it?

Or, maybe, what I don't get is unary coding:
In Golomb, or Rice, coding, you split a number N into two parts by dividing it by another number M, and then code the integer result of that division in unary and the remainder in binary.
In the Wikipedia example, they use 42 as N and 10 as M, so we end up with a quotient q of 4 (in unary: 11110) and a remainder r of 2 (in binary: 010), so that the resulting message is 11110,010, or 8 bits (the comma can be skipped). The simple binary representation of 42 is 101010, or 6 bits.
To me, this seems to be due to the unary representation of q, which always takes more bits than binary.
Clearly, I'm missing some important point here. What is it?
The important point is that Golomb codes are not meant to be shorter than the shortest binary encoding for one particular number. Rather, by providing a specific kind of variable-length encoding, they reduce the average length per encoded value compared to fixed-width encoding, if the encoded values are from a large range, but the most common values are generally small (and hence are using only a small fraction of that range most of the time).
As an example, if you were to transmit integers in the range from 0 to 1000, but a large majority of the actual values were in the range between 0 and 10, in a fixed-width encoding, most of the transmitted codes would have leading 0s that contain no information:
To cover all values between 0 and 1000, you need a 10-bit wide encoding in fixed-width binary. Now, as most of your values would be below 10, at least the first 6 bits of most numbers would be 0 and would carry little information.
To rectify this with Golomb codes, you split the numbers by dividing them by 10 and encoding the quotient and the remainder separately. For most values, all that would have to be transmitted is the remainder which can be encoded using 4 bits at most (if you use truncated binary for the remainder it can be less). The quotient is then transmitted in unary, which encodes as a single 0 bit for all values below 10, as 10 for 10..19, 110 for 20..29 etc.
Now, for most of your values, you have reduced the message size to 5 bits max, but you are still able to transmit all values unambiguously, without separators.
This comes at a rather high cost for larger values (for example, values in the range 990..999 need 100 bits for the quotient alone), which is why the coding is optimal for geometric distributions.
The long runs of 1 bits in the quotients of larger values can be addressed with subsequent run-length encoding. However, if the quotients consume too much space in the resulting message, this could indicate that other codes might be more appropriate than Golomb/Rice.
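A sketch of the M = 10 scheme above (golomb_encode is a hypothetical name; it uses plain ceil(log2(M))-bit binary for the remainder, so 42 encodes to 9 bits here rather than the 8 bits that truncated binary gives):

    #include <cstdint>
    #include <string>

    std::string golomb_encode(std::uint32_t n, std::uint32_t m)
    {
        std::uint32_t q = n / m, r = n % m;
        std::string out(q, '1');                 // quotient in unary: q ones...
        out += '0';                              // ...with a terminating 0
        int bits = 0;
        while ((1u << bits) < m) ++bits;         // ceil(log2(m)) remainder bits
        for (int b = bits - 1; b >= 0; --b)
            out += char('0' + ((r >> b) & 1));   // remainder in binary, MSB first
        return out;
    }
    // golomb_encode(42, 10) == "11110" + "0010" == "111100010"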
One difference between Golomb coding and plain binary code is that binary code is not a prefix code, which is a no-go for coding strings of arbitrarily large numbers (you cannot decide whether 1010101010101010 is a concatenation of 10101010 and 10101010 or something else). Hence, the two are not that easily comparable.
Second, the Golomb code is optimal for a geometric distribution, in this case with parameter 2^(-1/10). The probability of 42 is some 0.3%, so you get an idea of how much weight this value carries in the length of the output string.