Exotic base conversion without an intermediate value? - c++

I was just implementing an arbitrary base converter, and I had a moment of curiosity: is there a generalizable base conversion approach that works with any possible range of base inputs (excluding base 0), AND does not use an intermediate value in a more convenient base?
When writing a base conversion function, particularly one that goes from an arbitrary base to another arbitrary base, it can easily be implemented by first converting the number to decimal, and then re-converting that to the target base. You can also pretty easily bypass the intermediate-value step arithmetically, provided you're working with bases in the range [1,10].
However, it seems to become trickier when you expand the range. Consider the following examples (primes seem like they might narrow the possible approaches a bit?):
base 7 to base 33
base -3 to base 13
base -16 to base 16
I didn't see much conversation about this question here, or elsewhere on Google; most either had a narrower range of bases or used an intermediate value. Any ideas?

The generalized algorithm to perform a base conversion of a value V with n digits in base B to an equivalent value V' in base B' is as follows:
Let d(0), d(1), ..., d(n-1) be the digits of V in base B. Using a translation table, we convert these digits to a sequence d'(0), d'(1), ..., d'(n-1), where each d'(i) is the original digit d(i), but expressed in the new base B'.
Then, V' is defined by:
V' = d'(0)*B^0 + d'(1)*B^1 + d'(2)*B^2 + ... + d'(n-1)*B^(n-1)
Now here's the catch (and the reason you normally need an intermediate value): not only do all values in the above formula have to be expressed in base B', but all the operations (addition and multiplication) have to be performed using arithmetic in base B'.
For example: how to convert the number V = 201 expressed in base 3 to base 2 without converting it first to base 10.
The digits of V expressed in base 3 are d(0)=1, d(1)=0, d(2)=2
Converted to base 2, d'(0)=1, d'(1)=0, d'(2)=10
The original base is 3, but the general conversion formula needs it expressed in the target base (2), so we'll use the value B = 11.
Then:
V' = d'(0)*B^0 + d'(1)*B^1 + d'(2)*B^2 + ... + d'(n-1)*B^(n-1)
Plugging in the values and operating in base 2:
V' = 1 + 0 + 10*(11^2) = 1 + 10*11*11 = 1 + 10010 = 10011
So, 201 (base 3) = 10011 (base 2)
Proof:
201 (base 3) = 2*3^2 + 0 + 1 = 19
10011 (base 2) = 16 + 2 + 1 = 19
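
Here is that procedure as a minimal C++ sketch (the helper names mulAdd and convert are mine; positive bases >= 2 only, so the negative-base cases in the question would need the same idea with signed carries). Each stored digit stays below the target base B'; the carry arithmetic operates on single-digit values and stands in for the base-B' addition/multiplication tables a strictly intermediate-free implementation would use.

#include <cstddef>
#include <iostream>
#include <vector>

// Multiply the digit vector (least-significant first, each digit < outBase)
// by mul and then add add, carrying in base outBase.
static void mulAdd(std::vector<int>& d, int outBase, int mul, int add) {
    int carry = add;
    for (std::size_t i = 0; i < d.size(); ++i) {
        int v = d[i] * mul + carry;
        d[i] = v % outBase;
        carry = v / outBase;
    }
    while (carry) { d.push_back(carry % outBase); carry /= outBase; }
}

// Horner's rule performed entirely on base-outBase digit vectors:
// result = ((d(n-1)*B + d(n-2))*B + ...)*B + d(0), with B = inBase.
std::vector<int> convert(const std::vector<int>& msdFirst, int inBase, int outBase) {
    std::vector<int> out(1, 0);
    for (int d : msdFirst) mulAdd(out, outBase, inBase, d);
    return out;
}

int main() {
    std::vector<int> r = convert({2, 0, 1}, 3, 2);  // 201 in base 3
    for (auto it = r.rbegin(); it != r.rend(); ++it) std::cout << *it;
    std::cout << '\n';  // prints 10011, i.e. 19
}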

A brief word about why base 10 is the intermediate base: because it's familiar.
School has taught us base 10 whether we knew it or not, from kindergarten to wherever you are now. Our language centers around base 10, whether articulated, written, or compiled. Computer languages by design use base 10 for their high-level human interface, and for common parameters to run-time functions, regardless of the internal representation.
Because base 10 is in our everyday lives, we don't need to think as much about the sequence of base-10 artifacts (0 through 9), leaving our minds to plan other logic for the task at hand. When I designed this algorithm in 1982, I chose the 0-9A-Z artifact-list method for simplicity, as a type of exponent map, making it easier to leverage the language's math operations and string interface, which all center around base 10. (Hmmm, that base-10 thing again.)
So if you really want to make your head hurt by re-engineering a whole new user interface and over-complicating the whole process, go for it. You'll still find yourself using base-10 parameters. (Hmmm.)
The first effort involves a logarithm (and the interface to the math function is base 10 (hmmm)):
Largest_exponent_of_new_base = log(your base-10 number) / log(the base you are converting to)
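
In C++ that log step is a one-liner; a minimal sketch (the function name is mine, and values sitting exactly on a power of the base can land on either side due to floating-point log):

#include <cmath>

// Highest exponent of 'base' needed to represent 'value' (value > 0, base > 1).
// e.g. largestExponent(19, 2) == 4, since 19 in base 2 is 10011 (five digits).
int largestExponent(double value, int base) {
    return static_cast<int>(std::log(value) / std::log(base));
}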

Related

How is a 128-bit integer formed in the Abseil library?

In the Abseil library, absl::uint128 big = absl::MakeUint128(1, 0);
this represents 2^64, but I don't understand what the '1' and '0' mean here.
Can someone explain to me how the number is actually formed?
absl::MakeUint128(x, y) constructs a number equal to 2^64 * x + y.
See https://abseil.io/docs/cpp/guides/numeric
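
A short usage sketch; MakeUint128 and the Uint128High64/Uint128Low64 accessors come from Abseil's absl/numeric/int128.h:

#include <iostream>
#include "absl/numeric/int128.h"

int main() {
    absl::uint128 big = absl::MakeUint128(1, 0);  // 2^64 * 1 + 0 == 2^64
    absl::uint128 v   = absl::MakeUint128(3, 7);  // 2^64 * 3 + 7

    // Recover the two 64-bit "halves" (the two base-2^64 digits).
    std::cout << absl::Uint128High64(v) << "\n";  // 3
    std::cout << absl::Uint128Low64(v) << "\n";   // 7

    // It otherwise behaves like an ordinary unsigned integer type.
    std::cout << (big + v == absl::MakeUint128(4, 7)) << "\n";  // 1
}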
How? In any possible way, but there is a very simple way to do it.
You already know how to do arithmetic with one-digit numbers in base 10, right? Then you also know how to use that arithmetic to do arithmetic on two-digit numbers in base 10. Note that this also gives you arithmetic on one-digit numbers in base 100 (just consider '34' or '66' as single symbols).
Your computer knows how to do arithmetic on one-digit numbers in base 2^64, so it applies the same extension you use in base 10 to get arithmetic on two-digit numbers in base 2^64. That amounts to arithmetic in base 2^128, or equivalently arithmetic on 128-digit numbers in base 2.
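
Here is that "schoolbook" extension as a minimal C++ sketch (the struct and names are illustrative, not Abseil's internals):

#include <cstdint>

// A 128-bit value as two base-2^64 "digits", added with a manual carry,
// exactly like column addition in grade school.
struct U128 { uint64_t hi, lo; };

U128 add(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;              // add the low digits (wraps mod 2^64)
    uint64_t carry = (r.lo < a.lo);  // wrap-around means we carry a 1
    r.hi = a.hi + b.hi + carry;      // add the high digits plus the carry
    return r;
}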

Why use a higher base for implementing BigInt?

I'm trying to implement BigInt and have read some threads and articles about it; most of them suggested using higher bases (256, 2^32, or even 2^64).
Why are higher bases good for this purpose?
The other question I have is how I'm supposed to convert a string into a higher base (>16). I read there is no standard way, except for base64. And the last question: how do I use those higher bases? Some examples would be great.
The CPU cycles spent multiplying or adding a number that fits in a register tend to be identical regardless of the value. So you will get the fewest iterations, and the best performance, by using up the whole register. That is, on a 32-bit architecture, make your base unit 32 bits, and on a 64-bit architecture, make it 64 bits. Otherwise--say, if you only fill up 8 bits of your 32-bit register--you are wasting cycles.
The first answer stated this best. I personally use base 2^16 to keep from overflowing in multiplication: it allows any two digits to be multiplied together once without ever overflowing.
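
A trivial sketch of why 2^16 is safe: a single digit-by-digit product always fits in 32 bits, since (2^16 - 1)^2 < 2^32.

#include <cstdint>

// One digit-by-digit product in a base-2^16 BigInt: never overflows,
// because (2^16 - 1)^2 = 0xFFFE0001 still fits in a uint32_t.
uint32_t mulDigits(uint16_t a, uint16_t b) {
    return static_cast<uint32_t>(a) * b;
}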
Converting to a higher base requires a fast division method as well as packing the numbers as much as possible (assuming your BigInt lib can handle multiple bases).
Consider base 10 -> base 2. The actual chain of conversions would be 10 -> 10000 -> 32768 -> 2. This may seem slower, but converting from base 10 to base 10000 is super fast. The conversion between 10000 and 32768 is fast as well, since there are very few digits to iterate over, and unpacking 32768 to 2 is also extremely fast.
So first pack the base to the largest base it can go to. To do this, just combine the digits together. This base should be <= 2^16 to prevent overflow.
Next, combine the digits together until they are >= the target base. From here, divide by the target base using a fast division algorithm that would normally overflow in any other scenario.
Some quick code
if (!packed) pack()
from = remake()   // moves all of the digits on the current BigInt to a new one, O(1)
loop
    addNode()
    remainder = 0
    loop
        remainder = remainder*fromBase + from.digit
        exitwhen remainder >= toBase
        set from = from.prev
        exitwhen from.head
    if (from.head) node.digit = remainder
    else node.digit = from.fastdiv(fromBase, toBase, remainder)
    exitwhen from.head
A look at fast division
loop
    digit = remainder/divide
    remainder = remainder%divide
    // gather digits to divide again
    loop
        this = prev
        if (head) return remainder
        remainder = remainder*base + digit
        exitwhen remainder >= divide
        digit = 0
return remainder
Finally, unpack if you should unpack
Packing is just combining the digits together.
Example of base 10 to base 10000:
((4th*10 + 3rd)*10 + 2nd)*10 + 1st
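
In C++, that packing step might look like the following sketch (the function name and the most-significant-digit-first layout are my own choices):

#include <cstddef>
#include <vector>

// Pack base-10 digits (most-significant first) into base-10000 digits
// by combining groups of four: ((d1*10 + d2)*10 + d3)*10 + d4.
std::vector<int> packBase10To10000(const std::vector<int>& dec) {
    std::vector<int> out;
    std::size_t i = 0;
    std::size_t lead = dec.size() % 4;       // short leading group, if any
    if (lead) {
        int v = 0;
        for (; i < lead; ++i) v = v * 10 + dec[i];
        out.push_back(v);
    }
    while (i < dec.size()) {                 // full groups of four
        int v = 0;
        for (std::size_t j = 0; j < 4; ++j, ++i) v = v * 10 + dec[i];
        out.push_back(v);
    }
    return out;
}
// packBase10To10000({1,2,3,4,5,6}) -> {12, 3456}, i.e. 12*10000 + 3456 = 123456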
You should have a Base class that stores alphabet + size for toString. If the Base is invalid, then just display the digits in a comma separated list.
All of your methods should be using the BigInt's current base, not some constant.
That's all.
From there, you'll be able to do things like
BigInt i = BigInt.convertString("1234567", Base["0123456789"])
Where the [] is overloaded and will either create a new base or return the already created base.
You'll also be able to do things like
i.toString()
i.base = Base["0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
i.base = 256
i.base = 45000
etc ^_^.
Also, if you are using integers and you want to be able to return BigInt remainders, division can get a bit tricky =P, but it's still pretty easy ^_^.
This is a BigInt library I coded in vjass (yes, for Warcraft 3, lol, don't judge me)
Things like TriggerEvaluate(evalBase) are just to keep the threads from crashing (evil operation limit).
Good luck :D

Number base conversion and digit access?

The language is C++.
I have to read in some data from 0 to n, where n could theoretically be infinity. Based on the value of n, I have to convert the numbers over from decimal to that base, even if it's base 10000. So if I read in 5 numbers, n = 5, I have to convert them over to base 5.
That said, I am not sure how to do the conversion, but I'm sure I could learn it by reading some article. What really concerns me is: when I convert to whatever base n might be, what type would my result be, so I can store it in an array? Long?
Once I get the converted numbers into some array, how would I access each individual digit of each number for manipulation later?
Thanks.
Basically, most manipulations you're going to perform on a number are base-invariant. This means that you can add/sub/mul/div (and even perform power/root/log operations on) two numbers without even knowing their base.
Think about it this way: the computer does nothing special when it adds two unsigned ints, even though all it's really working with is a 32-digit base-2 number.
You can probably make do with using ints (or whatever data type you need) and converting the base only for display.
Conversion from decimal to base b is done by division/modulo. x is the decimal number, b is the target base:
1. r = x % b
2. y = (x - r) / b
3. Replace x with y and repeat from step 1 until y becomes 0.
The result is the sequence of r's, bottom-up.
On top of that, you'll have to create a std::map with replacement patterns for the values of r, e.g. for base 16 some entries would be 10 -> A and 11 -> B. This implies that you'll have to think about a representation form for very large n.
BTW: Consider a book about programming 101; the conversion of decimal to bin/oct/hex is always explained there and is easily adaptable to other bases.
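
Putting the division/modulo loop together with the digit map, a minimal sketch (for bases 2 through 36; the function name is mine):

#include <map>
#include <string>

// Convert a non-negative value to its textual form in base b (2..36),
// via repeated division/modulo, using a map for digits above 9.
std::string toBase(unsigned long long x, unsigned b) {
    std::map<unsigned, char> digit;
    const std::string alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (unsigned i = 0; i < alphabet.size(); ++i) digit[i] = alphabet[i];

    if (x == 0) return "0";
    std::string out;
    while (x > 0) {
        out.insert(out.begin(), digit[x % b]);  // r = x % b, collected bottom-up
        x /= b;                                 // y = (x - r) / b
    }
    return out;
}
// toBase(19, 2) == "10011", toBase(255, 16) == "FF"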

Arbitrary precision arithmetic with GMP

I'm using the GMP library to make a Pi program, that will calculate about 7 trillion digits of Pi. Problem is, I can't figure out how many bits are needed to hold that many decimal places.
7 trillion digits can represent any of 10^(7 trillion) distinct numbers.
x bits can represent 2^x distinct numbers.
So you want to solve:
2^x = 10^7000000000000
Take the log-base-2 of both sides:
x = log2(10^7000000000000)
Recall that log(a^b) = b * log(a):
x = 7000000000000 * log2(10)
I get 23253496664212 bits. I would add one or two more just to be safe. Good luck finding the terabytes to hold them, though.
I suspect you are going to need a more interesting algorithm.
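
As a runnable sketch of that arithmetic (plain double carries plenty of precision for the exponent itself):

#include <cmath>
#include <cstdio>

int main() {
    const double digits = 7e12;                         // 7 trillion decimal digits
    double bits = std::ceil(digits * std::log2(10.0));  // x = d * log2(10)
    std::printf("%.0f bits, about %.1f terabytes\n", bits, bits / 8 / 1e12);
    // prints 23253496664212 bits, about 2.9 terabytes
}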
I just want to correct one thing about what was written in the answer above:
Recall that log(a^b) = a * log(b)
Well, it is the opposite:
log(a^b) = b * log(a)
2^10 = 1024, so ten bits will represent slightly more than three digits. Since you're talking about 7 trillion digits, that would be something like 23 trillion bits, or about 3 terabytes, which is more than I could get on one drive from Costco last I visited.
You may be getting overambitious. I'd wonder about the I/O time to read and write entire disks for each operation.
(The mathematical way to solve it is to use logarithms, since a number that takes 7 trillion digits to represent has a log base 10 of about 7 trillion. Find the log of the number in the existing base, convert the base, and you've got your answer. For shorthand between base 2 and base 10, use ten bits==three digits, because that's not very far wrong. It says that the log base 10 of 2 is .3, when it's actually more like .301.)

Representing probability in C++

I'm trying to represent a simple set of 3 probabilities in C++. For example:
a = 0.1
b = 0.2
c = 0.7
(As far as I know probabilities must add up to 1)
My problem is that when I try to represent 0.7 in C++ as a float, I end up with 0.69999999, which won't help when I am doing my calculations later. The same goes for 0.8, which becomes 0.80000001.
Is there a better way of representing numbers between 0.0 and 1.0 in C++?
Bear in mind that this relates to how the numbers are stored in memory, so that when it comes to doing tests on the values they are correct; I'm not concerned with how they are displayed or printed.
This has nothing to do with C++ and everything to do with how floating-point numbers are represented in memory. You should never use the equality operator to compare floating-point values; see here for better methods: http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
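
One of the standard alternatives, as a minimal sketch (the name and tolerance are arbitrary choices of mine, not from the linked paper):

#include <algorithm>
#include <cmath>

// Compare floats for "close enough" instead of exact equality: the
// difference must be a small fraction of the larger magnitude.
bool nearlyEqual(float a, float b, float relTol = 1e-5f) {
    return std::fabs(a - b) <= relTol * std::max(std::fabs(a), std::fabs(b));
}
// e.g. test nearlyEqual(sum, 1.0f) rather than sum == 1.0f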
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is it really a problem? If you just need more precision, use a double instead of a float. That should get you about 15 digits precision, more than enough for most work.
Consider your source data. Is 0.7 really significantly more correct than 0.69999999?
If so, you could use a rational number library such as:
http://www.boost.org/doc/libs/1_40_0/libs/rational/index.html
If the problem is that probabilities add up to 1 by definition, then store them as a collection of numbers, omitting the last one. Infer the last value by subtracting the sum of the others from 1.
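
A sketch of that idea, with names of my own choosing: store n-1 values and derive the last, so the set sums to 1 by construction.

#include <numeric>
#include <vector>

// Keep only the first n-1 probabilities; the last one is implied.
double impliedLast(const std::vector<double>& allButLast) {
    return 1.0 - std::accumulate(allButLast.begin(), allButLast.end(), 0.0);
}
// impliedLast({0.1, 0.2}) recovers 0.7, up to the rounding already
// present in the stored 0.1 and 0.2 themselves.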
How much precision do you need? You might consider scaling the values and quantizing them in a fixed-point representation.
The tests you want to do with your numbers will be incorrect.
There is no exact floating-point representation in a base-2 number system for a number like 0.1, because it is an infinitely repeating fraction in base 2. Consider one third: it is exactly representable as 0.1 in a base-3 system, but 0.333... in the base-10 system.
So any test you do with the number 0.1 in floating point is prone to be flawed.
A solution would be using rational numbers (Boost has a rational lib), which will always be exact for, ermm, rationals, or using a self-made base-10 system by multiplying the numbers by a power of ten.
If you really need the precision, and are sticking with rational numbers, I suppose you could go with fixed-point arithmetic. I've not done this before, so I can't recommend any libraries.
Alternatively, you can set a threshold when comparing floating-point numbers, but you'd have to err on one side or the other -- say:
bool fp_cmp(float a, float b) {
    const float epsilon = 1e-6f;  // tolerance; tune to your data's scale
    return a < b + epsilon;       // "a is less than b", within tolerance
}
Note that excess precision is automatically truncated in each calculation, so you should take care when operating at many different orders of magnitude in your algorithm. A contrived example to illustrate:
a = 15434355e10 + 22543634e10
b = a / 1e20 + 1.1534634
c = b * 1e20
versus
c = a + 1.1534634e20
The two results will be very different. Using the first method, a lot of the precision of the first two numbers is lost in the divide by 1e20. Assuming that the final value you want is on the order of 1e20, the second method will give you more precision.
If you only need a few digits of precision then just use an integer. If you need better precision then you'll have to look to different libraries that provide guarantees on precision.
The issue here is that floating-point numbers are stored in base 2. You cannot exactly represent most base-10 decimals with a base-2 floating-point number.
Let's step back a second. What does .1 mean? Or .7? They mean 1×10^-1 and 7×10^-1. If you're using binary for your number, instead of base 10 as we normally do, .1 means 1×2^-1, or 1/2. .11 means 1×2^-1 + 1×2^-2, or 1/2 + 1/4, or 3/4.
Note how in this system, the denominator is always a power of 2. You cannot represent a number whose denominator is not a power of 2 in a finite number of binary digits. For instance, .1 (in decimal) means 1/10, but in binary that is an infinitely repeating fraction, 0.000110011... (with the 0011 pattern repeating forever). This is similar to how, in base 10, 1/3 is an infinite fraction, 0.3333...; base 10 can only exactly represent numbers whose denominator is a product of powers of 2 and 5. (As an aside, base 12 and base 60 are actually really convenient bases, since 12 is divisible by 2, 3, and 4, and 60 is divisible by 2, 3, 4, and 5; but for some reason we use decimal anyhow, and we use binary in computers.)
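
You can watch that repeating pattern emerge with a short sketch: repeatedly double the fraction and peel off the bit that crosses the binary point (the first twenty bits print correctly even though the stored 0.1 is itself already rounded):

#include <cstdio>

int main() {
    double x = 0.1;              // decimal one tenth (already rounded!)
    std::printf("0.");
    for (int i = 0; i < 20; ++i) {
        x *= 2;                  // shift one binary place left
        int bit = static_cast<int>(x);
        std::printf("%d", bit);  // emit the bit that crossed the point
        x -= bit;                // keep only the fractional part
    }
    std::printf("...\n");        // 0.00011001100110011001...
}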
Since floating point numbers (or fixed point numbers) always have a finite number of digits, they cannot represent these infinite repeating fractions exactly. So, they either truncate or round the values to be as close as possible to the real value, but are not equal to the real value exactly. Once you start adding up these rounded values, you start getting more error. In decimal, if your representation of 1/3 is .333, then three copies of that will add up to .999, not 1.
There are four possible solutions. If all you care about is exactly representing decimal fractions like .1 and .7 (that is, you don't care that 1/3 will still have the problem you mention), then you can represent your numbers in decimal, for instance using binary-coded decimal, and manipulate those. This is a common solution in finance, where many operations are defined in terms of decimal. The downside is that you will need to implement all of your arithmetic operations yourself, without the benefit of the computer's FPU, or find a decimal arithmetic library. This also, as mentioned, does not help with fractions that can't be represented exactly in decimal.
Another solution is to use fractions to represent your numbers. If you use fractions, with bignums (arbitrarily large numbers) for your numerators and denominators, you can represent any rational number that will fit in the memory of your computer. Again, the downside is that arithmetic will be slower, and you'll need to implement arithmetic yourself or use an existing library. This will solve your problem for all rational numbers, but if you wind up with a probability that is computed based on π or √2, you will still have the same issues with not being able to represent them exactly, and need to also use one of the later solutions.
A third solution, if all you care about is getting your numbers to add up to 1 exactly, is for events where you have n possibilities, to only store the values of n-1 of those probabilities, and compute the probability of the last as 1 minus the sum of the rest of the probabilities.
And a fourth solution is to do what you always need to remember when working with floating point numbers (or any inexact numbers, such as fractions being used to represent irrational numbers), and never compare two numbers for equality. Again in base 10, if you add up 3 copies of 1/3, you will wind up with .999. When you want to compare that number to 1, you have to instead compare to see if it is close enough to 1; check that the absolute value of the difference, 1-.999, is less than a threshold, such as .01.
Binary machines must round most decimal fractions (all except .0, .5, .25, .75, and so on, whose denominators are powers of two), because those fractions have no exact base-2 floating-point representation. This has nothing to do with the language C++. There is no real way around it except to deal with it from a numerical perspective within your code.
As for actually producing the probabilities you seek:
#include <cstdlib>

// Sample an index 0, 1, or 2 with probabilities 0.1, 0.2, 0.7.
int sample() {
    float pr[3] = {0.1f, 0.2f, 0.7f};
    float accPr[3];               // running (cumulative) probabilities
    float prev = 0.0f;
    int i;
    for (i = 0; i < 3; i++) {
        accPr[i] = prev + pr[i];
        prev = accPr[i];
    }
    // RAND_MAX + 1.0f forces floating-point division; the original
    // (1 + RAND_MAX) overflows int and truncates the result to 0.
    float frand = rand() / (RAND_MAX + 1.0f);
    for (i = 0; i < 2; i++) {
        if (frand < accPr[i]) break;
    }
    return i;
}
I'm sorry to say there's not really an easy answer to your problem.
It falls into a field of study called "Numerical Analysis" that deals with these types of problems (and it goes far beyond just making sure you don't check for equality between two floating-point values). And by field of study, I mean there is a slew of books, journal articles, courses, etc. dealing with it. There are people who do their PhD theses on it.
All I can say is that that I'm thankful I don't have to deal with these issues very much, because the problems and the solutions are often very non-intuitive.
What you might need to do to deal with representing the numbers and calculations you're working on is very dependent on exactly what operations you're doing, the order of those operations and the range of values that you expect to deal with in those operations.
Depending on the requirements of your application, any one of several solutions could be best:
1. You live with the inherent lack of precision and use floats or doubles. You cannot test either for equality, and this implies that you cannot test the sum of your probabilities for equality with 1.0.
2. As proposed before, you can use integers if you require fixed precision. You represent 0.7 as 7, 0.1 as 1, 0.2 as 2, and they will add up perfectly to 10, i.e., 1.0 (see the sketch after this list). If you have to calculate with your probabilities, especially if you do division and multiplication, you need to round the results correctly. This will introduce imprecision again.
3. Represent your numbers as fractions with a pair of integers: (1,2) = 1/2 = 0.5. Precise and more flexible than 2), but you don't want to calculate with those by hand.
4. You can go all the way and use a library that implements rational numbers (e.g. GMP). Precise, with arbitrary precision; you can calculate with it, but it is slow.
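
Option 2 as a tiny sketch, with everything scaled to integer tenths so the sum test is exact:

#include <cassert>

int main() {
    const int scale = 10;        // 1.0 is represented as 10 tenths
    int a = 1, b = 2, c = 7;     // 0.1, 0.2, 0.7 scaled by 10
    assert(a + b + c == scale);  // exact; no floating-point fuzz
}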
Yeah, I'd scale the numbers to (0-100), (0-1000), or whatever fixed size you need, if you're worried about such things. It also makes for faster math computation in most cases. Back in the bad old days, we'd define entire cos/sine tables and other such bleh in integer form to reduce floating fuzz and increase computation speed.
I do find it a bit interesting that a "0.7" fuzzes like that on storage.