how to determine base of a number? - c++

Given an integer number and its representation in some arbitrary number system, the goal is to find the base of that number system. For example, if the number is 10 and the representation is 000010, then the base should be 10. Another example: the number is 21 and the representation is 0010101, so the base is 2. One more example: the number is 6 and the representation is 10100, so the base is sqrt(2). Does anyone have any idea how to solve such a problem?

number = Σ ( digit[i] * base ^ i ), summed over all digit positions i
You know number, you know all digit[i], you just have to find out base.
Whether solving this equation is simple or complex is left as an exercise.

I do not think that an answer can be given for every case. And I actually have a reason to think so! =)
Given a number x, with representation a_6 a_5 a_4 a_3 a_2 a_1 in base b, finding the base means solving
a_6 b^5 + a_5 b^4 + a_4 b^3 + a_3 b^2 + a_2 b^1 + a_1 = x.
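This cannot be done in closed form in general, as shown by Abel and Ruffini: there is no general solution in radicals for polynomial equations of degree five or higher, which is what a six-digit representation gives. You might be luckier with shorter numbers, but if more than four digits are involved, the formulas are increasingly ugly.
There are quite a lot of good approximation algorithms, though. See here.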

For integers only, it's not that difficult (we can enumerate).
Let's look at 21 and its representation 10101.
1 * base^4 <= 21 < (1+1) * base^4
Let's generate the numbers for some bases:
base   low   high
2      16    32
3      81    162
More generally, we have N represented as ∑ a[i] * base^i. Taking I to be the highest power for which a[I] is non-zero, we have:
a[I] * base^I <= N < (a[I] + 1) * base^I # does not matter if not representable
# Isolate base term
N / (a[I] + 1) < base^I <= N / a[I]
# Ith root
Ithroot( N / (a[I] + 1) ) < base <= Ithroot( N / a[I] )
# Or as a range
base in ] Ithroot(N / (a[I] + 1)), Ithroot( N / a[I] ) ]
In the case of an integer base, or if you have a list of known possible bases, I doubt there'll be many possibilities, so we can just try them out.
Note that it may be faster to actually take the Ithroot of N / (a[I] + 1) and iterate from here instead of computing the second one (which should be close enough)... but I'd need math review on that gut feeling.
If you really don't have any idea (trying to find a floating base)... well it's a bit more difficult I guess, but you can always refine the inequality (including one or two more terms) following the same property.
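The range above can be sketched in a few lines. This is only an illustration: the function name leadingTermBounds and its parameters are made up for the example, it assumes the leading digit and the power are both at least 1, and floating-point rounding may nudge a candidate sitting exactly on a boundary in or out of the range.

```cpp
#include <cmath>
#include <utility>

// Hypothetical helper: returns (lower, upper) such that any valid base
// satisfies lower < base <= upper, using only the leading term a[I] * base^I.
// Assumes leadingDigit >= 1 and power >= 1.
std::pair<double, double> leadingTermBounds(double n, int leadingDigit, int power) {
    double lower = std::pow(n / (leadingDigit + 1), 1.0 / power);  // Ith root of N / (a[I] + 1)
    double upper = std::pow(n / leadingDigit, 1.0 / power);        // Ith root of N / a[I]
    return {lower, upper};
}
```

For 21 with representation 10101 (leading digit 1, power 4) the range is roughly (1.80, 2.14], so 2 is the only integer candidate left to test.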

An algorithm like this should find the base if it is an integer, and should at least narrow down the choices for a non-integer base:
Let N be your integer and R be its representation in the mystery base.
Find the largest digit in R and call it r.
You know that your base is at least r + 1.
For base == (r+1, r+2, ...), let I represent R interpreted in base base
If I equals N, then base is your mystery base.
If I is less than N, try the next base.
If I is greater than N, then your base is somewhere between base - 1 and base.
It's a brute-force method, but it should work. You may also be able to speed it up a bit by incrementing base by more than one if I is significantly smaller than N.
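A minimal sketch of that loop, assuming the representation uses the characters 0-9 and a-z/A-Z as digits and that N fits in 64 bits (and is below the maximum 64-bit value). The names digitValue, compareInBase and findIntegerBase are invented for the example. The comparison stops as soon as the partial value exceeds N, so nothing overflows.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: numeric value of one digit character, -1 if invalid.
int digitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

// Returns -1, 0 or +1 when rep read in `base` is below, equal to or above n.
// Assumes every character of rep is a valid digit.
int compareInBase(const std::string& rep, std::uint64_t base, std::uint64_t n) {
    std::uint64_t value = 0;
    for (char c : rep) {
        std::uint64_t d = static_cast<std::uint64_t>(digitValue(c));
        // value * base + d > n, tested without computing the product
        if (d > n || value > (n - d) / base) return 1;
        value = value * base + d;
    }
    return value == n ? 0 : -1;
}

// Tries base = r + 1, r + 2, ... where r is the largest digit in rep.
// Returns the base if an integer one matches, otherwise nothing.
std::optional<std::uint64_t> findIntegerBase(const std::string& rep, std::uint64_t n) {
    int maxDigit = 1;
    for (char c : rep) maxDigit = std::max(maxDigit, digitValue(c));
    std::uint64_t start = static_cast<std::uint64_t>(maxDigit) + 1;
    // Past n + 1 the value either overshoots or no longer depends on the base.
    std::uint64_t last = std::max(start, n + 1);
    for (std::uint64_t base = start; base <= last; ++base) {
        int cmp = compareInBase(rep, base, n);
        if (cmp == 0) return base;
        if (cmp > 0) break;  // the true base lies between base - 1 and base
    }
    return std::nullopt;
}
```

With the examples from the question, findIntegerBase("000010", 10) gives 10, findIntegerBase("0010101", 21) gives 2, and findIntegerBase("10100", 6) gives nothing because the base there is sqrt(2).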
Something else that might help speed things up, particularly in the case of a non-integer base: Remember that as several people have mentioned, a number in an arbitrary base can be expanded as a polynomial like
x = a[n]*base^n + a[n-1]*base^(n-1) + ... + a[2]*base^2 + a[1]*base + a[0]
When evaluating potential bases, you don't need to convert the entire number. Start by converting only the largest term, a[n]*base^n. If this is larger than x, then you already know your base is too big. Otherwise, add one term at a time (moving from most-significant to least-significant). That way, you don't waste time computing terms after you know your base is wrong.
Also, there is another quick way to eliminate a potential base. Notice that you can re-arrange the above polynomial expression and get
(x - a[0]) = a[n]*base^n + a[n-1]*base^(n-1) + ... + a[2]*base^2 + a[1]*base
or
(x - a[0]) = (a[n]*base^(n-1) + a[n-1]*base^(n-2) + ... + a[2]*base + a[1])*base
You know the values of x and a[0] (the "ones" digit, which you can interpret regardless of base). This gives you the extra condition that (x - a[0]) must be evenly divisible by base (since all your a[] values are integers). If you calculate (x - a[0]) % base and get a non-zero result, then base cannot be the correct base.
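The divisibility condition is a one-liner for integer bases. A small sketch, where the function name passesOnesDigitTest is made up for the example:

```cpp
#include <cstdint>

// Hypothetical filter: an integer base can only be correct if it divides x - a[0].
// Passing the test does not prove the base is right; failing it rules the base out.
bool passesOnesDigitTest(std::uint64_t x, std::uint64_t onesDigit, std::uint64_t base) {
    return x >= onesDigit && (x - onesDigit) % base == 0;
}
```

For x = 21 and representation 10101 the ones digit is 1, so base 2 passes (20 % 2 == 0) while base 3 is rejected straight away (20 % 3 == 2).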

I'm not sure if this is efficiently solvable. I would just try picking a random base and see whether, given that base, the result is smaller than, larger than, or equal to the number. If it's smaller, pick a larger base; if it's larger, pick a smaller base; otherwise you have the correct base.

This should give you a starting point:
Create an equation from the number and representation; for example, number 42 and representation "0010203" becomes:
1 * base ^ 4 + 2 * base ^ 2 + 3 = 42
Now you solve the equation to get the value of base.

I'm thinking you will need to try and check different bases. To be efficient, your starting base could be max(digit) + 1, as you know it won't be less than that. If that's too small, double it until you exceed the number, and then use binary search to narrow it down. This way your algorithm should run in O(log n) for normal situations.
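A rough sketch of the doubling plus binary search idea for integer bases. The names compareValue and searchBase are made up, digits are given most significant first, and it assumes the number is below 2^63 so that doubling the candidate cannot overflow.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Returns -1, 0 or +1 when the digits read in `base` are below, equal to or above n.
int compareValue(const std::vector<unsigned>& digits, std::uint64_t base, std::uint64_t n) {
    std::uint64_t value = 0;
    for (unsigned d : digits) {
        if (d > n || value > (n - d) / base) return 1;  // would exceed n
        value = value * base + d;
    }
    return value == n ? 0 : -1;
}

std::optional<std::uint64_t> searchBase(const std::vector<unsigned>& digits, std::uint64_t n) {
    unsigned maxDigit = 1;
    for (unsigned d : digits) maxDigit = std::max(maxDigit, d);
    std::uint64_t lo = maxDigit + 1ull;                 // the base cannot be smaller than this
    if (compareValue(digits, lo, n) > 0) return std::nullopt;
    std::uint64_t hi = lo;
    while (compareValue(digits, hi, n) < 0) {           // doubling phase
        if (hi > n) return std::nullopt;                // value no longer grows with the base
        hi *= 2;
    }
    while (lo < hi) {                                   // smallest base that does not undershoot
        std::uint64_t mid = lo + (hi - lo) / 2;
        if (compareValue(digits, mid, n) < 0) lo = mid + 1; else hi = mid;
    }
    if (compareValue(digits, lo, n) == 0) return lo;
    return std::nullopt;                                // no integer base matches
}
```

Each comparison costs one pass over the digits, and the number of comparisons grows with the logarithm of the base rather than with the base itself.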

Several of the other posts suggest that the solution might be found by finding the roots of the polynomial the number represents. These will, of course, generally work, though they will have a tendency to produce negative and complex bases as well as positive integers.
Another approach would be to cast this as an integer programming problem and solve using branch-and-bound.
But I suspect that the suggestion of guessing-and-testing will be quicker than any of the cleverer proposals.

Related

I'm having problem storing 10^18 as a float

I'm writing a program which must take in an integer, N, in range 3<=N<=10^18.
This is one of the operations I have to perform with N.
final=((0.5*(pow(2,0.5))*(pow((pow(((N/2)-0.5),2)+pow((N/2)-0.5,2)),0.5)))-0.5)*4;
N is such that final is guaranteed to contain an integer.
The problem is that I can't store N in a float type as it is too large. If I store it in long long int, the answer is wrong (I think it's because the intermediate value of N / 2 is then rounded off).
The correct answer is
final = 2 * abs(N-1) - 2;
This can be verified, by removing the unnecessary parenthesis, regrouping same terms, distributing multiplication by constants, and using the following identities:
square root of squared is absolute value: pow(pow(x,2),0.5) = abs(x)
pow(k*x,0.5) = pow(k,0.5)*pow(x,0.5)
k*abs(x+y) = abs(k*x + k*y)
This is almost the same as the accepted answer. But the other answer is correct only for N >= 1. It is wrong as soon as N-1<0, so in the range of the possible N values allowed by your question, for 0 <= N < 1.
You can verify this with this online demo
Edit: following your edit of the question, which changes the range and thereby excludes the problematic values, the accepted answer will do. I leave my answer here for the record, and for the sake of the maths ;-)
This formula looks intimidating, but can be simplified (if N > 1) to
final = 4 * (N / 2 - 1)
using the identity: (x^a)^(1/a) = x.
If N / 2 is supposed to be a floating-point division, not an integer one, the answer is
final = 2 * (N - 2)
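Since the simplified expression needs no square roots or halves, it can be evaluated entirely in 64-bit integers, which sidesteps the precision problem from the question. A minimal sketch, assuming 3 <= N <= 10^18 as stated (the function name finalValue is made up); the largest result, 2 * 10^18 - 4, still fits in a signed 64-bit integer.

```cpp
#include <cstdint>

// Integer-only evaluation of final = 2 * (N - 2), valid for N >= 1.
std::int64_t finalValue(std::int64_t n) {
    return 2 * (n - 2);
}
```

For N = 3 this gives 2, which matches the original expression: (0.5 * sqrt(2) * sqrt(1 + 1) - 0.5) * 4 = 2.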

Fast integer solution of x(x-1)/2 = c

Given a non-negative integer c, I need an efficient algorithm to find the largest integer x such that
x*(x-1)/2 <= c
Equivalently, I need an efficient and reliably accurate algorithm to compute:
x = floor((1 + sqrt(1 + 8*c))/2) (1)
For the sake of definiteness I tagged this question C++, so the answer should be a function written in that language. You can assume that c is an unsigned 32 bit int.
Also, if you can prove that (1) (or an equivalent expression involving floating-point arithmetic) always gives the right result, that's a valid answer too, since floating-point on modern processors can be faster than integer algorithms.
If you're willing to assume IEEE doubles with correct rounding for all operations including square root, then the expression that you wrote (plus a cast to double) gives the right answer on all inputs.
Here's an informal proof. Since c is a 32-bit unsigned integer being converted to a floating-point type with a 53-bit significand, 1 + 8*(double)c is exact, and sqrt(1 + 8*(double)c) is correctly rounded. 1 + sqrt(1 + 8*(double)c) is accurate to within one ulp, since the last term being less than 2**((32 + 3)/2) = 2**17.5 implies that the unit in the last place of the latter term is less than 1, and thus (1 + sqrt(1 + 8*(double)c))/2 is accurate to within one ulp, since division by 2 is exact.
The last piece of business is the floor. The problem cases here are when (1 + sqrt(1 + 8*(double)c))/2 is rounded up to an integer. This happens if and only if sqrt(...) rounds up to an odd integer. Since the argument of sqrt is an integer, the worst cases look like sqrt(z**2 - 1) for positive odd integers z, and we bound
z - sqrt(z**2 - 1) = z * (1 - sqrt(1 - 1/z**2)) >= 1/(2*z)
by Taylor expansion. Since z is less than 2**17.5, the gap to the nearest integer is at least 1/2**18.5 on a result of magnitude less than 2**17.5, which means that this error cannot result from a correctly rounded sqrt.
Adopting Yakk's simplification, we can write
(uint32_t)(0.5 + sqrt(0.25 + 2.0*c))
without further checking.
If we start with the quadratic formula, we quickly reach sqrt(1/4 + 2c), round up at 1/2 or higher.
Now, if you do that calculation in floating point, there can be inaccuracies.
There are two approaches to deal with these inaccuracies. The first would be to carefully determine how big they are, determine if the calculated value is close enough to a half for them to be important. If they aren't important, simply return the value. If they are, we can still bound the answer to being one of two values. Test those two values in integer math, and return.
However, we can do away with that careful bit, and note that sqrt(1/4 + 2c) is going to have an error less than 0.5 if the values are 32 bits, and we use doubles. (We cannot make this guarantee with floats, as by 2^31 the float cannot handle +0.5 without rounding).
In essence, we use the quadratic formula to reduce it to two possibilities, and then test those two.
#include <cassert>
#include <cmath>
#include <stdint.h>
uint64_t eval(uint64_t x) {
    return x*(x-1)/2;
}
unsigned solve(unsigned c) {
    // truncated square root: the answer is either this value or one more
    uint64_t test = (uint64_t)std::sqrt( 0.25 + 2.*c );
    if ( eval(test+1) <= c )
        return test+1;
    assert( eval(test) <= c );
    return test;
}
Note that converting a positive double to an integral type rounds towards 0. You can insert floors if you want.
This may be a bit tangential to your question. But, what caught my attention is the specific formula. You are trying to find the triangular root of T_{n-1} (where T_n is the nth triangular number).
I.e.:
T_n = n * (n + 1) / 2
and
T_n - n = T_{n-1} = n * (n - 1) / 2
From the nifty trick described here, for Tn we have:
n = int(sqrt(2 * c))
Looking for n such that T_{n-1} ≤ c in this case doesn't change the definition of n, for the same reason as in the original question.
Computationally, this saves a few operations, so it's theoretically faster than the exact solution (1). In reality, it's probably about the same.
Neither this solution nor the one presented by David is as "exact" as your (1) though.
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(2 * c)) (red) vs Exact (white line)]
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(0.25 + 2 * c) + 0.5) (red) vs Exact (white line)]
My real point is that triangular numbers are a fun set of numbers that are connected to squares, Pascal's triangle, Fibonacci numbers, et al.
As such there are loads of identities around them which might be used to rearrange the problem in a way that doesn't require a square root.
Of particular interest may be that T_n + T_{n-1} = n^2
I'm assuming you know that you're working with a triangular number, but if you didn't realize that, searching for triangular roots yields a few questions such as this one which are along the same topic.
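For anyone who wants to avoid the square root (and floating point) altogether, a plain binary search over x also works, because x*(x-1)/2 never decreases as x grows. This is a swapped-in technique, not one of the identities mentioned above, and the name largestXInteger is made up. It takes at most 17 iterations for a 32-bit c.

```cpp
#include <cstdint>

// Largest x with x*(x-1)/2 <= c, using 64-bit integer arithmetic only.
std::uint32_t largestXInteger(std::uint32_t c) {
    std::uint64_t lo = 1;        // 1*0/2 = 0 <= c always holds
    std::uint64_t hi = 131072;   // 2^17: 131072*131071/2 exceeds any 32-bit c
    while (hi - lo > 1) {
        std::uint64_t mid = lo + (hi - lo) / 2;
        if (mid * (mid - 1) / 2 <= c) lo = mid; else hi = mid;
    }
    return static_cast<std::uint32_t>(lo);
}
```

Whether this beats the floating-point formula depends on the hardware; it is mainly useful as an independent check of the other solutions.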

How to represent a number in base 2³²?

If I have some base 10 or base 16 number, how do I change it into base 2^32?
The reason I'm trying to do this is for implementing BigInt, as suggested by other members here: Why to use higher base for implementing BigInt?
Will it be the same as integer (base 10) till 2^32? What will happen after it?
You are trying to find something of the form
a0 + a1 * (2^32) + a2 * (2^32)^2 + a3 * (2^32)^3 + ...
which is exactly the definition of a base-2^32 system, so ignore all the people that told you that your question doesn't make sense!
Anyway, what you are describing is known as base conversion. There are quick ways and there are easy ways to solve this. The quick ways are very complicated (there are entire chapters of books dedicated to the subject), and I'm not going to attempt to address them here (not least because I've never attempted to use them).
One easy way is to first implement two functions in your number system, multiplication and addition. (i.e. implement BigInt add(BigInt a, BigInt b) and BigInt mul(BigInt a, BigInt b)). Once you've solved that, you will notice that a base-10 number can be expressed as:
b0 + b1 * 10 + b2 * 10^2 + b3 * 10^3 + ...
which can also be written as:
b0 + 10 * (b1 + 10 * (b2 + 10 * (b3 + ...
so if you move left-to-right in your input string, you can peel off one base-10 digit at a time, and use your add and mul functions to accumulate into your BigInt:
BigInt a = 0;
for each digit b {
a = add(mul(a, 10), b);
}
Disclaimer: This method is not computationally efficient, but it will at least get you started.
Note: Converting from base-16 is much simpler, because 2^32 is an exact power of 16 (16^8). So the conversion basically comes down to concatenating bits.
Let's suppose that we are talking about a base-10 number:
a[0]*10^0 + a[1]*10^1 + a[2]*10^2 + a[3]*10^3 + ... + a[N]*10^N
where each a[i] is a digit in the range 0 to 9 inclusive.
I'm going to assume that you can parse the string that is your input value and find the array a[]. Once you can do that, and assuming that you have already implemented your BigInt class with the + and * operators, then you are home. You can simply evaluate the expression above with an instance of your BigInt class.
You can evaluate this expression relatively efficiently using Horner's method.
I've just written this down off the top of my head, and I will bet that there are much more efficient base conversion schemes.
If I have some base 10 or base 16 number, how do I change it into base 2^32?
Just like you convert it to any other base. You want to write the number n as
n = a_0 + a_1 * 2^32 + a_2 * 2^64 + a_3 * 2^96 + ... + a_k * 2^(32 * k).
So, find the largest power of 2^32 that divides into n, subtract off the multiple of that power from n, and repeat with the difference.
However, are you sure that you asked the right question?
I suspect that you mean to be asking a different question. I suspect that you mean to ask: how do I parse a base-10 number into an instance of my BigInteger? That's easy. Code up your implementation, and make sure that you've implemented + and *. I'm completely agnostic to how you actually internally represent integers, but if you want to use base 2^32, fine, do it. Then:
BigInteger Parse(string s) {
BigInteger b = new BigInteger(0);
foreach(char c in s) { b = b * 10 + (int)c - (int)'0'; }
return b;
}
I'll leave it to you to translate this to C.
Base 16 is easy, since 2^32 is 16^8, an exact power. So, starting from the least significant digit, read 8 base-16 digits at a time, convert those digits into a 32-bit value, and that is the next base-2^32 "digit".
Base 10 is more difficult. As you say, if it's less than 2^32, then you just take the value as a single base-2^32 "digit". Otherwise, the simplest method I can think of is to use the Long Division algorithm to repeatedly divide the base-10 value by 2^32; at each stage, the remainder is the next base-2^32 "digit". Perhaps someone who knows more number theory than me could provide a better solution.
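The base-16 case might look something like this in C++. It is only a sketch: the name hexToLimbs is made up, the input is assumed to contain nothing but hex digits (no "0x" prefix), and the base-2^32 "digits" are stored least significant first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Splits a hex string into base-2^32 digits, least significant first.
// Assumes `hex` holds only the characters 0-9, a-f, A-F.
std::vector<std::uint32_t> hexToLimbs(const std::string& hex) {
    std::vector<std::uint32_t> limbs;
    std::size_t end = hex.size();
    while (end > 0) {
        // Take up to 8 hex digits from the right-hand end of what is left.
        std::size_t begin = end >= 8 ? end - 8 : 0;
        unsigned long chunk = std::stoul(hex.substr(begin, end - begin), nullptr, 16);
        limbs.push_back(static_cast<std::uint32_t>(chunk));
        end = begin;
    }
    return limbs;
}
```

For example, "1FFFFFFFF" becomes the two digits 0xFFFFFFFF and 0x1, in that order.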
I think this is a totally reasonable thing to do.
What you are doing is representing a very large number (like an encryption key) in an array of 32 bit integers.
A base 16 representation is base 2^4, or a series of 4 bits at a time. If you are receiving a stream of base 16 "digits", fill in the low 4 bits of the first integer in your array, then the next lowest, until you read 8 "digits". Then go to the next element in the array.
long getBase16()
{
char cCurr;
switch (cCurr = getchar())
{
case 'A':
case 'a':
return 10;
case 'B':
case 'b':
return 11;
...
default:
return cCurr - '0';
}
}
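void read_input(long * plBuffer)
{
    long * plDst = plBuffer;
    int iPos = 32;
    *plDst = 0x00;   // start with the first element of the buffer
    long lDigit;
    // getBase16() returns a negative value for a newline, which ends the loop
    // (testing for non-zero would wrongly stop at the digit '0')
    while ((lDigit = getBase16()) >= 0)
    {
        if (!iPos)
        {
            *(++plDst) = 0x00;
            iPos = 32;
        }
        *plDst >>= 4;    // shift the digits read so far down by four bits
        iPos -= 4;
        *plDst |= (lDigit & 0x0F) << 28;   // place the new digit in bits 28-31
    }
}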
There is some fix up to do, like ending by shifting *plDst by iPos, and keeping track of the number of integers in your array.
There is also some work to convert from base 10.
But this is enough to get you started.

BigInt implementation - converting a string to binary representation stored as unsigned int

I'm doing a BigInt implementation in C++ and I'm having a hard time figuring out how to create a converter from (and to) string (C string would suffice for now).
I implement the number as an array of unsigned int (so basically putting blocks of bits next to each other). I just can't figure out how to convert a string to this representation.
For example, if unsigned int were 32 bits and I got a string of "4294967296", or "5000000000", or basically anything larger than what a 32-bit int can hold, how would I properly convert it to the appropriate binary representation?
I know I'm missing something obvious, and I'm only asking for a push to the right direction. Thanks for help and sorry for asking such a silly question!
Well one way (not necessarily the most efficient) is to implement the usual arithmetic operators and then just do the following:
// (pseudo-code)
// String to BigInt
String s = ...;
BigInt x = 0;
while (!s.empty())
{
x *= 10;
x += s[0] - '0';
s.pop_front();
}
Output(x);
// (pseudo-code)
// BigInt to String
BigInt x = ...;
String s;
while (x > 0)
{
s += '0' + x % 10;
x /= 10;
}
Reverse(s);
Output(s);
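Spelled out for a number stored as an array of 32-bit blocks, the two loops above could look roughly like this. The names fromDecimal and toDecimal are made up, blocks are stored least significant first with zero represented by an empty vector, and the input is assumed to contain only decimal digits. The "multiply by 10 and add" and "divide by 10" steps are done block by block with a 64-bit intermediate.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// String to blocks: x = x * 10 + digit, for each digit from left to right.
std::vector<std::uint32_t> fromDecimal(const std::string& s) {
    std::vector<std::uint32_t> limbs;
    for (char c : s) {
        std::uint64_t carry = static_cast<std::uint64_t>(c - '0');
        for (std::uint32_t& limb : limbs) {
            std::uint64_t t = static_cast<std::uint64_t>(limb) * 10 + carry;
            limb = static_cast<std::uint32_t>(t);   // low 32 bits stay in this block
            carry = t >> 32;                        // high bits move to the next block
        }
        if (carry != 0) limbs.push_back(static_cast<std::uint32_t>(carry));
    }
    return limbs;
}

// Blocks to string: repeatedly divide by 10 and collect the remainders.
std::string toDecimal(std::vector<std::uint32_t> limbs) {
    std::string out;
    while (!limbs.empty()) {
        std::uint64_t rem = 0;
        for (std::size_t i = limbs.size(); i-- > 0; ) {   // long division, top block first
            std::uint64_t cur = (rem << 32) | limbs[i];
            limbs[i] = static_cast<std::uint32_t>(cur / 10);
            rem = cur % 10;
        }
        out.push_back(static_cast<char>('0' + rem));
        while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();
    }
    if (out.empty()) out = "0";
    std::reverse(out.begin(), out.end());
    return out;
}
```

With this layout "4294967296" turns into the blocks 0 and 1, and toDecimal(fromDecimal("5000000000")) gives back the original string.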
If you wanted to do something trickier, then you could try the following:
If input I is < 100 use above method.
Estimate D number of digits of I by bit length * 3 / 10.
Mod and Divide by factor F = 10 ^ (D/2), to get I = X*F + Y;
Execute recursively with I=X and I=Y
Implement and test the string-to-number algorithm using a builtin type such as int.
Implement a bignum class with operator+, operator*, and whatever else the above algorithm uses.
Now the algorithm should work unchanged with the bignum class.
Use the string conversion algo to debug the class, not the other way around.
Also, I'd encourage you to try and write at a high level, and not fall back on C constructs. C may be simpler, but usually does not make things easier.
Take a look at, for instance, mp_toradix and mp_read_radix in Michael Fromberger's MPI.
Note that repeated division by 10 (used in the above) performs very poorly, which shows up when you have very big integers. It's not the "be all and end all", but it's more than good enough for homework.
A divide and conquer approach is possible. Here is the gist. For instance, given the number 123456789, we can break it into two pieces, 1234 and 56789, by dividing it by a power of 10. (You can think of these pieces as two large digits in base 100,000.) Performing the repeated division by 10 is now cheaper on the two pieces: dividing 1234 by 10 three times and 56789 by 10 four times is cheaper than dividing 123456789 by 10 eight times.
Of course, a really large number can be recursively broken into more than two pieces.
Bruno Haible's CLN (used in CLISP) does something like that and it is blazingly fast compared to MPI in converting numbers with thousands of digits to numeric text.

Generating random numbers given a uniform random number generator

I was asked to generate a random number between a and b, inclusive, using random(0,1). random(0,1) generates a uniform random number between 0 and 1.
I answered
(a+(((1+random(0,1))*b))%(b-a))
My interviewer was not satisfied with my usage of b in this piece of the expression:
(((1+random(0,1))*b))
Then I tried changing my answer to:
int*z=(int*)malloc(sizeof(int));
(a+(((1+random(0,1))*(*z)))%(b-a));
Later the question changed to generate random(1,7) from random(1,5). I responded with:
A = rand(1,5)%3
B = (rand(1,5)+1)%3
C = (rand(1,5)+2)%3
rand(1,7) = rand(1,5)+ (A+B+C)%3
Were my answers correct?
I think you confused a random integer generator with a random floating-point number generator. In C++, rand() generates a random integer between 0 and RAND_MAX (which is at least 32767). Thus to generate a random number from 1 to 10, we write rand() % 10 + 1. As such, to generate a random number from integer a to integer b, we write rand() % (b - a + 1) + a.
The interviewer told you that you had a random generator from 0 to 1. It means floating-point number generator.
How to get the answer mathematically:
Shift the question to a simple form such that the lower bound is 0.
Scale the range by multiplication
Re-shift to the required range.
For example: to generate R such that
a <= R <= b.
Apply rule 1, we get a-a <= R - a <= b-a
0 <= R - a <= b - a.
Think R - a as R1. How to generate R1 such that R1 has range from 0 to (b-a)?
R1 = rand(0, 1) * (b-a) // by apply rule 2.
Now substitute R1 by R - a
R - a = rand(0,1) * (b-a) ==> R = a + rand(0,1) * (b-a)
==== 2nd question - without explanation ====
We have 1 <= R1 <= 5
==> 0 <= R1 - 1 <= 4
==> 0 <= (R1 - 1)/4 <= 1
==> 0 <= 6 * (R1 - 1)/4 <= 6
==> 1 <= 1 + 6 * (R1 - 1)/4 <= 7
Thus, Rand(1,7) = 1 + 6 * (rand(1,5) - 1) / 4
random(a,b) from random(0,1):
random(0,1)*(b-a)+a
random(c,d) from random(a,b):
(random(a,b)-a)/(b-a)*(d-c)+c
or, simplified for your case (a=1,b=5,c=1,d=7):
random(1,5) * 1.5 - 0.5
(note: I assume we're talking about float values and that rounding errors are negligible)
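As a small illustration of these formulas in C++, assuming floating-point values throughout: random01 below stands in for the interviewer's random(0,1) and is built on std::mt19937 purely for the example, and the other function names are made up as well.

```cpp
#include <random>

// Stand-in for random(0,1): a uniform double in [0, 1).
double random01(std::mt19937& gen) {
    return std::uniform_real_distribution<double>(0.0, 1.0)(gen);
}

// random(a,b) from random(0,1): scale by the width, then shift.
double randomBetween(double a, double b, std::mt19937& gen) {
    return random01(gen) * (b - a) + a;
}

// Maps a value r drawn from [a, b] onto [c, d] with the same linear rule.
double rescale(double r, double a, double b, double c, double d) {
    return (r - a) / (b - a) * (d - c) + c;
}
```

For instance, rescale(r, 1, 5, 1, 7) maps 1 to 1, 3 to 4 and 5 to 7, which is the same as r * 1.5 - 0.5.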
random(a,b) from random(c,d) = a + (b-a)*((random(c,d) - c)/(d-c))
No?
[random(0,1)*(b-a)] + a, I think, would give random numbers between a and b.
([random(1,5)-1]/4)*6 + 1 should give random numbers in the range (1,7).
I am not sure whether the above will destroy the uniform distribution.
Were my answers correct?
I think there are some problems.
First off, I'm assuming that random() returns a floating point value - otherwise to generate any useful distribution of a larger range of numbers using random(0,1) would require repeated calls to generate a pool of bits to work with.
I'm also going to assume C/C++ is the intended platform, since the question is tagged as such.
Given these assumptions, one problem with your answers is that C/C++ do not allow the use of the % operator on floating point types.
But even if we imagine that the % operator was replaced with a function that performed a modulo operation with floating point arguments in a reasonable way, there are still some problems. In your initial answer, if b (or the uninitialized *z allocated in your second attempt - I'm assuming this is a kind of bizarre way to get an arbitrary value, or is something else intended?) is zero (say the range given for a and b is (-5, 0)), then your result will be decidedly non-uniform. The result would always be b.
Finally, I'm certainly no statistician, but in your final answer (to generate random(1,7) from random(1,5)), I'm pretty sure that A+B+C would be non-uniform and would therefore introduce a bias in the result.
I think that there is a nicer answer to this. There is one value (probability -> zero) that this overflows and thus the modulus is there.
Take a random number x in the interval [0,1].
Increment your upper_bound which could be a parameter by one.
Calculate (int(random() / (1.0 / upper_bound)) % upper_bound) + 1 + lower_bound .
This ought to return a number in your desired interval.
given random(0,5) you can generate random(0,7) in the following way
A = random(0,5)*random(0,5)
now the range of A is 0-25
If we simply take A modulo 7, we get numbers in the right range, but they won't be truly uniform: the values of A from 21-25 map to 0-4 after the modulo operation, so taking modulo 7 over the range 0-25 biases the output towards 0-4. This is because the range is not a whole number of groups of 7: the largest multiple of 7 less than or equal to 25 is 7*3=21, and it is the numbers in the incomplete range from 21-25 that cause the bias.
The easiest way to fix this problem is to discard those numbers (from 21-25) and keep trying again until a number in the suitable range comes up.
Obviously, this is true when we assume that we want random integers.
However to get random float numbers we need to modify the range accordingly as described in above posts.
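For the integer version of the interview question (random(1,7) from random(1,5)), the discard-and-retry idea can be sketched as follows. Note that this sketch combines the two draws as 5*(first-1) + (second-1) rather than multiplying them, because a product of two uniform values is not itself uniform; rand5 here is a stand-in built on std::mt19937 just for the example.

```cpp
#include <random>

// Stand-in for the given generator: a uniform integer in 1..5.
int rand5(std::mt19937& gen) {
    return std::uniform_int_distribution<int>(1, 5)(gen);
}

// Uniform integer in 1..7 built from rand5 by rejection sampling.
int rand7(std::mt19937& gen) {
    for (;;) {
        int high = rand5(gen) - 1;   // 0..4
        int low = rand5(gen) - 1;    // 0..4
        int v = 5 * high + low;      // uniform over the 25 values 0..24
        if (v < 21) {                // keep 0..20: exactly three values per residue
            return v % 7 + 1;
        }
        // 21..24 would bias the result, so draw again
    }
}
```

On average 21 out of 25 attempts are accepted, so the expected number of rand5 calls per result is a little under 2.4.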