Fastest way of computing the power that a "power of 2" number used? - c++

What would be the quickest way to find the power of 2 that a certain number (which is a power of two) used?
I'm not very skilled at mathematics, so I'm not sure how best to describe it, but the function would satisfy x = 2^y, where x is the input and y is the output. Here's a table of what it would look like, if that helps explain it.
0 = f(1)
1 = f(2)
2 = f(4)
3 = f(8)
...
8 = f(256)
9 = f(512)
I've made a function that does this, but I fear it's not very efficient (or elegant for that matter). Would there be a simpler and more efficient way of doing this? I'm using this to compute what area of a texture is used to buffer how drawing is done, so it's called at least once for every drawn object. Here's the function I've made so far:
uint32 getThePowerOfTwo(uint32 value){
    for(uint32 n = 0; n < 32; ++n){
        if(value <= (1 << n)){
            return n;
        }
    }
    return 32; // should never be called
}

Building on woolstar's answer - I wonder if a binary search of a lookup table would be slightly faster? (and much nicer looking)...
#include <algorithm>
#include <cstdint>
#include <iterator>

int getThePowerOfTwo(uint32_t value) {
    static constexpr uint32_t twos[] = {
        1u<<0,  1u<<1,  1u<<2,  1u<<3,  1u<<4,  1u<<5,  1u<<6,  1u<<7,
        1u<<8,  1u<<9,  1u<<10, 1u<<11, 1u<<12, 1u<<13, 1u<<14, 1u<<15,
        1u<<16, 1u<<17, 1u<<18, 1u<<19, 1u<<20, 1u<<21, 1u<<22, 1u<<23,
        1u<<24, 1u<<25, 1u<<26, 1u<<27, 1u<<28, 1u<<29, 1u<<30, 1u<<31
    };
    return std::lower_bound(std::begin(twos), std::end(twos), value) - std::begin(twos);
}

This operation is sufficiently popular that processor vendors have come up with hardware support for it. Check out find first set. Compiler vendors offer specific intrinsics for it, but unfortunately there is no standard name for them. So if you need maximum performance you have to write compiler-dependent code:
#ifdef __GNUC__
    return __builtin_ffs( x ) - 1;                   // GCC: ffs returns 1 + the index of the lowest set bit
#endif
#ifdef _MSC_VER
    return CHAR_BIT * sizeof(x) - __lzcnt( x ) - 1;  // Visual Studio: 32 - (number of leading zeros) - 1
#endif
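Since C++20 the <bit> header provides a portable wrapper over these instructions; a minimal sketch, assuming the input really is a power of two:
#include <bit>      // C++20
#include <cstdint>

// For a power-of-two input, the exponent equals the number of trailing zero bits.
uint32_t getThePowerOfTwo(uint32_t value) {
    return static_cast<uint32_t>(std::countr_zero(value));
}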

If the input value is always 2^n, where n is an integer, the optimal way to find n is to use a hash table with a perfect hash function. In that case the hash function for a 32-bit unsigned integer can be defined as value % 37 (the 32 powers of two all map to distinct residues modulo 37):
#include <array>
#include <algorithm>
#include <cstdint>
#include <limits>

template < size_t Div >
std::array < uint8_t, Div > build_hash()
{
    std::array < uint8_t, Div > hash_;
    std::fill(hash_.begin(), hash_.end(), std::numeric_limits<uint8_t>::max());
    for (size_t index_ = 0; index_ < 32; ++index_)
        hash_[(1u << index_) % Div] = index_;
    return hash_;
}

uint8_t hash_log2(uint32_t value_)
{
    static const std::array < uint8_t, 37 > hash_ = build_hash<37> ();
    return hash_[value_ % 37];
}
Check:
#include <cassert>

int main()
{
    for (size_t index_ = 0; index_ < 32; ++index_)
        assert(hash_log2(1u << index_) == index_);
}

Your version is just fine, but as you surmised, it's O(n), which means it takes one step through the loop for every bit. You can do better. To take it to the next step, try doing the equivalent of a divide and conquer:
unsigned int log2(unsigned int value)
{
    unsigned int val  = 0 ;
    unsigned int mask = 0xffff0000 ;
    unsigned int step = 16 ;

    while ( step )
    {
        // If any bits are set under the mask, the highest bit lives in that
        // upper window: record the step and discard the lower bits.
        if ( value & mask ) { val += step ; value >>= step ; }
        step /= 2 ;
        mask >>= step ;   // otherwise, just narrow the search window
    }
    return val ;
}
Since we're just hunting for the highest bit, we start out asking if any bits are on in the upper half of the word. If there are, we can throw away all the lower bits, else we just narrow the search down.
Since the question was marked C++, here's a version using templates that tries to figure out the initial mask & step:
template <typename T>
T log2(T val)
{
    T result = 0 ;
    T step = ( 4 * sizeof( T ) ) ; // half the number of bits
    T mask = ~ 0L - ( ( 1L << ( 4 * sizeof( T ) ) ) - 1 ) ;

    while ( val && step )
    {
        if ( val & mask ) { result += step ; val >>= step ; }
        mask >>= ( step + 1 ) / 2 ;
        step /= 2 ;
    }
    return result ;
}
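A quick sanity check of the templated version (my sketch, assuming the template above is compiled in the same file):
#include <cassert>

int main()
{
    assert( log2( 1u ) == 0 ) ;
    assert( log2( 256u ) == 8 ) ;
    assert( log2( 1000u ) == 9 ) ;          // floor(log2(1000))
    assert( log2( 0x80000000u ) == 31 ) ;
}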
While the performance of either version is going to be a blip on a modern x86 architecture, this has come up for me in embedded work. In the last case, where I was solving a bit search very similar to this, even the O(log N) version was too slow for the interrupt handler, and we had to use a combination of divide and conquer plus table lookup to squeeze out the last few cycles.

If you KNOW that the input is indeed a power of two (which is easy enough to verify), try the variant below.
Full description here: http://sree.kotay.com/2007/04/shift-registers-and-de-bruijn-sequences_10.html
#include <cassert>
#include <cstdint>

typedef int8_t   int8;
typedef int32_t  int32;
typedef uint32_t uint32;

//table
static const int8 xs_KotayBits[32] = {
     0,  1,  2, 16,  3,  6, 17, 21,
    14,  4,  7,  9, 18, 11, 22, 26,
    31, 15,  5, 20, 13,  8, 10, 25,
    30, 19, 12, 24, 29, 23, 28, 27
};

//only works for power-of-2 inputs
static inline int32 xs_ILogPow2 (int32 v){
    assert (v && ((v & (v-1)) == 0));
    //constant is binary 10 01010 11010 00110 01110 11111
    return xs_KotayBits[(uint32(v) * uint32( 0x04ad19df )) >> 27];
}

Split number into sum of preselected other numbers

I have a number (for example 301, but it can be as large as 10^11).
Let n be the length (number of digits) of that number.
I have to break it down into a sum of at most n components, where the components are 0^n, 1^n, 2^n, 3^n, ..., 9^n.
How can I do that?
Since you have 1^n included in your options, this becomes a really simple problem solvable through a greedy approach.
Firstly, let me clarify that, the way I understand this, for an input N of length n you want some solution to this equation:
A·1^n + B·2^n + C·3^n + ... + H·8^n + I·9^n = N
There are infinitely many possible solutions (just by theory of equations). One possible solution can be found as follows:
def decompose(N, n):   # wrapped in a function so the return works; the name is arbitrary
    a = [x ** n for x in range(0, 10)]
    consts = [0] * 10
    ctr = 9
    while N > 0:
        consts[ctr] = N // a[ctr]
        N = N % a[ctr]
        ctr -= 1
    return consts
This consts array will have the constant values for the above equation at respective indices.
PS: I've written this in python but you can translate it to C++ as you want. I saw that tag later. If you have any confusion regarding the code, feel free to ask in the comments.
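For reference, a direct C++ translation of the same greedy idea might look like this (a sketch of mine; the names are arbitrary):
#include <cmath>
#include <iostream>
#include <vector>

// counts[d] tells how many times d^n is used in the greedy decomposition of N.
std::vector<long long> decompose(long long N, int n) {
    std::vector<long long> powers(10), counts(10, 0);
    for (int d = 0; d < 10; ++d)
        powers[d] = std::llround(std::pow(d, n));
    for (int d = 9; d >= 1 && N > 0; --d) {
        counts[d] = N / powers[d];
        N %= powers[d];
    }
    return counts;
}

int main() {
    std::vector<long long> c = decompose(301, 3);
    for (int d = 9; d >= 1; --d)
        if (c[d]) std::cout << c[d] << " * " << d << "^3  ";
    std::cout << '\n';   // 1 * 6^3  1 * 4^3  2 * 2^3  5 * 1^3
}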
You could use the following to determine the number of components.
#include <cmath>
#include <cstdio>

int main()
{
    int remain = 301;   // Target number
    int exp = 3;        // Length of number (exponent)
    int total = 0;      // Number of components
    bool first = true;  // Used to determine if plus sign is output
    for ( int comp = 9; comp > 0; --comp )
    {
        int count = 0;  // Number of times this component is needed
        while ( pow(comp, exp) <= remain )
        {
            ++total;    // Count up total number of components
            ++count;    // Count up number of times this component is used
            remain -= int(pow(comp, exp));
        }
        if ( count )    // If count is not zero, component is used
        {
            if ( first )
            {
                first = false;
            }
            else
            {
                printf(" + ");
            }
            if ( count > 1 )
            {
                printf("%d(%d^%d)", count, comp, exp);
            }
            else
            {
                printf("%d^%d", comp, exp);
            }
        }
    }
    if ( total == exp )
    {
        printf("\r\nTarget number has %d components", exp);
    }
    else if ( total < exp )
    {
        printf("\r\nTarget number has less than %d components", exp);
    }
    else
    {
        printf("\r\nTarget number has more than %d components", exp);
    }
    return 0;
}
Output for 301:
6^3 + 4^3 + 2(2^3) + 5(1^3)
Target number has more than 3 components
Output for 251:
6^3 + 3^3 + 2^3
Target number has 3 components

How can I convert the given code to a mathematical function?

I am trying to convert a recursion into a mathematical formula. The code snippet (C++) below is a simplified variant. The values look like an exponential function, however I am trying to find a closed form. For example, rec(8, 6) is 1287. As a first attempt, I assumed 6 to be constant and tried to fit an exponential function to rec(?, 6), but the results were extremely inaccurate. Does anyone have an idea how I could arrive at a closed-form function?
Many thanks
int rec(const int a, const int b, const int c = 0, const int d = 0)
{
    int result = 0;
    if (d == a)
        ++result;
    else
        for (int i = 0; c + i < b; ++i)
            result += rec(a, b, c + i, d + 1);
    return result;
}
There is no universal method of converting a recursive function to a closed mathematical formula. In your case the answer is the number of multisets of size a drawn from b elements, that is, the binomial coefficient C(a+b-1, b-1) = (a+b-1)! / (a! (b-1)!).
Proof: Your rec is equivalent to
int rec(const int a, const int b)
{
    if (0 == a)
        return 1;
    int result = 0;
    for (int i = 0; i < b; ++i)
        result += rec(a - 1, b - i);
    return result;
}
because all that matters is a-d and b-c.
If a == 0, rec returns 1.
If a > 0, it returns the sum of rec(a-1, i) for i from 1 to b. This is exactly the recurrence satisfied by those binomial coefficients. If you ask me to prove that, I will, but the plain text format is not good for mathematical proofs.
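A quick numerical check of that closed form (my sketch, assuming the simplified two-argument rec above is in the same file):
#include <cassert>

// Binomial coefficient C(n, k), computed iteratively (the division is exact at every step).
long long binom(int n, int k) {
    long long r = 1;
    for (int i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

int main() {
    // Check rec(a, b) == C(a + b - 1, b - 1) on a small grid.
    for (int a = 0; a <= 8; ++a)
        for (int b = 1; b <= 8; ++b)
            assert(rec(a, b) == binom(a + b - 1, b - 1));
}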
Edit: A general idea: print all rec(i, j) as a table and try to spot the rule by looking at it. I did:
for (int i = 0; i != 10; ++i){
    for (int j = 0; j != 10; ++j){
        cout << rec(i, j) << "\t";
    }
    cout << endl;
}
In this way I spotted that it is Pascal's triangle.
I will give a hint how you could have guessed the result yourself, with the stress on guess.
Take the sequence for rec(i, 6), i = 0, ..., 9. This is the sequence that you had already investigated. The answer is:
1 6 21 56 126 252 462 792 1287 2002
Now, insert it into Google (I don't know if other search engines can do the trick; Google certainly can). The first result should point you to this famous online encyclopedia:
https://oeis.org/A000389
This is the On-Line Encyclopedia of Integer Sequences! Now, read the description:
A000389 Binomial coefficients C(n,5).
You may not be familiar with the C(*,*) notation, but you can easily understand the "Binomial coefficients" description.
You will certainly notice the relation between the 6 in your function and the 5 in the answer formula, but to be sure you can repeat the experiment for several numbers other than 6.
The next step is to see what the A000389 sequence looks like:
0, 0, 0, 0, 0, 1, 6, 21, 56, 126, 252, 462, 792, 1287, 2002, ...
Well, C(i,j) is undefined (or zero, depending on the convention) if i < j. Aha! A000389 is this:
C(0,5) = 0, C(1,5) = 0, ... , C(4,5) = 0, C(5,5) = 1, C(6,5) = 6,...
This is your sequence, starting from the term of index 5 (counting from 0):
rec(0, 6) = C(5, 5), rec(1, 6) = C(6, 5), ..., rec(k, 6) = C(k + 5, 5)
You can generalize this to
rec(k, j) = C(k + j - 1, j - 1)
and then start thinking how to prove it in a mathematically strict way. The usual method is by mathematical induction - I'll skip it.
This final result was already given by @Botond_Horwath; I just wanted to show you the magic of the Google search engine + the OEIS website. (If you know the latter, the former is redundant.)

How to store output of very large Fibonacci number?

I am making a program for the nth Fibonacci number. I made the following program using recursion and memoization.
The main problem is that the value of n can go up to 10000, which means the 10000th Fibonacci number is more than 2000 digits long.
With a little bit of googling, I found that I could use an array and store every digit of the solution in an element of the array, but I am still not able to figure out how to implement this approach in my program.
#include<iostream>
using namespace std;

long long int memo[101000];
long long int n;

long long int fib(long long int n)
{
    if(n==1 || n==2)
        return 1;
    if(memo[n]!=0)
        return memo[n];
    return memo[n] = fib(n-1) + fib(n-2);
}

int main()
{
    cin>>n;
    long long int ans = fib(n);
    cout<<ans;
}
How do I implement that approach or if there is another method that can be used to achieve such large values?
One thing that I think should be pointed out is that there are other ways to implement fib that are much easier for something like C++ to compute.
Consider the following pseudocode:
function fib (n) {
  let a = 0, b = 1, _;
  while (n > 0) {
    _ = a;
    a = b;
    b = b + _;
    n = n - 1;
  }
  return a;
}
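In C++ the same loop might look like this (a sketch of mine; with unsigned 64-bit arithmetic it already overflows past fib(93), which is exactly why the rest of this answer deals with big numbers):
#include <cstdint>
#include <iostream>

// Iterative Fibonacci; only valid while the result fits in 64 bits (n <= 93).
uint64_t fib(unsigned n) {
    uint64_t a = 0, b = 1;
    while (n-- > 0) {
        uint64_t t = a;
        a = b;
        b += t;
    }
    return a;
}

int main() {
    std::cout << fib(90) << '\n';  // 2880067194370816120
}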
This doesn't require memoisation and you don't have to worry about blowing up your stack with too many recursive calls. Recursion is a really powerful looping construct, but it's best left to languages like Lisp, Scheme, Kotlin, Lua (and a few others) that support it elegantly.
That's not to say tail call elimination is impossible in C++, but unless you're optimising/compiling for it explicitly, I'm doubtful that whatever compiler you're using would do it by default.
As for computing the exceptionally large numbers, you'll have to either get creative and do the adding The Hard Way, or rely upon an arbitrary-precision arithmetic library like GMP. I'm sure there are other libraries for this too.
Adding The Hard Way™
Remember how you used to add big numbers when you were a little tater tot, fresh off the aluminum foil?
5-year-old math
1259601512351095520986368
+ 50695640938240596831104
---------------------------
?
Well you gotta add each column, right to left. And when a column overflows into the double digits, remember to carry that 1 over to the next column.
... <-001
1259601512351095520986368
+ 50695640938240596831104
---------------------------
... <-472
The 10,000th Fibonacci number is thousands of digits long, so there's no way it's going to fit in any integer type C++ provides out of the box. So, without relying upon a library, you could use a string or an array of single-digit numbers. To output the final number, you'll have to convert it to a string though.
(Wolfram Alpha: fibonacci 10000)
Doing it this way, you'll perform a couple million single-digit additions; it might take a while, but it should be a breeze for any modern computer to handle. Time to get to work!
Here's an example of a Bignum module in JavaScript:
const Bignum =
  { fromInt: (n = 0) =>
      n < 10
        ? [ n ]
        : [ n % 10, ...Bignum.fromInt (n / 10 >> 0) ]
  , fromString: (s = "0") =>
      Array.from (s, Number) .reverse ()
  , toString: (b) =>
      b .reverse () .join ("")
  , add: (b1, b2) =>
      { const len = Math.max (b1.length, b2.length)
        let answer = []
        let carry = 0
        for (let i = 0; i < len; i = i + 1) {
          const x = b1[i] || 0
          const y = b2[i] || 0
          const sum = x + y + carry
          answer.push (sum % 10)
          carry = sum / 10 >> 0
        }
        if (carry > 0) answer.push (carry)
        return answer
      }
  }
We can verify that the Wolfram Alpha answer above is correct
const { fromInt, toString, add } =
  Bignum

const bigfib = (n = 0) =>
{
  let a = fromInt (0)
  let b = fromInt (1)
  let _
  while (n > 0) {
    _ = a
    a = b
    b = add (b, _)
    n = n - 1
  }
  return toString (a)
}

bigfib (10000)
// "336447 ... 366875"
Try not to use recursion for a simple problem like Fibonacci. And if you'll only use it once, don't use an array to store all results. An array of 2 elements containing the 2 previous Fibonacci numbers will be enough: in each step, you then only have to sum up those 2 numbers.

How can you save 2 consecutive Fibonacci numbers? Well, you know that when you have 2 consecutive integers one is even and one is odd. So you can use that property to know where to get/place a Fibonacci number: for fib(i), if i is even (i % 2 is 0) place it in the first element of the array (index 0), else (i % 2 is then 1) place it in the second element (index 1).

Why can you just place it there? Well, when you're calculating fib(i), the value currently sitting in the place where fib(i) should go is fib(i-2) (because (i-2) % 2 is the same as i % 2). But you won't need fib(i-2) any more: fib(i+1) only needs fib(i-1) (that's still in the array) and fib(i) (that just got inserted into the array).
So you could replace the recursion calls with a for loop like this:
int fibonacci(int n){
    if( n <= 0 ){
        return 0;
    }
    int previous[] = {0, 1}; // start with fib(0) and fib(1)
    for(int i = 2; i <= n; ++i){
        // modulo can be implemented with bit operations (much faster): i % 2 == i & 1
        previous[i&1] += previous[(i-1)&1]; // shorter way to say: previous[i&1] = previous[i&1] + previous[(i-1)&1]
    }
    // Result is in previous[n&1]
    return previous[n&1];
}
Recursion is actually discouraged in practice because of the time (function calls) and resources (stack) it consumes. So each time you use recursion, try to replace it with a loop, plus a stack with simple push/pop operations if needed to save the "current position" (in C++ you can use a vector). In the case of Fibonacci the stack isn't even needed, but if you are iterating over a tree data structure, for example, you will need one (it depends on the implementation, though). As I was writing my solution, I saw @naomik provided a solution with a while loop. That one is fine too, but I prefer the array with the modulo operation (it's a bit shorter).
Now concerning the problem of the limited size of long long int: it can be solved by using external libraries that implement operations for big numbers (like the GMP library or Boost.Multiprecision), but you could also create your own version of a BigInteger-like class, as in Java, and implement the basic operations yourself. I've only described the addition here (try to implement the others; they are quite similar).
The main idea is simple: a BigInt represents a big decimal number by cutting its little-endian representation into pieces (I'll explain why little-endian at the end). The length of those pieces depends on the base you choose. If you want to work with decimal representations, it will only work if your base is a power of 10: if you choose 10 as base, each piece represents one digit; if you choose 100 (= 10^2) as base, each piece represents two consecutive digits, starting from the end (see little-endian); if you choose 1000 (= 10^3) as base, each piece represents three consecutive digits; and so on.

Let's say you have base 100: 12765 will then be [65, 27, 1], 1789 will be [89, 17], 505 will be [5, 5] (= [05, 5]), ... With base 1000: 12765 would be [765, 12], 1789 would be [789, 1], 505 would be [505]. It's not the most efficient, but it is the most intuitive (I think...).
The addition is then a bit like the addition on paper we learned at school:
1. Begin with the lowest piece of the BigInt.
2. Add it to the corresponding piece of the other one.
3. The lowest piece of that sum (= the sum modulo the base) becomes the corresponding piece of the final result.
4. The "bigger" pieces of that sum are carried over to the sum of the following pieces.
5. Go to step 2 with the next piece.
6. If no piece is left, add the carry and the remaining bigger pieces of the other BigInt (if it has pieces left).
For example:
9542 + 1097855 = [42, 95] + [55, 78, 09, 1]
lowest piece = 42 and 55 --> 42 + 55 = 97 = [97]
---> lowest piece of result = 97 (no carry, carry = 0)
2nd piece = 95 and 78 --> (95+78) + 0 = 173 = [73, 1]
---> 2nd piece of final result = 73
---> remaining: [1] = 1 = carry (will be added to sum of following pieces)
no piece left in first `BigInt`!
--> add carry ( [1] ) and remaining pieces from second `BigInt`( [9, 1] ) to final result
--> first additional piece: 9 + 1 = 10 = [10] (no carry)
--> second additional piece: 1 + 0 = 1 = [1] (no carry)
==> 9542 + 1 097 855 = [42, 95] + [55, 78, 09, 1] = [97, 73, 10, 1] = 1 107 397
Here is a demo where I used the class above to calculate the fibonacci of 10000 (result is too big to copy here)
Good luck!
PS: Why little-endian? For ease of implementation: it lets you use push_back when appending new pieces, and the operations iterate starting from the first piece of the array instead of the last.
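For completeness, here is a minimal C++ sketch of the piece-wise addition described above (base 100, little-endian; this is my own illustration, not the answerer's original class):
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <vector>

// Piece 0 is the least significant pair of digits (little-endian, base 100).
using BigInt = std::vector<uint32_t>;

BigInt add(const BigInt& a, const BigInt& b) {
    const uint32_t base = 100;
    BigInt result;
    uint32_t carry = 0;
    for (size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
        uint32_t sum = carry;
        if (i < a.size()) sum += a[i];
        if (i < b.size()) sum += b[i];
        result.push_back(sum % base);   // lowest piece of the column sum
        carry = sum / base;             // the rest is carried to the next column
    }
    return result;
}

int main() {
    BigInt x = {42, 95};          // 9542
    BigInt y = {55, 78, 9, 1};    // 1097855
    BigInt z = add(x, y);         // expected: {97, 73, 10, 1}
    std::cout << z.back();
    for (size_t i = z.size() - 1; i-- > 0; )
        std::cout << std::setw(2) << std::setfill('0') << z[i];
    std::cout << '\n';            // 1107397
}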

How to implement bitwise of 3-states bit operators to any-size of memory while maximizing size effectivity?

I can use 2 bits for every 3-state value to implement it [00 - first, 10 - second, 11\01 - third], but when the second bit is enabled the first one is useless. In theory there's an implementation that will outperform this method (the 2 bits I mentioned) in size by 37% (which is 1 - log3(2)).
The code I already tried:
#include <cmath>
#include <cstdio>

#define uint unsigned int

uint set( uint x, uint place, uint value ) {
    double result = ( double )x;
    result /= pow( 3, place );
    result += value - ( ( uint )result ) % 3;
    return result * pow( 3, place );
}

uint get( uint x, uint place ) {
    return ( ( uint )( ( ( double )x ) / pow( 3, place ) ) ) % 3;
}

int main( ) {
    uint s = 0;
    for ( int i = 0; i < 20; ++i )
        s = set( s, i, i % 3 );
    for ( int i = 0; i < 20; ++i )
        printf( "get( s, %d ) -> %u\n", i, get( s, i ) );
}
Which prints:
get( s, 0 ) -> 0
get( s, 1 ) -> 1
get( s, 2 ) -> 2
get( s, 3 ) -> 0
...
get( s, 16 ) -> 1
get( s, 17 ) -> 2
get( s, 18 ) -> 0
get( s, 19 ) -> 1
This method saves 20% in size (1 - 32/40; 40 bits would be required with the first method I mentioned). In theory, as the capacity grows the efficiency grows too (towards 37%, of course).
How can I apply a similar 3-state packing method to data of any size while maximizing size efficiency? If I treat the data as an array of uints and use this method on each of them, I only get 20% efficiency (or lower, if the data's size isn't a multiple of 4 bytes).
NOTE: The only thing I need is size efficiency; I don't care about speed. (Well, except if you choose to use a BigInteger instead of uint.)
log3(2) is irrelevant.
The maximal possible efficiency for representing 3-valued units is log2(3) bits per unit, and the compression from 2 bits per unit is (2 - log2(3))/2, which is roughly 20.75%. So 20% is pretty good.
You shouldn't use pow for integer exponentiation; aside from being slow, it is sometimes off by 1 ULP, which can be enough to make the result off by 1 once you coerce it to an integer. But there's no need for all that work either; you can compress five 3-state values into a byte (3^5 = 243 < 256), and it's straightforward to build a lookup table with 256 entries, one for each possible byte value.
With the LUT, you can extract a 3-state value from a large vector:
/* All error checking omitted */
uint8_t LUT[243][5] = { {0,0,0,0,0}, {1,0,0,0,0}, ... };

uint8_t extract(const uint8_t* data, int offset) {
    return LUT[data[offset/5]][offset%5];
}
By the way, if a 1215-byte lookup table is to be considered "big" (which seems odd, given that you're talking about a data vector of 1GB), it's easy enough to compress it by a factor of 4, although it complicates the table construction:
/* All error checking omitted */
uint8_t LUT[] = { /* Left as an exercise */ };

uint8_t extract(const uint8_t* data, unsigned offset) {
    unsigned index = data[offset/5] * 5 + offset % 5;
    return (LUT[index / 4] >> (2 * (index % 4))) & 3;
}
In addition to rici's answer, I want to post the code I wrote, which may help too (a simplified version):
#include <cstdint>

uint8_t ft[ 5 ] = { 1, 3, 3 * 3, 3 * 3 * 3, 3 * 3 * 3 * 3 };

void set( uint8_t *data, int offset, int value ) {
    uint8_t t1 = data[ offset / 5 ], t2 = ft[ offset % 5 ], u8 = t1 / t2;
    u8 += value - u8 % 3;
    data[ offset / 5 ] = t1 + ( u8 - t1 / t2 ) * t2;
}

uint8_t get( uint8_t *data, int offset ) {
    return data[ offset / 5 ] / ft[ offset % 5 ] % 3;
}
Instead of the big lookup table, I reimplemented the integer power as a small table of powers of 3 (safer and faster than pow), and added a set function too.
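A quick usage check in the spirit of the question's own main (my sketch; the buffer must start zeroed or already hold validly packed values):
#include <cassert>
#include <cstring>

int main() {
    uint8_t data[ 4 ];                     // 4 bytes hold 20 three-state values
    std::memset( data, 0, sizeof data );
    for ( int i = 0; i < 20; ++i )
        set( data, i, i % 3 );
    for ( int i = 0; i < 20; ++i )
        assert( get( data, i ) == i % 3 );
}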

calculating square root for implementing a fixed point function

I am trying to find the square root of a fixed-point number, and I used the following calculation to find an approximation of the square root using an integer algorithm. The algorithm is described in Wikipedia:
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
uint32_t SquareRoot(uint32_t a_nInput)
{
    uint32_t op  = a_nInput;
    uint32_t res = 0;
    uint32_t one = 1uL << 30; // The second-to-top bit is set: use 1u << 14 for uint16_t type; use 1uL<<30 for uint32_t type

    // "one" starts at the highest power of four <= than the argument.
    while (one > op)
    {
        one >>= 2;
    }

    while (one != 0)
    {
        if (op >= res + one)
        {
            op = op - (res + one);
            res = res + 2 * one;
        }
        res >>= 1;
        one >>= 2;
    }
    return res;
}
but I am unable to follow what's happening in the code. What does the comment // "one" starts at the highest power of four <= than the argument actually mean? Can someone please give me a hint about what is happening in the code to calculate the square root of the argument a_nInput?
Thanks much
Note how one is initialized.
uint32_t one = 1uL << 30;
That's 2^30, or 1073741824, which is also 4^15.
This line:
one >>= 2;
Is equivalent to
one = one / 4;
So the pseudocode for what's happening is:
one = 4^15
if one is more than a_nInput
    one = 4^14
if one is still more than a_nInput
    one = 4^13
(and so on...)
Eventually, one will not be more than a_nInput.
// "one" starts at the highest power of four less than or equal to a_nInput