What is unoptimized about this code? [closed] - c++

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I wrote a solution for a question on interviewstreet. Here is the problem description:
https://www.interviewstreet.com/challenges/dashboard/#problem/4e91289c38bfd
Here is the solution they have given:
https://gist.github.com/1285119
Here is the solution that I coded:
#include<iostream>
#include <string.h>
using namespace std;
#define LOOKUPTABLESIZE 10000000
int popCount[2*LOOKUPTABLESIZE];
int main()
{
int numberOfTests = 0;
cin >> numberOfTests;
for(int test = 0;test<numberOfTests;test++)
{
int startingNumber = 0;
int endingNumber = 0;
cin >> startingNumber >> endingNumber;
int numberOf1s = 0;
for(int number=startingNumber;number<=endingNumber;number++)
{
if(number >-LOOKUPTABLESIZE && number < LOOKUPTABLESIZE)
{
if(popCount[number+LOOKUPTABLESIZE] != 0)
{
numberOf1s += popCount[number+LOOKUPTABLESIZE];
}
else
{
popCount[number+LOOKUPTABLESIZE] =__builtin_popcount (number);
numberOf1s += popCount[number+LOOKUPTABLESIZE];
}
}
else
{
numberOf1s += __builtin_popcount (number);
}
}
cout << numberOf1s << endl;
}
}
Can you please point out what is wrong with my code? It only passes 3 of the 10 tests. The time limit is 3 seconds.

What is unoptimized about this code?
The algorithm. You are looping
for(int number=startingNumber;number<=endingNumber;number++)
computing or looking up the number of 1-bits in each number. That takes time proportional to B - A, which is too slow for wide ranges.
A good algorithm counts the number of 1-bits in all numbers 0 <= k < n in O(log n) time using a bit of math.
Here is an implementation that counts 0s in decimal expansions; modifying it to count 1-bits shouldn't be hard.
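The linked implementation is not reproduced here. As a minimal sketch of the O(log n) idea (a per-bit closed form, not necessarily what that implementation does; the name ones_below is hypothetical), each bit position can be counted directly: bit i cycles with period 2^(i+1) and is set during the second half of every cycle, so only about log2(n) iterations are needed.

```cpp
#include <cassert>
#include <cstdint>

// Total number of 1-bits over all integers k with 0 <= k < n.
// Bit i repeats with period 2^(i+1): it is 0 for the first 2^i values of each
// period and 1 for the next 2^i. Full periods contribute 2^i each; the final
// partial period contributes max(0, rem - 2^i).
std::uint64_t ones_below(std::uint64_t n) {
    assert(n <= (std::uint64_t{1} << 58));  // keeps the total inside 64 bits
    std::uint64_t total = 0;
    for (int i = 0; i < 63; ++i) {
        const std::uint64_t half = std::uint64_t{1} << i;  // 2^i
        if (half >= n) break;  // every k < n is below 2^i, so bit i is never set
        const std::uint64_t period = half << 1;            // 2^(i+1)
        total += (n / period) * half;
        const std::uint64_t rem = n % period;
        if (rem > half) total += rem - half;
    }
    return total;
}
```

For 0 <= A <= B, the count for the range is then ones_below(B + 1) - ones_below(A); negative inputs need separate handling.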

When looking at such a question, you need to break it down into simple pieces.
For example, suppose that you know how many 1s there are in all numbers [0, N] (let's call this ones(N)), then we have:
size_t ones(size_t N); /* magic! (defined later) */
size_t count(size_t A, size_t B) {
return ones(B) - (A ? ones(A - 1) : 0);
}
This approach has the advantage that ones is probably simpler to program than count, for example using recursion. As such, a first naive attempt would be:
// Naive
size_t naive_ones(size_t N) {
if (N == 0) { return 0; }
return __builtin_popcountll(N) + naive_ones(N-1); // popcountll: N is a size_t, not an unsigned int
}
But this is likely to be too slow. Even when simply computing the value of count(A, B), we will be computing naive_ones(A-1) twice! (The recursion is also N calls deep, which can overflow the stack for large N.)
Fortunately, there is always memoization to assist here, and the transformation is quite trivial:
#include <deque>
size_t memo_ones(size_t N) {
static std::deque<size_t> Memo(1, 0); // Memo[i] == ones(i)
for (size_t i = Memo.size(); i <= N; ++i) {
Memo.push_back(Memo[i-1] + __builtin_popcountll(i));
}
return Memo[N];
}
It's likely that this helps; however, the cost in terms of memory might be... crippling. Ugh. Imagine that for computing ones(1,000,000) we will occupy 8MB of memory on a 64-bit computer! A sparser memoization could help (for example, only memoizing every 8th or 16th count):
#include <vector>
// count number of ones in (A, B]
static size_t unoptimized_count(size_t A, size_t B) {
size_t result = 0;
for (size_t i = A + 1; i <= B; ++i) {
result += __builtin_popcountll(i);
}
return result;
}
// something like this... be wary it's not tested.
// Memo[i] == ones(16*i)
size_t memo16_ones(size_t N) {
static std::vector<size_t> Memo(1, 0);
size_t const n16 = N - (N % 16);
for (size_t i = Memo.size(); i*16 <= n16; ++i) {
Memo.push_back(Memo[i-1] + unoptimized_count(16*(i-1), 16*i));
}
return Memo[n16/16] + unoptimized_count(n16, N);
}
However, while it does reduce the memory cost, it does not solve the main speed issue: we must still call __builtin_popcount at least B times! And for large values of B this is a killer.
The above solutions are mechanical; they did not require one ounce of thought. It turns out that interviews are not so much about writing code as they are about thinking.
Can we solve this problem more efficiently than dumbly enumerating all integers up to B?
Let's see what our brains (quite the amazing pattern machine) pick up when considering the first few entries:
N   bin   1s  ones(N)
0   0000  0   0
1   0001  1   1
2   0010  1   2
3   0011  2   4
4   0100  1   5
5   0101  2   7
6   0110  2   9
7   0111  3   12
8   1000  1   13
9   1001  2   15
10  1010  2   17
11  1011  3   20
12  1100  2   22
13  1101  3   25
14  1110  3   28
15  1111  4   32
Notice a pattern? I do ;) The range 8-15 is built exactly like 0-7 but with one more 1 per line (8 + k simply sets the 8s bit of k) => it's like a transposition. And it's quite logical too, isn't it?
Therefore, ones(15) - ones(7) = 8 + ones(7), ones(7) - ones(3) = 4 + ones(3) and ones(1) - ones(0) = 1 + ones(0).
Well, let's make this a formula:
Reminder: ones(N) = popcount(N) + ones(N-1) (almost) by definition
We now know that ones(2**n - 1) - ones(2**(n-1) - 1) = 2**(n-1) + ones(2**(n-1) - 1)
Let's isolate ones(2**n), which is easier to deal with; note that popcount(2**n) = 1:
regroup: ones(2**n - 1) = 2**(n-1) + 2*ones(2**(n-1) - 1)
use the definition: ones(2**n) - 1 = 2**(n-1) + 2*ones(2**(n-1)) - 2
simplify: ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1)), with ones(1) = 1.
Quick sanity check:
1 = 2**0 => 1 (bottom)
2 = 2**1 => 2 = 2**0 - 1 + 2 * ones(1)
4 = 2**2 => 5 = 2**1 - 1 + 2 * ones(2)
8 = 2**3 => 13 = 2**2 - 1 + 2 * ones(4)
16 = 2**4 => 33 = 2**3 - 1 + 2 * ones(8)
Looks like it works!
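The sanity check above can also be automated. The sketch below (hypothetical names brute_ones and ones_pow2, using the GCC/Clang builtin __builtin_popcountll in the same spirit as the code above) compares the recurrence ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1)) against brute-force counting.

```cpp
#include <cassert>
#include <cstdint>

// Brute force: total number of 1-bits in all integers 0..N inclusive.
std::uint64_t brute_ones(std::uint64_t N) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 0; k <= N; ++k) {
        total += __builtin_popcountll(k);
    }
    return total;
}

// ones(2**n) from the recurrence ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1)),
// starting at ones(2**0) = ones(1) = 1.
std::uint64_t ones_pow2(unsigned n) {
    assert(n < 58);  // larger n would overflow 64 bits
    std::uint64_t value = 1;
    for (unsigned i = 1; i <= n; ++i) {
        value = (std::uint64_t{1} << (i - 1)) - 1 + 2 * value;
    }
    return value;
}
```

Looping n from 0 to 16 and asserting ones_pow2(n) == brute_ones(1 << n) covers far more cases than the hand check.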
We are not quite done though. A and B might not necessarily be powers of 2, and if we have to count all the way from 2**n to 2**n + 2**(n-1) that's still O(N)!
On the other hand, if we express a number in base 2, we should be able to leverage our newly acquired formula. The main advantage is that there are only log2(N) bits in the representation.
Let's pick an example and understand how it works: 13 = 8 + 4 + 1
1 -> 0001
4 -> 0100
8 -> 1000
13 -> 1101
... however, the count is not merely the sum:
ones(13) != ones(8) + ones(4) + ones(1)
Let's express it in terms of the "transposition" strategy instead:
ones(13) - ones(8) = ones(5) + (13 - 8)
ones(5) - ones(4) = ones(1) + (5 - 4)
Okay, easy to do with a bit of recursion.
#include <climits>
#include <cstddef>
#include <iostream>
// store ones(2**n) at P2Count[n]
// (with a 64-bit size_t the counts are exact up to about n = 59 and wrap beyond that)
static size_t P2Count[64] = {};
// floor(log2(n)); a floating-point version such as log(double(n)) / log(2)
// might lose precision for large n.
// __builtin_clzll returns the number of leading 0 bits (undefined for 0).
static size_t floor_log2(size_t n) {
if (n == 0) { return 0; }
return sizeof(unsigned long long) * CHAR_BIT - __builtin_clzll(n) - 1;
}
static size_t ones(size_t n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
size_t const lg2 = floor_log2(n);
size_t const p2 = size_t(1) << lg2; // largest power of 2 <= n
if (p2 == n) { return P2Count[lg2]; }
// "transposition": p2+1 .. n look like 1 .. n-p2 with one extra 1-bit each
return P2Count[lg2] + ones(n - p2) + (n - p2);
} // ones
// reminder: ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1))
void initP2Count() {
P2Count[0] = 1;
for (size_t i = 1; i != 64; ++i) {
P2Count[i] = (size_t(1) << (i-1)) - 1 + 2 * P2Count[i-1];
}
} // initP2Count
size_t count(size_t const A, size_t const B) {
if (A == 0) { return ones(B); }
return ones(B) - ones(A - 1);
} // count
And a demonstration:
int main() {
// Init table
initP2Count();
std::cout << "0: " << P2Count[0] << ", 1: " << P2Count[1] << ", 2: " << P2Count[2] << ", 3: " << P2Count[3] << "\n";
for (size_t i = 0; i != 16; ++i) {
std::cout << i << ": " << ones(i) << "\n";
}
std::cout << "count(7, 14): " << count(7, 14) << "\n";
}
Victory!
Note: as Daniel Fisher noted, this fails to account for negative numbers (but assuming two's complement, their count can be inferred from the count of positive numbers).
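To make that note concrete, here is a minimal sketch assuming 32-bit two's complement: a negative x satisfies popcount(x) = 32 - popcount(~x), and ~x = -x - 1 is non-negative, so a negative range maps onto a non-negative one. The names ones_upto and count32 are hypothetical, and the non-negative part uses a per-bit formula (the recursive ones() above would work as well).

```cpp
#include <cassert>
#include <cstdint>

// Total 1-bits over 0..n for 0 <= n < 2^31 (returns 0 for n < 0).
std::int64_t ones_upto(std::int64_t n) {
    if (n < 0) return 0;
    const std::int64_t count = n + 1;  // how many numbers 0..n there are
    std::int64_t total = 0;
    for (int i = 0; i < 32; ++i) {
        const std::int64_t half = std::int64_t{1} << i;
        const std::int64_t period = half << 1;
        total += (count / period) * half;
        const std::int64_t rem = count % period;
        if (rem > half) total += rem - half;
    }
    return total;
}

// Sum of 1-bits of the 32-bit two's-complement representations of A..B.
std::int64_t count32(std::int32_t A, std::int32_t B) {
    assert(A <= B);
    std::int64_t total = 0;
    if (A < 0) {
        // Negative part [A, min(B, -1)] maps to y = -x - 1 in [lo, hi] with y >= 0,
        // and each such x has 32 - popcount(y) one-bits.
        const std::int64_t negHigh = B < 0 ? B : -1;
        const std::int64_t lo = -negHigh - 1;
        const std::int64_t hi = -static_cast<std::int64_t>(A) - 1;
        total += 32 * (hi - lo + 1) - (ones_upto(hi) - ones_upto(lo - 1));
    }
    if (B >= 0) {
        const std::int64_t posLow = A > 0 ? A : 0;
        total += ones_upto(B) - ones_upto(posLow - 1);
    }
    return total;
}
```

For example, count32(-3, 4) adds 31 + 31 + 32 for -3, -2, -1 and 0 + 1 + 1 + 2 + 1 for 0..4, giving 99.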

Related

Number of steps to reduce a number in binary representation to 1

Given the binary representation of an integer as a string s, return the number of steps to reduce it to 1 under the following rules:
If the current number is even, you have to divide it by 2.
If the current number is odd, you have to add 1 to it.
It is guaranteed that you can always reach one for all test cases. For example, s = "1101" (13 in decimal) takes 6 steps:
Step 1) 13 is odd, add 1 and obtain 14.
Step 2) 14 is even, divide by 2 and obtain 7.
Step 3) 7 is odd, add 1 and obtain 8.
Step 4) 8 is even, divide by 2 and obtain 4.
Step 5) 4 is even, divide by 2 and obtain 2.
Step 6) 2 is even, divide by 2 and obtain 1.
My input = 1111011110000011100000110001011011110010111001010111110001
Expected output = 85
My output = 81
For the above input, the output is supposed to be 85, but my output is 81. For other test cases it seems to give the right answer. I have tried every debugging approach I can think of, but I am stuck.
#include <iostream>
#include <string.h>
#include <vector>
#include <bits/stdc++.h>
using namespace std;
int main()
{
string s =
"1111011110000011100000110001011011110010111001010111110001";
long int count = 0, size;
unsigned long long int dec = 0;
size = s.size();
// cout << s[size - 1] << endl;
for (int i = 0; i < size; i++)
{
// cout << pow(2, size - i - 1) << endl;
if (s[i] == '0')
continue;
// cout<<int(s[i])-48<<endl;
dec += (int(s[i]) - 48) * pow(2, size - 1 - i);
}
// cout << dec << endl;
// dec = 278675673186014705;
while (dec != 1)
{
if (dec % 2 == 0)
dec /= 2;
else
dec += 1;
count += 1;
}
cout << count;
return 0;
}
This line:
pow(2, size - 1 - i)
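can cause precision errors, because pow takes and returns doubles: dec += ... then converts dec to double, does the addition in double precision and converts the result back. A double has only a 53-bit significand, so once dec needs more bits (this input has 58), the low bits are lost.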
Luckily, for powers of 2 that fit in an unsigned long long, we can simply use a bit shift: 1ULL << x is equivalent to pow(2, x), but stays in integer arithmetic.
Replace that line with:
1ULL<<(size - 1 - i)
So that it should look like this (the inner parentheses matter, because * binds more tightly than <<):
dec += (int(s[i]) - 48) * (1ULL << (size - 1 - i));
And we will get the correct output of 85.
Note: as mentioned by @RSahu, you can remove (int(s[i]) - 48), since the case s[i] == '0' is already handled by the continue in the if statement above. Simply change the line to:
dec += 1ULL<<(size - 1 - i);
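To see the effect in isolation, here is a small sketch (hypothetical names parse_with_pow and parse_with_shift) that parses the same kind of string both ways. On a typical x86-64 build, where double arithmetic uses a 53-bit significand, the pow-based version drops the lowest 1-bit of a 58-bit value; the shift-based version stays exact up to 64 bits.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Parses a binary string the way the question does: because pow returns a
// double, each += is evaluated in double precision and converted back.
unsigned long long parse_with_pow(const std::string& s) {
    unsigned long long dec = 0;
    const int size = static_cast<int>(s.size());
    for (int i = 0; i < size; ++i) {
        if (s[i] == '1') {
            dec += std::pow(2, size - 1 - i);
        }
    }
    return dec;
}

// Same parse with integer shifts only: exact for strings of up to 64 bits.
unsigned long long parse_with_shift(const std::string& s) {
    assert(s.size() <= 64);
    unsigned long long dec = 0;
    for (char c : s) {
        dec = (dec << 1) | (c == '1' ? 1ULL : 0ULL);
    }
    return dec;
}
```

With s = "1" followed by 56 zeros and a final "1" (that is, 2^57 + 1), the last addition 2^57 + 1.0 rounds back to 2^57 in double, so parse_with_pow loses the final bit.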
The core problem has already been pointed out in the answer by @Ryan Zhang.
I want to offer some suggestions to improve your code and make it easier to debug.
The main function has two parts: the first part converts a string to a number, and the second part computes the number of steps to get the number to 1. I suggest creating two helper functions. That will allow you to debug each piece separately.
int main()
{
string s = "1111011110000011100000110001011011110010111001010111110001";
unsigned long long int dec = stringToNumber(s);
cout << "Number: " << dec << endl;
// dec = 278675673186014705;
int count = getStepsTo1(dec);
cout << "Steps to 1: " << count << endl;
return 0;
}
Iterate over the string from right to left using std::string::reverse_iterator. That will obviate the need for size and use of size - i - 1. You can just use i.
unsigned long long stringToNumber(string const& s)
{
size_t i = 0;
unsigned long long num = 0;
for (auto it = s.rbegin(); it != s.rend(); ++it, ++i )
{
if (*it != '0')
{
num += 1ULL << i;
}
}
return num;
}
Here's the other helper function.
int getStepsTo1(unsigned long long num)
{
long int count = 0;
while (num != 1 )
{
if (num % 2 == 0)
num /= 2;
else
num += 1;
count += 1;
}
return count;
}
Working demo: https://ideone.com/yerRfK.
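Both fixes still cap the input at 64 bits. If the string can be longer, one option (a sketch with the hypothetical name stepsTo1, not taken from either answer) is to never build the number at all: walk the bits from least significant to most significant and track the carry produced by the "+1" steps. An odd bit (after the carry) costs two steps, add 1 then divide, and an even one costs a single division.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Steps to reduce the binary number in s to 1 (odd: add 1, even: divide by 2),
// computed directly on the string, so any length works.
long long stepsTo1(const std::string& s) {
    assert(!s.empty() && s[0] == '1');
    long long steps = 0;
    int carry = 0;  // carry left over from earlier "+1" steps
    for (std::size_t i = s.size() - 1; i > 0; --i) {
        const int bit = (s[i] - '0') + carry;
        if (bit == 1) {   // odd: add 1 (which leaves a carry), then divide by 2
            steps += 2;
            carry = 1;
        } else {          // even (0, or 1 plus a carry): just divide by 2
            steps += 1;   // the carry stays as it was
        }
    }
    // A carry into the leading 1 turns it into "10", which needs one more halving.
    return steps + carry;
}
```

For "1101" this gives 2 + 2 + 1 + 1 = 6 steps, matching the worked example in the question.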

Convert a 74-bit integer to base 31

To generate a UFI number, I use a bitset of size 74. To perform step 2 of UFI generation, I need to convert this number:
9 444 732 987 799 592 368 290
(10000000000000000000000000000101000001000001010000011101011111100010100010)
into:
DFSTTM62QN6DTV1
by converting the first representation to base 31 and getting the equivalent chars from a table.
#define PAYLOAD_SIZE 74
// payload = binary of 9444732987799592368290
std::bitset<PAYLOAD_SIZE> bs_payload(payload);
/*
perform modulo 31 to obtain:
12(D), 14(F), 24(S), 25(T), 25, 19, 6, 2, 22, 20, 6, 12, 25, 27, 1
*/
Is there a way to perform the conversion on my bitset without using an external BigInteger library?
Edit: I finally wrote a BigInteger class, even though Cheers and hth. - Alf's solution works like a charm.
To get a number modulo 31, you just need to sum up its digits in base 32, just like how you calculate modulo 3 and 9 of a decimal number (every power of 32 leaves a remainder of 1 when divided by 31).
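#include <bitset>
unsigned mod31(std::bitset<74> b) {
unsigned mod = 0;
// sum the base-32 digits (5 bits at a time)
while (!b.none()) {
mod += (b & std::bitset<74>(0x1F)).to_ulong();
b >>= 5;
}
// fold again until the sum fits in a single base-32 digit
while (mod > 31)
mod = (mod >> 5) + (mod & 0x1F);
return mod == 31 ? 0 : mod; // 31 itself is 0 modulo 31
}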
You can speed up the modulo calculation by running the additions in parallel, as is done here. A similar technique can be used to calculate modulo 3, 5, 7, 15... and 2^31 - 1:
C - Algorithm for Bitwise operation on Modulus for number of not a power of 2
Is there any easy way to do modulus of 2^32 - 1 operation?
Logic to check the number is divisible by 3 or not?
However, since the question is actually about base conversion and not about modulo as the title says, you need to do a real division for this purpose. Notice that 1/b is 0.(1) in base b + 1, so we have
$1/31 = 0.000010000100001000010000100001\ldots_2 = 0.(00001)_2 = 0.(1)_{32}$
and then N/31 can be calculated like this
$N/31 = N \times 2^{-5} + N \times 2^{-10} + N \times 2^{-15} + \cdots$
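typedef unsigned __int128 uint128_t; // GCC/Clang extension; not a standard type
// approximate x/31: each shift drops low bits, so the result can come out too small
uint128_t approx_div31(uint128_t x) {
uint128_t result = 0;
while (x)
{
x >>= 5;
result += x;
}
return result;
}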
Since both modulo and division use shift-by-5, you can also do both together in a single loop.
However, the tricky part here is how to round the quotient properly. The above method works for most values, except some between a multiple of 31 and the next power of 2. I've found a way to correct the result for values up to a few thousand, but have yet to find a generic way for all values.
You can see the same shift-and-add method being used to divide by 10 and by 3. There are more examples, with proper rounding, in the famous Hacker's Delight. I didn't have enough time to read through the book to understand how they implement the result correction part, so maybe I'll get back to this later. If anyone has an idea how to do that, I'd be grateful.
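One possible correction (not necessarily the one Hacker's Delight uses), sketched under the assumption that the compiler provides unsigned __int128 as GCC and Clang do: because every term x >> 5k is rounded down, the shift-and-add sum never exceeds the true quotient. So it is enough to compute the remainder x - 31*q and bump q while that remainder is still 31 or more. The names div31 and DivMod31 are hypothetical.

```cpp
#include <cassert>

typedef unsigned __int128 u128;  // GCC/Clang extension

struct DivMod31 {
    u128 quotient;
    unsigned remainder;
};

// Exact x / 31 and x % 31: shift-and-add estimate plus a correction loop.
DivMod31 div31(u128 x) {
    u128 q = 0;
    for (u128 t = x >> 5; t != 0; t >>= 5) {
        q += t;  // adds floor(x / 32^k) for k = 1, 2, ...
    }
    // q <= floor(x / 31), so 31 * q <= x and the remainder is non-negative.
    u128 r = x - 31 * q;
    // Each shift term loses less than 1, so this runs at most a few dozen times.
    while (r >= 31) {
        ++q;
        r -= 31;
    }
    return DivMod31{q, static_cast<unsigned>(r)};
}
```

The correction uses only subtraction and comparison, so it keeps the spirit of avoiding a hardware division; whether it is faster than the compiler's own x / 31 would have to be measured.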
One suggestion is to do the division in fixed-point: shift the value left first so that there are enough fractional bits to round later.
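uint128_t result = 0; // uint128_t: unsigned __int128, as above
const unsigned num_fraction = 125 - 75; // 125 and 75 are the nearest multiples of 5
// or maybe 128 - 74 will also work
uint128_t x = UFI_Number << num_fraction; // UFI_Number: the 74-bit payload held in a uint128_t
while (x)
{
x >>= 5;
result += x;
}
// shift the result back and add the fractional bit to round
// (this rounds to nearest; a digit-by-digit base conversion needs the floored quotient)
result = (result >> num_fraction) + ((result >> (num_fraction - 1)) & 1);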
Note that your result above is incorrect. I've confirmed the result is CEOPPJ62MK6CPR1 from both Yaniv Shaked's answer and Wolfram Alpha, unless you use different symbols for the digits.
This code seems to work. To guarantee the result I think you need to do additional testing. E.g. first with small numbers where you can compute the result directly.
Edit: Oh, now I noticed you posted the required result digits, and they match. That means it's generally good, but it is still not tested for corner cases.
#include <assert.h>
#include <algorithm> // std::reverse
#include <bitset>
#include <vector>
#include <iostream>
using namespace std;
template< class Type > using ref_ = Type&;
namespace base31
{
void mul2( ref_<vector<int>> digits )
{
int carry = 0;
for( ref_<int> d : digits )
{
const int local_sum = 2*d + carry;
d = local_sum % 31;
carry = local_sum / 31;
}
if( carry != 0 )
{
digits.push_back( carry );
}
}
void add1( ref_<vector<int>> digits )
{
int carry = 1;
for( ref_<int> d : digits )
{
const int local_sum = d + carry;
d = local_sum % 31;
carry = local_sum / 31;
}
if( carry != 0 )
{
digits.push_back( carry );
}
}
void divmod2( ref_<vector<int>> digits, ref_<int> mod )
{
int carry = 0;
for( int i = int( digits.size() ) - 1; i >= 0; --i )
{
ref_<int> d = digits[i];
const int divisor = d + 31*carry;
carry = divisor % 2;
d = divisor/2;
}
mod = carry;
if( digits.size() > 0 and digits.back() == 0 )
{
digits.resize( digits.size() - 1 );
}
}
}
int main() {
bitset<74> bits(
"10000000000000000000000000000101000001000001010000011101011111100010100010"
);
vector<int> reversed_binary;
for( const char ch : bits.to_string() ) { reversed_binary.push_back( ch - '0' ); }
vector<int> base31;
for( const int bit : reversed_binary )
{
base31::mul2( base31 );
if( bit != 0 )
{
base31::add1( base31 );
}
}
{ // Check the conversion to base31 by converting back to base 2, roundtrip:
vector<int> temp31 = base31;
int mod;
vector<int> base2;
while( temp31.size() > 0 )
{
base31::divmod2( temp31, mod );
base2.push_back( mod );
}
reverse( base2.begin(), base2.end() );
cout << "Original : " << bits.to_string() << endl;
cout << "Reconstituted: ";
string s;
for( const int bit : base2 ) { s += bit + '0'; cout << bit; }; cout << endl;
assert( s == bits.to_string() );
}
cout << "Base 31 digits (msd to lsd order): ";
for( int i = int( base31.size() ) - 1; i >= 0; --i )
{
cout << base31[i] << ' ';
}
cout << endl;
cout << "Mod 31 = " << base31[0] << endl;
}
Results with MinGW g++:
Original : 10000000000000000000000000000101000001000001010000011101011111100010100010
Reconstituted: 10000000000000000000000000000101000001000001010000011101011111100010100010
Base 31 digits (msd to lsd order): 12 14 24 25 25 19 6 2 22 20 6 12 25 27 1
Mod 31 = 1
I did not compile this pseudocode, but it should give you a general understanding of how to convert the number:
// Array for conversion of value to base-31 characters:
char base31Characters[] =
{
'0',
'1',
'2',
...
'X',
'Y'
};
void printUFINumber(unsigned __int128 number)
{
std::string result = "";
while (number != 0)
{
int mod = static_cast<int>(number % 31);
result = base31Characters[mod] + result;
number = number / 31;
}
std::cout << result; // print the digits, not the (now zero) number
}
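A compilable variant of the same loop, assuming GCC or Clang for unsigned __int128 (a 74-bit payload fits easily in 128 bits). The digit table is an assumption: it uses the conventional 0-9 followed by A-U, because the UFI alphabet is only partly shown above, so its output takes the CEOPPJ62MK6CPR1 form rather than the question's DFSTTM62QN6DTV1. parseBinary and toBase31 are hypothetical names.

```cpp
#include <cassert>
#include <string>

typedef unsigned __int128 u128;  // GCC/Clang extension

// Parses a string of '0'/'1' characters, most significant bit first.
u128 parseBinary(const std::string& bits) {
    assert(bits.size() <= 128);
    u128 value = 0;
    for (char c : bits) {
        value = (value << 1) | (c == '1' ? 1 : 0);
    }
    return value;
}

// Converts to base 31, most significant digit first. The digit set (0-9, A-U)
// is an assumption; the real UFI alphabet would replace this table.
std::string toBase31(u128 number) {
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTU";
    if (number == 0) return "0";
    std::string result;
    while (number != 0) {
        result.insert(result.begin(), digits[static_cast<unsigned>(number % 31)]);
        number /= 31;
    }
    return result;
}
```

Applied to the 74-bit payload from the question, this yields the digit values 12 14 24 25 25 19 6 2 22 20 6 12 25 27 1 listed above, spelled with the assumed table.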

Algorithm for Combinations of given numbers with repetition? C++

So I have N numbers to input and M places for those numbers, and I need to find all combinations, with repetition, of the given numbers.
Here is an example:
Let's say that N is 3 (I have to input 3 numbers) and M is 4.
For example let's input numbers: 6 11 and 533.
This should be result
6,6,6,6
6,6,6,11
6,6,6,533
6,6,11,6
...
533,533,533,533
I know how to do that manually when I know N and M in advance:
In example where N is 3 and M is 4:
int main()
{
int N = 3;
int M = 4;
int *numbers = new int[N + 1];
for (int i = 0; i < N; i++)
cin >> numbers[i];
for (int a = 0; a < N; a++)
for (int b = 0; b < N; b++)
for (int c = 0; c < N; c++)
for (int d = 0; d < N; d++)
{
cout << numbers[a] << " " << numbers[b] << " " << numbers[c] << " " << numbers[d] << endl;
}
return 0;
}
But how can I make an algorithm so that I can enter N and M via std::cin and get the correct result?
Thanks.
First, one short tip: don't use "new" or C-style arrays in C++ when RAII containers such as std::vector are available.
For the solution to your problem, I would suggest making a separate recursive function. You said you know how to do it manually, so the first step in turning that into an algorithm is to break your manual solution down step by step. When you solve this problem by hand, you basically start with an array of all first numbers, and for the last position you loop through the available numbers. Then you go to the second-to-last position and again loop through the available numbers, except that for every number there you must also repeat the loop for the last position. That is the recursion: for every nth position you loop through the available numbers, and for each one you call the same function for the (n+1)th position.
Here is a simplified solution, leaving out the input handling and exact print to keep code shorter and more focused on the problem:
#include <vector>
#include <iostream>
void printCombinations(const std::vector<int>& numbers, unsigned size, std::vector<int>& line) {
for (unsigned i = 0; i < numbers.size(); i++) {
line.push_back(numbers[i]);
if (size <= 1) { // Condition that prevents infinite loop in recursion
for (const auto& j : line)
std::cout << j << ","; // Simplified print to keep code shorter
std::cout << std::endl;
line.erase(line.end() - 1);
} else {
printCombinations(numbers, size - 1, line); // Recursion happens here
line.erase(line.end() - 1);
}
}
}
int main() {
std::vector<int> numbers = {6, 11, 533};
unsigned size = 4;
std::vector<int> line;
printCombinations(numbers, size, line);
return 0;
}
If you have any questions feel free to ask.
There is no need for recursion here at all. This is a typical job for dynamic programming. Just get the first solution right for n = 1 (one slot available), which is [[6],[11],[533]], and then move on one slot at a time by relying on the previously memoized solution.
Sorry, I am not fluent in C, but here is the solution in JS. I hope it helps.
function combosOfN(a,n){
var res = {};
for(var i = 1; i <= n; i++) res[i] = res[i-1] ? res[i-1].reduce((r,e) => r.concat(a.map(n => e.concat(n))),[])
: a.map(e => [e]);
return res[n];
}
var arr = [6,11,533],
n = 4;
console.log(JSON.stringify(combosOfN(arr,n)));
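A C++ sketch of the same iterative, slot-by-slot construction (the function name sequences is hypothetical): each round extends every sequence built so far by each input number, so no recursion is needed. Like the JS version, it keeps all N^M sequences in memory, which is only practical for small N and M.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Builds every length-m sequence over `values`: the list for k+1 slots is
// each k-slot sequence extended by each value, in input order.
std::vector<std::vector<int>> sequences(const std::vector<int>& values, unsigned m) {
    std::vector<std::vector<int>> current(1);  // 0 slots: a single empty sequence
    for (unsigned slot = 0; slot < m; ++slot) {
        std::vector<std::vector<int>> next;
        next.reserve(current.size() * values.size());
        for (const auto& prefix : current) {
            for (int v : values) {
                std::vector<int> extended = prefix;
                extended.push_back(v);
                next.push_back(std::move(extended));
            }
        }
        current = std::move(next);
    }
    return current;
}
```

For {6, 11, 533} and 4 slots this produces the 81 sequences from 6,6,6,6 to 533,533,533,533 in the order shown in the question.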
Normally the easiest way to do dynamic nested for loops is to create your own stack and use recursion.
#include <iostream>
#include <vector>
void printCombinations(int sampleCount, const std::vector<int>& options, std::vector<int>& numbersToPrint) {
if (numbersToPrint.size() == sampleCount) {
// got all the numbers we need, print them.
for (int number : numbersToPrint) {
std::cout << number << " ";
}
std::cout << "\n";
}
else {
// Add a new number, iterate over all possibilities
numbersToPrint.push_back(0);
for (int number : options) {
numbersToPrint.back() = number;
printCombinations(sampleCount, options, numbersToPrint);
}
numbersToPrint.pop_back();
}
}
void printCombinations(int sampleCount, const std::vector<int>& options) {
std::vector<int> stack;
printCombinations(sampleCount, options, stack);
}
int main()
{
printCombinations(3, {1,2,3});
}
output
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
1 3 1
1 3 2
1 3 3
2 1 1
2 1 2
2 1 3
2 2 1
2 2 2
2 2 3
2 3 1
2 3 2
2 3 3
3 1 1
3 1 2
3 1 3
3 2 1
3 2 2
3 2 3
3 3 1
3 3 2
3 3 3
Here is an algorithm to solve this that doesn't use recursion.
Let's say n=2 and m=3. Consider the following sequence that corresponds to these values:
000
001
010
011
100
101
110
111
The meaning of this is that when you see a 0 you take the first number, and when you see a 1 you take the second number. So given the input numbers [5, 7], then 000 = 555, 001=557, 010=575 etc.
The sequence above looks identical to representing numbers from 0 to 7 in base 2. Basically, if you go from 0 to 7 and represent the numbers in base 2, you have the sequence above.
If you take n=3, m=4 then you need to work in base 3:
0000
0001
0002
0010
0011
0012
....
So you go over all the numbers from 0 to 80 (3^4 - 1), represent each of them in base 3 with 4 digits, and follow the coding: 0 = first number, 1 = second number, 2 = third number.
For the general case, you go from 0 to N^M - 1, represent each number in base N with M digits, and apply the coding 0 = first number, etc.
Here is some sample code:
#include <stdio.h>
void convert_to_base(int number, char result[], int base, int number_of_digits) {
for (int i = number_of_digits - 1; i >= 0; i--) {
int remainder = number % base;
number = number / base;
result[i] = '0' + remainder;
}
result[number_of_digits] = '\0'; // terminate the string for printf
}
int main() {
int n = 2, m = 3;
int num = 1; // n^m, computed with integers instead of pow()
for (int k = 0; k < m; k++) num *= n;
for (int i = 0; i < num; i++) {
char str[33];
convert_to_base(i, str, n, m);
printf("%s\n", str);
}
return 0;
}
Output:
000
001
010
011
100
101
110
111
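The same counting can be done without converting each counter value to base N: keep the base-N digits in an array and increment it like an odometer, mapping each digit straight to the corresponding input number. A C++ sketch with the hypothetical name forEachSequence, which also avoids the fixed char buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Calls visit(seq) for every length-m sequence over `values`, in the order of
// counting from 0 to N^m - 1 in base N (N = values.size()). Instead of
// converting a counter, it increments a digit array like an odometer.
template <class Visit>
void forEachSequence(const std::vector<int>& values, std::size_t m, Visit visit) {
    const std::size_t n = values.size();
    if (n == 0) return;
    std::vector<std::size_t> digit(m, 0);  // base-n digits, most significant first
    std::vector<int> seq(m, values[0]);    // seq[k] == values[digit[k]]
    for (;;) {
        visit(seq);
        std::size_t pos = m;
        for (;;) {
            if (pos == 0) return;         // carried past the top digit: all done
            --pos;
            if (++digit[pos] < n) {       // no carry: update one slot, next sequence
                seq[pos] = values[digit[pos]];
                break;
            }
            digit[pos] = 0;               // wrap this digit and carry to the left
            seq[pos] = values[0];
        }
    }
}
```

With values {5, 7} and m = 3 it visits 555, 557, 575, ..., 777, matching the 000 to 111 listing above.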

Finding Hamming Numbers - not code or distance

I'm currently learning C++.
I am looking for Hamming numbers (numbers whose prime divisors are all less than or equal to 5).
When I input a number n, the program should output the n-th Hamming number.
Following numbers are input, and output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 ...
Finding Hamming numbers looks easy, but increasing the input number increases run time cost exponentially.
If I input over 1000, it takes almost 1 second,
and over 1200, it takes almost 5 seconds.
This is the code I wrote:
while (th > 1)
{
h++;
x = h;
while (x % 2 == 0)
x /= 2;
while (x % 3 == 0)
x /= 3;
while (x % 5 == 0)
x /= 5;
if (x == 1)
th--;
}
So I would like to know how I can find the answer faster.
This algorithm doesn't seem to be very good.
Thanks in advance.
Your code is good if you want to check whether one particular number is a hamming number. When you want to build a list of hamming numbers, it is inefficient.
You can use a bottom-up approach: Start with 1 and then recursively multiply that with 2, 3, and 5 to get all hamming numbers up to a certain limit. You have to take care of duplicates, because you can get to 6 by way of 2·3 and 3·2. A set can take care of that.
The code below will generate all hamming numbers that fit into a 32-bit unsigned int. It fills a set by "spreading" to all hamming numbers. Then it constructs a sorted vector from the set, which you can use to find a hamming number at a certain index:
#include <iostream>
#include <algorithm>
#include <set>
#include <vector>
typedef unsigned int uint;
const uint umax = 0xffffffff;
void spread(std::set<uint> &hamming, uint n)
{
if (hamming.find(n) == hamming.end()) {
hamming.insert(n);
if (n < umax / 2) spread(hamming, n * 2);
if (n < umax / 3) spread(hamming, n * 3);
if (n < umax / 5) spread(hamming, n * 5);
}
}
int main()
{
std::set<uint> hamming;
spread(hamming, 1);
std::vector<uint> ordered(hamming.begin(), hamming.end());
for (size_t i = 0; i < ordered.size(); i++) {
std::cout << i << ' ' << ordered[i] << '\n';
}
return 0;
}
This code is faster than your linear method even if you end up creating more hamming numbers than you need.
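A variant of the same bottom-up idea produces the numbers in increasing order, so it can stop at the n-th one instead of generating everything: repeatedly take the smallest number in the set and insert its multiples by 2, 3 and 5 (the set again removes duplicates such as 2·3 and 3·2). This is a sketch with the hypothetical name nthHamming, using unsigned long long so that the multiples stay in range for n in the low thousands.

```cpp
#include <cassert>
#include <set>

// Returns the n-th Hamming number (1-based): the set acts as an ordered,
// duplicate-free queue of candidates, and its smallest element is always next.
unsigned long long nthHamming(unsigned n) {
    assert(n >= 1);
    std::set<unsigned long long> candidates{1};
    for (unsigned i = 1;; ++i) {
        const unsigned long long h = *candidates.begin();
        candidates.erase(candidates.begin());
        if (i == n) return h;
        candidates.insert(h * 2);
        candidates.insert(h * 3);
        candidates.insert(h * 5);
    }
}
```

Each step costs O(log n) set operations, so finding the 1500th number takes a few thousand insertions rather than scanning hundreds of millions of integers.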
You don't even need a set if you make sure that you don't construct a number twice. Every hamming number can be written as h = 2^n2 * 3^n3 * 5^n5, so if you find a way to iterate through these exponent combinations uniquely, you're done:
#include <iostream>
#include <algorithm>
#include <set>
#include <vector>
typedef unsigned int uint;
int main()
{
const uint umax = 0xffffffff;
std::vector<uint> hamming;
for (uint k = 1;; k *= 2) {
for (uint l = k;; l *= 3) {
for (uint m = l;; m *= 5) {
hamming.push_back(m);
if (m > umax / 5) break;
}
if (l > umax / 3) break;
}
if (k > umax / 2) break;
}
std::sort(hamming.begin(), hamming.end());
for (size_t i = 0; i < hamming.size(); i++) {
std::cout << i << ' ' << hamming[i] << '\n';
}
return 0;
}
The strange break syntax for the loops is required because we have to check the size before the overflow. If umax*5 were guaranteed not to overflow, these conditions could be written in the condition part of the loop.
The code examples in the Rosetta Code link Koshinae posted use similar strategies, but I'm surprised how lengthy some of them are.
In this link you can find two different solutions for finding the nth hamming number. The second method is the optimized one which can get the result in a few seconds.
#include <algorithm>
#include <vector>
/* Function to get the nth ugly number (n >= 1) */
unsigned getNthUglyNo(unsigned n)
{
std::vector<unsigned> ugly(n); // To store ugly numbers (a standard container instead of a VLA)
unsigned i2 = 0, i3 = 0, i5 = 0;
unsigned next_multiple_of_2 = 2;
unsigned next_multiple_of_3 = 3;
unsigned next_multiple_of_5 = 5;
unsigned next_ugly_no = 1;
ugly[0] = 1;
for (unsigned i=1; i<n; i++)
{
next_ugly_no = std::min(next_multiple_of_2,
std::min(next_multiple_of_3,
next_multiple_of_5));
ugly[i] = next_ugly_no;
if (next_ugly_no == next_multiple_of_2)
{
i2 = i2+1;
next_multiple_of_2 = ugly[i2]*2;
}
if (next_ugly_no == next_multiple_of_3)
{
i3 = i3+1;
next_multiple_of_3 = ugly[i3]*3;
}
if (next_ugly_no == next_multiple_of_5)
{
i5 = i5+1;
next_multiple_of_5 = ugly[i5]*5;
}
} /*End of for loop (i=1; i<n; i++) */
return next_ugly_no;
}

Most efficient way to calculate lexicographic index

Can anybody find any potentially more efficient algorithms for accomplishing the following task?:
For any given permutation of the integers 0 thru 7, return the index which describes the permutation lexicographically (indexed from 0, not 1).
For example,
The array 0 1 2 3 4 5 6 7 should return an index of 0.
The array 0 1 2 3 4 5 7 6 should return an index of 1.
The array 0 1 2 3 4 6 5 7 should return an index of 2.
The array 1 0 2 3 4 5 6 7 should return an index of 5040 (that's 7!, since all 7! permutations starting with 0 come before it).
The array 7 6 5 4 3 2 1 0 should return an index of 40319 (that's 8!-1). This is the maximum possible return value.
My current code looks like this:
int lexic_ix(int* A){
int value = 0;
for(int i=0 ; i<7 ; i++){
int x = A[i];
for(int j=0 ; j<i ; j++)
if(A[j]<A[i]) x--;
value += x*factorial(7-i); // actual unrolled version doesn't have a function call
}
return value;
}
I'm wondering if there's any way I can reduce the number of operations by removing that inner loop, or if I can reduce conditional branching in any way (other than unrolling - my current code is actually an unrolled version of the above), or if there are any clever bitwise hacks or filthy C tricks to help.
I already tried replacing
if(A[j]<A[i]) x--;
with
x -= (A[j]<A[i]);
and I also tried
x = A[j]<A[i] ? x-1 : x;
Both replacements actually led to worse performance.
And before anyone says it - YES this is a huge performance bottleneck: currently about 61% of the program's runtime is spent in this function, and NO, I don't want to have a table of precomputed values.
Aside from those, any suggestions are welcome.
I don't know if this helps, but here's another solution:
int lexic_ix(int* A, int n){ //n = last index = number of digits - 1
int value = 0;
int x = 0;
for(int i=0 ; i<n ; i++){
int diff = (A[i] - x); //pb1
if(diff > 0)
{
for(int j=0 ; j<i ; j++)//pb2
{
if(A[j]<A[i] && A[j] > x)
{
if(A[j]==x+1)
{
x++;
}
diff--;
}
}
value += diff;
}
else
{
x++;
}
value *= n - i;
}
return value;
}
I couldn't get rid of the inner loop, so the complexity is O(n²) in the worst case but O(n) in the best case, versus your solution, which is O(n²) in all cases.
Alternatively, you can replace the inner loop with the following to remove some worst cases, at the expense of another check in the inner loop:
int j=0;
while(diff>1 && j<i)
{
if(A[j]<A[i])
{
if(A[j]==x+1)
{
x++;
}
diff--;
}
j++;
}
Explanation :
(or rather "how I ended up with that code"; I think it is not that different from yours, but it may give you ideas)
(to reduce confusion I used characters instead of digits, and only four characters)
abcd 0 = ((0 * 3 + 0) * 2 + 0) * 1 + 0
abdc 1 = ((0 * 3 + 0) * 2 + 1) * 1 + 0
acbd 2 = ((0 * 3 + 1) * 2 + 0) * 1 + 0
acdb 3 = ((0 * 3 + 1) * 2 + 1) * 1 + 0
adbc 4 = ((0 * 3 + 2) * 2 + 0) * 1 + 0
adcb 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 //pb1
bacd 6 = ((1 * 3 + 0) * 2 + 0) * 1 + 0
badc 7 = ((1 * 3 + 0) * 2 + 1) * 1 + 0
bcad 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 //First reflexion
bcda 9 = ((1 * 3 + 1) * 2 + 1) * 1 + 0
bdac 10 = ((1 * 3 + 2) * 2 + 0) * 1 + 0
bdca 11 = ((1 * 3 + 2) * 2 + 1) * 1 + 0
cabd 12 = ((2 * 3 + 0) * 2 + 0) * 1 + 0
cadb 13 = ((2 * 3 + 0) * 2 + 1) * 1 + 0
cbad 14 = ((2 * 3 + 1) * 2 + 0) * 1 + 0
cbda 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 //pb2
cdab 16 = ((2 * 3 + 2) * 2 + 0) * 1 + 0
cdba 17 = ((2 * 3 + 2) * 2 + 1) * 1 + 0
[...]
dcba 23 = ((3 * 3 + 2) * 2 + 1) * 1 + 0
First "reflexion":
An entropy point of view: abcd has the least "entropy". If a character is in a place it "shouldn't" be, it creates entropy, and the earlier the misplaced character, the greater the entropy.
For bcad, for example, the lexicographic index is 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 and can be calculated this way:
value = 0;
value += max(b - a, 0); // = 1; (a "should be" in the first place [to create the less possible entropy] but instead it is b)
value *= 3 - 0; //last index - current index
value += max(c - b, 0); // = 1; (b "should be" in the second place but instead it is c)
value *= 3 - 1;
value += max(a - c, 0); // = 0; (a "should have been" put earlier, so it does not create entropy to put it there)
value *= 3 - 2;
value += max(d - d, 0); // = 0;
Note that the last operation always does nothing; that's why the loop condition is "i<n" (n being the last index).
First problem (pb1) :
For adcb, for example, the first logic doesn't work: it leads to a lexicographic index of ((0 * 3 + 2) * 2 + 0) * 1 = 4, because max(c - d, 0) = 0, yet putting c before b does create entropy. I added x because of that: it represents the first digit/character that isn't placed yet. With x, diff cannot be negative.
For adcb, the lexicographic index is 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 and can be calculated this way:
value = 0; x=0;
diff = a - a; // = 0; (a is in the right place)
diff == 0 => x++; //x=b now and we don't modify value
value *= 3 - 0; //last index - current index
diff = d - b; // = 2; (b "should be" there (it's x) but instead it is d)
diff > 0 => value += diff; //we add diff to value and we don't modify x
value *= 3 - 1;
diff = c - b; // = 1; (b "should be" there but instead it is c) This is where it differs from the first reflexion
diff > 0 => value += diff;
value *= 3 - 2;
Second problem (pb2):
For cbda, for example, the lexicographic index is 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0, but the first reflexion gives ((2 * 3 + 0) * 2 + 1) * 1 + 0 = 13, and the solution to pb1 gives ((2 * 3 + 1) * 2 + 3) * 1 + 0 = 17. The solution to pb1 doesn't work because the last two characters to place are d and a, so d - a "means" 1 instead of 3. I had to count the already-placed characters that come before the current character but after x, so I had to add an inner loop.
Putting it all together:
I then realised that pb1 was just a particular case of pb2, and that if you remove x and simply take diff = A[i], you end up with the unnested version of your solution (with the factorial calculated little by little, and my diff corresponding to your x).
So, basically, my "contribution" (I think) is the extra variable x, which lets you skip the inner loop when diff equals 0 or 1, at the expense of checking whether x has to be incremented (and incrementing it if so).
I also check inside the inner loop whether x has to be incremented (if(A[j]==x+1)). Take badce, for example: x will still be b after a is placed, because a comes after b, so you would enter the inner loop one more time when you encounter c. With the check in the inner loop, you have no choice but to run the inner loop when you encounter d, but x gets updated to c, so when you encounter c you do not enter the inner loop. You can remove this check without breaking the program.
With the alternative version and the optional check in the inner loop, that makes 4 different versions. The alternative version with the check is the one that enters the inner loop least often, so in terms of "theoretical complexity" it is the best, but in terms of actual performance/number of operations, I don't know.
I hope all of this helps (the question is rather old, and I didn't read all the answers in detail). If not, I still had fun doing it. Sorry for the long post. Also, I'm new to Stack Overflow (as a member) and not a native speaker, so please be nice, and don't hesitate to let me know if I did something wrong.
Linear traversal of memory that is already in cache really doesn't take much time at all. Don't worry about it. You won't traverse far enough before factorial() overflows.
Move the 8 out as a parameter.
#include <iostream>
int factorial ( int input )
{
return input ? input * factorial (input - 1) : 1;
}
int lexic_ix ( int* arr, int N )
{
int output = 0;
int fact = factorial (N);
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
output += order * (fact /= N - i);
}
return output;
}
int main()
{
int arr [ ] = { 11, 10, 9, 8, 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 };
const int length = 12;
for ( int i = 0; i < length; ++i )
std::cout << lexic_ix ( arr + i, length - i ) << std::endl;
}
Say, for an M-digit sequence permutation, from your code you can get the lexicographic SN formula, which is something like $A_{M-1} \cdot (M-1)! + A_{M-2} \cdot (M-2)! + \dots + A_0 \cdot 0!$, where each $A_j$ ranges from 0 to j. You can calculate SN starting from $A_0 \cdot 0!$, then $A_1 \cdot 1!$, ..., then $A_{M-1} \cdot (M-1)!$, and add these together (assuming your integer type does not overflow), so you do not need to calculate factorials recursively and repeatedly. SN ranges from 0 to M! - 1, because $\sum_{n=0}^{M-1} n \cdot n! = M! - 1$.
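The upper bound on SN follows from a telescoping sum. Here is a short derivation, assuming each coefficient $A_n$ ranges over $0, \dots, n$ as stated above, so the largest SN is reached when every $A_n = n$:

$$
n \cdot n! = (n + 1) \cdot n! - n! = (n + 1)! - n!
$$

$$
\sum_{n=0}^{M-1} n \cdot n! = \sum_{n=0}^{M-1} \big[ (n+1)! - n! \big] = M! - 0! = M! - 1
$$

So SN never exceeds M! - 1, which leaves exactly M! values, one per permutation.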
If you are not calculating factorials recursively, I cannot think of anything that could make any big improvement.
Sorry for posting the code a little late; I did some research and found this:
http://swortham.blogspot.com.au/2011/10/how-much-faster-is-multiplication-than.html
According to this author, integer multiplication can be 40 times faster than integer division. The difference is less dramatic for floating-point numbers, but here everything is pure integer arithmetic.
#include <iostream>
#include <memory> // std::unique_ptr; std::make_unique needs C++14
int lexic_ix ( int arr[], int N )
{
// if this function will be called repeatedly, consider passing this buffer in as a parameter
std::unique_ptr<int[]> coeff_arr = std::make_unique<int[]>(N);
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
coeff_arr[i] = order; // save this into coeff_arr for later multiplication
}
//
// There are 2 points about the following code:
// 1). most modern processors have a built-in multiplier,
//     and multiplication is much faster than division
// 2). In your code, you only test the maximum permutation serial number;
// if you put in a random sequence, say, when length is 10, you put in
// a random sequence, say, {3, 7, 2, 9, 0, 1, 5, 8, 4, 6}; if you look into
// the coeff_arr[] in debugger, you can see that coeff_arr[] is:
// {3, 6, 2, 6, 0, 0, 1, 2, 0, 0}, the last number will always be zero anyway.
// so, you will have good chance to reduce many multiplications.
// I did not do any performance profiling, you could have a go, and it will be
// much appreciated if you could give some feedback about the result.
//
long fac = 1;
long sn = 0;
for (int i = 1; i < N; ++i) // start from 1, because coeff_arr[N-1] is always 0
{
fac *= i;
if (coeff_arr[N - 1 - i])
sn += coeff_arr[N - 1 - i] * fac;
}
return sn;
}
int main()
{
int arr [ ] = { 3, 7, 2, 9, 0, 1, 5, 8, 4, 6 }; // try this and check coeff_arr
const int length = 10;
std::cout << lexic_ix(arr, length ) << std::endl;
return 0;
}
The whole profiling code is at the end of this answer. I only ran the test on Linux; the code was compiled with G++ 8.4 using the '-std=c++11 -O3' compiler options. To be fair, I slightly rewrote your code to pre-calculate N! and pass it into the function, but it seems this does not help much.
The performance profiling for N = 9 (362,880 permutations) is:
Time durations are: 34, 30, 25 milliseconds
Time durations are: 34, 30, 25 milliseconds
Time durations are: 33, 30, 25 milliseconds
The performance profiling for N = 10 (3,628,800 permutations) is:
Time durations are: 345, 335, 275 milliseconds
Time durations are: 348, 334, 275 milliseconds
Time durations are: 345, 335, 275 milliseconds
The first number is your original function, the second is your function rewritten to take N! as a parameter, and the last number is my version. The permutation generation function is very primitive and runs slowly, but that's alright as long as it generates all permutations as a test dataset. By the way, these tests were run on a quad-core 3.1 GHz desktop with 4 GB of RAM running Ubuntu 14.04.
EDIT: I forgot that the first timed function may need to grow the lexi_numbers vector, so I added an untimed call before timing. After this, the times are 333, 334, 275.
EDIT: Another factor that could influence performance: I am using long integers in my code. If I change those 2 'long' to 'int', the running times become 334, 333, 264.
#include <iostream>
#include <vector>
#include <chrono>
using namespace std::chrono;
int factorial(int input)
{
return input ? input * factorial(input - 1) : 1;
}
int lexic_ix(int* arr, int N)
{
int output = 0;
int fact = factorial(N);
for (int i = 0; i < N - 1; i++)
{
int order = arr[i];
for (int j = 0; j < i; j++)
order -= arr[j] < arr[i];
output += order * (fact /= N - i);
}
return output;
}
int lexic_ix1(int* arr, int N, int N_fac)
{
int output = 0;
int fact = N_fac;
for (int i = 0; i < N - 1; i++)
{
int order = arr[i];
for (int j = 0; j < i; j++)
order -= arr[j] < arr[i];
output += order * (fact /= N - i);
}
return output;
}
int lexic_ix2( int arr[], int N , int coeff_arr[])
{
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
coeff_arr[i] = order;
}
long fac = 1;
long sn = 0;
for (int i = 1; i < N; ++i)
{
fac *= i;
if (coeff_arr[N - 1 - i])
sn += coeff_arr[N - 1 - i] * fac;
}
return sn;
}
std::vector<std::vector<int>> gen_permutation(const std::vector<int>& permu_base)
{
if (permu_base.size() == 1)
return std::vector<std::vector<int>>(1, std::vector<int>(1, permu_base[0]));
std::vector<std::vector<int>> results;
for (int i = 0; i < permu_base.size(); ++i)
{
int cur_int = permu_base[i];
std::vector<int> cur_subseq = permu_base;
cur_subseq.erase(cur_subseq.begin() + i);
std::vector<std::vector<int>> temp = gen_permutation(cur_subseq);
for (auto x : temp)
{
x.insert(x.begin(), cur_int);
results.push_back(x);
}
}
return results;
}
int main()
{
#define N 10
std::vector<int> arr;
int buff_arr[N];
const int length = N;
int N_fac = factorial(N);
for(int i=0; i<N; ++i)
arr.push_back(N-i-1); // for N=10, arr is {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
std::vector<std::vector<int>> all_permus = gen_permutation(arr);
std::vector<int> lexi_numbers;
// This call is not timed, only to expand the lexi_numbers vector
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
lexi_numbers.clear();
auto t0 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix(&x[0], length));
auto t1 = high_resolution_clock::now();
lexi_numbers.clear();
auto t2 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix1(&x[0], length, N_fac));
auto t3 = high_resolution_clock::now();
lexi_numbers.clear();
auto t4 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
auto t5 = high_resolution_clock::now();
std::cout << std::endl << "Time durations are: "
<< duration_cast<milliseconds>(t1 - t0).count() << ", "
<< duration_cast<milliseconds>(t3 - t2).count() << ", "
<< duration_cast<milliseconds>(t5 - t4).count() << " milliseconds" << std::endl;
return 0;
}