Random choices of two values - c++

In my algorithm I have two values that I need to choose at random but each one has to be chosen a predetermined number of times.
So far my solution is to put the choices into a vector the correct number of times and then shuffle it. In C++:
// Example choices (can be any positive int)
int choice1 = 3;
int choice2 = 4;
int number_of_choice1s = 5;
int number_of_choice2s = 1;
std::vector<int> choices;
for(int i = 0; i < number_of_choice1s; ++i) choices.push_back(choice1);
for(int i = 0; i < number_of_choice2s; ++i) choices.push_back(choice2);
std::random_shuffle(choices.begin(), choices.end());
Then I keep an iterator to choices and whenever I need a new one I increase the iterator and grab that value.
This works but it seems like there might be a more efficient way. Since I always know how many of each value I'll use I'm wondering if there is a more algorithmic way to go about doing this, rather than just storing the values.

You are using more memory than necessary. You have two variables:
int number_of_choice1s = 5;
int number_of_choice2s = 1;
Now simply randomize:
int result = rand() % (number_of_choice1s + number_of_choice2s);
if(result < number_of_choice1s) {
--number_of_choice1s;
return choice1;
} else {
--number_of_choice2s;
return choice2;
}
This scales very well to millions of random invocations.
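For reference, here is a minimal self-contained sketch of the same counting idea. The TwoValuePicker name and its fields are made up for illustration, and it assumes a C++11 <random> engine is acceptable; it swaps rand() % n for std::uniform_int_distribution, because the modulo can be slightly biased when the range of rand() is not a multiple of n.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not from the answer above): hands out value_a exactly
// count_a times and value_b exactly count_b times, in random order, without
// storing the sequence anywhere.
struct TwoValuePicker {
    int value_a, value_b;
    int count_a, count_b;   // draws still owed for each value

    template <class Rng>
    int next(Rng& rng) {
        assert(count_a + count_b > 0 && "no draws left");
        // Pick one of the remaining draws uniformly; the first count_a
        // slots stand for value_a, the rest for value_b.
        std::uniform_int_distribution<int> dist(0, count_a + count_b - 1);
        if (dist(rng) < count_a) {
            --count_a;
            return value_a;
        }
        --count_b;
        return value_b;
    }
};
```

Typical use would be something like TwoValuePicker picker{3, 4, 5, 1}; followed by picker.next(rng) each time a value is needed, with rng being e.g. a std::mt19937.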

You could write this a bit more simply:
std::vector<int> choices(number_of_choice1s, choice1);
choices.resize(number_of_choice1s + number_of_choice2s, choice2);
std::random_shuffle(choices.begin(), choices.end());
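One caveat if you compile as C++17 or later: std::random_shuffle was deprecated in C++14 and removed in C++17. A sketch of the same construction with std::shuffle follows; the function name make_shuffled_choices is only illustrative, and the caller is assumed to own and seed the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Illustrative wrapper: builds n1 copies of choice1 followed by n2 copies of
// choice2, then shuffles them with the caller's random engine.
std::vector<int> make_shuffled_choices(int choice1, int n1,
                                       int choice2, int n2,
                                       std::mt19937& rng) {
    assert(n1 >= 0 && n2 >= 0);
    std::vector<int> choices(n1, choice1);
    choices.resize(n1 + n2, choice2);
    std::shuffle(choices.begin(), choices.end(), rng);
    return choices;
}
```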

A biased random distribution will keep some kind of order over the resulting set (the choice that has been picked most often has a smaller and smaller chance of being picked next), which gives a biased result. This shows especially when the first value has to be picked many more times than the second: you'll end up with something like {1,1,1,2,1,1,1,1,2}.
Here's the code, which looks a lot like the one written by @Tomasz Nurkiewicz, but uses a simple even/odd test, which should give about a 50/50 chance of picking either value.
int result = rand();
if ( result & 1 && number_of_choice1s > 0)
{
number_of_choice1s--;
return choice1;
}else if (number_of_choice2s>0)
{
number_of_choice2s--;
return choice2;
}
else
{
return -1;
}

Related

Writing two versions of a function, one for "clarity" and one for "speed"

My professor assigned homework to write a function that takes in an array of integers and sorts all zeros to the end of the array while maintaining the current order of non-zero ints. The constraints are:
Cannot use the STL or other templated containers.
Must have two solutions: one that emphasizes speed and another that emphasizes clarity.
I wrote up this function attempting for speed:
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;
void sortArray(int array[], int size)
{
int i = 0;
int j = 1;
int n = 0;
for (i = j; i < size;)
{
if (array[i] == 0)
{
n++;
i++;
}
else if (array[i] != 0 && j != i)
{
array[j++] = array[i++];
}
else
{
i++;
n++;
}
}
while (j < size)
{
array[j++] = 0;
}
}
int main()
{
//Example 1
int array[]{20, 0, 0, 3, 14, 0, 5, 11, 0, 0};
int size = sizeof(array) / sizeof(array[0]);
sortArray(array, size);
cout << "Result :\n";
for (int i = 0; i < size; i++)
{
cout << array[i] << " ";
}
cout << endl << "Press any key to exit...";
cin.get();
return 0;
}
It outputs correctly, but:
I don't know what the speed of it actually is, can anyone help me figure out how to calculate that?
I have no idea how to go about writing a function for "clarity"; any ideas?
In my experience, unless you have a very complicated algorithm, speed and clarity come together:
void sortArray(int array[], int size)
{
    int item;
    int dst = 0;
    int src = 0;
    // collect all non-zero elements
    while (src < size) {
        if ((item = array[src++]) != 0) {
            array[dst++] = item;
        }
    }
    // fill the rest with zeroes
    while (dst < size) {
        array[dst++] = 0;
    }
}
Speed comes from a good algorithm. Clarity comes from formatting, naming variables and commenting.
Speed as in complexity?
Since you need to look at every element of the array, and you do so with a single loop over the indexes in the range [0, N), where N denotes the size of the input, your solution is O(N).
Further reading:
Plain English explanation of big O
Determining big O Notation
Regarding clarity
In my honest opinion there shouldn't need to be two alternatives when implementing such functionality as you are presenting. If you rename your variables to more suitable (descriptive) names your current solution should be clear enough to count as both performant and clear.
Your current approach can be written in plain English in a very clear fashion:
pseudo-explanation
set write_index to 0
set number_of_zeroes to 0
For each element in array
    If element is 0
        increase number_of_zeroes by one
    otherwise
        write element value to position denoted by write_index
        increase write_index by one
write number_of_zeroes 0s at the end of array
Having stated the explanation above we can quickly see that sortArray is not a descriptive name for your function, a more suitable name would probably be partition_zeroes or similar.
Adding comments could improve readability, but your current focus should lie in renaming your variables to better express the intent of the code.
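To make the renaming concrete, here is one way the pseudo-explanation could look in C++. This is only a sketch: partition_zeroes, read_index and write_index are suggested names, it uses no STL containers, and it drops the explicit zero counter because the number of zeros is implied by how far write_index got.

```cpp
#include <cassert>
#include <cstddef>

// Moves every non-zero element to the front, keeping their relative order,
// and fills the remaining slots at the end with zeros.
void partition_zeroes(int array[], std::size_t size) {
    assert(array != nullptr || size == 0);
    std::size_t write_index = 0;
    // Copy each non-zero element down to the next free position.
    for (std::size_t read_index = 0; read_index < size; ++read_index) {
        if (array[read_index] != 0) {
            array[write_index++] = array[read_index];
        }
    }
    // Everything from write_index onwards must now be zero.
    while (write_index < size) {
        array[write_index++] = 0;
    }
}
```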
(I feel your question is almost off-topic; I am answering it from a Linux perspective; I recommend using Linux to learn C++ programming; you'll have to adapt my advice to your operating system if you are using something else....)
speed
Regarding speed, you should have two complementary approaches.
The first (somewhat "theoretical") is to analyze (i.e. think about) your algorithm and give (with some proof) its asymptotic time complexity.
The second approach (only "practical", and often pragmatic) is to benchmark and profile your program. Don't forget to compile with optimizations enabled (e.g. using g++ -Wall -O2 with GCC). Have a benchmark which runs for more than half a second (so it processes a large amount of data, e.g. several million numbers) and repeat it several times (e.g. using the time(1) command on Linux). You could also measure some time inside your program using e.g. <chrono> in C++11, or just clock(3). If you read a large array from some file, or build a large array of pseudo-random numbers with <random> or with random(3), you certainly want to measure the time to read or fill the array separately from the time to move zeros out of it. See also time(7).
(You need to process a large amount of data, more than a million items and perhaps many millions of them, because computers are very fast: a typical "elementary" operation, i.e. a machine instruction, takes less than a nanosecond, and you have a lot of uncertainty on a single run.)
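As a rough illustration of measuring inside the program with <chrono>, something along these lines could work. The helper name seconds_per_run is made up, and this is a sketch rather than a proper benchmark harness: it only averages wall-clock time over a few repetitions.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` the given number of times and returns the average wall-clock
// time per run, in seconds. steady_clock is used because it never jumps
// backwards, unlike the system clock.
template <class F>
double seconds_per_run(F&& work, int repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repetitions;
}
```

You would fill the array before calling it, and pass only the zero-moving step as the callable, so that the setup cost is not counted.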
clarity
Regarding clarity, it is a bit subjective, but you might try to make your code readable and concise. Adding a few good comments could also help.
Be careful about naming: sorting is not exactly what your program is doing (it is more moving zeros than sorting the array)...
I think this is the best - Of course you may wish to use doxygen or some other
// Shift the non-zeros to the front and put zero in the rest of the array
void moveNonZerosTofront(int *list, unsigned int length)
{
unsigned int from = 0, to = 0;
// This will move the non-zeros
for (; from < length; ++from) {
if (list[from] != 0) {
list[to] = list[from];
to++;
}
}
// So the rest of the array needs to be assigned zero (as we found those on the way)
for (; to < length; ++to) {
list[to] = 0;
}
}

Vector + for + if

OK, so the goal of this was to write some code for the Fibonacci numbers itself then take those numbers figure out which ones were even then add those specific numbers together. Everything works except I tried and tried to figure out a way to add the numbers up, but I always get errors and am stumped as of how to add them together. I looked elsewhere but they were all asking for all the elements in the vector. Not specific ones drawn out of an if statement.
P.S. I know system("pause") is bad, but I tried a few other options, such as cin.get(), and sometimes they work and sometimes they don't, and I am not sure why.
P.P.S. I am also new to programming my own stuff, so I have limited resources as far as what I know already, and I will appreciate any suggestions for how I might "improve" my program to make it work more fluently. I also take criticism well, so please do.
#include "../../std_lib_facilities.h"
int main(){
vector<int>Fibonacci;
int one = 0;
int two = 1;
int three = 0;
int i = 0;
while (i < 4000000){
i += three;
three = two + one; one = two; two = three;
cout << three << ", ";
Fibonacci.push_back(three);
//all of the above is to produce the Fibonacci number sequence which starts with 1, 2 and adds the previous one to the next so on and so forth.
//below is my attempt at taking those numbers and testing for evenness or oddness and then adding the even ones together for one single number.
}
cout << endl;
//go through all points in the vector Fibonacci and execute code for each point
for (i = 0; i <= 31; ++i)
if (Fibonacci.at(i) % 2 == 0)//is Fibonacci.at(i) even?
cout << Fibonacci.at(i) << endl;//how to get these numbers to add up to one single sum
system("pause");
}
Just do it by hand. That is, loop over the whole array and keep track of the cumulative sum.
int accumulator = 0; // Careful, this might overflow if `int` is not big enough.
for (i = 0; i <= 31; i++) {
    int fib = Fibonacci.at(i);
    if (fib % 2)
        continue; // skip odd numbers
    cout << fib << endl;
    accumulator += fib;
}
// now do what you want with "accumulator".
Be careful with big mathematical series like this one; they can explode really fast. In your case I think the calculation will just about work with 32-bit integers. It is best to use 64-bit integers or, even better, a proper BigNum class.
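A sketch of the 64-bit variant, for what it's worth. It assumes the sequence starts 1, 2, 3, 5, ... as in the question, and it sums on the fly instead of storing the terms in a vector; the function name is only a suggestion.

```cpp
#include <cassert>
#include <cstdint>

// Sums the even Fibonacci numbers that do not exceed `limit`.
std::int64_t sum_even_fibonacci(std::int64_t limit) {
    // Keeping limit at or below half the int64 range guarantees that
    // previous + current below can never overflow.
    assert(limit >= 0 && limit <= INT64_MAX / 2);
    std::int64_t previous = 1, current = 2, sum = 0;
    while (current <= limit) {
        if (current % 2 == 0) {
            sum += current;
        }
        const std::int64_t next = previous + current;
        previous = current;
        current = next;
    }
    return sum;
}
```

With a limit of 4000000 this adds up 2, 8, 34, ..., 3524578.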
In addition to the answer by Adrian Ratnapala, I want to encourage you to use algorithms where possible. This expresses your intent clearly and avoids subtle bugs introduced by mis-using iterators, indexing variables and what have you.
const auto addIfEven = [](int a, int b){ return (b % 2) ? a : a + b; };
const auto result = accumulate(begin(Fibonacci), end(Fibonacci), 0, addIfEven);
Note that I used a lambda, which is a C++11 feature. Not all compilers support this yet, but most modern ones do. You can always define a function instead of a lambda, and you don't have to create a named variable like addIfEven; you can also pass the lambda directly to the algorithm.
If you have trouble understanding any of this, don't worry, I just want to point you into the "right" direction. The other answers are fine as well, it's just the kind of code which gets hard to maintain once you work in a team or have a large codebase.
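In case a complete, compilable version helps: std::accumulate lives in <numeric>. The sketch below passes the lambda directly and accumulates into a long long to stay clear of overflow; sum_of_even is just an illustrative name.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Adds up only the even elements of `values`.
long long sum_of_even(const std::vector<int>& values) {
    return std::accumulate(
        values.begin(), values.end(), 0LL,
        [](long long total, int value) {
            // Odd values leave the running total unchanged.
            return value % 2 == 0 ? total + value : total;
        });
}
```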
Not sure what you're after...
but
int sum=0; // or long or double...
for (i = 0; i <= 31; ++i)
if (Fibonacci.at(i) % 2 == 0) {//is Fibonacci.at(i) even?
cout << Fibonacci.at(i) << endl;//how to get these numbers to add up to one single sum
sum+=Fibonacci.at(i);
}
// whatever
}

Stack versus Integer

I've created a program to solve Cryptarithmetics for a class on Data Structures. The professor recommended that we utilize a stack consisting of linked nodes to keep track of which letters we replaced with which numbers, but I realized an integer could do the same trick. Instead of a stack {A, 1, B, 2, C, 3, D, 4} I could hold the same info in 1234.
My program, though, seems to run much more slowly than the estimation he gave us. Could someone explain why a stack would behave much more efficiently? I had assumed that, since I wouldn't be calling methods over and over again (push, pop, top, etc) and instead just add one to the 'solution' that mine would be faster.
This is not an open ended question, so do not close it. Although you can implement things different ways, I want to know why, at the heart of C++, accessing data via a Stack has performance benefits over storing in ints and extracting by moding.
Although this is homework, I don't actually need help, just very intrigued and curious.
Thanks and can't wait to learn something new!
EDIT (Adding some code)
letterAssignments is an int array of size 26. for a problem like SEND + MORE = MONEY, A isn't used so letterAssignments[0] is set to 11. All chars that are used are initialized to 10.
answerNum is a number with as many digits as there are unique characters (in this case, 8 digits).
int Cryptarithmetic::solve(){
while(!solved()){
for(size_t z = 0; z < 26; z++){
if(letterAssignments[z] != 11) letterAssignments[z] = 10;
}
if(answerNum < 1) return NULL;
size_t curAns = answerNum;
for(int i = 0; i < numDigits; i++){
if(nextUnassigned() != '$') {
size_t nextAssign = curAns % 10;
if(isAssigned(nextAssign)){
answerNum--;
continue;
}
assign(nextUnassigned(), nextAssign);
curAns /= 10;
}
}
answerNum--;
}
return answerNum;
}
Two helper methods in case you'd like to see them:
char Cryptarithmetic::nextUnassigned(){
    for(int i = 0; i < 26; i++) {
        if(letterAssignments[i] == 10) return ('A' + i);
    }
    return '$'; // no unassigned letter left
}
void Cryptarithmetic::assign(char letter, size_t val){
assert('A' <= letter && letter <= 'Z'); // valid letter
assert(letterAssignments[letter-'A'] != 11); // has this letter
assert(!isAssigned(val)); // not already assigned.
letterAssignments[letter-'A'] = val;
}
From the looks of things, the way you are doing things here is quite inefficient.
As a general rule, try to have as few for loops as possible, since each one will slow down your implementation greatly.
For instance, if we strip all other code away, your program looks like
while(thing) {
for(z < 26) {
}
for(i < numDigits) {
for(i < 26) {
}
for(i < 26) {
}
}
}
This means that for each iteration of the while loop you are doing ((26+26)*numDigits)+26 loop operations. That's assuming isAssigned() does not use a loop.
Ideally you want:
while(thing) {
for(i < numDigits) {
}
}
which I'm sure is possible with changes to your code.
This is why your implementation with the integer array is much slower than an implementation using the stack, which does not use the for(i < 26) loops (I assume).
In answer to your original question, however: storing an array of integers will always be faster than any struct you can come up with, simply because there are more overheads involved in assigning the memory, calling functions, etc.
But as with everything, implementation is the key difference between a slow program and a fast program.
The problem is that by counting you are also considering repetitions, whereas the problem presumably asks you to assign a different number to each different letter so that the numeric equation holds.
For example, for four letters you are testing 10*10*10*10=10000 letter->number mappings instead of 10*9*8*7=5040 of them (the larger the number of letters, the larger the ratio between the two numbers).
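To illustrate the difference, here is a small sketch that enumerates only the mappings with pairwise different digits. It is not the asker's solver: the function name and the bitmask bookkeeping are assumptions for the example, and a real solver would check the equation where the comment indicates.

```cpp
#include <cassert>

// Counts the ways to give `letters_left` letters pairwise different digits.
// Bit d of used_mask is set when digit d is already taken.
long count_distinct_assignments(int letters_left, unsigned used_mask = 0) {
    assert(letters_left >= 0 && letters_left <= 10);
    if (letters_left == 0) {
        return 1;  // complete assignment; a solver would test the equation here
    }
    long total = 0;
    for (int digit = 0; digit < 10; ++digit) {
        if (used_mask & (1u << digit)) {
            continue;  // digit already belongs to another letter
        }
        total += count_distinct_assignments(letters_left - 1,
                                            used_mask | (1u << digit));
    }
    return total;
}
```

For four letters this visits 5040 assignments, matching the 10*9*8*7 figure above, rather than all 10000 four-digit counter values.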
The div instruction used by the mod function is quite expensive. Using it for your purpose can easily be less efficient than a good stack implementation. Here is an instruction timings table: http://gmplib.org/~tege/x86-timing.pdf
You should also write unit tests for your int-based stack to make sure that it works as intended.
Programming is actually trading memory for time and vice versa.
Here you are packing data into an integer. You spare memory but lose time.
Speed of course depends on the implementation of the stack. C++ is C with classes. If you are not using classes it's basically C (as fast as C).
#include <cassert>
const int stack_size = 26;
struct Stack
{
    int _data[stack_size];
    int _stack_p; // index of the next free slot
    Stack()
        : _stack_p(0)
    {}
    inline void push(int val)
    {
        assert(_stack_p < stack_size); // no overhead in release builds:
                                       // asserts are compiled out with -DNDEBUG
        _data[_stack_p++] = val;
    }
    inline int pop()
    {
        assert(_stack_p > 0); // same thing. assert is very useful for tracing bugs
        return _data[--_stack_p];
    }
    inline int size()
    {
        return _stack_p;
    }
    inline int val(int i)
    {
        assert(i >= 0 && i < _stack_p);
        return _data[i];
    }
};
There is no overhead like a vtable. Also pop() and push() are very simple so they will be inlined, so there is no function-call overhead. Using int as the stack element is also good for speed because int is usually the processor's natural word size (no need for alignment tricks, etc.).

How to find first non-repeating element?

How to find first non-repeating element in an array.
Provided that you can only use 1 bit for every element of the array and time complexity should be O(n) where n is length of array.
Please note that I have deliberately imposed a constraint on the memory requirements. It is also possible that it cannot be done with just one extra bit per element of the string. Please also let me know whether it is possible or not.
I would say there is no comparison-based algorithm that can do it in O(n), as you have to compare the first element of the array with all the others, the 2nd with all except the first, the 3rd with all except the first two, and so on, which sums to O(n^2).
(But that does not necessarily mean that there is no faster algorithm; see sorting: there is a proof that you can't sort faster than O(n log n) if you are comparison-based, and there is indeed a faster one, bucket sort, which can do it in O(n).)
EDIT: In one of the other comments I said something about hash functions. I checked some facts about it, and here are my thoughts on the hashmap approach:
The obvious approach is (in pseudocode):
for (i = 0; i < maxsize; i++)
    count[i] = 0;
for (i = 0; i < maxsize; i++) {
    h = hash(A[i]);
    count[h]++;
}
first = -1;
for (i = 0; i < maxsize; i++) {
    if (count[hash(A[i])] == 1) {
        first = i;
        break;
    }
}
if (first != -1)
    printf("first unique: %d", A[first]);
There are some caveats:
How to get the hash. I did some research on perfect hash functions, and indeed you can generate one in O(n) (Optimal algorithms for minimal perfect hashing by George Havas et al.). I am not sure how good this paper is, as it claims an O(n) time limit but speaks of a non-linear space limit, which is plainly an error. I hope I am not the only one seeing the flaw in this, but according to all the theoretical computer science I know of, time is an upper bound for space (as you don't have time to write to more space). But I believe them when they say it is possible in O(n).
The additional space: here I don't see a solution. The above paper cites some research that says you need 2.7 bits for the perfect hash function. With the additional count array (which you can shorten to the states Empty, 1 Element, and More than 1 Element) you need 2 additional bits per element (1.58 if you assume you can somehow combine it with the above 2.7), which sums up to about 5 additional bits.
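For completeness, this is roughly what the counting approach looks like with an ordinary hash map in C++. To be clear, it is only a sketch of the O(n) expected-time idea: std::unordered_map uses far more than one bit per element, so it does not meet the memory constraint in the question. The function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Returns the index of the first element that occurs exactly once,
// or -1 if every element repeats (or the input is empty).
long first_unique_index(const std::vector<int>& values) {
    std::unordered_map<int, std::size_t> occurrences;
    for (int v : values) {
        ++occurrences[v];            // first pass: count each value
    }
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (occurrences[values[i]] == 1) {
            return static_cast<long>(i);  // second pass: first value seen once
        }
    }
    return -1;
}
```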
Here I'm making one assumption: the string is a character string containing only lowercase letters, so that a 32-bit integer is sufficient to hold one bit per letter for the 26 letters. Earlier I thought of using an array of 256 elements, but that would take 256*32 bits in total, 32 bits per element. In the end I found that I could not do it without one more variable, so the solution uses one 32-bit integer per bitmap for the 26 letters:
#include <cstring>
#include <iostream>
using namespace std;
int print_non_repeating(char* str)
{
    int bitmap = 0, bitmap_check = 0;
    int length = strlen(str);
    for (int i = 0; i < length; i++)
    {
        int bit = 1 << (str[i] - 'a');
        if (bitmap & bit)
            bitmap_check |= bit; // letter seen at least twice
        else
            bitmap |= bit;       // letter seen for the first time
    }
    bitmap ^= bitmap_check;      // keep only letters seen exactly once
    if (bitmap != 0)
    {
        int i = 0;
        while (!(bitmap & (1 << (str[i] - 'a'))))
            i++;
        cout << *(str + i);
        return 1;
    }
    else
        return 0;
}
You can try doing a modified bucketsort as exemplified below. However, you need to know the max value in the array passed into the firstNonRepeat method. So this runs at O(n).
For comparison based methods, the theoretical fastest (at least in terms of sorting) is O(n log n). Alternatively, you can even use modified versions of radix sort to accomplish this.
public class BucketSort{
    //maxVal is the max value in the array
    public int firstNonRepeat(int[] a, int maxVal){
        int [] bucket=new int[maxVal+1]; // Java zero-initializes int arrays
        for (int i=0; i<a.length; i++){
            bucket[a[i]]++; // count occurrences of each value
        }
        for (int i=0; i<a.length; i++){
            if(bucket[a[i]] == 1) {
                return a[i]; // first value that occurs exactly once
            }
        }
        return -1; // every value repeats
    }
}
This code finds the first repeating element. I haven't figured out yet whether it is possible to find the non-repeating element in the same for loop without introducing another one (to keep the code O(n)). Other answers suggest bubble sort, which is O(n^2).
#include <iostream>
using namespace std;
#define max_size 10
int main()
{
int numbers[max_size] = { 1, 2, 3, 4, 5, 1, 3, 4 ,2, 7};
int table[max_size] = {0,0,0,0,0,0,0,0,0,0};
int answer = 0, j=0;
for (int i = 0; i < max_size; i++)
{
j = numbers[i] %max_size;
table[j]++;
if(table[j] >1)
{
answer = 1;
break;
}
}
std::cout << "answer = " << answer ;
}

Unusual Speed Difference between Python and C++

I recently wrote a short algorithm to calculate happy numbers in python. The program allows you to pick an upper bound and it will determine all the happy numbers below it. For a speed comparison I decided to make the most direct translation of the algorithm I knew of from python to c++.
Surprisingly, the c++ version runs significantly slower than the python version. Accurate speed tests between the execution times for discovering the first 10,000 happy numbers indicate the python program runs on average in 0.59 seconds and the c++ version runs on average in 8.5 seconds.
I would attribute this speed difference to the fact that I had to write helper functions for parts of the calculations (for example determining if an element is in a list/array/vector) in the c++ version which were already built in to the python language.
Firstly, is this the true reason for such an absurd speed difference, and secondly, how can I change the c++ version to execute more quickly than the python version (the way it should be in my opinion).
The two pieces of code, with speed testing are here: Python Version, C++ Version. Thanks for the help.
#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <windows.h>
using namespace std;
bool inVector(int inQuestion, vector<int> known);
int sum(vector<int> given);
int pow(int given, int power);
void calcMain(int upperBound);
int main()
{
while(true)
{
int upperBound;
cout << "Pick an upper bound: ";
cin >> upperBound;
long start, end;
start = GetTickCount();
calcMain(upperBound);
end = GetTickCount();
double seconds = (double)(end-start) / 1000.0;
cout << seconds << " seconds." << endl << endl;
}
return 0;
}
void calcMain(int upperBound)
{
vector<int> known;
for(int i = 0; i <= upperBound; i++)
{
bool next = false;
int current = i;
vector<int> history;
while(!next)
{
char* buffer = new char[10];
itoa(current, buffer, 10);
string digits = buffer;
delete buffer;
vector<int> squares;
for(int j = 0; j < digits.size(); j++)
{
char charDigit = digits[j];
int digit = atoi(&charDigit);
int square = pow(digit, 2);
squares.push_back(square);
}
int squaresum = sum(squares);
current = squaresum;
if(inVector(current, history))
{
next = true;
if(current == 1)
{
known.push_back(i);
//cout << i << "\t";
}
}
history.push_back(current);
}
}
//cout << "\n\n";
}
bool inVector(int inQuestion, vector<int> known)
{
for(vector<int>::iterator it = known.begin(); it != known.end(); it++)
if(*it == inQuestion)
return true;
return false;
}
int sum(vector<int> given)
{
int sum = 0;
for(vector<int>::iterator it = given.begin(); it != given.end(); it++)
sum += *it;
return sum;
}
int pow(int given, int power)
{
int original = given;
int current = given;
for(int i = 0; i < power-1; i++)
current *= original;
return current;
}
#!/usr/bin/env python
import timeit
upperBound = 0
def calcMain():
    known = []
    for i in range(0,upperBound+1):
        next = False
        current = i
        history = []
        while not next:
            digits = str(current)
            squares = [pow(int(digit), 2) for digit in digits]
            squaresum = sum(squares)
            current = squaresum
            if current in history:
                next = True
                if current == 1:
                    known.append(i)
                    ##print i, "\t",
            history.append(current)
    ##print "\nend"
while True:
    upperBound = input("Pick an upper bound: ")
    result = timeit.Timer(calcMain).timeit(1)
    print result, "seconds.\n"
For 100000 elements, the Python code took 6.9 seconds while the C++ originally took above 37 seconds.
I did some basic optimizations on your code and managed to get the C++ code above 100 times faster than the Python implementation. It now does 100000 elements in 0.06 seconds. That is 617 times faster than the original C++ code.
The most important thing is to compile in Release mode, with all optimizations. This code is literally orders of magnitude slower in Debug mode.
Next, I will explain the optimizations I did.
Moved all vector declarations outside of the loop; replaced them by a clear() operation, which is much faster than calling the constructor.
Replaced the call to pow(value, 2) by a multiplication : value * value.
Instead of having a squares vector and calling sum on it, I sum the values in-place using just an integer.
Avoided all string operations, which are very slow compared to integer operations. For instance, it is possible to compute the squares of each digit by repeatedly dividing by 10 and fetching the modulus 10 of the resulting value, instead of converting the value to a string and then each character back to int.
Avoided all vector copies, first by replacing passing by value with passing by reference, and finally by eliminating the helper functions completely.
Eliminated a few temporary variables.
And probably many small details I forgot. Compare your code and mine side-by-side to see exactly what I did.
It may be possible to optimize the code even more by using pre-allocated arrays instead of vectors, but this would be a bit more work and I'll leave it as an exercise to the reader. :P
Here's the optimized code :
#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <algorithm>
#include <windows.h>
using namespace std;
void calcMain(int upperBound, vector<int>& known);
int main()
{
while(true)
{
vector<int> results;
int upperBound;
cout << "Pick an upper bound: ";
cin >> upperBound;
long start, end;
start = GetTickCount();
calcMain(upperBound, results);
end = GetTickCount();
for (size_t i = 0; i < results.size(); ++i) {
cout << results[i] << ", ";
}
cout << endl;
double seconds = (double)(end-start) / 1000.0;
cout << seconds << " seconds." << endl << endl;
}
return 0;
}
void calcMain(int upperBound, vector<int>& known)
{
vector<int> history;
for(int i = 0; i <= upperBound; i++)
{
int current = i;
history.clear();
while(true)
{
int temp = current;
int sum = 0;
while (temp > 0) {
sum += (temp % 10) * (temp % 10);
temp /= 10;
}
current = sum;
if(find(history.begin(), history.end(), current) != history.end())
{
if(current == 1)
{
known.push_back(i);
}
break;
}
history.push_back(current);
}
}
}
There's a new, radically faster version as a separate answer, so this answer is deprecated.
I rewrote your algorithm by making it cache whenever it finds the number to be happy or unhappy. I also tried to make it as pythonic as I could, for example by creating separate functions digits() and happy(). Sorry for using Python 3, but I get to show off a couple of useful things from it as well.
This version is much faster. It runs at 1.7s which is 10 times faster than your original program that takes 18s (well, my MacBook is quite old and slow :) )
#!/usr/bin/env python3
from timeit import Timer
from itertools import count
print_numbers = False
upperBound = 10**5 # Default value, can be overridden by user.
def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10
def happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''This function tells if the number is happy or not, caching results.
    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second
    contains only happy ones.
    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.
    '''
    # assert 1 in known and happies <= known # <= is expensive
    if number in known:
        return number in happies
    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break
    known.update(history)
    if number in happies:
        happies.update(history)
        return True
def calcMain():
    happies = {x for x in range(upperBound) if happy(x) }
    if print_numbers:
        print(happies)
if __name__ == '__main__':
    upperBound = eval(
        input("Pick an upper bound [default {0}]: "
              .format(upperBound)).strip()
        or repr(upperBound))
    result = Timer(calcMain).timeit(1)
    print ('This computation took {0} seconds'.format(result))
It looks like you're passing vectors by value to other functions. This will be a significant slowdown because the program will actually make a full copy of your vector before it passes it to your function. To get around this, pass a constant reference to the vector instead of a copy. So instead of:
int sum(vector<int> given)
Use:
int sum(const vector<int>& given)
When you do this, you'll no longer be able to use the vector::iterator because it is not constant. You'll need to replace it with vector::const_iterator.
You can also pass in non-constant references, but in this case, you don't need to modify the parameter at all.
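Applied to the two helpers from the question, the change could look like the sketch below. The signatures are adapted from the question's code, with std:: spelled out so the snippet stands alone.

```cpp
#include <cassert>
#include <vector>

// Takes the vector by const reference: no copy is made on each call.
int sum(const std::vector<int>& given) {
    int total = 0;
    for (std::vector<int>::const_iterator it = given.begin();
         it != given.end(); ++it) {
        total += *it;
    }
    return total;
}

// Same idea for the membership test.
bool inVector(int inQuestion, const std::vector<int>& known) {
    for (std::vector<int>::const_iterator it = known.begin();
         it != known.end(); ++it) {
        if (*it == inQuestion) {
            return true;
        }
    }
    return false;
}
```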
This is my second answer; which caches things like sum of squares for values <= 10**6:
happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
That is,
the number is split into 3 digits + 3 digits
the precomputed table is used to get sum of squares for both parts
these two results are added
the precomputed table is consulted to get the happiness of number:
I don't think Python version can be made much faster than that (ok, if you throw away fallback to old version, that is try: overhead, it's 10% faster).
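The same table trick translates to C++ fairly directly. The following is a sketch under my own assumptions, not a port of the code below: the names are invented, the tables cover numbers below 10**6 split into two 3-digit halves, and the slow check relies on the known fact that every unhappy positive number eventually reaches 4.

```cpp
#include <array>
#include <cassert>

// Sum of the squares of the decimal digits of n (n >= 0).
int digit_square_sum(int n) {
    int sum = 0;
    while (n > 0) {
        const int d = n % 10;
        sum += d * d;
        n /= 10;
    }
    return sum;
}

// Follows the chain until it hits 1 (happy), 4 (unhappy cycle) or 0.
bool slow_is_happy(int n) {
    while (n != 0 && n != 1 && n != 4) {
        n = digit_square_sum(n);
    }
    return n == 1;
}

struct HappyTables {
    std::array<int, 1000> squares{};  // squares[x] = digit square sum of x
    std::array<bool, 487> happy{};    // happy[s] for every sum s <= 6 * 81

    HappyTables() {
        for (int x = 0; x < 1000; ++x) squares[x] = digit_square_sum(x);
        for (int s = 0; s < 487; ++s) happy[s] = slow_is_happy(s);
    }

    // Three table lookups: low half, high half, then happiness of their sum.
    bool is_happy(int x) const {
        assert(x >= 0 && x < 1000000);
        return happy[squares[x % 1000] + squares[x / 1000]];
    }
};
```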
I think this is an excellent question which shows that, indeed,
things that have to be fast should be written in C
however, usually you don't need things to be fast (even if you needed the program to run for a day, it would be less than the combined time of programmers optimizing it)
it's easier and faster to write programs in Python
but for some problems, especially computational ones, a C++ solution, like the ones above, is actually more readable and more beautiful than an attempt to optimize a Python program.
Ok, here it goes (2nd version now...):
#!/usr/bin/env python3
'''Provides slower and faster versions of a function to compute happy numbers.
slow_happy() implements the algorithm as in the definition of happy
numbers (but also caches the results).
happy() uses the precomputed lists of sums of squares and happy numbers
to return result in just 3 list lookups and 3 arithmetic operations for
numbers less than 10**6; it falls back to slow_happy() for big numbers.
Utilities: digits() generator, my_timeit() context manager.
'''
from time import time # For my_timeit.
from random import randint # For example with random number.
upperBound = 10**5 # Default value, can be overridden by user.
class my_timeit:
    '''Very simple timing context manager.'''
    def __init__(self, message):
        self.message = message
        self.start = time()
    def __enter__(self):
        return self
    def __exit__(self, *data):
        print(self.message.format(time() - self.start))
def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10
def slow_happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''Tell if the number is happy or not, caching results.
    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second
    contains only happy ones.
    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.
    '''
    # This is commented out because <= is expensive.
    # assert {1} <= happies <= known
    if number in known:
        return number in happies
    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break
    known.update(history)
    if number in happies:
        happies.update(history)
        return True
# This will define new happy() to be much faster ------------------------.
with my_timeit('Preparation time was {0} seconds.\n'):
    LogAbsoluteUpperBound = 6 # The maximum possible number is 10**this.
    happy_list = [slow_happy(x)
                  for x in range(81*LogAbsoluteUpperBound + 1)]
    happy_base = 10**((LogAbsoluteUpperBound + 1)//2)
    sq_list = [sum(d**2 for d in digits(x))
               for x in range(happy_base + 1)]
    def happy(x):
        '''Tell if the number is happy, optimized for smaller numbers.
        This function works fast for numbers <= 10**LogAbsoluteUpperBound.
        '''
        try:
            return happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
        except IndexError:
            return slow_happy(x)
# End of happy()'s redefinition -----------------------------------------.
def calcMain(print_numbers, upper_bound):
    happies = [x for x in range(upper_bound + 1) if happy(x)]
    if print_numbers:
        print(happies)
if __name__ == '__main__':
    while True:
        upperBound = eval(input(
            "Pick an upper bound [{0} default, 0 ends, negative number prints]: "
            .format(upperBound)).strip() or repr(upperBound))
        if not upperBound:
            break
        with my_timeit('This computation took {0} seconds.'):
            calcMain(upperBound < 0, abs(upperBound))
        single = 0
        while not happy(single):
            single = randint(1, 10**12)
        print('FYI, {0} is {1}.\n'.format(single,
              'happy' if happy(single) else 'unhappy'))
    print('Nice to see you, goodbye!')
I can see that you have quite a few unnecessary heap allocations.
For example:
while(!next)
{
char* buffer = new char[10];
This doesn't look very optimized. You probably want to pre-allocate the array and reuse it inside your loop. This is a basic optimization that is easy to spot and to apply. It can also turn into a mess, so be careful with it.
You are also using the atoi() function, and I don't know how well optimized it is. Taking the number modulo 10 to get each digit might be better (you have to measure, though; I didn't test this).
The linear search (inVector) might also be bad. Replacing the vector with a std::set might speed things up. A hash_set could do the trick too.
But I think the worst problem is the string and the heap allocation inside that loop. That doesn't look good. I would start with those places.
Well, I also gave it a once-over. I didn't test or even compile, though.
General rules for numerical programs:
Never process numbers as text. That's what makes lesser languages than Python slow, so if you do it in C, the program will be slower than Python.
Don't use data structures if you can avoid them. You were building an array just to add the numbers up. Better keep a running total.
Keep a copy of the STL reference open so you can use it rather than writing your own functions.
void calcMain(int upperBound)
{
vector<int> known;
for(int i = 0; i <= upperBound; i++)
{
int current = i;
vector<int> history;
do
{
int squaresum = 0;
for ( ; current; current /= 10 )
{
int digit = current % 10;
squaresum += digit * digit;
}
current = squaresum;
history.push_back(current);
} while ( ! count(history.begin(), history.end() - 1, current) );
if(current == 1)
{
known.push_back(i);
//cout << i << "\t";
}
}
//cout << "\n\n";
}
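The first two rules above (no text processing, no throwaway containers) can be illustrated with a short Python sketch. The function names here are hypothetical and are not taken from the original programs; the string-based variant is included only so the two approaches can be compared.

```python
def digit_square_sum(n):
    """Sum of the squared decimal digits of n, using integer arithmetic only."""
    total = 0  # running total, so no intermediate list of squares is built
    while n > 0:
        n, digit = divmod(n, 10)  # peel off the last digit
        total += digit * digit
    return total


def digit_square_sum_text(n):
    """Same result via string conversion, shown only for comparison."""
    return sum(int(ch) ** 2 for ch in str(n))


if __name__ == "__main__":
    for value in (7, 19, 130):
        print(value, digit_square_sum(value), digit_square_sum_text(value))
```

Which variant is faster depends on the interpreter and the size of the numbers, so it is worth timing both before drawing conclusions.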
Just to get a little more closure on this issue by seeing how fast I could truly find these numbers, I wrote a multithreaded C++ implementation of Dr_Asik's algorithm. There are two things that are important to realize about the fact that this implementation is multithreaded.
More threads do not necessarily lead to better execution times; there is a happy medium for every situation, depending on the volume of numbers you want to calculate.
If you compare the times between this version running with one thread and the original version, the only factors that could cause a difference in time are the overhead from starting the thread and variable system performance issues. Otherwise, the algorithm is the same.
The code for this implementation (all credit for the algorithm goes to Dr_Asik) is here. Also, I wrote some speed tests with a double check for each test to help back up those points.
Calculation of the first 100,000,000 happy numbers:
Original - 39.061 / 39.000 (Dr_Asik's original implementation)
1 Thread - 39.000 / 39.079
2 Threads - 19.750 / 19.890
10 Threads - 11.872 / 11.888
30 Threads - 10.764 / 10.827
50 Threads - 10.624 / 10.561 <--
100 Threads - 11.060 / 11.216
500 Threads - 13.385 / 12.527
From these results it looks like our happy medium is about 50 threads, plus or minus ten or so.
Other optimizations: by using arrays and direct access using the loop index rather than searching in a vector, and by caching prior sums, the following code (inspired by Dr Asik's answer but probably not optimized at all) runs 2445 times faster than the original C++ code, about 400 times faster than the Python code.
#include <iostream>
#include <windows.h>
#include <vector>
void calcMain(int upperBound, std::vector<int>& known)
{
int tempDigitCounter = upperBound;
int numDigits = 0;
while (tempDigitCounter > 0)
{
numDigits++;
tempDigitCounter /= 10;
}
int maxSlots = numDigits * 9 * 9;
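int* history = new int[maxSlots + 1];
int* cache = new int[upperBound+1];
// history[s] holds the last i whose chain visited the sum s.
// It must start at a value no i can take, otherwise the check below reads garbage.
for (int jj = 0; jj <= maxSlots; jj++)
{
history[jj] = -1;
}
for (int jj = 0; jj <= upperBound; jj++)
{
cache[jj] = 0;
}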
int current, sum, temp;
for(int i = 0; i <= upperBound; i++)
{
current = i;
while(true)
{
sum = 0;
temp = current;
bool inRange = temp <= upperBound;
if (inRange)
{
int cached = cache[temp];
if (cached)
{
sum = cached;
}
}
if (sum == 0)
{
while (temp > 0)
{
int tempMod = temp % 10;
sum += tempMod * tempMod;
temp /= 10;
}
if (inRange)
{
cache[current] = sum;
}
}
current = sum;
if(history[current] == i)
{
if(current == 1)
{
known.push_back(i);
}
break;
}
history[current] = i;
}
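}
// Release the scratch arrays allocated at the top of the function.
delete[] history;
delete[] cache;
}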
int main()
{
while(true)
{
int upperBound;
std::vector<int> known;
std::cout << "Pick an upper bound: ";
std::cin >> upperBound;
long start, end;
start = GetTickCount();
calcMain(upperBound, known);
end = GetTickCount();
for (size_t i = 0; i < known.size(); ++i) {
std::cout << known[i] << ", ";
}
double seconds = (double)(end-start) / 1000.0;
std::cout << std::endl << seconds << " seconds." << std::endl << std::endl;
}
return 0;
}
Stumbled over this page whilst bored and thought I'd golf it in js. The algorithm is my own, and I haven't checked it thoroughly against anything other than my own calculations (so it could be wrong). It calculates the first 1e7 happy numbers and stores them in h. If you want to change it, change both the 7s.
m=1e7,C=7*81,h=[1],t=true,U=[,,,,t],n=w=2;
while(n<m){
z=w,s=0;while(z)y=z%10,s+=y*y,z=0|z/10;w=s;
if(U[w]){if(n<C)U[n]=t;w=++n;}else if(w<n)h.push(n),w=++n;}
This will print the first 1000 items for you in console or a browser:
o=h.slice(0,m>1e3?1e3:m);
(!this.document?print(o):document.load=document.write(o.join('\n')));
155 characters for the functional part and it appears to be as fast* as Dr. Asik's offering on firefox or v8 (350-400 times as fast as the original python program on my system when running time d8 happygolf.js or js -a -j -p happygolf.js in spidermonkey).
I shall be in awe of the analytic skills of anyone who can figure out why this algorithm is doing so well without referencing the longer, commented, fortran version.
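One possible reading of the golfed loop, written out as a Python sketch. This is an interpretation and not the author's commented version: numbers are classified in increasing order, so as soon as a chain drops below the number being tested, the answer can be read off from what is already known. The names are hypothetical, and unlike the golfed code (which only records unhappy numbers below 7*81) this sketch records every unhappy number it meets.

```python
def digit_square_sum(n):
    """Sum of the squared decimal digits of n."""
    total = 0
    while n:
        n, digit = divmod(n, 10)
        total += digit * digit
    return total


def happy_numbers_below(limit):
    """Return the happy numbers in [1, limit) in increasing order."""
    unhappy = {4}  # seed: 4 lies on the only cycle that does not contain 1
    happy = []
    for n in range(1, limit):
        w = n
        while True:
            if w in unhappy:
                # The chain reached a known unhappy number.
                unhappy.add(n)
                break
            if w == 1 or w < n:
                # Every number below n is already classified, and w is not
                # unhappy, so w (and therefore n) must be happy.
                happy.append(n)
                break
            w = digit_square_sum(w)
    return happy


if __name__ == "__main__":
    print(happy_numbers_below(50))
```

Because the digit-square sum of any number with three or more digits is smaller than the number itself, most chains stop after a single step, which may be a large part of why the approach is fast.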
I was intrigued by how fast it was, so I learned fortran to get a comparison of the same algorithm. Be kind if there are any glaring newbie mistakes; it's my first fortran program. http://pastebin.com/q9WFaP5C
It uses static memory, so to be fair to the others, it's in a self-compiling shell script. If you don't have gcc/bash/etc, strip out the preprocessor and bash stuff at the top, set the macros manually and compile it as fortran95.
Even if you include compilation time it beats most of the others here. If you don't, it's about 3000-3500 times as fast as the original python version (and by extension >40,000 times as fast as the C++*, although I didn't run any of the C++ programs).
Surprisingly many of the optimizations I tried in the fortran version (incl some like loop unrolling which I left out of the pasted version due to small effect and readability) were detrimental to the js version. This exercise shows that modern trace compilers are extremely good (within a factor of 7-10 of carefully optimized, static memory fortran) if you get out of their way and don't try any tricky stuff.
Finally, here's a much nicer, more recursive js version.
// Adds the square of the last digit of x
// to s, then integer divides x by 10.
// Repeats until x is 0.
function sumsq(x) {
var y,s=0;
while(x) {
y = x % 10;
s += y * y;
x = 0| x / 10;
}
return s;
}
// A boolean cache for happy().
// The terminating happy number and an unhappy number in
// the terminating sequence.
var H=[];
H[1] = true;
H[4] = false;
// Test if a number is happy.
// First check the cache, if that's empty
// Perform one round of sumsq, then check the cache
// For that. If that's empty, recurse.
function happy(x) {
// If it already exists.
if(H[x] !== undefined) {
// Return whatever is already in cache.
return H[x];
} else {
// Else calc sumsq, set and return cached val, or if undefined, recurse.
var w = sumsq(x);
return (H[x] = H[w] !== undefined? H[w]: happy(w));
}
}
//Main program loop.
var i, hN = [];
for(i = 1; i < 1e7; i++) {
if(happy(i)) { hN.push(i); }
}
Surprisingly, even though it is rather high level, it did almost exactly as well as the imperative algorithm in spidermonkey (with optimizations on), and close (1.2 times as long) in v8.
Moral of the story, I guess: spend a bit of time thinking about your algorithm if it's important. Also, high level languages already have a lot of overhead (and sometimes have tricks of their own to reduce it), so sometimes doing something more straightforward or utilizing their high level features is just as fast. Also, micro-optimization doesn't always help.
*Unless my python installation is unusually slow, direct times are somewhat meaningless as this is a first generation eee.
Times are:
12s for fortran version, no output, 1e8 happy numbers.
40s for fortran version, pipe output through gzip to disk.
8-12s for both js versions. 1e7 happy numbers, no output with full optimization
10-100s for both js versions 1e7 with less/no optimization (depending on definition of no optimization, the 100s was with eval()) no output
I'd be interested to see times for these programs on a real computer.
I am not an expert at C++ optimization, but I believe the speed difference may be due to the fact that Python lists preallocate more space at the beginning, while your C++ vectors must reallocate and possibly copy every time they grow.
As for GMan's comment about find, I believe that the Python "in" operator is also a linear search and is the same speed.
Edit
Also I just noticed that you rolled your own pow function. There is no need to do that and the stdlib is likely faster.
Here is another way, which relies on remembering all the numbers already explored.
I obtain a speedup factor of 4-5 over DrAsik's code, which is oddly stable between upper bounds of 1000 and 1000000; I expected the cache to become more efficient the more numbers we explored. Otherwise, the same kind of classic optimizations have been applied. BTW, if the compiler supports NRVO (named return value optimization) or rvalue references, we wouldn't need to pass the vector as an out parameter.
NB: micro-optimizations are still possible IMHO, and moreover the caching is naive, as it allocates much more memory than really needed.
#include <cstddef> // size_t
#include <vector>
enum Status {
never_seen,
being_explored,
happy,
unhappy
};
char const* toString[] = { "never_seen", "being_explored", "happy", "unhappy" };
inline size_t sum_squares(size_t i) {
size_t s = 0;
while (i) {
const size_t digit = i%10;
s += digit * digit;
i /= 10;
}
return s ;
}
struct Cache {
Cache(size_t dim) : m_cache(dim, never_seen) {}
void set(size_t n, Status status) {
if (m_cache.size() <= n) {
m_cache.resize(n+1, never_seen);
}
m_cache[n] = status;
// std::cout << "(c[" << n << "]<-"<<toString[status] << ")";
}
Status operator[](size_t n) const {
if (m_cache.size() <= n) {
return never_seen;
} else {
return m_cache[n];
}
}
private:
std::vector<Status> m_cache;
};
void search_happy_lh(size_t upper_bound, std::vector<size_t> & happy_numbers)
{
happy_numbers.clear();
happy_numbers.reserve(upper_bound); // this does not improve performance much
Cache cache(upper_bound+1);
std::vector<size_t> current_stack;
cache.set(1,happy);
happy_numbers.push_back(1);
for (size_t i = 2; i<=upper_bound ; ++i) {
// std::cout << "\r" << i << std::flush;
current_stack.clear();
size_t s= i;
while ( s != 1 && cache[s]==never_seen)
{
current_stack.push_back(s);
cache.set(s, being_explored);
s = sum_squares(s);
// std::cout << " - " << s << std::flush;
}
const Status update_with = (cache[s]==being_explored ||cache[s]==unhappy) ? unhappy : happy;
// std::cout << " => " << s << ":" << toString[update_with] << std::endl;
for (size_t j=0; j!=current_stack.size(); ++j) {
cache.set(current_stack[j], update_with);
}
if (cache[i] == happy) {
happy_numbers.push_back(i);
}
}
}
Here's a C# version:
using System;
using System.Collections.Generic;
using System.Text;
namespace CSharp
{
class Program
{
static void Main (string [] args)
{
while (true)
{
Console.Write ("Pick an upper bound: ");
String
input = Console.ReadLine ();
uint
upper_bound;
if (uint.TryParse (input, out upper_bound))
{
DateTime
start = DateTime.Now;
CalcHappyNumbers (upper_bound);
DateTime
end = DateTime.Now;
TimeSpan
span = end - start;
Console.WriteLine ("Time taken = " + span.TotalSeconds + " seconds.");
}
else
{
Console.WriteLine ("Error in input, unable to parse '" + input + "'.");
}
}
}
enum State
{
Happy,
Sad,
Unknown
}
static void CalcHappyNumbers (uint upper_bound)
{
SortedDictionary<uint, State>
happy = new SortedDictionary<uint, State> ();
SortedDictionary<uint, bool>
happy_numbers = new SortedDictionary<uint, bool> ();
happy [1] = State.Happy;
happy_numbers [1] = true;
for (uint current = 2 ; current < upper_bound ; ++current)
{
FindState (ref happy, ref happy_numbers, current);
}
//foreach (KeyValuePair<uint, bool> pair in happy_numbers)
//{
// Console.Write (pair.Key.ToString () + ", ");
//}
//Console.WriteLine ("");
}
static State FindState (ref SortedDictionary<uint, State> happy, ref SortedDictionary<uint,bool> happy_numbers, uint value)
{
State
current_state;
if (happy.TryGetValue (value, out current_state))
{
if (current_state == State.Unknown)
{
happy [value] = State.Sad;
}
}
else
{
happy [value] = current_state = State.Unknown;
uint
new_value = 0;
for (uint i = value ; i != 0 ; i /= 10)
{
uint
lsd = i % 10;
new_value += lsd * lsd;
}
if (new_value == 1)
{
current_state = State.Happy;
}
else
{
current_state = FindState (ref happy, ref happy_numbers, new_value);
}
if (current_state == State.Happy)
{
happy_numbers [value] = true;
}
happy [value] = current_state;
}
return current_state;
}
}
}
I compared it against Dr_Asik's C++ code. For an upper bound of 100000 the C++ version ran in about 2.9 seconds and the C# version in 0.35 seconds. Both were compiled using Dev Studio 2005 using default release build options and both were executed from a command prompt.
Here's some food for thought: If given the choice of running a 1979 algorithm for finding prime numbers in a 2009 computer or a 2009 algorithm on a 1979 computer, which would you choose?
The new algorithm on ancient hardware would be the better choice by a huge margin. Have a look at your "helper" functions.
There are quite a few optimizations possible:
(1) Use const references
bool inVector(int inQuestion, const vector<int>& known)
{
for(vector<int>::const_iterator it = known.begin(); it != known.end(); ++it)
if(*it == inQuestion)
return true;
return false;
}
int sum(const vector<int>& given)
{
int sum = 0;
for(vector<int>::const_iterator it = given.begin(); it != given.end(); ++it)
sum += *it;
return sum;
}
(2) Use counting down loops
int pow(int given, int power)
{
int current = 1;
while(power--)
current *= given;
return current;
}
Or, as others have said, use the standard library code.
(3) Don't allocate buffers where not required
vector<int> squares;
for (int temp = current; temp != 0; temp /= 10)
{
squares.push_back(pow(temp % 10, 2));
}
With similar optimizations as PotatoSwatter I got time for 10000 numbers down from 1.063 seconds to 0.062 seconds (except I replaced itoa with standard sprintf in the original).
With all the memory optimizations (don't pass containers by value - in C++ you have to explicitly decide whether you want a copy or a reference; move operations that allocate memory out of inner loops; if you already have the number in a char buffer, what's the point of copying it to std::string etc) I got it down to 0.532.
The rest of the time came from using %10 to access digits, rather than converting numbers to string.
I suppose there might be another algorithmic level optimization (numbers that you have encountered while finding a happy number are themselves also happy numbers?) but I don't know how much that gains (there are not that many happy numbers in the first place) and this optimization is not in the Python version either.
By the way, by not using string conversion and a list to square digits, I got the Python version from 0.825 seconds down to 0.33 too.
#!/usr/bin/env python

import timeit

upperBound = 0

def calcMain():
    known = set()
    for i in xrange(0,upperBound+1):
        next = False
        current = i
        history = set()
        while not next:
            squaresum=0
            while current > 0:
                current, digit = divmod(current, 10)
                squaresum += digit * digit
            current = squaresum
            if current in history:
                next = True
                if current == 1:
                    known.add(i)
            history.add(current)

while True:
    upperBound = input("Pick an upper bound: ")
    result = timeit.Timer(calcMain).timeit(1)
    print result, "seconds.\n"
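The algorithmic optimization wondered about above does hold: every number on the chain from a happy number down to 1 is itself happy, and every number on a chain that ends in a cycle is unhappy. Below is a minimal Python 3 sketch of caching the whole chain at once. It is not part of the timed programs, and `classify_chain` and its `cache` argument are hypothetical names.

```python
def digit_square_sum(n):
    """Sum of the squared decimal digits of n."""
    total = 0
    while n:
        n, digit = divmod(n, 10)
        total += digit * digit
    return total


def classify_chain(start, cache):
    """Classify start and every number met on the way.

    cache maps number -> True (happy) / False (unhappy) and must contain {1: True}.
    """
    chain = []
    n = start
    # Chains are short, so scanning the list for repeats is cheap.
    while n not in cache and n not in chain:
        chain.append(n)
        n = digit_square_sum(n)
    # Meeting a number from the current chain again means a cycle without 1.
    result = cache.get(n, False)
    for member in chain:
        cache[member] = result
    return result


if __name__ == "__main__":
    cache = {1: True}
    print(classify_chain(7, cache))   # also classifies 49, 97, 130 and 10
    print(sorted(n for n, is_happy in cache.items() if is_happy))
```

How much this gains over the per-number loop would have to be measured; the sketch only shows that the cached results stay correct.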
I made a couple of minor changes to your original python code example that make a better than 16x improvement to the performance of the code.
The changes I made took the 100,000 case from about 9.64 seconds to about 3.38 seconds.
The major change was to make the mod 10 and accumulator changes to run in a while loop. I made a couple of other changes that improved execution time in only fractions of hundredths of seconds. The first minor change was changing the main for loop from a range list comprehension to an xrange iterator. The second minor change was substituting the set class for the list class for both the known and history variables.
I also experimented with iterator comprehensions and precalculating the squares but they both had negative effects on the efficiency.
I seem to be running a slower version of Python, or a slower processor, than some of the other contributors. I would be interested in the results of someone else's timing comparison of my Python code against one of the optimized C++ versions of the same algorithm.
I also tried using the python -O and -OO optimizations but they had the reverse of the intended effect.
Why is everyone using a vector in the C++ version? Lookup time is O(N).
Even though it's not as efficient as the Python set, use std::set. Lookup time is O(log(N)).
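For the Python side of that comparison, here is a rough sketch of how container choice affects membership tests. A Python list is scanned linearly by `in`, while a Python set is a hash table with average constant-time lookup. The sizes and repeat counts are arbitrary choices for illustration, and timings will vary by machine.

```python
import timeit

values = list(range(100000))
as_list = values
as_set = set(values)
needle = 99999  # last element: the worst case for a linear scan

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: needle in as_list, number=200)
set_time = timeit.timeit(lambda: needle in as_set, number=200)

print("list: {0:.6f} s, set: {1:.6f} s".format(list_time, set_time))
```

The same kind of measurement in C++ would compare std::find on a vector with std::set::find (or a hash-based container).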