How to optimize a for loop? - c++

I have a problem, let's say:
Find all pairs of pairs (x, y) and (z, t) such that x³ + y³ = z³ + t³, where (x, y) != (z, t) and x³ + y³ < 10,000.
Taking the cube root of 10,000 yields 21.544, which rounds down to 21, so I got:
#include <iostream>
using namespace std;
int main() {
for( int x = 1; x <= 20; ++x ) {
for( int y = x + 1; y <= 21; ++y ) {
for( int z = x + 1; z <= y - 1; ++z ) {
for( int t = z; t <= y - 1; ++t ) {
if( x*x*x + y*y*y == z*z*z + t*t*t ) {
cout << x << ", " << y << ", " << z << ", " << t << endl;
}
}
}
}
}
return 0;
}
I know this code could be optimized further, and that's what I'm looking for. Also, one of my friends told me that y could start at x + 2 instead of x + 1. I doubt this: if
x = 1, we would never have y = 2, which could miss a possible solution.
Any thoughts?

There's one obvious algorithmic optimization given the current loop structure. You quite rightly limit your ranges to the cube root of 10,000, but you can go further and limit the range of y to the cube root of 10,000 - x³. That's one thing you can do.
The other optimization is that there's no reason on earth that this should be 4 loops. Simply do 2 loops and compute the values of x^3 + y^3 and check for duplicates. (This is as good as you're going to get without delving into features of cube roots.)
Roughly like this:
std::multimap<int, std::pair<int, int>> sums;
for (int i = 1; i < 21; i++) {
    // note: ^ is XOR in C++, so the cubes are written out as products
    for (int j = i; i*i*i + j*j*j < 10000; j++) {
        sums.insert({i*i*i + j*j*j, {i, j}});
    }
}
Then you just iterate through the multimap and look for repeats.

Typical tradeoff: memory for speed.
First, the bound on x is looser than it needs to be: if we suppose that (x,y) is ordered with x <= y, then
x^3 + y^3 < N and x^3 <= y^3 (for positive numbers)
=> x^3 + x^3 < N (by transitivity)
<=> x^3 < N/2
<=> x <= floor((N/2)^(1/3))
Thus x <= 17 here.
Now, let us memoize the result of x^3 + y^3 and build an associative table (sum -> pairs). By the way, is there a reason to discard (x,x) as a pair?
int main(int argc, char* argv[])
{
typedef std::pair<unsigned short, unsigned short> Pair;
typedef std::vector<Pair> PairsList;
typedef std::unordered_map<unsigned short, PairsList> SumMap;
// Note: arbitrary limitation, N cannot exceed 2^16 on most architectures
// because of the choice of using an `unsigned short`
unsigned short N = 10000;
if (argc > 1) { N = boost::lexical_cast<unsigned short>(argv[1]); }
SumMap sumMap;
for (unsigned short x = 1; x*x*x <= N/2; ++x)
{
for (unsigned short y = x; x*x*x + y*y*y <= N; ++y)
{
sumMap[x*x*x + y*y*y].push_back(Pair(x,y));
}
}
for (SumMap::const_reference ref: sumMap)
{
if (ref.second.size() > 1)
{
std::cout << "Sum: " << ref.first
<< " can be achieved with " << ref.second << "\n";
// I'll let you overload the print operator for a vector of pairs
}
}
return 0;
}
Counting loop iterations, this is O(M^2) where M = N^(1/3) is the bound on x and y, i.e. O(N^(2/3)) in terms of N.

Make a list of all numbers and their operational result. Sort the list by the results. Test matching results for having different operands.
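The list-and-sort idea above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names CubeSum, Match and repeated_cube_sums are made up for this example, and it assumes x <= y and that the limit is small enough for the cubes to fit in an int.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical helper types for this sketch.
struct CubeSum { int sum, x, y; };
struct Match   { int sum, x, y, z, t; };

// Collect every x^3 + y^3 < limit with x <= y, sort by sum,
// then report neighbouring entries that share the same sum.
std::vector<Match> repeated_cube_sums(int limit) {
    std::vector<CubeSum> sums;
    for (int x = 1; 2 * x * x * x < limit; ++x) {
        for (int y = x; x * x * x + y * y * y < limit; ++y) {
            sums.push_back(CubeSum{x * x * x + y * y * y, x, y});
        }
    }
    // Sort by (sum, x) so the order of equal sums is deterministic.
    std::sort(sums.begin(), sums.end(), [](const CubeSum& a, const CubeSum& b) {
        return std::tie(a.sum, a.x) < std::tie(b.sum, b.x);
    });
    std::vector<Match> out;
    for (std::size_t i = 1; i < sums.size(); ++i) {
        if (sums[i].sum == sums[i - 1].sum) {
            out.push_back(Match{sums[i].sum, sums[i - 1].x, sums[i - 1].y,
                                sums[i].x, sums[i].y});
        }
    }
    return out;
}
```

Calling repeated_cube_sums(10000) should report 1729 (1, 12 and 9, 10) and 4104 (2, 16 and 9, 15). Because the pairs are ordered and stored once, two entries with the same sum are always different pairs, so no extra "different operands" test is needed in this variant.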

Use a table from sums to the set of pairs of numbers generating that sum.
You can generate that table by two nested for loops, and then run through the table collecting the sums with multiple solutions.

I'd suggest calculating the powers in outer loops (EDIT: Moved calculations out of the for loops):
int x3, y3, z3;
for( int x = 1; x <= 20; ++x ) {
x3 = x * x * x;
for( int y = x + 1; y <= 21; ++y ) {
y3 = y * y * y;
for( int z = x + 1; z <= y - 1; ++z ) {
z3 = z * z * z;
for( int t = z; t <= y - 1; ++t ) {
if( x3 + y3 == z3 + t*t*t ) {
cout << x << ", " << y << ", " << z << ", " << t << endl;
}
}
}
}
}
Anyway, why do you want to optimize (at least for this example)? This runs in 20 ms on my PC... So I guess you have similar problems at a larger scale.

As a general summary:
Calculate the cubes as you loop rather than in the innermost test, e.g. int xcubed = x*x*x; just inside the x loop (similarly for y and z). This saves you calculating the same values multiple times. Alternatively, put them in a table so you only calculate each one once.
Create a table of sums of cubes, using a hash table of some sort, and let it hold duplicates (not to be confused with hash collisions).
Any that has a duplicate is a solution.
1729 should come up as one of your solutions by the way. 1 cubed plus 12 cubed and also 9 cubed + 10 cubed.
To test performance you could of course pick a much higher value of maxsum (as well as run it several times).
The algorithm is strictly O(N^(2/3)): you only loop up to the cube root of N, and it is O(m^2) on that smaller range m.

Related

Can I obtain different results from iterative and recursive functions?

My code is supposed to calculate the 100th element of the sequence $x_0=1 ; x_i=\dfrac{x_{i-1}+1}{x_{i-1}+2}, i=1,2, \ldots$
I wrote iterative and recursive functions, but the results are not equal. Is it due to the loss of decimals?
Here is my driver code. The data from the file is i=100.
int main()
{
int i;
ifstream f ("data.txt");
f >> i;
double x_0= 1.00;
double x_100 = l(x_0, i);
ofstream g ("results.txt", ios::app);
g <<"\n100th element (by looping): " << x_100;
x_100 = r(x_0);
g <<"\n100th element (by recursion): " << x_100;
return 0;
}
l() is iterative function,
r() is recursive function
double l(double x, int i)
{
for (int j = 0; j<i ; j++){
x = (x + 1)/(x+2);
}
return x;
}
double r(double x)
{
if (x == 0)
return 1;
else
return (r(x-1) + 1) / (r(x-1) + 2);
}
Here are the results
100th element (by looping): 0.618034
100th element (by recursion): 0.666667
In the recursive function you do
(r(x-1) + 1) / (r(x-1) + 2)
With x == 1.0 that's equal to
(r(1-1) + 1) / (r(1-1) + 2)
That's of course equal to
(r(0) + 1) / (r(0) + 2)
And since r(0) will return 1 that equation is
(1.0 + 1) / (1.0 + 2)
There's no further recursion. The result is 2.0 / 3.0 which is 0.66667.
The iterative function l, on the other hand, performs 100 iterations, each of which updates x, so the value keeps moving toward the limit of the sequence (about 0.618034, as the output shows).
The functions simply do different things, leading to different results.

Auto-correlation/correlation in C++

The following function computes the correlation between two vectors.
It doesn't give the same result as matlab function for small values:
I really don't know whether the bug comes from this function or not. The maximum lag defaults to N-1; is that reasonable?
inline int pow2i(int x) { return ((x < 0) ? 0 : (1 << x)); }
vec xcorr(vec x, vec y,bool autoflag)
{
int maxlag=0;
int N = std::max(x.size(), y.size());
//Compute the FFT size as the "next power of 2" of the input vector's length (max)
int b = ceil(log2(2.0 * N - 1));
int fftsize = pow2i(b);
int e = fftsize - 1;
cx_vec temp2;
if (autoflag == true) {
//Take FFT of input vector
cx_vec X = cx_vec(x,zeros(x.size()));
X= fft(X,fftsize);
//Compute the abs(X).^2 and take the inverse FFT.
temp2 = ifft(X%conj(X));
}
else{
//Take FFT of input vectors
cx_vec X=cx_vec(x,zeros(x.size()));
cx_vec Y=cx_vec(y,zeros(y.size()));
X = fft(X,fftsize);
Y = fft(Y,fftsize);
//cout<< "Y " << Y << endl;
//cout<< "X " << X<< endl;
temp2 =ifft(X%conj(Y));
//cout<< "temp 2 " << temp2 << endl;
}
maxlag=N-1;
vec out=real(join_cols(temp2(span(e - maxlag + 1, e)),temp2(span(0,maxlag))));
return out;
}
Just implement autocorrelation in time-domain, as one of the comments mentioned.
Armadillo does not have cross-correlation (or autocorrelation) implemented, but one easy way to implement them is with convolution, which Armadillo does have. You just need to reverse the order of the elements in the second vector, and arma::conv will essentially compute the cross-correlation.
That is, you can easily compute the autocorrelation of an arma::vec a with
arma::vec result = arma::conv(a, arma::reverse(a));
This gives the same result that xcorr in MATLAB/Octave returns (when you pass just a single vector to xcorr it computes the autocorrelation).
Note that you might want to divide result by N or by N-1.
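For comparison, here is a minimal time-domain sketch that uses only the standard library instead of Armadillo. The function name xcorr_direct is made up for this example. It computes the same sums as convolving x with the reversed y, so output index k corresponds to lag k - (y.size() - 1). No normalisation is applied.

```cpp
#include <cstddef>
#include <vector>

// Direct (O(nx * ny)) cross-correlation of two real sequences.
// out[k] = sum over i - j == k - (ny - 1) of x[i] * y[j].
std::vector<double> xcorr_direct(const std::vector<double>& x,
                                 const std::vector<double>& y) {
    if (x.empty() || y.empty()) {
        return {};
    }
    const std::size_t nx = x.size();
    const std::size_t ny = y.size();
    std::vector<double> out(nx + ny - 1, 0.0);
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            out[i + (ny - 1 - j)] += x[i] * y[j];
        }
    }
    return out;
}
```

For x = y = {1, 2, 3} this gives {3, 8, 14, 8, 3}, with the zero-lag value 14 in the middle. A slow reference like this can be useful for checking an FFT-based version on small inputs.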

Calculating 3D cartesian coordinates inside a cone

What I am trying to do here is generate the 3D Cartesian coordinates of points inside an inverted cone (the radius decreases with height). My approach is to step the height h down in integer steps and, at each height, collect the points (x, y) that fall within the circle formed at that height. Since the radius of this circle varies, I use a simple similarity condition to determine it at every iteration. The initial height is 1000, and the radius should initially be 3500. These circles are centred at (0,0) [the z-axis passes through the vertex of the cone and is perpendicular to the base]. The code isn't running properly and gives me an exit status of -1. Can anyone help me figure out whether my implementation is failing because of array-size errors or something similar?
#include<bits/stdc++.h>
#define ll long long int
using namespace std;
int main(){
float top[1010][9000][3];
ll i = 0;
for(ll h = 999; h >=0; h--){
float r=(h+1)*(3.5);
for (ll x = floor(r) * (-1); x <= floor(r); x++){
for (ll y = floor(r) *(-1); y <= floor(r); y++){
if(pow(x,2) + pow(y,2) <= pow(floor(r),2)){
top[h][i][0] = x;
top[h][i][1] = y;
top[h][i][2] = 9.8;
i++;
}
}
}
i=0;
}
cout << "done";
for (ll m = 0; m < 1000; m++){
for(ll n = 0; n < 7000; n++){
if(top[m][n][2] == 9.8){
cout << top[m][n][0] << top[m][n][1];
}
}
}
}
You don't need to declare ll as long long int. The indexes you are using will fit inside of int.
Here's your problem: Change the code to this to see what's going on:
for(ll h = 999; h >=0; h--){
float r=(h+1)*(3.5);
for (ll x = floor(r) * (-1); x <= floor(r); x++){
for (ll y = floor(r) *(-1); y <= floor(r); y++){
if(pow(x,2) + pow(y,2) <= pow(floor(r),2)){
/* top[h][i][0] = x;
top[h][i][1] = y;
top[h][i][2] = 9.8; //*/
i++; // this gets really big
}
}
}
cout << "max i: " << i << endl;
i=0;
}
i gets really big and is indexing into a dimension that is only 9000.
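To get a feel for how big i can get, the number of integer points inside a circle of radius R can be counted row by row. This is a small sketch with a made-up function name, lattice_points_in_disc. The count grows roughly like pi * R^2, so for R = 3500 it is in the tens of millions, far beyond a dimension of 9000.

```cpp
#include <cmath>

// Count integer points (x, y) with x*x + y*y <= R*R.
long long lattice_points_in_disc(long long R) {
    long long count = 0;
    for (long long x = -R; x <= R; ++x) {
        const long long rem = R * R - x * x;
        // Largest y with y*y <= rem; correct any floating-point rounding.
        long long ymax = static_cast<long long>(std::sqrt(static_cast<double>(rem)));
        while (ymax * ymax > rem) --ymax;
        while ((ymax + 1) * (ymax + 1) <= rem) ++ymax;
        count += 2 * ymax + 1;  // y ranges over -ymax..ymax
    }
    return count;
}
```

For example, lattice_points_in_disc(1) is 5 and lattice_points_in_disc(2) is 13. Sizing the buffer from a count like this, or using a std::vector that grows as needed, avoids guessing a fixed second dimension.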
Criticism of the code...
It looks like you are scanning the entire x,y,z block and testing whether each point is inside. If it is, you save the x,y coordinates of the point along with 9.8 (some field value?).
Perhaps you could forgo the float buffer and just print the {x,y} coordinates directly to see how your code works before attempting to save the output. Redirect the output to a file and inspect it.
cout << "{" << x << "," << y <<"}," << (i % 5 == 0 ? "\n" : " ");
Also, read up on why comparing floats with == doesn't work.

Julia Set rendering code

I am working on escape-time fractals as my 12th grade project, to be written in c++ , using the simple graphics.h library that is outdated but seems sufficient.
The code for generating the Mandelbrot set seems to work, and I assumed that Julia sets would be a variation of the same. Here is the code:
(Here, fx and fy are simply functions to convert the actual complex co-ordinates like (-0.003,0.05) to an actual value of a pixel on the screen.)
int p;
x0=0, y0=0;
long double r, i;
cout<<"Enter c"<<endl;
cin>>r>>i;
for(int i= fx(-2); i<=fx(2); i++)
{
for(int j= fy(-2); j>=fy(2); j--)
{
long double x=0.0, y= 0.0,t;
x= gx(i), y= gy(j);
int k= -1;
while(( x*x + y*y <4)&& k<it-1)
{
t= x*x - y*y + r;
y= 2*x*y + i ;
x=t;
k++;
}
p= k*pd;
setcolor(COLOR(colour[p][0],colour[p][1],colour[p][2]));
putpixel(i,j,getcolor());
}
}
But this does not seem to be the case. The output window shows the entire circle of radius=2 with the colour corresponding to an escape time of 1 iteration.
Also, on trying to search for a solution to this problem, I've seen that all the algorithms others have used initializes the initial co-ordinates somewhat like this:
x = (col - width/2)*4.0/width;
y = (row - height/2)*4.0/width;
Could somebody explain what I'm missing out?
I guess the main problem is that the variable i (the imaginary part of c) is shadowed by the loop variable i. So the line
y= 2*x*y + i;
gives an incorrect result. This variable should be renamed to, say, im. The corrected version is below. Since I don't have graphics.h, I print to the console instead.
#include <iostream>
using namespace std;
#define WIDTH 40
#define HEIGHT 60
/* real to screen */
#define fx(x) ((int) ((x + 2)/4.0 * WIDTH))
#define fy(y) ((int) ((2 - y)/4.0 * HEIGHT))
/* screen to real */
#define gx(i) ((i)*4.0/WIDTH - 2)
#define gy(j) ((j)*4.0/HEIGHT - 2)
static void julia(int it, int pd)
{
int p;
long double re = -0.75, im = 0;
long double x0 = 0, y0 = 0;
cout << "Enter c" << endl;
cin >> re >> im;
for (int i = fx(-2.0); i <= fx(2.0); i++)
{
for (int j = fy(-2.0); j >= fy(2.0); j--)
{
long double x = gx(i), y = gy(j), t;
int k = 0;
while (x*x + y*y < 4 && k < it)
{
t = x*x - y*y + re;
y = 2*x*y + im;
x = t;
k++;
}
p = (int) (k * pd);
//setcolor(COLOR(colour[p][0],colour[p][1],colour[p][2]));
//putpixel(i,j,getcolor());
cout << p; // for ASCII output
}
cout << endl; // for ASCII output
}
}
int main(void)
{
julia(9, 1);
return 0;
}
and the output with input -0.75 0 is given below.
0000000000000000000000000000000000000000000000000000000000000
0000000000000000000001111111111111111111000000000000000000000
0000000000000000011111111111111111111111111100000000000000000
0000000000000001111111111111111111111111111111000000000000000
0000000000000111111111111122222222211111111111110000000000000
0000000000011111111111122222349432222211111111111100000000000
0000000001111111111112222233479743322222111111111111000000000
0000000011111111111222222334999994332222221111111111100000000
0000000111111111112222223345999995433222222111111111110000000
0000011111111111122222234479999999744322222211111111111100000
0000011111111111222222346899999999986432222221111111111100000
0000111111111111222223359999999999999533222221111111111110000
0001111111111112222233446999999999996443322222111111111111000
0011111111111112222233446999999999996443322222111111111111100
0011111111111122222333456899999999986543332222211111111111100
0111111111111122223334557999999999997554333222211111111111110
0111111111111122233345799999999999999975433322211111111111110
0111111111111122233457999999999999999997543322211111111111110
0111111111111122334469999999999999999999644332211111111111110
0111111111111122345999999999999999999999995432211111111111110
0111111111111122379999999999999999999999999732211111111111110
0111111111111122345999999999999999999999995432211111111111110
0111111111111122334469999999999999999999644332211111111111110
0111111111111122233457999999999999999997543322211111111111110
0111111111111122233345799999999999999975433322211111111111110
0111111111111122223334557999999999997554333222211111111111110
0011111111111122222333456899999999986543332222211111111111100
0011111111111112222233446999999999996443322222111111111111100
0001111111111112222233446999999999996443322222111111111111000
0000111111111111222223359999999999999533222221111111111110000
0000011111111111222222346899999999986432222221111111111100000
0000011111111111122222234479999999744322222211111111111100000
0000000111111111112222223345999995433222222111111111110000000
0000000011111111111222222334999994332222221111111111100000000
0000000001111111111112222233479743322222111111111111000000000
0000000000011111111111122222349432222211111111111100000000000
0000000000000111111111111122222222211111111111110000000000000
0000000000000001111111111111111111111111111111000000000000000
0000000000000000011111111111111111111111111100000000000000000
0000000000000000000001111111111111111111000000000000000000000
0000000000000000000000000000000000000000000000000000000000000
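The per-pixel work can also be separated from the drawing code, which makes name clashes like the one above harder to introduce. The sketch below uses std::complex; julia_escape_time and pixel_to_plane are hypothetical names for this example. pixel_to_plane is the same mapping as the (col - width/2)*4.0/width formula quoted in the question: it sends pixel indices to the range from -2 to 2.

```cpp
#include <complex>

// Map a pixel index to a coordinate in roughly [-2, 2).
// 'extent' is the size along this axis; 'width' sets the scale.
long double pixel_to_plane(int index, int extent, int width) {
    return (index - extent / 2) * 4.0L / width;
}

// Number of iterations of z -> z*z + c before |z| reaches 2,
// capped at max_iter.
int julia_escape_time(std::complex<long double> z,
                      std::complex<long double> c,
                      int max_iter) {
    int k = 0;
    while (std::norm(z) < 4.0L && k < max_iter) {  // norm is |z|^2
        z = z * z + c;
        ++k;
    }
    return k;
}
```

For a Julia set, z starts at the pixel's coordinates and c is the fixed constant entered by the user. For the Mandelbrot set it is the other way round: z starts at 0 and c comes from the pixel.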
Would you please explain how you display the image using the graphics.h library?
//setcolor(COLOR(colour[p][0],colour[p][1],colour[p][2]));
//putpixel(i,j,getcolor());

Exhaustive (brute force) algorithm improvement

I started with the requirements below:
m and n are integers. Find (x, y, z) with
x+y+z=n
x^3 + y^3 + z^3 = m
And my code
for(int x = 1; x<n; x++)
{
for(int y = 1; y<n; y++)
{
for(int z=1; z<n; z++)
{
if((x*x*x + y*y*y + z*z+z*z == m) &&(x+y+z==n))
{
cout<<x<<" "<<y<<" "<<z;
}
}
}
}
The complexity is O(n^3).
With the code above, the algorithm is very slow. Do you have any idea how to speed it up?
There's no need for the inner loop; given x and y, you can take z = n-x-y. This reduces it to O(n^2).
The second loop only needs to loop while x+y<n, since beyond that there's no positive z such that x+y+z==n. This halves the remaining work.
Once you've done this, there's no need for the second test (since you've already chosen z to make that true); fix the typo in the first test and you get
for (int x = 1; x<n; x++) {
for (int y = 1; x+y<n; y++) {
int z = n-x-y;
if (x*x*x + y*y*y + z*z*z == m) {
// found it
}
}
}
You do not need the internal for z loop. Once you have x and y, you can easily determine z as n-x-y. This makes it O(N^2).
UPD: I think you can even do it in O(N log N) using a binary search approach.
Iterate over x. For a given x, you need to find y and z such that y+z=n-x and y^3+z^3=m-x^3. Let n'=n-x and m'=m-x^3.
The problem is symmetric with respect to y and z, so we can safely assume y<=z. This makes y<=n'/2.
We need to find y such that y^3+(n'-y)^3=m'. I am almost sure (though I have not checked this) that the function f(y)=y^3+(n'-y)^3 is monotonic on the [1, n'/2] interval, so you can use binary search to find the root of the equation f(y)=m'.
So, for a given x you can find needed y in O(log N) time, which makes O(N log N) running time in total.
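The monotonicity can be checked directly: f'(y) = 3y^2 - 3(n'-y)^2, which is at most 0 whenever y <= n'/2, so f is non-increasing on that interval. Under that observation, the binary search could look roughly like the sketch below. The function name find_triples is made up, the search is restricted to positive integers as in the original loops, and overflow of the cubes for large n is ignored.

```cpp
#include <array>
#include <vector>

// For each x, binary-search y in [1, n'/2] with y^3 + (n'-y)^3 == m',
// where n' = n - x and m' = m - x^3. Returns triples {x, y, z} with y <= z.
std::vector<std::array<long long, 3>> find_triples(long long n, long long m) {
    std::vector<std::array<long long, 3>> out;
    for (long long x = 1; x + 2 <= n; ++x) {
        const long long np = n - x;
        const long long mp = m - x * x * x;
        auto f = [np](long long y) {
            const long long z = np - y;
            return y * y * y + z * z * z;
        };
        long long lo = 1, hi = np / 2;
        // f is non-increasing here: find the smallest y with f(y) <= mp.
        while (lo < hi) {
            const long long mid = lo + (hi - lo) / 2;
            if (f(mid) > mp) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if (f(lo) == mp) {
            out.push_back(std::array<long long, 3>{x, lo, np - lo});
        }
    }
    return out;
}
```

For n = 6 and m = 36 this should return (1, 2, 3), (2, 1, 3) and (3, 1, 2). Each solution appears once per choice of x, because only y and z are ordered.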
This one is O(n^2)
for(int x = 1; x<n; x++)
{
int n1 = n - x ;
for(int y = 1; y<n1; y++)
{
int z = n - x - y ;
if (x*x*x + y*y*y+z*z*z==m)
{
cout<<x<<" "<<y<<" "<<z;
}
}
}