Overcoming an n^2 runtime program

Is there a way to avoid a nested loop in C++11? My program runs slowly. Or rather, is there a more efficient way to evaluate the formula z=|a-b|*|x-y|, with a, b, x and y being elements of a 10000-integer array?
Here is the code:
#include <iostream>
#include <fstream>
#include <cmath>
using namespace std;
ifstream in("int.in");
int main()
{
long long n, n1, z, x, y, in2=0;
in>>n;
long long l[n], p[n];
for(x=0;x!=n;x++)
in>>l[x]>>p[x];
for(x=0;x!=n;x++)
{
for(y=x+1;y<n;y++)
{
in2+=(abs(l[x]-l[y])*abs(p[x]-p[y])); //executes slow
/*n1=l[x]-l[y]; //Alternative algorithm
if(n1<0)
n1*=-1;
z=p[x]-p[y];
if(z<0)
z*=-1;
in2+=n1*z;*/
}
}
cout<<in2<<"\n";
}
I tried changing the data types to short int, long, long long and unsigned, but the program either prints garbage values or crashes with a "Segmentation fault (core dumped)" error.
For the absolute value, I originally tried a hand-written approach (commented out), but it seemingly outputs garbage values. I've also tried the abs() function, in2+=abs(l[x]-l[y])*abs(p[x]-p[y]);, but it seems to execute slower. I do not know of any other optimizations I can implement, so please do recommend some.
Linux-friendly solution preferred. Thank you.
Side Note: the values of a, b, x and y are all within the range 1<=a,b,x,y<=10000.
Side Note: this program reads from a file "int.in", takes the first integer (the number of items) and reads each new line by pair (l[x] and p[x] are pairs).
Side Note: I also tried using only a multidimensional array, but I read somewhere that a one-dimensional array stays in the CPU cache, while multidimensional arrays are scattered in memory and are slower.

The problem can be drawn in another way: you're looking for c and d (both positive) in the equation z=c*d (of course c is |a-b| and d is |x-y|).
So first sort your arrays. Then look for solutions of z=c*d, then find which a and b make c == a - b true and which x and y make d == x - y true.
Once that's done, you have all the values that make your equation true, since abs(a-b) is the same as abs(b-a).
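The answer above does not spell out an algorithm. As a separate sketch (not necessarily what the answer intends), one known way to avoid the nested loop for the sum in the question is to sort the pairs by l and keep running totals in Fenwick trees indexed by p. The names Fenwick and pairSum are hypothetical, and the code assumes every p lies in 1..maxP, as the question's side note suggests.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Fenwick (binary indexed) tree over positions 1..size holding long long sums.
struct Fenwick {
    std::vector<long long> tree;
    explicit Fenwick(std::size_t size) : tree(size + 1, 0) {}
    void add(std::size_t pos, long long value) {          // pos must be >= 1
        for (; pos < tree.size(); pos += pos & (~pos + 1)) tree[pos] += value;
    }
    long long prefix(std::size_t pos) const {             // sum over positions 1..pos
        long long total = 0;
        for (; pos > 0; pos -= pos & (~pos + 1)) total += tree[pos];
        return total;
    }
};

// Sum of |l[i]-l[j]| * |p[i]-p[j]| over all pairs i < j, in O(n log maxP).
// Each element of pts is (l, p). Assumes 1 <= p <= maxP for every point.
long long pairSum(std::vector<std::pair<long long, long long>> pts, std::size_t maxP) {
    std::sort(pts.begin(), pts.end());   // ascending l, so l - l_j >= 0 for earlier j
    Fenwick cnt(maxP), sumL(maxP), sumP(maxP), sumLP(maxP);
    long long totCnt = 0, totL = 0, totP = 0, totLP = 0, result = 0;
    for (const auto& pt : pts) {
        const long long l = pt.first, p = pt.second;
        const std::size_t idx = static_cast<std::size_t>(p);
        // Earlier points with p_j <= p:
        // sum of (l - l_j)(p - p_j) = l*p*c - l*sp - p*sl + slp
        const long long c = cnt.prefix(idx), sl = sumL.prefix(idx);
        const long long sp = sumP.prefix(idx), slp = sumLP.prefix(idx);
        result += l * p * c - l * sp - p * sl + slp;
        // Earlier points with p_j > p:
        // sum of (l - l_j)(p_j - p) = l*SP - l*p*C - SLP + p*SL
        const long long C = totCnt - c, SL = totL - sl;
        const long long SP = totP - sp, SLP = totLP - slp;
        result += l * SP - l * p * C - SLP + p * SL;
        // Record the current point for later ones.
        cnt.add(idx, 1); sumL.add(idx, l); sumP.add(idx, p); sumLP.add(idx, l * p);
        totCnt += 1; totL += l; totP += p; totLP += l * p;
    }
    return result;
}
```

With the question's limits, a call such as pairSum(points, 10000) would replace the double loop. The intermediate products stay well inside the range of long long for n = 10000 and values up to 10000.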

Related

How can I create an array inside a function then use this array to create another array?

I'm a newbie in C++, I was learning to code in python.
I believe the solution is simple but I have no idea how to do it.
Here is what I was trying to do in C++ (not working):
int createBoard(int x, int y) {
int l[x];
int board[y, l[x]];
return board;
}
int main() {
int x = 5;
int y = 6;
board = createBoard(x,y);
return 0;
}
Here is what I wanted to replicate (working, but in python):
def createBoard(x,y):
length = [i for i in range(0,10)]
area = [y,length]
return area
area = createBoard(5,6)
Basically I want to create a function that returns an array with the y value and an array counting until x.

As far as I understand your Python code, you want to create a 2D array. For a complete beginner in C++ that can be a challenging task. Many recommend std::vector, and they are right, but a 2D "array" built from nested vectors can be slow. So this example will work, but you may want to move away from it once you gain more experience in C++:
#include <vector>
std::vector< std::vector<int> > createBoard(size_t x, size_t y)
{
return std::vector< std::vector<int> >(x, std::vector<int>(y));
}
So if you want to use a more efficient way of creating 2D arrays, see this example:
LINK
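The link target is not available here, so the following is only one common alternative and may differ from the linked example: store all cells in a single contiguous std::vector and compute the index from row and column. The Board class and its member names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Board type: rows * cols ints stored contiguously in one vector.
class Board {
public:
    Board(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0) {}

    // Row-major indexing: row r starts at offset r * cols_.
    int& at(std::size_t r, std::size_t c) { return cells_[r * cols_ + c]; }
    const int& at(std::size_t r, std::size_t c) const { return cells_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> cells_;
};

Board createBoard(std::size_t x, std::size_t y) {
    return Board(x, y);
}
```

A single allocation keeps neighbouring rows next to each other in memory, which is the usual argument for this layout over a vector of vectors.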

Translating code line by line is almost guaranteed to fail. You had better do it in two steps: 1) fully understand what the code in language A does. 1a) forget about the code in language A. 2) Write the same in language B.
I am not very proficient with python so I start from this:
Basically I want to create a function that returns an array with the y
value and an array counting until x.
You declared the function to return a single int. A single int is not two arrays.
Next, this
int l[x];
Is not standard c++ because x is not a compile time constant. Some compilers offer it as an extension, but there is no reason to use it because c++ has std::vector.
Then, this
int board[y, l[x]];
is problematic in multiple ways. l[x] accesses an element of the array l that is out of bounds. Valid indexes are 0 to x-1 because l has x elements. Accessing the array out of bounds is undefined behaviour. We could stop at this point, because in the presence of undefined behaviour anything can happen. However, y, l[x] also invokes the comma operator. It evaluates both sides and results in the right operand. Then you again have the same problem: l[x] is not a compile time constant.
In this place I had code in c++, but it turned out that I completely misunderstood what your code is supposed to do. I'll leave the answer and refer you to others for the solution.

There are several problems with your code. The main one is that the Python array area contains objects of two different types: The first is the integer y, the second is the array length. All elements of a C++ array must have the same type.
Depending on what you want to use it for, you can replace the board array with a std::pair. This is an object containing two elements of different types.
Also, in C++ arrays with non-constant lengths must be dynamically created. Either using the new operator or (better) using std::unique_ptr. (Or you might want to use std::vector instead.)
Here's a small C++ program that does something like what you want to do:
#include <utility>
#include <memory>
auto createBoard(int x, int y) {
return std::make_pair(y, std::make_unique<int[]>(x));
}
int main() {
auto board = createBoard(5,6);
return 0;
}
(This will only work if your compiler supports C++14 or newer.)
But this is actually rather much above "newbie" level, and I doubt that you will find it very useful.
It would be better to start with a specification of what your program should do, rather than try to translate code from Python.
EDIT
Same code with std::vector instead of a dynamic array:
#include <utility>
#include <vector>
auto createBoard(int x, int y) {
return std::make_pair(y, std::vector<int>(x));
}
int main() {
auto board = createBoard(5,6);
return 0;
}
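Note that std::vector<int>(x) holds x zeros, whereas the Python list holds counting values. A minimal sketch that fills the vector with 0, 1, ..., x-1 using std::iota follows. Counting up to x-1 is an assumption taken from the question's description (the Python code itself uses range(0,10)).

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns {y, [0, 1, ..., x-1]}. The counting range is an assumption based on
// the question's wording ("an array counting until x").
std::pair<int, std::vector<int>> createBoard(int x, int y) {
    std::vector<int> length(static_cast<std::size_t>(x));
    std::iota(length.begin(), length.end(), 0);   // fill with 0, 1, 2, ...
    return std::make_pair(y, std::move(length));
}
```

The caller can then read board.first for the y value and board.second for the counting array.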

std::uniform_real_distribution - get all possible numbers

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>
using namespace std;
int main()
{
const auto a = numeric_limits<float>::lowest();
const auto b = numeric_limits<float>::max();
uniform_real_distribution<float> dist(a, b);
return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?

uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double for your implementation couldn't store the range of a float (and IEEE-754 Float64's can store the range of Float32's), this ought to work. You would still be passing the numeric_limits for a float, but since the distribution uses double, it can handle the math for the increased range.
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.
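Both options can be sketched as follows. The function names are hypothetical, and the engine type is left as a template parameter so any standard engine can be passed in.

```cpp
#include <limits>
#include <random>

// Option 1: draw in double, where max - lowest of float is representable,
// then narrow the result back to float.
template <class Engine>
float randomFloatViaDouble(Engine& eng) {
    std::uniform_real_distribution<double> dist(
        std::numeric_limits<float>::lowest(),
        std::numeric_limits<float>::max());
    return static_cast<float>(dist(eng));
}

// Option 2: draw a non-negative magnitude, then a separate random sign.
template <class Engine>
float randomFloatViaSign(Engine& eng) {
    std::uniform_real_distribution<float> magnitude(
        0.0f, std::numeric_limits<float>::max());
    std::uniform_int_distribution<int> coin(0, 1);
    const float value = magnitude(eng);
    return coin(eng) == 1 ? -value : value;   // 1 means "negate"
}
```

In both cases the distribution's arguments satisfy b - a <= max for the type it is instantiated with, which is the precondition the original code violated.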

Is it possible to micro-optimize "x = max(a,b); y = min(a,b);"?

I had an algorithm that started out like
int sumLargest2 ( int * arr, size_t n )
{
int largest(max(arr[0], arr[1])), secondLargest(min(arr[0],arr[1]));
// ...
and I realized that the first is probably not optimal because calling max and then min is repetitious when you consider that the information required to know the minimum is already there once you've found the maximum. So I figured out that I could do
int largest = max(arr[0], arr[1]);
int secondLargest = arr[0] == largest ? arr[1] : arr[0];
to shave off the useless invocation of min, but I'm not sure that actually saves any number of operations. Are there any fancy bit-shifting algorithms that can do the equivalent of
int largest(max(arr[0], arr[1])), secondLargest(min(arr[0],arr[1]));
?????

In C++, you can use std::minmax to produce a std::pair of the minimum and the maximum. This is particularly easy in combination with std::tie:
#include <algorithm> // std::minmax
#include <tuple>     // std::tie
#include <utility>
int largest, secondLargest;
std::tie(secondLargest, largest) = std::minmax(arr[0], arr[1]);
GCC, at least, is capable of optimizing the call to minmax into a single comparison, identical to the result of the C code below.
In C, you could write the test out yourself:
int largest, secondLargest;
if (arr[0] < arr[1]) {
largest = arr[1];
secondLargest = arr[0];
} else {
largest = arr[0];
secondLargest = arr[1];
}

How about:
int largestIndex = arr[1] > arr[0];
int largest = arr[largestIndex];
int secondLargest = arr[1 - largestIndex];
The first line relies on an implicit cast of a boolean result to 1 in the case of true and 0 in the case of false.

I'm going to assume that you'd rather solve the larger problem... That is, getting the sum of the largest two numbers in an array.
What you are trying to do is a std::partial_sort().
Let's implement it.
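#include <algorithm>   // std::partial_sort
#include <cstddef>     // size_t
#include <functional>  // std::greater
int sumLargest2(int * arr, size_t n) {
int * first = arr;
int * middle = arr + 2;
int * last = arr + n;
// Moves the two largest values to arr[0] and arr[1] (requires n >= 2; reorders arr).
std::partial_sort(first, middle, last, std::greater<int>());
return arr[0] + arr[1];
}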
And if you're unable to modify arr, then I'd recommend looking into std::partial_sort_copy().
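A minimal sketch of the non-modifying variant, assuming n >= 2. The name sumLargest2Copy is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Sum of the two largest values without modifying arr. Assumes n >= 2.
int sumLargest2Copy(const int* arr, std::size_t n) {
    int top[2];
    // Copies the two largest elements into top, in descending order.
    std::partial_sort_copy(arr, arr + n, top, top + 2, std::greater<int>());
    return top[0] + top[1];
}
```

The input can now be const, at the cost of a two-element scratch buffer.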

x = max(a, b);
y = a + b - x;
It won't necessarily be faster, but it will be different.
Also beware of overflows.

If your intention is to reduce the number of function calls needed to find the min and max, you can try std::minmax_element. This is available since C++11.
auto result = std::minmax_element(arr, arr+n);
std::cout<< "min:"<< *result.first<<"\n";
std::cout<< "max :" <<*result.second << "\n";

If you just want to find the bigger of two values go:
if(a > b)
{
largest = a;
second = b;
}
else
{
largest = b;
second = a;
}
No function calls, one comparison, two assignments.

I'm assuming C++...
Short answer, use std::minmax and compile with the right optimizations and the right instruction set parameters.
Long ugly answer, The compiler cannot make all the assumptions necessary to make it really, really fast. You can. In this case, you can change the algorithm to process all data first and you can force alignment on the data. Doing all this, you can use intrinsics to make it faster.
Although I haven't tested it in this particular case, I've seen enormous performance improvements using these guidelines.
Since you're not passing 2 integers to the function, I'm assuming you're using an array and want to iterate it somehow. You now have a choice to make: make 2 arrays and use min/max or use 1 array with both a and b. This decision alone can already influence the performance.
If you have 2 arrays, these can be allocated on 32-byte boundaries with aligned malloc's and then processed using intrinsics. If you are going for real, raw performance - this is the way to go.
For example, let's assume you have AVX2. (NOTE: I'm not sure if you do and you SHOULD check this using CPUID!) Go to the cheat sheet here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and pick your poison.
The intrinsics you're looking for are in this case probably:
_mm256_min_epi32
_mm256_max_epi32
_mm256_stream_load_si256
If you have to do this for the entire array, you probably want to keep all the stuff in a single __m256i register before merging the individual items. E.g.: do a min/max per 256-bit vector, and when the loop is done, extract the 32-bit items and do a min/max on that.
Long nicer answer: So ... as for the compiler. Compilers do attempt to optimize these kinds of things, but run into problems.
If you have 2 different arrays that you process, the compiler has to know that they are different in order to be able to optimize it. This is the reason why stuff like restrict exists, which tells the compiler exactly this little thing you probably already knew while writing the code.
Also, the compiler doesn't know your memory is aligned, so it has to check this and branch... for each call. We don't want this; which means we want it to inline its stuff. So, add inline, put it in a header file and that's that. You can also use aligned to give it a hint.
Your compiler also didn't get the hint that the int* won't change over time. If it cannot change, it's a good idea to tell it that using the const keyword.
A compiler uses an instruction set to do the compilation. Normally, they already use SSE, but AVX2 can help a lot (as I've shown with the intrinsics above). If you can compile it with those flags, make sure to use them - they help a lot.
Run in release mode, compile with optimizations on 'fast' and see what happens under the hood. If you do all this, you should see vpmax... instructions appearing in the inner loops, which means that the compiler vectorized the loop just fine.
I don't know what else you want to do in the loop... if you use all these instructions you should hit the memory speed on big arrays.
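As a rough illustration of the two-array variant in portable C++, here is a plain loop of the kind an optimizer can vectorize on its own. The function name and parameters are hypothetical, and whether vector min/max instructions actually appear depends on the compiler and flags (for GCC or Clang, something like -O3 -mavx2), so check the generated assembly rather than assuming it.

```cpp
#include <algorithm>
#include <cstddef>

// Element-wise min and max of two input arrays, written to two output arrays.
// Each iteration is independent of the others, which is the shape
// auto-vectorizers look for. The const qualifiers document that the inputs
// are only read.
void minMaxArrays(const int* a, const int* b, int* lo, int* hi, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        lo[i] = std::min(a[i], b[i]);
        hi[i] = std::max(a[i], b[i]);
    }
}
```

Because the compiler cannot prove that the outputs do not overlap the inputs, it may add a runtime overlap check before the vectorized loop. That is the aliasing issue the answer refers to when it mentions restrict.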

How about a time-space trade-off?
#include <utility>
template<typename T>
std::pair<T, T>
minmax(T const& a, T const& b)
{ return b < a ? std::make_pair(b, a) : std::make_pair(a, b); }
//main
std::pair<int, int> values = minmax(a[0], a[1]);
int largest = values.second;
int secondLargest = values.first;

Finding all possible pairs of subsets using recursion

I am given
struct point
{
int x;
int y;
};
and the table of points:
point tab[MAX];
Program should return the minimal distance between the centers of gravity of any possible pair of subsets from tab. Subset can be any size (of course >=1 and < MAX).
I am obliged to write this program using recursion.
So my function will be int type because I have to return int.
I globally set a variable min (because while doing recursion I have to compare some values with this min)
int min = 0;
My function should for sure, take number of elements I add, sum of Y coordinates and sum of X coordinates.
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I will be glad for any help further.
I thought about another table of bools which I pass as a parameter to determine if I took value or not from table. Still my problem is how to implement this, I do not know how to even start.

I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
int min_distance = MAXINT;
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
SubsetIterator si2(&si1, tab);
while (si2.hasNext())
{
int d = subsetDistance(tab, si1.subset(), si2.subset());
if (d < min_distance)
{
min_distance = d;
}
}
}
The SubsetIterators can be simple base-2 counters that are MAX bits wide, where a 1 bit indicates membership in the subset. Yes, the nested loops are quadratic in the number of subsets (and the number of subsets itself grows as 2^MAX), but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than run through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. A simple bit-and operation can reveal these. To be even more efficient, you could use Gray coding instead of plain binary counting to store the membership bitmap. This would guarantee that at each iteration exactly one value enters or leaves the subset. Minimal work.
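The Gray-code idea from the update can be sketched like this. SubsetSum and allSubsetSums are hypothetical names, and the sketch only enumerates the subsets with their running sums; comparing pairs of subsets is left out. It assumes the table has fewer than 32 entries.

```cpp
#include <cstddef>
#include <vector>

struct point {
    int x;
    int y;
};

// One visited subset: membership bitmap plus the sums needed for its centre of gravity.
struct SubsetSum {
    unsigned mask;
    long long sumX;
    long long sumY;
    int count;
};

// Visits every non-empty subset of tab in Gray-code order, updating the sums
// with a single add or subtract per step. Assumes tab.size() < 32.
std::vector<SubsetSum> allSubsetSums(const std::vector<point>& tab) {
    const std::size_t n = tab.size();
    std::vector<SubsetSum> out;
    unsigned mask = 0;
    long long sumX = 0, sumY = 0;
    int count = 0;
    for (unsigned i = 1; i < (1u << n); ++i) {
        // Moving from Gray code number i-1 to number i flips exactly one bit:
        // the one at the position of the lowest set bit of i.
        std::size_t bit = 0;
        while (((i >> bit) & 1u) == 0) ++bit;
        const unsigned flag = 1u << bit;
        if (mask & flag) {            // element leaves the subset
            sumX -= tab[bit].x; sumY -= tab[bit].y; --count;
        } else {                      // element joins the subset
            sumX += tab[bit].x; sumY += tab[bit].y; ++count;
        }
        mask ^= flag;
        const SubsetSum entry = {mask, sumX, sumY, count};
        out.push_back(entry);
    }
    return out;
}
```

The centre of gravity of a subset is then (sumX / count, sumY / count), computed in floating point if fractional positions matter.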

Is this reinterpret_cast OK to do

I am an EE, not a code expert, so please bear with me here.
I am using Embarcadero C++ Builder (XE3).
I have an FFT algorithm which does a fair number of operations on complex numbers. I found out that if I bypass Embarcadero's complex math library, and do all the calculations in my own code, my FFT will run about 4.5 times faster. The 4 operations shown here all require an inordinate amount of time.
#include <dinkumware\complex>
#define ComplexD std::complex<double>
ComplexD X, Y, Z, FFTInput[1024];
double x, y;
Z = X * Y;
x = X.real();
y = X.imag();
Z = ComplexD(x,y);
Replacing the multiplication with my own cross multiply cut my execution times in half. My concern however is with the way I am accessing the real and imaginary parts of the input array. I am doing this:
double *Input;
Input = reinterpret_cast<double *>(FFTInput);
// Then these statements are equivalent.
x = FFTInput[n].real();
y = FFTInput[n].imag();
x = Input[2*n];
y = Input[2*n+1];
Doing this cut my execution times in half again, but I don't know if this reinterpret_cast is a wise thing to do. I could change the input array to two doubles instead of a complex, but I am using this FFT in numerous programs and don't want to rewrite everything.
Is this reinterpret_cast OK, or will I have memory problems? Also, is there a way to get the Embarcadero complex math functions to run faster? And finally, although it's not terribly important to me, is this reinterpret_cast portable?

This is allowed. Whilst this isn't a standard quote, cppreference has this to say:
For any pointer to an element of an array of complex numbers p and any
valid array index i, reinterpret_cast<T*>(p)[2*i] is the real part of
the complex number p[i], and reinterpret_cast<T*>(p)[2*i + 1] is the
imaginary part of the complex number p[i].
I will look for the quote from the actual standard soon.

The cppreference page for std::complex says at the bottom:
For any complex number z, reinterpret_cast<T(&)[2]>(z)[0] is the real part of z and reinterpret_cast<T(&)[2]>(z)[1] is the imaginary part of z.
For any pointer to an element of an array of complex numbers p and any valid array index i, reinterpret_cast<T*>(p)[2*i] is the real part of the complex number p[i], and reinterpret_cast<T*>(p)[2*i + 1] is the imaginary part of the complex number p[i]. (Since C++11)
These requirements essentially limit implementation of each of the three specializations of std::complex to declaring two and only two non-static data members, of type value_type, with the same member access, which hold the real and the imaginary components, respectively.
So what you are doing is guaranteed to work in C++11, but not before that. It may still work with your library's implementation, but you need to check that your library's implementation does not define any more non-static data members as per the third paragraph.
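A minimal sketch of the guaranteed access pattern, together with a hand-written product of the kind the question mentions. The function names are hypothetical, and the multiply deliberately skips the NaN and infinity handling that a library operator* may perform.

```cpp
#include <complex>
#include <cstddef>

// Reads element n of a complex array through a double* view.
// The layout (real part at 2*n, imaginary part at 2*n + 1) is guaranteed since C++11.
inline void readParts(const std::complex<double>* data, std::size_t n,
                      double& re, double& im) {
    const double* raw = reinterpret_cast<const double*>(data);
    re = raw[2 * n];
    im = raw[2 * n + 1];
}

// Hand-written product: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
// No special handling of NaN or infinite operands.
inline std::complex<double> multiply(const std::complex<double>& X,
                                     const std::complex<double>& Y) {
    const double a = X.real(), b = X.imag();
    const double c = Y.real(), d = Y.imag();
    return std::complex<double>(a * c - b * d, a * d + b * c);
}
```

This keeps the existing std::complex input arrays unchanged while the inner loops work on plain doubles.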
