Finding all possible pairs of subsets using recursion - C++

I am given
struct point
{
    int x;
    int y;
};
and the table of points:
point tab[MAX];
The program should return the minimal distance between the centers of gravity of any possible pair of subsets from tab. A subset can be of any size (>= 1 and < MAX, of course).
I am required to write this program using recursion.
My function will be of type int because I have to return an int.
I set a global variable min, because during the recursion I have to compare intermediate values against it:
int min = 0;
My function should certainly take the number of elements added so far, the sum of the Y coordinates, and the sum of the X coordinates:
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I thought about another table of bools, passed as a parameter, to mark whether each value from the table has been taken. Still, my problem is how to implement this; I do not know how even to start.
I will be glad for any further help.

I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
int min_distance = INT_MAX; // from <climits>
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
    SubsetIterator si2(&si1, tab);
    while (si2.hasNext())
    {
        int d = subsetDistance(tab, si1.subset(), si2.subset());
        if (d < min_distance)
        {
            min_distance = d;
        }
        si2.next(); // advance to the next subset
    }
    si1.next();
}
The SubsetIterators can be simple binary counters of MAX bits (counting up to 2^MAX), where a 1 bit indicates membership in the subset. Yes, it's O(N^2) in the number of subsets N = 2^MAX, but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than running through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. A simple bit-and operation can reveal these. To be even more efficient, you could use a Gray code instead of ordinary binary counting for the membership bitmap. That would guarantee that at each step exactly one value enters or leaves the subset. Minimal work.
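For concreteness, here is a minimal sketch of the bitmask enumeration described above, without the iterator class or the incremental-sum trick. The names pairDistance and minPairDistance are made up for illustration, it assumes MAX is small enough that 2^MAX iterations are feasible, and it returns the distance as a double rather than the int the assignment asks for:

#include <algorithm>
#include <cfloat>
#include <cmath>

struct point { int x; int y; };

// Distance between the centers of gravity of two subsets given as bitmasks.
double pairDistance(const point tab[], int n, unsigned s1, unsigned s2)
{
    double x1 = 0, y1 = 0, x2 = 0, y2 = 0;
    int c1 = 0, c2 = 0;
    for (int i = 0; i < n; ++i) {
        if (s1 & (1u << i)) { x1 += tab[i].x; y1 += tab[i].y; ++c1; }
        if (s2 & (1u << i)) { x2 += tab[i].x; y2 += tab[i].y; ++c2; }
    }
    x1 /= c1; y1 /= c1;  // center of gravity of the first subset
    x2 /= c2; y2 /= c2;  // center of gravity of the second subset
    return std::sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
}

double minPairDistance(const point tab[], int n)
{
    unsigned full = (1u << n) - 1;  // the full set is excluded (size must be < MAX)
    double best = DBL_MAX;
    for (unsigned s1 = 1; s1 < full; ++s1)           // every non-empty proper subset
        for (unsigned s2 = s1 + 1; s2 < full; ++s2)  // every later subset: each pair once
            best = std::min(best, pairDistance(tab, n, s1, s2));
    return best;
}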

How is binary search applicable here (since the values are not monotonic)?

I am solving the LeetCode problem Search in Rotated Sorted Array in order to learn binary search better. The problem statement is:
There is an integer array nums sorted in ascending order (with distinct values). Prior to being passed to your function, nums is possibly rotated at an unknown pivot index. For example, [0,1,2,4,5,6,7] might be rotated at pivot index 3 and become [4,5,6,7,0,1,2]. Given the array nums after the possible rotation and an integer target, return the index of target if it is in nums, or -1 if it is not in nums.
With some online help, I came up with the solution below, which I mostly understand:
class Solution {
public:
    int search(vector<int>& nums, int target) {
        int l = 0, r = nums.size() - 1;
        while (l < r) { // 1st loop; how is BS applicable here, since array is NOT sorted?
            int m = l + (r - l) / 2;
            if (nums[m] > nums[r]) l = m + 1;
            else r = m;
        }
        // cout << "Lowest at: " << r << "\n";
        if (nums[r] == target) return r; // target == lowest number
        int start, end;
        if (target <= nums[nums.size() - 1]) {
            start = r;
            end = nums.size() - 1;
        } else {
            start = 0;
            end = r;
        }
        l = start; r = end;
        while (l < r) {
            int m = l + (r - l) / 2;
            if (nums[m] == target) return m;
            if (nums[m] > target) r = m;
            else l = m + 1;
        }
        return nums[l] == target ? l : -1;
    }
};
My question: In the first while loop, are we searching over something like a parabola, trying to find its lowest point, unlike the linear array in traditional binary search? Are we finding the minimum of a convex function? I understand how the values of l, m and r change, leading to the right answer, but I do not fully follow how we can be guaranteed that when nums[m] > nums[r], the lowest value must be on the right.
You actually skipped something important by “getting help”.
Once, when I was struggling to integrate something tricky for Calculus Ⅰ, I went for help and the advisor said, “Oh, I know how to do this” and solved it. I learned nothing from him. It took me another week of going over it (and other problems) myself to understand it sufficiently well that I could do it myself.
The purpose of these assignments is to solve the problem yourself. Even if your solution is faulty, you have learned more than simply reading and understanding the basics of one example problem someone else has solved.
In this particular case...
Since you already have a solution, let’s take a look at it: Notice that it contains two binary search loops. Why?
As you observed at the beginning, the rotation makes the array discontinuous (not convex). However, the subarrays on either side of the discontinuity remain monotonic.
Take a moment to convince yourself that this is true.
Knowing this, what would be a good way to find the discontinuity and to determine which of the two subarrays to search?
Hints:
A binary search is O(log n) as n ⟶ ∞
O(log n) ≡ O(2 log n)
I should also observe that the prompt gives as its example a sequence of nearly consecutive integers, but the prompt itself imposes no such restriction. All it says is that you start with a strictly increasing sequence (no duplicate values). You could have as input [19 74 512 513 3 7 12].
Does the supplied solution handle this possibility?
Why or why not?

(effectively) storing a polynomial dynamically

What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen on the internet is using an array where each cell contains the coefficient and the degree is the cell index, but that is not efficient: what if we have a polynomial like 6x^14 + x + 5? This would mean we would have zeros throughout the cells from 1 to 13. I've already looked at some solutions with vectors and linked lists, but is there any other way to effectively tackle this problem, without using std::vector or std::list?
Unless there is a compelling reason to act otherwise (for instance, a programming assignment where you are required to use C-style arrays), you should use std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in an std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
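A minimal sketch of that pair-based idea (the names SparseTerm and evaluate are invented for illustration; each pair stores (power, coefficient), so 6x^14 + x + 5 takes three entries instead of fifteen):

#include <utility>
#include <vector>

using SparseTerm = std::pair<int, int>; // (power, coefficient)

// Evaluate the polynomial at x by summing coefficient * x^power per term.
int evaluate(const std::vector<SparseTerm>& poly, int x)
{
    int result = 0;
    for (const SparseTerm& term : poly) {
        int powered = 1;
        for (int i = 0; i < term.first; ++i) powered *= x; // naive x^power
        result += term.second * powered;
    }
    return result;
}

// Usage: 6x^14 + x + 5
// std::vector<SparseTerm> p = {{14, 6}, {1, 1}, {0, 5}};
// evaluate(p, 1) == 12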
A sparse polynomial can be represented with a map, where a zero coefficient is represented by a nonexistent key. Here is an example of such a class:
#include <map>

// example of a sparse integer polynomial
class SparsePolynomial {
    std::map<int,int> coeff;
public:
    // int& operator[](const int& degree); // possible alternative, discussed below
    int get(int degree);
    void update(int degree, int val);
};
Whenever you get or update the coefficient of an element, its existence in the map is checked. Every time the coefficient of an element is updated, it is checked whether the value is zero. Hence, the size of the map is always minimal.
We could replace these two methods with operator[]. However, in that case we would not be able to check for zero during an update operation, so the storage would not be as efficient as with two separate methods for access and update.
int SparsePolynomial::get(int degree){
    std::map<int,int>::iterator it = coeff.find(degree);
    if (it == coeff.end()){
        return 0; // an absent key means a zero coefficient
    }else{
        return it->second;
    }
}

void SparsePolynomial::update(int degree, int val){
    if (val == 0){
        std::map<int,int>::iterator it = coeff.find(degree);
        if (it != coeff.end()){
            coeff.erase(it); // keep the map minimal: never store zeros
        }
    }else{
        coeff[degree] = val;
    }
}
While this method gives us more efficient storage, it requires more time for access and update than a vector does. However, in the case of a sparse polynomial, the difference can be small. Given a std::map of size N, the average search complexity for an element is O(log N). Suppose you have a sparse polynomial of degree d with N non-zero coefficients. If N is much smaller than d, then the access and update times will be small enough not to notice.
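A hypothetical usage example, assuming get and update are made public as in the sketch above:

SparsePolynomial p;      // represents 0
p.update(14, 6);         // 6x^14
p.update(1, 1);          // + x
p.update(0, 5);          // + 5
int c = p.get(7);        // 0: degree 7 was never stored
p.update(14, 0);         // zeroing a coefficient erases its map entry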

Find median of coordinates to build kd tree (2D) - C++

I have a nearest neighbor problem in 2D, and I found out that kd-trees were the best solution.
I couldn't find a ready implementation for the structure I am working with, so I decided to create my own.
The structure I work with is:
struct Point {
    int id;
    double x;
    double y;
};
I have nearly 100,000 points. My question is: how do I find the median point each time I want to partition my points, and how do I define the left and right partitions at the same time?
Another question would be: is there a more efficient way to proceed (the least time-consuming possible)?
How to compute median? Is there a more efficient way to proceed?
I will give one answer for both questions: You can use std::nth_element, like this:
#include <algorithm>
#include <vector>

std::vector<float> v; // data vector (global)
bool myfunction(int i, int j) { return v[i] < v[j]; }

int find_median(std::vector<int>& v_i) // v_i holds indices into v
{
    size_t n = v_i.size() / 2;
    std::nth_element(v_i.begin(), v_i.begin() + n, v_i.end(), myfunction);
    return v_i[n]; // index of the median element
}
You can also check my question for more.
how to define the left and right partitions at the same time?
Every value less than the median belongs in the left partition, and every value greater than the median belongs in the right partition.
It's up to you to decide where values equal to the median should go. Just pick left or right and remember your decision.
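Conveniently, std::nth_element answers both questions at once: it puts the median at the middle position and partially partitions the range around it, so the left and right halves fall out for free. Below is a rough sketch using the question's Point struct; KdNode and build are invented names, and memory management is omitted for brevity:

#include <algorithm>
#include <vector>

struct Point {
    int id;
    double x;
    double y;
};

struct KdNode {
    Point pt;
    KdNode* left = nullptr;
    KdNode* right = nullptr;
};

// Build a kd-tree over pts[lo, hi), alternating the splitting axis per level.
KdNode* build(std::vector<Point>& pts, size_t lo, size_t hi, int depth)
{
    if (lo >= hi) return nullptr;
    size_t mid = lo + (hi - lo) / 2;
    bool byX = (depth % 2 == 0); // even levels split on x, odd levels on y
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [byX](const Point& a, const Point& b) {
                         return byX ? a.x < b.x : a.y < b.y;
                     });
    // pts[lo, mid) now holds the left partition, pts[mid+1, hi) the right one
    KdNode* node = new KdNode{pts[mid]};
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}

// Usage: KdNode* root = build(points, 0, points.size(), 0);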

C++ algorithm to find 'maximal difference' in an array

I am asking for your ideas regarding this problem:
I have one array A, with N elements of type double (or alternatively integer). I would like to find an algorithm with complexity less than O(N^2) to find
max(A[i] - A[j])
for 1 < j <= i < n. Please notice that there is no abs(). I thought of:
dynamic programming
dichotomic method, divide and conquer
some treatment after a sort keeping track of indices
Would you have any comments or ideas? Could you point me at some good references to practice and make progress at solving this kind of algorithmic question?
Make three sweeps through the array. First, sweep from j=2 up, filling an auxiliary array a with the minimal element so far. Then sweep from the top, i=n-1, down, filling (also from the top down) another auxiliary array b with the maximal element so far (from the top). Now sweep both auxiliary arrays, looking for the maximal difference b[i]-a[i].
That will be the answer. O(n) in total. You could say it's a dynamic programming algorithm.
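A quick sketch of the three sweeps in C++, using 0-based indexing over the whole array (maxDifference is an invented name, and the question's exact index bounds are glossed over):

#include <algorithm>
#include <limits>
#include <vector>

double maxDifference(const std::vector<double>& A)
{
    size_t n = A.size(); // assumes n >= 1
    std::vector<double> a(n), b(n);
    a[0] = A[0];
    for (size_t i = 1; i < n; ++i)          // sweep 1: a[i] = min of A[0..i]
        a[i] = std::min(a[i - 1], A[i]);
    b[n - 1] = A[n - 1];
    for (size_t i = n - 1; i-- > 0; )       // sweep 2: b[i] = max of A[i..n-1]
        b[i] = std::max(b[i + 1], A[i]);
    double best = -std::numeric_limits<double>::infinity();
    for (size_t i = 0; i < n; ++i)          // sweep 3: best pair straddles some i
        best = std::max(best, b[i] - a[i]);
    return best;
}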
edit: As an optimization, you can eliminate the third sweep and the second array, and find the answer in the second sweep by maintaining two loop variables, max-so-far-from-the-top and max-difference.
As for "pointers" about how to solve such problems in general, you usually try some general methods, just as you wrote: divide and conquer, memoization/dynamic programming, etc. First of all, look closely at your problem and the concepts involved. Here, it's maximum/minimum. Take these concepts apart and see how the parts combine in the context of the problem, possibly changing the order in which they're calculated. Another technique is looking for hidden order/symmetries in your problem.
Specifically, fixing an arbitrary inner point k along the list, this problem reduces to finding the difference between the minimal element among all j such that 1 < j <= k, and the maximal element among all i such that k <= i < n. You can see divide-and-conquer here, as well as the taking apart of the concepts of max/min (i.e. their progressive calculation), and the interaction between the parts. The hidden order is revealed (k goes along the array), and memoization helps save the interim results for the max/min values.
Fixing an arbitrary point k can be seen as solving a smaller sub-problem first ("for a given k..."), and then seeing whether there is anything special about it so that it can be abolished - generalized - abstracted over.
There is a technique of trying to formulate and solve a bigger problem first, such that the original problem is a part of this bigger one. Here, we think of finding all the differences, one for each k, and then finding the maximal one among them.
The double use of interim results (used both in the comparison for a specific point k, and in calculating the next interim result, each in its direction) usually means considerable savings. So,
divide-and-conquer
memoization / dynamic programming
hidden order / symmetries
taking concepts apart - seeing how the parts combine
double use - find parts with double use and memoize them
solving a bigger problem
trying arbitrary sub-problem and abstracting over it
This should be possible in a single iteration. max(a[i] - a[j]) for 1 < j <= i should be the same as max[i=2..n](a[i] - min[j=2..i](a[j])), right? So you'd have to keep track of the smallest a[j] while iterating over the array, looking for the largest a[i] - min(a[j]). That way you only have one iteration and j will be less than or equal to i.
You just need to go over the array, find the max and min, then get the difference, so the worst case is linear time. If the array is sorted, you can find the diff in constant time - or do I miss something?
This Java implementation runs in linear time:
public class MaxDifference {
    public static void main(String[] args) {
        System.out.println(betweenTwoElements(2, 3, 10, 6, 4, 8, 1)); // prints 8
    }

    private static int betweenTwoElements(int... nums) {
        int maxDifference = nums[1] - nums[0];
        int minElement = nums[0];
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] - minElement > maxDifference) {
                maxDifference = nums[i] - minElement;
            }
            if (nums[i] < minElement) { // track the smallest element seen so far
                minElement = nums[i];
            }
        }
        return maxDifference;
    }
}

Optimization of Point to Voxel mapping

I used a profiler to look over some code which does not yet run fast enough. It found that the following function took most of the time, and half of the time in this function was spent in floor. Now there are two possibilities: optimizing this function, or going one level up and reducing the number of calls to it. I wonder whether the first one is possible.
int Sph::gridIndex (Vector3 position) const {
    int mx = ((int)floor(position.x / _gridIntervalSize) % _gridSize);
    int my = ((int)floor(position.y / _gridIntervalSize) % _gridSize);
    int mz = ((int)floor(position.z / _gridIntervalSize) % _gridSize);
    if (mx < 0) {
        mx += _gridSize;
    }
    if (my < 0) {
        my += _gridSize;
    }
    if (mz < 0) {
        mz += _gridSize;
    }
    int x = mx * _gridSize * _gridSize;
    int y = my * _gridSize;
    int z = mz * 1;
    return x + y + z;
}
Vector3 is just a simple class which stores three floats and provides some overloaded operators. _gridSize is of type int and _gridIntervalSize is a float. There are _gridSize ^ 3 buckets.
The purpose of the function is to provide hash-table support. Every 3D point is mapped to an index, and points which lie in the same voxel of size _gridIntervalSize ^ 3 should land in the same bucket.
First rule of optimization when there is math involved: Eliminate division, square roots, and trig functions.
float inverse_size = 1.0f / _gridIntervalSize;
That should be done only once, not once per call.
int mx = ((int)floor(position.x * inverse_size) % _gridSize);
int my = ((int)floor(position.y * inverse_size) % _gridSize);
int mz = ((int)floor(position.z * inverse_size) % _gridSize);
I would also recommend dropping the mod operation, because that's another division: if your grid size is a power of 2, you can use & (gridsize - 1) instead, which will also let you delete the conditional code at the bottom - another big savings.
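Putting both suggestions together, a hypothetical rewrite might look like the following. The member _invIntervalSize (precomputed once as 1.0f / _gridIntervalSize) is an assumption of this sketch, not code from the question, and it only works when _gridSize is a power of two:

#include <cmath>

int Sph::gridIndex (Vector3 position) const {
    const int mask = _gridSize - 1; // valid only because _gridSize is a power of two
    // Bitwise AND with the mask is a correct modulo even for negative values
    // (e.g. -1 & 7 == 7), so the three fix-up branches disappear.
    int mx = (int)std::floor(position.x * _invIntervalSize) & mask;
    int my = (int)std::floor(position.y * _invIntervalSize) & mask;
    int mz = (int)std::floor(position.z * _invIntervalSize) & mask;
    return (mx * _gridSize + my) * _gridSize + mz;
}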
On another note, using overloaded operators may be hurting you. This is a touchy subject here so I'll let you experiment with it and decide for yourself.
I assume you use floor because negative values are possible, and because you don't want an anomaly due to the default truncation when you cast to int (values rounding toward zero from both sides, making some oversized voxels).
If you can specify a safe most-negative value for each value in the vector, you could subtract that (negative) value, or rather the nearest more-negative multiple of _gridIntervalSize, before the cast, and drop the floor.
Using fmod may ensure you have a safe most-negative value, and replace the integer %, but it's probably an anti-optimisation. Still, as a quick change, it may be worth checking.
Also, check whether your platform supports vector instructions, and whether your compiler can easily be encouraged to use them. x86 chips certainly have integer vector instructions as well as float (the old Pentium 1 MMX instructions, for a start) and might be able to handle this much more efficiently than the "normal" CPU instruction set. This may even be a case for digging out the list of vector instruction intrinsics for your compiler and doing some hand-optimisation. Just check what the compiler can do for you first - I'm not sure how much of this kind of optimisation compilers will do for you already.
One probably trivial piece of micro-optimisation...
return (mx * _gridSize + my) * _gridSize + mz;
Saves one integer multiplication. Trivial, of course, and the compiler may catch it anyway, but this is an old habitual thing.
Oh - watch the leading underscores. Those are reserved identifiers. Not likely to cause a problem, but you can't complain if they do.
EDIT
Another way to avoid the floor is to handle positive and negative values separately, if you are willing to accept that items bang on the edge of a grid cell may land in the wrong cell (possible anyway, since floats should be considered approximate). Just apply a -1 offset in the negative case, to pull it away from zero by almost exactly the right amount to compensate for the truncation. You might consider a bit-fiddling increment-the-mantissa afterwards (to get already-integer values into the cell you'd expect), but this is probably unnecessary.
If you can impose power-of-two limitations on your sizes, there may be a bit-fiddling way to efficiently extract the grid position from a float, avoiding some or all of the multiply, floor and % for each of x, y and z, assuming a standard floating-point representation (i.e. this is non-portable). Again, handle positive and negative separately: extract the exponent, bit-shift the mantissa accordingly, then mask out the unwanted bits.
I think you need to look higher up the hierarchy to get real speed improvements. That is, is storing points in a hash map really the most efficient solution? I assume you have an array of Vector3 arrays, i.e.:
Vector3 *points [size][size][size]
where each element in the 3D array is an array of Vector3.
The algorithm you're using doesn't guarantee a uniform distribution of points across the Vector3 arrays, which may be a problem: a cluster of points within _gridIntervalSize of each other will map to the same array.
An alternative method would be to use oct-trees, which are like binary trees except that each node has eight child nodes. Each node requires the min/max x/y/z values that define the volume the node covers. To add a value to the tree (see the sketch after this list):
Recursively search the tree to find the smallest node that can contain the point
Add the point to that node
If the number of points in the node exceeds the upper limit of points per node
Create child nodes and move the points down into them
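A rough, hypothetical sketch of those steps (OctNode, octant and LIMIT are invented names; a real implementation would also cap the depth so that coincident points cannot trigger endless splitting):

#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

struct OctNode {
    Vec3 lo, hi;                        // min/max corners of the covered volume
    std::vector<Vec3> pts;
    std::array<OctNode*, 8> kids{};     // all null until this node splits
    static constexpr size_t LIMIT = 16; // upper limit of points per node

    OctNode(Vec3 l, Vec3 h) : lo(l), hi(h) {}

    int octant(const Vec3& p) const {   // which child volume p falls into
        Vec3 c{(lo.x + hi.x) / 2, (lo.y + hi.y) / 2, (lo.z + hi.z) / 2};
        return (p.x >= c.x) | ((p.y >= c.y) << 1) | ((p.z >= c.z) << 2);
    }

    void insert(const Vec3& p) {
        if (kids[0]) {                  // already split: recurse into the octant
            kids[octant(p)]->insert(p);
            return;
        }
        pts.push_back(p);
        if (pts.size() > LIMIT) {       // too full: create children, push points down
            Vec3 c{(lo.x + hi.x) / 2, (lo.y + hi.y) / 2, (lo.z + hi.z) / 2};
            for (int i = 0; i < 8; ++i) {
                Vec3 l{i & 1 ? c.x : lo.x, i & 2 ? c.y : lo.y, i & 4 ? c.z : lo.z};
                Vec3 h{i & 1 ? hi.x : c.x, i & 2 ? hi.y : c.y, i & 4 ? hi.z : c.z};
                kids[i] = new OctNode(l, h);
            }
            for (const Vec3& q : pts) kids[octant(q)]->insert(q);
            pts.clear();
        }
    }
};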
You may want to use quad-trees if there is little variation in values along a particular axis. Another method is to use BSPs - divide the world into two halves and recurse to find the container to add your point to. Again, these can be dynamic.
Converting the floats to ints and having the division planes lie on integer values will speed up the process as well.
Googling the above terms will lead you to more in depth analysis of the algorithms.
Finally, using floats (or doubles) for co-ordinates in an infinite plane is a bad idea - the further you get from (0,0,0), the less precision you have (the gap between representable floating-point values increases as the values increase). You will need to 'reset' the floating-point values to keep the precision. One method is to 'tile' the space and split each co-ordinate into an integer part and a floating-point part: the integer part defines the tile and the floating-point part defines the position within the tile. This gets you a much simpler hashing method - just use the integer parts; no call to floor is required and only integer calculations are needed. Another approach is to use fixed-point values rather than floating-point values, but this would constrain your precision. It would, however, make calculations across tile boundaries much easier.
If you could expand on what the top-level requirements of your coordinate system are, there are probably better algorithms available to you.