using std::nth_element in eigen and a related interrogation - c++

I'm teaching myself C++ and Eigen in one go,
so maybe this is an easy question.
Given n and 0 < m < n, and an n-vector d of floats. To make it concrete:
VectorXf d = VectorXf::Random(n);
I would like to have an m-vector d_prim of integers that contains
the indices of all the entries of d that are smaller than or equal to
the m-th largest entry of d. Efficiency matters. If there are ties
in the data, then filling d_prim with the first m entries of d that are
not larger than its m-th largest entry is fine (I really need the
indices of m numbers that are not larger than the m-th largest entry
of d).
I've tried (naively):
float hst(VectorXf& d,int& m){
// VectorXf d = VectorXf::Random(n);
std::nth_element(d.data().begin(),d.data().begin()+m,d.data().end());
return d(m);
}
but there are two problems with it:
it doesn't work
even if it did work, I would still have to pass over (a copy of) d once to find the indices
of those entries that are smaller than d(m). Is this necessary?

std::nth_element is what you want (contrary to what I said before). It does a partial sort so that the elements in the range [first, mth) are no greater than those in the range [mth, last). So after running nth_element, all you have to do is copy the first m elements to the new vector.
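VectorXf d = VectorXf::Random(n);
VectorXf d_prim(m);  // receives the m smallest values (not their indices)
// data() returns a raw pointer, so use pointer arithmetic rather than begin()/end()
std::nth_element(d.data(), d.data() + m, d.data() + d.size());
std::copy(d.data(), d.data() + m, d_prim.data());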
This answer has more info on algorithms to do this.

Putting together David Brown's and Kerrek SB's answers, I got this as "the most efficient proposal":
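VectorXi hst(VectorXf& d, int& h){
    VectorXf e = d;   // untouched copy, so the original positions survive the partial sort
    VectorXi f(h);
    int j = 0;
    // after this call, d(h) holds the value that a full sort would put at position h
    std::nth_element(d.data(), d.data() + h, d.data() + d.size());
    for(int i = 0; i < d.size(); i++){
        if(e(i) <= d(h)){
            f(j) = i;   // record the original index
            j++;
            if(j == h) break;
        }
    }
    return f;
}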
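One way to avoid both the copy of d and the extra pass might be to run std::nth_element on a vector of indices, comparing through d, rather than on d itself. A minimal sketch, assuming plain std::vector<float> input instead of Eigen's VectorXf (the function name smallest_m_indices is made up for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of m entries of d such that
// none of them is larger than any entry that was left out.
std::vector<int> smallest_m_indices(const std::vector<float>& d, std::size_t m) {
    std::vector<int> idx(d.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
    if (m < idx.size()) {
        // Partition the indices by the values they refer to; d is not modified.
        std::nth_element(idx.begin(), idx.begin() + m, idx.end(),
                         [&d](int a, int b) { return d[a] < d[b]; });
        idx.resize(m);  // keep only the first m indices
    }
    return idx;
}
```

The returned indices come back in no particular order. With Eigen, the same idea should carry over by reading d(a) and d(b) inside the comparator, though that has not been checked here against a specific Eigen version.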


Nearest permutation to given array

Question
I have two arrays of integers A[] and B[]. Array B[] is fixed; I need to find the permutation of A[] that is lexicographically smaller than B[] and nearest to B[]. What I mean is:
for i in (0 <= i < n)
abs(B[i]-A[i]) is minimum and A[] should be lexicographically smaller than B[].
For Example:
A[]={1,3,5,6,7}
B[]={7,3,2,4,6}
So,possible nearest permutation of A[] to B[] is
A[]={7,3,1,6,5}
My Approach
Try all permutations of A[] and compare each one with B[]. But the time complexity would be O(n! * n).
So is there any way to optimize this?
EDIT
n can be as large as 10^5
First, build an ordered map of the counts of the distinct elements of A.
Then, iterate forward through array indices (0 to n−1), "withdrawing" elements from this map. At each point, there are three possibilities:
- If i < n-1, and it's possible to choose A[i] == B[i], do so and continue iterating forward.
- Otherwise, if it's possible to choose A[i] < B[i], choose the greatest possible value for A[i] < B[i]. Then proceed by choosing the largest available values for all subsequent array indices. (At this point you no longer need to worry about maintaining A[i] <= B[i], because we're already after an index where A[i] < B[i].) Return the result.
- Otherwise, we need to backtrack to the last index where it was possible to choose A[i] < B[i], then use the approach in the previous bullet-point.
Note that, despite the need for backtracking, the very worst case here is three passes: one forward pass using the logic in the first bullet-point, one backward pass in backtracking to find the last index where A[i] < B[i] was possible, and then a final forward pass using the logic in the second bullet-point.
Because of the overhead of maintaining the ordered map, this requires O(n log m) time and O(m) extra space, where n is the total number of elements of A and m is the number of distinct elements. (Since m ≤ n, we can also express this as O(n log n) time and O(n) extra space.)
Note that if there's no solution, then the backtracking step will reach all the way to i == -1. You'll probably want to raise an exception if that happens.
Edited to add (2019-02-01):
In a now-deleted answer, גלעד ברקן summarizes the goal this way:
To be lexicographically smaller, the array must have an initial optional section from left to right where A[i] = B[i] that ends with an element A[j] < B[j]. To be closest to B, we want to maximise the length of that section, and then maximise the remaining part of the array.
So, with that summary in mind, another approach is to do two separate loops, where the first loop determines the length of the initial section, and the second loop actually populates A. This is equivalent to the above approach, but may make for cleaner code. So:
1. Build an ordered map of the counts of the distinct elements of A.
2. Initialize initial_section_length := -1.
3. Iterate through the array indices 0 to n−1, "withdrawing" elements from this map. For each index:
   - If it's possible to choose an as-yet-unused element of A that's less than the current element of B, set initial_section_length equal to the current array index. (Otherwise, don't.)
   - If it's not possible to choose an as-yet-unused element of A that's equal to the current element of B, break out of this loop. (Otherwise, continue looping.)
4. If initial_section_length == -1, then there's no solution; raise an exception.
5. Repeat step #1: re-build the ordered map.
6. Iterate through the array indices from 0 to initial_section_length-1, "withdrawing" elements from the map. For each index, choose an as-yet-unused element of A that's equal to the current element of B. (The existence of such an element is ensured by the first loop.)
7. For array index initial_section_length, choose the greatest as-yet-unused element of A that's less than the current element of B (and "withdraw" it from the map). (The existence of such an element is ensured by the first loop.)
8. Iterate through the array indices from initial_section_length+1 to n−1, continuing to "withdraw" elements from the map. For each index, choose the greatest element of A that hasn't been used yet.
This approach has the same time and space complexities as the backtracking-based approach.
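The two-loop variant could be sketched roughly as follows. This is only an illustration under a few assumptions: std::map plays the role of the ordered map, A and B have the same length, copying the map stands in for "re-building" it, and the names Pool, withdraw and closest_smaller_permutation are invented here.

```cpp
#include <iterator>
#include <map>
#include <stdexcept>
#include <vector>

using Pool = std::map<int, int>;  // value -> number of unused copies

// Take one copy of the value at `it` out of the pool.
static void withdraw(Pool& pool, Pool::iterator it) {
    if (--it->second == 0) pool.erase(it);
}

// Hypothetical helper: the greatest permutation of A that is
// lexicographically smaller than B. Assumes A.size() == B.size().
std::vector<int> closest_smaller_permutation(const std::vector<int>& A,
                                             const std::vector<int>& B) {
    const int n = static_cast<int>(A.size());
    Pool counts;
    for (int a : A) ++counts[a];

    // First loop: find the last index at which a smaller element can be placed.
    int initial_section_length = -1;
    Pool pool = counts;  // scratch copy; `counts` is reused by the second loop
    for (int i = 0; i < n; ++i) {
        // lower_bound is the first key >= B[i]; any key before it is < B[i].
        if (pool.lower_bound(B[i]) != pool.begin()) initial_section_length = i;
        Pool::iterator equal = pool.find(B[i]);
        if (equal == pool.end()) break;
        withdraw(pool, equal);
    }
    if (initial_section_length == -1)
        throw std::runtime_error("no permutation of A is smaller than B");

    // Second loop: copy the equal prefix, place one smaller element,
    // then append everything that is left in descending order.
    std::vector<int> result;
    result.reserve(A.size());
    for (int i = 0; i < initial_section_length; ++i) {
        result.push_back(B[i]);
        withdraw(counts, counts.find(B[i]));
    }
    Pool::iterator below = std::prev(counts.lower_bound(B[initial_section_length]));
    result.push_back(below->first);
    withdraw(counts, below);
    for (Pool::reverse_iterator it = counts.rbegin(); it != counts.rend(); ++it)
        for (int c = 0; c < it->second; ++c) result.push_back(it->first);
    return result;
}
```

For the arrays in the question, A = {1,3,5,6,7} and B = {7,3,2,4,6}, this sketch yields {7,3,1,6,5}.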
There are n! permutations of A[n] (fewer if there are repeated elements).
Use binary search over the range 0..n!-1 to determine the k-th lexicographic permutation of A[] (arbitrary found example) that is the closest one below B[].
Perhaps in C++ you can exploit std::lower_bound.
Based on the discussion in the comment section to your question, you seek an array made up entirely of elements of the vector A that is -- in lexicographic ordering -- closest to the vector B.
For this scenario, the algorithm becomes quite straightforward. The idea is the same as already mentioned in the answer of @ruakh (although his answer refers to an earlier and more complicated version of your question, which is still displayed in the OP, and is therefore more complicated):
Sort A
Loop over B and select the element of A that is closest to B[i]. Remove that element from the list.
If no element in A is smaller than or equal to B[i], pick the largest element.
Here is the basic implementation:
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>
auto get_closest_array(std::vector<int> A, std::vector<int> const& B)
{
    // descending order, so the first element <= i is the largest such element
    std::sort(std::begin(A), std::end(A), std::greater<>{});
    auto select_closest_and_remove = [&](int i)
    {
        auto it = std::find_if(std::begin(A), std::end(A), [&](auto x) { return x<=i;});
        if(it==std::end(A))
        {
            it = std::max_element(std::begin(A), std::end(A));
        }
        auto ret = *it;
        A.erase(it);
        return ret;
    };
    std::vector<int> ret(B.size());
    for(int i=0;i<(int)B.size();++i)
    {
        ret[i] = select_closest_and_remove(B[i]);
    }
    return ret;
}
Applied to the problem in the OP one gets:
int main()
{
    std::vector<int> A ={1,3,5,6,7};
    std::vector<int> B ={7,3,2,4,6};
    auto C = get_closest_array(A, B);
    for(auto i : C)
    {
        std::cout<<i<<" ";
    }
    std::cout<<std::endl;
}
and it displays
7 3 1 6 5
which seems to be the desired result.
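Since get_closest_array searches and erases in a std::vector, each selection is linear, which may be slow for n around 10^5. A possible variant, sketched here with std::multiset so that each selection is logarithmic (the name get_closest_array_multiset and the early exit when A runs out are assumptions of this sketch, not part of the answer above):

```cpp
#include <iterator>
#include <set>
#include <vector>

// Hypothetical variant of get_closest_array: same selection rule,
// but the unused elements of A live in an ordered multiset.
std::vector<int> get_closest_array_multiset(const std::vector<int>& A,
                                            const std::vector<int>& B) {
    std::multiset<int> pool(A.begin(), A.end());
    std::vector<int> ret;
    ret.reserve(B.size());
    for (int b : B) {
        if (pool.empty()) break;            // assumption: stop once A is used up
        auto it = pool.upper_bound(b);      // first element strictly greater than b
        if (it == pool.begin())
            it = std::prev(pool.end());     // nothing <= b: fall back to the largest
        else
            --it;                           // largest element <= b
        ret.push_back(*it);
        pool.erase(it);                     // erases exactly this one element
    }
    return ret;
}
```

On the arrays from the question this produces the same sequence, 7 3 1 6 5.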

Getting minimum values and index from a matrix without repeating index C++

I have a vector< vector <int> > matrix of size n, and for each row i I want to get the minimum value and its indices [i][j] and put them in a vector, but I don't want any index to be repeated.
I've found a theoretical way but I cannot write it in code.
Make 2 vectors U←{1,...,n}, L←{1,...,n}
Repeat n times:
Let (u,l)∈U×L be such that matrix[u,l] ≤ matrix[i,j], ∀i∈U, ∀j∈L
S[u] ← l
Set U←U-{u} and L←L-{l}
You can code this algorithm directly
#include <cstddef>
#include <limits>
#include <set>
#include <utility>
#include <vector>
using namespace std;
typedef vector<vector<int>> Matrix;
typedef pair<size_t, size_t> Index;
typedef vector<Index> IndexList;
IndexList MinimalSequence(const Matrix& matrix) {
    IndexList result;
    set<size_t> U, L;  // rows and columns that are still available
    for (size_t i = 0; i < matrix.size(); ++i) { // consider square
        U.insert(i);
        L.insert(i);
    }
    while (U.size()) { // same as L.size()
        int min = numeric_limits<int>::max();
        Index minIndex;
        // scan the remaining rows and columns for the smallest entry
        for (auto u: U)
            for (auto l: L)
                if (matrix[u][l] < min) {
                    minIndex = make_pair(u, l);
                    min = matrix[u][l];
                }
        U.erase(minIndex.first);
        L.erase(minIndex.second);
        result.push_back(minIndex);
    }
    return result;
}
Also, your question is not clear on one point: do you want to start from the overall smallest element of the matrix (as your formula says) and then move to the next smallest,
or do you want to move through the columns from left to right? I implemented it according to the formulas.
Note that the set of non-negative integers in your formula is a set<size_t>, on which insert() and erase() are available. The "for all" part is handled inside the while-loop.
I would also suggest trying an alternative algorithm: sort a list of matrix indices by their corresponding values and then iterate over it, removing the indices you don't want anymore.
Edit: to be precise, the code actually differs from the algorithm in a few ways. That seemed more practical.
the process is repeated until the set of indices is exhausted, which takes n iterations
the return structure is a list of 2D indices and encodes more information than an array
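The alternative suggested above (sorting the matrix indices by value, then skipping rows and columns that are already taken) might look roughly like this. It is a sketch that assumes a square matrix; the names minimal_sequence_sorted, row_used and col_used are invented for illustration, and std::stable_sort is used only so that ties are resolved in row-major order.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Index = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: greedy selection of n cells with distinct rows and
// columns, visiting the cells in increasing order of their value.
std::vector<Index> minimal_sequence_sorted(const std::vector<std::vector<int>>& matrix) {
    const std::size_t n = matrix.size();  // assumption: matrix is n x n
    std::vector<Index> cells;
    cells.reserve(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            cells.emplace_back(r, c);

    // Order all cells by the value they hold.
    std::stable_sort(cells.begin(), cells.end(),
                     [&matrix](const Index& a, const Index& b) {
                         return matrix[a.first][a.second] < matrix[b.first][b.second];
                     });

    std::vector<bool> row_used(n, false), col_used(n, false);
    std::vector<Index> result;
    for (const Index& cell : cells) {
        if (row_used[cell.first] || col_used[cell.second]) continue;  // index already taken
        row_used[cell.first] = true;
        col_used[cell.second] = true;
        result.push_back(cell);
        if (result.size() == n) break;
    }
    return result;
}
```

The sort costs O(n^2 log n) for an n x n matrix, whereas repeatedly rescanning the remaining rows and columns costs O(n^3) in total.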
You have already accepted an answer, but aren't you facing the assignment problem, which can be solved using the Hungarian algorithm, and maybe even by more efficient algorithms that exist and are already implemented?

Recursive Divide and Conquer Algorithm Modification

So in my textbook there is this block of code to find the maximum element in an array by using the divide and conquer recursive algorithm:
Item max(Item a[], int l, int r)
{
    if (l == r) return a[l];
    int m = (l+r)/2;
    Item u = max(a, l, m);
    Item v = max(a, m+1, r);
    if (u > v) return u; else return v;
}
For one of the questions following the code, it asks me to modify that program so that I find the maximum element in an array by dividing an array of size N into one part of size k = 2^((lgN)-1) and another of size N-k (so that the size of at least one of the parts is a power of 2).
So I'm trying to solve that, and I just realized I wouldn't be able to do an exponent in code. How am I supposed to implement dividing one array into size k = 2^((lgN)-1)?
Both logs and exponentials can be computed using functions in the standard library.
But a simple solution would be to start at 1 and keep doubling until you reach a number bigger than desired. Going back one step then give you your answer.
(Of course the whole idea is mad - this algorithm is much more complex and slower than the obvious linear scan. But I'll assume there is some method in the madness.)
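Both suggestions can be sketched in a few lines. The sketch assumes that lg N is meant to be rounded up, so that k is the largest power of 2 strictly below N, and that N >= 2; the function names split_size and split_size_log2 are made up for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Doubling: start at 1 and keep doubling while the next power of two
// would still be strictly smaller than n. Assumes n >= 2.
std::size_t split_size(std::size_t n) {
    std::size_t k = 1;
    while (2 * k < n) k *= 2;
    return k;
}

// Standard-library route: k = 2^(ceil(log2(n)) - 1). Assumes n >= 2 and
// relies on std::log2 being exact for powers of two.
std::size_t split_size_log2(std::size_t n) {
    const double exponent = std::ceil(std::log2(static_cast<double>(n))) - 1.0;
    return std::size_t{1} << static_cast<unsigned>(exponent);
}
```

For N = 10 both give k = 8, leaving a second part of size 2. The doubling version avoids floating point entirely, which is probably the safer choice for large N.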
This finds the largest k that is a power of 2 and less than the number of array items (so the array is divided into two non-empty parts):
Item max(Item a[], int l, int r)
{
    if (l == r) return a[r];
    int s = r-l, k = 1;
    while (2*k <= s)
        k = 2*k;
    Item u = max(a, l, l+k-1);
    Item v = max(a, l+k, r);
    return u > v ? u : v;
}
However, this is not necessarily the best possible choice. For example, you might want to seek the k that is closest to half of the array's length (for 10 items that would be k=4 instead of 8).
Or you may try to partition the array into two parts whose lengths are both powers of 2 (if possible; for 10 items it would be 8+2)...

Finding the efficiency of a search? algorithm. C++

I've been told to find the efficiency of this code, and we've been ~1 hour (me and my partner) trying to find out what this code really does.
We supposed this is a search algorithm, but we can't really find a way to make it work w/o getting into an infinite loop:
int busq(int *v, int x, int b, int a){
    int m1, m2;
    int result;
    m1 = (b+a) / 3;
    m2 = 2*m1;
    if (v[m1] == x)
        result = m1;
    else if (v[m2] == x)
        result = m2;
    else if (x<v[m1])
        result = busq(v, x, b, m1-1);
    else if (x>v[m2])
        result = busq(v, x, m2+1, a);
    else
        result = busq(v, x, m1+1, m2-1);
    return result;
}
That's all we are given, no value for the parameters a,b or x, not the size of *v (the vector) or the content of the vector.
It's supposed to be possible to solve it like this.
If anything, we want to know what this code does, but if you can tell us the efficiency, that would be appreciated as well. (We use O() notation, e.g. O(1), O(n^2)...)
It's basically ternary search. v has to be a sorted array, x is the value searched for, b is the beginning of the range and a is the end (inclusive, judging by the recursive calls).
The function attempts to divide the range into three roughly equal partitions at m1 and m2 (both of which are calculated incorrectly and only work while the range starts at the first element) and checks whether x lies on the bounds. If not, it recurses into the partition x has to lie in.
The code can be fixed with
m1=b+(a-b)/3;
m2=b+(a-b)*2/3;
Then, the efficiency should be O(log n)
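Putting the corrected split points into the function gives something like the sketch below. Two things here are assumptions rather than part of the original code: the range [b, a] is treated as inclusive, and an empty range returns -1, which also removes the endless recursion when x is not present. The name busq_fixed is made up.

```cpp
// Hypothetical corrected version of busq: ternary search for x in the
// sorted range v[b..a] (both ends inclusive). Returns -1 if x is absent.
int busq_fixed(const int* v, int x, int b, int a) {
    if (b > a) return -1;                  // empty range: x is not in the array
    int m1 = b + (a - b) / 3;              // end of the first third
    int m2 = b + (a - b) * 2 / 3;          // end of the second third
    if (v[m1] == x) return m1;
    if (v[m2] == x) return m2;
    if (x < v[m1]) return busq_fixed(v, x, b, m1 - 1);
    if (x > v[m2]) return busq_fixed(v, x, m2 + 1, a);
    return busq_fixed(v, x, m1 + 1, m2 - 1);
}
```

Each call does a constant amount of work and recurses into a range of at most about a third of the current size, which is where the O(log n) bound comes from.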

Combinations of Multiple Vector's Elements Without Repetition

I have n vectors, say 3, and each has some number of elements (not necessarily the same number). I need to choose x combinations between them, like choosing 2 from vectors[n].
Example:
std::vector<int> v1(3), v2(5), v3(2);
There cannot be combinations from one vector itself, like v1[0] and v1[1]. How can I do this?
I've tried everything, but cannot figure this out.
If I understand you correctly, you have N vectors, each with a different number of elements (call the size of the ith vector Si), and you wish to choose M combinations of elements from these vectors without repetition. Each combination would be N elements, one element from each vector.
In this case the number of possible combinations is the product of the sizes of the vectors, which, for lack of some form of equation setting, I'll call P and compute in C++:
std::vector<size_t> S(N);
// ...populate S...
size_t P = 1;
for(size_t i=0;i<S.size();++i)
    P *= S[i];
So now the problem becomes one of picking M distinct numbers between 0 and P-1, then converting each of those M numbers into N indices into the original vectors. I can think of a few ways to compute those M numbers, perhaps the easiest is to keep drawing random numbers until you get M distinct ones (effectively rejection sampling from the distribution).
The slightly more convoluted part is to turn each of your M numbers into a vector of indices. We can do this with
size_t m = /* ... one of the M combinations */;
std::vector<size_t> indices(N);
for(size_t i=0; i<N; ++i)
{
    indices[i] = m % S[i];  // index into the ith vector
    m /= S[i];              // move on to the next "digit"
}
which basically chops m up into chunks for each index, much like you would when indexing a 2D array represented as a 1D array.
Now if we take your N=3 example we can get the 3 elements of our combination with
v1[indices[0]]
v2[indices[1]]
v3[indices[2]]
generating as many distinct values of m as required.
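The two steps, drawing M distinct numbers below P and decoding each into per-vector indices, could be wrapped up as in the following sketch. The helper names decode_combination and draw_distinct are invented here, every entry of S is assumed to be non-zero, and the rejection loop is only sensible while M is small compared with P.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: split m (0 <= m < product of S) into one index per
// vector, treating S as the radices of a mixed-radix number.
std::vector<std::size_t> decode_combination(std::size_t m,
                                            const std::vector<std::size_t>& S) {
    std::vector<std::size_t> indices(S.size());
    for (std::size_t i = 0; i < S.size(); ++i) {
        indices[i] = m % S[i];  // index into the ith vector
        m /= S[i];
    }
    return indices;
}

// Hypothetical helper: keep drawing until M distinct numbers in [0, P-1]
// have been collected (rejection of duplicates happens in the set).
std::set<std::size_t> draw_distinct(std::size_t M, std::size_t P, std::mt19937& rng) {
    std::set<std::size_t> chosen;
    if (P == 0) return chosen;
    if (M > P) M = P;  // there are only P distinct values to pick from
    std::uniform_int_distribution<std::size_t> dist(0, P - 1);
    while (chosen.size() < M) chosen.insert(dist(rng));
    return chosen;
}
```

With S = {3, 5, 2}, for example, m = 17 decodes to the indices {2, 0, 1}, i.e. v1[2], v2[0] and v3[1].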
Probably the confusion arises from an imprecise definition of the problem. Guessing that you need to pick, N times, 1 element from 1 of V vectors, you can do this:
select N of the V vectors you want to pick from (N <= V)
for each of the selected vectors, select 1 of the vector.size() elements.
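Under that reading, the two steps could be sketched as below. Only the sizes of the vectors are needed to choose positions, so the sketch takes a list of sizes; the name pick_from_distinct_vectors is made up, and every size is assumed to be non-zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: choose N different vectors at random, then one
// element position inside each. Returns (vector index, element index) pairs.
std::vector<std::pair<std::size_t, std::size_t>> pick_from_distinct_vectors(
        const std::vector<std::size_t>& sizes, std::size_t N, std::mt19937& rng) {
    std::vector<std::size_t> order(sizes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);  // random order of the V vectors
    N = std::min(N, order.size());                  // cannot use more than V vectors

    std::vector<std::pair<std::size_t, std::size_t>> picks;
    picks.reserve(N);
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t v = order[i];
        std::uniform_int_distribution<std::size_t> dist(0, sizes[v] - 1);
        picks.emplace_back(v, dist(rng));           // one element of vector v
    }
    return picks;
}
```

Because each vector index appears at most once in the shuffled prefix, no two picks can come from the same vector.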