If I want to find a median (which is equivalent to minimizing the sum of |z - xi| over z), I can use the following code snippet:
std::vector<int> v{5, 6, 4, 3, 2, 6, 7, 9, 3};
std::nth_element(v.begin(), v.begin() + v.size()/2, v.end());
std::cout << "The median is " << v[v.size()/2] << '\n';
Is there something like this, to find "median" for minimization of (z-xi)^2? That is, I want to find an element of the array in which the sum of these functions will be minimal.
If you want to find the nth_element() according to a predicate comparing (z - xi) ^ 2 you could just add the corresponding logic to the binary predicate you can optionally pass to nth_element():
auto trans = [=](int xi){ return (z - xi) * (z - xi); };
std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end(),
[&](int v0, int v1) { return trans(v0) < trans(v1); });
From the question it isn't clear whether z or xi is the changing variable. From the looks of it, I assumed xi is the one that varies. If z is the one changing, just rename the argument in the lambda trans (which I also just gave a = in the capture...).
Your question works on at least two different levels: You're asking how to implement a certain algorithm idiomatically in C++11, and at the same time you're asking for an efficient algorithm for computing the mean of a list of integers.
You correctly observe that to compute the median, all we have to do is run the QuickSelect algorithm with k set equal to n/2. In the C++ standard library, QuickSelect is spelled std::nth_element:
int v[] = { 5, 6, 4, 3, 2, 6, 7, 9, 3 };
const int k = std::size(v) / 2;
std::nth_element(std::begin(v), &v[k], std::end(v)); // mutate in-place
int median = v[k]; // the k'th element is now in its sorted position
(For std::size, see proposal N4280, coming soon to a C++17 near you! Until then, use your favorite NELEM macro, or go back to using heap-allocated vector.)
This QuickSelect implementation doesn't really have anything to do with "finding array element xk such that ∑i |xi − xk| is minimized." I mean, it's mathematically equivalent, yes, but there's nothing in the code that corresponds to summing or subtracting integers.
The naïve algorithm to "find array element xk such that ∑i |xi − xk| is minimized" is simply
int v[] = { 5, 6, 4, 3, 2, 6, 7, 9, 3 };
auto sum_of_differences = [v](int xk) {
int result = 0;
for (auto&& xi : v) {
result += std::abs(xi - xk);
}
return result;
};
int median =
*std::min_element(std::begin(v), std::end(v), [&](int xa, int xb) {
return sum_of_differences(xa) < sum_of_differences(xb);
});
This is a horribly inefficient algorithm, given that QuickSelect does the same job.
However, it's trivial to extend this code to work with any mathematical function you want to "minimize the sum of". Here's the same skeleton of code, but with the function "squared difference" instead of "difference":
int v[] = { 5, 6, 4, 3, 2, 6, 7, 9, 3 };
auto sum_of_squared_differences = [v](int xk) {
int result = 0;
for (auto&& xi : v) {
result += (xi - xk) * (xi - xk);
}
return result;
};
int closest_element_to_the_mean =
*std::min_element(std::begin(v), std::end(v), [&](int xa, int xb) {
return sum_of_squared_differences(xa) < sum_of_squared_differences(xb);
});
In this case we can also find an improved algorithm; namely, compute the mean up front and only afterward scan the array looking for the element that's closest to that mean:
int v[] = { 5, 6, 4, 3, 2, 6, 7, 9, 3 };
double actual_mean = std::accumulate(std::begin(v), std::end(v), 0.0) / std::size(v);
auto distance_to_actual_mean = [=](int xk) {
return std::abs(xk - actual_mean);
};
int closest_element_to_the_mean =
*std::min_element(std::begin(v), std::end(v), [&](int xa, int xb) {
return distance_to_actual_mean(xa) < distance_to_actual_mean(xb);
});
(P.S. – remember that none of the above code snippets should be used in practice, unless you're absolutely sure you don't need to care about integer overflow, floating-point rounding error, and a host of other mathy issues.)
Given an array x1, x2, …, xn of integers, the real number z that minimizes ∑i∈{1,2,…,n} (z - xi)^2 is the mean z* = (1/n) ∑i∈{1,2,…,n} xi. You want to call std::min_element with a comparator that treats xi as less than xj if and only if |n xi - n z*| < |n xj - n z*| (we use n z* = ∑i∈{1,2,…,n} xi to avoid floating-point arithmetic; there are ways to reduce the extra precision required).
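The integer-only comparator described above could be sketched roughly as follows. The function name closestToMean is made up for illustration, and the sketch assumes the array is non-empty and that n times any element, as well as the sum, fits in a long long.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the element x of v that minimizes
// sum_i (x - v[i])^2, i.e. the element closest to the mean.
// It compares |n*x - sum| in integers, so no floating point is involved.
// Assumption: n * x and the sum both fit in a long long.
int closestToMean(const std::vector<int>& v) {
    assert(!v.empty());
    const long long n = static_cast<long long>(v.size());
    const long long sum = std::accumulate(v.begin(), v.end(), 0LL);
    return *std::min_element(v.begin(), v.end(), [=](int a, int b) {
        return std::llabs(n * a - sum) < std::llabs(n * b - sum);
    });
}
```

For the array used throughout this thread the sum is 45 and n is 9, so the element 5 is selected.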
I have a (quite large) standard C++ array of type double, with ~50,000,000 rows and 20 columns. The array is filled with random data, according to some Gaussian distribution (if that's of any use in answering this question).
I've written an algorithm to solve a problem using this array. A significant part of this algorithm's time is spent iterating, row by row (and sometimes over the same row more than once) and returning, for each row, the index of every element in that row such that the absolute value of that element exceeds some value (also of type double).
Unfortunately, the algorithm is quite slow. As it's rather large, and the problem being solved is a bit too complex for simply dumping the code here on SO, I'd like to start by tackling this issue. What is the most efficient (or, at least, a more efficient) way to grab the index of every element in a row of a multidimensional array?
What I've tried:
I've tried simply iterating through each row (with an iterator), passing each value to fabs(), and using std::distance() to get the index. I then store it in an std::set() (I don't care much about how the indices are stored, unless that is a significant speed factor, so long as they are "easily accessible").
I.e.:
for(auto it = row.begin(); it != row.end(); ++it){
auto &element = *it;
if(fabs(element) >= threshold){
cache.insert(std::distance(row.begin(), it));
}
}
I've also tried using std::find_if, and similarly through std::ranges. Neither gave measurable speed improvements (admittedly, I haven't used particularly scientific benchmarks; however, I'm going for a visibly noticeable improvement).
I.e. something like this:
auto exceeds_thresh = [&](double x){ return std::fabs(x) > threshold; };
auto it = ranges::find_if(row, exceeds_thresh);
while(it != end(row)){
results.emplace_back(distance(begin(row), it));
it = ranges::find_if(std::next(it), std::end(row), exceeds_thresh);
}
Note that, by efficiency, I'm focusing on speed
Here, 11.3, 9.8, 17.5 satisfy the condition, so their indices 1,3,6 should be printed. Note that, in practice, each array is a row in a far larger array (as above), and with far greater number of elements in each row:
double row_of_array[] = {1.4, 11.3, 4.2, 9.8, 0.1, 3.2, 17.5};
double threshold = 8;
for(auto it = std::begin(row_of_array); it != std::end(row_of_array); ++it){
auto &element = *it;
if(fabs(element) > threshold){
std::cout << std::distance(std::begin(row_of_array), it) << "\n";
}
}
You can try loop unrolling
double row_of_array[] = {1, 11, 4, 9, 0, 3, 17};
constexpr double threshold = 8;
std::vector<int> results;
results.reserve(20);
for(int i{}, e = std::ssize(row_of_array); i < e; i += 4)
{
if(std::abs(row_of_array[i]) > threshold)
results.push_back(i);
if(i + 1 < e && std::abs(row_of_array[i + 1]) > threshold)
results.push_back(i + 1);
if(i + 2 < e && std::abs(row_of_array[i + 2]) > threshold)
results.push_back(i + 2);
if(i + 3 < e && std::abs(row_of_array[i + 3]) > threshold)
results.push_back(i + 3);
}
EDIT:
or the riskier
double row_of_array[20] = {1, 11, 4, 9, 0, 3, 17};
constexpr double threshold = 8;
std::vector<int> results;
results.reserve(20);
static_assert(std::ssize(row_of_array) % 4 == 0, "only works for mul of 4");
for(int i{}, e = std::ssize(row_of_array); i < e; i += 4)
{
if(std::abs(row_of_array[i]) > threshold) results.push_back(i);
if(std::abs(row_of_array[i + 1]) > threshold) results.push_back(i + 1);
if(std::abs(row_of_array[i + 2]) > threshold) results.push_back(i + 2);
if(std::abs(row_of_array[i + 3]) > threshold) results.push_back(i + 3);
}
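Before unrolling by hand, it may be worth measuring a plain indexed loop that writes into one reused std::vector instead of a std::set, since std::set allocates a node per inserted index. This is only a sketch under that assumption, not a benchmarked result; the name indicesAbove and its signature are invented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: collects the indices i with |row[i]| > threshold.
// `out` is cleared but keeps its capacity, so calling this once per row
// with the same vector avoids repeated allocations.
void indicesAbove(const double* row, std::size_t n, double threshold,
                  std::vector<std::size_t>& out) {
    assert(row != nullptr || n == 0);
    out.clear();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(row[i]) > threshold) {
            out.push_back(i);
        }
    }
}
```

With the example row from the question and a threshold of 8, out ends up holding 1, 3 and 6.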
I am studying Dynamic Programming on GeeksForGeeks and have a problem with Tiles Stacking Problem and the way it is solved
A stable tower of height n is a tower consisting of exactly n tiles of unit height stacked vertically in such a way, that no bigger tile is placed on a smaller tile. An example is shown below :
We have an infinite number of tiles of sizes 1, 2, …, m. The task is to calculate the number of different stable towers of height n that can be built from these tiles, with the restriction that you can use at most k tiles of each size in the tower.
Note: Two towers of height n are different if and only if there exists a height h (1 <= h <= n), such that the towers have tiles of different sizes at height h.
For example:
Input : n = 3, m = 3, k = 1.
Output : 1
Possible sequences: { 1, 2, 3}.
Hence answer is 1.
Input : n = 3, m = 3, k = 2.
Output : 7
{1, 1, 2}, {1, 1, 3}, {1, 2, 2},
{1, 2, 3}, {1, 3, 3}, {2, 2, 3},
{2, 3, 3}.
The way to solve is to count number of decreasing sequences of length n using numbers from 1 to m where every number can be used at most k times. We can recursively compute count for n using count for n-1.
Declare a 2D array dp[][], where each state dp[i][j] denotes the number of decreasing sequences of length i using numbers from j to m. We need to take care of the fact that a number can be used at most k times. This can be done by considering 1 to k occurrences of a number. Hence our recurrence relation becomes:
Also, we can use the fact that for a fixed j we are using the consecutive values of previous k values of i. Hence, we can maintain a prefix sum array for each state. Now we have got rid of the k factor for each state.
I have read this algorithm many times, but I don't understand it or how to prove its correctness. I have tried to find a guide on the internet but found only variations of it. Please help me understand it.
Observe that the largest size tile (m) can appear only at the bottom.
Its appearances are consecutive
Your recurrence becomes:
T(n,m,k) = SIGMA_{i=0,...,k} T(n-i,m-1,k)
Then you have to define the base cases of the recurrence:
T(n,m,1) = // can you tell what this is?
T(n,1,k) = // can you tell what this is?
T(1,m,k) = m // this is easy
We can prove it by forming a logical recurrence:
(A) If the maximum stack height, given m and k is smaller than n, we cannot create any stack.
(B) If only one tile is allowed, we can choose m different sizes for that tile.
(C) If only one size is allowed, if k is greater than or equal to n, we can construct one stack of n tiles of size 1; otherwise, zero stacks.
(D) For each possible count, x, of tiles of size m stacked, we have one way that is multiplied by the number of ways to stack (n - x) tiles, using sizes of at most (m - 1) since we used m.
To convert the recurrence to bottom-up dynamic programming, we initialise the matrix using the base cases of the recurrence, and fill in subsequent entries using its general-case logical branch.
Here's a demonstration of the recurrence in JavaScript (sorry I'm not versed in C++ but the first function, f, which calculates just the count, should be very easy to convert):
// Returns the count
function f(n, m, k){
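if (n < 0) // x exceeded the remaining height: no stack
return 0;
if (n == 0) // nothing left to place: exactly one (empty) stack
return 1;
if (n > m * k)
return 0;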
if (n == 1)
return m;
if (m == 1)
return n <= k ? 1 : 0;
let result = 0;
for (let x=0; x<=k; x++)
result += f(n - x, m - 1, k);
return result;
}
// Returns the sequences
function g(n, m, k){
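if (n < 0) // x exceeded the remaining height: no sequences
return [];
if (n == 0) // nothing left to place: one empty sequence
return [[]];
if (n > m * k)
return [];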
if (n == 1)
return new Array(m).fill(0).map((_, i) => [i + 1]);
if (m == 1)
return n <= k ? [new Array(n).fill(1)] : [];
let result = [];
for (let x=0; x<=k; x++){
const pfx = new Array(x).fill(m);
const prev = g(n - x, m - 1, k);
for (let s of prev)
result.push(pfx.concat(s));
}
return result;
}
var inputs = [
[3, 3, 1],
[3, 3, 2],
[1, 2, 2]
];
for (let args of inputs){
console.log('' + args);
console.log(f(...args));
console.log(JSON.stringify(g(...args)));
console.log('');
}
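For readers who want the count in C++, a bottom-up table version of the same recurrence might look like the sketch below. The name countTowers is invented, and the sketch deliberately uses a single base case (the empty tower, with no sizes available) instead of the three base cases listed above; that simplification is my own choice, not the answer's.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts stable towers of height n built from sizes
// 1..m with at most k tiles of each size.
// dp[j][i] = number of towers of height i that use only sizes 1..j.
long long countTowers(int n, int m, int k) {
    assert(n >= 0 && m >= 0 && k >= 0);
    std::vector<std::vector<long long>> dp(
        m + 1, std::vector<long long>(n + 1, 0));
    dp[0][0] = 1;  // with no sizes available only the empty tower exists
    for (int j = 1; j <= m; ++j) {
        for (int i = 0; i <= n; ++i) {
            // x = number of tiles of size j placed at the bottom
            for (int x = 0; x <= k && x <= i; ++x) {
                dp[j][i] += dp[j - 1][i - x];
            }
        }
    }
    return dp[m][n];
}
```

This runs in O(n * m * k) time. The inner sum ranges over a window of consecutive entries of the previous row, which is where a prefix-sum array would remove the factor k.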
I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.
A vertex may not contain any point. So every point has coordinates in {0, 1}^D. I am only interested in query time, as long as the memory cost is reasonable (not exponential in N, for example :) ).
Given a query that lies on one of the cube's vertices and an input parameter r, find all the vertices (thus points) that have hamming distance <= r with the query.
What's the way to go in a c++ environment?
I am thinking of a kd-tree, but I am not sure and want help, any input, even approximative, would be appreciated! Since hamming distance comes into play, bitwise manipulations should help (e.g. XOR).
There is a nice bithack to go from one bitmask with k bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with k bits set. XORing these masks with an initial value gives all the values at hamming distance exactly k away from it.
So for D dimensions, where D is less than 32 (otherwise change the types),
uint32_t limit = (1u << D) - 1;
for (int k = 1; k <= r; k++) {
uint32_t diff = (1u << k) - 1;
while (diff <= limit) {
// v is the input vertex
uint32_t vertex = v ^ diff;
// use it
diff = nextBitPermutation(diff);
}
}
Where nextBitPermutation may be implemented in C++ as something like (if you have __builtin_ctz)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
Or for MSVC (not tested)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
unsigned long tzc;
_BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
}
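Putting the pieces together, a self-contained sketch could look like this. It swaps in the division-based variant from the same bithacks page so that no compiler intrinsic is needed; the names nextBitPermutationPortable and neighborsWithin are made up, and unlike the loop above it also returns the query vertex itself (distance 0). The caller would still have to check each returned vertex against whatever structure records which vertices hold points.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Division-based next-bit-permutation (no intrinsics); requires v != 0.
std::uint32_t nextBitPermutationPortable(std::uint32_t v) {
    std::uint32_t t = (v | (v - 1)) + 1;
    return t | ((((t & (0u - t)) / (v & (0u - v))) >> 1) - 1);
}

// Hypothetical helper: all vertices of the D-cube within Hamming
// distance r of v, ordered by increasing distance. Assumes 1 <= D < 32.
std::vector<std::uint32_t> neighborsWithin(std::uint32_t v, int r, int D) {
    assert(D >= 1 && D < 32);
    const std::uint32_t limit = (1u << D) - 1;
    std::vector<std::uint32_t> out{v};  // distance 0: the query itself
    for (int k = 1; k <= r && k <= D; ++k) {
        std::uint32_t diff = (1u << k) - 1;  // smallest mask with k bits set
        while (diff <= limit) {
            out.push_back(v ^ diff);
            diff = nextBitPermutationPortable(diff);
        }
    }
    return out;
}
```

For D = 3, v = 5 (binary 101) and r = 1 this yields 5, 4, 7, 1.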
If D is really low, 4 or lower, the old popcnt-with-pshufb works really well and generally everything just lines up well, like this:
uint16_t query(int vertex, int r, int8_t* validmask)
{
// validmask should be array of 16 int8_t's,
// 0 for a vertex that doesn't exist, -1 if it does
__m128i valid = _mm_loadu_si128((__m128i*)validmask);
__m128i t0 = _mm_set1_epi8(vertex);
__m128i r0 = _mm_set1_epi8(r + 1);
__m128i all = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
__m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
__m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
__m128i close_enough = _mm_cmpgt_epi8(r0, dist);
__m128i result = _mm_and_si128(close_enough, valid);
return _mm_movemask_epi8(result);
}
This should be fairly fast; fast compared to the bithack above (nextBitPermutation, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin popcnt, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.
This also scales down to D=3 of course, just make none of the points >= 8 valid. D>4 can be done similarly but it takes more code then, and since this is really a brute force solution that is only fast due to parallelism it fundamentally gets slower exponentially in D.
I have a vector of size n; n is power of 2. I need to treat this vector as a matrix n = R*C. Then I need to transpose the matrix.
For example, I have vector: [1,2,3,4,5,6,7,8]
I need to find R and C. In this case it would be: 4,2. And treat vector as matrix:
[1,2]
[3,4]
[5,6]
[7,8]
Transpose it to:
[1, 3, 5, 7]
[2, 4, 6, 8]
After transposition vector should be: [1, 3, 5, 7, 2, 4, 6, 8]
Are there existing algorithms to perform in-place non-square matrix transposition? I don't want to reinvent the wheel.
My vector is very big so I don't want to create intermediate matrix. I need an in-place algorithm. Performance is very important.
All modifications should be done in the original vector. Ideally the algorithm should work with chunks that fit in the CPU cache.
I can't use iterator because of memory locality. So I need real transposition.
It does not matter if matrix would be 2x4 or 4x2
The problem can be divided in two parts. First, find R and C and then, reshape the matrix. Here is something I would try to do:
Since n is a power of 2, i.e. n = 2^k then if k is even, we have: R=C=sqrt(n). And if k is odd, then R = 2^((k+1)/2) and C=2^((k-1)/2).
Note: Since you mentioned you want to avoid using extra memory, I have made some edits to my original answer.
The code to calculate R and C would be something like:
void getRandC(const size_t& n, size_t& R, size_t& C)
{
int k = (int)log2(double(n)),
i, j;
if (k & 1) // k is odd
i = (j = (k + 1) / 2) - 1;
else
i = j = k / 2;
R = (size_t)exp2(i);
C = (size_t)exp2(j);
}
Which needs C++11. For the second part, in case you want to keep the original vector:
void transposeVector(const std::vector<int>& vec, std::vector<int>& mat)
{
size_t R, C;
getRandC(vec.size(), R, C);
// first, reserve the memory
mat.resize(vec.size());
// now, do the transposition directly
for (size_t i = 0; i < R; i++)
{
for (size_t j = 0; j < C; j++)
{
mat[i * C + j] = vec[i + R * j];
}
}
}
And, if you want to modify the original vector and avoid using extra memory, you can write:
void transposeInPlace(std::vector<int>& vec)
{
size_t R, C;
getRandC(vec.size(), R, C);
for (size_t j = 0; R > 1; j += C, R--)
{
for (size_t i = j + R, k = j + 1; i < vec.size(); i += R)
{
vec.insert(vec.begin() + k++, vec[i]);
vec.erase(vec.begin() + i + 1);
}
}
}
See the live example
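Note that the insert/erase version above shifts the tail of the vector on every step, so it does quadratic work. A different technique, named plainly here as cycle following, moves each element directly to its final position. The sketch below is an assumption-laden illustration rather than a tuned implementation: transposeCycles is an invented name, it uses a std::vector<bool> of n flags as scratch space (so it is not strictly zero extra memory), and it is not cache-blocked.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: transposes an R x C row-major matrix stored in `a`
// into a C x R row-major matrix, in place, by following permutation cycles.
void transposeCycles(std::vector<int>& a, std::size_t R, std::size_t C) {
    assert(a.size() == R * C);
    const std::size_t n = R * C;
    std::vector<bool> done(n, false);  // one flag per element (scratch)
    for (std::size_t start = 0; start < n; ++start) {
        if (done[start]) continue;
        std::size_t cur = start;
        int carried = a[start];
        do {
            // The element at row-major index i*C + j belongs at j*R + i.
            const std::size_t dest = (cur % C) * R + cur / C;
            std::swap(carried, a[dest]);
            done[dest] = true;
            cur = dest;
        } while (cur != start);
    }
}
```

Calling it on [1, 2, 3, 4, 5, 6, 7, 8] with R = 4 and C = 2 gives [1, 3, 5, 7, 2, 4, 6, 8].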
Since you haven't provided us with any of your code, can I suggest a different approach (that I don't know will work for your particular situation)?
I would compute each value's position in the transposed layout directly from its row and column in your matrix, and move the values there yourself. Since performance is an issue, this helps even more because you don't have to create another matrix, if that is applicable for you.
Have a vector
[1, 2, 3, 4, 5, 6, 7, 8]
Create your matrix
[1, 2]
[3, 4]
[5, 6]
[7, 8]
Reorder vector without another matrix
[1, 3, 5, 7, 2, 4, 6, 8]
Overwrite the values in the current matrix (so you don't have to create a new one) and reorder the values based on your current matrix.
Add values in order
R1 and C1 to transposed_vector[0]
R2 and C1 to transposed_vector[1]
R3 and C1 to transposed_vector[2]
R4 and C1 to transposed_vector[3]
R1 and C2 to transposed_vector[4]
And so on.
For non square matrix representation, I think it may be tricky, and not worth the effort to make the transpose of your flat vector without creating another one. Here is a snippet of what I came up with:
chrono::steady_clock::time_point start = chrono::steady_clock::now();
int i, j, p, k;
vector<int> t_matrix(matrix.size());
for(k=0; k< R*C ;++k)
{
i = k/C;
j = k - i*C;
p = j*R + i;
t_matrix[p] = matrix[k];
}
cout << chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - start).count() << endl;
Here, matrix is your flat vector, t_matrix is the "transposed" flat vector, and R and C are, respectively, the number of rows and columns you found for your matrix representation.
I know there is std::swap, which can swap two elements in a vector, or iteratively swap two segments of the same length. For example, I can write a loop to swap bcd and efg in the vector abcdefgh, resulting in aefgbcdh. But can I swap bcd and ef (different lengths)? Or is there any other function for std::vector that can achieve this?
If the segments are adjacent, you can use std::rotate. For example:
std::vector<int> v = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Let's say I want to swap the { 1, 2, 3 } with the { 4, 5, 6, 7 }. I can do this:
std::rotate(v.begin() + 1, v.begin() + 4, v.begin() + 8);
If the segments are not adjacent, you can do it by using rotate twice, but it may do more work than is strictly necessary. For example, to swap { 1, 2 } with { 4, 5, 6, 7 }
std::rotate(v.begin() + 1, v.begin() + 4, v.begin() + 8);
std::rotate(v.begin() + 5, v.begin() + 7, v.begin() + 8);
This reduces to an equal length interval swap followed by a rotate.
The equal length interval swap reduces the problem to 'move an interval to another spot', which can be implemented in-place with std::rotate.
If the half open interval [a,b) is moving forward to x, then:
std::rotate( a,b,x );
if moving backward to y then:
std::rotate( y,a,b );
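As a concrete illustration of those two calls, here is a small sketch on a std::string using index arguments; moveForward and moveBackward are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Moves s[first, last) forward so that it ends just before position dest.
// Requires first <= last <= dest <= s.size().
void moveForward(std::string& s, std::size_t first, std::size_t last,
                 std::size_t dest) {
    assert(first <= last && last <= dest && dest <= s.size());
    std::rotate(s.begin() + first, s.begin() + last, s.begin() + dest);
}

// Moves s[first, last) backward so that it starts at position dest.
// Requires dest <= first <= last <= s.size().
void moveBackward(std::string& s, std::size_t first, std::size_t last,
                  std::size_t dest) {
    assert(dest <= first && first <= last && last <= s.size());
    std::rotate(s.begin() + dest, s.begin() + first, s.begin() + last);
}
```

Starting from "abcdefgh", moveForward(s, 1, 4, 7) moves bcd to just before h and gives "aefgbcdh", while moveBackward(s, 5, 7, 1) moves fg to position 1 and gives "afgbcdeh".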
If we assume that we can get the intervals in the correct order (left interval before the right in the sequence), then we can do this:
template<class I>
void swap(I a, I b, I x, I y){
using std::distance; using std::next; using std::advance;
auto d_ab=distance(a,b);
auto d_xy=distance(x,y);
auto d=(std::min)(d_ab,d_xy);
std::swap_ranges(a,next(a,d),x);
advance(a,d);
advance(x,d);
if (a==b){
std::rotate(b,x,y);
}else{
std::rotate(a,b,x);
}
}
Micro-optimizations that avoid going over elements more than once can be done, but for vector iterators the extra advances should be next to free.
An industrial strength one would do tag dispatching, write a slick one for random access iterators, and for other iterators do a double iterator loop swap elements (manual swap ranges that bounds checks), with a similar rotate at the end.
If we do not know that the order is a, b, x, y we need to require random access iterators (and gain access to <). Then we need to wrap the std::rotate calls above to change the order of the arguments.
With access to an end iterator, we can do it inefficiently with forward iterators still.
void swap(std::vector<int>& v, int start1, int end1, int start2, int end2)
{
int size2 = end2 - start2 + 1;
int size1 = end1 - start1 + 1;
auto begin = v.begin() + start1;
auto end = v.begin() + end2 + 1;
std::rotate(begin, v.begin() + start2, end);
std::rotate(begin + size2, begin + size2 + size1, end);
}
This is a generic version to do the work using double rotate trick :)