Computing truncated mean between two forward iterators - c++

I have already computed the truncated mean of a vector via the function truncated_mean(std::vector& v, double trimming_fraction). This function takes as inputs the vector v and the fraction that we want to remove before calculating the mean (e.g. 10%, so we remove the highest and lowest 10% of the values and then compute the mean). I wrote it using the Standard Library.
For example, v = [0,1,2....,9], then truncated_mean(v, 0.10) = 4.5.
Now, I want to reuse the same function but instead of having v as input, I want to have 2 forward iterators, v.begin() and v.end(). I am provided with the template of typename forward that I should use to check whether its value_type (accessed via std::iterator_traits) meets certain criteria. My understanding of the problem is that I first need to check whether the inputs belong to a vector, and from there I should access the vector itself to compute the truncated mean.
How can I adapt my function to take as input the beginning and end of the vector rather than the vector itself?

Assuming the sequence passed in is sorted, you could simply use std::distance to figure out the length and skip the appropriate number of elements at the start and the end:
Edit: Extended the code to use std::accumulate for random access iterators. If you're allowed to use C++20 features, use concepts instead of distinguishing iterator types via an additional parameter.
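#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <stdexcept>
#include <vector>
template<typename RandomAccessIterator>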
double truncated_mean_impl(RandomAccessIterator begin, RandomAccessIterator end, double trimming_fraction, std::random_access_iterator_tag)
{
if (trimming_fraction < 0)
{
throw std::range_error("trimming_fraction must not be negative");
}
if(trimming_fraction >= 0.5)
{
return std::numeric_limits<double>::quiet_NaN(); // no elements left after trimming
}
auto const count = std::distance(begin, end);
auto const skippedElementCountFront = static_cast<decltype(count)>(count * trimming_fraction);
auto const summandCount = count - 2 * skippedElementCountFront;
return std::accumulate<RandomAccessIterator, double>(begin + skippedElementCountFront, end - skippedElementCountFront, 0) / summandCount;
}
template<typename ForwardIterator>
double truncated_mean_impl(ForwardIterator begin, ForwardIterator end, double trimming_fraction, std::forward_iterator_tag)
{
if (trimming_fraction < 0)
{
throw std::range_error("trimming_fraction must not be negative");
}
if(trimming_fraction >= 0.5)
{
return std::numeric_limits<double>::quiet_NaN(); // no elements left after trimming
}
auto const count = std::distance(begin, end);
auto const skippedElementCountFront = static_cast<decltype(count)>(count * trimming_fraction);
// skip elements in the front
for (auto i = skippedElementCountFront; i != 0; --i, ++begin) {}
auto const summandCount = count - 2 * skippedElementCountFront;
double sum = 0;
for (auto i = summandCount; i != 0; --i, ++begin)
{
sum += *begin;
}
return sum / summandCount;
}
template<typename ForwardIterator>
double truncated_mean(ForwardIterator begin, ForwardIterator end, double trimming_fraction)
{
return truncated_mean_impl<ForwardIterator>(begin, end, trimming_fraction, typename std::iterator_traits<ForwardIterator>::iterator_category());
}
int main()
{
std::vector<int> const values { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
std::cout << truncated_mean(values.cbegin(), values.cend(), 0.1) << '\n';
}
If the input sequence is not sorted and you cannot or don't want to sort the input, copying the elements to a new vector and applying your original algorithm to this vector would probably be best.
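The copy-then-sort route for unsorted input could be sketched as follows. The name truncated_mean_unsorted is hypothetical, and the sketch assumes the same trimming rules as the code above (negative fractions throw, fractions of 0.5 or more yield NaN); it is one possible shape, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <limits>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies [begin, end), sorts the copy, then averages
// what is left after trimming the given fraction from each end.
template <typename ForwardIterator>
double truncated_mean_unsorted(ForwardIterator begin, ForwardIterator end, double trimming_fraction)
{
    if (trimming_fraction < 0)
    {
        throw std::range_error("trimming_fraction must not be negative");
    }
    using value_type = typename std::iterator_traits<ForwardIterator>::value_type;
    std::vector<value_type> sorted(begin, end); // the caller's data stays untouched
    if (trimming_fraction >= 0.5 || sorted.empty())
    {
        return std::numeric_limits<double>::quiet_NaN(); // nothing left to average
    }
    std::sort(sorted.begin(), sorted.end());
    auto const skipped = static_cast<std::size_t>(sorted.size() * trimming_fraction);
    assert(2 * skipped < sorted.size()); // holds because the fraction is below 0.5
    auto const first = sorted.begin() + static_cast<std::ptrdiff_t>(skipped);
    auto const last = sorted.end() - static_cast<std::ptrdiff_t>(skipped);
    return std::accumulate(first, last, 0.0) / static_cast<double>(sorted.size() - 2 * skipped);
}
```

Because the work happens on a copy, this accepts any forward iterator pair (for example from a std::list) at the cost of O(n) extra memory and an O(n log n) sort.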

Related

Binary Search Vector for Closest Value C++

Like the title says, I am trying to use a binary search method to search a sorted vector for the closest given value and return its index. I have attempted to use lower/upper_bound() but the returned value is either the first or last vector value, or "0". Below is the txt file from which I have read the temp and voltage into vectors.
1.4 1.644290 -12.5
1.5 1.642990 -13.6
1.6 1.641570 -14.8
1.7 1.640030 -16.0
1.8 1.638370 -17.1
This is my current linear search that works
double Convert::convertmVtoK(double value) const
{
assert(!mV.empty());
auto it = std::min_element(mV.begin(), mV.end(), [value] (double a, double b) {
return std::abs(value - a) < std::abs(value - b);
});
assert(it != mV.end());
int index = std::distance(mV.begin(), it);
std::cout<<kelvin[index];
return kelvin[index];
}
This is the algorithm I am currently trying to get working to improve performance.
double Convert::convertmVtoK(double value)
{
auto it = lower_bound(mV.begin(), mV.end(), value);
if (it == mV.begin())
{
it = mV.begin();
}
else
{
--it;
}
auto jt = upper_bound(mV.begin(), mV.end(), value), out = it;
if (it == mV.end() || jt != mV.end() && value - *it > *jt - value)
{
out = jt;
}
cout<<"This is conversion mV to K"<<" "<< *out;
Any suggestions would be much appreciated. I believe the issue may lie with the vector being sorted in descending order, but I need the order to remain the same in order to compare the values.
SOLVED thanks to #John. For anyone who needs this in the future here is what works.
double Convert::convertmVtoK(double value) const
{
auto it = lower_bound(mV.begin(), mV.end(), value, [](double a, double b){ return a > b; });
int index = std::distance(mV.begin(), it);
std::cout<<kelvin[index];
return kelvin[index];
}
Since you have a non-increasing range (sorted in descending order), you can use std::lower_bound with a greater than operator, as mentioned in comments. However, this only gets you the first result past or equal to your number. It doesn't mean it's the "closest", which is what you asked for.
Instead, I would use std::upper_bound, so you don't have to check for equality (on double just to make it worse) and then drop back one to get the other bounding data point, and compute which one is actually closer. Along with some boundary checks:
#include <vector>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <functional>
#include <iterator>
#include <stdexcept>
// for nonincreasing range of double, find closest to value, return its index
int index_closest(std::vector<double>::iterator begin, std::vector<double>::iterator end, double value) {
if (begin == end){
// we're boned
throw std::invalid_argument("index_closest has no valid index to return");
}
auto it = std::upper_bound(begin, end, value, std::greater<double>());
// first member is closest
if (begin == it)
return 0;
// last member is closest. end is one past that.
if (end == it)
return std::distance(begin, end) - 1;
// between two, need to see which is closer
// std::abs from <cmath> picks the double overload; unqualified abs may resolve to the int version
double diff1 = std::abs(value - *it);
double diff2 = std::abs(value - *(it-1));
if (diff2 < diff1)
--it;
return std::distance(begin, it);
}
int main()
{
std::vector<double> data{ -12.5, -13.6, -14.8, -16.0, -17.1 };
for (double value = -12.0; value > -18.99; value = value - 1.0) {
int index = index_closest(data.begin(), data.end(), value);
std::cout << value << " is closest to " << data[index] << " at index " << index << std::endl;
}
}
output
-12 is closest to -12.5 at index 0
-13 is closest to -12.5 at index 0
-14 is closest to -13.6 at index 1
-15 is closest to -14.8 at index 2
-16 is closest to -16 at index 3
-17 is closest to -17.1 at index 4
-18 is closest to -17.1 at index 4
Note that, e.g. -14 is closer to -13.6 than -14.8, as a specific counterexample to your current working point. Also note the importance of inputs at both end points.
From there you are welcome to take kelvin[i]. I wasn't happy with using an external data array for the function's return value when you don't need to do that, so I just returned the index.
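Taking kelvin[i] from the returned index could look like the sketch below. The function name closest_kelvin and the two-parallel-vectors layout are assumptions for illustration; the lookup itself follows the upper_bound approach described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical wrapper: mV is sorted in descending order and kelvin[i]
// is the temperature recorded for mV[i].
double closest_kelvin(const std::vector<double>& mV, const std::vector<double>& kelvin, double value)
{
    assert(!mV.empty() && mV.size() == kelvin.size());
    // first element strictly smaller than value
    auto it = std::upper_bound(mV.begin(), mV.end(), value, std::greater<double>());
    if (it == mV.end())
    {
        --it; // value is below the whole table: the last entry is closest
    }
    else if (it != mV.begin() && std::abs(value - *(it - 1)) < std::abs(value - *it))
    {
        --it; // the neighbour on the left is closer
    }
    return kelvin[static_cast<std::size_t>(it - mV.begin())];
}
```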
You might use the following to get the iterator with closest value:
auto FindClosest(const std::vector<double>& v, double value)
{
// assert(std::is_sorted(v.begin(), v.end(), std::greater<>{}));
auto it = std::lower_bound(v.begin(), v.end(), value, std::greater<>{});
if (it == v.begin()) {
return it;
} else if (it == v.end()) {
return it - 1;
} else {
return std::abs(value - *it) < std::abs(value - *(it - 1)) ?
it : it - 1;
}
}
This method works, but I am not 100% sure it always gives the closest value. It incorporates part of #KennyOstrom's method.
double Convert::convertmVtoK(double value) const
{
auto it = lower_bound(mV.begin(), mV.end(), value, [](double a, double b){ return a > b; });
int index = std::distance(mV.begin(), it);
if(value>mV[0] || value < mV.back())
{
std::cout<<"Warning: Voltage Out of Range"<<"\n";
}
else if(value==mV[0] || value==mV.back()
||fabs(value - mV[index]) <= 0.0001 * fabs(value))
{
std::cout<<kelvin[index];
return kelvin[index];
}
else
{
double diff1 = std::abs(value - mV[index]);
double diff2 = std::abs(value - mV[index-1]);
if (diff2 < diff1)
{
std::cout<<kelvin[index-1];
return kelvin[index-1];
}
else
{
std::cout<<kelvin[index];
return kelvin[index];
}
}
}

Averaging and decreasing the array (vector) C++

I've got an array (actually std::vector) size ~ 7k elements.
If you draw this data, there will be a diagram of the combustion of the fuel. But I want to reduce this vector from 7k elements to 721 (every 0.5 degree) elements or ~ 1200 (every 0.3 degree). Of course I want the diagram to keep the same shape. How can I do it?
Right now I am copying every 9th element from the big vector to a new one and cutting the rest evenly from the front and back of the vector to get a size of 721.
QVector <double> newVMTVector;
for(QVector <double>::iterator itv = oldVmtDataVector.begin(); itv < oldVmtDataVector.end() - 9; itv+=9){
newVMTVector.push_back(*itv);
}
auto useless = newVMTVector.size() - 721;
if(useless%2 == 0){
newVMTVector.erase(newVMTVector.begin(), newVMTVector.begin() + useless/2);
newVMTVector.erase(newVMTVector.end() - useless/2, newVMTVector.end());
}
else{
newVMTVector.erase(newVMTVector.begin(), newVMTVector.begin() + useless/2+1);
newVMTVector.erase(newVMTVector.end() - useless/2, newVMTVector.end());
}
newVMTVector.squeeze();
oldVmtDataVector.clear();
oldVmtDataVector = newVMTVector;
I can swear there is an algorithm that averages and reduces the array.
The way I understand it, you want to pick the elements [0, k, 2k, 3k ... ] where k is 10 or k is 6.
Here's a simple take:
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride) {
It out = it;
if (stride < 1) return last;
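while (it < last)
{
*out++ = *it;
// stop before stepping past last; advancing beyond the end is undefined behaviour
if (static_cast<size_t>(last - it) <= stride) break;
std::advance(it, stride);
}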
return out;
}
Generalizing a bit for non-random-access iterators:
#include <iterator>
namespace detail {
// version for random access iterators
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride, std::random_access_iterator_tag) {
It out = it;
if (stride < 1) return last;
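while (it < last)
{
*out++ = *it;
// stop before stepping past last; advancing beyond the end is undefined behaviour
if (static_cast<size_t>(last - it) <= stride) break;
std::advance(it, stride);
}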
return out;
}
// other iterator categories
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride, ...) {
It out = it;
if (stride < 1) return last;
while (it != last) {
*out++ = *it;
for (size_t n = stride; n && it != last; --n)
{
it = std::next(it);
}
}
return out;
}
}
template <typename Range>
auto strided_inplace_reduce(Range& range, size_t stride) {
using std::begin;
using std::end;
using It = decltype(begin(range));
It it = begin(range), last = end(range);
return detail::strided_inplace_reduce(it, last, stride, typename std::iterator_traits<It>::iterator_category{});
}
#include <algorithm>
#include <vector>
#include <list>
#include <iostream>
int main() {
{
std::vector<int> v { 1,2,3,4,5,6,7,8,9 };
v.erase(strided_inplace_reduce(v, 2), v.end());
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout << "\nv: ", " "));
}
{
std::list<int> l { 1,2,3,4,5,6,7,8,9 };
l.erase(strided_inplace_reduce(l, 4), l.end());
std::copy(l.begin(), l.end(), std::ostream_iterator<int>(std::cout << "\nl: ", " "));
}
}
Prints
v: 1 3 5 7 9
l: 1 5 9
What you need is an interpolation. There are many libraries providing many types of interpolation. This one is very lightweight and easy to setup and run:
http://kluge.in-chemnitz.de/opensource/spline/
All you need to do is create the second vector that contains the X values, pass both vectors to generate spline, and generate interpolated results every 0.5 degrees or whatever:
std::vector<double> Y; // Y is your current vector of fuel combustion values with ~7k elements
std::vector<double> X(Y.size()); // sized, not just reserved, so X[i] is valid
double step_x = 360.0 / static_cast<double>(Y.size());
for (std::size_t i = 0; i < X.size(); ++i)
X[i] = i*step_x;
tk::spline s;
s.set_points(X, Y);
double interpolation_step = 0.5;
std::vector<double> interpolated_results;
interpolated_results.reserve(static_cast<std::size_t>(std::ceil(360/interpolation_step)) + 1);
for (double x = 0.0; x <= 360.0; x += interpolation_step) // <= in order to obtain range <0;360>
interpolated_results.push_back(s(x));
if (std::fmod(360.0, interpolation_step) != 0.0) // for steps that don't divide 360 evenly, e.g. 0.7 deg, we need to close the range
interpolated_results.push_back(s(360.0));
// now interpolated_results contain values every 0.5 degrees
This should give you an idea of how to use this kind of library. If you need some other interpolation type, just find the one that suits your needs. The usage should be similar.
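If plain averaging is wanted, as the question hints at, one option is to replace each run of consecutive samples by its mean. This is a minimal sketch under that assumption; block_average and its parameters are hypothetical names, and it is a different technique from the stride picking and spline interpolation shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: replaces each run of `block` consecutive samples by its mean.
// A shorter final run is averaged over the samples it actually has.
std::vector<double> block_average(const std::vector<double>& samples, std::size_t block)
{
    assert(block > 0);
    std::vector<double> reduced;
    reduced.reserve((samples.size() + block - 1) / block);
    for (std::size_t start = 0; start < samples.size(); start += block)
    {
        std::size_t const stop = std::min(start + block, samples.size());
        auto const first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        auto const last = samples.begin() + static_cast<std::ptrdiff_t>(stop);
        reduced.push_back(std::accumulate(first, last, 0.0) / static_cast<double>(stop - start));
    }
    return reduced;
}
```

With roughly 7k samples, a block of 10 gives about 700 values. Averaging smooths sharp peaks, so whether the diagram still looks "the same" depends on the data.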

Best way to average duplicate properties in C++ vector

I have a std::vector<PLY> that holds a number of structs:
struct PLY {
int x;
int y;
int greyscale;
};
Some of the PLYs could be duplicates in terms of their position x and y but not necessarily in terms of their greyscale value. What is the best way to find those (position-) duplicates and replace them with a single PLY instance which has a greyscale value that represents the average greyscale of all duplicates?
E.g: PLY a{1,1,188} is a duplicate of PLY b{1,1,255}. Same (x,y) position possibly different greyscale.
Based on your description of Ply you need these operators:
auto operator==(const Ply& a, const Ply& b)
{
return a.x == b.x && a.y == b.y;
}
auto operator<(const Ply& a, const Ply& b)
{
// whenever you can be lazy!
return std::make_pair(a.x, a.y) < std::make_pair(b.x, b.y);
}
Very important: if the definition "Two Ply are identical if their x and y are identical" is not generally valid, then defining comparison operators that ignore greyscale is a bad idea. In that case you should define separate function objects or non-operator functions and pass them to the algorithms that need them.
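A minimal sketch of the "separate function objects" alternative is shown here. SamePosition, PositionLess and has_position_duplicates are hypothetical names chosen for illustration; they are passed explicitly to the standard algorithms instead of relying on operator== and operator<.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

struct Ply {
    int x;
    int y;
    int greyscale;
};

// Hypothetical named comparators: they compare positions only and leave
// operator== and operator< free for a definition that includes greyscale.
struct SamePosition {
    bool operator()(const Ply& a, const Ply& b) const { return a.x == b.x && a.y == b.y; }
};
struct PositionLess {
    bool operator()(const Ply& a, const Ply& b) const { return std::tie(a.x, a.y) < std::tie(b.x, b.y); }
};

// Sorts a copy by position and reports whether any two entries share one.
bool has_position_duplicates(std::vector<Ply> v)
{
    std::sort(v.begin(), v.end(), PositionLess{});
    return std::adjacent_find(v.begin(), v.end(), SamePosition{}) != v.end();
}
```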
There is a nice rule of thumb that a function should not have more than one loop. So instead of two nested for loops, we define this helper function which computes the average of consecutive duplicates and also returns the end of the consecutive duplicates range:
// prereq: [begin, end) has at least one element
// i.e. begin != end
template <class It>
auto compute_average_duplicates(It begin, It end) -> std::pair<int, It>
// (sadly not C++17) concepts:
//requires requires(It i) { {*i} -> Ply; }
{
auto it = begin + 1;
int sum = begin->greyscale;
for (; it != end && *begin == *it; ++it) {
sum += it->greyscale;
}
// you might need rounding instead of truncation:
return std::make_pair(sum / std::distance(begin, it), it);
}
With this we can have our algorithm:
auto foo()
{
std::vector<Ply> v = {{1, 5, 10}, {2, 4, 6}, {1, 5, 2}};
std::sort(std::begin(v), std::end(v));
for (auto i = std::begin(v); i != std::end(v); ++i) {
decltype(i) j;
int average;
std::tie(average, j) = compute_average_duplicates(i, std::end(v));
// C++17 (coming soon in a compiler near you):
// auto [average, j] = compute_average_duplicates(i, std::end(v));
if (i + 1 == j)
continue;
i->greyscale = average;
v.erase(i + 1, j);
// std::vector::erase Invalidates iterators and references
// at or after the point of the erase
// which means i remains valid, and `++i` (from the for) is correct
}
}
You can apply lexicographical sorting first. While summing you should take care that greyscale does not overflow. With the current approach you will have some roundoff error, but it will be small as I first sum and only then average.
In the second part you need to remove duplicates from the array. I used additional array of indices to copy every element not more than once. If you have some forbidden value for x, y or greyscale you can use it and thus get along without additional array.
struct PLY {
int x;
int y;
int greyscale;
};
int main()
{
struct comp
{
bool operator()(const PLY &a, const PLY &b) { return a.x != b.x ? a.x < b.x : a.y < b.y; }
};
vector<PLY> v{ {1,1,1}, {1,2,2}, {1,1,2}, {1,3,5}, {1,2,7} };
sort(begin(v), end(v), comp());
vector<bool> ind(v.size(), true);
int s = 0;
for (int i = 1; i < v.size(); ++i)
{
if (v[i].x == v[i - 1].x &&v[i].y == v[i - 1].y)
{
v[s].greyscale += v[i].greyscale;
ind[i] = false;
}
else
{
int d = i - s;
if (d != 1)
{
v[s].greyscale /= d;
}
s = i;
}
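}
// the loop only averages a group when the next one starts, so close the last group here
if (!v.empty() && static_cast<int>(v.size()) - s != 1)
{
v[s].greyscale /= static_cast<int>(v.size()) - s;
}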
s = 0;
for (int i = 0; i < v.size(); ++i)
{
if (ind[i])
{
if (s != i)
{
v[s] = v[i];
}
++s;
}
}
v.resize(s);
}
So you need to check whether PLY a1 { 1,1,1 }; is a duplicate of PLY a2 {2,2,1};
A simple method is to overload operator== to check a1.x == a2.x and a1.y == a2.y. Then you can write your own function removeDuplicates(std::vector<PLY>& mPLY); which uses iterators over this vector to compare and remove elements. If you need to remove from the middle of the container very frequently, std::list is a better choice.
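One way such a removeDuplicates could be written is sketched below. It swaps the iterator-based compare-and-erase idea for a std::map keyed by position, which is an assumption made here for brevity; the average is a truncated integer division, and the resulting vector comes out ordered by (x, y) rather than in its original order.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct PLY {
    int x;
    int y;
    int greyscale;
};

// Hypothetical removeDuplicates: groups entries by (x, y), then rebuilds the
// vector with one entry per position carrying the truncated integer average.
void removeDuplicates(std::vector<PLY>& mPLY)
{
    // position -> (sum of greyscale values, number of entries)
    std::map<std::pair<int, int>, std::pair<long long, int>> groups;
    for (const PLY& p : mPLY)
    {
        auto& group = groups[std::make_pair(p.x, p.y)];
        group.first += p.greyscale;
        ++group.second;
    }
    mPLY.clear();
    for (const auto& entry : groups)
    {
        const int average = static_cast<int>(entry.second.first / entry.second.second);
        mPLY.push_back(PLY{entry.first.first, entry.first.second, average});
    }
}
```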

Find the minimum +ve number in C++?

I want to find the minimum number using STL in C++, I know the syntax should be min(x,y). But I want to find the minimum +ve number in the list, not including the -ves. How do I do that?
P.S My numbers are in an array
For finding the minimum number, it makes sense to use std::min_element. Fortunately, it comes with an optional comparison parameter, which we can make use of:
auto pos = std::min_element(std::begin(arr), std::end(arr),
[](const T &t1, const T &t2) {return t1 > 0 && (t2 <= 0 || t1 < t2);}
);
You just have to be careful to take into account that if it's comparing a positive t1 to a negative number, it should always be true. If none of the elements are positive, this will give the location of the first number in the array. If 0 should be treated as part of the positives, change t1 > 0 to t1 >= 0 and t2 <= 0 to t2 < 0.
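Since the call returns the first element when nothing is positive, the result needs one extra check before use. A minimal sketch, assuming int elements in a std::vector and a hypothetical wrapper name min_positive:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: returns an iterator to the smallest strictly positive
// element, or v.end() when there is none.
std::vector<int>::const_iterator min_positive(const std::vector<int>& v)
{
    auto pos = std::min_element(v.begin(), v.end(),
        [](int t1, int t2) { return t1 > 0 && (t2 <= 0 || t1 < t2); });
    // with no positive element min_element yields the first element, which is then <= 0
    return (pos != v.end() && *pos > 0) ? pos : v.end();
}
```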
I'd use std::accumulate with a suitable operation:
auto minpos = std::accumulate(myrange.begin(), myrange.end(), MAX_VALUE,
[](T acc, T x)
{ return (x > 0 && x < acc) ? x : acc; });
Here T is the type of your elements and MAX_VALUE is the maximal value of that type (e.g. defined as std::numeric_limits<T>::max()).
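Spelled out for int, the fold might look like this sketch. The wrapper name min_positive_or_max is hypothetical. Note that a result equal to std::numeric_limits<int>::max() is ambiguous: it means either that no positive value was found or that the maximum itself was the only positive value.

```cpp
#include <cassert>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical wrapper around the fold above.
// Returns std::numeric_limits<int>::max() when the range holds no positive value.
int min_positive_or_max(const std::vector<int>& values)
{
    return std::accumulate(values.begin(), values.end(), std::numeric_limits<int>::max(),
        [](int acc, int x) { return (x > 0 && x < acc) ? x : acc; });
}
```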
First use the remove_if algorithm to shift all the non-negative numbers to the front of the collection (the elements left behind the returned iterator have unspecified values), then call min_element on that front range. In C++11:
auto pos = remove_if(coll.begin(), coll.end(), [](int x){ return x < 0; });
auto min = *min_element(coll.begin(), pos);
If you're not using C++11, just replace the lambda with a pre-canned functor from <functional>, like less<> bound to 0.
You may use std::min_element with Boost::filter_iterator
Something like:
struct is_positive_number {
bool operator()(int x) const { return 0 < x; }
};
void foo(const std::vector<int>& numbers)
{
typedef std::vector<int>::const_iterator base_iterator;
typedef boost::filter_iterator<is_positive_number, base_iterator> FilterIter;
is_positive_number predicate;
FilterIter filter_iter_begin(predicate, numbers.begin(), numbers.end());
FilterIter filter_iter_end(predicate, numbers.end(), numbers.end());
FilterIter it = std::min_element(filter_iter_begin, filter_iter_end);
if (it != filter_iter_end) {
// *it is the min elem
} else {
// no positive numbers.
}
}

Get maximum element of std::vector with strides

I have a std::vector<float> with the following layout of data
x1 | y1 | z1 | x2 | y2 | z2 | .... | xn | yn | zn
I'm trying to figure out an STL-ish way to get the maximum x element as well as y or z
The obvious
double xyzmax = *std::max_element(myvector.begin(),myvector.end() );
picks the absolute maximum and does not allow me to specify the stride.
Is there some trick with no for loops?
You could use the Boost.Iterator library and the boost::iterator_facade to create a strided iterator that can be initialized with a std::vector<float>::iterator and for which ++it does it += 3; on the underlying iterator.
Given such an iterator of type StrideIt, you could write
maxX = *std::max_element(StrideIt(v.begin() + 0), StrideIt(v.end() - 2));
maxY = *std::max_element(StrideIt(v.begin() + 1), StrideIt(v.end() - 1));
maxZ = *std::max_element(StrideIt(v.begin() + 2), StrideIt(v.end() - 0));
This is preferable to redefining the algorithms because there are many more algorithms than iterator types.
If you want maximum flexibility, you could make StrideIt a class template taking the type (float in your case) and a runtime constructor argument defining the stride (3 in your case).
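Without Boost, such a class template could be sketched roughly as follows. This is a hand-rolled illustration, not boost::iterator_facade: the constructor taking a vector, a position and a stride is an assumption of this sketch, and the position is clamped to size() so that every exhausted iterator compares equal to the end iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical minimal strided forward iterator over a std::vector<T>.
template <typename T>
class StrideIt {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = const T*;
    using reference = const T&;

    StrideIt() = default;
    StrideIt(const std::vector<T>& v, std::size_t pos, std::size_t stride)
        : data_(&v), pos_(std::min(pos, v.size())), stride_(stride) {}

    reference operator*() const { return (*data_)[pos_]; }
    pointer operator->() const { return &(*data_)[pos_]; }
    StrideIt& operator++() {
        // clamp to size() so the iterator lands exactly on the end position
        pos_ = std::min(pos_ + stride_, data_->size());
        return *this;
    }
    StrideIt operator++(int) { StrideIt tmp = *this; ++*this; return tmp; }
    friend bool operator==(const StrideIt& a, const StrideIt& b) { return a.pos_ == b.pos_; }
    friend bool operator!=(const StrideIt& a, const StrideIt& b) { return !(a == b); }

private:
    const std::vector<T>* data_ = nullptr;
    std::size_t pos_ = 0;
    std::size_t stride_ = 1;
};

// Largest element among v[offset], v[offset + stride], v[offset + 2 * stride], ...
template <typename T>
T strided_max(const std::vector<T>& v, std::size_t offset, std::size_t stride)
{
    assert(offset < v.size() && stride > 0);
    StrideIt<T> first(v, offset, stride), last(v, v.size(), stride);
    return *std::max_element(first, last);
}
```

For the x | y | z layout, strided_max(v, 0, 3), strided_max(v, 1, 3) and strided_max(v, 2, 3) would give the three maxima.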
Here is a reference implementation of std::max_element.
template<class ForwardIt>
ForwardIt max_element(ForwardIt first, ForwardIt last)
{
if (first == last) {
return last;
}
ForwardIt largest = first;
++first;
for (; first != last; ++first) {
if (*largest < *first) {
largest = first;
}
}
return largest;
}
You can create your own algorithm by modifying this in the following way:
template<class ForwardIt>
ForwardIt max_element_nth(ForwardIt first, ForwardIt last, int n)
{
if (first == last) {
return last;
}
ForwardIt largest = first;
// step only while another element lies n positions ahead, so first never moves past last
while (last - first > n) {
first += n;
if (*largest < *first) {
largest = first;
}
}
}
Of course it has the limitation of working only with random access iterators, but it certainly works for vector.
double xmax = *max_element_nth(myvector.begin(),myvector.end(), 3);
double ymax = *max_element_nth(myvector.begin()+1,myvector.end(), 3);
double zmax = *max_element_nth(myvector.begin()+2,myvector.end(), 3);
But I'd rather do it by storing the (x, y, z) values in a structure, and take a vector of that. Then, you can use the standard max_element with a custom comparator.
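A minimal sketch of that structure-based layout, where Point, max_x, max_y and max_z are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record replacing the flat x1 | y1 | z1 | x2 | ... layout.
struct Point {
    float x;
    float y;
    float z;
};

// Each helper runs the standard max_element with a comparator on one member.
float max_x(const std::vector<Point>& points)
{
    assert(!points.empty());
    return std::max_element(points.begin(), points.end(),
        [](const Point& a, const Point& b) { return a.x < b.x; })->x;
}

float max_y(const std::vector<Point>& points)
{
    assert(!points.empty());
    return std::max_element(points.begin(), points.end(),
        [](const Point& a, const Point& b) { return a.y < b.y; })->y;
}

float max_z(const std::vector<Point>& points)
{
    assert(!points.empty());
    return std::max_element(points.begin(), points.end(),
        [](const Point& a, const Point& b) { return a.z < b.z; })->z;
}
```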