C++ Variadic Function To Sort Arguments In-place

While implementing a variation of a binary search problem, I needed to reorder slicing points (i.e. start, mid, end) so that their values end up in sorted order across the corresponding variables (e.g. (1,5,2) -> (1,2,5)). This is fairly simple to do with a few if statements and swaps. However, as a thought experiment, I'm now interested in generalizing this to work with n variables of type T. I started experimenting with some intuitive solutions and, as a starting point, I came up with this template function:
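#include <algorithm>
#include <functional>
#include <tuple>
#include <vector>
template<typename T>
void
sortInPlace(
    std::function<bool (const T&, const T&)> compareFunc,
    T& start,
    T& mid,
    T& end)
{
    // Copy the values into a container that std::sort can work on
    std::vector<T> packed {start, mid, end};
    std::sort(packed.begin(), packed.end(), compareFunc);
    // Write the sorted values back through the references
    auto packedAsTuple = std::make_tuple(packed[0], packed[1], packed[2]);
    std::tie(start, mid, end) = packedAsTuple;
}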
And when I ran the following, using typedef std::pair<int,int> Pivot:
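#include <iostream>
#include <utility>
typedef std::pair<int,int> Pivot;
//Comparison function to sort by pair.first, ascending:
std::function<bool(const Pivot&, const Pivot&)>
comp = [](const Pivot& a, const Pivot& b) {
    return std::get<0>(a) < std::get<0>(b);
};
int main(){
    Pivot a(8,1);
    Pivot b(2,3);
    Pivot c(4,6);
    sortInPlace(comp,a,b,c);
    std::cout << "a after sort: " << a.first << ", " << a.second << "\n"
              << "b after sort: " << b.first << ", " << b.second << "\n"
              << "c after sort: " << c.first << ", " << c.second << "\n";
}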
This turns out to work as intended:
a after sort: 2, 3
b after sort: 4, 6
c after sort: 8, 1
Ideally, the next step is to convert this template into a variadic template, but I'm having trouble achieving this. A few things also bother me about the current version:
The use of std::vector was an arbitrary decision. I did this because it's unclear to me which structure/container is best for packing these values. It seems to me that the structure needs to be easy to construct, sort and unpack/convert to a tuple, and I'm not sure whether such a magical structure exists.
Just to get something going, I had to settle for manually packing/unpacking the arguments. It is also unclear to me how to use std::tie, or any other unpacking/moving operation, with a variable number (i.e. unknown at compile time) of elements.
While there is no particular reason to use STL functions/structures exclusively, I'd be surprised to learn that there is no intuitive way to achieve this using the abstractions the STL provides. As a result, I'm most interested in achieving my goal with minimal help from outside the STL.
I embarked on this thought experiment expecting to end up with a syntactically-correct version of std::move(std::sort({x,y,z}, comp), {x,y,z}) and considering where my research has led me so far, I'm starting to think I'm over-complicating this problem. Any help, insight or suggestion would be much appreciated!

One possible C++17 solution with std::sort that generalizes your example:
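#include <algorithm>
#include <iterator>
#include <type_traits>
#include <utility>
template<class Comp, class... Ts>
void my_sort(Comp comp, Ts&... values) {
    using T = std::common_type_t<Ts...>;
    // Move the arguments into a local array and sort it
    T vals[]{std::move(values)...};
    std::sort(std::begin(vals), std::end(vals), comp);
    // Fold expression: assign the sorted values back, in order
    auto it = std::begin(vals);
    ((values = std::move(*it++)), ...);
}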
using Pivot = std::pair<int, int>;
const auto comp = [](Pivot a, Pivot b) {
    return std::get<0>(a) < std::get<0>(b);
};
int main() {
    Pivot a(8, 1);
    Pivot b(2, 3);
    Pivot c(4, 6);
    my_sort(comp, a, b, c);  // a = (2,3), b = (4,6), c = (8,1)
}
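The question also asked how std::tie could unpack a variable number of values. A parameter pack can be expanded directly inside std::tie, so a variation on the same idea can copy into a std::array, sort it, and assign back through std::tie with std::apply. This is a sketch under the assumption that every argument has the same type T as the first one; sort_in_place is a hypothetical name:

```cpp
#include <algorithm>
#include <array>
#include <tuple>

// Hypothetical helper: sorts its arguments in place using comp.
// Assumes every argument has the same type T as the first one.
template <class Comp, class T, class... Ts>
void sort_in_place(Comp comp, T& first, Ts&... rest) {
    // Pack copies of the values into a fixed-size array
    std::array<T, 1 + sizeof...(Ts)> vals{first, rest...};
    std::sort(vals.begin(), vals.end(), comp);
    // std::apply expands the array into a tuple of values, and
    // std::tie(first, rest...) builds a tuple of references to assign through
    std::tie(first, rest...) =
        std::apply([](const auto&... v) { return std::make_tuple(v...); }, vals);
}
```

Called as sort_in_place(comp, a, b, c) with the Pivot comparator, it behaves like my_sort; the only difference is how the sorted values are written back.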
If the number N of parameters in the pack is small, you don't need std::sort at all. Just a series of (hardcoded) comparisons (and for small N the minimum number of comparisons is known exactly) will do the job - see sec. 5.3 Optimum sorting of Knuth's TAOCP vol. 3.
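For example, three values can be sorted with a fixed sequence of three compare-and-swap steps (a sorting network for N = 3). This is a sketch with hypothetical names, assuming comp is a strict weak ordering; note that it is not stable:

```cpp
#include <utility>

// Hypothetical helper: puts a and b in order according to comp.
template <class T, class Comp>
void compare_swap(T& a, T& b, Comp comp) {
    if (comp(b, a)) std::swap(a, b);
}

// Sorts exactly three values in place with three comparisons:
// (a,b) then (b,c) moves the largest value into c, and (a,b) again orders the rest.
template <class T, class Comp>
void sort3(T& a, T& b, T& c, Comp comp) {
    compare_swap(a, b, comp);
    compare_swap(b, c, comp);
    compare_swap(a, b, comp);
}
```

With the Pivot values from the question, (8,1), (2,3), (4,6) become (2,3), (4,6), (8,1).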

Related

how to sum up a vector of vector int in C++ without loops

I'm trying to sum all elements of a vector<vector<int>> without writing loops.
I have already checked related questions, such as How to sum up elements of a C++ vector?.
So I tried to use std::accumulate, but I find it hard to overload the binary operator that std::accumulate needs.
So I'm confused about how to implement it with std::accumulate. Or is there a better way?
Could anyone help me?
Thanks in advance.
You need to use std::accumulate twice, once for the outer vector with a binary operator that knows how to sum the inner vector using an additional call to std::accumulate:
int sum = std::accumulate(
    vec.begin(), vec.end(),  // iterators for the outer vector
    0,                       // initial value for summation - 0
    [](int init, const std::vector<int>& intvec){  // binaryOp that sums a single vector<int>
        return std::accumulate(
            intvec.begin(), intvec.end(),  // iterators for the inner vector
            init);                         // current sum; use the default binaryOp here
    }
);
In this case, I would not suggest using std::accumulate, as it greatly impairs readability. Moreover, this function uses loops internally, so you would not save anything. Just compare the following loop-based solution with the other answers that use std::accumulate:
int result = 0;
for (auto const & subvector : your_vector)
    for (int element : subvector)
        result += element;
Does using a combination of iterators, STL functions, and lambda functions make your code easier to understand and faster? For me, the answer is clear. Loops are not evil, especially for such a simple application.
According to https://en.cppreference.com/w/cpp/algorithm/accumulate , it looks like BinaryOp takes the current sum as its left-hand argument and the next range element as its right-hand argument. So you should run std::accumulate on the right-hand argument, then add the result to the left-hand argument and return it. If you use C++14 or later:
auto binary_op = [&](auto cur_sum, const auto& el){
auto rhs_sum = std::accumulate(el.begin(), el.end(), 0);
return cur_sum + rhs_sum;
};
I didn't try to compile the code, though :). If I mixed up the order of the arguments, just swap them.
Edit: wrong terminology - you don't overload BinaryOp, you just pass it.
The signature of std::accumulate is:
template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
BinaryOperation op );
Note that the return value is deduced from the init parameter (it is not necessarily the value_type of InputIt).
The binary operation is:
Ret binary_op(const Type1 &a, const Type2 &b);
where... (from cppreference)...
The type Type1 must be such that an object of type T can be implicitly converted to Type1. The type Type2 must be such that an object of type InputIt can be dereferenced and then implicitly converted to Type2. The type Ret must be such that an object of type T can be assigned a value of type Ret.
However, when T is the value_type of InputIt, the above is simpler and you have:
using value_type = typename std::iterator_traits<InputIt>::value_type;
T binary_op(T, const value_type&);
Your final result is supposed to be an int, hence T is int. You need two calls to std::accumulate: one for the outer vector (where value_type == std::vector<int>) and one for the inner vectors (where value_type == int):
#include <iostream>
#include <numeric>
#include <iterator>
#include <vector>
template <typename IT, typename T>
T accumulate2d(IT outer_begin, IT outer_end,const T& init){
using value_type = typename std::iterator_traits<IT>::value_type;
return std::accumulate( outer_begin,outer_end,init,
[](T accu,const value_type& inner){
return std::accumulate( inner.begin(),inner.end(),accu);
});
}
int main() {
std::vector<std::vector<int>> x{ {1,2} , {1,2,3} };
std::cout << accumulate2d(x.begin(),x.end(),0);
}
Solutions based on nesting std::accumulate may be difficult to understand.
By using a 1D array of intermediate sums, the solution can be more straightforward (but possibly less efficient).
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <vector>
using namespace std;
int main()
{
    // create a unary operator for 'std::transform'
    auto accumulate = []( vector<int> const & v ) -> int
    {
        return std::accumulate(v.begin(),v.end(),int{});
    };
    vector<vector<int>> data = {{1,2,3},{4,5},{6,7,8,9}}; // 2D array
    vector<int> temp; // 1D array of intermediate sums
    transform( data.begin(), data.end(), back_inserter(temp), accumulate );
    int result = accumulate(temp);
    cerr<<"result="<<result<<"\n";
}
The call to transform accumulates each of the inner arrays to initialize the 1D temp array.
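Since C++17, the transform step and the summation can be fused without the temporary vector by using std::transform_reduce from <numeric>. A sketch, with the hypothetical function name sum2d:

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: sums every element of a 2D vector.
int sum2d(const std::vector<std::vector<int>>& data) {
    return std::transform_reduce(
        data.begin(), data.end(),
        0,                                   // initial value of the total
        std::plus<>{},                       // reduce: combines per-row sums
        [](const std::vector<int>& inner) {  // transform: sums one row
            return std::accumulate(inner.begin(), inner.end(), 0);
        });
}
```

Unlike std::accumulate, std::transform_reduce may combine the per-row sums in any order, which is fine for integer addition.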
To avoid loops, you'll have to add each element explicitly:
std::vector<int> database = {1, 2, 3, 4};
int sum = 0;
int index = 0;
// Start the accumulation
sum += database[index++];
sum += database[index++];
sum += database[index++];
sum += database[index++];
There is no guarantee that std::accumulate is implemented without loops. If you need to avoid loops, then don't use it.
IMHO, there is nothing wrong with using loops: for, while or do-while. Processors that have specialized instructions for summing arrays use loops. Loops are a convenient way of conserving code space. However, there may be times when loops should be unrolled (for performance reasons). You can have a loop with expanded or unrolled content in it.
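As an illustration of a loop with unrolled content, here is a sketch (sum_unrolled is a hypothetical name) that adds four elements per iteration into independent partial sums and finishes the remainder with a short tail loop. Whether it beats the compiler's own unrolling has to be measured:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums a vector, processing four elements per iteration.
int sum_unrolled(const std::vector<int>& v) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;  // independent partial sums
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {  // unrolled body: four additions per trip
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover 0-3 elements
    return s0 + s1 + s2 + s3;
}
```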
With range-v3 (and soon with C++20), you might do
const std::vector<std::vector<int>> v{{1, 2}, {3, 4, 5, 6}};
auto flat = v | ranges::view::join;
std::cout << std::accumulate(begin(flat), end(flat), 0);

select a set of values from tuple with run-time index

Short introduction to my question:
I'm trying to implement a "sort of" relational database using STL containers. This is just for fun/educational purposes, so no need for answers like "use this library", "this is absolutely useless" and so on.
I know the title is a little confusing at this point, but we will get there (suggestions for improving the title are really welcome).
I proceeded in small steps:
1. I can build a table as a vector of maps from column names to their values => std::vector<std::map<std::string, some_variant>>. It's simple and it represents what I need.
2. Wait, I can just store the column names once and access values by their index => std::vector<std::vector<some_variant>>. As simple as point 1, but faster.
3. Wait, wait, in a database a table is literally a sequence of tuples => std::vector<std::tuple<args...>>. This is cool: it represents exactly what I'm doing, with the correct types and no variant, and it is even faster than the others.
Note: the "faster than" was measured for 1000000 records with a simple loop like this:
#include <iostream>
#include <map>
#include <random>
#include <string>
#include <variant>
#include <vector>
std::random_device dev;
std::mt19937 gen(dev());
std::uniform_int_distribution<long> rand1_1000(1, 1000);
std::uniform_real_distribution<double> rand1_10(1.0, 10.0);
void fill_1()
{
    using my_variant = std::variant<long, long long, double, std::string>;
    using values = std::map<std::string, my_variant>;
    using table = std::vector<values>;
    table t;
    for (int i = 0; i < 1000000; ++i)
        t.push_back({ {"col_1", rand1_1000(gen)}, {"col_2", rand1_1000(gen)}, {"col_3", rand1_10(gen)} });
    std::cout << "size:" << t.size() << "\n"; // just to prevent optimization
}
1. 2234101600ns - avg:2234
2. 446344100ns - avg:446
3. 132075400ns - avg:132
INSERT:
No problem with any of these solutions: inserts are as simple as pushing back elements, as in the example.
SELECT:
1 and 2 are simple, but 3 is tricky.
So, finally, questions:
Memory usage: there is a lot of memory overhead with solutions 1 and 2. So, 3 again seems to be the right choice here.
For the example with 1 million records of 2 longs and a double, I was expecting something near 4MB*2 for the longs and 8MB for the doubles, plus some overhead for the vectors, maps and variants where used. Instead we have (measured with Windows Task Manager, not extremely accurate, I know):
1. 340 MB
2. 120 MB
3. 31 MB
Am I missing something? Other than reserving the right size in advance or calling shrink_to_fit after the insert loop?
Is there a way to retrieve a tuple field at run time, as in the case of a select statement?
using my_tuple = std::tuple<long, long, std::string, double>;
std::vector<my_tuple> table;
int to_select; // this could obviously be a vector of columns to select
std::cin>>to_select;
auto result = select (table, to_select);
Do you see any chance to implement this last line in any way?
There are two problems as far as I can see: the result type has to be derived from the starting tuple type, and then the desired fields actually have to be selected.
I read a lot of answers about this; they all talk about contiguous indexes using make_index_sequence or compile-time known indexes.
I also found this article, very interesting, but not really useful for this case.
This is doable but it is strange:
#include <cstddef>
#include <stdexcept>
#include <tuple>
#include <utility>
#include <variant>
template<size_t candidate, typename ...T>
constexpr std::variant<T...> helperTupleValueAt(const std::tuple<T...>& t, size_t index)
{
if constexpr (candidate >= sizeof...(T)) {
throw std::logic_error("out of bounds");
} else {
if (candidate == index) {
return std::variant<T...>{ std::in_place_index<candidate>, std::get<candidate>(t) };
} else {
return helperTupleValueAt<candidate + 1>(t, index);
}
}
}
template<typename ...T>
std::variant<T...> tupleValueAt(const std::tuple<T...>& t, size_t index)
{
return helperTupleValueAt<0>(t, index);
}
https://wandbox.org/permlink/FQJd4chAFVSg5eSy
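A variation that avoids the recursion expands an index_sequence in a fold expression, and the run-time SELECT from the question can then be built on top of it. This is a sketch: the names value_at and select_columns are hypothetical, and each result row is a vector of variants because the column types are only known at run time. The variant alternative index equals the column index, so two long columns stay distinguishable:

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <variant>
#include <vector>

// Returns column `index` of `t` as a variant whose alternative index == column index.
template <typename... T, std::size_t... I>
std::variant<T...> value_at_impl(const std::tuple<T...>& t, std::size_t index,
                                 std::index_sequence<I...>) {
    std::optional<std::variant<T...>> result;
    // The || fold stops at the first I equal to index and emplaces that column
    const bool found =
        ((index == I ? (result.emplace(std::in_place_index<I>, std::get<I>(t)), true)
                     : false) || ...);
    if (!found) throw std::out_of_range("column index out of range");
    return std::move(*result);
}

template <typename... T>
std::variant<T...> value_at(const std::tuple<T...>& t, std::size_t index) {
    return value_at_impl(t, index, std::index_sequence_for<T...>{});
}

// Hypothetical "SELECT col_a, col_b, ... FROM table" with run-time column indexes
template <typename... T>
std::vector<std::vector<std::variant<T...>>> select_columns(
    const std::vector<std::tuple<T...>>& table,
    const std::vector<std::size_t>& columns) {
    std::vector<std::vector<std::variant<T...>>> result;
    result.reserve(table.size());
    for (const auto& row : table) {
        std::vector<std::variant<T...>> selected;
        selected.reserve(columns.size());
        for (std::size_t c : columns) selected.push_back(value_at(row, c));
        result.push_back(std::move(selected));
    }
    return result;
}
```

Reading a cell back still needs std::visit, or std::get with the column index, since the static type of each cell is the variant.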

Fast way to do lexicographical comparing 2 numbers

I'm trying to sort a vector of unsigned int in lexicographical order.
The std::lexicographical_compare function only supports iterators so I'm not sure how to compare two numbers.
This is the code I'm trying to use:
std::sort(myVector->begin(),myVector->end(), [](const unsigned int& x, const unsigned int& y){
std::vector<unsigned int> tmp1(x);
std::vector<unsigned int> tmp2(y);
return lexicographical_compare(tmp1.begin(),tmp1.end(),tmp2.begin(),tmp2.end());
} );
C++11 introduced std::to_string.
You can use to_string as below:
std::sort(myVector->begin(),myVector->end(), [](const unsigned int& x, const unsigned int& y){
std::string tmp1 = std::to_string(x);
std::string tmp2 = std::to_string(y);
return lexicographical_compare(tmp1.begin(),tmp1.end(),tmp2.begin(),tmp2.end());
} );
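If allocating strings in the comparator is too slow, the same ordering can be computed arithmetically: pad the shorter number with trailing zeros so both have the same digit count, compare, and break ties by length (a shorter number that matches is a prefix, so it comes first). A sketch with hypothetical names, assuming a 32-bit unsigned int:

```cpp
#include <algorithm>
#include <vector>

// Number of decimal digits in x (0 counts as one digit).
inline int decimal_digits(unsigned int x) {
    int d = 1;
    while (x >= 10) { x /= 10; ++d; }
    return d;
}

// True if the decimal text of x sorts before that of y, the same result as
// std::to_string(x) < std::to_string(y), but without allocating strings.
inline bool lex_less(unsigned int x, unsigned int y) {
    const int dx = decimal_digits(x);
    const int dy = decimal_digits(y);
    // Pad the shorter number with trailing zeros; 64-bit arithmetic cannot
    // overflow here (assumes at most 10 digits, i.e. 32-bit unsigned int).
    unsigned long long sx = x, sy = y;
    for (int i = dx; i < dy; ++i) sx *= 10;
    for (int i = dy; i < dx; ++i) sy *= 10;
    if (sx != sy) return sx < sy;
    return dx < dy;  // equal after padding: the shorter one is a prefix
}

// Sorts v by the dictionary order of the decimal representations.
inline void sort_lexicographically(std::vector<unsigned int>& v) {
    std::sort(v.begin(), v.end(), lex_less);
}
```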
I assume you have some good reasons, but allow me to ask: why are you sorting ints using lexicographical order? In which scenario is 0 not less than 1, for example?
To compare scalars, I suggest using std::less, the same as the standard library itself does.
Your code (from the question) might contain a lambda that will use std::less and that will work perfectly. But let us go one step further and deliver some reusable code ready for pasting into your code. Here is one example:
/// sort a range in place
template< typename T>
inline void dbj_sort( T & range_ )
{
// the type of elements range contains
using ET = typename T::value_type;
// use of the std::less type
using LT = std::less<ET>;
// make its instance whose 'operator ()'
// we will use
LT less{};
std::sort(
range_.begin(),
range_.end(),
[&]( const ET & a, const ET & b) {
return less(a, b);
});
}
The above uses std::less<> internally. It will sort anything that has begin(), end() and a public value_type for the elements it contains; in other words, anything that implements the range concept.
Example usage:
std::vector<int> iv_ = { 13, 42, 2 };
dbj_sort(iv_);
std::array<int,3> ia_ = { 13, 42, 2 };
dbj_sort(ia_);
std:: generics in action ...
Why does std::less work here? Among other obvious things, because it compares two scalars. std::lexicographical_compare compares two sequences.
std::lexicographical_compare might be used to compare two vectors, not two elements from one vector containing scalars.
HTH

Very fast sorting of fixed length arrays using comparator networks

I have some performance critical code that involves sorting a very short fixed-length array with between around 3 and 10 elements in C++ (the parameter changes at compile time).
It occurred to me that a static sorting network specialised to each possible input size would perhaps be a very efficient way to do this: We do all the comparisons necessary to figure out which case we are in, then do the optimal number of swaps to sort the array.
To apply this, we use a bit of template magic to deduce the array length and apply the correct network:
#include <iostream>
using namespace std;
template< int K >
void static_sort(const double(&array)[K])
{
cout << "General static sort\n" << endl;
}
template<>
void static_sort<3>(const double(&array)[3])
{
cout << "Static sort for K=3" << endl;
}
int main()
{
double array[3];
// performance critical code.
// ...
static_sort(array);
// ...
}
Obviously it's quite a hassle to code all this up, so:
Does anyone have any opinions on whether or not this is worth the effort?
Does anyone know if this optimisation exists in any standard implementations of, for example, std::sort?
Is there an easy place to get hold of code implementing this kind of sorting network?
Perhaps it would be possible to generate a sorting network like this statically using template magic.
For now I just use insertion sort with a static template parameter (as above), in the hope that it will encourage unrolling and other compile-time optimisations.
Your thoughts welcome.
Update:
I wrote some testing code to compare a 'static' insertion sort and std::sort. (When I say static, I mean that the array size is fixed and deduced at compile time, presumably allowing loop unrolling etc.)
I get at least a 20% NET improvement (note that the generation is included in the timing). Platform: clang, OS X 10.9.
The code is here https://github.com/rosshemsley/static_sorting if you would like to compare it to your implementations of stdlib.
I have still yet to find a nice set of implementations for comparator network sorters.
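The 'static' insertion sort described in the question, with the array length deduced as a template parameter, could look roughly like this. It is a sketch with the hypothetical name static_insertion_sort; nothing in it forces the compiler to unroll, it only makes the trip counts compile-time constants:

```cpp
#include <cstddef>

// Insertion sort for a C array whose length N is known at compile time.
template <typename T, std::size_t N>
void static_insertion_sort(T (&array)[N]) {
    for (std::size_t i = 1; i < N; ++i) {
        T value = array[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right
        while (j > 0 && value < array[j - 1]) {
            array[j] = array[j - 1];
            --j;
        }
        array[j] = value;
    }
}
```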
Here is a little class that uses the Bose-Nelson algorithm to generate a sorting network on compile time.
/**
 * A Functor class to create a sort for fixed sized arrays/containers with a
 * compile time generated Bose-Nelson sorting network.
 * \tparam NumElements  The number of elements in the array or container to sort.
 * \tparam Compare      A comparator functor class that returns true if lhs < rhs.
 *                      If it is void (the default), operator< is used.
 */
template <unsigned NumElements, class Compare = void> class StaticSort
{
template <class A, class C> struct Swap
{
template <class T> inline void s(T &v0, T &v1)
{
T t = Compare()(v0, v1) ? v0 : v1; // Min
v1 = Compare()(v0, v1) ? v1 : v0; // Max
v0 = t;
}
inline Swap(A &a, const int &i0, const int &i1) { s(a[i0], a[i1]); }
};
template <class A> struct Swap <A, void>
{
template <class T> inline void s(T &v0, T &v1)
{
// Explicitly code out the Min and Max to nudge the compiler
// to generate branchless code.
T t = v0 < v1 ? v0 : v1; // Min
v1 = v0 < v1 ? v1 : v0; // Max
v0 = t;
}
inline Swap(A &a, const int &i0, const int &i1) { s(a[i0], a[i1]); }
};
template <class A, class C, int I, int J, int X, int Y> struct PB
{
inline PB(A &a)
{
enum { L = X >> 1, M = (X & 1 ? Y : Y + 1) >> 1, IAddL = I + L, XSubL = X - L };
PB<A, C, I, J, L, M> p0(a);
PB<A, C, IAddL, J + M, XSubL, Y - M> p1(a);
PB<A, C, IAddL, J, XSubL, M> p2(a);
}
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 1>
{
inline PB(A &a) { Swap<A, C> s(a, I - 1, J - 1); }
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 2>
{
inline PB(A &a) { Swap<A, C> s0(a, I - 1, J); Swap<A, C> s1(a, I - 1, J - 1); }
};
template <class A, class C, int I, int J> struct PB <A, C, I, J, 2, 1>
{
inline PB(A &a) { Swap<A, C> s0(a, I - 1, J - 1); Swap<A, C> s1(a, I, J - 1); }
};
template <class A, class C, int I, int M, bool Stop = false> struct PS
{
inline PS(A &a)
{
enum { L = M >> 1, IAddL = I + L, MSubL = M - L};
PS<A, C, I, L, (L <= 1)> ps0(a);
PS<A, C, IAddL, MSubL, (MSubL <= 1)> ps1(a);
PB<A, C, I, IAddL, L, MSubL> pb(a);
}
};
template <class A, class C, int I, int M> struct PS <A, C, I, M, true>
{
inline PS(A &a) {}
};
public:
/**
* Sorts the array/container arr.
* \param arr The array/container to be sorted.
*/
template <class Container> inline void operator() (Container &arr) const
{
PS<Container, Compare, 1, NumElements, (NumElements <= 1)> ps(arr);
};
/**
* Sorts the array arr.
* \param arr The array to be sorted.
*/
template <class T> inline void operator() (T *arr) const
{
PS<T*, Compare, 1, NumElements, (NumElements <= 1)> ps(arr);
};
};
#include <cstdlib>
#include <iostream>
#include <vector>
int main(int argc, const char * argv[])
{
enum { NumValues = 32 };
// Arrays
{
int rands[NumValues];
for (int i = 0; i < NumValues; ++i) rands[i] = rand() % 100;
std::cout << "Before Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
StaticSort<NumValues> staticSort;
staticSort(rands);
std::cout << "After Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
}
std::cout << "\n";
// STL Vector
{
std::vector<int> rands(NumValues);
for (int i = 0; i < NumValues; ++i) rands[i] = rand() % 100;
std::cout << "Before Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
StaticSort<NumValues> staticSort;
staticSort(rands);
std::cout << "After Sort: \t";
for (int i = 0; i < NumValues; ++i) std::cout << rands[i] << " ";
std::cout << "\n";
}
return 0;
}
Benchmarks
The following benchmarks were compiled with clang -O3 and run on my mid-2012 MacBook Air.
Time (in milliseconds) to sort 1 million arrays:
The number of milliseconds for arrays of size 2, 4 and 8 is 1.943, 8.655 and 20.246, respectively.
Here are the average clocks per sort for small arrays of 6 elements. The benchmark code and examples can be found at this question:
Fastest sort of fixed length 6 int array
Direct call to qsort library function : 342.26
Naive implementation (insertion sort) : 136.76
Insertion Sort (Daniel Stutzbach) : 101.37
Insertion Sort Unrolled : 110.27
Rank Order : 90.88
Rank Order with registers : 90.29
Sorting Networks (Daniel Stutzbach) : 93.66
Sorting Networks (Paul R) : 31.54
Sorting Networks 12 with Fast Swap : 32.06
Sorting Networks 12 reordered Swap : 29.74
Reordered Sorting Network w/ fast swap : 25.28
Templated Sorting Network (this class) : 25.01
It performs as fast as the fastest example in the question for 6 elements.
The code used for the benchmarks can be found here.
It includes more features and further optimizations for more robust performance on real-world data.
The other answers are interesting and fairly good, but I believe that I can provide some additional elements of answer, point per point:
Is it worth the effort? Well, if you need to sort small collections of integers and the sorting networks are tuned to take advantage of some instructions as much as possible, it might be worth the effort. The following graph presents the results of sorting a million arrays of int of size 0-14 with different sorting algorithms. As you can see, the sorting networks can provide a significant speedup if you really need it.
No standard implementation of std::sort that I know of uses sorting networks; when they are not fine-tuned, sorting networks can be slower than a straight insertion sort. libc++'s std::sort has dedicated algorithms to sort 0 through 5 values at once, but it doesn't use sorting networks either. The only sorting algorithm I know of that uses sorting networks to sort a few values is Wikisort. That said, the research paper Applying Sorting Networks to Synthesize Optimized Sorting Libraries suggests that sorting networks could be used to sort small arrays or to improve recursive sorting algorithms such as quicksort, but only if they are fine-tuned to take advantage of specific hardware instructions.
The access aligned sort algorithm is some kind of bottom-up mergesort that apparently uses bitonic sorting networks implemented with SIMD instructions for the first pass. Apparently, the algorithm could be faster than the standard library one for some scalar types.
I can actually provide such information for the simple reason that I developed a C++14 sorting library that happens to provide efficient sorting networks of size 0 through 32 that implement the optimizations described in the previous section. I used it to generate the graph in the first section. I am still working on the sorting networks part of the library to provide size-optimal, depth-optimal and swap-optimal networks. Small optimal sorting networks are found by brute force, while bigger sorting networks use results from the literature.
Note that none of the sorting algorithms in the library directly use sorting networks, but you can adapt them so that a sorting network will be picked whenever the sorting algorithm is given a small std::array or a small fixed-size C array:
using namespace cppsort;
// Sorters are function objects that can be
// adapted with sorter adapters from the
// library
using sorter = small_array_adapter<
std_sorter,
sorting_network_sorter
>;
// Now you can use it as a function
sorter sort;
// Instead of a size-agnostic sorting algorithm,
// sort will use an optimal sorting network for
// 5 inputs since the bound of the array can be
// deduced at compile time
int arr[] = { 2, 4, 7, 9, 3 };
sort(arr);
As mentioned above, the library provides efficient sorting networks for built-in integers, but you're probably out of luck if you need to sort small arrays of something else (e.g. my latest benchmarks show that they are not better than a straight insertion sort even for long long int).
You could probably use template metaprogramming to generate sorting networks of any size, but no known algorithm can generate the best sorting networks, so you might as well write the best ones by hand. I don't think the ones generated by simple algorithms can actually provide usable and efficient networks anyway (Batcher's odd-even sort and pairwise sorting networks might be the only usable ones) [Another answer seems to show that generated networks could actually work].
There are known optimal or at least best length comparator networks for N<16, so there's at least a fairly good starting point. Fairly, since the optimal networks are not necessarily designed for maximum level of parallelism achievable with e.g. SSE or other vector arithmetics.
Another point is that some optimal networks for a given N are already degenerate versions of a slightly larger optimal network for N+1.
From wikipedia:
The optimal depths for up to 10 inputs are known and they are
respectively 0, 1, 3, 3, 5, 5, 6, 6, 7, 7.
That said, I'd pursue implementing networks for N={4, 6, 8, 10}, since the depth constraint cannot be simulated by extra parallelism (I think). I also think that the ability to work in SSE registers (also using some min/max instructions), or even in the relatively large register set of a RISC architecture, will provide a noticeable performance advantage over "well known" sorting methods such as quicksort, due to the absence of pointer arithmetic and other overhead.
Additionally, I'd try to implement the parallel network using the infamous loop-unrolling trick Duff's device.
EDIT
When the input values are known to be positive IEEE-754 floats or doubles, it's also worth mentioning that the comparison can be performed on their bit patterns as integers. (float and int must have the same endianness.)
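A sketch of that trick for double, assuming 64-bit IEEE-754 doubles and non-NaN inputs with the sign bit clear (so not -0.0); the function names are hypothetical. std::memcpy is used to read the bit pattern, which avoids the undefined behaviour of casting pointers between unrelated types:

```cpp
#include <cstdint>
#include <cstring>

// For IEEE-754 doubles with the sign bit clear, the bit patterns read as
// unsigned integers are ordered the same way as the values themselves.
inline std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

inline bool less_as_int(double a, double b) {
    return double_bits(a) < double_bits(b);  // only valid for non-negative a, b
}
```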
Let me share some thoughts.
Does anyone have any opinions on whether or not this is worth the effort?
It is impossible to give a correct answer. You have to profile your actual code to find that out.
In my practice, when it came to low-level profiling, the bottleneck was never where I thought it would be.
Does anyone know if this optimisation exists in any standard implementations of, for example, std::sort?
For example, Visual C++'s implementation of std::sort uses insertion sort for small vectors. I'm not aware of an implementation which uses optimal sorting networks.
Perhaps it would be possible to generate a sorting network like this statically using template magic
There are algorithms for generating sorting networks, such as Bose-Nelson, Hibbard, and Batcher's algorithms. As C++ templates are Turing-complete, you can implement them using TMP. However, those algorithms are not guaranteed to give the theoretically minimal number of comparators, so you may want to hardcode the optimal network.
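As an example of a hardcoded network, the size-optimal network for four inputs uses five comparators in three layers. The std::min/std::max form of the compare-exchange is a sketch that may help the compiler emit branchless code, but that is not guaranteed; sort4 and cmp_swap are hypothetical names:

```cpp
#include <algorithm>
#include <array>

// Compare-exchange: afterwards a <= b.
template <typename T>
inline void cmp_swap(T& a, T& b) {
    const T lo = std::min(a, b);
    const T hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Sorting network for 4 elements: 5 comparators in 3 layers.
template <typename T>
void sort4(std::array<T, 4>& v) {
    cmp_swap(v[0], v[1]);  // layer 1
    cmp_swap(v[2], v[3]);
    cmp_swap(v[0], v[2]);  // layer 2: v[0] is now the minimum, v[3] the maximum
    cmp_swap(v[1], v[3]);
    cmp_swap(v[1], v[2]);  // layer 3: order the middle pair
}
```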

map/fold operators (in c++)

I am writing a library which can do map/fold operations on ranges. I need to do these with operators. I am not very familiar with functional programming, and I've tentatively selected * for map and || for fold. So, to find (by brute force) the maximum of cos(x) in the interval 8 < x < 9:
double maximum = ro::range(8, 9, 0.01) * std::cos || std::max;
In above, ro::range can be replaced with any STL container.
I don't want to be different if there is any convention for map/fold operators. My question is: is there a math notation or does any language uses operators for map/fold?
** EDIT **
For those who asked, below is small demo of what RO currently can do. scc is small utility which can evaluate C++ snippets.
// Can print ranges, container, tuples, etc directly (vint is vector<int>) :
scc 'vint V{1,2,3}; V'
{1,2,3}
// Classic pipe. Algorithms are from std::
scc 'vint{3,1,2,3} | sort | unique | reverse'
{3, 2, 1}
// Assign 42 to [2..5)
scc 'vint V=range(0,9); range(V/2, V/5) = 42; V'
{0, 1, 42, 42, 42, 5, 6, 7, 8, 9}
// concatenate vector of strings ('add' is shortcut for std::plus<T>()):
scc 'vstr V{"aaa", "bb", "cccc"}; V || add'
aaabbcccc
// Total length of strings in vector of strings
scc 'vstr V{"aaa", "bb", "cccc"}; V * size || (_1+_2)'
9
// Assign to c-string, then append `"XYZ"` and then remove `"bc"` substring :
scc 'char s[99]; range(s) = "abc"; (range(s) << "XYZ") - "bc"'
aXYZ
// Remove non alpha-num characters and convert to upper case
scc '(range("abc-123, xyz/") | isalnum) * toupper'
ABC123XYZ
// Hide phone number:
scc "str S=\"John Q Public (650)1234567\"; S|isdigit='X'; S"
John Q Public (XXX)XXXXXXX
This is really more a comment than a true answer, but it's too long to fit in a comment.
At least if my memory for the terminology serves correctly, map is essentially std::transform, and fold is std::accumulate. Assuming that's correct, I think trying to write your own would be ill-advised at best.
If you want to use map/fold style semantics, you could do something like this:
std::transform(std::begin(sto), std::end(sto), std::begin(sto),
               [](double x) { return std::cos(x); });
double maximum = *std::max_element(std::begin(sto), std::end(sto));
Although std::accumulate is more like a general-purpose fold, std::max_element is basically a fold(..., max). If you prefer a single operation, you could do something like:
double maximum = std::cos(*std::max_element(std::begin(sto), std::end(sto),
    [](double a, double b) { return std::cos(a) < std::cos(b); }));
I urge you to reconsider overloading operators for this purpose. Either example I've given above should be clear to almost any reasonable C++ programmer. The example you've given will be utterly opaque to most.
On a more general level, I'd urge extreme caution when overloading operators. Operator overloading is great when used correctly -- being able to overload operators for things like arbitrary precision integers, matrices, complex numbers, etc., renders code using those types much more readable and understandable than code without overloaded operators.
Unfortunately, when you use operators in unexpected ways, precisely the opposite is true -- and these uses are certainly extremely unexpected -- in fact, well into the range of "quite surprising". There might be room for debate (but at least a little justification) if these operators were well understood in some specific field, even though that usage ran contrary to other uses in C++. In this case, however, you seem to be inventing a notation "out of whole cloth" -- I'm not aware of anybody using any operator C++ supports overloading to mean either fold or map (nor anything visually similar or analogous in any other way). In short, using overloading this way is a poor and unjustified idea.
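For comparison, std::accumulate alone can express the map and the fold in a single pass: the lambda maps each x to cos(x) and folds the results with std::max. A sketch, with the hypothetical name max_cos; an empty input yields negative infinity:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// fold(max) over map(cos): returns the largest cos(x) for x in values.
inline double max_cos(const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(),
                           -std::numeric_limits<double>::infinity(),
                           [](double acc, double x) { return std::max(acc, std::cos(x)); });
}
```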
Among the languages I know, there is no standard operator for folding. Scala uses the operators /: and :\ as well as method names, Lisp has reduce, and Haskell has foldl.
map, on the other hand, is commonly found simply as map in all the languages I know.
Below is an implementation of fold in quasi-human-readable infix C++ syntax. Note that the code is not very robust and only serves to demonstrate the point. It is made to support the more usual 3-argument fold operators (the range, the binary operation, and the neutral element).
This is easily the funniest way to abuse operator overloading, and one of the best ways to shoot yourself in the foot with a 900-pound artillery shell.
enum fold_t { fold };  // the tag type and its only value
template <typename Op>
struct fold_intermediate_1
{
Op op;
fold_intermediate_1 (Op op) : op(op) {}
};
template <typename Cont, typename Op, bool>
struct fold_intermediate_2
{
const Cont& cont;
Op op;
fold_intermediate_2 (const Cont& cont, Op op) : cont(cont), op(op) {}
};
template <typename Op>
fold_intermediate_1<Op> operator/(fold_t, Op op)
{
return fold_intermediate_1<Op>(op);
}
template <typename Cont, typename Op>
fold_intermediate_2<Cont, Op, true> operator<(const Cont& cont, fold_intermediate_1<Op> f)
{
return fold_intermediate_2<Cont, Op, true>(cont, f.op);
}
template <typename Cont, typename Op, typename Init>
Init operator< (fold_intermediate_2<Cont, Op, true> f, Init init)
{
return foldl_func(f.op, init, std::begin(f.cont), std::end(f.cont));
}
template <typename Cont, typename Op>
fold_intermediate_2<Cont, Op, false> operator>(const Cont& cont, fold_intermediate_1<Op> f)
{
return fold_intermediate_2<Cont, Op, false>(cont, f.op);
}
template <typename Cont, typename Op, typename Init>
Init operator> (fold_intermediate_2<Cont, Op, false> f, Init init)
{
return foldr_func(f.op, init, std::begin(f.cont), std::end(f.cont));
}
foldr_func and foldl_func (the actual algorithms of left and right folds) are defined elsewhere.
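The answer does not show foldl_func and foldr_func. One possible pair of definitions that matches the argument order used by the operators (op, init, first, last) is sketched below; these are assumptions, not the author's code, and the right fold needs bidirectional iterators:

```cpp
// Left fold: op(op(op(init, x0), x1), x2) ...
template <typename Op, typename Init, typename It>
Init foldl_func(Op op, Init init, It first, It last) {
    for (; first != last; ++first) init = op(init, *first);
    return init;
}

// Right fold: op(x0, op(x1, op(x2, init))) ... walks backwards from last.
template <typename Op, typename Init, typename It>
Init foldr_func(Op op, Init init, It first, It last) {
    while (last != first) {
        --last;
        init = op(*last, init);
    }
    return init;
}
```

With subtraction as the operation, the two folds differ: over {1, 2, 3} with zero 0, the left fold gives ((0-1)-2)-3 = -6 and the right fold gives 1-(2-(3-0)) = 2.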
Use it like this:
foo myfunc(foo, foo);
container<foo> cont;
foo zero, acc;
acc = cont >fold/myfunc> zero; // right fold
acc = cont <fold/myfunc< zero; // left fold
The word fold is used as a kind of poor man's new reserved word here. One can define several variations of this syntax, including
<<fold/myfunc<< >>fold/myfunc>>
<foldl/myfunc> <foldr/myfunc>
|fold<myfunc| |fold>myfunc|
The inner operator must have precedence equal to or higher than that of the outer one(s). This is a limitation of the C++ grammar.
For map, only one intermediate is needed and the syntax could be e.g.
mapped = cont |map| myfunc;
Implementing it is a simple exercise.
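One way to do that exercise, as a sketch: the hypothetical namespace infix holds a tag value map, `cont | map` stores a reference to the container, and applying `| f` to that intermediate transforms every element into a new std::vector. Argument-dependent lookup finds both operators, so no using-directive is needed:

```cpp
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <vector>

namespace infix {  // hypothetical namespace for this sketch

// "Poor man's keyword": a tag value that only exists to select the overloads below
enum map_t { map };

// Intermediate produced by `cont | map`; it just remembers the container
template <typename Cont>
struct map_intermediate {
    const Cont& cont;
};

template <typename Cont>
map_intermediate<Cont> operator|(const Cont& cont, map_t) {
    return {cont};
}

// `(cont | map) | f` applies f to every element and collects the results
template <typename Cont, typename F>
auto operator|(map_intermediate<Cont> m, F f) {
    using R = std::decay_t<decltype(f(*std::begin(m.cont)))>;
    std::vector<R> out;
    std::transform(std::begin(m.cont), std::end(m.cont), std::back_inserter(out), f);
    return out;
}

}  // namespace infix
```

Usage: `auto squares = v |infix::map| [](int x) { return x * x; };`. Because | is left-associative, this parses as `(v | map) | f`, which is why a single intermediate type is enough.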
Oh, and please don't use this syntax in production, unless you know very well what you are doing, and probably even if you do ;)