A good sorting algo for this particular usecase - c++

struct Object {
int16_t order = 0;
};
I have a std::list of Object instances that I want to sort by the 'order' member variable.
Smaller order values are placed earlier in the list.
While looping, if the current order value is the same as an existing one, I think it should be placed before that existing one, so that I don't have to keep looking at the rest of the elements in the list.
The list can have a maximum of 1024 items.
I'm looking for an algo that will let me sort the list in the fewest iterations, or something close to that. The naive approach I have now results in a triangular number of iterations, which for 1024 is:
(1024 * (1024 + 1)) / 2 = 524,800

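Use the member function std::list::sort with an appropriate comparator. It is a stable sort that performs roughly N log N comparisons, so on the order of 10,000 for 1024 elements:
#include <cstdint>
#include <list>
// Object is the struct from the question.
int main() {
    std::list<Object> objects{
        Object{4}, Object{2}, Object{6}, Object{7}, Object{42}
    };
    // Sort in ascending order of the 'order' member.
    objects.sort([](const auto& lhs, const auto& rhs) {
        return lhs.order < rhs.order;
    });
}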
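If the list is built up one element at a time, a possible alternative is to keep it sorted on every insertion, which matches the "place it before an existing equal value" idea from the question. The following is only a minimal sketch; `insert_sorted` is a hypothetical helper name, not a standard function.

```cpp
#include <algorithm>
#include <cstdint>
#include <list>

struct Object {
    std::int16_t order = 0;
};

// Hypothetical helper: inserts obj in front of the first element whose
// order is not smaller, so the list stays sorted after every insertion
// and the scan stops as soon as an equal or larger value is found.
inline void insert_sorted(std::list<Object>& objects, const Object& obj) {
    auto pos = std::find_if(objects.begin(), objects.end(),
                            [&](const Object& o) { return o.order >= obj.order; });
    objects.insert(pos, obj);
}
```

Each insertion is still a linear scan, so building the whole list this way remains quadratic in the worst case. When all elements are available up front, a single call to std::list::sort is likely the better choice.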

map comparator for pair of objects in c++

I want to use a map to count pairs of objects based on member input vectors. If there is a better data structure for this purpose, please tell me.
My program returns a list of int vectors. Each int vector is the output of a comparison between two int vectors ( a pair of int vectors). It is, however, possible, that the output of the comparison differs, though the two int vectors are the same (maybe in different order). I want to store how many different outputs (int vectors) each pair of int vectors has produced.
Assuming that I can access the int vector of my object with .inp()
Two pairs (a1,b1) and (a2,b2) should be considered equal, when (a1.inp() == a2.inp() && b2.inp() == b1.inp()) or (a1.inp() == b2.inp() and b1.inp() == a2.inp()).
This answer says:
The keys in a map a and b are equivalent by definition when neither a
< b nor b < a is true.
class SomeClass
{
vector <int> m_inputs;
public:
//constructor, setter...
vector<int> inp() const { return m_inputs; }
};
typedef pair < SomeClass, SomeClass > InputsPair;
typedef map < InputsPair, size_t, MyPairComparator > InputsPairCounter;
So the question is, how can I define equivalency of two pairs with a map comparator. I tried to concatenate the two vectors of a pair, but that leads to (010,1) == (01,01), which is not what I want.
struct MyPairComparator
{
bool operator() (const InputsPair & pair1, const InputsPair & pair2) const
{
vector<int> itrc1 = pair1.first.inp();
vector<int> itrc2 = pair1.second.inp();
vector<int> itrc3 = pair2.first.inp();
vector<int> itrc4 = pair2.second.inp();
// ?
return itrc1 < itrc3;
}
};
I want to use a map to count pairs of input vectors. If there is a better data structure for this purpose, please tell me.
Consider std::unordered_map instead, for two reasons:
if the hash is implemented properly, it could be faster than std::map
you only need to implement a hash and operator== instead of operator<, and operator== is trivial in this case
Details on how to implement a hash for std::vector can be found here. In your case, a possible solution is to join both vectors into one, sort it, and then use that method to calculate the hash. This is a straightforward solution, but it can produce too many hash collisions and lead to worse performance. Suggesting a better alternative would require knowledge of the data used.
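One way the std::unordered_map idea could look is sketched below. It does not join and sort the vectors; instead it hashes each vector separately and adds the two hashes, which is a swapped-in technique that keeps the hash independent of the pair's order. The sketch assumes the pair holds the int vectors directly rather than SomeClass objects, and the names `hash_vector`, `UnorderedPairHash` and `UnorderedPairEqual` are made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

// Hash of a single vector, using a hash_combine-style loop (an assumption;
// any reasonable vector hash would do here).
inline std::size_t hash_vector(const Inputs& v) {
    std::size_t seed = v.size();
    for (int x : v)
        seed ^= std::hash<int>{}(x) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    return seed;
}

struct UnorderedPairHash {
    std::size_t operator()(const InputsPair& p) const {
        // Addition is commutative, so (a, b) and (b, a) hash identically.
        return hash_vector(p.first) + hash_vector(p.second);
    }
};

struct UnorderedPairEqual {
    bool operator()(const InputsPair& l, const InputsPair& r) const {
        // Equal if the pairs match either in the same or in swapped order.
        return (l.first == r.first && l.second == r.second) ||
               (l.first == r.second && l.second == r.first);
    }
};

using InputsPairCounter =
    std::unordered_map<InputsPair, std::size_t, UnorderedPairHash, UnorderedPairEqual>;
```

With this, `++counter[pair]` counts (a, b) and (b, a) under the same key, while (010, 1) and (01, 01) stay distinct because the equality check compares the vectors themselves.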
As I understand, you want:
struct MyPairComparator
{
bool operator() (const InputsPair& lhs, const InputsPair& rhs) const
{
return std::minmax(std::get<0>(lhs), std::get<1>(lhs))
< std::minmax(std::get<0>(rhs), std::get<1>(rhs));
}
};
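We order each pair {a, b} so that a <= b, then use the regular pair comparison. Note that std::minmax needs operator< for the pair's element type, so SomeClass would have to provide one (for example by comparing inp()).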
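As a rough, self-contained illustration of the std::minmax comparator, the sketch below assumes the pair stores the int vectors directly, since std::vector already has operator<. `UnorderedPairLess` and `OrderedPairCounter` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

struct UnorderedPairLess {
    bool operator()(const InputsPair& lhs, const InputsPair& rhs) const {
        // std::minmax returns (smaller, larger) as a pair of references;
        // both results are consumed within this single expression.
        return std::minmax(lhs.first, lhs.second) <
               std::minmax(rhs.first, rhs.second);
    }
};

using OrderedPairCounter = std::map<InputsPair, std::size_t, UnorderedPairLess>;
```

Because both pairs are normalized before comparing, neither (a, b) < (b, a) nor (b, a) < (a, b) holds, which is exactly the map's definition of equivalent keys.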

Algorithm for hash/crc of unordered multiset

Let's say I would like to create an unordered set of unordered multisets of unsigned int. For this, I need to create a hash function to calculate a hash of the unordered multiset. In fact, it has to be good for CRC as well.
One obvious solution is to put the items in vector, sort them and return a hash of the result. This seems to work, but it is expensive.
Another approach is to xor the values, but obviously if I have one item twice or none the result will be the same - which is not good.
Any ideas how I can implement this more cheaply? I have an application that will be doing this for thousands of sets, and relatively big ones.
Since it is a multiset, you would like for the hash value to be the same for identical multisets, whose representation might have the same elements presented, added, or deleted in a different order. You would then like for the hash value to be commutative, easy to update, and change for each change in elements. You would also like for two changes to not readily cancel their effect on the hash.
One operation that meets all but the last criterion is addition. Just sum the elements. To keep the sum bounded, do the sum modulo the size of your hash value (e.g. modulo 2^64 for a 64-bit hash). To make sure that inserting or deleting zero values changes the hash, add one to each value first.
A drawback of the sum is that two changes can readily cancel, e.g. replacing 1 3 with 2 2. To address that, you can use the same approach and sum a polynomial of the entries, still retaining commutativity. E.g. instead of summing x + 1, you can sum x^2 + x + 1. Now it is more difficult to contrive sets of changes with the same sum.
Here's a reasonable hash function for std::unordered_multiset<int>. It would be better if the computations were taken mod a large prime, but the idea stands.
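The polynomial sum described above could be sketched as follows. This is only an illustration: `multiset_hash` is a made-up name, and the elements are passed as a plain vector for simplicity.

```cpp
#include <cstdint>
#include <vector>

// Order-independent hash: sums x^2 + x + 1 over all elements.
// Unsigned 64-bit arithmetic wraps around, i.e. it works modulo 2^64.
inline std::uint64_t multiset_hash(const std::vector<unsigned>& items) {
    std::uint64_t h = 0;
    for (std::uint64_t x : items)
        h += x * x + x + 1;
    return h;
}
```

For the example from the text, {1, 3} sums to 3 + 13 = 16 while {2, 2} sums to 7 + 7 = 14, so the two no longer collide as they would with a plain sum.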
#include <iostream>
#include <unordered_set>
namespace std {
template<>
struct hash<unordered_multiset<int>> {
typedef unordered_multiset<int> argument_type;
typedef std::size_t result_type;
const result_type BASE = static_cast<result_type>(0xA67);
result_type log_pow(result_type ex) const {
result_type res = 1;
result_type base = BASE;
while (ex > 0) {
if (ex % 2) {
res = res * base;
}
base *= base;
ex /= 2;
}
return res;
}
result_type operator()(argument_type const & val) const {
result_type h = 0;
for (const int& el : val) {
h += log_pow(el);
}
return h;
}
};
};
int main() {
std::unordered_set<std::unordered_multiset<int>> mySet;
std::unordered_multiset<int> set1{1,2,3,4};
std::unordered_multiset<int> set2{1,1,2,2,3,3,4,4};
std::cout << "Hash 1: " << std::hash<std::unordered_multiset<int>>()(set1)
<< std::endl;
std::cout << "Hash 2: " << std::hash<std::unordered_multiset<int>>()(set2)
<< std::endl;
return 0;
}
Output:
Hash 1: 2290886192
Hash 2: 286805088
When the modulus is a prime p, the number of collisions is proportional to 1/p. I'm not sure what the analysis is for powers of two. You can make updates to the hash efficient by adding/subtracting BASE^x when you insert/remove the integer x.
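The incremental update mentioned above might look roughly like this. `IncrementalMultisetHash` is a hypothetical class that only shows the hash bookkeeping, not the storage of the elements themselves, and it works modulo 2^64 rather than modulo a prime.

```cpp
#include <cstdint>

// Hypothetical wrapper that maintains the hash as elements come and go.
class IncrementalMultisetHash {
public:
    void insert(std::uint64_t x) { hash_ += pow_base(x); }
    // The caller must make sure x is actually present in the multiset.
    void erase(std::uint64_t x) { hash_ -= pow_base(x); }
    std::uint64_t value() const { return hash_; }

private:
    // BASE^ex by repeated squaring; unsigned overflow wraps modulo 2^64.
    static std::uint64_t pow_base(std::uint64_t ex) {
        std::uint64_t res = 1;
        std::uint64_t base = 0xA67;  // same BASE as in the answer above
        while (ex > 0) {
            if (ex & 1) res *= base;
            base *= base;
            ex >>= 1;
        }
        return res;
    }

    std::uint64_t hash_ = 0;
};
```

Inserting and then erasing the same value leaves the hash unchanged, and the insertion order has no effect on the result.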
Implement the inner multiset as a value->count hash map.
This will allow you to avoid the problem that an even number of elements cancels out via xor in the following way: Instead of xor-ing each element, you construct a new number from the count and the value (e.g. multiplying them), and then you can build the full hash using xor.
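A minimal sketch of the value->count idea follows. Instead of multiplying value and count, it packs them into one 64-bit number, which is a swapped-in choice so that a value of 0 or pairs like (1, 2) and (2, 1) do not collapse to the same term. The names `CountedMultiset` and `xor_hash` and the mixing constant are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using CountedMultiset = std::unordered_map<unsigned, std::size_t>;  // value -> count

inline std::uint64_t xor_hash(const CountedMultiset& ms) {
    std::uint64_t h = 0;
    for (const auto& entry : ms) {
        // Value in the upper 32 bits, count in the lower bits
        // (assumes 32-bit values and counts below 2^32).
        std::uint64_t packed = (std::uint64_t{entry.first} << 32) ^
                               static_cast<std::uint64_t>(entry.second);
        // Multiply by an arbitrary odd constant to spread the bits,
        // then xor, which is independent of iteration order.
        h ^= packed * 0x9E3779B97F4A7C15ull;
    }
    return h;
}
```

Since each distinct value contributes exactly one term, a value that occurs twice no longer cancels itself out under xor.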

Efficient way to search a vector of structs for a specific ID?

I have a vector of structs Data which has an integer data member ID. I need to search if it contains an instance of a specific ID. I had to do it this way:
int DataSize = 0;
for(unsigned count = 0; count < Data.size(); count++)
{
if(ID == Data[count].ID)
DataSize++;
}
Where ID is previously defined. Any more efficient way to search a vector of objects ? Especially when it is a part of an embedded application.
Use std::count_if.
std::count_if(Data.begin(), Data.end(), [&ID](const DataType& data){ return ID == data.ID; });
where DataType is the type of elements contained in Data.
Note that there are no real efficiency gains to be had unless Data satisfied some more conditions, for example, being sorted by ID. However, using a standard algorithm improves readability.
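To illustrate the "sorted by ID" condition: if the vector is kept sorted, the count can be found with a binary search instead of a full scan. This is a sketch under that assumption; `Record`, `ByID` and `count_id_sorted` are hypothetical names standing in for the question's types.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Record {  // stand-in for the question's element type
    int ID;
};

// Comparator that can compare a Record with a bare ID in either order,
// as std::equal_range requires.
struct ByID {
    bool operator()(const Record& r, int id) const { return r.ID < id; }
    bool operator()(int id, const Record& r) const { return id < r.ID; }
};

// Precondition: data is sorted by ID in ascending order.
inline std::size_t count_id_sorted(const std::vector<Record>& data, int id) {
    auto range = std::equal_range(data.begin(), data.end(), id, ByID{});
    return static_cast<std::size_t>(range.second - range.first);
}
```

This takes a logarithmic number of comparisons, but it only pays off if keeping the vector sorted is cheap for the application.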
With C++11 lambdas (std::cbegin and std::cend need C++14), this could be written a little more expressively.
If you want to count the structs with a given ID:
std::count_if(std::cbegin(dataArray), std::cend(dataArray), [ID](const Data& data) {
return data.ID == ID;
});
If you want to know whether there is at least one:
bool found_ID = std::cend(dataArray) != std::find_if(std::cbegin(dataArray), std::cend(dataArray), [ID](const Data& data) {
return data.ID == ID;
});
It is always good to have the other standard library algorithms at hand. Some are used only rarely, but they can save a lot of debugging (of edge cases) and avoid the performance problems of hand-written implementations.
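As one example of such an algorithm, the existence check can also be expressed with std::any_of, which stops at the first match. The sketch assumes a struct `Data` with an int member `ID`, and `contains_id` is a made-up helper name.

```cpp
#include <algorithm>
#include <vector>

struct Data {
    int ID;
};

// Returns true as soon as one element with the given ID is found.
inline bool contains_id(const std::vector<Data>& dataArray, int ID) {
    return std::any_of(dataArray.begin(), dataArray.end(),
                       [ID](const Data& data) { return data.ID == ID; });
}
```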

dot product of vector < vector < int > > over the first dimension

I have
vector < vector < int > > data_mat ( 3, vector < int > (4) );
vector < int > data_vec ( 3 );
where data_mat can be thought of as a matrix and data_vec as a column vector, and I'm looking for a way to compute the inner product of every column of data_mat with data_vec, and store it in another vector < int > data_out (4).
The example http://liveworkspace.org/code/2bW3X5%241 using for_each and transform, can be used to compute column sums of a matrix:
sum=vector<int> (data_mat[0].size());
for_each(data_mat.begin(), data_mat.end(),
[&](const std::vector<int>& c) {
std::transform(c.begin(), c.end(), sum.begin(), sum.begin(),
[](int d1, double d2)
{ return d1 + d2; }
);
}
);
Is it possible, in a similar way (or in a slightly different way that uses STL functions), to compute column dot products of matrix columns with a vector?
The problem is that the 'd2 = d1 + d2' trick does not work here in the column inner product case -- if there is a way to include a d3 as well that would solve it ( d3 = d3 + d1 * d2 ) but ternary functions do not seem to exist in transform.
In fact you can use your existing column sum approach nearly one to one. You don't need a ternary std::transform as inner loop because the factor you scale the matrix rows with before summing them up is constant for each row, since it is the row value from the column vector and that iterates together with the matrix rows and thus the outer std::for_each.
So what we need to do is iterate over the rows of the matrix and multiply each complete row by the corresponding value in the column vector and add that scaled row to the sum vector. But unfortunately for this we would need a std::for_each function that simultaneously iterates over two ranges, the rows of the matrix and the rows of the column vector. To achieve this, we could use the usual unary std::for_each and just do the iteration over the column vector manually, using an additional iterator:
std::vector<int> sum(data_mat[0].size());
auto vec_iter = data_vec.begin();
std::for_each(data_mat.begin(), data_mat.end(),
[&](const std::vector<int>& row) {
int vec_value = *vec_iter++; //manually advance vector row
std::transform(row.begin(), row.end(), sum.begin(), sum.begin(),
[=](int a, int b) { return a*vec_value + b; });
});
The additional manual iteration inside the std::for_each isn't really that idiomatic use of the standard library algorithms, but unfortunately there is no binary std::for_each we could use.
Another option would be to use std::transform as outer loop (which can iterate over two ranges), but we don't really compute a single value in each outer iteration to return, so we would have to just return some dummy value from the outer lambda and throw it away by using some kind of dummy output iterator. That wouldn't be the cleanest solution either:
//output iterator that just discards any output
struct discard_iterator : std::iterator<std::output_iterator_tag,
void, void, void, void>
{
discard_iterator& operator*() { return *this; }
discard_iterator& operator++() { return *this; }
discard_iterator& operator++(int) { return *this; }
template<typename T> discard_iterator& operator=(T&&) { return *this; }
};
//iterate over rows of matrix and vector, misusing transform as binary for_each
std::vector<int> sum(data_mat[0].size());
std::transform(data_mat.begin(), data_mat.end(),
data_vec.begin(), discard_iterator(),
[&](const std::vector<int>& row, int vec_value) {
return std::transform(row.begin(), row.end(),
sum.begin(), sum.begin(),
[=](int a, int b) {
return a*vec_value + b;
});
});
EDIT: Although this has already been discussed in comments and I understand (and appreciate) the theoretical nature of the question, I will still include the suggestion that in practice a dynamic array of dynamic arrays is an awful way to represent a structurally well-defined 2D array like a matrix. A proper matrix data structure (which stores its contents contiguously) with the appropriate operators is nearly always a better choice. Nevertheless, thanks to their genericity, you can still use the standard library algorithms to work with such a custom data structure (maybe even by letting the matrix type provide its own iterators).
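To give an idea of what contiguous storage could look like, here is a very small sketch. `Matrix` and `column_dots` are hypothetical names, the layout is assumed to be row-major, and plain loops are used instead of standard algorithms to keep it short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix with contiguous storage.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> values;  // rows * cols entries, element (r, c) at r * cols + c
};

// out[c] = sum over r of m(r, c) * vec[r]
// Precondition: vec.size() == m.rows.
inline std::vector<int> column_dots(const Matrix& m, const std::vector<int>& vec) {
    std::vector<int> out(m.cols, 0);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            out[c] += m.values[r * m.cols + c] * vec[r];
    return out;
}
```

The loop order walks the storage row by row, so memory is accessed sequentially, just as the scaled-row approach above does.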

Understanding std::accumulate

I want to know why the 3rd parameter of std::accumulate (aka reduce) is needed. For those who do not know what accumulate is, it's used like so:
vector<int> V{1,2,3};
int sum = accumulate(V.begin(), V.end(), 0);
// sum == 6
Call to accumulate is equivalent to:
sum = 0; // 0 - value of 3rd param
for (auto x : V) sum += x;
There is also an optional 4th parameter, which allows replacing addition with any other operation.
The rationale I've heard is that if you need, say, to multiply the elements of a vector rather than add them up, you need a different (non-zero) initial value:
vector<int> V{1,2,3};
int product = accumulate(V.begin(), V.end(), 1, multiplies<int>());
But why not do like Python - set initial value for V.begin(), and use range starting from V.begin()+1. Something like this:
int sum = accumulate(V.begin()+1, V.end(), V.begin());
This will work for any op. Why is 3rd parameter needed at all?
You're making a mistaken assumption: that type T is of the same type as the InputIterator.
But std::accumulate is generic, and allows all different kinds of creative accumulations and reductions.
Example #1: Accumulate salary across Employees
Here's a simple example: an Employee class, with many data fields.
class Employee {
/** All kinds of data: name, ID number, phone, email address... */
public:
int monthlyPay() const;
};
You can't meaningfully "accumulate" a set of employees. That makes no sense; it's undefined. But, you can define an accumulation regarding the employees. Let's say we want to sum up all the monthly pay of all employees. std::accumulate can do that:
/** Simple class defining how to add a single Employee's
* monthly pay to our existing tally */
auto accumulate_func = [](int accumulator, const Employee& emp) {
return accumulator + emp.monthlyPay();
};
// And here's how you call the actual calculation:
int TotalMonthlyPayrollCost(const vector<Employee>& V)
{
return std::accumulate(V.begin(), V.end(), 0, accumulate_func);
}
So in this example, we're accumulating an int value over a collection of Employee objects. Here, the accumulation sum isn't the same type of variable that we're actually summing over.
Example #2: Accumulating an average
You can use accumulate for more complex types of accumulations as well: maybe you want to append values to a vector; maybe you have some arcane statistic you're tracking across the input; etc. What you accumulate doesn't have to be just a number; it can be something more complex.
For example, here's a simple example of using accumulate to calculate the average of a vector of ints:
// This time our accumulator isn't an int -- it's a structure that lets us
// accumulate an average.
struct average_accumulate_t
{
int sum;
size_t n;
double GetAverage() const { return ((double)sum)/n; }
};
// Here's HOW we add a value to the average:
auto func_accumulate_average =
[](average_accumulate_t accAverage, int value) {
return average_accumulate_t(
{accAverage.sum+value, // value is added to the total sum
accAverage.n+1}); // increment number of values seen
};
double CalculateAverage(const vector<int>& V)
{
average_accumulate_t res =
std::accumulate(V.begin(), V.end(), average_accumulate_t({0,0}), func_accumulate_average);
return res.GetAverage();
}
Example #3: Accumulate a running average
Another reason you need the initial value is because that value isn't always the default/neutral value for the calculation you're making.
Let's build on the average example we've already seen. But now, we want a class that can hold a running average -- that is, we can keep feeding in new values, and check the average so far, across multiple calls.
class RunningAverage
{
average_accumulate_t _avg;
public:
RunningAverage():_avg({0,0}){} // initialize to empty average
double AverageSoFar() const { return _avg.GetAverage(); }
void AddValues(const vector<int>& v)
{
_avg = std::accumulate(v.begin(), v.end(),
_avg, // NOT the default initial {0,0}!
func_accumulate_average);
}
};
int main()
{
RunningAverage r;
r.AddValues(vector<int>({1,1,1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 1.0
r.AddValues(vector<int>({-1,-1,-1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 0.0
}
This is a case where we absolutely rely on being able to set that initial value for std::accumulate - we need to be able to initialize the accumulation from different starting points.
In summary, std::accumulate is good for any time you're iterating over an input range, and building up one single result across that range. But the result doesn't need to be the same type as the range, and you can't make any assumptions about what initial value to use -- which is why you must have an initial instance to use as the accumulating result.
The way things are, it is annoying for code that knows for sure a range isn't empty and that wants to start accumulating from the first element of the range on. Depending on the operation that is used to accumulate with, it's not always obvious what the 'zero' value to use is.
If on the other hand you only provide a version that requires non-empty ranges, it's annoying for callers that don't know for sure that their ranges aren't empty. An additional burden is put on them.
One perspective is that the best of both worlds is of course to provide both kinds of functionality. As an example, Haskell provides both foldl1 and foldr1 (which require non-empty lists) alongside foldl and foldr (which mirror std::accumulate).
Another perspective is that since the one can be implemented in terms of the other with a trivial transformation (as you've demonstrated: std::accumulate(std::next(b), e, *b, f) -- std::next is C++11 but the point still stands), it is preferable to make the interface as minimal as it can be with no real loss of expressive power.
Because standard library algorithms are supposed to work for arbitrary ranges of (compatible) iterators. So the first argument to accumulate doesn't have to be begin(), it could be any iterator between begin() and one before end(). It could also be using reverse iterators.
The whole idea is to decouple algorithms from data. Your suggestion, if I understand it correctly, requires a certain structure in the data.
If you wanted accumulate(V.begin()+1, V.end(), V.begin()) you could just write that. But what if you thought v.begin() might be v.end() (i.e. v is empty)? What if v.begin() + 1 is not implemented (because v only implements ++, not generalized addition)? What if the type of the accumulator is not the type of the elements? E.g.
std::accumulate(v.begin(), v.end(), 0, [](long count, char c){
return isalpha(c) ? count + 1 : count;
});
It's indeed not needed. Our codebase has 2 and 3-argument overloads which use a T{} value.
However, std::accumulate is pretty old; it comes from the original STL. Our codebase has fancy std::enable_if logic to distinguish between "2 iterators and initial value" and "2 iterators and reduction operator". That requires C++11. Our code also uses a trailing return type (auto accumulate(...) -> ...) to calculate the return type, another C++11 feature.