Modifiable set of multisets: two-level sorting container - c++

I have the following three-level data structure (bottom to top):
object C: {string, T, float} (where T is also an object)
sorted container B of objects C with same string by highest float first
sorted container A of objects B by lowest max(C.float) (i.e. B[0]) first
So at the beginning I'll have a bunch of C with pre-computed float values and my data structure should look like this:
A:
B:
C:
string: "one"
T: {object}
float: 10
C:
string: "one"
T: {object} # different from the above of course
float: 8.3
C:
string: "one"
T: {object}
float: -4
B:
C:
string: "two"
T: {object}
float: 15
C:
string: "two"
T: {object}
float: 2
C:
string: "two"
T: {object}
float: 0
No difficult problem up to now, as I could simply put all of this into a set of sets (or multisets) and be done with it. Here is where it gets difficult: I will have to extract a subset of these (the first C of every B) to compute a solution to my problem. If there is no solution, I remove the topmost C and extract the new subset to try again. In Python pseudo-code:
def get_list():
    c_list = []
    for b in A:
        c_list.append(b[0])  # element in B with highest float value
    return c_list
def solve():
    for i in range(3):  # three tries
        c_list = get_list()
        # do stuff with c_list
        if fail:
            del A[0][0]  # the topmost C element in the first B
            continue
But when I delete this A[0][0] (i.e. the C with {"one", T, 10}), I need the whole thing to re-sort itself. Therefore I can't use sets, as I'd be modifying A[0], which an STL set/multiset doesn't allow.
The other solution would be to create classes, define operator() or operator< for each of the two levels I need to compare on (B and C) for std::sort(), and stuff everything into a two-level STL vector. However, this seems overly complicated, and my non-professional C++ intuition tells me there should be an easier way to code this neatly.
Performance is important as it's a "real-time" application for a robot, but not of topmost priority as I'll only have say up to 30 C.

It seems that you are indexing by position, rather than value. The simplest data structure to use when you don't need efficient lookup by key is std::vector. I believe that would address your problems.
The std::set container is an ordered set of values.

Following David's suggestion to use std::vector nevertheless and this post, the simplest solution seems to be using the std::sort with a lambda expression. That way one doesn't need to define any additional classes (like B in this case) or any operator overloads.
struct C { // struct, so the members are public and usable in the lambdas below
    string type;
    T obj;
    float sat;
};
And then:
vector<vector<C>> c_set = getData();
// Sort Bs (highest sat first)
auto b_sort = [] (const C& lhs, const C& rhs) { return lhs.sat > rhs.sat; };
for (auto&& b : c_set) {
    sort(b.begin(), b.end(), b_sort);
}
// Sort A (lowest maximum sat first; assumes no B is empty)
sort(c_set.begin(), c_set.end(),
     [] (const vector<C>& lhs, const vector<C>& rhs) { return lhs[0].sat < rhs[0].sat; });
The main caveat in this solution is that the container is not inherently sorted when a C is removed. So the sorting above has to be called manually.
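One way to keep that manual re-sorting in one place is to wrap removal and sorting in small helpers. The following is only a sketch under assumptions: C is simplified (the T member is left out), and the names sort_all and remove_top are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for C (the T member is omitted for brevity).
struct C {
    std::string type;
    float sat;
};

using B = std::vector<C>;  // one group of C sharing the same string
using A = std::vector<B>;  // all groups

// Sort every B by highest sat first, then A by lowest front().sat first.
// Empty groups are dropped first so that front() is always valid.
void sort_all(A& a) {
    a.erase(std::remove_if(a.begin(), a.end(),
                           [](const B& b) { return b.empty(); }),
            a.end());
    for (auto& b : a) {
        std::sort(b.begin(), b.end(),
                  [](const C& lhs, const C& rhs) { return lhs.sat > rhs.sat; });
    }
    std::sort(a.begin(), a.end(), [](const B& lhs, const B& rhs) {
        return lhs.front().sat < rhs.front().sat;
    });
}

// Remove the topmost C of the given group and restore the ordering.
void remove_top(A& a, std::size_t group) {
    if (group >= a.size() || a[group].empty()) return;
    a[group].erase(a[group].begin());
    sort_all(a);
}
```

With roughly 30 elements, re-sorting everything after each removal should be cheap. Calling remove_top(a, 0) corresponds to `del A[0][0]` in the pseudo-code of the question.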

Related

Counting how many decision variables are equal

I'm a beginner user of Google OR-Tools, especially the CP-SAT. I'm using version 9.3, and I'm interested in the C++ version.
I'm modeling a problem where I need to count how many pairs of decision variables have the same (assigned) value. So, let's suppose I have a set of integer variables like this:
std::vector<IntVar> my_vars;
I also have a set of pairs like this:
std::vector<std::pair<size_t, size_t>> my_pairs;
Assume that all bounds, sizes, etc. are valid. Now, I want to compute how many of these pairs have the same value. Using IBM Ilog Concert, I can do it in a very straightforward way:
// Using Ilog Concert technology.
IloIntVar count(env, 0, MY_UPPER_BOUND);
IloIntExpr expr_count(env);
for(const auto& [u, v] : my_pairs) {
expr_count += (my_vars[u] == my_vars[v]);
}
model.add(count == expr_count);
Here, count is a decision variable that holds how many pairs have the same value in a given solution. The expression is a sum of boolean values comparing the values of the decision variables, not the variable objects themselves (i.e., not whether the object representing variable u is the same object as the one representing variable v).
In OR-Tools, the equality operator == compares whether the variable objects (or their representations) are equal, not the decision variable values. So, the following fails by generating an empty expression:
// Using Google Or-Tools CP-SAT.
IntVar count = cp_model
.NewIntVar(Domain(0, my_pairs.size()))
.WithName("count");
LinearExpr expr_count;
for(const auto& [u, v] : my_pairs) {
expr_count += (my_vars[u] == my_vars[v]);
}
cp_model.AddEquality(count, expr_count);
Note that, according to Google OR-Tools code (here), we have that:
class IntVar {
//...
bool operator==(const IntVar& other) const {
return other.builder_ == builder_ && other.index_ == index_;
}
//...
};
i.e., it checks whether the variables are the same object, not whether the values assigned to them are equal. Therefore, we cannot compare decision variables directly in CP-SAT, and we need to resort to another method.
Obviously, I can change the model using some big-M notation and linearize such expressions. However, can I count without resorting to "remodeling"? I.e., is there a construct I can use "more or less" easily to address such cases?
I must mention while I only depict one case here, I have quite a few counting variables of several sets like that. So, remodeling using big-M will be a big headache. I would prefer a simpler and straightforward approach like Ilog Concert.
(Update) Little extension
Now, I want to do the same, but comparing decision variables with scalars. For example:
std::vector<int> my_scalars;
for(size_t i = 0; i < my_scalars.size(); ++i) {
expr_count += (my_vars[i] == my_scalars[i]);
}
While this can be done using Ilog, it does not even compile with OR-Tools.
Thanks,
Carlos
Here is a tentative solution:
IntVar count = model.NewIntVar(Domain(0, MY_UPPER_BOUND));
LinearExpr expr_count;
for(const auto& [u, v] : my_pairs) {
BoolVar is_equal = model.NewBoolVar();
model.AddEquality(my_vars[u], my_vars[v]).OnlyEnforceIf(is_equal);
model.AddNotEqual(my_vars[u], my_vars[v]).OnlyEnforceIf(is_equal.Not());
expr_count += is_equal;
}
model.AddEquality(expr_count, count);
With the help of @sascha and @Laurent, my solution is this one:
vector<BoolVar> is_equal;
is_equal.reserve(my_pairs.size());
for(const auto& [u, v] : my_pairs) {
is_equal.push_back(cp_model.NewBoolVar());
cp_model
.AddEquality(my_vars[u], my_vars[v])
.OnlyEnforceIf(is_equal.back());
cp_model
.AddNotEqual(my_vars[u], my_vars[v])
.OnlyEnforceIf(Not(is_equal.back()));
}
cp_model.AddEquality(LinearExpr::Sum(is_equal), count);
In the end it is the same as @Laurent's, but I save the boolean vars for later use.
For scalars, it looks like I don't need to make a constant, just compare directly with the expression.
Thanks, @Laurent and @sascha. You guys were very helpful.

map comparator for pair of objects in c++

I want to use a map to count pairs of objects based on member input vectors. If there is a better data structure for this purpose, please tell me.
My program returns a list of int vectors. Each int vector is the output of a comparison between two int vectors ( a pair of int vectors). It is, however, possible, that the output of the comparison differs, though the two int vectors are the same (maybe in different order). I want to store how many different outputs (int vectors) each pair of int vectors has produced.
Assuming that I can access the int vector of my object with .inp()
Two pairs (a1,b1) and (a2,b2) should be considered equal, when (a1.inp() == a2.inp() && b2.inp() == b1.inp()) or (a1.inp() == b2.inp() and b1.inp() == a2.inp()).
This answer says:
The keys in a map a and b are equivalent by definition when neither a
< b nor b < a is true.
class SomeClass
{
vector <int> m_inputs;
public:
//constructor, setter...
vector<int> inp() const { return m_inputs; }
};
typedef pair < SomeClass, SomeClass > InputsPair;
typedef map < InputsPair, size_t, MyPairComparator > InputsPairCounter;
So the question is, how can I define equivalency of two pairs with a map comparator. I tried to concatenate the two vectors of a pair, but that leads to (010,1) == (01,01), which is not what I want.
struct MyPairComparator
{
bool operator() (const InputsPair & pair1, const InputsPair & pair2) const
{
vector<int> itrc1 = pair1.first.inp();
vector<int> itrc2 = pair1.second.inp();
vector<int> itrc3 = pair2.first.inp();
vector<int> itrc4 = pair2.second.inp();
// ?
return itrc1 < itrc3;
}
};
I want to use a map to count pairs of input vectors. If there is a better data structure for this purpose, please tell me.
Consider using std::unordered_map instead, for two reasons:
if hash implemented properly it could be faster than std::map
you only need to implement hash and operator== instead of operator<, and operator== is trivial in this case
Details on how to implement a hash for std::vector can be found here. In your case, a possible solution would be to join both vectors into one, sort it, and then use that method to calculate the hash. This is a straightforward solution, but it can produce too many hash collisions and lead to worse performance. Suggesting a better alternative would require knowledge of the data used.
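As a rough illustration of the unordered_map route, here is a sketch that does not join and sort the vectors. Instead it hashes each vector separately and combines the two hashes with a commutative operation (addition), so swapping the pair members gives the same hash. It assumes the pair holds plain std::vector<int> values rather than SomeClass objects; hash_vector, UnorderedPairHash and UnorderedPairEqual are made-up names.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

// Hash one vector by folding its elements together (boost-style combine).
std::size_t hash_vector(const Inputs& v) {
    std::size_t seed = v.size();
    for (int x : v) {
        seed ^= std::hash<int>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// Symmetric hash: addition is commutative, so (a, b) and (b, a)
// produce the same value.
struct UnorderedPairHash {
    std::size_t operator()(const InputsPair& p) const {
        return hash_vector(p.first) + hash_vector(p.second);
    }
};

// Symmetric equality: (a, b) matches both (a, b) and (b, a).
struct UnorderedPairEqual {
    bool operator()(const InputsPair& x, const InputsPair& y) const {
        return (x.first == y.first && x.second == y.second) ||
               (x.first == y.second && x.second == y.first);
    }
};

using InputsPairCounter =
    std::unordered_map<InputsPair, std::size_t,
                       UnorderedPairHash, UnorderedPairEqual>;
```

Because equality compares the two vectors individually, the pairs (010,1) and (01,01) from the question stay distinct even if their hashes happen to collide.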
As I understand, you want:
struct MyPairComparator
{
bool operator() (const InputsPair& lhs, const InputsPair& rhs) const
{
return std::minmax(std::get<0>(lhs), std::get<1>(lhs))
< std::minmax(std::get<0>(rhs), std::get<1>(rhs));
}
};
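We order each pair {a, b} so that a < b, then use the regular pair comparison. Note that this requires operator< to be defined for SomeClass (for example, by comparing the inp() vectors).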
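To see the comparator in action with std::map, here is a small self-contained sketch. It assumes the pair members are plain std::vector<int> values (which already have operator<) instead of SomeClass; UnorderedPairLess is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

struct UnorderedPairLess {
    bool operator()(const InputsPair& lhs, const InputsPair& rhs) const {
        // std::minmax returns a pair of references ordered (smaller, larger),
        // so (a, b) and (b, a) are normalized to the same key before comparing.
        return std::minmax(lhs.first, lhs.second) <
               std::minmax(rhs.first, rhs.second);
    }
};

using InputsPairCounter = std::map<InputsPair, std::size_t, UnorderedPairLess>;
```

Incrementing counter[{a, b}] and then counter[{b, a}] should hit the same entry, since neither key compares less than the other.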

How do I sort a vector with respect to another vector?

I have a few vectors of the same data type:
vector<int> v = {5,4,1,2};
vector<int> v2 = {2,4,3,5,1,6,8,7};
vector<int> v3 = {1,4,2,3};
Is there any way to sort vectors v2, v3, ... with respect to vector v using the C++ STL (<algorithm>), so that
after sorting with respect to v, v2 would be {5,4,1,2,3,6,7,8} and v3 would be {4,1,2,3}?
Edit:
This may be unclear to some people.
Let me explain.
The sorted vector has two parts, A and B.
A contains the elements that also appear in vector v, i.e. A is a subset of v, and it follows the same order as in v.
B contains the remaining elements {v_i - A} of the given vector (v_i), sorted in ascending order.
so for vector v2 after sorting it would be
v2 = A union B
A = {5,4,1,2}
B = {3,6,7,8}
class StrangeComparison {
public:
StrangeComparison(const vector<int>& ordering) : ordering_(ordering) {}
bool operator()(int a, int b) const {
auto index_a = find(ordering_.begin(), ordering_.end(), a);
auto index_b = find(ordering_.begin(), ordering_.end(), b);
return make_pair(index_a, a) < make_pair(index_b, b);
}
private:
const vector<int>& ordering_;
};
sort(v2.begin(), v2.end(), StrangeComparison(v));
Working example. Improving efficiency is left as an exercise for the reader (hint: look at std::find calls).
You only need to augment your comparison function with the following rules:
if the first value exists in the vector to sort with respect to, but the second value does not exist in it, then first < second
if the second value exists but the first does not, then second < first
if both values exist, compare their index values within that vector
If none of those conditions are true, the existing comparison logic would run.
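The rules above can be sketched without calling std::find on every comparison by precomputing each value's position once. This is only an illustration; sort_with_respect_to is a made-up name, and elements outside the reference vector are assumed to be sorted ascending, as in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Sorts `values` so that elements present in `order` come first, in the
// order they appear there, followed by the remaining elements ascending.
void sort_with_respect_to(std::vector<int>& values,
                          const std::vector<int>& order) {
    // Position of each element of `order`; the first occurrence wins.
    std::unordered_map<int, std::size_t> position;
    for (std::size_t i = 0; i < order.size(); ++i) {
        position.emplace(order[i], i);
    }
    // Elements not in `order` all get rank order.size(), so they sort last
    // and ties between them are broken by value.
    auto key = [&](int x) {
        auto it = position.find(x);
        std::size_t rank = (it == position.end()) ? order.size() : it->second;
        return std::make_pair(rank, x);
    };
    std::sort(values.begin(), values.end(),
              [&](int a, int b) { return key(a) < key(b); });
}
```

Each comparison now costs an average constant-time hash lookup instead of a linear scan of the reference vector.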

How to index and assign elements in a tensor using identical call signatures?

OK, I've been googling around for too long, I'm just not sure what to call this technique, so I figured it's better to just ask here on SO. Please point me in the right direction if this has an obvious name and/or solution I've overlooked.
For the laymen: a tensor is the logical extension of the matrix, in the same way a matrix is the logical extension of the vector. A vector is a rank-1 tensor (in programming terms, a 1D array of numbers), a matrix is a rank-2 tensor (a 2D array of numbers), and a rank-N tensor is then simply an N-D array of numbers.
Now, suppose I have something like this Tensor class:
template<typename T = double> // possibly also with size parameters
class Tensor
{
private:
T *M; // Tensor data (C-array)
// alternatively, std::vector<T> *M
// or std::array<T> *M
// etc., or possibly their constant-sized versions
// using Tensor<>'s template parameters
public:
... // insert trivial fluffy stuff here
// read elements
const T & operator() (size_t a, size_t b) const {
... // error checks etc.
return M[a + rows*b];
}
// write elements
T & operator() (size_t a, size_t b) {
... // error checks etc.
return M[a + rows*b];
}
...
};
With these definitions of operator()(...), indexing and assigning individual elements have the same call signature:
Tensor<> B(5,5);
double a = B(3,4); // operator() (size_t,size_t) used to both GET elements
B(3,4) = 5.5; // and SET elements
It is fairly trivial to extend this up to arbitrary tensor rank. But what I'd like to be able to implement is a more high-level way of indexing/assigning elements:
Tensor<> B(5,5);
Tensor<> C = B( Slice(0,4,2), 2 ); // operator() (Slice(),size_t) used to GET elements
B( Slice(0,4,2), 2 ) = C; // and SET elements
// (C is another tensor of the correct dimensions)
I am aware that std::valarray (and many others for that matter) does a very similar thing already, but it's not my objective to just accomplish the behavior; my objective here is to learn how to elegantly, efficiently and safely add the following functionality to my Tensor<> class:
// Indexing/assigning with Tensor<bool>
B( B>0 ) += 1.0;
// Indexing/assigning arbitrary amount of dimensions, each dimension indexed
// with either Tensor<bool>, size_t, Tensor<size_t>, or Slice()
B( Slice(0,2,FINAL), 3, Slice(0,3,FINAL), 4 ) = C;
// double indexing/assignment operation
B(3, Slice(0,4,FINAL))(mask) = C; // [mask] == Tensor<bool>
.. etc.
Note that it's my intention to use operator[] for non-checked versions of operator(). Alternatively, I'll stick more to the std::vector<> approach of using .at() methods for checked versions of operator[]. Anyway, this is a design choice and besides the issue right now.
I've conjured up the following incomplete "solution". This method is only really manageable for vectors/matrices (rank-1 or rank-2 tensors), and has many undesirable side-effects:
// define a simple slice class
class Slice
{
private:
size_t
start, stride, end;
public:
Slice(size_t s, size_t e) : start(s), stride(1), end(e) {}
Slice(size_t s, size_t S, size_t e) : start(s), stride(S), end(e) {}
...
};
template<typename T = double>
class Tensor
{
... // same as before
public:
// define two operators() for use with slices:
// version for retrieving data
const Tensor<T> & operator() (Slice r, size_t c) const {
// use slicing logic to construct return tensor
...
return M;
}
// version for assigning data
Sass operator() (Slice r, size_t c) {
// returns Sass object, defined below
return Sass(*this, r,c);
}
protected:
class Sass
{
friend class Tensor<T>;
private:
Tensor<T>& M;
const Slice &R;
const size_t c;
public:
Sass(Tensor<T> &M, const Slice &R, const size_t c)
: M(M)
, R(R)
, c(c)
{}
operator Tensor<T>() const { return M; }
Tensor<T> & operator= (const Tensor<T> &M2) {
// use R/c to copy contents of M2 into M using the same
// Slice-logic as in "Tensor<T>::operator()(...) const" above
...
return M;
}
}; // class Sass
}; // class Tensor
But this just feels wrong...
For each of the indexing/assignment methods outlined above, I'd have to define a separate Tensor<T>::Sass::Sass(...) constructor, a new Tensor<T>::Sass::operator=(...), and a new Tensor<T>::operator()(...). Moreover, each Tensor<T>::Sass::operator=(...) would need to contain much of the same logic that's already in the corresponding Tensor<T>::operator()(...), and making everything suitable for a Tensor<> of arbitrary rank makes this approach quite ugly, far too verbose and, more importantly, completely unmanageable.
So, I'm under the impression there is a much more effective approach to all this.
Any suggestions?
First of all I'd like to point out some design issues:
T & operator() (size_t a, size_t b) const;
suggests you can't alter the matrix through this method, because it's const. But you are giving back a nonconst reference to a matrix element, so in fact you can alter it. This only compiles because of the raw pointer you are using. I suggest to use std::vector instead, which does the memory management for you and will give you an error because vector's const version of operator[] gives a const reference like it should.
Regarding your actual question, I am not sure what the parameters of the Slice constructor should do, nor what a Sass object is meant to be (I am not a native speaker, and "Sass" gives me only one translation in the dictionary, meaning something like "impudence" or "impertinence").
However, I suppose with a slice you want to create an object that gives access to a subset of a matrix, defined by the slice's parameters.
I would advise against using operator() for every way of accessing the matrix. op() with two indices to access a given element seems natural. Using a similar operator to get a whole matrix seems less intuitive to me.
Here's an idea: make a Slice class that holds a reference to a Matrix and the necessary parameters that define which part of the Matrix is represented by the Slice. That way a Slice would be something like a proxy to the Matrix subset it defines, similar to a pair of iterators which can be seen as a proxy to a subrange of the container they are pointing to. Give your Matrix a pair of slice() methods (const and nonconst) that give back a Slice/ConstSlice, referencing the Matrix you call the method on. That way, you can even put checks into the method to see if the Slice's parameters make sense for the Matrix it refers to. If it makes sense and is necessary, you can also add a conversion operator, to convert a Slice into a Matrix of its own.
Overloading operator() again and again and using the parameters as a mask, as linear indices and other stuff is more confusing than helping imo. operator() is slick if it does something natural which everybody expects from it. It only obfuscates the code if it is used everywhere. Use named methods instead.
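A minimal sketch of the proxy idea, under several assumptions: a rank-2 Matrix of double stored column-major as in the question, a slice over the rows of a single column, and a parameter order of (start, stop, stride, column) chosen here for illustration. ColumnSlice, slice() and to_vector() are made-up names, and only the nonconst variant is shown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

class Matrix;

// Proxy for rows start, start+stride, ... (< stop) of one column.
// It only refers to the Matrix; it owns no data.
class ColumnSlice {
public:
    ColumnSlice(Matrix& m, std::size_t start, std::size_t stop,
                std::size_t stride, std::size_t col)
        : m_(m), start_(start), stop_(stop), stride_(stride), col_(col) {}

    std::size_t size() const { return (stop_ - start_ + stride_ - 1) / stride_; }

    // Write through to the referenced Matrix.
    ColumnSlice& operator=(const std::vector<double>& values);

    // Copy the selected elements out.
    std::vector<double> to_vector() const;

private:
    Matrix& m_;
    std::size_t start_, stop_, stride_, col_;
};

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t r, std::size_t c) { return data_[r + rows_ * c]; }
    const double& operator()(std::size_t r, std::size_t c) const { return data_[r + rows_ * c]; }

    // Named method instead of another operator() overload; checks that the
    // slice parameters make sense for this Matrix.
    ColumnSlice slice(std::size_t start, std::size_t stop,
                      std::size_t stride, std::size_t col) {
        if (stride == 0 || start > stop || stop > rows_ || col >= cols_)
            throw std::out_of_range("slice does not fit the matrix");
        return ColumnSlice(*this, start, stop, stride, col);
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

inline ColumnSlice& ColumnSlice::operator=(const std::vector<double>& values) {
    if (values.size() != size()) throw std::invalid_argument("size mismatch");
    std::size_t i = 0;
    for (std::size_t r = start_; r < stop_; r += stride_) m_(r, col_) = values[i++];
    return *this;
}

inline std::vector<double> ColumnSlice::to_vector() const {
    std::vector<double> out;
    for (std::size_t r = start_; r < stop_; r += stride_) out.push_back(m_(r, col_));
    return out;
}
```

Reading and writing then go through the same named call, for example m.slice(0, 4, 2, 2) = values; and m.slice(0, 4, 2, 2).to_vector(). A ConstSlice for const matrices would follow the same pattern without the assignment operator.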
Not an answer, just a note to follow up my comment:
Tensor<bool> T(false);
// T (whatever its rank) contains all false
auto lazy = T(Slice(0,4,2));
// if I use lazy here, it will be all false
T = true;
// now T contains all true
// if I use lazy here, it will be all true
This may be what you want, or it might be unexpected.
In general, this can work cleanly with immutable tensors, but allowing mutation gives the same class of problem as COW strings.
If you allow your Tensor to convert implicitly to a double, you can return only Tensors from your operator() overloads.
operator double() {
return M.size() == 1 ? M[0] : std::numeric_limits<double>::quiet_NaN();
};
That should allow for
double a = B(3,4);
Tensor<> a = B(Slice(1,2,3),4);
Getting operator() to work with multiple overloads taking Slice and integer arguments is another issue. I'd probably just use Slice and create another implicit conversion so that integers can be Slices, then maybe use a variable-argument ellipsis.
const Tensor<T> & operator() (int numOfDimensions, ...)
The variable-argument route is something of a kludge, though; it is probably best to just have 8 overloads for 1 to 8 Slice parameters.

Understanding std::accumulate

I want to know why the 3rd parameter of std::accumulate (aka reduce) is needed. For those who do not know what accumulate is, it's used like so:
vector<int> V{1,2,3};
int sum = accumulate(V.begin(), V.end(), 0);
// sum == 6
Call to accumulate is equivalent to:
sum = 0; // 0 - value of 3rd param
for (auto x : V) sum += x;
There is also an optional 4th parameter, which allows replacing addition with any other operation.
The rationale I've heard is that if you need, say, to multiply the elements of a vector rather than add them up, you need a different (non-zero) initial value:
vector<int> V{1,2,3};
int product = accumulate(V.begin(), V.end(), 1, multiplies<int>());
But why not do it like Python: use the first element as the initial value, and accumulate over the range starting from V.begin()+1? Something like this:
int sum = accumulate(V.begin()+1, V.end(), *V.begin());
This will work for any op. Why is 3rd parameter needed at all?
You're making a mistaken assumption: that the type T is the same as the value type of the InputIterator.
But std::accumulate is generic, and allows all different kinds of creative accumulations and reductions.
Example #1: Accumulate salary across Employees
Here's a simple example: an Employee class, with many data fields.
class Employee {
/** All kinds of data: name, ID number, phone, email address... */
public:
int monthlyPay() const;
};
You can't meaningfully "accumulate" a set of employees. That makes no sense; it's undefined. But, you can define an accumulation regarding the employees. Let's say we want to sum up all the monthly pay of all employees. std::accumulate can do that:
/** Simple lambda defining how to add a single Employee's
* monthly pay to our existing tally */
auto accumulate_func = [](int accumulator, const Employee& emp) {
return accumulator + emp.monthlyPay();
};
// And here's how you call the actual calculation:
int TotalMonthlyPayrollCost(const vector<Employee>& V)
{
return std::accumulate(V.begin(), V.end(), 0, accumulate_func);
}
So in this example, we're accumulating an int value over a collection of Employee objects. Here, the accumulation sum isn't the same type of variable that we're actually summing over.
Example #2: Accumulating an average
You can use accumulate for more complex types of accumulations as well: maybe you want to append values to a vector; maybe you have some arcane statistic you're tracking across the input; etc. What you accumulate doesn't have to be just a number; it can be something more complex.
For example, here's a simple example of using accumulate to calculate the average of a vector of ints:
// This time our accumulator isn't an int -- it's a structure that lets us
// accumulate an average.
struct average_accumulate_t
{
int sum;
size_t n;
double GetAverage() const { return ((double)sum)/n; }
};
// Here's HOW we add a value to the average:
auto func_accumulate_average =
[](average_accumulate_t accAverage, int value) {
return average_accumulate_t(
{accAverage.sum+value, // value is added to the total sum
accAverage.n+1}); // increment number of values seen
};
double CalculateAverage(const vector<int>& V)
{
average_accumulate_t res =
std::accumulate(V.begin(), V.end(), average_accumulate_t({0,0}), func_accumulate_average);
return res.GetAverage();
}
Example #3: Accumulate a running average
Another reason you need the initial value is because that value isn't always the default/neutral value for the calculation you're making.
Let's build on the average example we've already seen. But now, we want a class that can hold a running average -- that is, we can keep feeding in new values, and check the average so far, across multiple calls.
class RunningAverage
{
average_accumulate_t _avg;
public:
RunningAverage():_avg({0,0}){} // initialize to empty average
double AverageSoFar() const { return _avg.GetAverage(); }
void AddValues(const vector<int>& v)
{
_avg = std::accumulate(v.begin(), v.end(),
_avg, // NOT the default initial {0,0}!
func_accumulate_average);
}
};
int main()
{
RunningAverage r;
r.AddValues(vector<int>({1,1,1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 1.0
r.AddValues(vector<int>({-1,-1,-1}));
std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 0.0
}
This is a case where we absolutely rely on being able to set that initial value for std::accumulate - we need to be able to initialize the accumulation from different starting points.
In summary, std::accumulate is good for any time you're iterating over an input range, and building up one single result across that range. But the result doesn't need to be the same type as the range, and you can't make any assumptions about what initial value to use -- which is why you must have an initial instance to use as the accumulating result.
The way things are, it is annoying for code that knows for sure a range isn't empty and that wants to start accumulating from the first element of the range on. Depending on the operation that is used to accumulate with, it's not always obvious what the 'zero' value to use is.
If on the other hand you only provide a version that requires non-empty ranges, it's annoying for callers that don't know for sure that their ranges aren't empty. An additional burden is put on them.
One perspective is that the best of both worlds is of course to provide both. As an example, Haskell provides both foldl1 and foldr1 (which require non-empty lists) alongside foldl and foldr (which mirror std::accumulate).
Another perspective is that since the one can be implemented in terms of the other with a trivial transformation (as you've demonstrated: std::accumulate(std::next(b), e, *b, f) -- std::next is C++11 but the point still stands), it is preferable to make the interface as minimal as it can be with no real loss of expressive power.
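For illustration, a foldl1-style wrapper on top of std::accumulate could look like the sketch below. The name accumulate1 is hypothetical (it is not part of the standard library), and throwing on an empty range is just one possible policy.

```cpp
#include <iterator>
#include <numeric>
#include <stdexcept>

// Hypothetical helper: folds a non-empty range, using its first element
// as the initial value and `op` to combine the remaining elements.
template <typename ForwardIt, typename BinaryOp>
typename std::iterator_traits<ForwardIt>::value_type
accumulate1(ForwardIt first, ForwardIt last, BinaryOp op) {
    if (first == last) {
        throw std::invalid_argument("accumulate1: empty range");
    }
    typename std::iterator_traits<ForwardIt>::value_type init = *first;
    return std::accumulate(std::next(first), last, init, op);
}
```

Note that the result type is tied to the element type here, which is exactly the restriction the explicit initial value of std::accumulate avoids.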
Because standard library algorithms are supposed to work for arbitrary ranges of (compatible) iterators. So the first argument to accumulate doesn't have to be begin(), it could be any iterator between begin() and one before end(). It could also be using reverse iterators.
The whole idea is to decouple algorithms from data. Your suggestion, if I understand it correctly, requires a certain structure in the data.
If you wanted accumulate(v.begin()+1, v.end(), *v.begin()) you could just write that. But what if v.begin() might be v.end() (i.e. v is empty)? What if v.begin() + 1 is not implemented (because v's iterator only implements ++, not generalized addition)? What if the type of the accumulator is not the type of the elements? E.g.:
std::accumulate(v.begin(), v.end(), 0, [](long count, char c){
return isalpha(c) ? count + 1 : count;
});
It's indeed not needed. Our codebase has 2 and 3-argument overloads which use a T{} value.
However, std::accumulate is pretty old; it comes from the original STL. Our codebase has fancy std::enable_if logic to distinguish between "2 iterators and initial value" and "2 iterators and reduction operator". That requires C++11. Our code also uses a trailing return type (auto accumulate(...) -> ...) to calculate the return type, another C++11 feature.