I am relatively new to C++ and I have a small problem. I have a vector whose elements are vectors of 3 integers.
Each inner vector represents one person. Its 3 integers are the distance from the start, the velocity, and the original index (the input isn't sorted, and in the output I need to print the original index, not the index in the sorted vector).
I am given some points, each expressed as a distance from the start, and I need to find which person will reach each point first. My first step would be to find the person closest to the given point, so basically I need lower_bound/upper_bound.
How can I use lower_bound to search on the first item of the inner vectors? Or should I use a struct/class instead of inner vectors?
You would use the version of std::lower_bound which takes a custom comparator (the versions marked "(2)" at the link); and you would write a comparator of vectors which compares vectors by their first item (or whatever other way you like).
However:
As @doctorlove points out, std::lower_bound doesn't compare the vectors to each other, it compares them to a given value (be it a vector or a scalar). So it's possible you actually want to do something else.
It's usually not a good idea to keep fixed-length sequences of elements in std::vector's. Have you considered std::array?
It's very likely that your "vectors with 3 integers" actually stand for something else, e.g. points in a 3-dimensional geometric space; in which case, yes, they should be in some sort of class.
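To make the comparator idea concrete, here is a minimal sketch, assuming the inner sequences are stored as std::array<int, 3> and the outer vector is already sorted by the first item. The names Person and first_at_or_after are made up for illustration; the comparator compares an element against a plain int, which is what the comparator overload of std::lower_bound expects.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical layout: {distance from start, velocity, original index}.
using Person = std::array<int, 3>;

// Returns the position of the first person whose distance is >= point,
// or people.size() if there is none.
// Precondition: `people` is sorted by distance (element 0).
std::size_t first_at_or_after(const std::vector<Person>& people, int point) {
    auto it = std::lower_bound(
        people.begin(), people.end(), point,
        // lower_bound calls comp(element, value), so the element comes first.
        [](const Person& p, int value) { return p[0] < value; });
    return static_cast<std::size_t>(it - people.begin());
}
```

The person just before the returned position (if any) is the nearest one behind the point, so the closest person is one of those two neighbours.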
I am not sure that your inner things should be std::vector-s of 3 elements.
I believe that they should be std::array-s of 3 elements (because you know that the size is 3 and won't change).
So you probably want to have
typedef std::array<double,3> element_ty;
then use std::vector<element_ty> and for the rest (your lower_bound point) do like in einpoklum's answer.
BTW, you probably want to use std::min_element with an explicit compare.
Maybe you want something like:
std::vector<element_ty> vec;
auto minit =
    std::min_element(vec.begin(), vec.end(),
                     [](const element_ty& x, const element_ty& y) {
                         return x[0] < y[0]; // compare by the first item only
                     });
I have an std::vector of floats that I want to not contain duplicates but the math that populates the vector isn't 100% precise. The vector has values that differ by a few hundredths but should be treated as the same point. For example here's some values in one of them:
...
X: -43.094505
X: -43.094501
X: -43.094498
...
What would be the best/most efficient way to remove duplicates from a vector like this?
First sort your vector using std::sort. Then use std::unique with a custom predicate to move the duplicates to the end, and erase them.
v.erase(std::unique(v.begin(), v.end(),
                    [](double l, double r) { return std::abs(l - r) < 0.01; }),
        v.end());
// treats any numbers that differ by less than 0.01 as equal
Sorting is always a good first step. Use std::sort().
Remove not sufficiently unique elements: std::unique().
Last step, call resize() and maybe also shrink_to_fit().
If you want to preserve the order, do the previous 3 steps on a copy (omit shrinking though).
Then use std::remove_if on the original with a lambda that binary-searches the copy for each element: retain the element only if it is found in the copy, and remove it from the copy once found so that later near-duplicates are dropped.
I say std::sort() it, then go through it one by one and remove the values within certain margin.
You can have a separate write iterator to the same vector and one resize operation at the end - instead of calling erase() for each removed element or having another destination copy for increased performance and smaller memory usage.
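The write-iterator idea can be sketched roughly as below. This is only an illustration under my own assumptions: the function name dedupe_within_margin is invented, and "within margin" is measured against the last value that was kept.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sorts v, then keeps a value only if it is at least `margin` away from the
// last value that was kept. Compacts in place and shrinks the vector once.
void dedupe_within_margin(std::vector<double>& v, double margin) {
    if (v.empty()) return;
    std::sort(v.begin(), v.end());
    auto write = v.begin();  // points at the last kept element
    for (auto read = v.begin() + 1; read != v.end(); ++read) {
        if (std::fabs(*read - *write) >= margin) {
            ++write;
            *write = *read;  // keep this value by moving it forward
        }
    }
    v.erase(write + 1, v.end());  // single size change at the end
}
```

There is no per-element erase() and no second container, which is the performance and memory point made above.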
If your vector cannot contain duplicates, it may be more appropriate to use an std::set. You can then use a custom comparison object to consider small changes as being inconsequential.
Hi, you could compare like this:
bool isAlmostEquals(const double &f1, const double &f2)
{
double allowedDif = xxxx;
return (std::fabs(f1 - f2) <= allowedDif); // std::fabs from <cmath>; plain abs may pick the int overload
}
but it depends on your comparison range, and double precision is not on your side.
If your vector is sorted, you could use std::unique with this function as the predicate.
I would do the following:
Create a set<double>
go through your vector in a loop or using a functor
Round each element and insert into the set
Then you can swap your vector with an empty vector
Copy all elements from the set to the empty vector
The complexity of this approach will be n * log(n) but it's simpler and can be done in a few lines of code. The memory consumption will double from just storing the vector. In addition set consumes slightly more memory per each element than vector. However, you will destroy it after using.
std::vector<double> v;
v.push_back(-43.094505);
v.push_back(-43.094501);
v.push_back(-43.094498);
v.push_back(-45.093435);
std::set<double> s;
std::vector<double>::const_iterator it = v.begin();
for(;it != v.end(); ++it)
s.insert(floor(*it));
std::vector<double>().swap(v); // a temporary cannot bind to swap's non-const reference parameter
v.resize(s.size());
std::copy(s.begin(), s.end(), v.begin());
The problem with most answers so far is that you have an unusual "equality". If A and B are similar but not identical, you want to treat them as equal. Basically, A and A+epsilon still compare as equal, but A+2*epsilon does not (for some unspecified epsilon). Or, depending on your algorithm, A*(1+epsilon) does and A*(1+2*epsilon) does not.
That does mean that A+epsilon compares equal to A+2*epsilon. Thus A = B and B = C does not imply A = C. This breaks common assumptions in <algorithm>.
You can still sort the values, that is a sane thing to do. But you have to consider what to do with a long range of similar values in the result. If the range is long enough, the difference between the first and last can still be large. There's no simple answer.
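A tiny sketch of the non-transitivity, assuming the absolute-difference flavour of "equal". The helper names almost_equal and breaks_transitivity are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: two values are "equal" if they differ by at most eps.
bool almost_equal(double a, double b, double eps) {
    return std::fabs(a - b) <= eps;
}

// True when a~b and b~c both hold but a~c does not,
// i.e. the tolerant "equality" is not transitive for these inputs.
bool breaks_transitivity(double a, double b, double c, double eps) {
    return almost_equal(a, b, eps) && almost_equal(b, c, eps) &&
           !almost_equal(a, c, eps);
}
```

For example, with eps = 0.75 the values 1.0, 1.5 and 2.0 break transitivity: each neighbouring pair is within tolerance, but the two ends are not.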
Requirements:
container which sorts itself based on numerically comparing the keys (e.g. std::map)
check existence of key based on float tolerance (e.g. map.find() and use custom comparator )
and the tricky one: the float tolerance used by the comparator may be changed by the user at runtime!
The first 2 can be accomplished using a map with a custom comparator:
struct floatCompare : public std::binary_function<float,float,bool>
{
bool operator()( const float &left, const float &right ) const
{
return (fabs(left - right) > 1e-3) && (left < right);
}
};
typedef std::map< float, float, floatCompare > floatMap;
Using this implementation, floatMap.find( 15.0001 ) will find 15.0 in the map.
However, let's say the user doesn't want a float tolerance of 1e-3.
What is the easiest way to make this comparator function use a variable tolerance at runtime? I don't mind re-creating and re-sorting the map based on the new comparator each time epsilon is updated.
Other posts on modification after initialization here and using floats as keys here didn't provide a complete solution.
You can't change the ordering of the map after it's created (and you should just use plain old operator< even for the floating point type here), and you can't even use a "tolerant" comparison operator as that may violate the strict weak ordering that map requires to maintain its state.
However you can do the tolerant search with lower_bound and upper_bound. The gist is that you would create a wrapper function much like equal_range that does a lower_bound for "value - tolerance" and then an upper_bound for "value + tolerance" and see if it creates a non-empty range of values that match the criteria.
You cannot change the definition of how elements are ordered in a map once it's been instantiated. If you were to find some technical hack to do so (such as implementing a custom comparator that takes a tolerance that can change at runtime), it would evoke Undefined Behavior.
Your main alternative to changing the ordering is to create another map with a different ordering scheme. This other map could be an indexing map, where the keys are ordered in a different way, and the values aren't the elements themselves, but an index into the main map.
Alternatively maybe what you're really trying to do isn't change the ordering, but maintain the ordering and change the search parameters.
That you can do, and there are a few ways to do it.
One is to simply use map::lower_bound twice: once with the lower end of your tolerance window, and once with a value just past its upper end. For example, if you want to find 15.0 with a tolerance of 1e-5, you could call lower_bound with 14.99999 and then again with a value just past 15.00001 to find the elements in that range.
Another is to use std::find_if with a custom functor, lambda, or std::function. You could declare the functor in such a way as to take the tolerance and the value at construction, and perform the check in operator().
Since this is a homework question, I'll leave the fiddly details of actually implementing all this up to you. :)
Rather than using a comparator with tolerance, which is going to fail in subtle ways, just use a consistent key that is derived from the floating point value. Make your floating point values consistent using rounding.
inline double key(double d)
{
return floor(d * 1000.0 + 0.5);
}
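Here is one way the rounded key might be used, as a sketch only. The names quantized_key, QuantizedMap, insert_value and contains_value are invented for illustration, and I use std::llround to get an integer key directly. Note that it rounds halves away from zero, which differs slightly from floor(x + 0.5) for negative values. Changing the tolerance at runtime then means rebuilding the map with a different scale.

```cpp
#include <cmath>
#include <map>

// Quantize to a grid of 1/scale; scale = 1000.0 mirrors the snippet above.
long long quantized_key(double d, double scale) {
    return std::llround(d * scale);
}

// Integer keys keep the default operator< a valid strict weak ordering.
using QuantizedMap = std::map<long long, double>;

void insert_value(QuantizedMap& m, double d, double scale) {
    m.emplace(quantized_key(d, scale), d);  // the first value in a bucket wins
}

bool contains_value(const QuantizedMap& m, double d, double scale) {
    return m.find(quantized_key(d, scale)) != m.end();
}
```

One thing to keep in mind with this design: values are grouped by bucket, not by distance. With a scale of 1000.0, 15.0001 and 15.0004 share a bucket with 15.0, while 15.0006 lands in the next one.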
You can't achieve that with a simple custom comparator, even if it was possible to change it after the definition, or when resorting using a new comparator. The fact is: a "tolerant comparator" is not really a comparator. For three values, it's possible that a < c (difference is large enough) but neither a < b nor b < c (both difference too small). Example: a = 5.0, b = 5.5, c = 6.0, tolerance = 0.6
What you should do instead is to use default sorting using operator< for floats, i.e. simply don't provide any custom comparator. Then, for the lookup don't use find but rather lower_bound and upper_bound with modified values according to the tolerance. These two function calls will give you two iterators which define the sequence which will be accepted using this tolerance. If this sequence is empty, the key was not found, obviously.
You then might want to get the key which is closest to the value to be searched for. If this is true, you should then find the min_element of this subsequence, using a comparator which will consider the difference between the key and the value to be searched.
template<typename Map, typename K>
auto tolerant_find(const Map & map, const K & lookup, const K & tolerance) -> decltype(map.begin()) {
// First, find sub-sequence of keys "near" the lookup value
auto first = map.lower_bound(lookup - tolerance);
auto last = map.upper_bound(lookup + tolerance);
// If they are equal, the sequence is empty, and thus no entry was found.
// Return the end iterator to be consistent with std::find.
if (first == last) {
return map.end();
}
// Then, find the one with the minimum distance to the actual lookup value
typedef typename Map::mapped_type T;
return std::min_element(first, last, [lookup](std::pair<K,T> a, std::pair<K,T> b) {
return std::abs(a.first - lookup) < std::abs(b.first - lookup);
});
}
Demo: http://ideone.com/qT3JIa
It may be better to leave the std::map class alone (well, partly at least), and just write your own class which implements the three methods you mentioned.
template<typename T>
class myMap{
private:
float tolerance;
std::map<float,T> storage;
public:
void setTolerance(float t){tolerance=t;};
typename std::map<float,T>::iterator find(float val); // e.g. same as you provided, just change 1e-3 for tolerance
/* other methods go here */
};
That being said, I don't think you need to recreate the container and sort it depending on the tolerance.
check existence of key based on float tolerance
merely means you have to check whether an element exists. The position of the elements inside the map shouldn't change. You could start the search from val-tolerance, and when you find an element (the function find returns an iterator), step through the following elements until you reach the end of the map or until their keys exceed val+tolerance.
That basically means that the behavior of the insert/add/[]/whatever functions isn't based on the tolerance, so there's no real problem with storing the values.
If you're afraid the elements will be too close to each other, you may want to start searching from val, and then gradually increase the tolerance until it reaches the one the user asked for.
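A rough sketch of such a wrapper, assuming an existence check is all that is needed. The class name TolerantMap and its members are my own invention, and the map underneath keeps the default operator< ordering so the tolerance can change freely.

```cpp
#include <map>

template <typename T>
class TolerantMap {
public:
    explicit TolerantMap(float tolerance) : tolerance_(tolerance) {}

    // Changing the tolerance never touches the stored ordering.
    void set_tolerance(float t) { tolerance_ = t; }

    void insert(float key, const T& value) { storage_[key] = value; }

    // True if any stored key lies in [val - tolerance, val + tolerance].
    bool contains(float val) const {
        // First key that is not below the lower end of the window...
        auto it = storage_.lower_bound(val - tolerance_);
        // ...is a hit only if it is also not above the upper end.
        return it != storage_.end() && it->first <= val + tolerance_;
    }

private:
    float tolerance_;
    std::map<float, T> storage_;  // ordered by plain operator<
};
```

Usage would look like creating TolerantMap<int> m(1e-3f), inserting 15.0f, and calling m.contains(15.0001f); after m.set_tolerance(0.5f) the same map answers lookups with the wider window, with no re-sorting.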
I've got two vector<MyType*> objects called A and B. The MyType class has a field ID and I want to get the MyType* which are in A but not in B. I'm working on an image analysis application and I was hoping to find a fast/optimized solution.
The unordered approach will typically have quadratic complexity unless the data is sorted beforehand (by your ID field), in which case it would be linear and would not require repeated searches through B.
struct CompareId
{
bool operator()(const MyType* a, const MyType* b) const
{
return a->ID < b->ID;
}
};
...
sort(A.begin(), A.end(), CompareId() );
sort(B.begin(), B.end(), CompareId() );
vector<MyType*> C;
set_difference(A.begin(), A.end(), B.begin(), B.end(), back_inserter(C) );
Another solution is to use an ordered container like std::set with CompareId used for the StrictWeakOrdering template argument. I think this would be better if you need to apply a lot of set operations. That has its own overhead (being a tree) but if you really find that to be an efficiency problem, you could implement a fast memory allocator to insert and remove elements super fast (note: only do this if you profile and determine this to be a bottleneck).
Warning: getting into somewhat complicated territory.
There is another solution you can consider which could be very fast if applicable and you never have to worry about sorting data. Basically, make any group of MyType objects which share the same ID store a shared counter (ex: pointer to unsigned int).
This will require creating a map of IDs to counters and require fetching the counter from the map each time a MyType object is created based on its ID. Since you have MyType objects with duplicate IDs, you shouldn't have to insert to the map as often as you create MyType objects (most can probably just fetch an existing counter).
In addition to this, have a global 'traversal' counter which gets incremented whenever it's fetched.
static unsigned int counter = 0;
unsigned int traversal_counter()
{
// make this atomic for multithreaded applications and
// needs to be modified to set all existing ID-associated
// counters to 0 on overflow (see below)
return ++counter;
}
Now let's go back to where you have A and B vectors storing MyType*. To fetch the elements in A that are not in B, we first call traversal_counter(). Assuming it's the first time we call it, that will give us a traversal value of 1.
Now iterate through every MyType* object in B and set the shared counter for each object from 0 to the traversal value, 1.
Now iterate through every MyType* object in A. The ones that have a counter value which doesn't match the current traversal value(1) are the elements in A that are not contained in B.
What happens when you overflow the traversal counter? In this case, we iterate through all the counters stored in the ID map and set them back to zero along with the traversal counter itself. This will only need to occur once in about 4 billion traversals if it's a 32-bit unsigned int.
This is about the fastest solution you can apply to your given problem. It can do any set operation in linear complexity on unsorted data (and always, not just in best-case scenarios like a hash table), but it does introduce some complexity so only consider it if you really need it.
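To show how the pieces fit together, here is a simplified sketch of the counter scheme. Everything beyond the field ID is assumed: the mark pointer, the MarkRegistry class and the function in_a_not_in_b are hypothetical names, the registry uses std::unordered_map for the ID-to-counter table, and overflow handling is left out.

```cpp
#include <unordered_map>
#include <vector>

struct MyType {
    int ID;
    unsigned int* mark;  // counter shared by all objects with the same ID
};

class MarkRegistry {
public:
    // Returns the counter shared by every object with this ID (created as 0).
    // Pointers to unordered_map elements stay valid when the table rehashes.
    unsigned int* counter_for(int id) { return &counters_[id]; }

    // Fresh stamp per traversal; resetting on overflow is omitted here.
    unsigned int next_traversal() { return ++traversal_; }

private:
    std::unordered_map<int, unsigned int> counters_;
    unsigned int traversal_ = 0;
};

std::vector<MyType*> in_a_not_in_b(const std::vector<MyType*>& A,
                                   const std::vector<MyType*>& B,
                                   MarkRegistry& registry) {
    const unsigned int stamp = registry.next_traversal();
    for (MyType* b : B) *b->mark = stamp;  // stamp every ID present in B
    std::vector<MyType*> result;
    for (MyType* a : A)
        if (*a->mark != stamp) result.push_back(a);  // ID not seen in B
    return result;
}
```

Each object would fetch its mark from the registry when it is created, so the traversal itself is just two linear passes with no lookups.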
Sort both vectors (std::sort) according to ID and then use std::set_difference. You will need to define a custom comparator to pass to both of these algorithms, for example
struct comp
{
bool operator()(MyType * lhs, MyType * rhs) const
{
return lhs->ID < rhs->ID;
}
};
First look at the problem. You want "everything in A not in B". That means you're going to have to visit everything in A. You'll also have to visit everything in B to know what is and is not in B. So that suggests the best we can hope for is an O(n) + O(m) solution or, taking the liberty of treating n and m as the same size, about 2n steps, i.e. O(n).
Let's consider the std::set_difference approach. Each sort is O(n log n), and set_difference is O(n). So the sort-sort-set_difference approach costs about n + 2n log n steps, which is O(n log n).
Another approach would be to first place the elements of B in a set (or map). Iterating across B to build the set is n insertions at O(log n) each, followed by iterating across A with an O(log n) lookup for each element, for a total of about 2n log n steps. That is still O(n log n), but with slightly less work than sorting both vectors.
Finally, using an unordered_set (or unordered_map), and assuming we get the average case of O(1) insertion and O(1) lookup, we have an approach that takes about 2n steps, i.e. O(n). A-ha!
The real win here is that unordered_set (or map) is probably the most natural choice to represent your data in the first place, i.e., the proper design yields the optimized implementation. That doesn't always happen, but it's nice when it does!
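A minimal sketch of the unordered_set variant, assuming ID is an int (the question doesn't say) and using a made-up function name difference_by_id:

```cpp
#include <unordered_set>
#include <vector>

struct MyType {
    int ID;  // assumed to be an int for this sketch
};

// Returns the pointers in A whose ID does not occur anywhere in B,
// in their original order. Average cost is linear in A.size() + B.size().
std::vector<MyType*> difference_by_id(const std::vector<MyType*>& A,
                                      const std::vector<MyType*>& B) {
    std::unordered_set<int> ids_in_b;
    ids_in_b.reserve(B.size());  // avoid rehashing while filling
    for (const MyType* b : B) ids_in_b.insert(b->ID);

    std::vector<MyType*> result;
    for (MyType* a : A)
        if (ids_in_b.count(a->ID) == 0) result.push_back(a);
    return result;
}
```

Unlike the sort-based version, this leaves A and B untouched and keeps the result in A's order.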
If B already exists before A is built, then you can do the bookkeeping in a C vector while populating A.
I have a problem getting boost::multi_index_container to work with random-access and with ordered_unique at the same time. (I'm sorry for the lengthy question, but I think I should use an example.)
Here an example: Suppose I want to produce N objects in a factory and for each object I have a demand to fulfill (this demand is known at creation of the multi-index).
Well, within my algorithm I get intermediate results, which I store in the following class:
class intermediate_result
{
private:
std::vector<int> parts; // which parts are produced
int used_time; // how long did it take to produce
ValueType max_value; // how much is it worth
};
The vector parts describes which objects are produced (its length is N and it is lexicographically smaller than my corresponding demand-vector!). For each such vector I know the used_time as well. Additionally I get a value for this vector of produced objects.
I got another constraint so that I can't produce every object - my algorithm needs to store several intermediate_result-objects in a data-structure. And here boost::multi_index_container is used, because the pair of parts and used_time describes a unique intermediate_result (and it should be unique in my data-structure) but the max_value is another index I'll have to consider, because my algorithm always needs the intermediate_result with the highest max_value.
So I tried to use boost::multi_index_container with ordered_unique<> for my "parts&used_time-pair" and ordered_non_unique<> for my max_value (different intermediate_result-objects may have the same value).
The problem is: the predicate, which is needed to decide which "parts&used_time-pair" is smaller, uses std::lexicographical_compare on my parts-vector and hence is very slow for many intermediate_result-objects.
But there would be a solution: my demand for each object isn't that high, therefore I could store on each possible parts-vector the intermediate results uniquely by its used_time.
For example: if I have a demand-vector ( 2 , 3 , 1) then I need a data-structure which stores (2+1)*(3+1)*(1+1)=24 possible parts-vectors and on each such entry the different used_times, which have to be unique! (storing the smallest time is insufficient - for example: if my additional constraint is: to meet a given time exactly for production)
But how do I combine a random_access<>-index with an ordered_unique<>-index?
(Example11 didn't help me on this one..)
To use two indices you could write the following:
indexed_by<
random_access< >,
ordered_unique<
composite_key<
intermediate_result,
member<intermediate_result, int, &intermediate_result::used_time>,
member<intermediate_result, std::vector<int>, &intermediate_result::parts>
>
>
>
You could use composite_key for comparing used_time at first and vector only if necessary. Besides that, keep in mind that you could use member function as index.
(I had to use an own answer to write code-blocks - sorry!)
The composite_key with used_time and parts (as Kirill V. Lyadvinsky suggested) is basically what I've already implemented. I want to get rid of the lexicographical compare of the parts-vector.
Suppose I've stored the needed_demand somehow then I could write a simple function, which returns the correct index within a random-access data-structure like that:
int get_index(intermediate_result &input_result) const
{
int ret_value = 0;
int index_part = 1;
for(int i=0;i<needed_demand.size();++i)
{
ret_value += input_result.get_part(i) * index_part;
index_part *= (needed_demand.get_part(i) + 1);
}
return ret_value; // mixed-radix index of the parts-vector
}
Obviously this can be implemented more efficiently and this is not the only possible index ordering for the needed demand. But let's suppose this function exists as a member-function of intermediate_result! Is it possible to write something like this to prevent lexicographical_compare ?
indexed_by<
random_access< >,
ordered_unique<
composite_key<
intermediate_result,
member<intermediate_result, int, &intermediate_result::used_time>,
const_mem_fun<intermediate_result,int,&intermediate_result::get_index>
>
>
>
If this is possible and I initialized the multi-index with all possible parts-vectors (i.e. in my comment above I would've pushed 24 empty maps in my data-structure), does this find the right entry for a given intermediate_result in constant time (after computing the correct index with get_index) ?
I have to ask this, because I don't quite see, how the random_access<> index is linked with the ordered_unique<> index..
But thank you for your answers so far!!