<algorithm> sort custom condition

Okay, so I've tried to use sort on a vector of items so that the sum of the sizes of any two adjacent items is <= 2d.
So here's my attempt:
struct item{
    long number;
    long size;
};
// d is a global variable.
bool check(const item& x, const item& y)
{
    return ((x.size + y.size) <= (2 * d));
}
// items is a vector of item.
sort(items.begin(), items.end(), check);
What am I doing wrong, or is it even impossible to sort using a condition like that?

"Is it even impossible to sort using a condition like that?"
With sort, yes. The comparator passed to sort must satisfy the criteria of a strict weak ordering, which yours clearly doesn't (for instance, it's not irreflexive: check(x, x) is true whenever x.size <= d).

This problem cannot be solved in O(N log N) time. I don't know if it's NP-hard, but it's quite non-trivial. I do think it's safe to say that a program solving the problem as expressed in your code would require exponential time. There are such programs: I think it could be fiddled around and plugged into a linear optimizer.
No standard library function will get you even most of the way to a general solution. There are no standard library functions slower than O(N log N), and none solve problems that may be intractable.
Some inputs have no valid arrangement at all: for example, if every size equals 10 * d, any two adjacent sizes sum to 20 * d, which exceeds 2 * d.
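For very small inputs, exhaustive search is one concrete way to get an answer at all, in line with the exponential-time remark above. The sketch below is only an illustration: it assumes `number` is unique per item, passes `d` as a parameter instead of the question's global, and `arrange_brute_force` is a hypothetical name. It tries every ordering with std::next_permutation, so it is O(N! * N).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same layout as the question's item struct.
struct item {
    long number;
    long size;
};

// Returns true, leaving a valid arrangement in `items`, if some ordering has
// every pair of adjacent sizes summing to at most 2 * d. Exponential time.
bool arrange_brute_force(std::vector<item>& items, long d)
{
    // next_permutation itself needs a strict weak ordering; ordering by the
    // (assumed unique) `number` field is one.
    auto by_number = [](const item& a, const item& b) { return a.number < b.number; };
    std::sort(items.begin(), items.end(), by_number);
    do {
        bool ok = true;
        for (std::size_t i = 1; i < items.size(); ++i) {
            if (items[i - 1].size + items[i].size > 2 * d) {
                ok = false;
                break;
            }
        }
        if (ok) return true;
    } while (std::next_permutation(items.begin(), items.end(), by_number));
    return false;   // no ordering satisfies the condition
}
```

With sizes 5, 5, 1 and d = 3 the only valid shape is 5, 1, 5; with every size equal to 10 * d the function returns false.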

You're using sort() wrong.
std::sort is used to order a range of elements, and for an 'ordering' the comparator must be a strict weak ordering, which means satisfying conditions like:
check( A, A ) returns false.
if check( A, B ) == true, then check( B, A ) returns false.
if check( A, B ) == false AND check( B, C ) == false, then check( A, C ) returns false.
A good rule of thumb for when you can use std::sort() is: given your list of items S and the order you want the items to be in,
if the order of the items in S changes, the output order should remain the same;
the output is unique (up to items that compare equivalent);
all the items in the output order are related by a strict weak ordering.
If this is the case, then you probably can write the check function to work for you :)
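One way to see why check() fails is to test the strict-weak-ordering rules mechanically on a few sample items. The helper below (a hypothetical name, O(n^3) over the sample) brute-forces irreflexivity, asymmetry and transitivity of "not less than"; it can only find violations in the sample you give it, never prove that a comparator is valid.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct item {
    long number;
    long size;
};

// Returns false as soon as the sample exposes a violation of the
// strict-weak-ordering rules; true only means "no violation found".
bool looks_like_strict_weak_ordering(
    const std::vector<item>& sample,
    const std::function<bool(const item&, const item&)>& comp)
{
    for (const item& a : sample) {
        if (comp(a, a)) return false;                          // irreflexivity
        for (const item& b : sample) {
            if (comp(a, b) && comp(b, a)) return false;        // asymmetry
            for (const item& c : sample) {
                if (!comp(a, b) && !comp(b, c) && comp(a, c))  // "not less" must be transitive
                    return false;
            }
        }
    }
    return true;
}
```

With d = 5, check(x, x) is already true for an item of size 2, so the first rule fails, while a plain "order by size" comparator passes.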

Related

Good hash function over C++ unordered_set

I'm looking to implement a hash function over a C++ std::unordered_set<char>. I initially tried using boost::hash_range:
#include <unordered_set>
#include <boost/functional/hash.hpp>

namespace std
{
template<> struct hash<unordered_set<char> >
{
    size_t operator()(const unordered_set<char> &s) const
    {
        return boost::hash_range(begin(s), end(s));
    }
};
}
But then I realised that because the set is unordered, the iteration order isn't stable (two equal sets can iterate over their elements in different orders), so the hash function is wrong. What are some better options for me? I guess I could use std::set instead of std::unordered_set, but using an ordered set just because it's easier to hash seems ... wrong.

A very similar question, albeit in C#, was asked here:
Hash function on list independant of order of items in it
Over there, Per gave a nice language-independent answer that should put you on the right track. In short, for the input
x1, …, xn
you should map it to
f(x1) op … op f(xn)
where
f is a good hash function for single elements (integer in your case)
op is a commutative operator, such as xor or plus
Hashing an integer may seem pointless at first, but your goal is to make two neighboring integers dissimilar from each other, so that when combined with op they do not produce the same result. E.g. if you use + as the operator, you want f(1)+f(2) to give a different result than f(0)+f(3).
If standard hashing functions are not good candidates for f and you cannot find one, check the linked answer for more details...
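As a sketch of f(x1) op … op f(xn) for std::unordered_set<char>: f below is a mixing step using the SplitMix64 finalizer constants (an assumed choice; any good integer mixer would do) and op is +. `mix` and `CharSetHash` are hypothetical names, and a separate functor is used instead of specializing std::hash for a standard-library type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// f: spreads neighbouring values far apart (SplitMix64 finalizer).
inline std::uint64_t mix(std::uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// op = +, which is commutative, so the result does not depend on the
// iteration order of the set. Unsigned addition simply wraps modulo 2^64.
struct CharSetHash {
    std::size_t operator()(const std::unordered_set<char>& s) const
    {
        std::uint64_t h = 0;
        for (char c : s)
            h += mix(static_cast<unsigned char>(c));
        return static_cast<std::size_t>(h);
    }
};
```

Usage: `std::unordered_set<std::unordered_set<char>, CharSetHash>` then works as a set of sets.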

You could try simply adding the elements, which is independent of order, and returning the hash of the sum:
template<> struct hash<unordered_set<char> >
{
    size_t operator()(const unordered_set<char> &s) const
    {
        long long sum{0};
        for ( auto e : s )
            sum += e;   // addition is commutative, so iteration order doesn't matter
        return std::hash<long long>{}(sum);
    }
};

Error:"invalid comparator" when sorting using custom comparison function

I am trying to sort some integers so that odd integers come before even ones. I am using Visual Studio 2015.
Here's my code:
int w[] = {1, 2, 3, 4, 5, 6};
sort(w, w + 6, [](const int &i, const int &j) -> bool {
    return (i & 1) == (j & 1) // when both are odd or both are even, the order is OK
        || i & 1;             // if one is odd and one is even, check whether the first one is odd
});
When executed, it fails with the error "Expression: invalid comparator". I don't know why it causes this error. How should I modify it?

sort requires a strict weak ordering. Your comparator isn't one. Among many other things, for a strict weak ordering, comp(x, x) must be false.
sort is the wrong algorithm for this anyway (yes, you can contort it to do what you want; no, you shouldn't do it). What you want to do is a partition. For that, we have std::partition:
std::partition(std::begin(w), std::end(w), [](int x) { return x % 2 != 0; });
Or std::stable_partition, if you want the partition to be stable (preserve the relative order of elements).
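A self-contained version of the stable variant using the question's numbers; `odds_first` is a hypothetical name. Note that x % 2 != 0 also treats negative odd numbers as odd.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Moves odd values in front of even ones; stable_partition keeps the original
// relative order inside each group, while plain std::partition may not.
std::vector<int> odds_first(std::vector<int> v)
{
    std::stable_partition(v.begin(), v.end(), [](int x) { return x % 2 != 0; });
    return v;
}
```

For {1, 2, 3, 4, 5, 6} this yields {1, 3, 5, 2, 4, 6}.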

Tolerant key lookup in std::map

Requirements:
container which sorts itself based on numerically comparing the keys (e.g. std::map)
check existence of key based on float tolerance (e.g. map.find() and use custom comparator )
and the tricky one: the float tolerance used by the comparator may be changed by the user at runtime!
The first 2 can be accomplished using a map with a custom comparator:
#include <cmath>
#include <map>

struct floatCompare
{
    bool operator()( const float &left, const float &right ) const
    {
        return (std::fabs(left - right) > 1e-3) && (left < right);
    }
};
typedef std::map< float, float, floatCompare > floatMap;
Using this implementation, calling find( 15.0001 ) on a floatMap will find 15.0 in the map.
However, let's say the user doesn't want a float tolerance of 1e-3.
What is the easiest way to make this comparator function use a variable tolerance at runtime? I don't mind re-creating and re-sorting the map based on the new comparator each time epsilon is updated.
Other posts on modification after initialization here and using floats as keys here didn't provide a complete solution.

You can't change the ordering of the map after it's created (and you should just use plain old operator< even for the floating-point type here), and you can't even use a "tolerant" comparison operator, as that may violate the strict weak ordering that map requires to maintain its state.
However you can do the tolerant search with lower_bound and upper_bound. The gist is that you would create a wrapper function much like equal_range that does a lower_bound for "value - tolerance" and then an upper_bound for "value + tolerance" and see if it creates a non-empty range of values that match the criteria.
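A minimal sketch of that wrapper, assuming a map with the default ordering and a non-negative tolerance; `tolerant_equal_range` and `contains_within` are hypothetical names, not library functions. Because the tolerance is just an argument, it can change on every call without touching the map.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Like equal_range, but covers every key k with
// value - tolerance <= k <= value + tolerance.
template <typename Map, typename K>
std::pair<typename Map::const_iterator, typename Map::const_iterator>
tolerant_equal_range(const Map& m, const K& value, const K& tolerance)
{
    assert(tolerance >= K(0));   // a negative tolerance would give an invalid range
    return {m.lower_bound(value - tolerance), m.upper_bound(value + tolerance)};
}

// The key exists within tolerance iff that range is non-empty.
template <typename Map, typename K>
bool contains_within(const Map& m, const K& value, const K& tolerance)
{
    auto r = tolerant_equal_range(m, value, tolerance);
    return r.first != r.second;
}
```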

You cannot change the definition of how elements are ordered in a map once it's been instantiated. If you were to find some technical hack to do so (such as implementing a custom comparator that takes a tolerance that can change at runtime), it would invoke undefined behavior.
Your main alternative to changing the ordering is to create another map with a different ordering scheme. This other map could be an indexing map, where the keys are ordered in a different way, and the values aren't the elements themselves, but an index into the main map.
Alternatively maybe what you're really trying to do isn't change the ordering, but maintain the ordering and change the search parameters.
That you can do, and there are a few ways to do it.
One is to simply use map::lower_bound: once with the lower bound of your tolerance, and once with the upper bound of your tolerance, just past the end of the tolerance. For example, if you want to find 15.0 with a tolerance of 1e-5, you could call lower_bound with 14.99999 and then again with 15.00001 to find the elements in that range.
Another is to use std::find_if with a custom functor, lambda, or std::function. You could declare the functor in such a way as to take the tolerance and the value at construction, and perform the check in operator().
Since this is a homework question, I'll leave the fiddly details of actually implementing all this up to you. :)
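For illustration, the find_if variant mentioned above might look like this. The functor name `WithinTolerance` is hypothetical; it stores the value and tolerance at construction and checks each map entry in operator(). Unlike lower_bound, find_if scans linearly, O(n).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>

// Matches any map entry whose key is within `tolerance` of `value`.
struct WithinTolerance {
    double value;
    double tolerance;

    template <typename Pair>
    bool operator()(const Pair& entry) const
    {
        return std::fabs(entry.first - value) <= tolerance;
    }
};
```

Usage: `std::find_if(m.begin(), m.end(), WithinTolerance{15.0, 1e-5})` returns the first matching entry or `m.end()`.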

Rather than using a comparator with tolerance, which is going to fail in subtle ways, just use a consistent key that is derived from the floating point value. Make your floating point values consistent using rounding.
#include <cmath>

inline double key(double d)
{
    return std::floor(d * 1000.0 + 0.5);
}
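A sketch of how that key might be used: entries are stored under the rounded key, so the map only ever compares exact doubles with its default ordering. The helper names are hypothetical, and values on either side of a rounding boundary (e.g. 15.0004 and 15.0006) still get different keys.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// key() as above, repeated so the sketch compiles on its own.
inline double key(double d)
{
    return std::floor(d * 1000.0 + 0.5);
}

using RoundedMap = std::map<double, double>;

// Store a value under the rounded key.
inline void put_rounded(RoundedMap& m, double k, double value)
{
    m[key(k)] = value;
}

// Returns a pointer to the stored value, or nullptr if no entry rounds to the same key.
inline const double* get_rounded(const RoundedMap& m, double k)
{
    auto it = m.find(key(k));
    return it == m.end() ? nullptr : &it->second;
}
```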

You can't achieve that with a simple custom comparator, even if it were possible to change it after the definition, or when re-sorting using a new comparator. The fact is: a "tolerant comparator" is not really a comparator. For three values, it's possible that a < c (the difference is large enough) but neither a < b nor b < c (both differences are too small). Example: a = 5.0, b = 5.5, c = 6.0, tolerance = 0.6.
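The example values can be checked directly with a comparator in the style of the question's floatCompare, here with the tolerance as a parameter (a hypothetical variant). A strict weak ordering requires "equivalent" (neither is less) to be transitive, and this one is not.

```cpp
#include <cassert>
#include <cmath>

// a counts as less than b only if b exceeds it by more than tol.
inline bool tolerant_less(double a, double b, double tol)
{
    return std::fabs(a - b) > tol && a < b;
}

// Equivalent under the comparator: neither value is less than the other.
inline bool tolerant_equiv(double a, double b, double tol)
{
    return !tolerant_less(a, b, tol) && !tolerant_less(b, a, tol);
}
```

With tol = 0.6, 5.0 is equivalent to 5.5 and 5.5 is equivalent to 6.0, yet 5.0 is less than 6.0.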
What you should do instead is to use default sorting using operator< for floats, i.e. simply don't provide any custom comparator. Then, for the lookup don't use find but rather lower_bound and upper_bound with modified values according to the tolerance. These two function calls will give you two iterators which define the sequence which will be accepted using this tolerance. If this sequence is empty, the key was not found, obviously.
You then might want to get the key which is closest to the value to be searched for. If so, find the min_element of this subsequence, using a comparator which considers the difference between the key and the value to be searched for.
#include <algorithm>
#include <cmath>

template<typename Map, typename K>
auto tolerant_find(const Map & map, const K & lookup, const K & tolerance) -> decltype(map.begin()) {
    // First, find the sub-sequence of keys "near" the lookup value
    auto first = map.lower_bound(lookup - tolerance);
    auto last = map.upper_bound(lookup + tolerance);
    // If they are equal, the sequence is empty, and thus no entry was found.
    // Return the end iterator to be consistent with std::find.
    if (first == last) {
        return map.end();
    }
    // Then, find the one with the minimum distance to the actual lookup value
    typedef typename Map::value_type V;   // std::pair<const Key, T>
    return std::min_element(first, last, [&lookup](const V & a, const V & b) {
        return std::abs(a.first - lookup) < std::abs(b.first - lookup);
    });
}
Demo: http://ideone.com/qT3JIa

It may be better to leave the std::map class alone (well, partly at least), and just write your own class which implements the three methods you mentioned.
#include <map>

template<typename T>
class myMap{
private:
    float tolerance = 1e-3f;
    std::map<float,T> storage;
public:
    void setTolerance(float t){ tolerance = t; }
    typename std::map<float,T>::iterator find(float val); // e.g. same as you provided, just replace 1e-3 with tolerance
    /* other methods go here */
};
That being said, I don't think you need to recreate the container and sort it depending on the tolerance.
check existence of key based on float tolerance
merely means you have to check whether an element exists. The position of the elements inside the map shouldn't change. You could start the search from val-tolerance (for example with lower_bound, which returns an iterator), and when you find an element, get the next elements until you reach the end of the map or until their keys exceed val+tolerance.
That basically means that the behavior of the insert/add/[]/whatever functions isn't based on the tolerance, so there's no real problem storing the values.
If you're afraid the elements will be too close to each other, you may want to start the search from val and then gradually increase the tolerance until it reaches the one the user wants.
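A minimal sketch of such a wrapper, keeping the answer's names myMap and setTolerance; the members operator[] and end() and the lower_bound-based find are assumptions filled in for illustration. The underlying std::map keeps its default ordering, so changing the tolerance needs no re-sorting.

```cpp
#include <cassert>
#include <map>

template <typename T>
class myMap {
private:
    float tolerance = 1e-3f;            // assumed default, matching the question
    std::map<float, T> storage;         // ordered with plain operator<
public:
    using iterator = typename std::map<float, T>::iterator;

    void setTolerance(float t) { tolerance = t; }
    T& operator[](float key) { return storage[key]; }
    iterator end() { return storage.end(); }

    // First element whose key lies in [val - tolerance, val + tolerance], or end().
    iterator find(float val)
    {
        iterator it = storage.lower_bound(val - tolerance);
        if (it != storage.end() && it->first <= val + tolerance)
            return it;
        return storage.end();
    }
};
```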

C++ Array Intersection

Does anyone know if it's possible to turn this from O(m * n) to O(m + n)?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theFirst.push_back(1);
theFirst.push_back(22);
theFirst.push_back(1);
theSecond.push_back(1);
theSecond.push_back( -2147483648 );
theSecond.push_back(3);
theSecond.push_back(44);
theSecond.push_back(32);
theSecond.push_back(1);
for( int i = 0; i < theFirst.size(); i++ )
{
    for( int x = 0; x < theSecond.size(); x++ )
    {
        if( theFirst[i] == theSecond[x] )
        {
            theMatch.push_back( theFirst[i] );
        }
    }
}

Put the contents of the first vector into a hash set, such as std::unordered_set. That is O(m). Scan the second vector, checking whether each value is in the unordered_set and keeping a tally of those that are. That is n lookups in a hash structure, so O(n). So, O(m+n) on average. If you have l elements in the overlap, you may count O(l) for adding them to the third vector. std::unordered_set was in the C++0x draft (it is standard since C++11), and there is also an implementation in Boost.
Using C++11 syntax:
unordered_set<int> firstMap(theFirst.begin(), theFirst.end());
for (const int& i : theSecond) {
    if (firstMap.find(i) != firstMap.end()) {
        cout << "Duplicate: " << i << endl;
        theMatch.push_back(i);
    }
}
Now, the question still remains, what do you want to do with duplicates in the originals? Explicitly, how many times should 1 be in theMatch, 1, 2 or 4 times?
This outputs:
Duplicate: 1
Duplicate: -2147483648
Duplicate: 44
Duplicate: 1

Using this: http://www.cplusplus.com/reference/algorithm/set_intersection/
You should be able to achieve O(m log m + n log n) this way: sort both vectors first (set_intersection requires the input ranges to be sorted); the intersection itself is then linear.
This might perform a bit differently than your solution for duplicate elements, however.
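A sketch of the sort-then-set_intersection approach; `intersect_sorted` is a hypothetical name. On duplicates, each value appears min(count in first, count in second) times, which is where it differs from the nested loops in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sorts copies of both inputs (O(m log m + n log n)), then merges them with
// std::set_intersection (linear). The result comes out sorted.
std::vector<int> intersect_sorted(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

With the question's data, 1 appears twice in the result rather than four times.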

Please correct me if I am wrong:
you are suggesting the following solution for the intersection problem:
sort the two vectors, then iterate through both sorted vectors in step, always advancing the one with the smaller current element, so that you stop at each common element.
So the overall complexity will be
(n*log(n) + m*log(m)) + (n + m)
assuming k*log(k) as the complexity of sorting.
Am I right?
Of course, the complexity will depend on the complexity of sorting.

I would sort the longer array O(n*log (n)), search for elements from the shorter array O(m*log (n)). Total is then O(n*log(n) + m*log (n) )
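A sketch of that variant, with `intersect_binary_search` as a hypothetical name: only the longer input is sorted, and each element of the shorter one is looked up with std::binary_search. Each element of the shorter input that also occurs in the longer one is reported once per occurrence in the shorter input, in the shorter input's order.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// O(n log n) to sort the longer input plus O(m log n) for the lookups.
std::vector<int> intersect_binary_search(std::vector<int> a, std::vector<int> b)
{
    if (a.size() < b.size()) std::swap(a, b);   // make `a` the longer one
    std::sort(a.begin(), a.end());
    std::vector<int> out;
    for (int x : b)
        if (std::binary_search(a.begin(), a.end(), x))
            out.push_back(x);
    return out;
}
```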

Assuming you want to produce theMatch from two data sets, and you don't care about the data sets themselves, put one in an unordered_map (available currently from Boost and listed in the final committee draft for C++11), mapping the key to an integer that increases whenever added to, and therefore keeps track of the number of times the key occurs. Then, when you get a hit on the other data set, you push_back the hit the number of times it occurred in the first data set.
You can get to O(n log n + m log m) by sorting the vectors first, or O(n log n + m log n) by creating a std::map of one of them (each std::map lookup is O(log n)).
Caveat: these are not order-preserving operations, and theMatch will come out in different orders with different techniques. It looks to me like the order is likely considered arbitrary. If the order given in the code above is necessary, I don't think there's a better algorithm.
Take data set A and data set B, of type Type. Create an unordered_map<Type, int>.
Go through data set A, and check each member to see if it's in the map. If not, add the element with the int 1 to the map. If it is, increment the int. Each of these operations is O(1) on the average, so this step is O(len A).
Go through data set B, and check each member to see if it's in the map. If not, go on to the next. If so, push_back the member onto the destination vector. The int is the number of times that value is in data set A, so do the push_back that many times to duplicate the behavior given. Each of these operations is on average O(1), so this step is O(len B).
This is average behavior. If you always hit the worst case, you're back with O(m*n). I don't think there's a way to guarantee O(m + n).
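The two steps above can be sketched as follows; `match_like_nested_loops` is a hypothetical name. As in the question's nested loops, a value occurring p times in A and q times in B ends up p * q times in the result, although in a different order.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

template <typename Type>
std::vector<Type> match_like_nested_loops(const std::vector<Type>& A,
                                          const std::vector<Type>& B)
{
    // Step 1: count occurrences in A (average O(len A)).
    std::unordered_map<Type, int> count;
    for (const Type& x : A)
        ++count[x];                 // a new key starts at 0, then is incremented
    // Step 2: for each hit in B, push it once per occurrence in A.
    std::vector<Type> match;
    for (const Type& x : B) {
        auto it = count.find(x);
        if (it != count.end())
            match.insert(match.end(), static_cast<std::size_t>(it->second), x);
    }
    return match;
}
```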

If the order of the elements in the resulting array/set doesn't matter, then the answer is yes.
For arbitrary element types with an ordering defined, the best algorithm is O( max(m,n)*log(min(m,n)) ). For integers of limited range, the best algorithm is O(m+n).
Construct the set of elements of the smaller array: for arbitrary elements just sorting it is OK, and for integers of limited range it should be something like the counting table used in counting sort.
Iterate through the larger array and check whether each element is in the set constructed earlier: for arbitrary elements binary search is OK (which is O(log(min(n,m)))), and for integers a single table lookup is O(1).
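For the limited-range case, a sketch assuming every value lies in [0, max_value] (`intersect_small_range` and `max_value` are hypothetical). Building and checking the table is O(m + n), plus O(max_value) to allocate it, which is a constant once the range is fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> intersect_small_range(const std::vector<int>& small,
                                       const std::vector<int>& large,
                                       int max_value)
{
    // Table indexed by value, as in counting sort.
    std::vector<bool> present(static_cast<std::size_t>(max_value) + 1, false);
    for (int x : small) {
        assert(x >= 0 && x <= max_value);   // precondition of this sketch
        present[x] = true;
    }
    std::vector<int> out;
    for (int x : large) {
        assert(x >= 0 && x <= max_value);
        if (present[x]) out.push_back(x);   // O(1) check per element
    }
    return out;
}
```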

Difference between two vector<MyType*> A and B

I've got two vector<MyType*> objects called A and B. The MyType class has a field ID and I want to get the MyType* which are in A but not in B. I'm working on an image analysis application and I was hoping to find a fast/optimized solution.

The unordered approach will typically have quadratic complexity unless the data is sorted beforehand (by your ID field), in which case it would be linear and would not require repeated searches through B.
struct CompareId
{
    bool operator()(const MyType* a, const MyType* b) const
    {
        return a->ID < b->ID;
    }
};
...
sort(A.begin(), A.end(), CompareId() );
sort(B.begin(), B.end(), CompareId() );
vector<MyType*> C;
set_difference(A.begin(), A.end(), B.begin(), B.end(), back_inserter(C) );
Another solution is to use an ordered container like std::set with CompareId used for the StrictWeakOrdering template argument. I think this would be better if you need to apply a lot of set operations. That has its own overhead (being a tree) but if you really find that to be an efficiency problem, you could implement a fast memory allocator to insert and remove elements super fast (note: only do this if you profile and determine this to be a bottleneck).
Warning: getting into somewhat complicated territory.
There is another solution you can consider which could be very fast if applicable and you never have to worry about sorting data. Basically, make any group of MyType objects which share the same ID store a shared counter (ex: pointer to unsigned int).
This will require creating a map of IDs to counters and require fetching the counter from the map each time a MyType object is created based on its ID. Since you have MyType objects with duplicate IDs, you shouldn't have to insert to the map as often as you create MyType objects (most can probably just fetch an existing counter).
In addition to this, have a global 'traversal' counter which gets incremented whenever it's fetched.
static unsigned int counter = 0;
unsigned int traversal_counter()
{
    // make this atomic for multithreaded applications and
    // needs to be modified to set all existing ID-associated
    // counters to 0 on overflow (see below)
    return ++counter;
}
Now let's go back to where you have A and B vectors storing MyType*. To fetch the elements in A that are not in B, we first call traversal_counter(). Assuming it's the first time we call it, that will give us a traversal value of 1.
Now iterate through every MyType* object in B and set the shared counter for each object from 0 to the traversal value, 1.
Now iterate through every MyType* object in A. The ones that have a counter value which doesn't match the current traversal value(1) are the elements in A that are not contained in B.
What happens when you overflow the traversal counter? In this case, we iterate through all the counters stored in the ID map and set them back to zero along with the traversal counter itself. This will only need to occur once in about 4 billion traversals if it's a 32-bit unsigned int.
This is about the fastest solution you can apply to your given problem. It can do any set operation in linear complexity on unsorted data (and always, not just in best-case scenarios like a hash table), but it does introduce some complexity so only consider it if you really need it.
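A single-threaded sketch of the shared-counter scheme described above. All names are hypothetical: MyType is reduced to its ID plus a pointer to the counter shared by every object with that ID, and the registry holding the ID-to-counter map must outlive those objects.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <vector>

struct MyType {
    int ID;
    unsigned* mark;   // points into MarkRegistry::marks
};

class MarkRegistry {
    std::map<int, unsigned> marks;   // ID -> shared counter (map nodes never move)
    unsigned traversal = 0;
public:
    // Create an object, fetching (or creating, starting at 0) the counter for its ID.
    MyType make(int id) { return MyType{id, &marks[id]}; }

    // Global traversal counter; on overflow, reset every ID counter to 0 first.
    unsigned next_traversal()
    {
        if (traversal == UINT_MAX) {
            for (auto& kv : marks) kv.second = 0;
            traversal = 0;
        }
        return ++traversal;
    }
};

// Elements of A whose ID does not occur in B, in O(|A| + |B|) on unsorted input.
std::vector<MyType*> in_a_not_in_b(MarkRegistry& reg,
                                   const std::vector<MyType*>& A,
                                   const std::vector<MyType*>& B)
{
    const unsigned t = reg.next_traversal();
    for (MyType* b : B) *b->mark = t;          // mark every ID present in B
    std::vector<MyType*> out;
    for (MyType* a : A)
        if (*a->mark != t) out.push_back(a);   // not marked in this traversal
    return out;
}
```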

Sort both vectors (std::sort) according to ID and then use std::set_difference. You will need to define a custom comparator to pass to both of these algorithms, for example
struct comp
{
    bool operator()(const MyType * lhs, const MyType * rhs) const
    {
        return lhs->ID < rhs->ID;
    }
};

First look at the problem. You want "everything in A not in B". That means you're going to have to visit everything in A. You'll also have to visit everything in B to know what is and is not in B. So that suggests there should be an O(n + m) solution, or, taking the liberty of eliding the difference between n and m, O(2n).
Let's consider the std::set_difference approach. Each sort is O(n log n), and set_difference is O(n). So the sort-sort-set_difference approach takes about n + 2n log n steps, which is O(n log n).
Another approach would be to first place the elements of B in a set (or map). Iterating across B to create the set is O(n) plus an O(log n) insertion for each element, followed by iteration across A, O(n), with an O(log n) lookup for each element of A, giving about 2n log n steps in total: still O(n log n), but slightly less work than sorting both vectors.
Finally, using an unordered_set (or unordered_map), and assuming we get the average case of O(1) insertion and O(1) lookup, we have an approach that is O(2n), i.e. linear. A-ha!
The real win here is that unordered_set (or map) is probably the most natural choice to represent your data in the first place, i.e., the proper design yields the optimized implementation. That doesn't always happen, but it's nice when it does!
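A sketch of the unordered_set approach, assuming only the ID field matters for membership; MyType is reduced to that field and `a_minus_b` is a hypothetical name. It runs in average O(|A| + |B|) and preserves the relative order of A.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Minimal stand-in for the question's MyType.
struct MyType {
    int ID;
};

std::vector<MyType*> a_minus_b(const std::vector<MyType*>& A,
                               const std::vector<MyType*>& B)
{
    std::unordered_set<int> idsInB;              // hash every ID in B once
    for (const MyType* b : B) idsInB.insert(b->ID);
    std::vector<MyType*> out;
    for (MyType* a : A)
        if (idsInB.count(a->ID) == 0)            // average O(1) lookup
            out.push_back(a);
    return out;
}
```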

If B exists before A, then while populating A you can record the elements that are not in B in a separate vector C.
