STL "closest" method?


I'm looking for an STL sort that returns the element "closest" to the target value if the exact value is not present in the container. It needs to be fast, so essentially I'm looking for a slightly modified binary search... I could write it, but it seems like something that should already exist...

Do you mean the lower_bound/upper_bound functions? These perform a binary search and return the first element at or above (lower_bound) or strictly above (upper_bound) the value you're looking for.
Clarification: The global versions of lower/upper_bound only work if the range is sorted, as they use a binary search internally. (The lower_bound/upper_bound member functions of std::map always work, since a map keeps its keys sorted.) You said in your question that you were looking for some kind of binary search, so I'll assume the range is sorted.
Also, neither lower_bound nor upper_bound returns the closest member. If the value X you're looking for isn't a member of the range, both return the first element greater than X. Otherwise, lower_bound returns the first element equal to X, and upper_bound returns the first element after the last one equal to X (that is, the first element greater than X).
So to find the closest value, you'd have to:
call lower_bound
if it returns the end of the range, all values are less than X. The last (i.e. the highest) element is the closest one
if it returns the beginning of the range, no value is less than X. The first (i.e. the lowest) element is the closest one
if it returns an element in the middle of the range, check that element and the element before it; whichever is closer to X is the one you're looking for

So you're looking for an element which has a minimal distance from some value k?
Use std::transform to transform each x to x-k. Then use std::min_element with a comparison function which returns abs(l) < abs(r). Then add k back onto the result.
EDIT: Alternatively, you could just use std::min_element with a comparison function abs(l-k) < abs(r-k), and eliminate the std::transform.
EDIT2: This is good for unsorted containers. For sorted containers, you probably want nikie's answer.
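A minimal sketch of the min_element variant from the first EDIT, for unsorted data. The name closest_unsorted is made up for illustration, and the sketch assumes a signed arithmetic element type so that l - target cannot wrap around:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Returns an iterator to the element of [first, last) nearest to `target`,
// or `last` if the range is empty. Linear scan, so the data need not be sorted.
// Assumption: the element type is a signed arithmetic type.
template <class Iter, class T>
Iter closest_unsorted(Iter first, Iter last, const T& target)
{
    return std::min_element(first, last,
        [&target](const T& l, const T& r) {
            return std::abs(l - target) < std::abs(r - target);
        });
}

// Usage sketch with made-up data.
inline void example_closest_unsorted()
{
    const std::vector<int> v{40, 7, 19, 3};
    // Distances to 17 are 23, 10, 2 and 14, so 19 is the closest.
    assert(*closest_unsorted(v.begin(), v.end(), 17) == 19);
}
```

On a tie, min_element returns the first of the equally close elements. This is O(n), which is why the binary-search approaches are preferable when the data is already sorted.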

If the container is already sorted (as implied) you should be able to use std::upper_bound and the item directly before to figure out which is closest:
// Untested.
template <class Iter, class T>
Iter closest_value(Iter begin, Iter end, T value)
{
    Iter result = std::upper_bound(begin, end, value);
    if (result != begin)
    {
        Iter lower_result = result;
        --lower_result;
        if (result == end || ((value - *lower_result) < (*result - value)))
        {
            result = lower_result;
        }
    }
    return result;
}
If the container is not sorted, use min_element with a predicate as already suggested.

If your data is not sorted, use std::min_element with a comparison functor that calculates your distance.

Related

For loop exit condition with map iterator

I have a std::map<str,int> my_map
Right now, the key-value mapping looks like this -
{["apple",3],["addition",2],["app",7],["adapt",8]}
Objective:
Calculate the sum of values of keys with a given prefix.
Example : sum("ap") should return 10 (3 + 7).
I could implement it with two loops and an if condition. But, I'm trying to understand the following code that's submitted by someone to implement this.
for (auto it = my_map.lower_bound(prefix);
it != my_map.end() && it->first.substr(0, n) == prefix;
it++)
Won't the loop condition become false in the middle of iterating through my_map, and hence produce an incorrect sum?
I don't know how the code is able to give the right result. Why wouldn't the loop exit when it gets to key "addition" while looking for prefix "ap"?
Any kind of help is appreciated.

The loop is completely correct, but not so readable at first sight.
We have std::map, which is an associative container kept sorted according to the compare function provided. Your map (i.e. std::map<std::string, int>) is sorted by the std::string key.
So your map is already ordered like :
{["adapt",8], ["addition",2], ....., ["app",7], ["apple",3], .... }
Now let's look at lower_bound. The description of std::lower_bound (the map member function behaves the same way on keys) reads:
Returns an iterator pointing to the first element in the range [first, last) that is not less than (i.e. greater or equal to) value, or last if no such element is found.
Meaning at the loop start:
auto it = my_map.lower_bound(prefix);
iterator it is pointing to the map entry ["app",7]. In other words, the iteration starts from the first key that could possibly match the prefix.
["app",7], ["apple",3], ....
Now the condition comes in to play:
it != my_map.end() && it->first.substr(0, n) == prefix;
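The first part checks whether the iterator is valid (i.e. it != my_map.end()).
The second part checks whether the key starts with the prefix (i.e. it->first.substr(0, n) == prefix, where n is presumably the length of the prefix). Because the keys are sorted, all keys sharing a prefix are adjacent, and lower_bound positions the iterator at the first of them. Keys such as "addition" sort before "ap", so they are never visited; the loop stops at the first key that no longer starts with the prefix, and the sum is correct.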
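To make the explanation concrete, here is a small self-contained sketch of the whole function around that loop. The name sum_with_prefix is hypothetical, and it assumes n in the question is prefix.size(); it uses compare instead of substr only to avoid building a temporary string:

```cpp
#include <cassert>
#include <map>
#include <string>

// Sums the values of all keys that start with `prefix`.
int sum_with_prefix(const std::map<std::string, int>& m, const std::string& prefix)
{
    int total = 0;
    // lower_bound jumps to the first key >= prefix; keys that share the
    // prefix are contiguous from there, so stop at the first mismatch.
    for (auto it = m.lower_bound(prefix);
         it != m.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it)
    {
        total += it->second;
    }
    return total;
}

// Usage sketch with the data from the question.
inline void example_sum_with_prefix()
{
    const std::map<std::string, int> my_map{
        {"apple", 3}, {"addition", 2}, {"app", 7}, {"adapt", 8}};
    assert(sum_with_prefix(my_map, "ap") == 10);  // "app" (7) + "apple" (3)
}
```

The loop never sees "adapt" or "addition" for the prefix "ap", because both sort before the starting position returned by lower_bound.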

A fast algorithm for sorting and shuffling equal valued entries (preferably using the STL)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine appears also in other places): It could be called totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(output order is irrelevant to me, i.e. equal values completely among the best k need not be shuffled. To even have all equal values shuffled is however a related, interesting question and would suffice!)
A very (!) inefficient way would be to use shuffle_randomly and then partial_sort, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (or all blocks of equally valued entries; both are much faster). Maybe that observation is where to start...
I would very much prefer a solution built from STL algorithms (or at least largely so), because they're usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!

You want to partial_sort first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.

I don't fully understand your issue, but if it were me solving it (if I am reading it correctly) ...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into an STL list but overload the comparison operator to create a B-Tree, and if two items are equal on insert randomly choose to insert it before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)

If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
            RandomIterator nth,
            RandomIterator limit,
            Compare cmp) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    std::nth_element(first, nth, limit, cmp);
    auto p = std::partition(first, nth, [&](ref a){return cmp(a, *nth);});
    auto q = std::partition(nth + 1, limit, [&](ref a){return !cmp(*nth, a);});
    std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
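For what it's worth, the std::shuffle variant mentioned in the note might look roughly like this. It is a sketch along the same lines rather than tested production code; the name best_n_shuffled and the extra rng parameter are additions for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <utility>
#include <vector>

// Moves the best elements (per `cmp`) into [first, nth], picking at random
// among the elements equivalent to the one that lands at `nth`.
// Requires first <= nth < limit, exactly like best_n above.
template <typename RandomIt, typename Compare, typename URBG>
void best_n_shuffled(RandomIt first, RandomIt nth, RandomIt limit,
                     Compare cmp, URBG&& rng)
{
    assert(nth != limit);  // *nth must be dereferenceable
    using ref = typename std::iterator_traits<RandomIt>::reference;
    std::nth_element(first, nth, limit, cmp);
    // Push elements equivalent to *nth to the end of [first, nth) ...
    auto p = std::partition(first, nth, [&](ref a) { return cmp(a, *nth); });
    // ... and to the front of (nth, limit).
    auto q = std::partition(std::next(nth), limit,
                            [&](ref a) { return !cmp(*nth, a); });
    // [p, q) now holds every element equivalent to *nth; shuffle that block.
    std::shuffle(p, q, std::forward<URBG>(rng));
}
```

A call could look like best_n_shuffled(v.begin(), v.begin() + (k - 1), v.end(), cmp, rng) with a std::mt19937 rng seeded from std::random_device, again requiring 0 < k <= n.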

If you don't mind sorting the whole list, there is a simple answer. Randomize the result in your comparator for equivalent elements.
std::sort(validLocations.begin(), validLocations.end(),
    [&](const Point& i_point1, const Point& i_point2)
    {
        if (i_point1.mX == i_point2.mX)
        {
            return Rand(1.0f) < 0.5;
        }
        else
        {
            return i_point1.mX < i_point2.mX;
        }
    });
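One caveat: std::sort requires its comparator to be a strict weak ordering, and a comparator that answers randomly for equal elements is not one, so the snippet above has undefined behavior. A sketch of one way to get the same effect safely is to shuffle first and then use std::stable_sort, which keeps the (now random) relative order of equal elements. The Point struct here is a hypothetical stand-in, since the original type is not shown; only mX comes from the answer:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Point type used in the answer.
struct Point
{
    float mX;
    float mY;
};

// Sorts by mX; points with equal mX end up in random relative order.
template <typename URBG>
void sort_by_x_random_ties(std::vector<Point>& pts, URBG&& rng)
{
    // Randomize the order of everything first ...
    std::shuffle(pts.begin(), pts.end(), std::forward<URBG>(rng));
    // ... then sort stably, so ties keep their shuffled order.
    std::stable_sort(pts.begin(), pts.end(),
                     [](const Point& a, const Point& b) { return a.mX < b.mX; });
}

// Usage sketch: only the ordering by mX is deterministic.
inline void example_sort_by_x_random_ties()
{
    std::vector<Point> pts{{2.f, 0.f}, {1.f, 0.f}, {2.f, 1.f}, {1.f, 1.f}};
    std::mt19937 rng(std::random_device{}());
    sort_by_x_random_ties(pts, rng);
    assert(pts[0].mX == 1.f && pts[1].mX == 1.f);
    assert(pts[2].mX == 2.f && pts[3].mX == 2.f);
}
```

This still sorts the whole container, so it is O(n log n) rather than the O(n) of the nth_element approach.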

C++ get index of element of array by value

So far, I have been storing the array in a vector and then looping through the vector to find the matching element and then returning the index.
Is there a faster way to do this in C++? The STL structure I use to store the array doesn't really matter to me (it doesn't have to be a vector). My array is also unique (no repeating elements) and ordered (e.g. a list of dates going forward in time).

Since the elements are sorted, you can use a binary search to find the matching element. The C++ Standard Library has a std::lower_bound algorithm that can be used for this purpose. I would recommend wrapping it in your own binary search algorithm, for clarity and simplicity:
/// Performs a binary search for an element
///
/// The range `[first, last)` must be ordered via `comparer`. If `value` is
/// found in the range, an iterator to the first element comparing equal to
/// `value` will be returned; if `value` is not found in the range, `last` is
/// returned.
template <typename RandomAccessIterator, typename Value, typename Comparer>
auto binary_search(RandomAccessIterator const first,
                   RandomAccessIterator const last,
                   Value const& value,
                   Comparer comparer) -> RandomAccessIterator
{
    RandomAccessIterator it(std::lower_bound(first, last, value, comparer));
    if (it == last || comparer(*it, value) || comparer(value, *it))
        return last;
    return it;
}
(The C++ Standard Library has a std::binary_search, but it returns a bool: true if the range contains the element, false otherwise. It's not useful if you want an iterator to the element.)
Once you have an iterator to the element, you can use std::distance to compute the index of the element in the range.
Both of these functions work equally well with any random access sequence, including both std::vector and ordinary arrays.
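Putting the two steps together, a compact sketch for a sorted std::vector could look like this. The name index_of_sorted is hypothetical, it uses operator< rather than a custom comparer, and it returns v.size() as a "not found" marker, which is just one possible convention:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the index of `value` in the sorted vector `v`,
// or v.size() if `value` is not present.
template <typename T>
std::size_t index_of_sorted(const std::vector<T>& v, const T& value)
{
    auto it = std::lower_bound(v.begin(), v.end(), value);
    if (it == v.end() || value < *it)
        return v.size();  // lower_bound found only a larger element (or nothing)
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}

// Usage sketch with made-up data.
inline void example_index_of_sorted()
{
    const std::vector<int> v{10, 20, 30, 40};
    assert(index_of_sorted(v, 30) == 2);
    assert(index_of_sorted(v, 25) == v.size());  // not present
}
```

Each lookup is O(log n), compared with O(n) for the linear scan described in the question.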

If you want to associate a value with an index and find the index quickly you can use std::map or std::unordered_map. You can also combine these with other data structures (e.g. a std::list or std::vector) depending on the other operations you want to perform on the data.
For example, when creating the vector we also create a lookup table:
vector<int> test(test_size);
unordered_map<int, size_t> lookup;
int value = 0;
for(size_t index = 0; index < test_size; ++index)
{
    test[index] = value;
    lookup[value] = index;
    value += rand()%100+1;
}
Now to look up the index you simply:
size_t index = lookup[find_value];
Using a hash table based data structure (e.g. the unordered_map) is a fairly classical space/time tradeoff and can outperform doing a binary search for this sort of "reverse" lookup operation when you need to do a lot of lookups. The other advantage is that it also works when the vector is unsorted.
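One thing to watch with lookup[find_value]: operator[] inserts a default entry (index 0) when the value is missing, so a failed lookup looks like a hit on element 0. A sketch that uses find() instead is below; build_lookup, find_index and not_found are names invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sentinel returned when the value is not in the table (an arbitrary choice).
const std::size_t not_found = static_cast<std::size_t>(-1);

// Builds the value -> index table for a vector with unique elements.
std::unordered_map<int, std::size_t> build_lookup(const std::vector<int>& v)
{
    std::unordered_map<int, std::size_t> lookup;
    lookup.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        lookup[v[i]] = i;
    return lookup;
}

// Looks up the index without modifying the table.
std::size_t find_index(const std::unordered_map<int, std::size_t>& lookup, int value)
{
    auto it = lookup.find(value);
    return it == lookup.end() ? not_found : it->second;
}

// Usage sketch with made-up data.
inline void example_find_index()
{
    const std::vector<int> test{3, 17, 42, 99};
    const auto lookup = build_lookup(test);
    assert(find_index(lookup, 42) == 2);
    assert(find_index(lookup, 5) == not_found);
}
```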
For fun :-) I've done a quick benchmark in VS2012RC comparing James' binary search code with a linear search and with using unordered_map for lookup, all on a vector:
Up to ~50000 elements, unordered_map significantly (3-4x) outperforms the binary search, which exhibits the expected O(log N) behavior. The somewhat surprising result is that unordered_map loses its O(1) behavior past 10000 elements, presumably due to hash collisions, or perhaps an implementation issue.
EDIT: max_load_factor() for the unordered map is 1, so there should be few collisions. The difference in performance between the binary search and the hash table for very large vectors appears to be caching related and varies depending on the lookup pattern in the benchmark.
The question "Choosing between std::map and std::unordered_map" discusses the difference between ordered and unordered maps.

lower_bound in set (C++)

I've got a set and I want to find the largest number in it that is not greater than x (something like lower_bound(x)). How should I do it? Are there any predefined functions?
set<int> myset;
myset.insert(blahblahblah);
int y;
//I want y to be greatest number in myset not greater than x

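You can use upper_bound like this: --myset.upper_bound(x) (or std::prev(myset.upper_bound(x))). upper_bound gives you an iterator to the first element greater than x, so the element you seek is the one before that. You need a special case if upper_bound returns begin(), because then no element is less than or equal to x.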

In addition to lower_bound there is also upper_bound (see the C++ reference).
The function returns an iterator to the first value that is strictly greater than yours. If it returns begin() then all of them are, otherwise subtract one from the resulting iterator to get the value you are looking for.
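As a rough sketch of that recipe, including the begin() special case (the function name floor_in_set is made up here):

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Returns an iterator to the greatest element <= x,
// or s.end() if every element is greater than x (or the set is empty).
std::set<int>::const_iterator floor_in_set(const std::set<int>& s, int x)
{
    auto it = s.upper_bound(x);  // first element strictly greater than x
    if (it == s.begin())
        return s.end();          // nothing at or below x
    return std::prev(it);
}

// Usage sketch with made-up data.
inline void example_floor_in_set()
{
    const std::set<int> myset{10, 20, 30};
    assert(*floor_in_set(myset, 25) == 20);
    assert(*floor_in_set(myset, 20) == 20);
    assert(floor_in_set(myset, 5) == myset.end());
}
```

The member function upper_bound is used rather than std::upper_bound, because the generic algorithm cannot do a logarithmic search over a set's bidirectional iterators.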

C++ How to find the biggest key in a std::map?

At the moment my solution is to iterate through the map to solve this.
I see there is a upper_bound method which can make this loop faster, but is there a quicker or more succinct way?

The end:
m.rbegin();
Maps (and sets) are sorted, so the first element is the smallest, and the last element is the largest. By default maps use std::less, but you can switch the comparer and this would of course change the position of the largest element. (For example, using std::greater would place it at begin().)
Keep in mind rbegin returns an iterator. To get the actual key, use m.rbegin()->first. You might wrap it up into a function for clarity, though I'm not sure if it's worth it:
template <typename T>
inline const typename T::key_type& last_key(const T& pMap)
{
    return pMap.rbegin()->first;
}

typedef std::map</* types */> map_type;
map_type myMap;
// populate
map_type::key_type k = last_key(myMap);

The entries in a std::map are sorted, so for a std::map m (assuming m.empty() is false), you can get the biggest key easily: (--m.end())->first

As std::map is an associative container whose keys are kept sorted, one can easily find the biggest or smallest key. By default the comparison function is std::less (operator<), so the biggest key is the last element in the map. If you have a different requirement, you can supply your own comparison function when declaring the map:
std::map<Key, Value, Compare>
By default Compare = std::less<Key>

Since you're not using unordered_map, your keys should be in order. Depending upon what you want to do with an iterator, you have two options:
If you want a forward iterator then you can use std::prev(myMap.end()). Note that --myMap.end() isn't guaranteed to work in all scenarios, so I'd usually avoid it.
If you want to iterate in reverse then use myMap.rbegin()

Since std::map keeps its elements sorted by key in ascending order (it is typically implemented as a balanced binary search tree, usually a red-black tree rather than an AVL tree), the element with the largest key is the last element, and you can obtain it using one of the following two methods:
1. largestElement = (myMap.rbegin())->first; // rbegin(): returns a reverse iterator pointing to the last element
2. largestElement = (--myMap.end())->first; // end(): returns an iterator pointing to the theoretical element following the last element
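All of the variants above assume the map is not empty; dereferencing rbegin() or --end() on an empty map is undefined behavior. A small sketch that guards against that case follows. The helper name largest_key and the choice of returning a pointer are assumptions for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns a pointer to the last key in the map's ordering,
// or nullptr if the map is empty.
template <typename Map>
const typename Map::key_type* largest_key(const Map& m)
{
    return m.empty() ? nullptr : &m.rbegin()->first;
}

// Usage sketch with made-up data.
inline void example_largest_key()
{
    const std::map<int, std::string> myMap{{1, "a"}, {7, "b"}, {3, "c"}};
    const int* k = largest_key(myMap);
    assert(k != nullptr && *k == 7);

    const std::map<int, std::string> empty;
    assert(largest_key(empty) == nullptr);
}
```

With a non-default comparator such as std::greater, "last" means last in that ordering, so the numerically largest key would be at begin() instead.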
