Is there any pre-implemented library in C++ for fast search in a list like binary search? Does the normal list support any kind of find function? Or any function like exists?
I have a list of objects, I want to store them in a list, but not duplicate elements. I want to notice whether or not the new element exists in the list and do a proper action.
There is std::lower_bound(), which finds a suitable position in any sorted forward sequence using O(log n) comparisons. Since linked lists don't support random access, the traversal itself is still O(n). You can use std::binary_search() if you are only interested in whether a suitable object exists, but this algorithm isn't useful if you need to locate the object. Of course, a precondition for std::lower_bound() and std::binary_search() is that the sequence is sorted.
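As a rough sketch of how this could address the "no duplicates" requirement from the question, the list can be kept sorted and std::lower_bound() used both to detect an existing element and to find the insertion point. The helper name insert_unique_sorted and the use of int elements are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <list>

// Hypothetical helper: inserts value into an already sorted std::list
// unless an equal element is present. Returns true if it was inserted.
bool insert_unique_sorted(std::list<int>& sorted, int value) {
    // O(log n) comparisons, but O(n) iterator steps on a linked list.
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (pos != sorted.end() && *pos == value) {
        return false;  // an equal element already exists
    }
    sorted.insert(pos, value);  // inserting at pos keeps the list sorted
    return true;
}
```

Calling insert_unique_sorted(l, 3) twice on the same list would insert 3 the first time and report a duplicate the second time.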
I believe you are looking for the C++ <algorithm> library. It includes a function called binary_search.
An example of it is provided on the page and echoed here:
// binary_search example
#include <iostream> // std::cout
#include <algorithm> // std::binary_search, std::sort
#include <vector> // std::vector
bool myfunction (int i,int j) { return (i<j); }
int main () {
int myints[] = {1,2,3,4,5,4,3,2,1};
std::vector<int> v(myints,myints+9); // 1 2 3 4 5 4 3 2 1
// using default comparison:
std::sort (v.begin(), v.end());
std::cout << "looking for a 3... ";
if (std::binary_search (v.begin(), v.end(), 3))
std::cout << "found!\n"; else std::cout << "not found.\n";
// using myfunction as comp:
std::sort (v.begin(), v.end(), myfunction);
std::cout << "looking for a 6... ";
if (std::binary_search (v.begin(), v.end(), 6, myfunction))
std::cout << "found!\n"; else std::cout << "not found.\n";
return 0;
}
If you are writing real C++ code you can use the <algorithm> standard library.
It contains the std::find function, which lets you look for a specific element within a range given by two iterators.
You can find a real example on the same page.
The std::list container is not well suited to keeping elements in order or to accessing them directly. Although std::list has a sort member function, searching with bidirectional iterators (which is what std::list provides) instead of random access iterators is not very efficient.
It would be better to use an associative container such as std::map or std::set (if you need unique elements), or std::multimap or std::multiset (if elements can be duplicated).
If the order of elements is not important, you could use a standard unordered container such as std::unordered_map or std::unordered_set.
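To illustrate how a set-like container covers the "does it already exist?" check from the question, here is a minimal sketch. The helper name add_if_absent and the std::string element type are assumptions; the same pattern applies to std::unordered_set.

```cpp
#include <set>
#include <string>

// Hypothetical helper: returns true if name was new, false if it was a duplicate.
bool add_if_absent(std::set<std::string>& seen, const std::string& name) {
    // insert() returns a pair of iterator and bool; the bool is false
    // when an equal key is already stored, and the set is left unchanged.
    return seen.insert(name).second;
}
```

The caller can branch on the returned bool to take the "proper action" for new versus existing elements without a separate lookup.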
Is there a way to find the minimum odd element of a vector of integers without basically reimplementing std::min_element and without doing additional work like computing the vector of odd integers first?
While a custom comparison object suggested in another answer will be a simple solution for std::min_element (and similar) in particular, it won't work with all standard algorithms. A general approach that works with any standard algorithm is to define a custom iterator.
Customising, combining and extending standard algorithms can nearly always be achieved with iterators. Writing custom iterators from scratch involves a lot of boilerplate, and unfortunately the standard library doesn't provide templates for many iterator adaptors. Boost does provide plenty of iterator adaptor templates, and in this case boost::filter_iterator should prove useful.
Instead of the more traditional iterator algorithms, you could use range algorithms.
Since C++20, there are a host of standard range adaptors for range algorithms which are easy to compose:
auto it = std::ranges::min_element(
container | std::views::filter(condition)
);
Note that at the moment of writing, only libstdc++ has implemented the ranges standard library.
A simple solution consists of using a custom comparator function with std::min_element.
What the following code still lacks is a check that the obtained value is indeed odd, as mentioned by @MSalters in their answer and by @Kevin in a comment.
#include <iostream>
#include <vector>
#include <algorithm>
int main() {
std::vector<int> v = {0, 3, 4, 1};
auto comp = [](int a, int b) {
if ((a%2) and (b%2 == 0)) return true;
if ((a%2 == 0) and (b%2)) return false;
return a < b;
};
auto min_odd = std::min_element (v.begin(), v.end(), comp);
std::cout << *min_odd << std::endl;
}
A C++20 solution:
std::vector<int> ints{0, 1, 2, 3, 4, 5};
auto odd = [](int i) { return bool(i % 2); };
auto e = std::ranges::min_element(ints | std::views::filter(odd));
Yes, that's not very hard. Implement a custom comparison that sorts each even element above all odd elements. You still need to sort the odd elements in their usual order, and at the end check that there was at least one odd element in the vector.
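A possible sketch of that approach, including the final check, is shown below. The function name min_odd and its bool-plus-output-parameter interface are illustrative assumptions, not part of any answer above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: stores the smallest odd value of v in out and returns true,
// or returns false (leaving out untouched) if v contains no odd value.
bool min_odd(const std::vector<int>& v, int& out) {
    auto is_odd = [](int x) { return x % 2 != 0; };
    // Odd elements order before even ones; within the same parity use <.
    auto comp = [is_odd](int a, int b) {
        if (is_odd(a) != is_odd(b)) return is_odd(a);
        return a < b;
    };
    auto it = std::min_element(v.begin(), v.end(), comp);
    // An empty vector or an all-even vector has no odd minimum.
    if (it == v.end() || !is_odd(*it)) return false;
    out = *it;
    return true;
}
```

For the vector {0, 3, 4, 1} this reports 1, while for {0, 2, 4} it returns false instead of silently yielding an even number.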
I can't infer I can use std::set_difference from documentation, because it says sets should be ordered, which means they are not sets, but lists. Also all examples are about ordered lists, not sets.
How to know the truth?
std::set_difference is for use with arbitrary sorted inputs (pre-sorted std::vectors, std::lists, std::deques, plain arrays, etc.); it just happens to work with std::set (which is sorted) too.
If you're working with std::unordered_set (or std::set, and you're okay with operating in place), you'd just use the erase method to remove all elements from one such set from another to get the difference, e.g.:
for (const auto& elem : set_to_remove) {
myset.erase(elem);
}
You can also do it into a new set with std::copy_if; the recipe there is trivially adaptable to the case of symmetric difference (it's just two calls to std::copy_if, where each one runs on one input set, and is conditioned on the element not existing in other input set).
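The std::copy_if variant might look roughly like this. The name set_minus and the int element type are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>

// Hypothetical helper: returns the elements of a that are not in b.
std::unordered_set<int> set_minus(const std::unordered_set<int>& a,
                                  const std::unordered_set<int>& b) {
    std::unordered_set<int> result;
    // Copy each element of a only if b does not contain it.
    std::copy_if(a.begin(), a.end(), std::inserter(result, result.end()),
                 [&b](int x) { return b.count(x) == 0; });
    return result;
}
```

Neither input needs to be sorted here, since membership is tested with count() rather than by a merge-style pass.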
std::set is sorted. Check out the docs:
std::set is an associative container that contains a sorted set of
unique objects of type Key. Sorting is done using the key comparison
function Compare. Search, removal, and insertion operations have
logarithmic complexity. Sets are usually implemented as red-black
trees.
Therefore, you can use it in the same way as any other container that provides the required interface. The difference between std::set and, e.g., std::vector is that std::set keeps its elements sorted as they are inserted, whereas with std::vector you need to call std::sort to get its elements sorted.
For example, if you need std::set_difference for std::unordered_set, you can do it like this:
#include <set>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <unordered_set>
int main() {
std::unordered_set<int> a {3, 1, 4, 6, 5, 9};
std::unordered_set<int> b {3, 1, 4};
std::set<int> c;
std::set<int> d;
std::copy(a.begin(), a.end(), std::inserter(c, c.end()));
std::copy(b.begin(), b.end(), std::inserter(d, d.end()));
std::vector<int> diff;
std::set_difference(c.begin(), c.end(), d.begin(), d.end(),
std::inserter(diff, diff.begin()));
for (auto const i : diff)
std::cout << i << ' ';
return 0;
}
Both can be used to apply a function to a range of elements.
On a high level:
std::for_each ignores the return value of the function, and
guarantees order of execution.
std::transform assigns the return value to the iterator, and does
not guarantee the order of execution.
When do you prefer using the one versus the other? Are there any subtle caveats?
std::transform is the same as map in functional programming. The idea is to apply a function to each element between the two iterators and obtain a different container composed of the results of applying that function. You may want to use it for, e.g., projecting an object's data member into a new container. In the following, std::transform is used to transform a container of std::strings into a container of std::size_ts.
std::vector<std::string> names = {"hi", "test", "foo"};
std::vector<std::size_t> name_sizes;
std::transform(names.begin(), names.end(), std::back_inserter(name_sizes), [](const std::string& name) { return name.size();});
On the other hand, you execute std::for_each solely for its side effects. In other words, std::for_each closely resembles a plain range-based for loop.
Back to the string example:
std::for_each(name_sizes.begin(), name_sizes.end(), [](std::size_t name_size) {
std::cout << name_size << std::endl;
});
Indeed, starting from C++11 the same can be achieved with a terser notation using range-based for loops:
for (std::size_t name_size: name_sizes) {
std::cout << name_size << std::endl;
}
Your high level overview
std::for_each ignores the return value of the function and guarantees order of execution.
std::transform assigns the return value to the iterator, and does not guarantee the order of execution.
pretty much covers it.
Another way of looking at it (to prefer one over the other);
Do the results (the return value) of the operation matter?
Is the operation on each element a member method with no return value?
Are there two input ranges?
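The last question matters because std::transform has an overload that reads from two input ranges, which std::for_each does not offer. A minimal sketch, where add_elementwise is a made-up name and b is assumed to have at least as many elements as a:

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper: element-wise sum of a and b.
// Assumes b.size() >= a.size(); only the first a.size() elements of b are read.
std::vector<int> add_elementwise(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size());
    // Binary overload: op(a[i], b[i]) is written to the output iterator.
    std::transform(a.begin(), a.end(), b.begin(), std::back_inserter(out), std::plus<int>());
    return out;
}
```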
One more thing to bear in mind (a subtle caveat) is the change in the requirements on the operations passed to std::transform before and after C++11 (from en.cppreference.com):
Before C++11, they were required to "not have any side effects".
Since C++11, this changed to "must not invalidate any iterators, including the end iterators, or modify any elements of the ranges involved".
Basically, these requirements exist to allow the unspecified order of execution.
When do I use one over the other?
If I want to manipulate each element in a range, then I use for_each. If I have to calculate something from each element, then I would use transform. When using the for_each and transform, I normally pair them with a lambda.
That said, I find my current usage of the traditional for_each being diminished somewhat since the advent of the range based for loops and lambdas in C++11 (for (element : range)). I find its syntax and implementation very natural (but your mileage here will vary) and a more intuitive fit for some use cases.
Although the question has been answered, I believe that this example would clarify the difference further.
for_each is classified among the non-modifying STL operations, meaning that the algorithm itself does not change the elements of the collection or the collection itself. The value returned by the function is always ignored and is not assigned to a collection element.
Nonetheless, it is still possible to modify elements of the collection, for example when an element is passed to the function f by non-const reference. The standard allows this for mutable iterators, but I would avoid it, as it blurs the line between non-modifying and modifying operations.
In contrast, transform belongs to the modifying STL operations: it applies the given operation (unary_op or binary_op) to the elements of one or two collections and stores the results in a destination range, which may be another collection or the input itself.
#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>
using namespace std;
void printer(int i) {
cout << i << ", ";
}
int main() {
int mynumbers[] = { 1, 2, 3, 4 };
vector<int> v(mynumbers, mynumbers + 4);
for_each(v.begin(), v.end(), negate<int>());//no effect as returned value of UnaryFunction negate() is ignored.
for_each(v.begin(), v.end(), printer); //guarantees order
cout << endl;
transform(v.begin(), v.end(), v.begin(), negate<int>());//negates elements correctly
for_each(v.begin(), v.end(), printer);
return 0;
}
which will print:
1, 2, 3, 4,
-1, -2, -3, -4,
A real example of using std::transform is converting a string to uppercase. You can write code like this:
std::transform(s.begin(), s.end(), std::back_inserter(out), ::toupper);
If you try to achieve the same thing with std::for_each, like this:
std::for_each(s.begin(), s.end(), ::toupper);
it won't convert the string to uppercase, because the return value of ::toupper is discarded.
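For completeness, here is one way the uppercase conversion could be written in place, as a sketch. The helper name to_upper is an assumption; the cast to unsigned char is there because passing a negative char value to std::toupper is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an uppercase copy of s.
std::string to_upper(std::string s) {
    // Input and output range are the same, so each character is overwritten
    // with the value returned by the lambda.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```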
The following snippet returns 0. I expected it to be 1. What's going wrong here?
#include <iostream>
#include <iterator>
#include <ostream>
#include <algorithm>
#include <vector>
using namespace std;
int main(){
vector<int> v;
int arr[] = {10,20,30,40,50};
v.push_back(11);
v.push_back(22);
copy(arr,arr + sizeof(arr)/sizeof(arr[0]),back_inserter(v)); // back_inserter makes space starting from the end of vector v
for(auto i = v.begin(); i != v.end(); ++i){
cout << *i << endl;
}
cout << endl << "Binary Search - " << binary_search(v.begin(), v.end(), 10) <<endl; // returns bool
}
I am using gcc /usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
I ran the program and saw this:
11
22
10
20
30
40
50
Binary Search - 0
Your array is not sorted, therefore, binary search fails. (it sees 11 in the first position, and concludes 10 does not exist here)
You either want to ensure the array is sorted before binary searching or use the regular std::find.
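The first option could be sketched as follows. The helper name contains_after_sort is hypothetical, and it works on a copy so the caller's vector keeps its original order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts a copy of v, then binary-searches it for value.
bool contains_after_sort(std::vector<int> v, int value) {
    std::sort(v.begin(), v.end());  // establish the precondition of binary_search
    return std::binary_search(v.begin(), v.end(), value);
}
```

Sorting costs O(n log n), so for a single lookup std::find is cheaper; sorting pays off when the same data is searched many times.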
binary_search says:
Checks if the sorted range [first, last) contains an element equal to
value. The first version uses operator< to compare the elements, the
second version uses the given comparison function comp.
Your list is not sorted, it contains the elements 11 and 22 prior to 10.
Your array is not sorted, so calling binary_search has undefined behavior. Try std::find instead:
bool found = std::find(v.begin(), v.end(), 10) != v.end();
§25.4.3.4 of the C++11 standard (3242 draft)
Requires: The elements e of [first,last) are partitioned with respect to the expressions e < value and !(value < e) or comp(e,
value) and !comp(value, e). Also, for all elements e of [first, last),
e < value implies !(value < e) or comp(e, value) implies !comp(value,
e).
"Unexpected behavior"? There's nothing unexpected here.
The whole idea of binary search algorithm is taking advantage of the fact that the input array is sorted. If the array is not sorted, there can't be any binary search on it.
When you use std::binary_search (as well as all other standard binary search-based algorithms), the input sequence must be sorted in accordance with the same comparison predicate as the one used by std::binary_search. Since you did not pass any custom predicate to std::binary_search, it will use the ordering defined by the < operator. That means that your input sequence of integers must be sorted in ascending order.
In your case the input sequence does not satisfy that requirement. std::binary_search cannot be used on it.
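To make the "same comparison predicate" rule concrete, here is a small sketch for a sequence sorted in descending order. The helper name contains_descending is an assumption.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: v must already be sorted in descending order,
// i.e. with the same std::greater<int> ordering that is passed to the search.
bool contains_descending(const std::vector<int>& v, int value) {
    return std::binary_search(v.begin(), v.end(), value, std::greater<int>());
}
```

Calling std::binary_search on the same descending data without the comparator would violate the precondition, just like searching the unsorted vector in the question.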
Let's assume I have a vector<node> containing 10000 objects:
vect[0] to vect[9999]
struct node
{
int data;
};
And let's say I want to find the index of the vector element that contains this data ("444"), which happens to be in node 99.
Do I really have to do a for-loop to loop through all the elements then use
if (data == c[i].data)
Or is there a quicker way? Consider that my data is distinct and won't repeat in other nodes.
For this answer I am assuming that you've made an informed decision to use a std::vector over the other containers available.
Do I really have to do a for-loop to loop through all the elements?
No, you do not have to roll a for-loop to find an element. The idiomatic way of finding an element in a container is to use an algorithm from the standard library. Whether you should roll your own really depends on the situation.
To help you decide...
Alternative 1:
std::find() requires that there is a suitable equality comparator for your node data type, which may be as simple as this:
bool operator ==(node const& l, node const& r)
{
return l.data == r.data;
}
Then, given a required node, you can search for the element. This returns an iterator (or a pointer if you're using a plain old array). If you need the index, this requires a little calculation:
auto i = std::find(v.begin(), v.end(), required);
if (i != v.end())
{
std::cout << i->data << " found at index " << i - v.begin() << std::endl;
}
else
{
std::cout << "Item not found" << std::endl;
}
Alternative 2:
If creating a node is too expensive or you don't have an equality operator, a better approach would be to use std::find_if(), which takes a predicate (here I use a lambda because it's succinct, but you could use a functor like in this answer):
// Alternative linear search, using a predicate...
auto i = std::find_if(v.begin(), v.end(), [](node const& n){return n.data == 444;});
if (i != v.end())
{
std::cout << i->data << " found at index " << i - v.begin() << std::endl;
}
else
{
std::cout << "Item not found" << std::endl;
}
Or is there a quicker way?
Again, it depends. std::find() and std::find_if() run in linear time (O(n)), the same as your for-loop.
That said, using std::find() or std::find_if() won't involve random access or indexing into the container (they use iterators) but they may require a little bit of extra code compared with your for-loop.
Alternative 3:
If running time is critical and your vector is sorted (say with std::sort()), you could perform a binary search, which runs in logarithmic time (O(log n)). std::lower_bound() implements a binary search for the first element that is not less than the given value. Its default overload relies on a less-than operator for your node data type (a second overload accepts a custom comparator instead). Such an operator might look like this:
bool operator <(node const& l, node const& r)
{
return l.data < r.data;
}
The invocation is similar to std::find() and returns an iterator, but requires an extra check:
auto i = std::lower_bound(v.begin(), v.end(), required);
if (i != v.end() && i->data == required.data)
{
std::cout << i->data << " found at index " << i - v.begin() << std::endl;
}
else
{
std::cout << "Item not found" << std::endl;
}
These functions from the Algorithms Library work with any container supplying an iterator, so switching to another container from std::vector would be quick and easy to test and to maintain.
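std::lower_bound() also has an overload that accepts a comparator, so the search key does not have to be a whole node. The following is a sketch under the assumption that the vector is sorted by data in ascending order; the name find_sorted and the "return v.size() when absent" convention are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct node {
    int data;
};

// Hypothetical helper: index of the node whose data equals key, or v.size() if absent.
// Assumes v is sorted by data in ascending order.
std::size_t find_sorted(const std::vector<node>& v, int key) {
    // The comparator receives (element, key) and must say whether element orders before key.
    auto it = std::lower_bound(v.begin(), v.end(), key,
                               [](const node& n, int k) { return n.data < k; });
    if (it != v.end() && it->data == key) {
        return static_cast<std::size_t>(it - v.begin());
    }
    return v.size();
}
```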
The decision is yours!
You should use std::find. You can't get faster than linear complexity (O(n)) if you know nothing about the vector beforehand (like it being sorted).
If you want to find elements in the container, then vector is not the right data structure. You should use an ordered container such as std::set or std::map. Since elements in these containers are kept ordered (sorted), we can find elements in O(log n) time, as opposed to linear time for an unsorted sequence.
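If the vector has to stay, one possible compromise is a separate std::map from data value to vector index, sketched below. The name build_index is hypothetical, the data values are assumed to be distinct as stated in the question, and the map has to be rebuilt or updated whenever the vector changes.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct node {
    int data;
};

// Hypothetical helper: builds a data -> index lookup table for v.
// Assumes all data values are distinct.
std::map<int, std::size_t> build_index(const std::vector<node>& v) {
    std::map<int, std::size_t> index;
    for (std::size_t i = 0; i < v.size(); ++i) {
        index[v[i].data] = i;
    }
    return index;
}
```

After building the table once in O(n log n), each lookup with index.find(444) takes O(log n) instead of a linear scan.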
Use std::find:
vector<int>::iterator it = find (vect.begin(), vect.end(), 444);
Note that if you have a sorted vector, you can make it faster with a binary search.
A neat solution would be to add an extra int index member to the node struct to provide data-to-index mapping when you have an instance of the struct. In such a case, you should probably wrap std::vector in a NodeVector class which will handle the updating of indices when, say, you remove an item (it's enough to subtract 1 from the indices of the elements that follow the element being removed). If the vector doesn't change its number of elements, that's not even an issue. Other than that, if you can't have an instance of the struct grow in size, use std::map. Iterating over the container to find one item is not very smart, unless you need to do it very rarely and making anything complicated isn't worth the trouble.