Using for loop and find instead of set_intersection? - c++

This question suggests using std::set_intersection to find the intersection of two arrays. Wouldn't using std::find work just as well?
int a[5] = {1, 2, 3, 4, 5};
int b[5] = {3, 4, 5, 6, 7};
for (int i = 0; i < 5; i++)
{
    if (std::find(b, b + 5, a[i]) != b + 5)
        std::cout << a[i] << std::endl;
}
Does std::set_intersection basically do the same thing? Or maybe it uses a more efficient algorithm? I think the complexity of above is O(n^2) if std::find takes O(n) time.

For all (or at least almost all) of your standard-library-complexity questions, a good reference can answer this for you.
In particular we get that std::find performs at most last - first applications of the predicate (operator== in this case), where first and last define the range to be searched.
We also get that std::set_intersection performs at most 2·(N1+N2-1) comparisons, where N1 = std::distance(first1, last1) and N2 = std::distance(first2, last2).
This means that your loop performs at most N1 * N2 comparisons, which in this case is 25, while std::set_intersection would use at most 2·(5+5-1) = 18.
So the two methods "work just as well" in the sense that they give the same answer, but std::set_intersection takes fewer comparisons to do it.
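For illustration, here is a minimal sketch of the same task done with std::set_intersection (both input ranges must already be sorted, as they are in the question):
#include <algorithm>
#include <iostream>
#include <iterator>

int main()
{
    int a[5] = {1, 2, 3, 4, 5};
    int b[5] = {3, 4, 5, 6, 7};
    // Writes the common elements (3 4 5) straight to std::cout.
    std::set_intersection(a, a + 5, b, b + 5,
                          std::ostream_iterator<int>(std::cout, " "));
}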

std::set is an ordered collection, and the arrays in the question are sorted too. For sorted collections there are faster, linear intersection methods (think of the merge step of mergesort).

std::set::find takes O(lg n); it is effectively a binary search. So using a for loop together with find takes O(n lg n). std::set_intersection takes linear time: it walks the two sorted ranges in lockstep to find their intersection (similar to the merge operation of mergesort).
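For concreteness, here is a sketch of that merge-style walk (my own illustration of the idea; std::set_intersection does essentially the same thing internally):
#include <cstddef>
#include <iostream>
#include <vector>

// Advance whichever side has the smaller element; emit on equality.
// This is the merge-step idea: O(N1 + N2) comparisons in total.
void print_intersection(const std::vector<int>& a, const std::vector<int>& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j])      ++i;
        else if (b[j] < a[i]) ++j;
        else { std::cout << a[i] << '\n'; ++i; ++j; }
    }
}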

Related

[C++][std::sort] How does it work on 2D containers?

I have this vector object that contains vectors of ints
std::vector<std::vector<int>> vec;
I have been trying to figure out how std::sort(vec.begin(), vec.end()) works on it. Here are my observations:
2D vectors are sorted by size.
If some of the inner vectors have the same size, the vector whose first element is smaller gets the smaller index.
I have generated a few 2D vectors now, and it seems that these two observations always hold. However, I am doubting my second assumption. Does std::sort really work this way, or was it just luck that made my assumptions look correct?
Sorting vector elements works the same way as sorting any other type. std::sort uses the comparison object given as an argument. If none was passed explicitly, std::less is the default.
std::less uses operator<. As per vector documentation, it:
Compares the contents of lhs and rhs lexicographically. The comparison is performed by a function equivalent to std::lexicographical_compare.
Lexicographical comparison is an operation with the following properties:
Two ranges are compared element by element.
The first mismatching element defines which range is lexicographically less or greater than the other.
If one range is a prefix of another, the shorter range is lexicographically less than the other.
If two ranges have equivalent elements and are of the same length, then the ranges are lexicographically equal.
An empty range is lexicographically less than any non-empty range.
Two empty ranges are lexicographically equal.
In short, lexicographical sorting is the same as sorting used for dictionaries (ignoring oddities of some languages).
2D vectors are sorted by size.
Not quite. {1}, {3, 4}, {1, 2, 5} would be sorted as {1}, {1, 2, 5}, {3, 4}.
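A quick sketch you can run to confirm that ordering:
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<std::vector<int>> vec{{1}, {3, 4}, {1, 2, 5}};
    std::sort(vec.begin(), vec.end()); // lexicographic, via vector's operator<
    for (const auto& inner : vec) {    // prints: 1 | 1 2 5 | 3 4 |
        for (int x : inner) std::cout << x << ' ';
        std::cout << "| ";
    }
}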
std::sort uses operator< by default. Since std::vector overloads operator<, that overload is used. std::vector::operator< performs a lexicographical compare, meaning the vector whose first mismatching element is smaller compares less. That means {1, 1, 2} is less than {1, 1, 3}, since 2 is less than 3. If the vectors are of different lengths and the shorter one matches the beginning of the longer one, the shorter one compares less. That means that
int main()
{
    std::vector a{5, 1}, b{10};
    std::cout << (a < b);
}
Prints 1 since 5 is less than 10.
int main()
{
    std::vector a{5, 10}, b{5};
    std::cout << (a < b);
}
Prints 0, since b is a prefix of a: the shorter vector compares less, so a < b is false.

C++ library method for intersection of two unordered_set

I have two unordered_sets and want the intersection of them. I can't find a library function to do that.
Essentially, what I want is this:
unordered_set<int> a = {1, 2, 3};
unordered_set<int> b = {2, 4, 1};
unordered_set<int> c = a.intersect(b); // Should be {1, 2}
I can do something like
unordered_set<int> c;
for (int element : a) {
    if (b.count(element) > 0) {
        c.insert(element);
    }
}
but I think there should be a more convenient way to do that? If there's not, can someone explain why? I know there is set_intersection, but that seems to operate on vectors only?
Thanks
In fact, a loop-based solution is the best thing you can use with std::unordered_set.
There is an algorithm called std::set_intersection which finds the intersection of two sorted ranges:
Constructs a sorted range beginning at d_first consisting of elements
that are found in both sorted ranges [first1, last1) and [first2,
last2).
As you deal with std::unordered_set, you cannot apply this algorithm because there is no guaranteed order for the elements in std::unordered_set.
My advice is to stick with the loop: it says explicitly what you want to achieve and has linear complexity (O(N), where N is the number of elements in the unordered set you traverse with the for loop), which is the best complexity you can achieve here.
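One small refinement worth mentioning (my own suggestion, not from the question): iterate over the smaller of the two sets, since each lookup in the other set is O(1) on average either way. A sketch:
#include <unordered_set>

std::unordered_set<int> intersect(const std::unordered_set<int>& a,
                                  const std::unordered_set<int>& b)
{
    // Loop over the smaller set; expected O(min(|a|, |b|)) overall.
    const auto& smaller = a.size() <= b.size() ? a : b;
    const auto& larger  = a.size() <= b.size() ? b : a;
    std::unordered_set<int> result;
    for (int element : smaller)
        if (larger.count(element) > 0)
            result.insert(element);
    return result;
}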
There is a function in the standard library called std::set_intersection, but it requires sorted ranges, so it cannot be applied directly to std::unordered_set. A workaround is to create two vectors from those sets, sort them, and use set_intersection with the vectors as input parameters.
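A sketch of that workaround; the sorts dominate, so it is O(n log n) overall, which is why the plain loop above is usually preferable for unordered sets:
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

std::vector<int> intersect_via_sort(const std::unordered_set<int>& a,
                                    const std::unordered_set<int>& b)
{
    std::vector<int> va(a.begin(), a.end()), vb(b.begin(), b.end());
    std::sort(va.begin(), va.end()); // set_intersection requires sorted input
    std::sort(vb.begin(), vb.end());
    std::vector<int> out;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(out));
    return out;
}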

efficient way to get index in sorted vector in c++

Can anyone suggest a fast way to get the rank of each element in a vector?
I don't need to sort the vector, but only to get the index each element would have if the vector were sorted.
for ex: {40, 20, 10, 30}
should give {3, 1, 0, 2}
Will I be able to get a speedup because I don't actually have to sort the data in place?
The exact same proof of the lower bound on sorting applies here. Absent additional information (key distribution, etc.), it is Ω(n log n), so you might as well sort. Formally, anything lower would allow you to compress permutations below their Kolmogorov complexity.
That being said, there remains the question of how best to sort the indices.
You may use the following:
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

template <typename T>
std::vector<std::size_t> compute_order(const std::vector<T>& v)
{
    // Fill indices with 0, 1, ..., v.size() - 1.
    std::vector<std::size_t> indices(v.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    // Sort the indices by the values they refer to.
    std::sort(indices.begin(), indices.end(),
              [&](std::size_t lhs, std::size_t rhs) { return v[lhs] < v[rhs]; });
    // Invert the permutation: res[i] is the rank of v[i].
    std::vector<std::size_t> res(v.size());
    for (std::size_t i = 0; i != indices.size(); ++i) {
        res[indices[i]] = i;
    }
    return res;
}
A first approach is to make a copy of the array and sort it. Afterwards you traverse the original array and, for each item, perform a binary search in the sorted copy to determine its rank, producing the desired sequence as you go. This approach takes O(n) for the copy, plus O(n lg n) for the sort, and finally O(n lg n) for producing the sequence of ranks.
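A sketch of that copy-sort-search approach (std::lower_bound gives the rank directly; this assumes distinct elements, as in the example):
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> ranks(const std::vector<int>& v)
{
    std::vector<int> sorted(v);                 // O(n) copy
    std::sort(sorted.begin(), sorted.end());    // O(n lg n)
    std::vector<std::size_t> r(v.size());
    for (std::size_t i = 0; i != v.size(); ++i) // n binary searches: O(n lg n)
        r[i] = std::lower_bound(sorted.begin(), sorted.end(), v[i])
               - sorted.begin();
    return r;                                   // {40, 20, 10, 30} -> {3, 1, 0, 2}
}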
Another way is to insert all the items into a balanced binary search tree (such as an AVL or red-black tree), which takes O(n lg n). Your tree must support the "rank extension"; that is, the size of each subtree must be stored in its nodes. Such trees can export an operation position(key) which returns the rank of key.
Afterwards, you traverse your array and for each entry call position(array[i]), producing the sequence of ranks parallel to your array. This takes O(n lg n).
I think the advantage of this approach over copying into an array of pairs and sorting it, or simply sorting a copy of the array and determining each rank by binary search, is that you avoid the extra copy from the sorted array of pairs to the sequence of ranks.
Added and corrected:
According to @xiaotian-pei's answer, I think it would be even better simply to insert pairs (key, index) into a deterministically balanced binary search tree (AVL or red-black) sorted by keys, which takes O(n lg n). Then you traverse the tree in order, extracting the indexes, which takes O(n). Finally you free the tree, which also takes O(n). The total would be O(n lg n) + O(n) + O(n).
Maybe still more efficient, depending on scale (though not a different complexity class): use a heap of pairs (key, index) and successively extract from it to build the sequence of ranks.
And very probably faster, and certainly less space-consuming: the algorithm published by Jarod42, which I think is O(n) + O(n lg n) + O(n) too, but which would make better use of the cache.
I can think of two ways (though I don't think they will be faster); a sketch of the first follows below:
put <value, index> pairs in a map
put the indices in another vector and sort that vector with a proper comparison function
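Here is the promised sketch of the map idea (assuming distinct values, since a map keeps one index per key):
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

int main()
{
    std::vector<int> v{40, 20, 10, 30};
    std::map<int, std::size_t> by_value;        // keys iterate in ascending order
    for (std::size_t i = 0; i != v.size(); ++i)
        by_value[v[i]] = i;                     // value -> original index
    std::vector<std::size_t> rank(v.size());
    std::size_t r = 0;
    for (const auto& kv : by_value)             // k-th smallest key gets rank k
        rank[kv.second] = r++;
    for (std::size_t x : rank) std::cout << x << ' '; // prints: 3 1 0 2
}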
For your case of numbers, sorting the array itself isn't harder than sorting the indices -- you'd be constructing an index set and ordering it by the original values.

How to traverse fixed number of elements in c++ list

I want to traverse a list in C++, but only up to the fifth element from the end, not all the way to the end.
But I see that there is no "-" operator defined so that I could use
list<>::iterator j=i-5;
I can do it using the size() function, keeping counts somehow, etc., but is there a more direct way?
Counting is the only practical way that doesn't involve effectively traversing the list in some way.
auto myEnd = std::prev(myList.end(), 5);
(Note that std::advance returns void, so std::prev is the right tool here.) This will still traverse the last five list elements to get to your desired point, so it's no faster or more elegant than most other solutions. However, an integer loop requires keeping both an integer count and an iterator, while this requires only the iterator, so in that regard it may be nicer.
If your list has an O(1) size() (guaranteed since C++11) and the distance back from the end is large, use an integer loop; otherwise the above is nice.
std::list doesn't support random-access iterators. You can use a reverse iterator and a counter.
You could use std::advance to get the iterator to the fifth from last.
std::list has bidirectional iterators, so to get the fifth iterator before the end iterator you would apply operator-- (which is defined for bidirectional iterators) five times. The C++ Standard provides two functions that perform this task. The first, std::advance, has been available since C++98. The second, std::prev, appeared in C++11. It is simpler to use std::prev because it returns the needed iterator. For example
std::list<int> l = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
std::copy( l.begin(), std::prev( l.end(), 5 ), std::ostream_iterator<int>( std::cout, " " ) );
In addition to the available answers, I'd recommend sticking to a standard algorithm for traversing the list rather than dealing with iterators directly, if you can avoid them.
For example:
auto l = std::list<int>{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
std::for_each(std::begin(l), std::prev(std::end(l), 5), [](const int& i) {
    std::cout << i << std::endl;
});
http://ideone.com/6wNuMP

count the number of distinct absolute values among the elements of the array

I was asked an interview question to find the number of distinct absolute values among the elements of the array. I came up with the following solution (in C++) but the interviewer was not happy with the code's run time efficiency.
I would appreciate pointers on how I can improve the run-time efficiency of this code.
Also, how do I calculate the efficiency of the code below? The for loop executes A.size() times. However, I am not sure about the efficiency of std::find (in the worst case it could be O(n); does that make this code O(n²)?)
Code is:
int countAbsoluteDistinct(const std::vector<int>& A) {
    using namespace std;
    list<int> x;
    vector<int>::const_iterator it;
    for (it = A.begin(); it != A.end(); it++)
        if (find(x.begin(), x.end(), abs(*it)) == x.end())
            x.push_back(abs(*it));
    return x.size();
}
To propose an alternative to the set-based code:
Note that since we don't want to alter the caller's vector, we take it by value; it's better to let the compiler copy for us than to do it ourselves. If it's OK to destroy the caller's data, we can take it by non-const reference instead.
#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdlib>
using namespace std;
int count_distinct_abs(vector<int> v)
{
    // O(n), where n = distance(v.begin(), v.end()).
    // A lambda sidesteps ambiguity from abs's overload set.
    transform(v.begin(), v.end(), v.begin(), [](int x) { return abs(x); });
    sort(v.begin(), v.end()); // O(n log n) on average; guaranteed worst case since C++11.
    // On older implementations, replace with make_heap, then sort_heap,
    // to guarantee worst-case O(n log n).
    // unique takes a sorted range and moves the duplicated items to the back,
    // returning an iterator to the end of the unique section of the range.
    auto unique_end = unique(v.begin(), v.end()); // Again n comparisons
    return distance(v.begin(), unique_end); // Constant time for random access iterators (like vector's)
}
The advantage here is that we only allocate/copy once if we decide to take by value, and the rest is all done in-place while still giving you an average complexity of O(n log n) on the size of v.
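A quick usage sketch, assuming the definition above is in scope:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{-3, 3, 1, -1, 2};
    std::cout << count_distinct_abs(v) << '\n'; // prints 3 (the values 1, 2, 3)
}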
std::find() is linear (O(n)). I'd use a sorted associative container to handle this, specifically std::set.
#include <vector>
#include <set>
using namespace std;
int distict_abs(const vector<int>& v)
{
    std::set<int> distinct_container;
    for (auto curr_int = v.begin(), end = v.end(); // no need to call v.end() multiple times
         curr_int != end;
         ++curr_int)
    {
        // std::set only allows single entries;
        // since that is what we want, we don't care that the insert
        // fails when the second (or later) occurrence of a value is
        // attempted.
        distinct_container.insert(abs(*curr_int));
    }
    return distinct_container.size();
}
There is still some runtime penalty with this approach: using a separate container incurs the cost of dynamic allocations as the container grows. You could do this in place and avoid that penalty, but with code at this level it's sometimes better to be clear and explicit and let the optimizer (in the compiler) do its work.
Yes, this will be O(N²) -- you'll end up with a linear search for each element.
A couple of reasonably obvious alternatives would be to use std::set or std::unordered_set. If you don't have C++11 (formerly C++0x), you can replace std::unordered_set with tr1::unordered_set or boost::unordered_set.
Each insertion in an std::set is O(log N), so your overall complexity is O(N log N).
With unordered_set, each insertion has constant (expected) complexity, giving linear complexity overall.
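For instance, a sketch of the unordered_set variant (expected linear time overall):
#include <cstdlib>
#include <unordered_set>
#include <vector>

std::size_t count_distinct_abs_hashed(const std::vector<int>& v)
{
    std::unordered_set<int> seen;
    for (int x : v)
        seen.insert(std::abs(x)); // duplicate inserts are no-ops
    return seen.size();           // expected O(1) per insert, O(n) total
}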
Basically, replace your std::list with a std::set. This gives you O(log(set.size())) searches plus O(1) insertions if you do things properly (e.g. use the search result as an insertion hint). Also, for efficiency, it makes sense to cache the result of abs(*it), although this will have only a minimal (negligible) effect. The efficiency of this method is about as good as you can get without a really good hash (std::set uses binary trees) or more information about the values in the vector.
Since I was not happy with the previous answers, here is mine today. Your initial question does not mention how big your vector is. Suppose your std::vector<> is extremely large and has very few duplicates (why not?). This means that using another container (e.g. std::set<>) will basically duplicate your memory consumption. Why would you do that, when your goal is simply to count the distinct values?
I like @Flame's answer, but I was not really happy with the call to std::unique: you've spent lots of time carefully sorting your vector, and then you simply discard the sorted array when you could be re-using it afterwards.
I could not find anything really elegant in the standard library, so here is my proposal (a mixture of std::transform + std::abs + std::sort, but without touching the sorted array afterwards).
// Count the number of distinct values among the elements of a sorted range
// (apply std::abs and std::sort beforehand for the absolute-value problem).
#include <iterator>

template<class ForwardIt>
typename std::iterator_traits<ForwardIt>::difference_type
count_unique(ForwardIt first, ForwardIt last)
{
    if (first == last)
        return 0;
    typename std::iterator_traits<ForwardIt>::difference_type count = 1;
    ForwardIt previous = first;
    while (++first != last) {
        if (!(*previous == *first)) ++count;
        ++previous;
    }
    return count;
}
A bonus point is that it works with forward iterators:
#include <iostream>
#include <list>
int main()
{
    std::list<int> nums {1, 3, 3, 3, 5, 5, 7, 8};
    std::cout << count_unique(std::begin(nums), std::end(nums)) << std::endl;
    const int array[] = {0, 0, 0, 1, 2, 3, 3, 3, 4, 4, 4, 4};
    const int n = sizeof array / sizeof *array;
    std::cout << count_unique(array, array + n) << std::endl;
    return 0;
}
Two points.
std::list is very bad for search. Each search is O(n).
Use std::set. Insertion is logarithmic, it removes duplicates, and it keeps elements sorted. Inserting every value takes O(n log n); then use set::size to find how many values there are.
EDIT:
To answer part 2 of your question, the C++ standard mandates the worst case for operations on containers and algorithms.
Find: since you are using the free-function version of find, which takes iterators, it cannot assume anything about the passed-in sequence. Because it cannot assume the range is sorted, it must visit every item until it finds a match, which is O(n).
If you are using set::find, on the other hand, that member find can exploit the structure of the set, and its performance is required to be O(log N), where N is the size of the set.
To answer your second question first, yes the code is O(n^2) because the complexity of find is O(n).
You have options to improve it. If the range of numbers is small, you can just set up a large enough array and increment counts while iterating over the source data. If the range is larger but sparse, you can use a hash table of some sort to do the counting. Both of these options have linear complexity.
Otherwise, I would do one iteration to take the abs value of each item, then sort them, and then you can do the aggregation in a single additional pass. The complexity here is n log(n) for the sort. The other passes don't matter for complexity.
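A sketch of the counting idea mentioned above for the bounded-range case (the bound max_abs is a parameter I am introducing for illustration; it is not part of the original question):
#include <cstdlib>
#include <vector>

// Linear-time count, valid only when all |values| are known to be <= max_abs.
int count_distinct_abs_bounded(const std::vector<int>& v, int max_abs)
{
    std::vector<bool> seen(max_abs + 1, false);
    int count = 0;
    for (int x : v) {
        int a = std::abs(x);
        if (!seen[a]) { seen[a] = true; ++count; }
    }
    return count;
}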
I think a std::map could also be interesting:
int absoluteDistinct(const std::vector<int>& A)
{
    std::map<int, char> my_map;
    for (std::vector<int>::const_iterator it = A.begin(); it != A.end(); ++it)
    {
        my_map[std::abs(*it)] = 0; // the mapped value is irrelevant; only the keys matter
    }
    return my_map.size();
}
As @Jerry said, to improve a little on the theme of most of the other answers, instead of using a std::map or std::set you could use a std::unordered_map or std::unordered_set (or the Boost equivalent).
This would reduce the runtime from O(n lg n) to expected O(n).
Another possibility, depending on the range of the data given, you might be able to do a variant of a radix sort, though there's nothing in the question that immediately suggests this.
Sort the list with a Radix style sort for O(n)ish efficiency. Compare adjacent values.
The best way is to customize the quicksort algorithm so that, whenever partitioning encounters two equal elements, the second duplicate is overwritten with the last element in the range and the range is reduced. This ensures you do not process duplicate elements twice. When the quicksort is done, the size of the remaining range is the answer.
Complexity is still O(n log n), but this should save at least two passes over the array.
The savings are proportional to the percentage of duplicates. Imagine if they twist the original question with, say, "90% of the elements are duplicates"...
One more approach:
Space-efficient: use an ordered container (e.g. std::set): O(log N) per insert, O(n log N) overall; just keep the count of elements successfully inserted.
Time-efficient: use a hash table (e.g. std::unordered_set): O(n) expected overall for the inserts; again, keep the count of elements successfully inserted.
You have nested loops in your code. Scanning each element over the whole array gives O(n²) time complexity, which is not acceptable in most scenarios. That is the reason algorithms like merge sort and quicksort were developed: to save processing cycles and machine effort. I suggest you go through the suggested links and redesign your program.