How to apply the intersection between two lists in C++? - c++

I am new to C++'s std::list.
I have two lists: list1 and list2. I need to get common elements between these lists. How can I get this?

You can use std::set_intersection for that, provided you first sort the two lists.
Example:
#include <algorithm>
#include <iostream>
#include <iterator>   // for std::back_inserter
#include <list>

int main() {
    std::list<int> list1{2, 5, 7, 8, -3, 7};
    std::list<int> list2{9, 1, 6, 3, 5, 2, 11, 0};
    list1.sort();
    list2.sort();
    std::list<int> out;
    std::set_intersection(list1.begin(), list1.end(), list2.begin(), list2.end(),
                          std::back_inserter(out));
    for (auto k : out)
        std::cout << k << ' ';
}
Output:
2 5
EDIT:
The above method is likely not optimal, mostly because sorting a std::list has poor cache behavior: its nodes are scattered across memory.
Trading some space, the method below will almost certainly be faster for larger data sets, because we iterate through each list only once, and every operation done per iteration has O(1) amortized complexity:
#include <algorithm>
#include <list>
#include <unordered_set>

template <typename T>
std::list<T> intersection_of(const std::list<T>& a, const std::list<T>& b) {
    std::list<T> rtn;
    std::unordered_multiset<T> st;
    std::for_each(a.begin(), a.end(), [&st](const T& k) { st.insert(k); });
    std::for_each(b.begin(), b.end(), [&st, &rtn](const T& k) {
        auto iter = st.find(k);
        if (iter != st.end()) {
            rtn.push_back(k);
            st.erase(iter);  // consume one occurrence so duplicates pair up
        }
    });
    return rtn;
}
I used std::unordered_multiset rather than std::unordered_set because it preserves the occurrences of duplicates common to both lists.
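A minimal usage sketch (the main below is mine, mirroring the first example); note that the results follow the order of elements in b, since that is the list we iterate while probing the multiset:

#include <iostream>

int main() {
    std::list<int> list1{2, 5, 7, 8, -3, 7};
    std::list<int> list2{9, 1, 6, 3, 5, 2, 11, 0};
    for (int k : intersection_of(list1, list2))
        std::cout << k << ' ';  // prints: 5 2
}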
I ran a quick-and-dirty benchmark of the two methods on 9000 randomly generated ints. The results were (lower is better):
Average timings for 100 runs:
intersection_of: 8.16 ms
sortAndIntersect: 18.38 ms
Analysis of using the std::set_intersection method:
Sorting List 1 of size N is: O(Nlog(N))
Sorting List 2 of size M is: O(Mlog(M))
Finding the Intersection is: O(M + N)
Total: O( Nlog(N) + Mlog(M) + M + N) ...(linearithmic overall)
Assuming M and N are equal, we can generalize it as: O(Nlog(N))
But if we use the intersection_of method I posted above:
Iterating through List 1 of size N and adding each element to the multiset is: O(N) iterations × O(1) amortized insert = O(N)
Iterating through List 2 of size M, probing the multiset, appending to the output, and erasing from the multiset is: O(M) iterations × O(1) amortized work = O(M)
Total: O(M + N) ...(generalized as linear)
Assuming M and N are equal, we can generalize it as: O(N)

Related

There is a given element, say N. How to modify binary search to find the greatest element in a sorted vector that is smaller than N?

For example:
Let us have a sorted vector with elements: [1, 3, 4, 6, 7, 10, 11, 13]
And we have an element N = 5
I want output as:
4
Since 4 is the greatest element smaller than N.
I want to modify Binary Search to get the answer
What would you want to happen if there is an element that equals N in the vector?
I would use std::lower_bound (or std::upper_bound, depending on the answer to the above question). It runs in logarithmic time, which means it's probably using binary search under the hood.
#include <algorithm>
#include <optional>
#include <vector>

std::optional<int> find_first_less_than(int n, std::vector<int> data) {
    // things must be sorted before processing
    std::sort(data.begin(), data.end());
    auto it = std::lower_bound(data.begin(), data.end(), n);
    // if all of the elements are >= N, we return nullopt
    if (it == data.begin()) return std::nullopt;
    return *std::prev(it);
}
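A hypothetical call with the example data from the question:

#include <iostream>

int main() {
    std::vector<int> v{1, 3, 4, 6, 7, 10, 11, 13};
    if (auto r = find_first_less_than(5, v))
        std::cout << *r << '\n';  // prints 4
}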

What is the most efficient way of copying elements that occur only once in a std vector?

I have a std vector with elements like this:
[0 , 1 , 2 , 0 , 2 , 1 , 0 , 0 , 188 , 220 , 0 , 1 , 2 ]
What is the most efficient way to find and copy the elements that occur only once in this vector, excluding the brute force O(n^2) algorithm? In this case the new list should contain [188, 220]
Make an unordered_map<DataType, Count> count;
Iterate over the input vector, increasing the count of each value; something like count[value]++;
Iterate over the count map, copying keys whose count is 1.
It's O(n). You have hashing, so for small data sets a normal map might be more efficient, although technically it would be O(n log n).
It's a good method for discrete data sets.
Code example:
#include <iostream>
#include <unordered_map>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> v{1, 1, 2, 3, 3, 4};
    unordered_map<int, int> count;
    for (const auto& e : v) count[e]++;

    vector<int> once;
    for (const auto& e : count)
        if (e.second == 1) once.push_back(e.first);

    for (const auto& e : once) cout << e << '\n';
    return 0;
}
I have tried a few ideas, but I don't see a way around a map. unordered_multiset is almost a great fit... except that it does not allow you to iterate over its keys. It has a method to check the count of a key, but you would need another set just for the keys to probe. I don't see that as simpler. In modern C++, counting with auto is easy. I've also looked through the algorithm library, but I haven't found any transform, copy_if, generate, etc. that could conditionally transform an element (map entry -> value if count is 1).
There are very few universally-optimal algorithms. Which algorithm works best usually depends upon the properties of the data that's being processed. Removing duplicates is one such example.
Is v small and filled mostly with unique values?
auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::mismatch(lo + 1, v.end(), lo).first;
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}
Is v small and filled mostly with duplicates?
auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::upper_bound(lo + 1, v.end(), *lo);
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}
Is v gigantic?
std::unordered_map<int, bool> keyUniqueness{};
keyUniqueness.reserve(v.size());
for (int key : v) {
    bool wasMissing = keyUniqueness.find(key) == keyUniqueness.end();
    keyUniqueness[key] = wasMissing;
}
v.clear();
for (const auto& element : keyUniqueness) {
    if (element.second) { v.push_back(element.first); }
}
And so on.
@luk32's answer is definitely the most time-efficient way of solving this question. However, if you are short on memory and can't afford an unordered_map, there are other ways to do it.
You can use std::sort() to sort the vector first. Then the non-duplicates can be found in a single pass, for an overall complexity of O(n log n).
If the question is slightly different, and you know there is only one non-duplicate element, you can use this code (code in Java). The complexity here is O(n).
Since you use a std::vector, I presume you want to maximize all its benefits including reference locality. In order to do that, we need a bit of typing here. And I benchmarked the code below...
I have a linear O(n) pass here (O(n log(n)) overall, because of the sort); it's a bit like brian's answer, but I use OutputIterators instead of doing it in-place. The pre-condition is that the range is sorted.
template <typename InputIterator, typename OutputIterator>
OutputIterator single_unique_copy(InputIterator first, InputIterator last, OutputIterator result) {
    if (first == last) return result;
    auto previous = first;
    ++first;
    while (first != last) {
        if (*first == *previous) {
            // skip the entire run of duplicates
            while ((++first != last) && (*first == *previous));
            if (first == last) return result;  // the range ended on a duplicate run
        } else {
            *(result++) = *previous;  // previous occurred exactly once
        }
        previous = first;
        ++first;
    }
    *(result++) = *previous;  // the final element is unique
    return result;
}
And here is a sample usage:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::vector<int> vm = {0, 1, 2, 0, 2, 1, 0, 0, 1, 88, 220, 0, 1, 2, 227, -8};
    std::vector<int> kk;
    std::sort(vm.begin(), vm.end());
    single_unique_copy(vm.begin(), vm.end(), std::back_inserter(kk));
    for (auto x : kk) std::cout << x << ' ';
    return 0;
}
As expected, the output is:
-8 88 220 227
Your use case may be different from mine, so, profile first... :-)
EDIT:
Using luk32's algorithm and mine... with 13 million elements,
created in descending order, repeated at every i % 5:
Under a debug build, luk32: 9.34 seconds and mine: 7.80 seconds
Under -O3, luk32: 2.71 seconds and mine: 0.52 seconds
MinGW 5.1 64-bit, Windows 10, 1.73 GHz Core i5 4210U, 6 GB DDR3 1600 MHz RAM
Benchmark here: http://coliru.stacked-crooked.com/a/187e5e3841439742
For smaller inputs, the difference still holds, until the code stops being performance-critical.

More efficient way of counting the number of values within an interval?

I want to determine how many numbers of an input-array (up to 50000) lie in each of my given intervals (many).
Currently, I'm trying to do it with this algorithm, but it is far too slow:
Example-array: {-3, 10, 5, 4, -999999, 999999, 6000}
Example-interval: [0, 11] (inclusive)
Sort the array - O(n * log(n)). (-999999, -3, 4, 5, 10, 6000, 999999)
Find min_index such that array[min_index] >= 0 - O(n). (for my example, min_index == 2)
Find max_index such that array[max_index] <= 11 - O(n). (for my example, max_index == 4)
If both indexes exist, then Result == max_index - min_index + 1 (for my example, Result = (4 - 2 + 1) = 3).
You have a good idea, but it needs an amendment: find the beginning and end of each interval in O(lg n) time using binary search. If n is the length of the array and q is the number of queries [a, b], you have O(n + q*n) time with linear scans; with binary search it's O((n + q) lg n) (the n lg n part coming from sorting the array).
The advantage of this solution is simplicity, because C++ has std::lower_bound and std::upper_bound, and you can use std::distance. It's just a few lines of code, as sketched below.
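For instance, a minimal sketch (the function name is mine; the data and interval come from the question's example):

#include <algorithm>
#include <iostream>
#include <vector>

// Counts how many values of a sorted vector fall in the inclusive interval [a, b].
long count_in_interval(const std::vector<int>& sorted, int a, int b) {
    auto lo = std::lower_bound(sorted.begin(), sorted.end(), a);  // first element >= a
    auto hi = std::upper_bound(sorted.begin(), sorted.end(), b);  // first element > b
    return std::distance(lo, hi);
}

int main() {
    std::vector<int> v{-3, 10, 5, 4, -999999, 999999, 6000};
    std::sort(v.begin(), v.end());                      // O(n lg n), done once for all queries
    std::cout << count_in_interval(v, 0, 11) << '\n';   // prints 3 (counts 4, 5, 10)
}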
If q is equal to n, this algorithm has O(n lg n) complexity. Could it be better? No. Why? Because the problem is then equivalent to sorting, and as is well known, comparison-based sorting cannot beat O(n lg n).
There's a simple O(n * m) algorithm (n input values, m intervals):
For ease of implementation, we use half-open intervals. Convert yours as needed:
Convert your intervals to half-open intervals (always prefer half-open intervals).
Save all limits in an array.
For all elements in the input:
    For all elements in the limits-array:
        Increment that limit's count if the input is smaller than the limit.
Go through your intervals and get the answers by subtracting the counts for the corresponding limits.
For a slight performance boost, sort the limits-array in step 2. A sketch of these steps follows.
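Here is a short sketch of those steps (names are mine; intervals are half-open pairs [lo, hi)):

#include <cstddef>
#include <utility>
#include <vector>

// For each limit we count how many inputs are smaller; an interval's answer
// is then the difference of the counts of its two limits. O(n * m).
std::vector<long> count_in_intervals(const std::vector<int>& input,
                                     const std::vector<std::pair<int, int>>& intervals) {
    std::vector<int> limits;
    for (const auto& iv : intervals) {
        limits.push_back(iv.first);
        limits.push_back(iv.second);
    }
    std::vector<long> below(limits.size(), 0);
    for (int x : input)
        for (std::size_t i = 0; i < limits.size(); ++i)
            if (x < limits[i]) ++below[i];
    std::vector<long> answers;
    for (std::size_t i = 0; i < intervals.size(); ++i)
        answers.push_back(below[2 * i + 1] - below[2 * i]);  // #{x < hi} - #{x < lo}
    return answers;
}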
Create a std::map of your numbers to their index in the sorted array.
From your example map[-999999] = 0, map[-3] = 1, ... map[999999] = 7.
To find an interval, find the lowest number higher than or equal to the min (using map.lower_bound()), and find the first number higher than the max (using map.upper_bound()).
You can now subtract the lower index from the upper index to find the number of elements in that range in O(log n) per query, as in the sketch below.
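A minimal sketch of that idea (assuming distinct values, since duplicates would overwrite their index in the map):

#include <iostream>
#include <map>
#include <vector>

int main() {
    std::vector<int> sorted{-999999, -3, 4, 5, 10, 6000, 999999};  // already sorted
    std::map<int, std::size_t> index;
    for (std::size_t i = 0; i < sorted.size(); ++i) index[sorted[i]] = i;

    int lo = 0, hi = 11;             // the inclusive interval [0, 11] from the question
    auto l = index.lower_bound(lo);  // first key >= lo
    auto u = index.upper_bound(hi);  // first key > hi
    std::size_t count = (u == index.end() ? sorted.size() : u->second)
                      - (l == index.end() ? sorted.size() : l->second);
    std::cout << count << '\n';      // prints 3
}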
#include <algorithm>
#include <limits>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<int, int> interval;   // half-open: [first, second)
typedef std::map<interval, size_t> answers;
typedef std::vector<interval> questions;

// O((m+n)lg m)
answers solve(const std::vector<int>& data, const questions& qs) {
    // m = qs.size()
    // n = data.size()
    answers retval;
    std::vector<std::pair<int, size_t>> edges;
    edges.reserve(2 * qs.size() + 1);
    // O(m) -- all starts and ends of intervals go into edges
    for (auto q : qs) {
        edges.emplace_back(q.first, 0);
        edges.emplace_back(q.second, 0);
    }
    // O(m lg m) -- sort
    std::sort(begin(edges), end(edges));
    edges.emplace_back(std::numeric_limits<int>::max(), 0);
    // O(m) -- remove duplicates
    edges.erase(std::unique(begin(edges), end(edges)), end(edges));
    // O(n lg m) -- for each element, bump the first edge strictly greater than it
    for (int x : data) {
        auto it = std::upper_bound(begin(edges), end(edges), x,
            [](int v, const std::pair<int, size_t>& e) { return v < e.first; });
        it->second++;
    }
    // O(m) -- prefix sums
    size_t accum = 0;
    for (auto& e : edges) {
        accum += e.second;
        e.second = accum;
    }
    // now edge (x, y) states that there are y elements < x.
    // O(m lg m) -- find the edges corresponding to each interval
    for (auto q : qs) {
        auto low = std::lower_bound(begin(edges), end(edges),
            std::make_pair(q.first, size_t(0)));
        auto high = std::lower_bound(begin(edges), end(edges),
            std::make_pair(q.second, size_t(0)));
        size_t total = high->second - low->second;
        retval.emplace(q, total);
    }
    return retval;
}
O((n+m)lg m) overall, where n is the integer count and m is the number of intervals.

Swap the elements of two sequences, such that the difference of the element-sums gets minimal.

An interview question:
Given two non-ordered integer sequences a and b, their size is n, all
numbers are randomly chosen: Exchange the elements of a and b, such that the sum of the elements of a minus the sum of the elements of b is minimal.
Given the example:
a = [ 5 1 3 ]
b = [ 2 4 9 ]
The result is (1 + 2 + 3) - (4 + 5 + 9) = -12.
My algorithm: sort them together, then put the n smallest ints in a and the rest in b. That is O(n lg n) in time and O(n) in space. I do not know how to improve it to O(n) time and O(1) space, where O(1) means we need no extra space beyond the two sequences themselves.
Any ideas?
An alternative question would be: What if we need to minimize the absolute value of the differences (minimize |sum(a) - sum(b)|)?
Python or C++ thinking is preferred.
Revised solution:
Merge both lists: x = merge(a, b).
Calculate the median of x (complexity O(n), see http://en.wikipedia.org/wiki/Selection_algorithm ).
Using this median, swap elements between a and b: find an element in a that is greater than the median, find one in b that is smaller than the median, and swap them; repeat until no such pair remains, so that a ends up with the n smallest values. A sketch follows this list.
Final complexity: O(n)
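A sketch of the swap step (the helper name is mine; the median is assumed to come from an O(n) selection, e.g. std::nth_element on a scratch copy, and ties on the median value need extra care):

#include <algorithm>
#include <vector>

// Swap every element of 'a' above the median with an element of 'b' below it,
// so that 'a' ends up holding the n smallest values.
void swap_around_median(std::vector<int>& a, std::vector<int>& b, int median) {
    auto ia = a.begin();
    auto ib = b.begin();
    while (true) {
        ia = std::find_if(ia, a.end(), [&](int x) { return x > median; });
        ib = std::find_if(ib, b.end(), [&](int x) { return x < median; });
        if (ia == a.end() || ib == b.end()) break;
        std::iter_swap(ia, ib);  // a big value leaves 'a', a small one enters
        ++ia;
        ++ib;
    }
}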
Minimizing the absolute difference is NP-hard, since it is equivalent to the partition problem (a special case of knapsack).
What comes to my mind is the following algorithm outline:
C = A ∪ B
Partially sort the first #A (number of elements of A) elements of C.
Subtract the sum of the last #B elements of C from the sum of the first #A elements of C.
You should notice that you don't need to sort all elements; it is enough to find the #A smallest ones. Given your example:
C = {5, 1, 3, 2, 4, 9}
C = {1, 2, 3, 5, 4, 9}
(1 + 2 + 3) - (5 + 4 + 9) = -12
A C++ solution:
#include <iostream>
#include <vector>
#include <algorithm>

int main()
{
    // Initialize 'a' and 'b'
    int ai[] = { 5, 1, 3 };
    int bi[] = { 2, 4, 9 };
    std::vector<int> a(ai, ai + 3);
    std::vector<int> b(bi, bi + 3);

    // 'c' = 'a' merged with 'b'
    std::vector<int> c;
    c.insert(c.end(), a.begin(), a.end());
    c.insert(c.end(), b.begin(), b.end());

    // partially sort the first #a elements of 'c'
    std::partial_sort(c.begin(), c.begin() + a.size(), c.end());

    // build the difference
    int result = 0;
    for (auto cit = c.begin(); cit != c.end(); ++cit)
        result += (cit < c.begin() + a.size()) ? (*cit) : -(*cit);

    // print the result (and it is -12)
    std::cout << result << std::endl;
}

Algorithm for finding the number which appears the most in a row - C++

I need help making an algorithm for one problem: there is a row of numbers, in which each number can appear several times, and I need to find the number that appears the most and how many times it appears in the row, e.g.:
1-1-5-1-3-7-2-1-8-9-1-2
That would be 1 and it appears 5 times.
The algorithm should be fast (that's my problem).
Any ideas?
What you're looking for is called the mode. You can sort the array, then look for the longest repeating sequence.
You could keep a hash table and store a count of every element in that structure, like this:
h[1] = 5
h[5] = 1
...
You can't get it any faster than in linear time, as you need to at least look at each number once.
If you know that the numbers are in a certain range, you can use an additional array to sum up the occurrences of each number, otherwise you'd need a hashtable, which is slightly slower.
Both of these need additional space though and you need to loop through the counts again in the end to get the result.
Unless you really have a huge amount of numbers and absolutely require O(n) runtime, you could simply sort your array of numbers. Then you can walk once through the numbers, keeping the count of the current number and the number with the maximum of occurrences in two variables. So you save yourself a lot of space, trading it off against a little bit of time.
There is an algorithm that solves your problem in linear time (linear in the number of items in the input). The idea is to use a hash table to associate to each value in the input a count indicating the number of times that value has been seen. You will have to profile against your expected input and see if this meets your needs.
Please note that this uses O(n) extra space. If this is not acceptable, you might want to consider sorting the input as others have proposed. That solution will be O(n log n) in time and O(1) in space.
Here's an implementation in C++ using std::unordered_map (std::tr1::unordered_map on pre-C++11 compilers):
#include <iostream>
#include <unordered_map>
using namespace std;

typedef std::unordered_map<int, int> map_t;

int main() {
    map_t m;
    int a[12] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    for (int i = 0; i < 12; i++) {
        int key = a[i];
        map_t::iterator it = m.find(key);
        if (it == m.end()) {
            m.insert(map_t::value_type(key, 1));
        } else {
            it->second++;
        }
    }

    int count = 0;
    int value = 0;
    for (map_t::iterator it = m.begin(); it != m.end(); it++) {
        if (it->second > count) {
            count = it->second;
            value = it->first;
        }
    }
    cout << "Value: " << value << endl;
    cout << "Count: " << count << endl;
}
The algorithm works using the input integers as keys in a hashtable to a count of the number of times each integer appears. Thus the key (pun intended) to the algorithm is building this hash table:
int key = a[i];
map_t::iterator it = m.find(key);
if (it == m.end()) {
    m.insert(map_t::value_type(key, 1));
} else {
    it->second++;
}
So here we are looking at the ith element in our input list. Then what we do is we look to see if we've already seen it. If we haven't, we add a new value to our hash table containing this new integer, and an initial count of one indicating this is our first time seeing it. Otherwise, we increment the counter associated to this value.
Once we have built this table, it's simply a matter of running through the values to find one that appears the most:
int count = 0;
int value = 0;
for (map_t::iterator it = m.begin(); it != m.end(); it++) {
    if (it->second > count) {
        count = it->second;
        value = it->first;
    }
}
Currently there is no logic to handle the case of two distinct values appearing the same number of times and that number of times being the largest amongst all the values. You can handle that yourself depending on your needs.
Here is a simple one, that is O(n log n):
Sort the vector                          # O(n log n)
Create vars: MOST = 0, VAL = none, CURRENT = 0, PREV = none
for ELEMENT in LIST:
    if ELEMENT != PREV: CURRENT = 0      # a new run starts
    CURRENT += 1
    PREV = ELEMENT
    if CURRENT > MOST:
        MOST = CURRENT
        VAL = ELEMENT
return (VAL, MOST)
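In C++, that pseudocode might look like this (a sketch using the numbers from the question):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    std::sort(v.begin(), v.end());  // O(n log n)
    int best_val = v.front(), best_count = 0, current = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        current = (i > 0 && v[i] == v[i - 1]) ? current + 1 : 1;  // length of the current run
        if (current > best_count) {
            best_count = current;
            best_val = v[i];
        }
    }
    std::cout << best_val << " appears " << best_count << " times\n";  // 1 appears 5 times
}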
There are a few methods:
The universal method is "sort it and find the longest run of equal values", which is O(n log n). Quicksort is the fastest sort on average, though its worst case is O(n^2); heapsort is somewhat slower in the average case but has O(n log n) asymptotic complexity even in the worst case.
If you have some information about the numbers, you can use some tricks. If the numbers come from a limited range, you can use the counting pass of counting sort, which is O(n); a sketch follows.
If that isn't your case, there are some other sort algorithms that can run in linear time, but none of them is universal.
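For instance, a sketch of the counting pass when the values are known to lie in a small range (the bound MAXV is an assumption made for this example):

#include <array>
#include <iostream>
#include <vector>

int main() {
    constexpr int MAXV = 9;  // assumed upper bound on the values
    std::vector<int> row{1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    std::array<int, MAXV + 1> counts{};  // zero-initialized occurrence counts
    for (int x : row) ++counts[x];
    int best = 0;
    for (int v = 1; v <= MAXV; ++v)
        if (counts[v] > counts[best]) best = v;
    std::cout << best << " appears " << counts[best] << " times\n";  // 1 appears 5 times
}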
The best time complexity you can get here is O(n). You have to look through all elements, because the last element may be the one which determines the mode.
The solution depends on whether time or space is more important.
If space is more important, then you can sort the list then find the longest sequence of consecutive elements.
If time is more important, you can iterate through the list, keeping a count of the occurrences of each element (e.g. hashing element -> count). While doing this, keep track of the element with the max count, switching it whenever necessary.
If you also happen to know that the mode is a majority element (i.e. there are more than n/2 elements in the array with this value), then you can get O(n) speed and O(1) space efficiency.
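That O(n) time / O(1) space case is the Boyer-Moore majority vote; a minimal sketch (the function name is mine):

#include <vector>

// Returns the majority candidate; the result is only meaningful if some
// value really does occupy more than half of the input.
int majority_candidate(const std::vector<int>& v) {
    int candidate = 0, count = 0;
    for (int x : v) {
        if (count == 0) { candidate = x; count = 1; }  // adopt a new candidate
        else if (x == candidate) ++count;              // same value: reinforce
        else --count;                                  // different value: cancel out
    }
    return candidate;
}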
Generic C++ solution:
#include <algorithm>
#include <iterator>
#include <map>
#include <utility>

template <class T, class U>
struct less_second
{
    bool operator()(const std::pair<T, U>& x, const std::pair<T, U>& y) const
    {
        return x.second < y.second;
    }
};

template <class Iterator>
std::pair<typename std::iterator_traits<Iterator>::value_type, int>
most_frequent(Iterator begin, Iterator end)
{
    typedef typename std::iterator_traits<Iterator>::value_type vt;
    std::map<vt, int> frequency;
    for (; begin != end; ++begin) ++frequency[*begin];
    return *std::max_element(frequency.begin(), frequency.end(),
                             less_second<vt, int>());
}

#include <iostream>

int main()
{
    int array[] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    std::pair<int, int> result = most_frequent(array, array + 12);
    std::cout << result.first << " appears " << result.second << " times.\n";
}
Haskell solution:
import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)
count = foldl step Map.empty
  where
    step frequency x = Map.alter next x frequency
    next Nothing  = Just 1
    next (Just n) = Just (n + 1)
most_frequent = maximumBy (compare `on` snd) . Map.toList . count
example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]
Shorter Haskell solution, with help from stack overflow:
import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)
most_frequent = maximumBy (compare `on` snd) . Map.toList
              . Map.fromListWith (+) . flip zip (repeat 1)
example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]
The solution below gives you the count of each number. It replaces the map with a plain array, which can beat a map in time and space when the value range is tiny. If you need the number that appeared the most, it is not better than the previous answers.
EDIT: This approach works for single-digit positive numbers only (1 through 9), since it reads the digits straight out of the string.
std::string row = "1,1,5,1,3,7,2,1,8,9,1,2";
int counts[9] = {};  // occurrence counts for the digits 1..9

for (char ch : row) {
    if (ch != ',') {
        int val = ch - '0';
        counts[val - 1]++;
    }
}

for (int i = 0; i < 9; i++)
    std::cout << i + 1 << "-->" << counts[i] << std::endl;
Since this is homework I think it's OK to supply a solution in a different language.
In Smalltalk something like the following would be a good starting point:
SequenceableCollection>>mode
    | aBag maxCount mode |
    aBag := Bag new
        addAll: self;
        yourself.
    aBag valuesAndCountsDo: [ :val :count |
        (maxCount isNil or: [ count > maxCount ])
            ifTrue: [ mode := val.
                      maxCount := count ]].
    ^mode
As time goes by, the language evolves. We now have many more language constructs that make life simpler:
namespace aliases
CTAD (Class Template Argument Deduction)
more modern containers like std::unordered_map
range-based for loops
the std::ranges library
projections
using statements
structured bindings
more modern algorithms
We could now come up with the following code:
#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <vector>

namespace rng = std::ranges;

int main() {
    // Demo data
    std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

    // Count values
    using Counter = std::unordered_map<decltype(data)::value_type, std::size_t>;
    Counter counter{};
    for (const auto& d : data) counter[d]++;

    // Get max
    const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);

    // Show output
    std::cout << '\n' << value << " found " << count << " times\n";
}