I am currently practicing for coding interviews and am working on a function that takes an array and its size and prints out which numbers in it are duplicates. I have this working with two nested for loops, but I want an optimized solution using sets. A snippet of my code is below:
#include <iostream>
#include <set>
using namespace std;
void FindDuplicate(int integers[], int n){
set<int>setInt;
for(int i = 0; i < n; i++){
//if this num is not in the set then it is not a duplicate
if(setInt.find(integers[i]) != setInt.end()){
setInt.insert({integers[i]});
}
else
cout << integers[i] << " is a duplicate";
}
}
int main() {
int integers [] = {1,2,2,3,3};
int n = sizeof(integers)/sizeof(integers[0]);
FindDuplicate(integers, n);
}
Any helpful advice is appreciated, thanks
I think your comparison is not needed; insert does it for you:
https://en.cppreference.com/w/cpp/container/set/insert
Returns a pair consisting of an iterator to the inserted element (or
to the element that prevented the insertion) and a bool value set to
true if the insertion took place.
Just insert the element and check what insert returns: the second member of the pair is false when the value is a duplicate. :)
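A minimal sketch of that idea, assuming the input is passed as a std::vector and the duplicates are returned instead of printed. The helper name find_duplicates is hypothetical and not from the question:

```cpp
#include <set>
#include <vector>

// Hypothetical helper: collects every value whose insertion into the set
// failed, i.e. each occurrence after the first one, in input order.
std::vector<int> find_duplicates(const std::vector<int>& input)
{
    std::set<int> seen;
    std::vector<int> duplicates;
    for (int value : input)
    {
        // std::set::insert returns std::pair<iterator, bool>;
        // .second is false when the value was already in the set.
        if (!seen.insert(value).second)
        {
            duplicates.push_back(value);
        }
    }
    return duplicates;
}
```

For the question's input {1,2,2,3,3} this returns {2, 3}. A value that occurs three times is reported twice, so deduplicate the result if each duplicate should appear only once.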
My proposed solution is:
Count the frequency of each element (the algorithm for counting frequencies is explained here: frequency).
Display the elements with a frequency greater than 1 (these are the duplicates).
Neither step uses nested loops.
#include <iostream>
#include <unordered_map>
using namespace std;
void FindDuplicate(int integers[], int n)
{
unordered_map<int, int> mp;
// Traverse through array elements and
// count frequencies
for (int i = 0; i < n; i++)
{
mp[integers[i]]++;
}
cout << "The repeating elements are : " << endl;
for (int i = 0; i < n; i++) {
if (mp[integers[i]] > 1)
{
cout << integers[i] << endl;
mp[integers[i]] = -1;
}
}
}
int main()
{
int integers [] = {1,1,0,0,2,2,3,3,3,6,7,7,8};
int n = sizeof(integers)/sizeof(integers[0]);
FindDuplicate(integers, n);
}
This is my feedback:
#include <iostream>
#include <vector>
#include <set>
// don't do this; it is avoided in big projects (name clash issues)
// using namespace std;
// pass the vector by const reference: you're not supposed to change the input,
// and the reference prevents the data from being copied.
// naming is important: do you want to find one duplicate or more?
// renamed to FindDuplicates because you want them all
void FindDuplicates(const std::vector<int>& input)
{
std::set<int> my_set;
// don't use index based for loops if you don't have to
// range based for loops are safer
// const auto is more refactorable than const int
for (const auto value : input)
{
//if (!my_set.contains(value)) C++ 20 syntax
if (my_set.find(value) == my_set.end())
{
my_set.insert(value);
}
else
{
std::cout << "Array has a duplicate value : " << value << "\n";
}
}
}
int main()
{
// int integers[] = { 1,2,2,3,3 }; avoid "C" style arrays they're a **** to pass around safely
// int n = sizeof(integers) / sizeof(integers[0]); std::vector (or std::array) have size() methods
std::vector input{ 1,2,2,3,3 };
FindDuplicates(input);
}
You do not need to use a set.
To find the duplicates:
Sort the array of numbers
Iterate over the array (starting with the second element) and copy each element that equals the previous element into a new vector "duplicates"
(Optional) Use unique on "duplicates" if you only want to know which numbers are duplicated and do not care whether a number occurs 2, 3 or 4 times in the numbers array
Example Implementation:
#include <algorithm>
#include <iostream>
#include <vector>
void
printVector (std::vector<int> const &numbers)
{
for (auto const &number : numbers)
{
std::cout << number << ' ';
}
std::cout << std::endl;
}
int
main ()
{
auto numbers = std::vector<int>{ 1, 2, 2, 42, 42, 42, 3, 3, 42, 42, 1, 2, 3, 4, 5, 6, 7, 7 };
std::sort (numbers.begin (), numbers.end ());
auto duplicates = std::vector<int>{};
std::for_each (numbers.begin () + 1, numbers.end (), [prevElement = numbers.begin (), &duplicates] (int currentElement) mutable {
if (currentElement == *prevElement)
{
duplicates.push_back (currentElement);
}
prevElement++;
});
duplicates.erase (std::unique (duplicates.begin (), duplicates.end ()), duplicates.end ());
printVector (duplicates);
}
edit:
If you don't mind using more memory and more computation but prefer more expressive code:
Sort numbers
Create a new array with the unique numbers, "uniqueNumbers"
Use "set_difference" to calculate (numbers - uniqueNumbers), which yields a new array with all the duplicates
(Optional) Use unique on "duplicates" if you only want to know which numbers are duplicated and do not care whether a number occurs 2, 3 or 4 times in the numbers array
Example Implementation:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
void
printVector (std::vector<int> const &numbers)
{
for (auto const &number : numbers)
{
std::cout << number << ' ';
}
std::cout << std::endl;
}
int
main ()
{
auto numbers = std::vector<int>{ 2, 2, 42, 42, 42, 3, 3, 42, 42, 1, 2, 3, 4, 5, 6, 7, 7 };
std::sort (numbers.begin (), numbers.end ());
auto uniqueNumbers = std::vector<int>{};
std::unique_copy (numbers.begin (), numbers.end (), std::back_inserter (uniqueNumbers));
auto duplicates = std::vector<int>{};
std::set_difference (numbers.begin (), numbers.end (), uniqueNumbers.begin (), uniqueNumbers.end (), std::back_inserter (duplicates));
std::cout << "duplicate elements: ";
printVector (duplicates);
std::cout << "unique duplicate elements: ";
printVector ({ duplicates.begin (), std::unique (duplicates.begin (), duplicates.end ()) });
}
Here's a quick solution: use a counting array of size N (N must be larger than the biggest value you expect).
Whenever a number is added to the other array, add 1 at that number's position in the counting array, like:
array_of_repeated[user_input]++;
Then, if the program asks how many times (for example) the number 234 was repeated:
std::cout<<array_of_repeated[requested_number]<<std::endl;
This way you won't spend time searching for a number in the other list.
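The counting-array idea could look roughly like this. The function name count_occurrences and the max_value parameter are assumptions for illustration, and the sketch only works when all values are non-negative and bounded by a known maximum:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumes every value of interest lies in [0, max_value];
// values outside that range are skipped instead of indexing out of bounds.
std::vector<int> count_occurrences(const std::vector<int>& input, int max_value)
{
    // One counter per possible value, all starting at 0.
    std::vector<int> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int value : input)
    {
        if (value >= 0 && value <= max_value)
        {
            ++counts[static_cast<std::size_t>(value)];
        }
    }
    return counts;
}
```

After the call, counts[234] is the number of times 234 occurred, and any value with a count above 1 is a duplicate. The trade-off is memory proportional to max_value rather than to the number of elements.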
What is the default value for the second element in an STL map if I am initializing it with an array?
for example:
#include <bits/stdc++.h>
using namespace std;
void countFreq(int arr[], int n)
{
unordered_map<int, int> mp;
// Traverse through array elements and
// count frequencies
for (int i = 0; i < n; i++)
mp[arr[i]]++;
// Traverse through map and print frequencies
for (auto x : mp)
cout << x.first << " " << x.second << endl;
}
int main()
{
int arr[] = { 10, 20, 20, 10, 10, 20, 5, 20 };
int n = sizeof(arr) / sizeof(arr[0]);
countFreq(arr, n);
return 0;
}
How can this program return the frequency of the element in the array by accessing the second element of map mp?
what is the default value for the second element in map STL if I am initializing it with an array?
When accessing a key-value pair (kvp) in a std::map with operator[], either the key already exists, or a new kvp is constructed and the mapped_type is value-initialized. A value-initialized int is always 0. This imposes the requirement that the mapped_type be default constructible. Note that you can also access entries in a map using the at member function, which throws if the key is not found.
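A small sketch of the difference between operator[] and at, using std::unordered_map as in the question. The two helper names are hypothetical and exist only to make the behavior checkable:

```cpp
#include <stdexcept>
#include <unordered_map>

// Hypothetical helper: true when at() throws std::out_of_range,
// which is what it does for a key that is not in the map.
bool at_throws_for_key(const std::unordered_map<int, int>& m, int key)
{
    try
    {
        (void)m.at(key);
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
}

// Hypothetical helper: operator[] inserts a missing key with a
// value-initialized int (0) and returns a reference to it.
int value_via_subscript(std::unordered_map<int, int>& m, int key)
{
    return m[key];
}
```

On an empty map, at_throws_for_key(m, 10) is true, while value_via_subscript(m, 10) returns 0 and leaves the key 10 in the map. That inserted 0 is why mp[arr[i]]++ counts correctly from the first occurrence.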
How can this program return the frequency of the element in the array by accessing the second element of map mp?
You have done this correctly in your code snippet. You could also have used a std::multiset or std::unordered_multiset; both provide a count member function that returns the frequency of a key.
#include <set>
#include <iostream>
int main()
{
int arr[] = { 10, 20, 20, 10, 10, 20, 5, 20 };
std::multiset<int> freq (std::begin(arr), std::end(arr));
for(auto elem = freq.begin();
elem != freq.end();
elem=freq.upper_bound(*elem)) // Traverse the unique elements
{
std::cout << *elem << " count: " << freq.count(*elem) << "\n";
}
}
Note that your question mentions std::map but the example you provided uses std::unordered_map; much of this applies to both data structures.
The second element of a map entry is, by default, initialized to 0 (if its type is int, as in your code) the first time its key is accessed with operator[]. So when you access some element x for the first time, mp[x] becomes 0, and your code then increases it by 1 when counting.
Let's say I have a C-style array (int numbers[10]). I want to split the array into an array of odd numbers and an array of even numbers. Further, I'd like to use a predicate to determine if a number is odd.
Question: I am curious - are there STL functions that can do this?
The closest thing I can find is list::splice, but that's not for C-style arrays and doesn't take a predicate.
std::partition() would work.
Indeed, Example 1 on that page is separating even and odd numbers. It's doing it on a vector, but there's no reason it wouldn't work on native arrays.
Here's a quick example I worked up:
#include <algorithm>
#include <iostream>
int main()
{
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
auto mid = std::partition(std::begin(a), std::end(a),
[](int n){return n%2;});
std::cout << "Odd: " << std::endl;
for (auto p = std::begin(a); p < mid; ++p)
{
std::cout << *p << std::endl;
}
std::cout << "Even: " << std::endl;
for (auto p = mid; p < std::end(a); ++p)
{
std::cout << *p << std::endl;
}
}
Indeed you can: std::partition partitions a sequence according to a predicate.
auto begin = std::begin(array);
auto end = std::end(array);
auto part = std::partition(begin, end, [](int n){return n%2;});
Now [begin,part) contains the odd values (for which the predicate is true), and [part,end) contains the even values (for which the predicate is false).
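std::partition reorders the original array in place and does not preserve the relative order of the elements. If two separate output arrays are wanted, as the question describes, std::partition_copy is one option. A minimal sketch, where the helper name split_odd_even and the use of std::vector for the outputs are assumptions:

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: copies the range [first, last) into two vectors.
// .first holds the odd values, .second the even values, and each keeps
// the relative order of the input. The input itself is not modified.
std::pair<std::vector<int>, std::vector<int>>
split_odd_even(const int* first, const int* last)
{
    std::vector<int> odd;
    std::vector<int> even;
    // Elements for which the predicate is true go to the first output,
    // all others to the second.
    std::partition_copy(first, last,
                        std::back_inserter(odd),
                        std::back_inserter(even),
                        [](int n) { return n % 2 != 0; });
    return { std::move(odd), std::move(even) };
}
```

With a C-style array it can be called as split_odd_even(std::begin(numbers), std::end(numbers)).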
I have a vector of int that can hold a maximum of 4 elements and a minimum of 2, for example:
std::vector<int> vectorDATA(X); // X is unknown here
What I want to do is erase the repeated elements, for example:
vectorDATA{1,2,2} to vectorDATA{1,2}
vectorDATA{1,2,3} to nothing changes
vectorDATA{2,2,2} to vectorDATA{2}
vectorDATA{3,2,1,3} to vectorDATA{3,2,1}
vectorDATA{1,2,1,2} to vectorDATA{1,2}
and so on.
Here is a simple code sample:
cv::HoughLines(canny,lineQ,1,CV_PI/180,200);
std::cout << " line Size "<<lineQ.size()<< std::endl;
std::vector<int> linesData(lineQ.size());
std::vector<int> ::iterator it;
if(lineQ.size() <=4 && lineQ.size() !=0 ){
if(lineQ.size()==1){
break;
}else {
for ( int i = 0; i<lineQ.size();i++){
linesData[i] = lineQ[i][1]; // my comparison parameter is the lineQ[i][1]
}
// based on the answer I got I'm trying this but I really don't how to continue ?
std::sort(lineQ.begin(),lineQ.end(),[](const cv::Vec2f &a,const cv::Vec2f &b)
{
return ????
}
I tried using for and do-while loops, but I couldn't get it to work, and std::adjacent_find requires the repeated elements to be consecutive.
Maybe it's easy, but I just don't get it!
Thanks for any help!
The easy way is to sort and then unique-erase, but this changes the order.
The C++11 order-preserving way is to create an unordered_set<int> s and do:
unordered_set<int> s;
vec.erase(
std::remove_if( vec.begin(),vec.end(), // remove from vector
[&](int x)->bool{
return !std::get<1>(s.insert(x)); // true iff the item was already in the set
}
),
vec.end() // erase from the end of kept elements to the end of the `vec`
);
which is the remove-erase idiom using the unordered_set to detect duplicates.
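The sort-then-unique-erase approach mentioned at the start of this answer can be sketched as follows. The function name sort_unique is hypothetical, and the result is in sorted order rather than the original order:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: removes repeated values, leaving the vector sorted.
void sort_unique(std::vector<int>& vec)
{
    // Sorting makes equal values adjacent, which std::unique requires.
    std::sort(vec.begin(), vec.end());
    // std::unique moves one copy of each value to the front and returns
    // the new logical end; erase drops the leftover tail.
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}
```

For example, {3,2,1,3} becomes {1,2,3}, which differs from the {3,2,1} the question asks for. That is why the order-preserving variant uses a set instead.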
I didn't see sort-free source code in the existing answers, so here it goes. It uses a hash table to check for duplicates and shifts the unique elements towards the front of the vector. Note that src is always >= dst, and that at the end dst is the number of copied, i.e. unique, elements.
#include <unordered_set>
#include <vector>
#include <iostream>
void
uniq (std::vector<int> &a) {
std::unordered_set<int> s;
size_t dst = 0;
for (size_t src = 0; src < a.size(); ++src) {
if (s.count (a[src]) == 0) {
s.insert (a[src]);
a[dst++] = a[src];
}
}
a.resize (dst);
}
int
main () {
std::vector<int> a = { 3, 2, 1, 3, 2, 1, 2, 3, 4, 5 ,2, 3, 1, 1 };
uniq (a);
for (auto v : a)
std::cout<< v << " ";
std::cout << std::endl;
}
If you really want to remove the repeated elements, you may try something like this:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
int main () {
int data[] = {1,2,3,2,1};
vector<int> vectorDATA(&data[0], &data[0] + 5);
sort(vectorDATA.begin(),vectorDATA.end());
// advance i only when nothing was erased, so runs of three or more equal values are fully removed
for(size_t i = 0; i + 1 < vectorDATA.size(); )
{
if(vectorDATA[i] == vectorDATA[i+1])
vectorDATA.erase(vectorDATA.begin()+i+1);
else
++i;
}
for(int i = 0; i < vectorDATA.size();++i)
{
cout << vectorDATA[i] << " ";
}
cout << endl;
return 0;
}
The drawback of this method is that the elements lose their original order.
I need to find the indices of the k largest elements of an unsorted array/vector of length n in C++, with k < n. I have seen how to use nth_element() to find the k-th statistic, but I'm not sure it is the right choice for my problem, as it seems I would need to make k calls to nth_element(), which I guess would have complexity O(kn). Is that as good as it can get, or is there a way to do this in just O(n)?
Implementing it without nth_element() seems like I will have to iterate over the whole array once, populating a list of indices of the largest elements at each step.
Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.
It looks like "Mark the top N elements of an unsorted array" is probably the closest posting I can find on SO, but the answers there are in Python and PHP.
This should be an improved version of @hazelnusse's answer; it runs in O(n log k) instead of O(n log n).
#include <queue>
#include <iostream>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
std::priority_queue< std::pair<double, int>, std::vector< std::pair<double, int> >, std::greater <std::pair<double, int> > > q;
int k = 5; // number of indices we need
for (int i = 0; i < test.size(); ++i) {
if(q.size()<k)
q.push(std::pair<double, int>(test[i], i));
else if(q.top().first < test[i]){
q.pop();
q.push(std::pair<double, int>(test[i], i));
}
}
k = q.size();
std::vector<int> res(k);
for (int i = 0; i < k; ++i) {
res[k - i - 1] = q.top().second;
q.pop();
}
for (int i = 0; i < k; ++i) {
std::cout<< res[i] <<std::endl;
}
}
Output:
8
4
1
2
6
Here is my implementation that does what I want and I think is reasonably efficient:
#include <iostream>
#include <queue>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
std::priority_queue<std::pair<double, int>> q;
for (int i = 0; i < test.size(); ++i) {
q.push(std::pair<double, int>(test[i], i));
}
int k = 3; // number of indices we need
for (int i = 0; i < k; ++i) {
int ki = q.top().second;
std::cout << "index[" << i << "] = " << ki << std::endl;
q.pop();
}
}
which gives output:
index[0] = 3
index[1] = 1
index[2] = 0
The question contains a partial answer: std::nth_element places the n-th statistic at position n, with the property that none of the elements preceding it are greater than it and none of the elements following it are less.
Therefore, just one call to std::nth_element is enough to get the k largest elements. The time complexity is O(n) on average, which is theoretically the best possible, since you have to visit each element at least once to find the smallest (or in this case the k smallest) element(s). If you need these k elements to be ordered, sorting them costs an additional O(k log(k)), so O(n + k log(k)) in total.
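Since the question asks for indices rather than values, one way to apply this is to run std::nth_element on a vector of indices with a comparator that looks up the values. This is a sketch under that assumption; the function name k_largest_indices is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the k largest values,
// ordered from largest to smallest value.
std::vector<std::size_t> k_largest_indices(const std::vector<double>& values,
                                           std::size_t k)
{
    k = std::min(k, values.size());

    // idx = {0, 1, ..., n-1}; values itself is never reordered.
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    auto by_value_desc = [&values](std::size_t a, std::size_t b) {
        return values[a] > values[b];
    };

    // O(n) on average: afterwards the first k indices refer to the
    // k largest values, in unspecified order.
    std::nth_element(idx.begin(),
                     idx.begin() + static_cast<std::ptrdiff_t>(k),
                     idx.end(), by_value_desc);
    idx.resize(k);

    // Optional O(k log k) step to order the selected indices by value.
    std::sort(idx.begin(), idx.end(), by_value_desc);
    return idx;
}
```

If the order among the k indices does not matter, the final std::sort can be dropped, which keeps the whole call at O(n) on average.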
You can use the basis of the quicksort algorithm to do what you need, except instead of reordering the partitions, you can get rid of the entries falling out of your desired range.
It's been referred to as "quick select" and here is a C++ implementation:
#include <iostream>
using namespace std;
int partition(int* input, int p, int r)
{
int pivot = input[r];
while ( p < r )
{
while ( input[p] < pivot )
p++;
while ( input[r] > pivot )
r--;
if ( input[p] == input[r] )
p++;
else if ( p < r ) {
int tmp = input[p];
input[p] = input[r];
input[r] = tmp;
}
}
return r;
}
int quick_select(int* input, int p, int r, int k)
{
if ( p == r ) return input[p];
int j = partition(input, p, r);
int length = j - p + 1;
if ( length == k ) return input[j];
else if ( k < length ) return quick_select(input, p, j - 1, k);
else return quick_select(input, j + 1, r, k - length);
}
int main()
{
int A1[] = { 100, 400, 300, 500, 200 };
cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
int A2[] = { 100, 400, 300, 500, 200 };
cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
int A3[] = { 100, 400, 300, 500, 200 };
cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
int A4[] = { 100, 400, 300, 500, 200 };
cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
int A5[] = { 100, 400, 300, 500, 200 };
cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}
OUTPUT:
1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500
EDIT
That particular implementation has an O(n) average run time; due to the method of selection of pivot, it shares quicksort's worst-case run time. By optimizing the pivot choice, your worst case also becomes O(n).
The standard library won't get you a list of indices (it has been designed to avoid passing around redundant data). However, if you're interested in n largest elements, use some kind of partitioning (both std::partition and std::nth_element are O(n)):
#include <iostream>
#include <algorithm>
#include <vector>
struct Pred {
Pred(int nth) : nth(nth) {};
bool operator()(int k) { return k >= nth; }
int nth;
};
int main() {
int n = 4;
std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};
// Moves the nth element to the nth from the end position.
std::nth_element(v.begin(), v.end() - n, v.end());
// Reorders the range, so that the first n elements would be >= nth.
std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));
for (auto it = v.begin(); it != v.end(); ++it)
std::cout << *it << " ";
std::cout << "\n";
return 0;
}
You can do this in O(n) time with a single order statistic calculation:
Let r be the k-th largest value (the corresponding order statistic)
Initialize two empty lists bigger and equal.
For each index i:
If array[i] > r, add i to bigger
If array[i] = r, add i to equal
Discard elements from equal until the sum of the lengths of the two lists is k
Return the concatenation of the two lists.
Naturally, you only need one list if all items are distinct. And if needed, you could do tricks to combine the two lists into one, although that would make the code more complicated.
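The steps above can be sketched as follows, assuming int values and using std::nth_element on a copy to obtain r. The function name top_k_indices is hypothetical, and which of the tied indices get discarded is an arbitrary choice here (the later ones):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: indices of the k largest values. Indices of values
// strictly greater than r come first, followed by indices of values equal to r.
std::vector<std::size_t> top_k_indices(const std::vector<int>& array,
                                       std::size_t k)
{
    if (k == 0 || array.empty())
    {
        return {};
    }
    k = std::min(k, array.size());

    // r = k-th largest value, computed on a copy so that the indices
    // of the original array stay meaningful.
    std::vector<int> copy(array);
    std::nth_element(copy.begin(),
                     copy.begin() + static_cast<std::ptrdiff_t>(k - 1),
                     copy.end(), std::greater<int>());
    const int r = copy[k - 1];

    std::vector<std::size_t> bigger;
    std::vector<std::size_t> equal;
    for (std::size_t i = 0; i < array.size(); ++i)
    {
        if (array[i] > r)
        {
            bigger.push_back(i);
        }
        else if (array[i] == r)
        {
            equal.push_back(i);
        }
    }

    // At most k-1 values exceed r, so this only ever shrinks equal:
    // keep just enough ties to reach k indices in total.
    equal.resize(k - bigger.size());
    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger;
}
```

Both the selection and the single pass over the array are linear on average, so the overall cost stays at O(n) plus the memory for the copy.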
Even though the following code might not meet the desired complexity constraints, it might be an interesting alternative to the aforementioned priority queue.
#include <queue>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
std::vector<int> largestIndices(const std::vector<double>& values, int k) {
std::vector<int> ret;
std::vector<std::pair<double, int>> q;
int index = -1;
std::transform(values.begin(), values.end(), std::back_inserter(q), [&](double val) {return std::make_pair(val, ++index); });
auto functor = [](const std::pair<double, int>& a, const std::pair<double, int>& b) { return b.first > a.first; };
std::make_heap(q.begin(), q.end(), functor);
for (auto i = 0; i < k && i<values.size(); i++) {
std::pop_heap(q.begin(), q.end(), functor);
ret.push_back(q.back().second);
q.pop_back();
}
return ret;
}
int main()
{
std::vector<double> values = { 7,6,3,4,5,2,1,0 };
auto ret=largestIndices(values, 4);
std::copy(ret.begin(), ret.end(), std::ostream_iterator<int>(std::cout, "\n"));
}