I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:
// Check for duplicate numbers in user inputted data
bool b = false; // Set to true when the current number has a duplicate
int i; // Need to declare i here so that it can be accessed by the 'inner' loop
for(i = 0;i < 6; i++) { // Check each number in the array
for(int j = i; j < 6; j++) { // Check the rest of the numbers
if(j != i) { // Make sure we don't check a number against itself
if(userNumbers[i] == userNumbers[j]) {
b = true;
}
}
if(b == true) { // If there is a duplicate, change that particular number
cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
cin >> userNumbers[i];
}
} // Comparison loop
b = false; // Reset the boolean after each number entered has been checked
} // Main check loop
It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.
You could sort the array in O(n log n) and then simply compare each number with the next one. That is substantially faster than your existing O(n^2) algorithm, and the code is also a lot cleaner. Your code also doesn't ensure that no duplicates are introduced when numbers are re-entered. You need to prevent duplicates from existing in the first place.
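#include <algorithm>
// userNumbers is a std::vector<int> here
std::sort(userNumbers.begin(), userNumbers.end());
for (std::size_t i = 0; i + 1 < userNumbers.size(); ) { // i + 1 avoids size() - 1 underflowing on an empty vector
    if (userNumbers[i] == userNumbers[i + 1]) {
        userNumbers.erase(userNumbers.begin() + i); // remove one copy, then re-check position i
    } else {
        ++i;
    }
}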
I also second the recommendation to use a std::set: no duplicates there.
The following solution is based on sorting the numbers and then removing the duplicates:
#include <algorithm>
int main()
{
int userNumbers[6];
// ...
int* end = userNumbers + 6;
std::sort(userNumbers, end);
bool containsDuplicates = (std::unique(userNumbers, end) != end);
}
Indeed, the fastest and, as far as I can see, most elegant method is the one advised above:
std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);
It is O(n log n). However, this does not work if the order of the numbers in the input array must be preserved. In that case I did:
std::set<int> tTmp;
std::vector<int>::iterator tNewEnd =
std::remove_if(tUserNumbers.begin(), tUserNumbers.end(),
[&tTmp] (int pNumber) -> bool {
return (!tTmp.insert(pNumber).second);
});
tUserNumbers.erase(tNewEnd, tUserNumbers.end());
which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.
Cheers,
Paul
This extends the answer by @Puppy, which is currently the best answer.
PS: I tried to post this as a comment on @Puppy's answer but couldn't, as I don't have 50 reputation points yet. I also share a bit of experimental data here for further help.
Both std::set and std::map are typically implemented in the STL as balanced binary search trees (usually red-black trees), so both lead to O(n log n) complexity in this case. Better performance can be achieved with a hash table: std::unordered_map is hash-table based and offers average O(1) lookup, so the whole check is O(n) on average. I experimented with all three containers and found std::unordered_map to be faster than std::set and std::map. Results and code are shared below; the images were snapshots of the performance measured by LeetCode for each solution.
#include <unordered_map>
#include <vector>

bool hasDuplicate(const std::vector<int>& nums) {
    std::unordered_map<int, int> tbl;   // hash table: average O(1) find/insert
    //std::set<int> tbl;                // tree-based alternative: O(log n) per operation
    for (int x : nums) {
        if (tbl.find(x) != tbl.end())
            return true;                // value seen before: duplicate
        tbl[x] = 1;
        //tbl.insert(x);
    }
    return false;
}
[Image: LeetCode performance of the unordered_map version (run time was 52 ms)]
[Image: LeetCode performance of the set/map versions]
You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.
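A minimal sketch of that idea, assuming the numbers are in a std::vector<int> (the function name hasDuplicateViaSet is made up for illustration). std::set::insert returns a pair whose second member is false when the value was already present, so the insert itself doubles as the membership check:

```cpp
#include <set>
#include <vector>

// Returns true as soon as a value is seen a second time.
// std::set::insert returns {iterator, bool}; the bool is false when the
// value was already in the set, i.e. when we have found a duplicate.
bool hasDuplicateViaSet(const std::vector<int>& numbers) {
    std::set<int> seen;
    for (int n : numbers) {
        if (!seen.insert(n).second) {
            return true;
        }
    }
    return false;
}
```

This is O(n log n) in the worst case and stops at the first duplicate; replacing std::set with std::unordered_set makes it O(n) on average.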
I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits in base 10 in O(n). The problem I see with the already suggested O(n) solution is that it requires the digits to be sorted first. This method is O(n) and does not require the set to be sorted. The nice thing is that checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)
/*
============================
Foo
============================
*
Takes a non-negative int. A table is created to store a counter
for each decimal digit. After all digits are counted, the function
returns false if any digit's counter is higher than 1, and true
otherwise. For example, with 48778584 the table is:
 0   1   2   3   4   5   6   7   8   9
[0] [0] [0] [0] [2] [1] [0] [2] [3] [0]
When we iterate over this table, we find that 4 is duplicated and
immediately return false.
*/
bool Foo(int number)
{
int temp = number;
int digitTable[10]={0};
while(temp > 0)
{
digitTable[temp % 10]++; // Last digit's respective index.
temp /= 10; // Move to next digit
}
for (int i=0; i < 10; i++)
{
if (digitTable [i] > 1)
{
return false;
}
}
return true;
}
It's OK, especially for small array lengths. I'd use more efficient approaches (fewer than n^2/2 comparisons) if the array is much bigger; see DeadMG's answer.
Some small corrections for your code:
Instead of int j = i, write int j = i + 1; then you can omit your if(j != i) test.
You shouldn't need to declare the variable i outside the for statement.
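Applying both corrections, the pairwise check might look like the sketch below. It only detects duplicates (it does not prompt for re-entry), and the function name hasDuplicatePairwise and the pointer-plus-count interface are assumptions made for illustration. It performs n(n-1)/2 comparisons.

```cpp
#include <cstddef>

// Pairwise O(n^2) check: compare each element only with the ones after it,
// so the `if (j != i)` test is no longer needed.
bool hasDuplicatePairwise(const int* userNumbers, std::size_t count) {
    for (std::size_t i = 0; i < count; i++) {          // i is local to the loop
        for (std::size_t j = i + 1; j < count; j++) {  // start right after i
            if (userNumbers[i] == userNumbers[j]) {
                return true;
            }
        }
    }
    return false;
}
```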
I think @Michael Jaison G's solution is really brilliant; I modified his code a little to avoid sorting. (Using unordered_set may make the algorithm a little faster.)
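#include <cstddef>
#include <iterator>
#include <unordered_set>

// Duplicates collapse in the set, so it ends up smaller than the range.
template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::unordered_set<T> values(begin, end);
    std::size_t size = std::distance(begin, end);
    return size != values.size();
}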
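#include <algorithm>
#include <iterator>
// std::unique and std::unique_copy only collapse *adjacent* duplicates,
// so the container must be sorted first.
std::sort(cont.begin(), cont.end());
// Option 1: test whether cont has duplicates
// (std::unique moves elements, so cont's tail is unspecified afterwards).
bool hasDuplicates = std::unique(cont.begin(), cont.end()) != cont.end();
// Option 2 (instead of option 1): get a new container with no duplicates
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));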
#include<iostream>
#include<algorithm>
int main(){
int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
int len = sizeof(arr) / sizeof(*arr); // Finding length of array
std::sort(arr, arr+len);
int unique_elements = std::unique(arr, arr+len) - arr;
if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
else std::cout << "Duplicate number present in this array\n";
return 0;
}
As mentioned by @underscore_d, an elegant and efficient solution would be:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::vector<T> values(begin, end);
std::sort(values.begin(), values.end());
return (std::adjacent_find(values.begin(), values.end()) != values.end());
}
int main() {
int user_ids[6];
// ...
std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}
A fast solution with O(N) average time and O(N) space, which returns as soon as it hits a duplicate:
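#include <algorithm>
#include <unordered_set>
#include <vector>
template <typename T>
bool containsDuplicate(const std::vector<T>& items) {
    // insert().second is false for a repeated value, which stops any_of early
    return std::any_of(items.begin(), items.end(),
                       [s = std::unordered_set<T>{}](const T& item) mutable {
                           return !s.insert(item).second;
                       });
}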
Not enough karma to post a comment. Hence a post.
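#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> numArray = {1, 2, 1, 4, 5};
    std::unordered_map<int, bool> hasDuplicate; // operator[] default-inserts false
    bool flag = false;
    for (auto i : numArray) {
        if (hasDuplicate[i]) {   // already seen
            flag = true;
            break;
        }
        hasDuplicate[i] = true;
    }
    std::cout << (flag ? "Duplicate" : "No duplicate") << '\n';
}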
Simplified question with a working example: I want to reuse a std::unordered_map (let's call it umap) multiple times, similar to the following dummy code (which does not do anything meaningful). How can I make this code run faster?
#include <cstdio>
#include <iostream>
#include <unordered_map>
#include <time.h>
unsigned size = 1000000;
void foo(){
std::unordered_map<int, double> umap;
umap.reserve(size);
for (unsigned i = 0; i < size; i++) {
// in my real program: umap gets filled with meaningful data here
umap.emplace(i, i * 0.1);
}
// ... some code here which does something meaningful with umap
}
int main() {
clock_t t = clock();
for(int i = 0; i < 50; i++){
foo();
}
t = clock() - t;
printf ("%f s\n",((float)t)/CLOCKS_PER_SEC);
return 0;
}
In my original code, I want to store matrix entries in umap. In each call to foo, the key values start from 0 up to N, and N can be different in each call to foo, but there is an upper limit of 10M for indices. Also, values can be different (contrary to the dummy code here which is always i*0.1).
I tried to make umap a non-local variable to avoid the repeated memory allocation of umap.reserve() in each call. This requires calling umap.clear() at the end of foo, but that turned out to be slower than using a local variable (I measured it).
I don't think there is any good way to accomplish what you're looking for directly, i.e. you can't clear the map without clearing the map. I suppose you could allocate a number of maps up front and use each one only once as a "disposable map", moving on to the next map in your next call, but I doubt this would give you any overall speedup: at the end you'd still have to clear all of them, and it would be very RAM-intensive and cache-unfriendly. (On modern CPUs, RAM access is very often the performance bottleneck, so minimizing the number of cache misses is the way to maximize efficiency.)
My suggestion would be that if clear-speed is so critical, you may need to move away from using unordered_map entirely, and instead use something simpler like a std::vector -- in that case you can simply keep a number-of-valid-items-in-the-vector integer, and "clearing" the vector is a matter of just setting the count back to zero. (Of course, that means you sacrifice unordered_map's quick-lookup properties, but perhaps you don't need them at this stage of your computation?)
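A minimal sketch of that "count of valid items" idea, assuming a known upper bound on the number of entries (as in the question's 10M limit). The class name ReusableBuffer is hypothetical. Storage is allocated once, and clear() only resets the count, so it is O(1):

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity buffer whose "clear" is O(1): only the count is reset;
// the underlying storage is allocated once and reused across calls.
class ReusableBuffer {
public:
    explicit ReusableBuffer(std::size_t capacity) : data_(capacity) {}

    // Append a value; the caller must not exceed the capacity.
    void push(double value) { data_[count_++] = value; }

    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return count_; }
    std::size_t capacity() const { return data_.size(); }

    // "Clearing" just forgets the old entries; no memory is freed or touched.
    void clear() { count_ = 0; }

private:
    std::vector<double> data_;
    std::size_t count_ = 0;
};
```

Lookup by key is lost here; entries are only reachable by position, which is the trade-off mentioned above.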
A simple and effective way is to reuse the same container and its memory again and again by passing it by reference, as follows.
With this method you avoid repeating the memory allocation of std::unordered_map::reserve and the deallocation in std::unordered_map::~unordered_map, both of which have complexity O(number of elements):
void foo(std::unordered_map<int, double>& umap)
{
std::size_t N = ...// set N here
for (int i = 0; i < N; ++i)
{
// overwrite umap[0], ..., umap[N-1]
// If umap does not have key=i, then it is inserted.
umap[i] = i*0.1;
}
// do something and not access to umap[N], ..., umap[size-1] !
}
The caller side would be as follows:
std::unordered_map<int,double> umap;
umap.reserve(size);
for(int i=0; i<50; ++i){
foo(umap);
}
But since your keys are always the consecutive integers {0, 1, ..., N-1}, I think a std::vector, which lets you avoid hash calculations, would be preferable for storing the values umap[0], ..., umap[N-1]:
void foo(std::vector<double>& vec)
{
int N = ...// set N here
for(int i = 0; i<N; ++i)
{
// overwrite vec[0], ..., vec[N-1]
vec[i] = i*0.1;
}
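// do something, but don't access vec[N], ..., vec[size-1]!
// vec must already hold at least N elements, e.g. created once by the caller as std::vector<double> vec(size)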
}
Have you tried to avoid all memory allocation by using a simple array? You've said above that you know the maximum size of umap over all calls to foo():
#include <cstdio>
#include <time.h>
constexpr int size = 1000000;
double af[size];
void foo(int N) {
// assert(N<=size);
for (int i = 0; i < N; i++) {
af[i] = i;
}
// ... af
}
int main() {
clock_t t = clock();
for(int i = 0; i < 50; i++){
foo(size /* or some other N<=size */);
}
t = clock() - t;
printf ("%f s\n",((float)t)/CLOCKS_PER_SEC);
return 0;
}
As I suggested in the comments, closed hashing (open addressing) would be better for your use case. Here's a quick-and-dirty closed hash map with a fixed hash table size that you could experiment with:
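#include <array>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Fixed-size open-addressing map with linear probing. hashtable[h] holds
// (index into data) + 1, so 0 marks an empty slot. The table must have more
// slots than keys ever inserted, otherwise insert() and find() can loop forever.
// With the default size and 4-byte int the array is about 4 MB, so allocate
// the map statically or on the heap rather than on the stack.
template<class Key, class T, std::size_t size = 1000003, class Hash = std::hash<Key>>
class closed_hash_map {
public:
    typedef std::pair<const Key, T> value_type;
    typedef typename std::vector<value_type>::iterator iterator;
private:
    std::array<int, size> hashtable{};   // value-initialized: all slots start empty
    std::vector<value_type> data;         // entries in insertion order
public:
    iterator begin() { return data.begin(); }
    iterator end() { return data.end(); }
    iterator find(const Key &k) {
        std::size_t h = Hash()(k) % size;
        while (hashtable[h]) {
            if (data[hashtable[h]-1].first == k)
                return data.begin() + (hashtable[h] - 1);
            if (++h == size) h = 0; }   // probe the next slot, wrapping around
        return data.end(); }
    std::pair<iterator, bool> insert(const value_type& obj) {
        std::size_t h = Hash()(obj.first) % size;
        while (hashtable[h]) {
            if (data[hashtable[h]-1].first == obj.first)
                return std::make_pair(data.begin() + (hashtable[h] - 1), false);
            if (++h == size) h = 0; }
        data.emplace_back(obj);
        hashtable[h] = static_cast<int>(data.size());
        return std::make_pair(data.end() - 1, true); }
    void clear() {
        data.clear();
        hashtable.fill(0); }   // O(size), independent of how many keys were stored
};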
It can be made more flexible by dynamically resizing the hash table on demand, and more efficient by using Robin Hood hashing.
Suppose I have two vectors:
std::vector<int>vec_int = {4,3,2,1,5};
std::vector<Obj*>vec_obj = {obj1,obj2,obj3,obj4,obj5};
How can I sort vec_obj according to the sorted order of vec_int?
So the goal may look like this:
std::vector<int>vec_int = {1,2,3,4,5};
std::vector<Obj*>vec_obj = {obj4,obj3,obj2,obj1,obj5};
I've been trying to create a new vector:
for (int i = 0; i < vec_int.size(); i++) {
new_vec.push_back(vec_obj[vec_int[i]]);
}
But I don't think this is the correct solution. How can I do this? Thanks.
The standard library may offer the best solution, but I can't figure out how to use std::sort for this.
You don't have to call std::sort; what you need can be done in linear time (provided the values in vec_int are from 1 to N and not repeating). The object paired with value v belongs at position v - 1:
std::vector<Obj*> new_vec(vec_obj.size());
for (size_t i = 0; i < vec_int.size(); ++i) {
    new_vec[vec_int[i] - 1] = vec_obj[i];
}
But of course for this solution you need the additional new_vec vector.
If the indices are arbitrary and/or you don't want to allocate another vector, you have to use a different data structure:
typedef pair<int, Obj*> Item;
vector<Item> vec = {{4, obj1}, {3, obj2}, {2, obj3}, {1, obj4}, {5, obj5}};
std::sort(vec.begin(), vec.end(), [](const Item& l, const Item& r) -> bool {return l.first < r.first;});
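A self-contained version of the pair-based approach, for illustration only: it assumes a placeholder Obj type with a name member (not from the question) and a hypothetical helper sortTogether that copies the sorted pairs back into the two original vectors. std::sort is not stable, so if two keys could be equal, std::stable_sort would keep their original relative order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Obj { std::string name; };   // placeholder type for illustration

// Sorts vec_int ascending and applies the same reordering to vec_obj.
// Assumes both vectors have the same size.
void sortTogether(std::vector<int>& vec_int, std::vector<Obj*>& vec_obj) {
    typedef std::pair<int, Obj*> Item;
    std::vector<Item> vec;
    for (std::size_t i = 0; i < vec_int.size(); ++i)
        vec.emplace_back(vec_int[i], vec_obj[i]);
    std::sort(vec.begin(), vec.end(), [](const Item& l, const Item& r) {
        return l.first < r.first;   // compare by key only
    });
    for (std::size_t i = 0; i < vec.size(); ++i) {
        vec_int[i] = vec[i].first;
        vec_obj[i] = vec[i].second;
    }
}
```

With the question's input {4,3,2,1,5} and obj1..obj5, this yields {1,2,3,4,5} and {obj4,obj3,obj2,obj1,obj5}.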
Maybe there is a better solution, but personally I would use the fact that items in a std::map are automatically sorted by key. This gives the following possibility (untested!)
// The vectors have to be the same size for this to work!
if( vec_int.size() != vec_obj.size() ) { return 0; }
std::vector<int>::const_iterator intIt = vec_int.cbegin();
std::vector<Obj*>::const_iterator objIt = vec_obj.cbegin();
// Create a temporary map
std::map< int, Obj* > sorted_objects;
for(; intIt != vec_int.cend(); ++intIt, ++objIt )
{
sorted_objects[ *intIt ] = *objIt;
}
// Iterating through map will be in order of key
// so this adds the items to the vector in the desired order.
std::vector<Obj*> vec_obj_sorted;
for( std::map< int, Obj* >::const_iterator sortedIt = sorted_objects.cbegin();
sortedIt != sorted_objects.cend(); ++sortedIt )
{
vec_obj_sorted.push_back( sortedIt->second );
}
[Not sure this fits your use case, but putting the elements into a map stores them sorted by key by default.]
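Coming to your precise solution: if creating the new vector is the issue, you can avoid it with a simple swap trick (similar to selection sort):
// Move the element at i to its final position, swapping the element found
// there back to i, until position i holds the right element.
// Assumes vec_int holds the values 1..N exactly once; vec_int ends up sorted too.
for (std::size_t i = 0; i < vec_int.size(); i++) {
    while (vec_int[i] != static_cast<int>(i) + 1) {
        std::size_t target = vec_int[i] - 1;
        std::swap(vec_obj[i], vec_obj[target]);
        std::swap(vec_int[i], vec_int[target]);
    }
}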
The generic form of this is known as "reorder according to", which is a variation of cycle sort. Unlike your example, the index vector needs to have the values 0 through size-1, instead of {4,3,2,1,5} it would need to be {3,2,1,0,4} (or else you have to adjust the example code below). The reordering is done by rotating groups of elements according to the "cycles" in the index vector or array. (In my adjusted example there are 3 "cycles", 1st cycle: index[0] = 3, index[3] = 0. 2nd cycle: index[1] = 2, index[2] = 1. 3rd cycle index[4] = 4). The index vector or array is also sorted in the process. A copy of the original index vector or array can be saved if you want to keep the original index vector or array. Example code for reordering vA according to vI in template form:
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vI)
{
size_t i, j, k;
T t;
for(i = 0; i < vA.size(); i++){
if(i != vI[i]){
t = vA[i];
k = i;
while(i != (j = vI[k])){
// every move places a value in its final location
vA[k] = vA[j];
vI[k] = k;
k = j;
}
vA[k] = t;
vI[k] = k;
}
}
}
Simpler still would be to copy vA to another vector vB according to vI:
for(i = 0; i < vA.size(); i++){
    vB[i] = vA[vI[i]]; // vB must already have vA.size() elements
}
I have two vectors: a vector of values and an index vector. How can I rearrange the values according to the index vector? For example:
Indexes                5 0 2 1 3 4
Values                 a b c d e f
Values after operation b d c e f a
The indexes vector will always contain the range [0, n) and each index only once.
I need this operation to be done in place because the code is going to be run on a device with low memory.
How can I do this in C++? I can use C++11.
Since you know that your index array is a permutation of [0, N), you can do this in linear time and in-place (plus one temporary) by working cycle-by-cycle. Something like this:
size_t indices[N];
data_t values[N];
for (size_t pos = 0; pos < N; ++pos) // \
{ // } this loops _over_ cycles
if (indices[pos] == pos) continue; // /
size_t i = pos;
const data_t tmp = values[pos];
while (true) // --> this loops _through_ one cycle
{
const size_t next = indices[i];
indices[i] = i;
values[i] = values[next];
if (next == pos) break;
i = next;
}
values[i] = tmp;
}
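This implementation has the advantage over swapping at every step that the temporary variable is only used once per cycle. Note the direction, though: the loop sets values[i] to the old values[indices[i]], so with the question's example it produces f a c b d e rather than b d c e f a. For the question's convention (indices[i] is the destination of values[i]), it has to be given the inverse permutation.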
If the data type is move-only, this still works if all the assignments are surrounded by std::move().
std::vector<int> indices = { 5, 0, 2, 1, 3, 4};
std::vector<char> values = {'a', 'b', 'c', 'd', 'e', 'f'};
for(size_t n = 0; n < indices.size(); ++n)
{
while(indices[n] != n)
{
std::swap(values[n], values[indices[n]]);
std::swap(indices[n], indices[indices[n]]);
}
}
EDIT:
I think this should be O(n), anyone disagree?
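A self-contained version of the swap loop above, wrapped in a hypothetical function template reorderByIndices and using std::size_t indices to avoid the signed/unsigned comparison. On the O(n) question: every swap puts at least one value into its final position, and a position whose index already equals its own position can never be the target of a later swap (the index vector is a permutation), so there are at most n - 1 swaps in total.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// In-place reorder: values[i] is moved to position indices[i].
// indices must be a permutation of [0, n); it ends up as 0, 1, ..., n-1.
template <class T>
void reorderByIndices(std::vector<T>& values, std::vector<std::size_t>& indices) {
    for (std::size_t n = 0; n < indices.size(); ++n) {
        while (indices[n] != n) {
            const std::size_t target = indices[n];
            std::swap(values[n], values[target]);    // values[target] is now final
            std::swap(indices[n], indices[target]);  // now indices[target] == target
        }
    }
}
```

On the question's example ({5,0,2,1,3,4} and a..f) this produces b d c e f a.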
for(size_t i=0;i<indexes.size();++i)
    for(size_t j=i+1;j<indexes.size();++j)
        if(indexes[i] > indexes[j]) {
            swap(indexes[i],indexes[j]);
            swap(values[i],values[j]);
        }
It's O(N²), but should work fine for a small number of values.
You can also pass a comparison function to std::sort from the C++ standard library if you want O(N log N).
You can just sort the vector, your comparison operation should compare the indices. Of course, when moving the data around, you have to move the indices, too.
At the end, your indices will be just 0, 1, ... (n-1), and the data will be at the corresponding places.
As implementation note: you can store the values and indices together in a structure:
struct SortEntry
{
Data value;
size_t index;
};
and define the comparison operator to look only at indices:
bool operator< (const SortEntry& lhs, const SortEntry& rhs)
{
return lhs.index < rhs.index;
}
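A small sketch of that approach, assuming char values and the question's example (the helper name reorderBySorting is hypothetical). It copies the values into a temporary vector of SortEntry, so it needs O(n) extra memory and is not in place, which matters for the question's low-memory constraint; it runs in O(n log n).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct SortEntry
{
    char value;          // "Data" in the answer above; char for this example
    std::size_t index;   // destination position of value
};

// Compare only the indices, as described above.
bool operator< (const SortEntry& lhs, const SortEntry& rhs)
{
    return lhs.index < rhs.index;
}

std::vector<char> reorderBySorting(const std::vector<char>& values,
                                   const std::vector<std::size_t>& indices)
{
    std::vector<SortEntry> entries;
    for (std::size_t i = 0; i < values.size(); ++i)
        entries.push_back({values[i], indices[i]});
    std::sort(entries.begin(), entries.end());   // uses operator< above
    std::vector<char> result;
    for (const SortEntry& e : entries)
        result.push_back(e.value);
    return result;
}
```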
This solution runs in O(n) time:
int tmp;
for(int i = 0; i < n; i++)
while(indexes[i] != i){
swap(values[i], values[indexes[i]]);
tmp = indexes[i];
swap(indexes[i], indexes[tmp]);
}
This runs in O(n) time, although it writes into a separate result array rather than working in place. Check it on ideone.
#include <cstdio>

int main(int argc, char *argv[])
{
    int indexes[6]={2,3,5,1,0,4};
    char values[6]={'a','b','c','d','e','f'};
    const int n = sizeof(indexes)/sizeof(indexes[0]); // number of elements
    char result[n]; // array of the same size as indexes/values
    int a,i;
    for( i=0;i<n;i++)
    {
        a=indexes[i]; //saving the index value at i of array indexes
        result[a]=values[i]; //saving the result in result array
    }
    for ( i=0;i<n;i++)
        printf("%c",result[i]); //printing the result
    return 0;
}
What's the best way to shuffle a certain percentage of the elements in a vector?
Say I want 10% or 90% of the vector shuffled.
Not necessarily the first 10% but just 10% across the board.
TIA
Modify a Fisher-Yates shuffle to do nothing on 10% of the indices in the array.
This is Java code that I'm posting (from Wikipedia) and modifying, but I think you can translate it to C++, because this is more of an algorithms problem than a language problem.
public static void shuffleNinetyPercent(int[] array)
{
Random rng = new Random(); // java.util.Random.
int n = array.length; // The number of items left to shuffle (loop invariant).
while (n > 1)
{
n--; // n is now the last pertinent index
if (rng.nextDouble() < 0.1) continue; //<-- ADD THIS LINE
int k = rng.nextInt(n + 1); // 0 <= k <= n.
// Simple swap of variables
int tmp = array[k];
array[k] = array[n];
array[n] = tmp;
}
}
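A possible C++ translation of the Java code, using <random> in place of java.util.Random. The skip probability is made a parameter (an assumption; the Java hard-codes 0.1), the function name shufflePartially is hypothetical, and the generator is passed in so results can be reproduced with a fixed seed. As in the Java, each step is skipped independently, so the share of positions that take part is only approximately 1 - skipProbability.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates shuffle that skips each step with probability skipProbability
// (0.1 in the Java version above, so roughly 90% of the steps swap).
template <class T>
void shufflePartially(std::vector<T>& array, double skipProbability, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::size_t n = array.size();                    // items left to shuffle
    while (n > 1) {
        n--;                                         // n is now the last pertinent index
        if (coin(rng) < skipProbability) continue;   // skip the swap for this index
        std::uniform_int_distribution<std::size_t> pick(0, n);  // 0 <= k <= n
        std::size_t k = pick(rng);
        std::swap(array[k], array[n]);
    }
}
```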
You could try this:
Assign a random number to each element of the vector. Shuffle the elements whose random number is in the smallest 10% of the random numbers you assigned: You could even imagine replacing that 10% in the vector with placeholders, then sort your 10% according to their random number, and insert them back into the vector where your placeholders are.
How about writing your own random iterator and using random_shuffle, something like this: (Completely untested, just to get an idea)
template<class T>
class myRandomIterator : public std::iterator<std::random_access_iterator_tag, T>
{
public:
myRandomIterator(std::vector<T>& vec, size_t pos = 0): myVec(vec), myIndex(0), myPos(pos)
{
srand(time(NULL));
}
bool operator==(const myRandomIterator& rhs) const
{
return myPos == rhs.myPos;
}
bool operator!=(const myRandomIterator& rhs) const
{
return ! (myPos == rhs.myPos);
}
bool operator<(const myRandomIterator& rhs) const
{
return myPos < rhs.myPos;
}
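myRandomIterator& operator++()
{
++myPos;
return fill();
}
myRandomIterator operator++(int) // postfix: return the old iterator by value
{
myRandomIterator old(*this);
++myPos;
fill();
return old;
}
myRandomIterator& operator--()
{
--myPos;
return fill();
}
myRandomIterator operator--(int)
{
myRandomIterator old(*this);
--myPos;
fill();
return old;
}
myRandomIterator operator+(size_t n) const // advance by n, not by 1
{
myRandomIterator tmp(*this);
tmp.myPos += n;
return tmp.fill();
}
myRandomIterator operator-(size_t n) const
{
myRandomIterator tmp(*this);
tmp.myPos -= n;
return tmp.fill();
}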
const T& operator*() const
{
return myVec[myIndex];
}
T& operator*()
{
return myVec[myIndex];
}
private:
myRandomIterator& fill()
{
myIndex = rand() % myVec.size();
return *this;
}
private:
size_t myIndex;
std::vector<T>& myVec;
size_t myPos;
};
int main()
{
std::vector<int> a;
for(int i = 0; i < 100; ++i)
{
a.push_back(i);
}
myRandomIterator<int> begin(a);
myRandomIterator<int> end(a, a.size() * 0.4);
std::random_shuffle(begin, end);
return 0;
}
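One way may be to use std::random_shuffle() and control the percentage by controlling the input range. (std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle is its replacement.)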
Why not perform N swaps of randomly selected positions, where N is determined by the percentage?
So if I have 100 elements, a 10% shuffle will perform 10 swaps. Each swap randomly picks two elements in the array and switches them.
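A minimal sketch of the swap idea, assuming the percentage is given as a fraction between 0 and 1 (the function name swapShuffle is hypothetical). Note that this is not a uniform shuffle of a chosen subset: a swap may pick the same position twice, each swap moves up to two elements, and different swaps can touch the same element, so the number of elements that actually move varies.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Performs (size * fraction, rounded to nearest) swaps of two uniformly chosen positions.
template <class T>
void swapShuffle(std::vector<T>& data, double fraction, std::mt19937& rng) {
    if (data.size() < 2) return;
    const std::size_t swaps = static_cast<std::size_t>(data.size() * fraction + 0.5);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    for (std::size_t s = 0; s < swaps; ++s) {
        std::size_t a = pick(rng);
        std::size_t b = pick(rng);
        std::swap(data[a], data[b]);   // a == b leaves the vector unchanged
    }
}
```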
If you have SGI's std::random_sample extension, you can do this. If not, it's easy to implement random_sample on top of a function which returns uniformly-distributed random integers in a specified range (Knuth, Volume 2, "Algorithm R").
#include <algorithm>
#include <cassert>
#include <vector>
using std::vector;
void shuffle_fraction(vector<int> &data, double fraction) {
assert(fraction >= 0.0 && fraction <= 1.0);
// randomly choose the indices to be shuffled
vector<int> bag(data.size());
for(int i = 0; i < bag.size(); ++i) bag[i] = i;
vector<int> selected(static_cast<int>(data.size() * fraction));
std::random_sample(bag.begin(), bag.end(), selected.begin(), selected.end());
// take a copy of the values being shuffled
vector<int> old_value(selected.size());
for (int i = 0; i < selected.size(); ++i) {
old_value[i] = data[selected[i]];
}
// choose a new order for the selected indices
vector<int> shuffled(selected);
std::random_shuffle(shuffled.begin(), shuffled.end());
// apply the shuffle to the data: each of the selected indices
// is replaced by the value for the corresponding shuffled indices
for (int i = 0; i < selected.size(); ++i) {
data[selected[i]] = old_value[shuffled[i]];
}
}
Not the most efficient, since it uses three "small" vectors, but avoids having to adapt the Fisher-Yates algorithm to operate on a subset of the vector. In practice you'd probably want this to be a function template operating on a pair of random-access iterators rather than a vector. I haven't done that because I think it would obfuscate the code a little, and you didn't ask for it. I'd also take a size instead of a proportion, leaving it up to the caller to decide how to round fractions.
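For reference, std::random_sample and std::random_shuffle are not available in current standard C++ (random_sample was an SGI extension, and random_shuffle was removed in C++17). A sketch of the same three-step idea with standard C++17 facilities, std::sample to pick the indices and std::shuffle to permute their values, taking a count rather than a proportion as suggested above; the function name shuffle_subset is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Shuffles `count` randomly chosen elements of data among themselves;
// all other elements stay where they are. Requires C++17 (std::sample).
void shuffle_subset(std::vector<int>& data, std::size_t count, std::mt19937& rng) {
    // randomly choose the indices to be shuffled
    std::vector<std::size_t> bag(data.size());
    std::iota(bag.begin(), bag.end(), std::size_t{0});
    std::vector<std::size_t> selected;
    std::sample(bag.begin(), bag.end(), std::back_inserter(selected), count, rng);

    // take a copy of the selected values and choose a new order for them
    std::vector<int> values;
    for (std::size_t i : selected) values.push_back(data[i]);
    std::shuffle(values.begin(), values.end(), rng);

    // write the permuted values back to the selected positions
    for (std::size_t i = 0; i < selected.size(); ++i) data[selected[i]] = values[i];
}
```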
You can use the shuffle bag algorithm to select 10% of your array, then apply a normal shuffle to that selection.