How to shuffle list of pointers in C++? - c++

How do I shuffle an STL list of pointers in C++?
I have an STL vector of pointers to class Player, and I shuffle it like this:
std::random_shuffle(players.begin(), players.end());
Is there already an algorithm for shuffling a list that does not require random access, or do I need to convert the list to a vector, shuffle it, and convert it back to a list? Is there a more elegant solution?

The random shuffle algorithm swaps a specific element with a randomly chosen one. Doing that on a list would be very inefficient, because the list has to be traversed repeatedly to reach the elements (namely, it would be an O(n^2) operation).
That's why it is better (faster) to copy your list into an array once, shuffle the array, and copy the elements back into the list if needed. That amounts to three passes over n elements, which is still O(n).
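A minimal sketch of the copy, shuffle, copy-back approach, assuming C++11 or later. It uses std::shuffle with a caller-supplied std::mt19937 engine rather than std::random_shuffle, which was deprecated in C++14 and removed in C++17; the helper name shuffle_list is illustrative, not a library function.

```cpp
#include <algorithm>
#include <list>
#include <random>
#include <vector>

// Shuffles a std::list by copying its elements into a std::vector
// (which has random-access iterators), shuffling the vector, and
// writing the elements back into the list in order: O(n) overall.
// For a std::list<Player*> only the pointers are copied, never the
// Player objects themselves.
template <typename T>
void shuffle_list(std::list<T>& lst, std::mt19937& rng) {
    std::vector<T> tmp(lst.begin(), lst.end());      // pass 1: copy out
    std::shuffle(tmp.begin(), tmp.end(), rng);       // uniform random permutation
    std::copy(tmp.begin(), tmp.end(), lst.begin());  // pass 2: copy back in place
}
```

Seed the engine once, for example with std::mt19937 rng(std::random_device{}());, and reuse it across calls.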

std::random_shuffle needs a random-access iterator. std::vector provides one, whereas std::list does not. How about std::deque? It's somewhat like a vector and somewhat like a list.
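A sketch of the deque variant, assuming C++11: std::deque has random-access iterators, so std::shuffle works on it directly without any copying. The Player struct here is a hypothetical stand-in for the question's class.

```cpp
#include <algorithm>
#include <deque>
#include <random>

struct Player { int id; };  // hypothetical stand-in for the question's class

// std::deque iterators are random-access, so std::shuffle accepts them.
void shuffle_players(std::deque<Player*>& players, std::mt19937& rng) {
    std::shuffle(players.begin(), players.end(), rng);
}
```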

Your problem was interesting. So, I tried writing something and finally came up with this.
//---------- sample List initialization ------
list<string> lst;
lst.push_back("A");
lst.push_back("B");
....
lst.push_back("Y");
lst.push_back("Z");
#define LIST_SIZE 26
//--------------------------------------------
//------------- Shuffle Algorithm ------------
unordered_multimap<int,string> mymap;
int HashKeys[LIST_SIZE];
srand((int)time(NULL) * (int)clock());
for(int i = 0; i<LIST_SIZE; i++) // loop 'n' times
{
HashKeys[i] = rand(); // O(1) operation
}
for(int i = 0;lst.size() > 0; i++) // loop 'n' times
{
// insert is O(1) on average, O(n) in the worst case (many equal keys)
mymap.insert(std::make_pair(HashKeys[rand() % LIST_SIZE], lst.front()));
lst.pop_front(); // O(1) operation
}
unordered_multimap<int,string>::iterator it;
for(int i = 0; i < LIST_SIZE ;i++) // loop 'n' times
{
while(mymap.count(HashKeys[i]) > 0) // count() is linear in the number of matching elements
{
it = mymap.find(HashKeys[i]); // O(1) on average
// ...USAGE...
cout << it->second << endl;
lst.push_back(it->second);
//............
mymap.erase(it); // O(1) on average
}
}
//-------------------------------------------------
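The time complexity is O(n^2) if many values share the same key in the hash map, because each count() call is linear in the number of matching values.
Otherwise the time complexity is O(n).
So everything depends on how well rand() % LIST_SIZE spreads the elements over the keys. Note also that elements sharing a key come out in an unspecified order, so the result is not guaranteed to be a uniform shuffle.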

Related

Most efficient algorithm for Two-sum problem (involving indices)

The problem statement: given an array and a sum T, find all pairs of indices of elements in the array that add up to T. Additional requirements/constraints:
Indexing starts from 0.
Each pair must be displayed with the lower index first (e.g. 24, 30 instead of 30, 24).
The pairs must be displayed in ascending order (e.g. if we find (1,3), (0,2) and (5,8), the output must be (0,2) (1,3) (5,8)).
There can be duplicate elements in the array, which also have to be considered.
Here's my code in C++; I used the hash-table approach with unordered_set:
void Twosum(vector <int> res, int T){
int temp; int ti = -1;
unordered_set<int> s;
vector <int> res2 = res; //Just a copy of the input vector
vector <tuple<int, int>> indices; //Result to be output
for (int i = 0; i < (int)res.size(); i++){
temp = T - res[i];
if (s.find(temp) != s.end()){
while(ti < (int)res.size()){ //While loop for finding all the instances of temp in the array,
//not part of the original hash-table algorithm, something I added
ti = find(res2.begin(), res2.end(), temp) - res2.begin();
//Here find() takes O(n) time which is an issue
res2[ti] = lim; //To remove that instance of temp so that new instances
//can be found in the while loop, here lim = 10^9
if(i <= ti) indices.push_back(make_tuple(i, ti));
else indices.push_back(make_tuple(ti, i));
}
}
s.insert(res[i]);
}
if(ti == -1)
{cout<<"-1 -1"; //if no indices were found
return;}
sort(indices.begin(), indices.end()); //sorting since unordered_set stores elements randomly
for(int i=0; i<(int)indices.size(); i++)
cout<<get<0>(indices[i])<<" "<<get<1>(indices[i])<<endl;
}
This has multiple issues:
Firstly, that while loop doesn't work as intended; instead it fails with a SIGABRT error (free(): invalid pointer). The index ti also somehow goes beyond the vector bounds, even though I have that check in the while loop.
Secondly, find() runs in O(n) time, which increases the overall complexity to O(n^2) and makes my program time out. However, that function is required since we have to output indices.
Lastly, this unordered_set implementation doesn't seem to work when there are many duplicate elements in the array (since sets only store unique elements), which is one of the main constraints of the problem. This makes me think we need some sort of hash function or hash map to deal with the duplicates? I'm not sure...
All the algorithms I've found for this on the internet deal with printing the elements rather than the indices, so I've had no luck with this problem.
If any of you know an optimal algorithm for this that satisfies the constraints and runs in O(n) time, your help would be highly appreciated. Thank you in advance.
Here is pseudocode answering your question, using hash tables (or maps) and sets. I leave it to you to translate this to C++ using suitable data structures (in this case, classic hash maps and sets will do the job well).
Notations: we will denote A the array, n its length, and T the "sum".
// first we build a map element -> {set of indices corresponding to this element}
Let M be an empty map; // or hash map, or hash table, or dictionary
for i from 0 to n-1 do {
Let e = A[i];
if e is not a key of M then {
M[e] = new_set()
}
M[e].add(i)
}
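// Now we iterate over the elements; the condition e <= T-e makes sure
// each pair of values {e, T-e} is visited only once
for each key e of M do {
if T-e is a key of M and e <= T-e then {
display_combinations(M[e], M[T-e]);
}
}
// The helper function display_combinations
// (when e = T-e, set1 and set2 are the same set, so only e1 < e2 is kept
// to avoid displaying every pair twice)
function display_combinations(set1, set2) {
for each element e1 of set1 do {
for each element e2 of set2 do {
if e1 < e2 then {
display "(e1, e2)";
} else if e1 > e2 and set1 is not set2 then {
display "(e2, e1)";
}
}
}
}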
As said in the comments, the complexity in the worst case of this algorithm is in O(n²). A way to see that we cannot go below this complexity is that the size of the output may be in O(n²), in the case where all elements of the array have the value T/2.
Edit: this pseudocode does not output the pairs in the required order. Just store them in an array of pairs and sort this array before displaying it. Similarly, I did not treat the case where a pair (i, i) may satisfy the requirement. You may have to consider it (just change e1 < e2 to e1 <= e2 in the last loop).
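One possible C++ translation of this pseudocode, as a sketch rather than a reference solution. Assumptions: M is a std::unordered_map from value to the vector of indices where that value occurs, pairs (i, i) are excluded, each value pair {e, T-e} is handled once (only when e <= T-e), and the pairs are collected and sorted as the Edit suggests. The name two_sum_indices is illustrative.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns every index pair (i, j) with i < j and a[i] + a[j] == T,
// sorted ascending. Worst case O(n^2), since the output itself can hold
// n(n-1)/2 pairs (e.g. when every element equals T/2).
std::vector<std::pair<int, int>> two_sum_indices(const std::vector<int>& a, long long T) {
    // M: value -> indices where that value occurs
    std::unordered_map<long long, std::vector<int>> M;
    for (int i = 0; i < static_cast<int>(a.size()); ++i)
        M[a[i]].push_back(i);

    std::vector<std::pair<int, int>> out;
    for (const auto& kv : M) {
        const long long e = kv.first;
        const long long other = T - e;
        if (other < e) continue;                 // handle {e, T-e} only once
        auto it = M.find(other);
        if (it == M.end()) continue;
        for (int i : kv.second)
            for (int j : it->second)
                // when other == e both index lists are the same: keep i < j only
                if (i < j || (other != e && i > j))
                    out.emplace_back(std::min(i, j), std::max(i, j));
    }
    std::sort(out.begin(), out.end());           // required output order
    return out;
}
```

For example, two_sum_indices({1, 5, 3, 3, 4, 2}, 6) yields (0,1) (2,3) (4,5).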

Find uncommon elements using hashing

I think this is a fairly common question but I didn't find any answer for this using hashing in C++.
I have two arrays, both of the same length, which contain some elements, for example:
A={5,3,5,4,2}
B={3,4,1,2,1}
Here, the uncommon elements are: {5,5,1,1}
I have tried this approach: iterating a while loop over both arrays after sorting them:
while(i<n && j<n) {
if(a[i]<b[j])
uncommon[k++]=a[i++];
else if (a[i] > b[j])
uncommon[k++]=b[j++];
else {
i++;
j++;
}
}
while(i<n && a[i]!=b[j-1])
uncommon[k++]=a[i++];
while(j < n && b[j]!=a[i-1])
uncommon[k++]=b[j++];
With this I get the correct answer. However, I want an approach with better time complexity, since sorting both arrays every time might be computationally expensive.
I tried to do hashing but couldn't figure it out entirely.
To insert elements from arr1[]:
set<int> uncommon;
for (int i=0;i<n1;i++)
uncommon.insert(arr1[i]);
To compare arr2[] elements:
for (int i = 0; i < n2; i++)
if (uncommon.find(arr2[i]) != uncommon.end())
Now, what I can't figure out is how to put into the uncommon array[] only the elements that are not common to both arrays.
Thank you!
First of all, std::set has nothing to do with hashing. Sets and maps are ordered containers. Implementations may differ, but most likely it is a binary search tree. Whatever you do, you won't get faster than O(n log n) with them, which is the same complexity as sorting.
If you're fine with O(n log n) and sorting, I'd strongly advise just using the set_symmetric_difference algorithm https://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference; it requires two sorted ranges.
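For example, a sketch assuming the inputs are std::vector<int> (the name uncommon_sorted is illustrative). Note that std::set_symmetric_difference works on multisets too: a value occurring m times in one range and n times in the other is copied |m - n| times, so the duplicates in the question are handled.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sort copies of both arrays, then let std::set_symmetric_difference
// walk the two sorted ranges once. O(n log n) because of the sorts.
std::vector<int> uncommon_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(out));
    return out;  // sorted ascending
}
```

With the arrays from the question, uncommon_sorted({5,3,5,4,2}, {3,4,1,2,1}) returns 1 1 5 5.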
But if you insist on an implementation relying on hashing, you should use std::unordered_set or std::unordered_map. This way you can be faster than O(n log n): you can get your answer in O(n + m) average time, where n = a.size() and m = b.size(). Create two unordered_sets, hashed_a and hashed_b, and in two loops check which elements of hashed_a are not in hashed_b, and which elements of hashed_b are not in hashed_a. Here is some pseudocode:
create hashed_a and hashed_b
create set_result // for the result
for (a_v : hashed_a)
if (a_v not in hashed_b)
set_result.insert(a_v)
for (b_v : hashed_b)
if (b_v not in hashed_a)
set_result.insert(b_v)
return set_result // it holds the symmetric difference, which you need
UPDATE: as noted in the comments, my answer doesn't account for duplicates. The easiest way to modify it for duplicates would be to use unordered_map<int, int>, with the elements as keys and their number of occurrences as values.
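A sketch of that duplicate-aware hashing variant, assuming int elements; the function names are illustrative. It counts occurrences with std::unordered_map<int, int> and lets each element of one array cancel at most one equal element of the other, which gives O(n + m) average time and keeps the original order of the uncommon elements.

```cpp
#include <unordered_map>
#include <vector>

// Multiset difference a \ b in O(|a| + |b|) average time:
// each element of b can "cancel" at most one equal element of a.
static std::vector<int> multiset_minus(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_map<int, int> remaining;   // value -> unmatched count in b
    for (int x : b) ++remaining[x];
    std::vector<int> out;
    for (int x : a) {
        auto it = remaining.find(x);
        if (it != remaining.end() && it->second > 0)
            --it->second;                      // matched with an element of b
        else
            out.push_back(x);                  // no partner left: uncommon
    }
    return out;
}

// Uncommon elements: (a \ b) followed by (b \ a), original order kept.
std::vector<int> uncommon_hashed(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out = multiset_minus(a, b);
    std::vector<int> rest = multiset_minus(b, a);
    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}
```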
First, you need a way to distinguish between equal values within the same array (e.g. the two 5s in the first array and the two 1s in the second array). This is the key to reducing the overall complexity; otherwise you can't do better than O(n log n). A possible approach is to create a wrapper object that holds each actual value, and to put pointers to those wrapper objects in your arrays, so that the pointer addresses serve as unique identifiers. This wrapping costs just O(n1+n2) operations, but also an additional O(n1+n2) space.
Now both arrays contain only elements that are unique within each array, and you want to find the uncommon elements, i.e. (union of both arrays' elements) - (intersection of both arrays' elements). Therefore, all you need to do is push all the elements of the first array into a hash map (complexity O(n1)), and then push all the elements of the second array into the same hash map (complexity O(n2)) while detecting collisions (equality of an element from the first array with an element from the second array). This comparison step requires O(n2) comparisons in the worst case. For maximum performance you could check the sizes of the arrays before pushing elements into the hash map, and swap the arrays so that the first push happens with the longer array. The overall complexity is O(n1+n2) pushes (hashings) and O(n2) comparisons.
The implementation is the boring part, so I leave it to you ;)
A solution without sorting (and without hashing, but you seem to care more about the complexity than about hashing itself) is based on the following observation: an uncommon element e is an element that is in exactly one of the two multisets.
This means that the multiset of all uncommon elements is the union of two multisets:
S1 = the elements of A that are not in B
S2 = the elements of B that are not in A
Using std::set_difference, you get:
#include <set>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
int main() {
std::multiset<int> ms1{5,3,5,4,2};
std::multiset<int> ms2{3,4,1,2,1};
std::vector<int> v;
std::set_difference( ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), std::back_inserter(v));
std::set_difference( ms2.begin(), ms2.end(), ms1.begin(), ms1.end(), std::back_inserter(v));
for(int e : v)
std::cout << e << ' ';
return 0;
}
Output:
5 5 1 1
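The two std::set_difference calls perform at most 2·(2·(N1+N2) - 1) comparisons in total, where N1 and N2 are the sizes of the multisets, i.e. linear time; building the multisets themselves costs O((N1+N2) log(N1+N2)).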
Links:
set_difference: https://en.cppreference.com/w/cpp/algorithm/set_difference
compiler explorer: https://godbolt.org/z/o3KGbf
The question can be solved in O(n log n) time.
ALGORITHM
Sort both arrays with merge sort in O(n log n), or simply use std::sort, e.g. sort(array1.begin(), array1.end()).
Then use the two-pointer method to skip all elements common to both arrays.
Program for the above method
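size_t i = 0, j = 0;
while (i < array1.size() && j < array2.size()) {
// If not common, print smaller
if (array1[i] < array2[j]) {
cout << array1[i] << " ";
i++;
}
else if (array2[j] < array1[i]) {
cout << array2[j] << " ";
j++;
}
// Skip common element
else {
i++;
j++;
}
}
// Elements left over in either array have no partner, so they are uncommon too
while (i < array1.size()) cout << array1[i++] << " ";
while (j < array2.size()) cout << array2[j++] << " ";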
The two-pointer pass is O(array1.size() + array2.size()), i.e. O(2n) for two arrays of length n; including the initial sort, the total is O(n log n).
The above program prints the uncommon elements. If you want to store them, just create a vector and push them into it.

Find && Delete Duplicate vector elements in a for loop

I have these files in a vector
vector<string> myFileNames ={"TestFile1","TestFile2","copiedFile1","copiedFile2","copiedFile3"};
Only the unique files should be left in the list, i.e. TestFile1 and TestFile2.
vector<string> duplicateFilesFound;
To be filled with the files that are duplicates, i.e. copiedFile1, copiedFile2 and copiedFile3.
vector<string> myMd5Strings;
Filled with the MD5 hash values of the files.
string target = " ";
int count = 0;
for (unsigned int i = 0; i < myFileNames.size(); ++i)
{
target = myMd5Strings[i];
for (unsigned int k = 1; k < myFileNames.size(); ++k)
{
if (target == myMd5Strings[k])
{
myFileNames.erase(myFileNames.begin() + k);
duplicateFilesFound.push_back(myFileNames[k]);
}
}
cout << "Duplicate Count is : " << duplicateFilesFound.size() << endl;
}
One mistake (maybe there are others) is that you first get the MD5 to test, but then your inner k loop erroneously erases the first item matching the hash, which can be the target item itself. For example, if your target (the i value) is element 2, your loop doesn't check whether k == 2 so as to skip over this first occurrence.
The test should be:
if ( i != k && target == myMd5Strings[k])
The second issue is that if you erase an item, you need to keep k where it is instead of incrementing it. Otherwise you will skip over the next item and miss it if it matches the hash.
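Putting these two fixes into the question's loop gives something like the sketch below. Two further changes are assumptions on my part rather than part of this answer: the duplicate's name is recorded before it is erased (the original pushes myFileNames[k] after the erase, which records the next file instead), and the MD5 vector is erased in step so that the two vectors stay aligned. Starting k at i + 1 is an equivalent way of guaranteeing i != k. The function name is illustrative, and the approach is still O(n^2).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes files whose MD5 matches an earlier file, keeping the first
// occurrence. Assumes names[i] and md5s[i] describe the same file.
void remove_duplicate_files(std::vector<std::string>& names,
                            std::vector<std::string>& md5s,
                            std::vector<std::string>& duplicates) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        const std::string target = md5s[i];
        std::size_t k = i + 1;                        // only look after i, so i != k
        while (k < names.size()) {
            if (md5s[k] == target) {
                duplicates.push_back(names[k]);       // record before erasing
                names.erase(names.begin() + k);       // erase from both vectors
                md5s.erase(md5s.begin() + k);         // so they stay aligned
                // do not advance k: the next item slid into position k
            } else {
                ++k;
            }
        }
    }
}
```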
But even with these gotchas fixed, your attempt is O(n^2): with 100 names that is 10,000 iterations, with 1,000 names 1,000,000 iterations, and so on. In the end, this will perform poorly on large lists.
The approach below is O(n log n) (due to the sort) rather than quadratic, so it runs much faster for large lists.
A few structural changes help here. A better approach is to store the file name along with its MD5 hash in a single struct, and then declare a container of this struct.
struct file_info
{
std::string filename;
std::string md5hash;
};
//...
std::vector<file_info> fInfoV;
Once you have this, then everything becomes very simple using some basic algorithm functions.
Removing entries with duplicate MD5 hashes can be done by first sorting the vector using the hash value as the sort criterion, and then removing the duplicates with std::unique followed by erase.
#include <algorithm>
#include <iostream>
#include <iterator>
//...
std::sort(fInfoV.begin(), fInfoV.end(),
[](const file_info& f1, const file_info& f2)
{ return f1.md5hash < f2.md5hash; });
// uniqueify the vector
auto iter = std::unique(fInfoV.begin(), fInfoV.end(),
[](const file_info& f1, const file_info& f2)
{ return f1.md5hash == f2.md5hash; });
// for kicks, get the number of duplicates found
auto duplicateCount = std::distance(iter, fInfoV.end());
std::cout << "There are " << duplicateCount << " duplicates found";
// erase these duplicates
fInfoV.erase(iter, fInfoV.end());
The code above sorts, uniquifies, and erases the duplicates. I threw in std::distance to show you that you can get the number of duplicates without having to create another vector. The code above also removes the need to write loops that check for equal loop indices, make sure you reset the inner index, check bounds, etc.
As to time complexity:
The std::sort function is guaranteed to perform O(N log N) comparisons.
The std::unique function has linear complexity.
So this will perform much better for large lists.
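A self-contained sketch of the sort/unique/erase sequence, assuming the file_info struct from this answer; the function name remove_duplicates is illustrative. It swaps in std::stable_sort so that, among files with the same hash, the one listed first in the original order is the one kept (std::unique keeps the first element of each run, and plain std::sort does not preserve the relative order of equal elements).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct file_info {
    std::string filename;
    std::string md5hash;
};

// Sorts by hash, keeps one entry per distinct hash, erases the rest,
// and returns how many entries were removed. O(N log N) overall.
std::ptrdiff_t remove_duplicates(std::vector<file_info>& files) {
    std::stable_sort(files.begin(), files.end(),
        [](const file_info& a, const file_info& b) { return a.md5hash < b.md5hash; });
    auto iter = std::unique(files.begin(), files.end(),
        [](const file_info& a, const file_info& b) { return a.md5hash == b.md5hash; });
    std::ptrdiff_t removed = std::distance(iter, files.end());
    files.erase(iter, files.end());
    return removed;
}
```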

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
for(int i = 0; i < n; ++i)
{
for(int j = 0; j < n; ++j)
{
if(arr[i] == - arr[j])
return true;
}
}
return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not quite sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
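A sketch of this two-pointer scan, assuming int elements; the function name is illustrative. The sum is computed in long long to avoid overflow, and because the loop stops when lo == hi, an element is never paired with itself (so a single 0 does not count, but two zeros do). The function sorts a copy; sort the caller's array in place if extra space matters.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort, then move two indices toward each other looking for x + y == 0.
// O(n log n) for the sort plus an O(n) scan that needs only two indices.
bool has_opposites_two_pointer(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    if (v.size() < 2) return false;
    std::size_t lo = 0, hi = v.size() - 1;
    while (lo < hi) {
        long long sum = static_cast<long long>(v[lo]) + v[hi];  // no int overflow
        if (sum == 0) return true;
        if (sum > 0) --hi;   // too big: move the back pointer down
        else ++lo;           // too small: move the front pointer up
    }
    return false;
}
```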
I would use a std::unordered_set and check whether the opposite of the number already exists in the set. If not, insert the number into the set and check the next element.
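#include <iostream>
#include <unordered_set>
#include <vector>
int main() {
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res; // values seen so far
for (auto e : foo)
{
if(res.count(-e) > 0) // -e was seen earlier, so e has an opposite
std::cout << "opposite of " << e << " already exists\n";
else
res.insert(e);
}
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists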
You can simply add all the elements to an unordered_set and, when adding x, check whether -x is already in the set. The complexity of this solution is O(n) on average (as @Hurkyl said, thanks).
UPDATE: A second idea: sort the elements, then for each element check (using binary search) whether its opposite exists.
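A sketch of the sort plus binary-search idea, assuming int elements; the function name is illustrative. Only positive elements need a lookup, since every pair x, -x with x != 0 contains one, and negating a positive int cannot overflow. Treating 0 as having an opposite only when it appears twice is an assumption about the intended semantics.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort, then binary-search for the opposite of every positive element.
// O(n log n) overall: one sort plus up to n searches of O(log n) each.
bool has_opposites_binary_search(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == 0) {
            // equal elements are adjacent after sorting
            if (i + 1 < v.size() && v[i + 1] == 0) return true;
        } else if (v[i] > 0 && std::binary_search(v.begin(), v.end(), -v[i])) {
            return true;
        }
    }
    return false;
}
```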
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
if (-e) is in t:
return true
insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
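#include <set>
#include <vector>
bool hasOpposites(const std::vector<int>& A) {
std::set<int> s; // elements seen so far
for (auto e : A) {
if (s.count(-e) > 0) {
return true;
}
s.insert(e);
}
return false;
}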
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Merge sorted arrays - Efficient solution

The goal here is to merge multiple arrays that are already sorted into a single result array.
I've written the following solution and am wondering whether there is a way to improve it.
/*
Goal is to merge all sorted arrays
*/
void mergeAll(const vector< vector<int> >& listOfIntegers, vector<int>& result)
{
int totalNumbers = listOfIntegers.size();
vector<int> curpos;
int currow = 0 , minElement , foundMinAt = 0;
curpos.reserve(totalNumbers);
// Set the current traversal position to 0 for every array
for ( int i = 0; i < totalNumbers; ++i)
{
curpos.push_back(0);
}
for ( ; ; )
{
/* Find the first minimum
Which is basically the first element in the array that hasn't been fully traversed
*/
for ( currow = 0 ; currow < totalNumbers ; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
break;
}
}
/* If all the elements were traversed in all the arrays, then no further work needs to be done */
if ( !(currow < totalNumbers ) )
break;
/*
Traverse each of the array and find out the first available minimum value
*/
for ( ;currow < totalNumbers; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() && listOfIntegers[currow][curpos[currow] ] < minElement )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
}
}
/*
Store the minimum into the resultant array
and increment the element traversed
*/
result.push_back(minElement);
++curpos[foundMinAt];
}
}
The corresponding main goes like this.
int main()
{
vector< vector<int> > myInt;
vector<int> result;
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt[0].push_back(10);
myInt[0].push_back(12);
myInt[0].push_back(15);
myInt[1].push_back(20);
myInt[1].push_back(21);
myInt[1].push_back(22);
myInt[2].push_back(14);
myInt[2].push_back(17);
myInt[2].push_back(30);
mergeAll(myInt,result);
for ( int i = 0; i < result.size() ; ++i)
{
cout << result[i] << endl;
}
}
You can generalize the merge step of merge sort and work with multiple pointers. Initially, all of them point to the beginning of each array. You keep these pointers ordered (by the values they point to) in a priority queue. In each step, you remove the smallest element from the heap in O(log n) (n is the number of arrays) and output the element the extracted pointer points to. Then you advance this pointer by one position and, if it hasn't reached the end of its array, reinsert it into the priority queue in O(log n). Proceed this way until the heap is empty. If there are m elements in total, the complexity is O(m log n), and the elements are output in sorted order.
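A sketch of this k-way merge using std::priority_queue as a min-heap, assuming the question's vector<vector<int>> input; the function name merge_k_sorted is illustrative. Each heap entry is a (value, array index, position) tuple, and std::greater orders the tuples so that the smallest value is on top.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// k-way merge: O(m log n) for m elements spread over n sorted arrays.
std::vector<int> merge_k_sorted(const std::vector<std::vector<int>>& arrays) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;  // value, array, position
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t a = 0; a < arrays.size(); ++a)
        if (!arrays[a].empty())
            heap.emplace(arrays[a][0], a, std::size_t{0});  // first element of each array

    std::vector<int> result;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        std::size_t a = std::get<1>(top), pos = std::get<2>(top);
        result.push_back(std::get<0>(top));                 // smallest remaining value
        if (pos + 1 < arrays[a].size())                     // advance that array's pointer
            heap.emplace(arrays[a][pos + 1], a, pos + 1);
    }
    return result;
}
```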
Perhaps I'm misunderstanding the question...and I feel like I'm misunderstanding your solution.
That said, maybe this answer is totally off-base and not helpful.
But, especially with the number of vectors and push_back's you're already using, why do you not just use std::sort?
#include <algorithm>
void mergeAll(const vector<vector<int>> &origList, vector<int> &resultList)
{
for(int i = 0; i < origList.size(); ++i)
{
resultList.insert(resultList.end(), origList[i].begin(), origList[i].end());
}
std::sort(resultList.begin(), resultList.end());
}
I apologize if this is totally off from what you're looking for. But it's how I understood the problem and the solution.
std::sort runs in O(N log (N)) http://www.cppreference.com/wiki/stl/algorithm/sort
I've seen some solutions on the internet for merging two sorted arrays, but most of them were quite cumbersome. I changed some of the logic to get the shortest version I could come up with:
void merge(const int list1[], int size1, const int list2[], int size2, int list3[]) {
// Declaration & Initialization
int index1 = 0, index2 = 0, index3 = 0;
// Loop until both arrays have reached their upper bound.
while (index1 < size1 || index2 < size2) {
// Take from the first array if the second one is exhausted, or if
// the first one still has elements and its current element is not
// larger. Checking the indices before reading avoids accessing
// either array out of bounds.
if (index2 >= size2 || (index1 < size1 && list1[index1] <= list2[index2])) {
list3[index3] = list1[index1];
index1++;
}
else {
list3[index3] = list2[index2];
index2++;
}
index3++;
}
}
If you want to take advantage of multi-threading, a fairly good solution would be to just merge 2 lists at a time.
I.e. suppose you have 9 lists:
merge list 0 with 1.
merge list 2 with 3.
merge list 4 with 5.
merge list 6 with 7.
These can be performed concurrently.
Then:
merge list 0&1 with 2&3
merge list 4&5 with 6&7
Again these can be performed concurrently.
then merge list 0,1,2&3 with list 4,5,6&7
finally merge list 0,1,2,3,4,5,6&7 with list 8.
Job done.
I'm not sure of the complexity of that, but it seems the obvious solution and DOES have the bonus of being multi-threadable to some extent.
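A sequential sketch of this pairwise scheme using std::merge; the function name is illustrative and no threads are started. Each round halves the number of lists, so there are about log2(k) rounds, each touching every element once, giving O(N log k) for N elements in k lists. The merges within one round are independent of each other, which is what would let them run concurrently. Unlike the 9-list walkthrough above, an odd list out is carried into the next round rather than held until the end; the result is the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Pairwise ("tournament") merging of sorted lists, one round at a time.
std::vector<int> merge_pairwise(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return {};
    while (lists.size() > 1) {
        std::vector<std::vector<int>> next;
        for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(lists[i].size() + lists[i + 1].size());
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));          // independent within a round
        }
        if (lists.size() % 2 == 1)
            next.push_back(std::move(lists.back()));    // odd one out moves up a round
        lists = std::move(next);
    }
    return std::move(lists.front());
}
```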
Consider the priority-queue implementation in this answer linked in a comment above: Merging 8 sorted lists in c++, which algorithm should I use
It's O(n lg m) time (where n = total number of items and m = number of lists).
All you need is two pointers (or just int index counters), checking for minimum between array A and B, copying the value over to the resultant list, and incrementing the pointer of the array the minimum came from. If you run out of elements on one source array, copy the remainder of the second to the resultant and you're done.
Edit:
You can trivially expand this to N arrays.
Edit:
Don't trivially expand this to N arrays :-). Do two at a time. Silly me.
If you are merging very many vectors together, you could speed things up by using a sort of tree to determine which vector contains the smallest element. This is probably not necessary for your application, but comment if it is and I'll try to work it out.
You could just stick them all into a multiset. That will handle the sorting for you.
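A sketch of the multiset approach, assuming vector<int> inputs; the function name is illustrative. It is O(m log m) for m elements in total, so it does not exploit the fact that the inputs are already sorted, but it is short.

```cpp
#include <set>
#include <vector>

// Inserts every element into a std::multiset, which keeps them ordered
// (duplicates allowed), then copies them out in ascending order.
std::vector<int> merge_with_multiset(const std::vector<std::vector<int>>& lists) {
    std::multiset<int> all;
    for (const auto& l : lists)
        all.insert(l.begin(), l.end());
    return std::vector<int>(all.begin(), all.end());
}
```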