Finding the Unique Elements Among n Arrays

I'm trying to write an algorithm that takes a variable amount of generic arrays, stored in d_arrays, and gathers all the unique elements (elements which occur exactly once) among them and stores them in an array, called d_results. For example, the arrays:
int intA[] = { 12, 54, 42 };
int intB[] = { 54, 3, 42, 7 };
int intC[] = { 3, 42, 54, 57, 3 };
Would produce the array d_results with the contents { 12, 7, 57 }.
Here's my current algorithm for the process:
template <class T>
inline
void UniqueTableau<T>::run() {
T* uniqueElements = d_arrays[0];
int count = 0;
for (int i = 1; i < d_currentNumberOfArrays; ++i) {
if (count == 0) {
uniqueElements = getUnique(uniqueElements, d_arrays[i], d_sizes[i - 1], d_sizes[i]);
++count;
}
else {
uniqueElements = getUnique(uniqueElements, d_arrays[i], d_numberOfElementsInResult, d_sizes[i]);
}
}
d_results = uniqueElements;
}
template <class T>
inline
T* UniqueTableau<T>::getUnique(T* first, T* second, int sizeOfFirst, int sizeOfSecond) {
int i = 0;
int j = 0;
int k = 0;
T* uniqueElements = new T[sizeOfFirst + sizeOfSecond];
while (i < sizeOfFirst) { // checks the first against the second
while ((first[i] != second[j]) && (j < sizeOfSecond)) {
++j;
}
if (j == sizeOfSecond) {
uniqueElements[k] = first[i];
++i;
++k;
j = 0;
} else {
++i;
j = 0;
}
}
i = 0;
j = 0;
while (i < sizeOfSecond) { // checks the second against the first
while ((second[i] != first[j]) && (j < sizeOfFirst)) {
++j;
}
if (j == sizeOfFirst) {
uniqueElements[k] = second[i];
++i;
++k;
j = 0;
} else {
++i;
j = 0;
}
}
T* a = new T[k]; // properly sized result array
for (int x = 0; x < k; ++x) {
a[x] = uniqueElements[x];
}
d_numberOfElementsInResult = k;
return a;
}
Note that d_sizes is an array holding the sizes of each array in d_arrays, and d_numberOfElementsInResult is the number of elements in d_results.
Now, what this algorithm is doing is comparing two arrays at a time, getting the unique elements between those two, and comparing those elements with the next array, and so on. The problem is that, when I do this, some elements are unique between the third array and the unique elements of the first two, but not unique between the third and the first. That is confusingly worded, so here's a visual example using the arrays from above:
First, the algorithm finds the unique elements of the first and second arrays.
{ 12, 3, 7 }
Now, it checks this against the third array, producing the unique elements between those.
{ 12, 7, 42, 54, 57 }
Right? Wrong. The problem here is that, since 42 and 54 don't appear in the unique array, they end up in the final product, even though they are common to all three arrays.
Can anyone think of a solution for this? Alterations to this algorithm are preferred, but if that's not possible, what's another way to approach this problem?

EDIT: As pointed out, this approach takes O(n log n) time and O(n) space.
Traverse every element of every array and build a map from each item to the number of times it was seen.
Once the map is built, iterate through it and collect the elements whose count is one.
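The counting approach can be sketched as follows. This is a minimal illustration, assuming the inputs are held in a std::vector<std::vector<T>> rather than the d_arrays and d_sizes members from the question; the name uniqueAcross is made up for the example. With std::map the counting pass is O(n log n); std::unordered_map would bring it to O(n) on average.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not from the question): returns the values that
// occur exactly once across all input arrays, in ascending order.
template <class T>
std::vector<T> uniqueAcross(const std::vector<std::vector<T>>& arrays) {
    std::map<T, std::size_t> counts;
    for (const auto& arr : arrays) {
        for (const auto& value : arr) {
            ++counts[value];  // operator[] starts a new count at 0
        }
    }
    std::vector<T> result;
    for (const auto& entry : counts) {
        if (entry.second == 1) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

For the three arrays in the question this returns 7, 12, 57: the same elements as the expected result, but sorted, because std::map iterates in key order.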

Memory is the problem, and I'd do this in a different way (due to lack of experience?). Actually, I was thinking of the answer that just got posted!
Anyway, do not throw away your duplicates; save them in a secondary array. Append this array twice to each new array, which requires little change to your algorithm. The only changes are recording the duplicates and searching a larger list each time, though this costs extra time and memory. If that is a concern, go with the first posted answer!
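One way to read "do not throw away your duplicates" is sketched below. It is a variation on the idea rather than the exact "append twice" trick: a second list remembers every value already seen more than once, so a later array can no longer reintroduce it. The container choice (std::vector) and the name uniqueKeepingDuplicates are assumptions made for the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: processes the arrays one at a time and remembers
// every value that has been seen at least twice.
template <class T>
std::vector<T> uniqueKeepingDuplicates(const std::vector<std::vector<T>>& arrays) {
    std::vector<T> unique;      // values seen exactly once so far
    std::vector<T> duplicates;  // values seen at least twice so far
    for (const auto& arr : arrays) {
        for (const auto& value : arr) {
            if (std::find(duplicates.begin(), duplicates.end(), value) != duplicates.end()) {
                continue;  // already known to be a duplicate
            }
            auto it = std::find(unique.begin(), unique.end(), value);
            if (it != unique.end()) {
                duplicates.push_back(value);  // second sighting: demote it
                unique.erase(it);
            } else {
                unique.push_back(value);  // first sighting
            }
        }
    }
    return unique;
}
```

Like the original algorithm this only needs operator== on T and runs in quadratic time, but it keeps 42 and 54 out of the result because they stay in the duplicates list.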

Solution 1:
Put all the elements of all the arrays into one array.
Sort the array.
Remove every value that occurs more than once (all of its copies, not just the extras), so that only the values occurring exactly once remain.
Solution 2:
Create a map where the key is the element and the value is a boolean.
Traverse each array. If the element is not present in the map, insert it with the value true. If the element is already present, set its value to false.
Now print the elements whose value is true, i.e. those that occurred exactly once.
Why I use a boolean rather than an integer for the value:
A key's presence in the map already tells us the element occurred at least once. Setting the value to false when we meet the element again marks it as a duplicate, so no count is needed.

Related

Remove duplicates from array C++

Input: int arr[] = {10, 20, 20, 30, 40, 40, 40, 50, 50}
Output: 10, 30
My code:
int removeDup(int arr[], int n)
{
int temp;
bool dupFound = false;
for(int i=0;i<n;i++){
for(int j=i+1;j<n;j++){
if(arr[i] == arr[j]){
if(!dupFound){
temp = arr[i];
dupFound = true;
}
else{
arr[i] = temp;
}
}
}
}
//shift here
}
First of all, I don't know if this is the most efficient way of doing this.
I'm trying to find the first duplicate element, assign it to every duplicate element and shift them to the end of the array, which doesn't work because the last duplicate element cannot be compared.
I need some help with finding the last duplicate element, so I can assign temp to it.

I do not understand the logic of your code. When you find the second element arr[j] that equals arr[i], you assign temp to arr[i]. However, temp was assigned arr[i] when you found the first duplicate, so essentially you do arr[i] = arr[i]. It's not clear how this is supposed to find unique elements.
You can use a map to count frequency of elements, then print those with frequency 1:
#include <unordered_map>
#include <iostream>
int main()
{
std::unordered_map<int,size_t> freq;
int arr[] = {10, 20, 20, 30, 40, 40, 40, 50, 50};
// count frequencies
for (auto e : arr) { ++freq[e]; }
// print the elements e where freq[e] == 1
for (const auto& f : freq) {
if (f.second == 1) {
std::cout << f.first << "\n";
}
}
}
Only small modifications needed to add the unique elements to a vector.

Instead of trying to do everything at once, let us focus on correctness first:
int removeDup(int* arr, int n) {
// Note: No i++! This depends on whether we find a duplicate.
for (int i = 0; i < n;) {
int v = arr[i];
bool dupFound = false;
for (int j = i+1; j < n; j++) {
if (v == arr[j]) {
dupFound = true;
break;
}
}
if (!dupFound) {
i++;
continue;
}
// Copy values to the sub-array starting at position i,
// skipping all values equal to v.
int write = i, skipped = 0;
for (int j = i; j < n; j++) {
if (arr[j] != v) {
arr[write] = arr[j];
write++;
} else {
skipped++;
}
}
// The previous loop duplicated some non-v elements.
// We decrease n to make sure these duplicates are not
// considered in the output
n -= skipped;
}
return n;
}

Let's start with logistics (so to speak). An array always contains a fixed number of items. There's simply no way to start with an array of 5 items, and turn it into an array of 2 items. Simply can't be done.
So, as a starting point, you need to either return something like an std::vector that keeps track of its size along with the data it contains, or else you're going to need to track the size, and return something to indicate how many elements in the array are valid after the processing.
Probably the simplest way to do things would be to use something like an std::unordered_map to count the items, then walk through the map, and insert an item in the output if (and only if) its count is 1.
std::unordered_map<int, std::size_t> counts;
for (int i=0;i<n; i++)
++counts[arr[i]];
std::vector<int> output;
for (auto item : counts)
if (item.second == 1)
output.push_back(item.first);
return output;
If you want to modify the data in place, I'd start by sorting the input data. Then you'll start with two indices: one for your "input" position, and one for your "output" position. output starts as zero, and input as 1.
The general idea from there is pretty simple: we look at data[input] and see if it's different from both the preceding and succeeding elements. If so, we copy it to data[output], and increment the output position.
Since this tries to look at both the preceding and succeeding elements, we have to include special cases for the beginning and end of the array. The first element is unique if it's different from the following, and the end is unique if it's different from the preceding. Code can look like this:
#include <algorithm>
#include <iostream>
unsigned remove_dupe(int *data, unsigned n) {
if (n < 2) {
return n;
}
std::sort(data, data+n);
unsigned output = data[0] != data[1];
for (unsigned input = 1; input<n-1; input++) // start at 1: output+1 would skip data[1] whenever data[0] is unique
if (data[input] != data[input-1] && data[input] != data[input+1])
data[output++] = data[input];
if (data[n-1] != data[n-2]) {
data[output++] = data[n-1];
}
return output;
}
template <class T, std::size_t N>
void test(T (&arr)[N]) {
unsigned end = remove_dupe(arr, N);
for (int i=0; i<end; i++)
std::cout << arr[i] << "\t";
std::cout << "\n";
}
int main() {
int arr0[] = {10, 20, 20, 30, 40, 40, 40, 50, 50};
int arr1[] = { 1, 2};
int arr2[] = { 1, 1};
test(arr0);
test(arr1);
test(arr2);
}
Result:
10 30
1 2

Another option is to sort() the array. Once this is done, all duplicate values in the array are adjacent, so you simply compare element [n] with element [n+1] to see if they are the same. You can then find and count all duplicates in a single linear pass through the sorted array.
Sorting is one of the most heavily studied classes of algorithms in computer science, and very efficient procedures can be built on the assumption that the data is sorted.
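The linear pass over sorted data can be sketched like this. The function name countRuns and the (value, occurrences) return type are assumptions for the example; the input is taken by value so the caller's data is not reordered.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts a copy of the data and returns one
// (value, occurrences) pair per distinct value.
std::vector<std::pair<int, std::size_t>> countRuns(std::vector<int> data) {
    std::sort(data.begin(), data.end());
    std::vector<std::pair<int, std::size_t>> runs;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (!runs.empty() && runs.back().first == data[i]) {
            ++runs.back().second;  // same value as the previous element
        } else {
            runs.emplace_back(data[i], 1);  // a new run starts here
        }
    }
    return runs;
}
```

The values that occur exactly once are then the pairs whose second member is 1.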

Pick every element once from sorted array

I have a sorted array and I want to copy every element once into another array.
Example:
Input: array[] = { 1,2,2,3,3,5 }
Output: array2[] = { 1,2,3,5 }
Here is my attempt
int db = 0,array2[100];
for(int i = 0;i < k;i++){
int j = 0;
for(j = 0;j < db;j++){
if(array[i] == array2[j]){
break;
}
}
if(i == j){
array2[db] == array[i];
db++;
}
}
/* PRINT
for(int i = 0;i < db;i++){
cout<<array2[i]<<" ";
}
cout<<endl;*/

There's a standard algorithm std::unique_copy that does exactly this:
auto end = std::unique_copy(std::begin(array), std::end(array), array2);
The returned iterator end points to one past the last element that is inserted into array2, so you can calculate the number of unique elements that were copied like this:
auto num = std::distance(array2, end);
I would recommend using std::vector instead of arrays anyway, and then you don't have to worry about computing the number of copied unique elements. If you use a vector, the third argument to the algorithm would be std::back_inserter(vec).
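A short sketch of the vector variant, assuming the input is already sorted as in the question; the wrapper name distinctSorted is made up for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: copies each distinct value of a sorted range once.
// std::unique_copy only drops *adjacent* repeats, hence the sorted input.
std::vector<int> distinctSorted(const std::vector<int>& sorted) {
    std::vector<int> out;
    std::unique_copy(sorted.begin(), sorted.end(), std::back_inserter(out));
    return out;
}
```

The size of the returned vector is the number of distinct values, so no separate counter is needed.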

We can't tell you what went wrong in your code without knowing what k is.
In general, if you want the unique values from an array, a quick way to get them is to use a std::set:
#include <set>
std::set<int> s1;
for (int i : array) s1.insert(i);
This stores each value only once, and iterating over the set visits the values in increasing order.

How to remake an existing random array to have unique elements (with restrictions)

The question asks for a function that takes an array of integers pre-filled with random elements. The function goes through the array linearly and, if any element is equal to any of the preceding elements, regenerates that element. This goes on until the whole array is made of unique elements.
I know that this is very inefficient way of generating an array with unique elements but I wanted to give it a try. I wrote this code which falls into an infinite loop.
void makeUnique(int* const objArr, const int& size)
{
int i = 1,j = 0;
do {
j = 0;
while (j < i) {
if (objArr[j] == objArr[i]) {
objArr[i] = rand();
--i;
break;
}
++j;
}
++i;
} while (i<size);
}
Where am I going wrong?

Find order of an array using minimum memory and time

Let's say I have an array of 5 elements. My program knows it's always 5 elements, and when sorted it's always 1,2,3,4,5.
As per the permutations formula, i.e. n!/(n-r)!, we can order it in 120 ways.
In C++, using std::next_permutation, I can generate all those 120 orders.
Now, my program/routine accepts a number in the range 1 to 120 as its input argument and outputs the corresponding order of the array.
This works fine for small array sizes, as I can repeat std::next_permutation until I reach the order that matches the input parameter.
The real problem is: how can I do it in less time if my array has 25 elements or more? For 25 elements, the number of possible orders is 15511210043330985984000000.
Is there a technique that lets me find the order of the numbers directly from the given number?
Thanks in advance :)

This is an example c++ implementation of the algorithm mentioned in this link:
#include <vector>
#define ull unsigned long long
ull factorial(int n) {
ull fac = 1;
for (int i = 2; i <= n; i++)
fac *= i;
return fac;
}
std::vector<int> findPermutation(int len, long idx) {
std::vector<int> original = std::vector<int>(len);
std::vector<int> permutation = std::vector<int>();
for (int i = 0; i < len; i++) {
original[i] = i;
}
ull currIdx = idx;
ull fac = factorial(len);
while (original.size() > 0) {
fac /= original.size();
int next = (currIdx - 1) / fac;
permutation.push_back(original[next]);
original.erase(original.begin() + next);
currIdx -= fac * next;
}
return permutation;
}
The findPermutation function accepts the length of the original string and the index of the required permutation, and returns an array that represents that permutation. For example, [0, 1, 2, 3, 4] is the first permutation of any string with length 5, and [4, 3, 2, 1, 0] is the last (120th) permutation.
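The same idea can be written without computing len! up front by reading the index as a number in the factorial number system. The sketch below is an alternative formulation, not the code from the link: the name nthPermutation is hypothetical, the rank is 0-based (unlike the 1-based idx above), and it assumes the rank fits in 64 bits, which covers at most 20 elements; 25 elements would need a wider integer type.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the permutation of {0, ..., len-1} with the
// given 0-based lexicographic rank. Assumes rank < len! and fits in 64 bits.
std::vector<int> nthPermutation(int len, std::uint64_t rank) {
    // Factorial-base digits, least significant first: digits[i] is in [0, i].
    std::vector<int> digits(len);
    for (int i = 0; i < len; ++i) {
        digits[i] = static_cast<int>(rank % (i + 1));
        rank /= (i + 1);
    }
    std::vector<int> pool(len);
    for (int i = 0; i < len; ++i) {
        pool[i] = i;
    }
    std::vector<int> result;
    result.reserve(len);
    // The most significant digit selects the first element, and so on.
    for (int i = len - 1; i >= 0; --i) {
        result.push_back(pool[digits[i]]);
        pool.erase(pool.begin() + digits[i]);
    }
    return result;
}
```

With len = 5, rank 0 gives 0 1 2 3 4 and rank 119 gives 4 3 2 1 0, matching the first and 120th permutations described above.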

I have had a similar problem where I was storing lots of rows in a Gtk TreeView and did not want to go over all of them every time I wanted to access a row by its position and not by its reference.
So, I created a map of the positions of the rows so I could easily identify them by the parameter I needed.
So, my suggestion is to go over all permutations once and store each permutation in an array (I used a std::vector), so you can access it by myVector[permutation_id].
Here is how I did the mapping:
vector<int> FILECHOOSER_MAP;
void updateFileChooserMap() {
vector<int> map;
TreeModel::Children children = getInterface().getFileChooserModel()->children();
int i = 0;
for(TreeModel::Children::iterator iter = children.begin(); iter != children.end(); iter++) {
i++;
TreeModel::Row row = *iter;
int id = row[getInterface().getFileChooserColumns().id];
if( id >= map.size()) {
for(int x = map.size(); x <= id; x++) {
map.push_back(-1);
}
}
map[id] = i;
}
FILECHOOSER_MAP = map;
}
So in your case you would iterate over the permutations in the same way and map them so that you can access them by their id.
I hope this helps you :D
regards, tagelicht

Sorting two arrays into a combined array

I haven't done any programming classes for a few years, so please forgive any beginner mistakes/methods of doing something. I'd love suggestions for the future. With the code below, I'm trying to check the values of two arrays (sorted already) and put them into a combined array. My solution, however inefficient/sloppy, is to use a for loop to compare the contents of each array's index at j, then assign the lower value to index i of the combinedArray and the higher value to index i+1. I increment i by 2 to avoid overwriting the previous loop's indexes.
int sortedArray1 [5] = {11, 33, 55, 77, 99};
int sortedArray2 [5] = {22, 44, 66, 88, 00};
combinedSize = 10;
int *combinedArray;
combinedArray = new int[combinedSize];
for(int i = 0; i <= combinedSize; i+=2)
{
for(int j = 0; j <= 5; j++)
{
if(sortedArray1[j] < sortedArray2[j])
{
combinedArray[i] = sortedArray1[j];
combinedArray[i+1] = sortedArray2[j];
}
else if(sortedArray1[j] > sortedArray2[j])
{
combinedArray[i] = sortedArray2[j];
combinedArray[i+1] = sortedArray1[j];
}
else if(sortedArray1[j] = sortedArray2[j])
{
combinedArray[i] = sortedArray1[j];
combinedArray[i+1] = sortedArray2[j];
}
}
}
for(int i = 0; i < combinedSize; i++)
{
cout << combinedArray[i];
cout << " ";
}
And my result is this
Sorted Array 1 contents: 11 33 55 77 99
Sorted Array 2 contents: 0 22 44 66 88
5 77 5 77 5 77 5 77 5 77 Press any key to continue . . .
In my inexperienced mind, the implementation of the sorting looks good, so I'm not sure why I'm getting this bad output. Advice would be fantastic.

what about this:
int i=0,j=0,k=0;
while(i<5 && j<5)
{
if(sortedArray1[i] < sortedArray2[j])
{
combinedArray[k]=sortedArray1[i];
i++;
}
else
{
combinedArray[k]=sortedArray2[j];
j++;
}
k++;
}
while(i<5)
{
combinedArray[k]=sortedArray1[i];
i++;k++;
}
while(j<5)
{
combinedArray[k]=sortedArray2[j];
j++; k++;
}

Firstly, there are some immediate problems with how you use C++:
You use = instead of == for the equality check (which causes an unwanted assignment and makes the if-condition true when it shouldn't be);
Your outer loop's upper boundary is defined as i <= 10, while the correct boundary check would be i < 10;
You have a memory leak at the end of the function because you never de-allocate the memory. You need a delete [] combinedArray at the end.
Secondly, your outer loop iterates through all values of the destination array, and in each step uses an inner loop to iterate through all values of the source arrays. That is not what you want. What you want is one loop counting from j=0 to j<5 and iterating through the source arrays. The positions in the destination array are then determined as 2*j and 2*j+1, and there is no need for an inner loop.
Thirdly, as explained in the comment, a correct implementation of sorted-list merge needs two independent counters j1 and j2. However, your current input is hardwired into the code, and if you replace 00 with 100, your current algorithm (after the corrections above are made) will actually work for the given input.
Finally, but less importantly, I wonder why your destination array is allocated on the heap using new. As long as you are dealing with small arrays, you may allocate it on the stack just like the source arrays. If, however, you allocate it on the heap, better use a std::unique_ptr<>, possibly combined with std::array<>. You'll get de-allocation for free then without having to think of putting a delete [] statement at the end of the function.

Before even looking at the implementation, check the algorithm and write it down with pen and paper. The first thing that pops is that you are assuming that the first two elements in the result will come one from each source array. That need not be the case, consider two arrays where all elements in one are smaller than all elements in the other and the expected result:
int a[] = { 1, 2, 3 };
int b[] = { 4, 5, 6 };
If you want the result sorted, then the first three elements all come from the first array. With that in mind, think about what you really know about the data. In particular, both arrays are sorted, which means that the first element of each array is no larger than the rest of the elements in that array. The implication is that the smallest element overall is the smaller of the two heads. By putting that element into the result you have reduced the problem to a smaller set. You have a' = { 2, 3 }, b = { 4, 5, 6 } and res = { 1 }, and a new problem: finding the second element of res knowing that a' and b are sorted.
Figure out on paper what you need to do; then it should be straightforward to map that to code.

So, I modified your code to make it work. It is a good idea to keep two indices, one for each sorted array, so that you can advance the corresponding one after adding its element to combinedArray. Let me know if you don't understand any part of this code. Thanks.
int sortedArray1 [5] = {11, 33, 55, 77, 99};
int sortedArray2 [5] = {0, 22, 44, 66, 88};
int combinedSize = 10;
int *combinedArray;
combinedArray = new int[combinedSize];
int j = 0;
int k = 0;
for(int i = 0; i < combinedSize; i++)
{
if (j < 5 && k < 5) {
if (sortedArray1[j] < sortedArray2[k]) {
combinedArray[i] = sortedArray1[j];
j++;
} else {
combinedArray[i] = sortedArray2[k];
k++;
}
}
else if (j < 5) {
combinedArray[i] = sortedArray1[j];
j++;
}
else {
combinedArray[i] = sortedArray2[k];
k++;
}
}
for(int i = 0; i < combinedSize; i++)
{
cout << combinedArray[i];
cout << " ";
}
cout<<endl;

In else if(sortedArray1[j] = sortedArray2[j]), did you mean else if(sortedArray1[j] == sortedArray2[j])?
The former assigns the value of sortedArray2[j] to sortedArray1[j], and that's why you get 5 77 5 77...
But where does the 5 come from? There's no 5 in either sorted array, so for(int j = 0; j <= 5; j++) must be wrong. The highest index of an array of size N is N-1 rather than N in C/C++ (unlike in Basic), so use j < 5 as the condition; otherwise you read past the end of the arrays, which is undefined behavior and hard to explain or predict.
Finally, there's a problem in the algorithm itself: every pass of the outer loop runs the inner loop to the end, so it always finishes by comparing the last elements of the two arrays, which makes the output repeat the same two numbers.
So you need to correct your algorithm too; see Merge Sort.
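For comparison, the merge step is also available in the standard library as std::merge. The sketch below assumes both inputs are already sorted in ascending order and uses std::vector instead of the raw arrays from the question; the wrapper name mergeSorted is made up for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: merges two ascending ranges into one ascending vector.
std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Called with {11, 33, 55, 77, 99} and {0, 22, 44, 66, 88}, it returns the ten values in ascending order, and the vector releases its memory on its own.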

Slightly different approach, which is IMHO a bit cleaner:
//A is the first array, m its length
//B is the second array, n its length
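void printSortedAndMerged(int A[], int m, int B[], int n){
std::vector<int> c(n+m); // needs <vector>; a variable-length array such as int c[n+m] is not standard C++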
int i=0, j=0;
for(int k=0; k < n+m; k++){
if(i < m && j < n){
if(A[i] < B[j]){
c[k] = A[i];
i++;
}
else{
c[k] = B[j];
j++;
}
continue; //jump to next iteration
}
if(i < m){ // && ~(j < n)
//we already completely traversed B[]
c[k] = A[i];
i++;
continue;
}
if(j < n){ // && ~(i < m)
//we already completely traversed A[]
c[k] = B[j];
j++;
continue;
}
//we should never reach this
cout << "Wow, something wrong happened!" << endl;
}//for
for(int i=0; i<n+m; i++){
cout << c[i] << endl;
}
}
Hope it helps.
