Data clustering and comparison between two arrays - c++

I have two collections of elements. How can I group the duplicate values in each collection with the fewest comparisons? Preferably in C++.
For example, given
Array 1 = {1, 1, 2, 2, 3, 4, 5, 5, 1, 1, 2, 2, 4, 5, 8, …}
Array 2 = {2, 1, 1, 2, 2, 4, 7, 7, 8, 8, 2, 2, 4, 4, 8, …}.
First, I want to cluster the data:
Array 1 = { Group 1 = {1, 1, 1, 1, …}, Group 2 = {2, 2, 2, 2, …}, Group 3 = {3, …}, Group 4 = {4, 4, …}, Group 5 = {5, 5, 5, …}, Group 6 = {8, …} }.
Array 2 = { Group 1 = {1, 1, …}, Group 2 = {2, 2, 2, 2, 2 …}, Group 3 = {4, 4 ,4, …}, Group 4 = {7, 7, …}, Group 5 = {8, 8, 8 …} }.
Second, I want to match the groups that hold the same value:
Group 1 of Array 1 == Group 1 of Array 2
Group 2 of Array 1 == Group 2 of Array 2
Group 4 of Array 1 == Group 3 of Array 2
Group 6 of Array 1 == Group 5 of Array 2
How can I solve this problem in C++? Please give me your brilliant tips.
Some more detail: I have two data sets computed from a stereo image pair. Array 1 holds the data from the left camera and Array 2 the data from the right camera. My final goal is to match groups that have the same value, such as Group 6 of Array 1 and Group 5 of Array 2. The order of the data does not matter to me; I just want to find the values shared between the groups of the two arrays. (Would you recommend sorting the data first to reduce the number of comparisons?)
To solve this, should I use std::map to cluster the data and then compare the groups N! times (N: number of groups in Array 1 or 2)? Is that the best I can do?
I'd appreciate your advice. Thank you for looking at my problem.
My conclusion
My approach is to use the map container from the C++ STL.
Make 2 map containers (Array1_map, Array2_map).
Insert each value of an array into its map as the key, and the index of that value in the array as the mapped value. (The values of each array are then stored in its map in sorted order, without duplicate keys.)
Use the find() member function of the map container for data matching.
After data matching, I was able to get the indexes in each array that belong to the matched (corresponding) keys.
Thank you for all your helpful answers!
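The steps in the conclusion above could be sketched roughly as follows. This is only an illustration, not the original poster's code: IndexMap, Match, buildIndexMap and matchGroups are names made up here, and the sketch assumes that every index of a value is kept (the conclusion does not say whether one index or all of them are stored).

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical alias: value -> every index at which it occurs in the array.
using IndexMap = std::map<int, std::vector<std::size_t>>;

// Hypothetical result type: one matched value and its indices in both arrays.
struct Match {
    int value;
    std::vector<std::size_t> leftIdx;
    std::vector<std::size_t> rightIdx;
};

// Cluster one array: equal values end up under the same key.
IndexMap buildIndexMap(const std::vector<int>& data) {
    IndexMap groups;
    for (std::size_t i = 0; i < data.size(); ++i)
        groups[data[i]].push_back(i);
    return groups;
}

// For every key of `left`, look it up in `right` with find().
std::vector<Match> matchGroups(const IndexMap& left, const IndexMap& right) {
    std::vector<Match> matches;
    for (const auto& entry : left) {
        auto it = right.find(entry.first);
        if (it != right.end())
            matches.push_back({entry.first, entry.second, it->second});
    }
    return matches;
}
```

Each find() is a logarithmic lookup, so the matching needs one lookup per group of the first array rather than anything close to N! comparisons.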

The easiest way I can see to do this is to construct a histogram of each array. Then you can compare those histograms with each other. That should be O(N log N) to convert each array to a histogram, where N is the array size, and then O(M) to compare the histograms, where M is the number of unique elements in the array (the size of the map). That would look like
int arr1[] = {...};
int arr2[] = {...};
std::map<int, int> arr1_histogram, arr2_histogram;
for (auto e : arr1)
    arr1_histogram[e]++;
for (auto e : arr2)
    arr2_histogram[e]++;
if (arr1_histogram == arr2_histogram)
    // true case
else
    // false case
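Comparing the two histograms with == only tells you whether they are identical. Since std::map keeps its keys sorted, the values that occur in both arrays can also be found in a single linear pass over the two histograms. A minimal sketch, assuming the histograms are built as above; commonValues is a made-up helper name:

```cpp
#include <map>
#include <vector>

// Hypothetical helper: returns the keys that occur in both histograms.
// Walks both sorted maps in lockstep, like the merge step of merge sort.
std::vector<int> commonValues(const std::map<int, int>& h1,
                              const std::map<int, int>& h2) {
    std::vector<int> common;
    auto it1 = h1.begin();
    auto it2 = h2.begin();
    while (it1 != h1.end() && it2 != h2.end()) {
        if (it1->first < it2->first) {
            ++it1;                         // key only in h1 so far
        } else if (it2->first < it1->first) {
            ++it2;                         // key only in h2 so far
        } else {
            common.push_back(it1->first);  // key present in both
            ++it1;
            ++it2;
        }
    }
    return common;
}
```

The counts for a shared key are still available in each histogram if the group sizes are needed as well.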

Related

Which is the fastest method to reshape a vector to a matrix of overlapping windows in C++ using Eigen library?

Let's suppose you have a vector of N elements and you want to create an n by m matrix made of stacked overlapping windows sliced from the original vector. Which is the fastest way to perform this operation in C++ using the Eigen library?
For example (N = 7, window_length (m) = 3, overlap = 2):
0, 1, 2, 3, 4, 5, 6 -> 0, 1, 2
                       1, 2, 3
                       2, 3, 4
                       3, 4, 5
                       4, 5, 6
where m and overlap must be parameters.

Combinations in ROOT

What does the Combinations function do in ROOT/C++?
I only found this documentation
https://root.cern.ch/doc/master/namespaceROOT_1_1VecOps.html#a6d1d00c2ccb769cc48c6813dbeb132db
But I am still not sure what it does exactly.
Can someone provide an example showing how the answers in the documentation examples are computed?
Here is an example of what Combinations is doing:
Suppose you have a vector v{1., 2., 3., 4.,}
1, 2, 3, and 4 are the elements of the vector v
and 0, 1, 2, 3 are the indices of those elements.
If we write
Combinations (v, 2)
we get
{{ 0, 0, 0, 1, 1, 2} , { 1, 2, 3, 2, 3, 3}}.
That comes from looking at the different combinations of two vector elements, which are:
1, 2
1, 3
1, 4
2, 3
2, 4
3, 4
These have the corresponding indices
0 1
0 2
0 3
1 2
1 3
2 3
Then, the left-side column makes the first vector in the answer and the right side column makes the second vector shown in the answer.
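To make the construction above concrete, here is a small sketch in plain C++ that builds the same two index vectors for pairs. It is not ROOT's implementation and does not use ROOT at all; pairCombinations and IndexPairs are names made up for this illustration, and it covers only the case of choosing 2 elements.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type: first = left-hand indices, second = right-hand indices.
using IndexPairs = std::pair<std::vector<std::size_t>, std::vector<std::size_t>>;

// Enumerates all index pairs (i, j) with i < j for a container of `size` elements.
IndexPairs pairCombinations(std::size_t size) {
    IndexPairs result;
    for (std::size_t i = 0; i < size; ++i) {
        for (std::size_t j = i + 1; j < size; ++j) {
            result.first.push_back(i);   // left column of the index table
            result.second.push_back(j);  // right column of the index table
        }
    }
    return result;
}
```

For a vector of 4 elements this gives the first vector 0, 0, 0, 1, 1, 2 and the second vector 1, 2, 3, 2, 3, 3, matching the result quoted above.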

how to handle reappearing values with std::next_permutation

I have a few vectors.
I want to find all permutations of each vector.
It works reasonably well, when the values are unique but if there are reappearing values it messes up.
I have the following vectors
vector<string> present = {"Schaukelpferd","Schaukelpferd","Puppe","Puppe"};
vector<string> children = {"Jan","Tim","Alex","Daniel"};
vector<int> houses = {4,5,5,5};
I am sorting them before using next_permutation()
sort(present.begin(),present.end());
sort(children.begin(),children.end());
sort(houses.begin(),houses.end());
do {
present_perm.push_back(present);
} while (next_permutation(present.begin(), present.end()));
do {
children_perm.push_back(children);
} while (next_permutation(children.begin(), children.end()));
do {
houses_perm.push_back(houses);
} while (next_permutation(houses.begin(), houses.end()));
children works well, but present and houses don't work as expected.
children returns 24 permutations, as expected, but present returns only 6 and houses returns only 4. I would expect all of them to return 24 because all vectors have 4 elements (4! = 24).
Consider the four integer values 4, 5, 5, 5. The four possible permutations are 4, 5, 5, 5 and 5, 4, 5, 5 and 5, 5, 4, 5 and 5, 5, 5, 4. That's it. The three 5s have the same value, so they cannot be distinguished from each other. The algorithm doesn't keep track of which of those values originally came before the other; they're the same. The same thing applies to present: it has only two distinct values, each appearing twice, so there are only 4!/(2! * 2!) = 6 distinct permutations.
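If all 4! orderings are really wanted even though some values repeat, one option is to permute the positions instead of the values, since the positions 0, 1, 2, 3 are always distinct. A minimal sketch of that idea; allOrderings is a made-up helper name, not part of the standard library:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns every ordering of `items`, treating equal
// values at different positions as distinct (so n! results for n items).
template <typename T>
std::vector<std::vector<T>> allOrderings(const std::vector<T>& items) {
    std::vector<std::size_t> idx(items.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ... (already sorted)

    std::vector<std::vector<T>> result;
    do {
        std::vector<T> ordering;
        ordering.reserve(items.size());
        for (std::size_t i : idx)
            ordering.push_back(items[i]);  // apply the index permutation
        result.push_back(ordering);
    } while (std::next_permutation(idx.begin(), idx.end()));
    return result;
}
```

With houses = {4, 5, 5, 5} this yields 24 orderings, many of which look identical because the repeated 5s are swapped among themselves.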

How to move 2 elements from head into given position in vector

I want to move the first 2 elements to a given position in a vector, but the result is not right when I use memmove as in the following code:
vector<int> v{1, 2, 3, 4, 0};
memmove(&(v[3]), &(v[0]), 2);
The result of doing so is 1, 2, 3, 1, 0, while I expected 1, 2, 3, 1, 2. How can I achieve this?
memmove copies bytes, not arbitrary objects (like int). So you would need to calculate the correct byte count with 2 * sizeof(int).
But a better way is to use std::copy_n:
std::copy_n(v.begin(), 2, v.begin() + 3);
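Both variants can be put side by side in a short sketch. The helper names copyHeadMemmove and copyHeadCopyN and their parameters are made up for this illustration; the asserts state the assumptions that the destination fits inside the vector and, for std::copy_n, that it does not overlap the source range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the first `count` ints to index `pos` using
// memmove. The size argument is in bytes, hence count * sizeof(int).
std::vector<int> copyHeadMemmove(std::vector<int> v, std::size_t count, std::size_t pos) {
    assert(pos + count <= v.size());
    std::memmove(v.data() + pos, v.data(), count * sizeof(int));
    return v;
}

// Hypothetical helper: same operation with std::copy_n, which counts
// elements rather than bytes.
std::vector<int> copyHeadCopyN(std::vector<int> v, std::size_t count, std::size_t pos) {
    assert(count <= pos && pos + count <= v.size());
    std::copy_n(v.begin(), count, v.begin() + pos);
    return v;
}
```

Called with the vector 1, 2, 3, 4, 0, count 2 and pos 3, both helpers return 1, 2, 3, 1, 2.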

Mathematica: How to combine lists of unequal length into single data structure

I have multiple lists, for instance:
a = {1, 2, 3};
b = {4, 5, 6, 7, 8};
I would like to combine these lists so that I can export a datafile of the following format:
a, b
1, 4
2, 5
3, 6
, 7
, 8
I can't for the life of me figure it out.