Introduction
Good day,
I am looking for a grouping algorithm that can do the following:
Let's suppose I have an array of sorted numbers (without any repeated occurrences). For example, {0, 2, 5, 6, 7, 10}.
I want to make groups from that array, such that:
I minimize the number of groups,
Each group needs to contain numbers that are linked by at most n - 1 "bonds" (for example, with n = 3, 0 and 2 are neighbours but 0 and 3 are not).
EDIT
In other words, when I say neighbours, I should really speak of integer distance. For example, the distance from 0 to 2 is 2 (and vice versa), and the distance from 0 to 3 is 3. You could think of the problem as a set of 1D points, for which one needs to find the minimal number of centers, where each center covers the points within a distance of n/2 of it. I hope it is clearer like that.
The example has multiple possible groupings, but the best one under conditions 1 and 2 (n = 3) is {{0, 2}, {5, 6, 7}, {10}}. {{0}, {2, 5}, {6, 7}, {10}} has one group more than the best solution. The ideal case occurs when all the sorted numbers are contiguous:
nb_groups* = ceil(v.size() / n);
In addition, there might be multiple solutions depending on the algorithm.
What I tried
For now, what I do is:
Compute the array of distances between neighbouring elements,
Check the neighbouring condition, carrying remainders, from the beginning of the vector to the end (see the code below).
It seems to work (to me), but I was wondering two things:
Does it really work in all cases (maybe I have not tested every case)?
If so, could I optimize my implementation somehow (fewer than in.size() - 1 iterations and less memory consumption)?
Code
I was considering a function that takes the vector to group and the max distance. This function would return the indices of the first element of each group.
#include <iostream>
#include <vector>

std::vector<int> groupe(const std::vector<int>& at, const int& n);

int main() {
    // Example of input vector
    std::vector<int> in = {0, 2, 5, 6, 7, 10, 11, 22, 30, 50, 51};
    // Try to group with neighbouring distance of 3
    std::vector<int> res = groupe(in, 3);
    // Printing the result
    for(const int& a : res) {
        std::cout << a << " ";
    }
}
std::vector<int> groupe(const std::vector<int>& at, const int& n) {
    std::vector<int> out;
    // Guard against an empty input (at.size() - 1 would wrap around)
    if(at.empty()) {
        return out;
    }
    // reste keeps track of the accumulated neighbouring distance
    // (in case we can look for another element to put in the group)
    int reste(0);
    size_t s = at.size() - 1;
    // size_t index to avoid a signed/unsigned comparison with s
    for(size_t i = 0; i < s; i++) {
        // Computing the distance between element i and i + 1
        int d = at[i + 1] - at[i];
        if(d >= n) {
            if(reste == 0) {
                out.push_back(i);
            }
            reste = 0;
        } else {
            if(reste == 0) {
                out.push_back(i);
            }
            reste += d;
            if(reste >= n) {
                reste = 0;
            }
        }
    }
    if(reste == 0 || reste >= n) {
        out.push_back(s);
    }
    return out;
}
OUTPUT
0 2 5 7 8 9
Note
If the original vector was not sorted, I guess we could sort it first and then apply this step (or maybe there is a more efficient algorithm?).
I thank you in advance for your time and help.
I am working with a list of lists of vectors of ints (std::list<std::list<std::vector<int>>> z(nlevel)).
I might have something like:
{ {1} {2} {3} }
{ {1 2} {2 1} {1 3} }
{ {1 2 3} {2 1 3} {1 2 4} }
I need to remove the non-unique combinations of integers, so e.g. the second element of the list above should become
{ {1 2} {1 3} }
This is a large object, so I'm trying to update each element of the outermost list by reference. I've tried something like:
lit = z.begin();
for (i = 0; i < nlevel; i++) {
    distinct_z(*lit, primes);
    lit++;
}
where distinct_z is a function that finds the unique vector combinations by reference, but this doesn't seem to affect the list z. Note: distinct_z does work fine in another part of my code where I am already working with the ith element of the list.

I've provided distinct_z below. It includes some unique data types from the Rcpp package in R, but is hopefully understandable. Essentially, I use the log sum of prime numbers to identify non-unique combinations of integers, because the order of the integers does not matter.

To reiterate, distinct_z does work in another part of my code where I pass it an actual list of vectors of ints. The problem seems to be that I'm trying to pass something using an iterator.
void distinct_lz(std::list<std::vector<int>> &lz,
                 const IntegerVector &primes) {
    int i, j, npids = lz.size();
    NumericVector pids(npids);
    std::list<std::vector<int>>::iterator lit = lz.begin();
    int z_size = lit->size();
    for(i = 0; i < npids; i++) {
        for(j = 0; j < z_size; j++) {
            pids[i] += log(primes[lit->at(j)]);
        }
        lit++;
    }
    LogicalVector dup = duplicated(round(pids, 8));
    lit = lz.begin();
    for(i = 0; i < npids; i++) {
        if(dup(i) == 1) {
            // erase() invalidates the erased iterator, so advance
            // using its return value instead of incrementing lit
            lit = lz.erase(lit);
        } else {
            lit++;
        }
    }
}
What is the best approach for doing what I want?
Background: The data structure probably seems unnecessarily complicated, but I'm enumerating all connected subgraphs starting at a vertex using a breadth-first approach. So given a current subgraph, I see what other vertices are connected to create a set of new subgraphs and repeat. I initially did this using a list of vectors of ints, but removing repeats was ridiculously slow due to the fact that I had to copy the current object if I removed part of the vector. This approach is much faster even though the structure is more complicated.
Edit: Here is a solution that mostly does what I want, though it results in some undesired copying. I updated distinct_z to return a copy of the object instead of modifying the reference, and then replaced the element at lit.
lit = z.begin();
for (i = 0; i < nlevel; i++) {
    (*lit) = distinct_z(*lit, primes);
    lit++;
}
In C++ there is a well known idiom known as the erase-remove idiom for removing elements from an STL container. It basically involves shuffling unwanted items to the end of the container and then erasing the unwanted tail.
We can use a predicate function (e.g. a lambda) to select the items we want to erase, together with functions from <algorithm>. In your case, we use a set of sets of ints (std::set<std::set<int>>) to store the unique combinations. Convert each vector in the list to a set of ints, and delete it if it has already been seen.
#include <set>
#include <list>
#include <vector>
#include <algorithm>
#include <iostream>

void distinct_lz(std::list<std::vector<int>>& lz)
{
    std::set<std::set<int>> uniqueNums;
    lz.erase(std::remove_if(lz.begin(), lz.end(),
        [&uniqueNums](const std::vector<int>& v) {
            std::set<int> s{ v.cbegin(), v.cend() };
            if (uniqueNums.find(s) != uniqueNums.end())
                return true;
            uniqueNums.insert(s);
            return false;
        }), lz.end());
}

int main()
{
    std::list<std::vector<int>> lv = { {1, 2}, {2, 1}, {1, 3}, {3, 4} };
    distinct_lz(lv);
    for (auto& v : lv)
    {
        for (auto n : v)
        {
            std::cout << n << " ";
        }
        std::cout << "\n";
    }
}
Output:
1 2
1 3
3 4
Working version here.
Let's say I have some values in a List. I would like to return another list with a new element:
fun newList(): List<Int> {
    val values = listOf<Int>(1, 2, 3, 4, 5, 6)
    return 7::values // something like that
}
The Kotlin lists have the plus operator overloaded in kotlin-stdlib, so you can add an item to a list:
val values = listOf(1, 2, 3, 4, 5, 6)
return values + 7
There's also an overload that adds another list:
val values = listOf(1, 2, 3, 4, 5, 6)
return listOf(-1, 0) + values + listOf(7, 8)
Note that in both cases a new list is created, and the elements are copied into it.
For MutableList<T> (which has mutating functions, in contrast with List<T>), there is a plusAssign operator implementation, that can be used as follows:
fun newList(): List<Int> {
    val values = mutableListOf(1, 2, 3, 4, 5, 6)
    values += 7
    values += listOf(8, 9)
    return values
}
A different approach is to use the spread operator. In some cases it can be simpler than using the + sign or mutableListOf:
fun newList(): List<Int> {
    val values = listOf(1, 2, 3, 4, 5, 6)
    return listOf(7, *values.toTypedArray(), 8, 9)
}
You can do it like this
fun newList(): List<Int> {
    val values = listOf(1, 2, 3, 4, 5, 6) // Type can be inferred
    return values.plus(7)
}
I wanted something Scala-like, with for and yield. It works nicely with the (currently experimental) coroutines:
fun testYield(max: Int): List<Int> {
    val values = buildSequence {
        for (i in 1..max) {
            yield(i)
        }
    }
    return values.toList()
}
Or, more concisely:
fun testYieldFast(max: Int) = buildSequence {
    for (i in 1..max)
        yield(i)
}.toList()
It allows fast, lazy construction of an immutable list, whereas frequent concatenations on immutable lists are usually slow.
Let's say I have a set of elements S = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }
I would like to create combinations of 3 and group them in a way such that no number appears in more than one combination.
Here is an example:
{ {3, 7, 9}, {1, 2, 4}, {5, 6, 8} }
The order of the numbers in the groups does not matter, nor does the order of the groups in the entire example.
In short, I want every possible group combination from every possible combination in the original set, excluding the ones that have a number appearing in multiple groups.
My question: is this actually feasible in terms of run time and memory? My sample sizes could be somewhere around 30-50 numbers.
If so, what is the best way to create this algorithm? Would it be best to create all possible combinations, and choose the groups only if the number hasn't already appeared?
I'm writing this in Qt 5.6, which is a C++ based framework.
You can do this recursively, and avoid duplicates, if you keep the first element fixed in each recursion, and only make groups of 3 with the values in order, eg:
{1,2,3,4,5,6,7,8,9}
Put the lowest element in the first spot (a), and keep it there:
{a,b,c} = {1, *, *}
For the second spot (b), iterate over every value from the second-lowest to the second-highest:
{a,b,c} = {1, 2~8, *}
For the third spot (c), iterate over every value higher than the second value:
{a,b,c} = {1, 2~8, b+1~9}
Then recurse with the rest of the values.
{1,2,3} {4,5,6} {7,8,9}
{1,2,3} {4,5,7} {6,8,9}
{1,2,3} {4,5,8} {6,7,9}
{1,2,3} {4,5,9} {6,7,8}
{1,2,3} {4,6,7} {5,8,9}
{1,2,3} {4,6,8} {5,7,9}
{1,2,3} {4,6,9} {5,7,8}
{1,2,3} {4,7,8} {5,6,9}
{1,2,3} {4,7,9} {5,6,8}
{1,2,3} {4,8,9} {5,6,7}
{1,2,4} {3,5,6} {7,8,9}
...
{1,8,9} {2,6,7} {3,4,5}
When I say "in order", that doesn't have to be any specific order (numerical, alphabetical...); it can just be the original order of the input. You can avoid having to re-sort the input of each recursion if you make sure to pass the rest of the values on to the next recursion in the order you received them.
A run-through of the recursion:
Let's say you get the input {1,2,3,4,5,6,7,8,9}. As the first element in the group, you take the first element from the input, and for the other two elements, you iterate over the other values:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{1,2,8}
{1,2,9}
{1,3,4}
{1,3,5}
{1,3,6}
...
{1,8,9}
making sure the third element always comes after the second element, to avoid duplicates like:
{1,3,5} ⇆ {1,5,3}
Now, let's say that at a certain point, you've selected this as the first group:
{1,3,7}
You then pass the rest of the values onto the next recursion:
{2,4,5,6,8,9}
In this recursion, you apply the same rules as for the first group: take the first element as the first element in the group and keep it there, and iterate over the other values for the second and third element:
{2,4,5}
{2,4,6}
{2,4,8}
{2,4,9}
{2,5,6}
{2,5,8}
{2,5,9}
{2,6,7}
...
{2,8,9}
Now, let's say that at a certain point, you've selected this as the second group:
{2,5,6}
You then pass the rest of the values onto the next recursion:
{4,8,9}
And since this is the last group, there is only one possibility, and so this particular recursion would end in the combination:
{1,3,7} {2,5,6} {4,8,9}
As you see, you don't have to sort the values at any point, as long as you pass them on to the next recursion in the order you received them. So if you receive e.g.:
{q,w,e,r,t,y,u,i,o}
and you select from this the group:
{q,r,u}
then you should pass on:
{w,e,t,y,i,o}
Here's a JavaScript snippet which demonstrates the method; it returns a 3D array with combinations of groups of elements.
(The filter function creates a copy of the input array, with elements 0, i and j removed.)
function clone2D(array) {
    var clone = [];
    for (var i = 0; i < array.length; i++) clone.push(array[i].slice());
    return clone;
}

function groupThree(input) {
    var result = [], combination = [];
    group(input, 0);
    return result;

    function group(input, step) {
        combination[step] = [input[0]];
        for (var i = 1; i < input.length - 1; i++) {
            combination[step][1] = input[i];
            for (var j = i + 1; j < input.length; j++) {
                combination[step][2] = input[j];
                if (input.length > 3) {
                    var rest = input.filter(function(elem, index) {
                        return index && index != i && index != j;
                    });
                    group(rest, step + 1);
                }
                else result.push(clone2D(combination));
            }
        }
    }
}

var result = groupThree([1,2,3,4,5,6,7,8,9]);
for (var r in result) document.write(JSON.stringify(result[r]) + "<br>");
For n things taken 3 at a time, you could use 3 nested loops:
for(k = 0; k < n-2; k++){
    for(j = k+1; j < n-1; j++){
        for(i = j+1; i < n; i++){
            ... S[k] ... S[j] ... S[i]
        }
    }
}
For a generic solution of n things taken k at a time, you could use an array of k counters.
I think you can solve it using the coin change problem with dynamic programming: just assume you are looking for change of 3 and that every index in the array is a coin of value 1, then output the coins (the values in your array) that were found.
Link: https://www.youtube.com/watch?v=18NVyOI_690
In preparation of tech interviews a friend of mine came across an interesting problem: given a list of n integers find all identical pairs and return their positions.
Example input: [3,6,6,6,1,3,1]
Expected output: (0,5), (1,2), (1,3), (2,3), (4,6)
Stack Overflow has tons of answers regarding existence checks of unique pairs or special cases like no-duplicates, but I did not find a general, fast solution. Time complexity of my approach below is O(n) best case but degrades to O(n^2) worst case (where input values are all identical).
Is there a way to bring this down to O(n*logN) worst case?
// output is a vector of pairs
using TVecPairs = vector<pair<size_t, size_t>>;

TVecPairs findPairs2(const vector<uint32_t>& input)
{
    // map key value -> vector of indices
    unordered_map<uint32_t, vector<size_t>> mapBuckets;
    // stick into groups of same value
    for (size_t idx = 0; idx < input.size(); ++idx) {
        // append index for given key value
        mapBuckets[input[idx]].emplace_back(idx);
    }
    // list of index pairs
    TVecPairs out;
    // for each group of same value
    for (const auto& kvp : mapBuckets) {
        const vector<size_t>& group = kvp.second;
        for (auto itor = cbegin(group); itor != cend(group); ++itor) {
            for (auto other = itor + 1; other != cend(group); ++other) {
                out.emplace_back(make_pair(*itor, *other));
            }
        }
    }
    return out;
}
As the others have said, it's O(n^2) if you want the output in the form you mentioned. If you can print it a different way, you can do it in O(n * (complexity of insertion/read from a hashmap)) = O(n*log(n)) in C++. Some Python code describing the above follows:
def dupes(arrlist):
    mydict = dict()
    # enumerate replaces the manual counter; dict.has_key is Python 2 only
    for count, x in enumerate(arrlist):
        if x in mydict:
            mydict[x].append(count)
        else:
            mydict[x] = [count]
    print(mydict)
    return
And for the above example:
>>> dupes([3, 6, 6, 6, 1, 3, 1])
{1: [4, 6], 3: [0, 5], 6: [1, 2, 3]}