Creating a subset of a set iteratively using vector - c++

So I want to create a subset, run that subset through code, then create a new subset. I'm using a vector for the set and subset. So far I have 3 nested for loops but I'm having trouble figuring out the variables I need.
Here's what I want to do. set = {0, 1, 2, 3, 4, 5} the value matches the index just to simplify this example. I now want subset = {} -> {0} -> {1} -> ... -> {0,1} -> {0,2} -> ... -> {0,5} -> {0,1,2} -> ... -> {0,4,5}. I'm having trouble representing the conditions in terms of variables.
Basically I want the first for loop to increase the subset size. from 0 to set.size() (this is easy). Within that loop, I want to have an iterator corresponding to the index in the element of the subset. I have this iterator initialized to subset.size(), so that we work with the last element first, then work our way to the first element in the subset. then the 3rd for loop, I want to iterate between possible values from the set. Let's say our current subset = {0,1,2} how do I let my program know to put the value '2' inside the last element of the subset, then 1 then 0?
I'm thinking it would involve something with taking the difference from set.size()-1 and subset.size()-1? But I'm not quite sure how. so then I want to iterate through until {0,1,5} and then {0,4,5} but again I'm not sure how to tell the program to stop at 4, as opposed to 5. Again I think this is something with difference but I can't quite figure it out.
To recap:
for loop to iterate through subset size
    for loop to iterate through subset "working" element, starting from back
        for loop to iterate through that index of subset,
        starting from the correct corresponding set value to ending
        at the correct corresponding set value
such that the subset goes from {} -> {0} -> {1} ->...-> {0,1} -> {4,5} -> {1,2,3} -> ... -> {1,4,5} and I dont actually need subset = {1,2,3,4,5} but it doesn't hurt my code if I can't stop before that. Again I'm looking to represent the start and end points as variables to make the inner loops work, but I can't figure it out. Huge thanks to anyone who can help me out.

This is approximately how I would go about it (n is the size of the set).
//handle null subset
for ( int size = 1; size < n; size++ ) {
    std::vector<int> indices(size);
    for ( int i = 0; i < size; i++ ) indices[i] = i;
    while ( indices[0] <= n - size ) {
        //print out elems using the indices in `indices`
        //find the last index that can still be incremented
        int i = size - 1;
        while ( i > 0 && indices[i] == n - size + i ) i--;
        indices[i]++;
        for ( i = i + 1; i < size; i++ ) indices[i] = indices[i-1] + 1;
    }
    //done with all subsets of size `size`
}
The outer loop should be pretty self explanatory. Including 0 seemed like it was going to make some of the inner logic annoying so I started at subsets of size 1.
indices holds the indices of the elements that should be included in the current subset. It starts with the indices 0 through size-1.
The condition for the while isn't exactly obvious. The last valid subset this generates contains the last size elements, so if the first index is past n - size we've gone too far.
The inside of the while loop handles one subset and then moves on to the next one. To advance, it looks for the last element that can be incremented and still give a valid subset, increments it, and then resets all of the subsequent elements to be as small as possible. How you print each subset is up to you.
And that should be close to something that will do what you want. Let me know if it needs clarifications or corrections.
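Putting the pieces above together, here is a minimal self-contained sketch of the same index-array idea. The function name subsetsBySize and the choice to collect the subsets into a vector (instead of printing them) are my own assumptions for illustration. Unlike the loop above, it also emits the empty subset and the full set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns every subset of `set`, ordered by subset size,
// and in lexicographic order of element positions within each size.
std::vector<std::vector<int>> subsetsBySize(const std::vector<int>& set) {
    const int n = static_cast<int>(set.size());
    std::vector<std::vector<int>> out;
    out.emplace_back();                      // the empty subset
    for (int size = 1; size <= n; ++size) {
        std::vector<int> idx(size);          // positions of the chosen elements
        for (int i = 0; i < size; ++i) idx[i] = i;
        while (true) {
            std::vector<int> subset;
            for (int i : idx) subset.push_back(set[i]);
            out.push_back(subset);
            // Find the last position that can still move right.
            // Position i may go no further than n - size + i.
            int i = size - 1;
            while (i >= 0 && idx[i] == n - size + i) --i;
            if (i < 0) break;                // idx is the last `size` elements
            ++idx[i];
            // Reset everything after it to be as small as possible.
            for (int j = i + 1; j < size; ++j) idx[j] = idx[j - 1] + 1;
        }
    }
    return out;
}
```

The upper bound n - size + i is the "difference" the question was looking for: it is the largest index that slot i can hold while leaving room for the slots after it.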

A trick to enumerate all subsets is to permute a "selection flag" array, each element of which indicates whether the corresponding element of the original array is selected.
The following is sample code:
#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;
void foo(const vector<int>& a)
{
    size_t size = a.size();
    // selection flag array
    // '1' indicates selected, '0' indicates unselected
    vector<int> f(size, 0);
    for (size_t i = 1; i <= size; i++)
    {
        // increase the count of selected elements
        f[i - 1] = 1;
        do
        {
            for (size_t j = 0; j < size; j++)
            {
                if (f[j])
                {
                    printf("%d\t", a[j]);
                }
            }
            printf("\n");
        } while (next_permutation(f.begin(), f.end(), [](int x, int y){ return x > y; }));
        // next_permutation steps through the arrangements of the flags,
        // i.e. '1 1 0 0' -> '1 0 1 0' -> '1 0 0 1' -> ... -> '0 0 1 1'(end)
    }
}

Related

Array-Sum Operation

I have written this code using a vector. Some test cases pass, but others fail with a timeout error.
The problem statement is:-
You have an identity permutation of N integers as an array initially. An identity permutation of N integers is [1,2,3,...N-1,N]. In this task, you have to perform M operations on the array and report the sum of the elements of the array after each operation.
The ith operation consists of an integer opi.
If the array contains opi, swap the first and last elements in the array.
Else, remove the last element of the array and push opi to the end of the array.
Input Format
The first line contains two space-separated integers N and M.
Then, M lines follow denoting the operations opi.
Constraints :
2<=N,M <= 10^5
1 <= op <= 5*10^5
Output Format
Print M lines, each containing a single integer denoting the answer to each of the M operations.
Sample Input 0
3 2
4
2
Sample Output 0
7
7
Explanation 0
Initially, the array is [1,2,3].
After the 1st operation, the array becomes[1,2,4] as opi = 4, as 4 is not present in the current array, we remove 3 and push 4 to the end of the array and hence, sum=7 .
After 2nd operation the array becomes [4,2,1] as opi = 2, as 2 is present in the current array, we swap 1 and 4 and hence, sum=7.
Here is my code:
#include <bits/stdc++.h>
using namespace std;
int main()
{
long int N,M,op,i,t=0;
vector<long int > g1;
cin>>N>>M;
if(N>=2 && M>=2) {
g1.reserve(N);
for(i = 1;i<=N;i++) {
g1.push_back(i);
}
while(M--) {
cin>>op;
auto it = find(g1.begin(), g1.end(), op);
if(it != (g1.end())) {
t = g1.front();
g1.front() = g1.back();
g1.back() = t;
cout<<accumulate(g1.begin(), g1.end(), 0);
cout<<endl;
}
else {
g1.back() = op;
cout<<accumulate(g1.begin(), g1.end(), 0);
cout<<endl;
}
}
}
return 0;
}
Please Suggest changes.
Looking carefully at the question, you will find that the operations only ever touch the first and last elements. So there is no need to keep the whole vector, much less recompute the sum each time. The sum of all elements except the first and last is (n+1)(n-2)/2, so only the first and last elements need to be tracked. The search can also be shortened: the array contains op exactly when 1 < op < n, or op == first element, or op == last element.
P.S. I am not sure it will work completely, but it is certainly faster.
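My guess: take N = 3, op = [4, 2].
The array is [1,2,3].
sum = ((N-2) * (N+1)) / 2 leaves out the first and last elements and gives the sum of the numbers between them.
We only need to play with the first and last elements, so each operation takes constant time and the whole run is linear in the number of operations.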
function performOperations(N, op) {
    let out = [];
    let first = 1, last = N;
    let sum = Math.ceil( ((N-2) * (N+1)) / 2);
    for(let i =0;i<op.length;i++){
        let not_between = !(op[i] >= 2 && op[i] <= N-1);
        if( first!= op[i] && last != op[i] && not_between) {
            last = op[i];
        }else {
            let t = first;
            first = last;
            last = t;
        }
        out.push(sum + first +last)
    }
    return out;
}
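Since the question is in C++, here is a rough C++ sketch of the same idea. The function name and signature are my own assumptions (the original judge reads from stdin), and long long is used so the sum cannot overflow for N up to 10^5.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns the array sum after each operation.
// Only the first and last elements are tracked; the middle 2..n-1 never changes.
std::vector<long long> performOperations(long long n, const std::vector<long long>& ops) {
    assert(n >= 2);
    long long first = 1, last = n;
    const long long middle = (n - 2) * (n + 1) / 2;   // sum of 2..n-1
    std::vector<long long> sums;
    sums.reserve(ops.size());
    for (long long op : ops) {
        const bool inMiddle = op >= 2 && op <= n - 1;
        if (inMiddle || op == first || op == last) {
            std::swap(first, last);                   // op is present: swap the ends
        } else {
            last = op;                                // op is absent: replace the last element
        }
        sums.push_back(middle + first + last);
    }
    return sums;
}
```

For the sample input (N = 3, operations 4 and 2) this gives 7 and 7, matching the expected output. Each operation is a handful of comparisons, so the std::find and std::accumulate calls that caused the timeout are gone.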

Algorithm that can create all combinations and all groups of those combinations

Let's say I have a set of elements S = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }
I would like to create combinations of 3 and group them in a way such that no number appears in more than one combination.
Here is an example:
{ {3, 7, 9}, {1, 2, 4}, {5, 6, 8} }
The order of the numbers in the groups does not matter, nor does the order of the groups in the entire example.
In short, I want every possible group combination from every possible combination in the original set, excluding the ones that have a number appearing in multiple groups.
My question: is this actually feasible in terms of run time and memory? My sample sizes could be somewhere around 30-50 numbers.
If so, what is the best way to create this algorithm? Would it be best to create all possible combinations, and choose the groups only if the number hasn't already appeared?
I'm writing this in Qt 5.6, which is a C++ based framework.
You can do this recursively, and avoid duplicates, if you keep the first element fixed in each recursion, and only make groups of 3 with the values in order, eg:
{1,2,3,4,5,6,7,8,9}
Put the lowest element in the first spot (a), and keep it there:
{a,b,c} = {1, *, *}
For the second spot (b), iterate over every value from the second-lowest to the second-highest:
{a,b,c} = {1, 2~8, *}
For the third spot (c), iterate over every value higher than the second value:
{1, 2~8, b+1~9}
Then recurse with the rest of the values.
{1,2,3} {4,5,6} {7,8,9}
{1,2,3} {4,5,7} {6,8,9}
{1,2,3} {4,5,8} {6,7,9}
{1,2,3} {4,5,9} {6,7,8}
{1,2,3} {4,6,7} {5,8,9}
{1,2,3} {4,6,8} {5,7,9}
{1,2,3} {4,6,9} {5,7,8}
{1,2,3} {4,7,8} {5,6,9}
{1,2,3} {4,7,9} {5,6,8}
{1,2,3} {4,8,9} {5,6,7}
{1,2,4} {3,5,6} {7,8,9}
...
{1,8,9} {2,6,7} {3,4,5}
When I say "in order", that doesn't have to be any specific order (numerical, alphabetical...); it can just be the original order of the input. You can avoid having to re-sort the input of each recursion if you make sure to pass the rest of the values on to the next recursion in the order you received them.
A run-through of the recursion:
Let's say you get the input {1,2,3,4,5,6,7,8,9}. As the first element in the group, you take the first element from the input, and for the other two elements, you iterate over the other values:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{1,2,8}
{1,2,9}
{1,3,4}
{1,3,5}
{1,3,6}
...
{1,8,9}
making sure the third element always comes after the second element, to avoid duplicates like:
{1,3,5} ⇆ {1,5,3}
Now, let's say that at a certain point, you've selected this as the first group:
{1,3,7}
You then pass the rest of the values onto the next recursion:
{2,4,5,6,8,9}
In this recursion, you apply the same rules as for the first group: take the first element as the first element in the group and keep it there, and iterate over the other values for the second and third element:
{2,4,5}
{2,4,6}
{2,4,8}
{2,4,9}
{2,5,6}
{2,5,8}
{2,5,9}
{2,6,7}
...
{2,8,9}
Now, let's say that at a certain point, you've selected this as the second group:
{2,5,6}
You then pass the rest of the values onto the next recursion:
{4,8,9}
And since this is the last group, there is only one possibility, and so this particular recursion would end in the combination:
{1,3,7} {2,5,6} {4,8,9}
As you see, you don't have to sort the values at any point, as long as you pass them on to the next recursion in the order you received them. So if you receive e.g.:
{q,w,e,r,t,y,u,i,o}
and you select from this the group:
{q,r,u}
then you should pass on:
{w,e,t,y,i,o}
Here's a JavaScript snippet which demonstrates the method; it returns a 3D array with combinations of groups of elements.
(The filter function creates a copy of the input array, with elements 0, i and j removed.)
function clone2D(array) {
var clone = [];
for (var i = 0; i < array.length; i++) clone.push(array[i].slice());
return clone;
}
function groupThree(input) {
var result = [], combination = [];
group(input, 0);
return result;
function group(input, step) {
combination[step] = [input[0]];
for (var i = 1; i < input.length - 1; i++) {
combination[step][1] = input[i];
for (var j = i + 1; j < input.length; j++) {
combination[step][2] = input[j];
if (input.length > 3) {
var rest = input.filter(function(elem, index) {
return index && index != i && index != j;
});
group(rest, step + 1);
}
else result.push(clone2D(combination));
}
}
}
}
var result = groupThree([1,2,3,4,5,6,7,8,9]);
for (var r in result) document.write(JSON.stringify(result[r]) + "<br>");
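Since the question mentions C++ (Qt), the same recursion could be sketched in C++ roughly as follows. The type aliases and function names are my own, and the sketch assumes the input size is a multiple of 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Group = std::vector<int>;        // one group of three values
using Grouping = std::vector<Group>;   // one complete split of the input

// Recursive step: the first remaining value is fixed as the first member of
// the next group; i and j pick the other two members with i < j.
void groupThreeRec(const std::vector<int>& input, Grouping& current,
                   std::vector<Grouping>& result) {
    if (input.empty()) {               // every value has been placed
        result.push_back(current);
        return;
    }
    for (std::size_t i = 1; i + 1 < input.size(); ++i) {
        for (std::size_t j = i + 1; j < input.size(); ++j) {
            std::vector<int> rest;     // remaining values, original order kept
            for (std::size_t k = 1; k < input.size(); ++k)
                if (k != i && k != j) rest.push_back(input[k]);
            current.push_back({input[0], input[i], input[j]});
            groupThreeRec(rest, current, result);
            current.pop_back();
        }
    }
}

std::vector<Grouping> groupThree(const std::vector<int>& input) {
    assert(input.size() % 3 == 0);     // assumption of this sketch
    std::vector<Grouping> result;
    Grouping current;
    groupThreeRec(input, current, result);
    return result;
}
```

For the 9-element example this produces 280 groupings, which is 9! / (3!^3 * 3!). The same formula gives roughly 1.2 * 10^18 groupings for 30 elements, so storing or even visiting all of them at the sizes mentioned in the question does not look feasible; the recursion is only practical for small inputs.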
For n things taken 3 at a time, you could use 3 nested loops:
for(k = 0; k < n-2; k++){
for(j = k+1; j < n-1; j++){
for(i = j+1; i < n ; i++){
... S[k] ... S[j] ... S[i]
}
}
}
For a generic solution of n things taken k at a time, you could use an array of k counters.
I think you can solve it by treating it as the coin change problem with dynamic programming: assume you are looking for change of 3 and every index in the array is a coin of value 1, then output the coins (the values in your array) that have been found.
Link: https://www.youtube.com/watch?v=18NVyOI_690

Maintain a sorted array in O(1)?

We have a sorted array and we would like to increase the value of one index by only 1 unit (array[i]++), such that the resulting array is still sorted. Is this possible in O(1)?
It is fine to use any data structure possible in STL and C++.
In a more specific case, if the array is initialised by all 0 values, and it is always incrementally constructed only by increasing a value of an index by one, is there an O(1) solution?
I haven't worked this out completely, but I think the general idea might help for integers at least. At the cost of more memory, you can maintain a separate data-structure that maintains the ending index of a run of repeated values (since you want to swap your incremented value with the ending index of the repeated value). This is because it's with repeated values that you run into the worst case O(n) runtime: let's say you have [0, 0, 0, 0] and you increment the value at location 0. Then it is O(n) to find out the last location (3).
But let's say that you maintain the data-structure I mentioned (a hash map would work because it has expected O(1) lookup). In that case you would have something like this:
0 -> 3
So you have a run of 0 values that end at location 3. When you increment a value, let's say at location i, you check to see if the new value is greater than the value at i + 1. If it is not, you are fine. But if it is, you look to see if there is an entry for this value in the secondary data-structure. If there isn't, you can simply swap. If there is an entry, you look up the ending-index and then swap with the value at that location. You then make any changes you need to the secondary data-structure to reflect the new state of the array.
A more thorough example:
[0, 2, 3, 3, 3, 4, 4, 5, 5, 5, 7]
The secondary data-structure is:
3 -> 4
4 -> 6
5 -> 9
Let's say you increment the value at location 2. So you have incremented 3, to 4. The array now looks like this:
[0, 2, 4, 3, 3, 4, 4, 5, 5, 5, 7]
You look at the next element, which is 3. You then look up the entry for that element in the secondary data-structure. The entry is 4, which means that there is a run of 3's that end at 4. This means that you can swap the value from the current location with the value at index 4:
[0, 2, 3, 3, 4, 4, 4, 5, 5, 5, 7]
Now you will also need to update the secondary data-structure. Specifically, the run of 3's now ends one index earlier, so you need to decrement that value:
3 -> 3
4 -> 6
5 -> 9
Another check you will need to do is to see if the value is still repeated. You can check that by looking at locations i - 1 and i + 1 to see if they hold the same value as the one in question. If neither does, then you can remove the entry for this value from the map.
Again, this is just a general idea. I will have to code it out to see if it works out the way I thought about it.
Please feel free to poke holes.
UPDATE
I have an implementation of this algorithm here in JavaScript. I used JavaScript just so I could do it quickly. Also, because I coded it up pretty quickly it can probably be cleaned up. I do have comments though. I'm not doing anything esoteric either, so this should be easily portable to C++.
There are essentially two parts to the algorithm: the incrementing and swapping (if necessary), and book-keeping done on the map that keeps track of our ending indices for runs of repeated values.
The code contains a testing harness that starts with an array of zeroes and increments random locations. At the end of every iteration, there is a test to ensure that the array is sorted.
var array = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
var endingIndices = {0: 9};
var increments = 10000;
for(var i = 0; i < increments; i++) {
var index = Math.floor(Math.random() * array.length);
var oldValue = array[index];
var newValue = ++array[index];
if(index == (array.length - 1)) {
//Incremented element is the last element.
//We don't need to swap, but we need to see if we modified a run (if one exists)
if(endingIndices[oldValue]) {
endingIndices[oldValue]--;
}
} else if(index >= 0) {
//Incremented element is not the last element; it is in the middle of
//the array, possibly even the first element
var nextIndexValue = array[index + 1];
if(newValue === nextIndexValue) {
//If the new value is the same as the next value, we don't need to swap anything. But
//we are doing some book-keeping later with the endingIndices map. That code requires
//the ending index (i.e., where we moved the incremented value to). Since we didn't
//move it anywhere, the endingIndex is simply the index of the incremented element.
endingIndex = index;
} else if(newValue > nextIndexValue) {
//If the new value is greater than the next value, we will have to swap it
var swapIndex = -1;
if(!endingIndices[nextIndexValue]) {
//If the next value doesn't have a run, then location we have to swap with
//is just the next index
swapIndex = index + 1;
} else {
//If the next value has a run, we get the swap index from the map
swapIndex = endingIndices[nextIndexValue];
}
array[index] = nextIndexValue;
array[swapIndex] = newValue;
endingIndex = swapIndex;
} else {
//If the next value is already greater, there is nothing we need to swap but we do
//need to do some book-keeping with the endingIndices map later, because it is
//possible that we modified a run (the value might be the same as the value that
//came before it). Since we don't have anything to swap, the endingIndex is
//effectively the index that we are incrementing.
endingIndex = index;
}
//Moving the new value to its new position may have created a new run, so we need to
//check for that. This will only happen if the new position is not at the end of
//the array, and the new value does not have an entry in the map, and the value
//at the position after the new position is the same as the new value
if(endingIndex < (array.length - 1) &&
!endingIndices[newValue] &&
array[endingIndex + 1] == newValue) {
endingIndices[newValue] = endingIndex + 1;
}
//We also need to check to see if the old value had an entry in the
//map because now that run has been shortened by one.
if(endingIndices[oldValue]) {
var newEndingIndex = --endingIndices[oldValue];
if(newEndingIndex == 0 ||
(newEndingIndex > 0 && array[newEndingIndex - 1] != oldValue)) {
//In this case we check to see if the old value only has one entry, in
//which case there is no run of values and so we will need to remove
//its entry from the map. This happens when the new ending-index for this
//value is the first location (0) or if the location before the new
//ending-index doesn't contain the old value.
delete endingIndices[oldValue];
}
}
}
//Make sure that the array is sorted
for(var j = 0; j < array.length - 1; j++) {
if(array[j] > array[j + 1]) {
throw "Array not sorted; Value at location " + j + "(" + array[j] + ") is greater than value at location " + (j + 1) + "(" + array[j + 1] + ")";
}
}
}
In a more specific case, if the array is initialised by all 0 values, and it is always incrementally constructed only by increasing a value of an index by one, is there an O(1) solution?
No. Given an array of all 0's: [0, 0, 0, 0, 0]. If you increment the first value, giving [1, 0, 0, 0, 0], then you will have to make 4 swaps to ensure that it remains sorted.
Given a sorted array with no duplicates, then the answer is yes. But after the first operation (i.e. the first time you increment), then you could potentially have duplicates. The more increments you do, the higher the likelihood is that you'll have duplicates, and the more likely it'll take O(n) to keep that array sorted.
If all you have is the array, it's impossible to guarantee less than O(n) time per increment. If what you're looking for is a data structure that supports sorted order and lookup by index, then you probably want an order statistic tree.
If the values are small, counting sort will work. Represent the array [0,0,0,0] as {4}. Incrementing any zero gives {3,1} : 3 zeroes and a one. In general, to increment any value x, deduct one from the count of x and increment the count of {x+1}. The space efficiency is O(N), though, where N is the highest value.
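A minimal sketch of that counting representation, assuming non-negative values that start at zero. The class name and its methods are my own invention for illustration. Note that it only tracks how many elements hold each value, so it cannot tell you which original index holds what.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class: counts_[x] is the number of elements whose value is x.
class CountedValues {
    std::vector<std::size_t> counts_;
public:
    // n elements, all starting at value 0, i.e. {n}
    explicit CountedValues(std::size_t n) : counts_(1, n) {}

    // Increment one element that currently has value x.
    void increment(std::size_t x) {
        assert(x < counts_.size() && counts_[x] > 0);
        if (x + 1 == counts_.size()) counts_.push_back(0);  // amortized O(1)
        --counts_[x];
        ++counts_[x + 1];
    }

    const std::vector<std::size_t>& counts() const { return counts_; }

    // Expand the counts back into the sorted array (O(n), for inspection only).
    std::vector<int> sorted() const {
        std::vector<int> out;
        for (std::size_t x = 0; x < counts_.size(); ++x)
            out.insert(out.end(), counts_[x], static_cast<int>(x));
        return out;
    }
};
```

Starting from CountedValues c(4), a single c.increment(0) turns the counts {4} into {3,1}, as in the example above.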
It depends on how many items can have the same value. If more items can have the same value, then it is not possible to have O(1) with ordinary arrays.
Let's do an example: suppose array[5] = 21, and you want to do array[5]++:
Increment the item:
array[5]++
(which is O(1) because it is an array).
So, now array[5] = 22.
Check the next item (i.e., array[6]):
If array[6] == 21, then you have to keep checking new items (i.e., array[7] and so on) until you find a value higher than 21. At that point you can swap the values. This search is not O(1) because potentially you have to scan the whole array.
Instead, if items cannot have the same value, then you have:
Increment the item:
array[5]++
(which is O(1) because it is an array).
So, now array[5] = 22.
The next item cannot be 21 (because two items cannot have the same value), so it must have a value > 21 and the array is already sorted.
So you take the sorted array and a hashtable. You go over the array to figure out the 'flat' areas, where elements have the same value. For every flat area you have to figure out three things: 1) where it starts (index of the first element), 2) what its value is, 3) what the value of the next element is (the next bigger one). Then put this tuple into the hashtable, where the key is the element value. This is a prerequisite and its complexity doesn't really matter.
Then when you increase some element (index i), you look up in the table the index of the next bigger element (call it j), and swap i with j - 1. Then 1) add a new entry to the hashtable, 2) update the existing entry for its previous value.
With a perfect hashtable (or a limited range of possible values) it will be almost O(1). The downside: it will not be stable.
Here is some code:
#include <iostream>
#include <unordered_map>
#include <vector>
struct Range {
int start, value, next;
};
void print_ht(std::unordered_map<int, Range>& ht)
{
for (auto i = ht.begin(); i != ht.end(); i++) {
Range& r = (*i).second;
std::cout << '(' << r.start << ", "<< r.value << ", "<< r.next << ") ";
}
std::cout << std::endl;
}
void increment_el(int i, std::vector<int>& array, std::unordered_map<int, Range>& ht)
{
int val = array[i];
array[i]++;
//Pick next bigger element
Range& r = ht[val];
//Do the swapping, so last element of that range will be first
std::swap(array[i], array[ht[r.next].start - 1]);
//Update hashtable
ht[r.next].start--;
}
int main(int argc, const char * argv[])
{
std::vector<int> array = {1, 1, 1, 2, 2, 3};
std::unordered_map<int, Range> ht;
int start = 0;
int value = array[0];
//Build indexing hashtable
for (int i = 0; i <= array.size(); i++) {
int cur_value = i < array.size() ? array[i] : -1;
if (cur_value > value || i == array.size()) {
ht[value] = {start, value, cur_value};
start = i;
value = cur_value;
}
}
print_ht(ht);
//Now let's increment first element
increment_el(0, array, ht);
print_ht(ht);
increment_el(3, array, ht);
print_ht(ht);
for (auto i = array.begin(); i != array.end(); i++)
std::cout << *i << " ";
return 0;
}
Yes and no.
Yes if the list contains only unique integers, as that means you only need to check the next value. No in any other situation. If the values are not unique, incrementing the first of N duplicate values means that it must move N positions. If the values are floating-point, you may have thousands of values between x and x+1
It's important to be very clear about the requirements; the simplest way is to express the problem as an ADT (Abstract Datatype), listing the required operations and complexities.
Here's what I think you are looking for: a datatype which provides the following operations:
Construct(n): Create a new object of size n all of whose values are 0.
Value(i): Return the value at index i.
Increment(i): Increment the value at index i.
Least(): Return the index of the element with least value (or one such element if there are several).
Next(i): Return the index of the next element after element i in a sorted traversal starting at Least(), such that the traversal will return every element.
Aside from the Constructor, we want every one of the above operations to have complexity O(1). We also want the object to occupy O(n) space.
The implementation uses a list of buckets; each bucket has a value and a list of elements. Each element has an index, a pointer to the bucket it is part of. Finally, we have an array of pointers to elements. (In C++, I'd probably use iterators rather than pointers; in another language, I'd probably use intrusive lists.) The invariants are that no bucket is ever empty, and the value of the buckets are strictly monotonically increasing.
We start with a single bucket with value 0 which has a list of n elements.
Value(i) is implemented by returning the value of the bucket of the element referenced by the iterator at element i of the array. Least() is the index of the first element in the first bucket. Next(i) is the index of the next element after the one referenced by the iterator at element i, unless that iterator is already pointing at the end of the list, in which case it is the first element in the next bucket, unless the element's bucket is the last bucket, in which case we're at the end of the element list.
The only interface of interest is Increment(i), which is as follows:
If element i is the only element in its bucket (i.e. there is no next element in the bucket list, and element i is the first element in the bucket list):
Increment the value of the associated bucket.
If the next bucket has the same value, append the next bucket's element list to this bucket's element list (this is O(1), regardless of the list's size, because it is just a pointer swap), and then delete the next bucket.
If element i is not the only element in its bucket, then:
Remove it from its bucket list.
If the next bucket has the next sequential value, then push element i onto the next bucket's list.
Otherwise, the next bucket's value is larger, then create a new bucket with the next sequential value and only element i and insert it between this bucket and the next one.
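The steps above could be sketched with std::list roughly as follows. This is my own illustration, not the answerer's code, and the class and member names are assumptions. It also deviates in one place: when element i is alone in its bucket, the sketch moves it to the following bucket (creating one if needed) and erases the emptied bucket, instead of bumping the bucket's value and merging lists. That way no other element's bucket reference ever has to be updated.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

class BucketCounter {
    struct Bucket {
        int value;
        std::list<int> elems;              // indices of elements with this value
    };
    using BucketIt = std::list<Bucket>::iterator;
    struct Slot {
        BucketIt bucket;                   // bucket the element lives in
        std::list<int>::iterator pos;      // its node inside that bucket's list
    };
    std::list<Bucket> buckets_;            // values strictly increasing
    std::vector<Slot> slots_;              // slots_[i] describes element i
public:
    explicit BucketCounter(int n) : slots_(n) {
        buckets_.push_back(Bucket{0, {}});
        BucketIt b = buckets_.begin();
        for (int i = 0; i < n; ++i) {
            b->elems.push_back(i);
            slots_[i] = Slot{b, std::prev(b->elems.end())};
        }
    }

    int value(int i) const { return slots_[i].bucket->value; }

    void increment(int i) {
        assert(i >= 0 && static_cast<std::size_t>(i) < slots_.size());
        BucketIt cur = slots_[i].bucket;
        BucketIt next = std::next(cur);
        const int target = cur->value + 1;
        // Create the bucket for `target` if the following bucket is not it.
        if (next == buckets_.end() || next->value != target)
            next = buckets_.insert(next, Bucket{target, {}});
        // splice relinks the node in O(1); slots_[i].pos stays valid.
        next->elems.splice(next->elems.end(), cur->elems, slots_[i].pos);
        slots_[i].bucket = next;
        if (cur->elems.empty()) buckets_.erase(cur);   // keep buckets non-empty
    }

    // Values in sorted order (O(n), for inspection only).
    std::vector<int> sortedValues() const {
        std::vector<int> out;
        for (const Bucket& b : buckets_)
            out.insert(out.end(), b.elems.size(), b.value);
        return out;
    }
};
```

Every step of increment is a constant number of list operations, and list iterators stay valid across insert, erase of other nodes, and splice, which is what makes storing them in slots_ safe.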
Just iterate along the array from the modified element until you find the correct place, then swap. Average-case complexity is O(N), where N is the average number of duplicates. Worst case is O(n), where n is the length of the array. As long as N isn't large and doesn't scale badly with n, you're fine and can probably pretend it's O(1) for practical purposes.
If duplicates are the norm and/or scale strongly with n, then there are better solutions, see other responses.
I think that it is possible without using a hashtable. I have an implementation here:
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cassert>
// This code is a solution for http://stackoverflow.com/questions/19957753/maintain-a-sorted-array-in-o1
//
// """We have a sorted array and we would like to increase the value of one index by only 1 unit
// (array[i]++), such that the resulting array is still sorted. Is this possible in O(1)?"""
// The obvious implementation, which has O(n) worst case increment.
class LinearIncrementor
{
public:
LinearIncrementor(int numElems);
int valueAt(int index) const;
void incrementAt(int index);
private:
std::vector<int> m_values;
};
// Free list to store runs of same values
class RunList
{
public:
struct Run
{
int m_end; // end index of run, inclusive, or next object in free list
int m_value; // value at this run
};
RunList();
int allocateRun(int endIndex, int value);
void freeRun(int index);
Run& runAt(int index);
const Run& runAt(int index) const;
private:
std::vector<Run> m_runs;
int m_firstFree;
};
// More optimal implementation, which increments in O(1) time
class ConstantIncrementor
{
public:
ConstantIncrementor(int numElems);
int valueAt(int index) const;
void incrementAt(int index);
private:
std::vector<int> m_runIndices;
RunList m_runs;
};
LinearIncrementor::LinearIncrementor(int numElems)
: m_values(numElems, 0)
{
}
int LinearIncrementor::valueAt(int index) const
{
return m_values[index];
}
void LinearIncrementor::incrementAt(int index)
{
const int n = static_cast<int>(m_values.size());
const int value = m_values[index];
while (index+1 < n && value == m_values[index+1])
++index;
++m_values[index];
}
RunList::RunList() : m_firstFree(-1)
{
}
int RunList::allocateRun(int endIndex, int value)
{
int runIndex = -1;
if (m_firstFree == -1)
{
runIndex = static_cast<int>(m_runs.size());
m_runs.resize(runIndex + 1);
}
else
{
runIndex = m_firstFree;
m_firstFree = m_runs[runIndex].m_end;
}
Run& run = m_runs[runIndex];
run.m_end = endIndex;
run.m_value = value;
return runIndex;
}
void RunList::freeRun(int index)
{
m_runs[index].m_end = m_firstFree;
m_firstFree = index;
}
RunList::Run& RunList::runAt(int index)
{
return m_runs[index];
}
const RunList::Run& RunList::runAt(int index) const
{
return m_runs[index];
}
ConstantIncrementor::ConstantIncrementor(int numElems) : m_runIndices(numElems, 0)
{
const int runIndex = m_runs.allocateRun(numElems-1, 0);
assert(runIndex == 0);
}
int ConstantIncrementor::valueAt(int index) const
{
return m_runs.runAt(m_runIndices[index]).m_value;
}
void ConstantIncrementor::incrementAt(int index)
{
const int numElems = static_cast<int>(m_runIndices.size());
const int curRunIndex = m_runIndices[index];
RunList::Run& curRun = m_runs.runAt(curRunIndex);
index = curRun.m_end;
const bool freeCurRun = index == 0 || m_runIndices[index-1] != curRunIndex;
RunList::Run* runToMerge = NULL;
int runToMergeIndex = -1;
if (curRun.m_end+1 < numElems)
{
const int nextRunIndex = m_runIndices[curRun.m_end+1];
RunList::Run& nextRun = m_runs.runAt(nextRunIndex);
if (curRun.m_value+1 == nextRun.m_value)
{
runToMerge = &nextRun;
runToMergeIndex = nextRunIndex;
}
}
if (freeCurRun && !runToMerge) // then free and allocate at the same time
{
++curRun.m_value;
}
else
{
if (freeCurRun)
{
m_runs.freeRun(curRunIndex);
}
else
{
--curRun.m_end;
}
if (runToMerge)
{
m_runIndices[index] = runToMergeIndex;
}
else
{
m_runIndices[index] = m_runs.allocateRun(index, curRun.m_value+1);
}
}
}
int main(int argc, char* argv[])
{
const int numElems = 100;
const int numInc = 1000000;
LinearIncrementor linearInc(numElems);
ConstantIncrementor constInc(numElems);
srand(1);
for (int i = 0; i < numInc; ++i)
{
const int index = rand() % numElems;
linearInc.incrementAt(index);
constInc.incrementAt(index);
for (int j = 0; j < numElems; ++j)
{
if (linearInc.valueAt(j) != constInc.valueAt(j))
{
printf("Error: differing values at increment step %d, value at index %d\n", i, j);
}
}
}
return 0;
}
As a complement to the other answers: if you can only have the array, then you cannot indeed guarantee the operation will be constant-time; but because the array is sorted, you can find the end of a run of identical numbers in log n operations, not in n operations. This is simply a binary search.
If we expect most runs of numbers to be short, we should use galloping search, which is a variant where we first find the bounds by looking at positions +1, +2, +4, +8, +16, etc. and then doing binary search inside. You would get a time that is often constant (and extremely fast if the item is unique) but can grow up to log n. Unless for some reason long runs of identical numbers remain common even after many updates, this might outperform any solution that requires keeping additional data.
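A rough sketch of that galloping search on a plain sorted std::vector<int>. The function names are my own. The probes here land at offsets +1, +3, +7, ... because each probe starts from the last confirmed position, which is a small variation on the +1, +2, +4 pattern described above with the same logarithmic bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the last element equal to a[i]. `a` must be sorted ascending.
std::size_t runEnd(const std::vector<int>& a, std::size_t i) {
    assert(i < a.size());
    const int v = a[i];
    std::size_t lo = i;                 // invariant: a[lo] == v
    std::size_t step = 1;
    // Gallop: keep doubling the step while the probe is still inside the run.
    while (lo + step < a.size() && a[lo + step] == v) {
        lo += step;
        step *= 2;
    }
    // The run ends somewhere in [lo, hi); finish with a binary search.
    const std::size_t hi = std::min(lo + step, a.size());
    auto it = std::upper_bound(a.begin() + lo + 1, a.begin() + hi, v);
    return static_cast<std::size_t>(it - a.begin()) - 1;
}

// Increments one element equal to a[i] while keeping `a` sorted.
// Returns the index that was actually incremented.
std::size_t incrementSorted(std::vector<int>& a, std::size_t i) {
    const std::size_t j = runEnd(a, i);
    ++a[j];                             // a[j + 1], if any, was already > a[i]
    return j;
}
```

Note that the element that ends up incremented is the last one of the run, not necessarily index i. For integer values that yields the same sorted array as incrementing index i and then moving it into place.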

Using an array and moving duplicates to end

I got this question at an interview and at the end was told there was a more efficient way to do this, but I still have not been able to figure it out. You pass a function an array of integers and an integer for the size of the array. The array holds a lot of numbers, some of which repeat, for example 1,7,4,8,2,6,8,3,7,9,10. You want to take that array and return an array where all the repeated numbers are put at the end, so the above array would turn into 1,7,4,8,2,6,3,9,10,8,7. The numbers I used are not important, and I could not use a buffer array. I was going to use a BST, but the order of the numbers must be maintained (except for the duplicate numbers). I could not figure out how to use a hash table, so I ended up using a double for loop (n^2, horrible, I know). How would I do this more efficiently using C++? Not looking for code, just an idea of how to do it better.
In what follows:
arr is the input array;
seen is a hash set of numbers already encountered;
l is the index where the next unique element will be placed;
r is the index of the next element to be considered.
Since you're not looking for code, here is a pseudo-code solution (which happens to be valid Python):
arr = [1,7,4,8,2,6,8,3,7,9,10]
seen = set()
l = 0
r = 0
while True:
    # advance `r` to the next not-yet-seen number
    while r < len(arr) and arr[r] in seen:
        r += 1
    if r == len(arr): break
    # add the number to the set
    seen.add(arr[r])
    # swap arr[l] with arr[r]
    arr[l], arr[r] = arr[r], arr[l]
    # advance `l`
    l += 1
print(arr)
On your test case, this produces
[1, 7, 4, 8, 2, 6, 3, 9, 10, 8, 7]
I would use an additional map, where the key is the integer value from the array and the value is a counter set to 0 in the beginning. Now I would go through the array and increase the counter in the map for each element I encounter.
In the end I would go through the array again. When the integer from the array has a count of one in the map, I would not change anything. When it has a count of 2 or more in the map, I would swap the integer from the array with the last one.
This should result in a runtime of O(n*log(n))
The way I would do this would be to create an array twice the size of the original and create a set of integers.
Then loop through the original array and try to add each element to the set: if it already exists, add it to the 2nd half of the new array; otherwise add it to the first half of the new array.
In the end you would get an array that looks like: (using your example)
1,7,4,8,2,6,3,9,10,-,-,8,7,-,-,-,-,-,-,-,-,-
Then I would loop through the original array again and set each spot to the next non-null entry of the new array (or non-zero, or whatever empty marker you decided on).
That would make the original array turn into your solution...
This ends up being O(n), which is about as efficient as I can think of.
Edit: since you cannot use another array, when you find a value that is already in the set you can move every value after it forward by one and set the last value equal to the number you just checked. This would in effect do the same thing, but with a lot more operations.
I have been out of touch for a while, but I'd probably start out with something like this and see how it scales with larger input. I know you didn't ask for code but in some cases it's easier to understand than an explanation.
Edit: Sorry I missed the requirement that you cannot use a buffer array.
// returns new vector with dupes at the end
std::vector<int> move_dupes_to_end(std::vector<int> input)
{
std::set<int> counter;
std::vector<int> result;
std::vector<int> repeats;
for (std::vector<int>::iterator i = input.begin(); i < input.end(); i++)
{
if (counter.find(*i) == counter.end())
result.push_back(*i);
else
repeats.push_back(*i);
counter.insert(*i);
}
result.insert(result.end(), repeats.begin(), repeats.end());
return result;
}
#include <algorithm>
T * array = [your array];
size_t size = [array size];
// Complexity:
sort( array, array + size ); // n * log(n) and could be threaded
// (if merge sort)
T * last = unique( array, array + size ); // n, but the elements after the last
// unique element are not defined
Check sort and unique.
#include <unordered_set>
#include <utility> // std::swap
void remove_dup(int* data, int count) {
    int* L=data; //place to put next unique number
    std::unordered_set<int> found(count); //keep track of what's been seen
    for(int* cur=data; cur<data+count; ++cur) { //look at every element exactly once
        if(found.insert(*cur).second) //if this is the first time we've seen it
            std::swap(*cur,*L++); //put it next in the unique list
        //otherwise leave it behind: repeats collect between L and cur
    }
}
http://ideone.com/3choA
Not that I would turn in code this poorly commented. Also note that unordered_set probably uses its own array internally, bigger than data. (This has been rewritten based on aix's answer, to be much faster)
If you know the bounds on what the integer values are, B, and the size of the integer array, SZ, then you can do something like the following:
Create an array of booleans seen_before with B elements, initialized to 0.
Create a result array result of integers with SZ elements.
Create two integers, one for front_pos = 0, one for back_pos = SZ - 1.
Iterate across the original list:
Set an integer variable val to the value of the current element
If seen_before[val] is set to 1, put the number at result[back_pos] then decrement back_pos
If seen_before[val] is not set to 1, put the number at result[front_pos] then increment front_pos and set seen_before[val] to 1.
Once you finish iterating across the main list, all the unique numbers will be at the front of the list while the duplicate numbers will be at the back. Fun part is that the entire process is done in one pass. Note that this only works if you know the bounds of the values appearing in the original array.
Edit: It was pointed out that there are no bounds on the integers used, so instead of initializing seen_before as an array with B elements, initialize it as a map<int, bool>, then continue as usual. That should get you n*log(n) performance.
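The bounded version of the steps above could be sketched roughly as follows. This is only an illustration under the answer's own assumptions: every value lies in [0, bound), and a separate result array is allowed (which the question actually forbids). The name `partition_duplicates` is invented here, and `back_pos` is kept as "one past the next free back slot" rather than `SZ - 1` to avoid unsigned underflow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: uniques go to the front of `result` in input order,
// repeats are written from the back, so they end up in reverse order.
// Assumes every value in `input` lies in [0, bound).
std::vector<int> partition_duplicates(const std::vector<int>& input, int bound) {
    assert(bound > 0);
    std::vector<bool> seen_before(static_cast<std::size_t>(bound), false);
    std::vector<int> result(input.size());
    std::size_t front_pos = 0;
    std::size_t back_pos = input.size();  // one past the next free back slot

    for (int val : input) {
        assert(val >= 0 && val < bound);
        if (seen_before[val]) {
            result[--back_pos] = val;      // repeat: fill from the back
        } else {
            result[front_pos++] = val;     // first occurrence: fill from the front
            seen_before[val] = true;
        }
    }
    return result;
}
```

On the question's input with `bound = 11` this gives 1,7,4,8,2,6,3,9,10,7,8: the uniques keep their order, while the repeats at the back appear in reverse order of discovery.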
This can be done by iterating over the array and marking the index of the first duplicate.
Later on, swap the value at the marked index with the next unique value,
and then increment the marked index for the next swap.
Java Implementation:
public static void solve() {
Integer[] arr = new Integer[] { 1, 7, 4, 8, 2, 6, 8, 3, 7, 9, 10 };
final HashSet<Integer> seen = new HashSet<Integer>();
int l = -1;
for (int i = 0; i < arr.length; i++) {
if (seen.contains(arr[i])) {
if (l == -1) {
l = i;
}
continue;
}
// record the value before the swap moves it away from arr[i]
seen.add(arr[i]);
if (l > -1) {
final int temp = arr[i];
arr[i] = arr[l];
arr[l] = temp;
l++;
}
}
}
output is 1 7 4 8 2 6 3 9 10 8 7
It's ugly, but it meets the requirements of moving the duplicates to the end in place (no buffer array)
// warning, some light C++11
void dup2end(int* arr, size_t cnt)
{
std::set<int> k;
auto end = arr + cnt-1;
auto max = arr + cnt;
auto curr = arr;
while(curr < max)
{
auto res = k.insert(*curr);
// first time encountered
if(res.second)
{
++curr;
}
else
{
// duplicate:
std::swap(*curr, *end);
--end;
--max;
}
}
}
void move_duplicates_to_end(vector<int> &A) {
if(A.empty()) return;
int i = 0, tail = A.size()-1;
while(i <= tail) {
bool is_first = true; // whether the current number appears for the first time
for(int k=0; k<i; k++) { // always compare with numbers before A[i]
if(A[k] == A[i]) {
is_first = false;
break;
}
}
if(is_first == true) i++;
else {
int tmp = A[i]; // swap with tail
A[i] = A[tail];
A[tail] = tmp;
tail--;
}
}
}
If the input array is {1,7,4,8,2,6,8,3,7,9,10}, then the output is {1,7,4,8,2,6,10,3,9,7,8}. Comparing with your answer {1,7,4,8,2,6,3,9,10,8,7}, the first half is the same, while the right half is different, because I swap all duplicates with the tail of the array. As you mentioned, the order of the duplicates can be arbitrary.

Merge sorted arrays - Efficient solution

The goal here is to merge multiple arrays which are already sorted into a resultant array.
I've written the following solution and am wondering if there is a way to improve it.
/*
Goal is to merge all sorted arrays
*/
void mergeAll(const vector< vector<int> >& listOfIntegers, vector<int>& result)
{
int totalNumbers = listOfIntegers.size();
vector<int> curpos;
int currow = 0 , minElement , foundMinAt = 0;
curpos.reserve(totalNumbers);
// Set the current position that was traversed to 0 in all the array elements
for ( int i = 0; i < totalNumbers; ++i)
{
curpos.push_back(0);
}
for ( ; ; )
{
/* Find the first minimum
Which is basically the first element in the array that hasn't been fully traversed
*/
for ( currow = 0 ; currow < totalNumbers ; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
break;
}
}
/* If all the elements were traversed in all the arrays, then no further work needs to be done */
if ( !(currow < totalNumbers ) )
break;
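/*
Traverse the remaining arrays and find the smallest available value,
skipping arrays that have already been fully traversed
*/
for ( ;currow < totalNumbers; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() && listOfIntegers[currow][curpos[currow] ] < minElement )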
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
}
}
/*
Store the minimum into the resultant array
and increment the element traversed
*/
result.push_back(minElement);
++curpos[foundMinAt];
}
}
The corresponding main goes like this.
int main()
{
vector< vector<int> > myInt;
vector<int> result;
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt[0].push_back(10);
myInt[0].push_back(12);
myInt[0].push_back(15);
myInt[1].push_back(20);
myInt[1].push_back(21);
myInt[1].push_back(22);
myInt[2].push_back(14);
myInt[2].push_back(17);
myInt[2].push_back(30);
mergeAll(myInt,result);
for ( int i = 0; i < result.size() ; ++i)
{
cout << result[i] << endl;
}
}
You can generalize the Merge Sort algorithm and work with multiple pointers. Initially, each of them points to the beginning of its array. You maintain these pointers sorted (by the values they point to) in a priority queue. In each step, you remove the smallest element from the heap in O(log n) (n is the number of arrays). You then output the element pointed to by the extracted pointer. Now you advance this pointer by one position and, if you didn't reach the end of the array, reinsert it into the priority queue in O(log n). Proceed this way until the heap is empty. If there are a total of m elements, the complexity is O(m log n). The elements are output in sorted order this way.
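A rough sketch of that heap-based merge, assuming the same `vector< vector<int> >` input as in the question. The name `merge_all_heap` is made up for this example, and instead of raw pointers the heap stores (value, source list) pairs alongside a per-list position, which is one of several ways to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: k-way merge of already sorted lists via a min-heap.
std::vector<int> merge_all_heap(const std::vector<std::vector<int>>& lists) {
    using Entry = std::pair<int, std::size_t>;  // (value, index of source list)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::vector<std::size_t> pos(lists.size(), 0);  // next unread position per list
    std::size_t total = 0;
    for (std::size_t row = 0; row < lists.size(); ++row) {
        total += lists[row].size();
        if (!lists[row].empty()) heap.push({lists[row][0], row});
    }

    std::vector<int> result;
    result.reserve(total);
    while (!heap.empty()) {
        const Entry smallest = heap.top();  // smallest current head among all lists
        heap.pop();
        result.push_back(smallest.first);

        // Advance within the list the element came from; reinsert if not exhausted.
        const std::size_t row = smallest.second;
        if (++pos[row] < lists[row].size()) heap.push({lists[row][pos[row]], row});
    }
    assert(result.size() == total);
    return result;
}
```

The heap never holds more than one entry per list, so each push and pop costs O(log n) for n lists, giving the O(m log n) total mentioned above.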
Perhaps I'm misunderstanding the question...and I feel like I'm misunderstanding your solution.
That said, maybe this answer is totally off-base and not helpful.
But, especially with the number of vectors and push_back's you're already using, why do you not just use std::sort?
#include <algorithm>
void mergeAll(const vector<vector<int>> &origList, vector<int> &resultList)
{
for(int i = 0; i < origList.size(); ++i)
{
resultList.insert(resultList.end(), origList[i].begin(), origList[i].end());
}
std::sort(resultList.begin(), resultList.end());
}
I apologize if this is totally off from what you're looking for. But it's how I understood the problem and the solution.
std::sort runs in O(N log (N)) http://www.cppreference.com/wiki/stl/algorithm/sort
I've seen some solutions on the internet to merge two sorted arrays, but most of them were quite cumbersome. I changed some of the logic to provide the shortest version I can come up with:
void merge(const int list1[], int size1, const int list2[], int size2, int list3[]) {
// Declaration & Initialization
int index1 = 0, index2 = 0, index3 = 0;
// Loop until both arrays have reached their upper bound.
while (index1 < size1 || index2 < size2) {
// Take from the first array if the second is exhausted, or if
// the first still has elements and its current one is not larger.
// The bounds are checked before either array is read, so we
// never compare outside the bounds of an array.
if (index2 >= size2 || (index1 < size1 && list1[index1] <= list2[index2])) {
list3[index3] = list1[index1];
index1++;
}
else {
list3[index3] = list2[index2];
index2++;
}
index3++;
}
}
If you want to take advantage of multi-threading then a fairly good solution would be to just merge 2 lists at a time.
For example, suppose you have 9 lists.
merge list 0 with 1.
merge list 2 with 3.
merge list 4 with 5.
merge list 6 with 7.
These can be performed concurrently.
Then:
merge list 0&1 with 2&3
merge list 4&5 with 6&7
Again these can be performed concurrently.
then merge list 0,1,2&3 with list 4,5,6&7
finally merge list 0,1,2,3,4,5,6&7 with list 8.
Job done.
I'm not sure about the complexity of that, but it seems the obvious solution and DOES have the bonus of being multi-threadable to some extent.
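Here is a sequential sketch of the round-by-round pairing described above, using `std::merge` for each pair. The name `merge_pairwise` is invented for this illustration, and the threading is deliberately left out: the merges inside one round are independent of each other, so that inner loop is where concurrency could be added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: merges sorted lists two at a time, round by round.
std::vector<int> merge_pairwise(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return {};
    while (lists.size() > 1) {
        std::vector<std::vector<int>> next;
        next.reserve((lists.size() + 1) / 2);

        // Merge list 0 with 1, list 2 with 3, and so on.
        for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(lists[i].size() + lists[i + 1].size());
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        // An odd list out (like list 8 of 9) is carried into the next round.
        if (lists.size() % 2 == 1) next.push_back(std::move(lists.back()));

        assert(next.size() < lists.size());  // every round shrinks the list count
        lists = std::move(next);
    }
    return std::move(lists.front());
}
```

With 9 lists this follows the same schedule as the walkthrough: four merges, then two, then one, and finally the merge with the leftover list. Each element takes part in at most about log2(number of lists) merges.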
Consider the priority-queue implementation in this answer linked in a comment above: Merging 8 sorted lists in c++, which algorithm should I use
It's O(n lg m) time (where n = total number of items and m = number of lists).
All you need is two pointers (or just int index counters), checking for minimum between array A and B, copying the value over to the resultant list, and incrementing the pointer of the array the minimum came from. If you run out of elements on one source array, copy the remainder of the second to the resultant and you're done.
Edit:
You can trivially expand this to N arrays.
Edit:
Don't trivially expand this to N arrays :-). Do two at a time. Silly me.
If you are merging a large number of vectors together, then you could speed things up by using a sort of tree to determine which vector contains the smallest element. This is probably not necessary for your application, but comment if it is and I'll try to work it out.
You could just stick them all into a multiset. That will handle the sorting for you.
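A minimal sketch of the multiset idea, assuming the same `vector< vector<int> >` input as in the question (the name `merge_with_multiset` is made up here):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: lets std::multiset do the ordering.
std::vector<int> merge_with_multiset(const std::vector<std::vector<int>>& lists) {
    std::multiset<int> sorted;  // keeps duplicates, unlike std::set
    std::size_t total = 0;
    for (const auto& list : lists) {
        sorted.insert(list.begin(), list.end());
        total += list.size();
    }
    assert(sorted.size() == total);  // nothing was dropped as a duplicate
    return std::vector<int>(sorted.begin(), sorted.end());
}
```

Note that this does not take advantage of the inputs already being sorted: every insertion costs O(log m) for m total elements, and each element gets its own tree node.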