C++: randomly shuffle part of a vector

What's the best way to shuffle a certain percentage of the elements in a vector?
Say I want 10% or 90% of the vector shuffled.
Not necessarily the first 10%, but 10% spread across the whole vector.
TIA

Modify a Fisher-Yates shuffle so that it does nothing on 10% of the indices in the array.
The code below is Java (taken from Wikipedia and modified), but the translation to C++ should be straightforward, because this is more of an algorithms problem than a language problem.
public static void shuffleNinetyPercent(int[] array)
{
    Random rng = new Random(); // java.util.Random
    int n = array.length;      // The number of items left to shuffle (loop invariant).
    while (n > 1)
    {
        n--; // n is now the last pertinent index
        if (rng.nextDouble() < 0.1) continue; // <-- ADD THIS LINE
        int k = rng.nextInt(n + 1); // 0 <= k <= n.
        // Simple swap of variables
        int tmp = array[k];
        array[k] = array[n];
        array[n] = tmp;
    }
}
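For reference, a C++11 sketch of the same modification (my own untested translation; it uses `<random>` rather than `rand`, and the function name simply mirrors the Java version):

```cpp
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates with the same twist: skip roughly 10% of positions,
// so about 90% of the vector takes part in the shuffle.
void shuffleNinetyPercent(std::vector<int>& v)
{
    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::size_t n = v.size(); // number of items left to shuffle
    while (n > 1)
    {
        n--; // n is now the last pertinent index
        if (coin(rng) < 0.1) continue; // leave this position alone
        std::uniform_int_distribution<std::size_t> pick(0, n); // 0 <= k <= n
        std::swap(v[pick(rng)], v[n]);
    }
}
```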

You could try this:
Assign a random number to each element of the vector, then shuffle the elements whose random numbers fall within the smallest 10% of the numbers you assigned. You could even replace that 10% in the vector with placeholders, sort the extracted 10% by their random numbers, and insert them back where the placeholders are.
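A sketch of that idea (the function name is mine; instead of literally attaching random keys, it shuffles an index list and keeps a prefix, which is equivalent to "the positions whose random key landed in the smallest fraction"):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Pick a random fraction of positions, then rearrange the values
// sitting at those positions among themselves.
void shuffleFraction(std::vector<int>& v, double fraction)
{
    std::mt19937 rng{std::random_device{}()};
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(static_cast<std::size_t>(v.size() * fraction)); // the chosen positions
    std::vector<std::size_t> targets(idx);
    std::shuffle(targets.begin(), targets.end(), rng); // a new arrangement of them
    std::vector<int> tmp;
    for (std::size_t i : idx) tmp.push_back(v[i]);
    for (std::size_t k = 0; k < idx.size(); ++k) v[targets[k]] = tmp[k];
}
```

Positions outside the chosen set are never touched, so at least 90% of the vector stays put for a 10% shuffle.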

How about writing your own random iterator and using std::random_shuffle? Something like this (completely untested, just to give the idea):
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iterator>
#include <vector>

template<class T>
class myRandomIterator : public std::iterator<std::random_access_iterator_tag, T>
{
public:
    myRandomIterator(std::vector<T>& vec, size_t pos = 0) : myVec(vec), myIndex(0), myPos(pos)
    {
        srand(time(NULL));
    }
    bool operator==(const myRandomIterator& rhs) const { return myPos == rhs.myPos; }
    bool operator!=(const myRandomIterator& rhs) const { return !(myPos == rhs.myPos); }
    bool operator<(const myRandomIterator& rhs) const  { return myPos < rhs.myPos; }
    myRandomIterator& operator++()    { ++myPos; return fill(); }
    myRandomIterator& operator++(int) { ++myPos; return fill(); }
    myRandomIterator& operator--()    { --myPos; return fill(); }
    myRandomIterator& operator--(int) { --myPos; return fill(); }
    myRandomIterator& operator+(size_t n) { myPos += n; return fill(); }
    myRandomIterator& operator-(size_t n) { myPos -= n; return fill(); }
    const T& operator*() const { return myVec[myIndex]; }
    T& operator*()             { return myVec[myIndex]; }

private:
    // Every advance lands on a random element of the underlying vector.
    myRandomIterator& fill()
    {
        myIndex = rand() % myVec.size();
        return *this;
    }

    std::vector<T>& myVec;
    size_t myIndex;
    size_t myPos;
};

int main()
{
    std::vector<int> a;
    for (int i = 0; i < 100; ++i)
    {
        a.push_back(i);
    }
    myRandomIterator<int> begin(a);
    myRandomIterator<int> end(a, a.size() * 0.4); // "shuffle" about 40% of the range
    std::random_shuffle(begin, end);
    return 0;
}

One way may be to use std::random_shuffle() and control the percentage by controlling the input range.

Why not perform N swaps of randomly selected positions, where N is determined by the percentage?
So with 100 elements, a 10% shuffle performs 10 swaps. Each swap randomly picks two elements in the array and switches them.
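As a sketch (the name is mine; note that a swap may pick the same index twice, and N swaps disturb at most 2N positions, so this gives an approximate rather than exact fraction):

```cpp
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// A fraction-p "shuffle" that performs v.size() * p random pairwise swaps.
void shuffleBySwaps(std::vector<int>& v, double fraction)
{
    if (v.size() < 2) return;
    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> pick(0, v.size() - 1);
    const std::size_t swaps = static_cast<std::size_t>(v.size() * fraction);
    for (std::size_t s = 0; s < swaps; ++s)
        std::swap(v[pick(rng)], v[pick(rng)]);
}
```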

If you have SGI's std::random_sample extension, you can do this. If not, it's easy to implement random_sample on top of a function which returns uniformly-distributed random integers in a specified range (Knuth, Volume 2, "Algorithm R").
#include <algorithm>
#include <cassert>
#include <vector>
using std::vector;

void shuffle_fraction(vector<int>& data, double fraction) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    // randomly choose the indices to be shuffled
    vector<int> bag(data.size());
    for (size_t i = 0; i < bag.size(); ++i) bag[i] = i;
    vector<int> selected(static_cast<int>(data.size() * fraction));
    std::random_sample(bag.begin(), bag.end(), selected.begin(), selected.end());
    // take a copy of the values being shuffled
    vector<int> old_value(selected.size());
    for (size_t i = 0; i < selected.size(); ++i) {
        old_value[i] = data[selected[i]];
    }
    // choose a new order for the selected indices
    vector<int> shuffled(selected);
    std::random_shuffle(shuffled.begin(), shuffled.end());
    // apply the shuffle to the data: the value from each selected index
    // is moved to the corresponding shuffled index
    for (size_t i = 0; i < selected.size(); ++i) {
        data[shuffled[i]] = old_value[i];
    }
}
Not the most efficient, since it uses three "small" vectors, but avoids having to adapt the Fisher-Yates algorithm to operate on a subset of the vector. In practice you'd probably want this to be a function template operating on a pair of random-access iterators rather than a vector. I haven't done that because I think it would obfuscate the code a little, and you didn't ask for it. I'd also take a size instead of a proportion, leaving it up to the caller to decide how to round fractions.
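For what it's worth, standard C++17 can express the same routine without the SGI extension: std::sample stands in for random_sample and std::shuffle for the now-removed random_shuffle. A sketch along those lines (renamed to avoid confusion with the version above):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// C++17 variant: sample a fraction of the indices, then scatter the old
// values over a shuffled copy of those same indices.
void shuffle_fraction17(std::vector<int>& data, double fraction)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    std::mt19937 rng{std::random_device{}()};
    // randomly choose the indices to be shuffled
    std::vector<int> bag(data.size());
    std::iota(bag.begin(), bag.end(), 0);
    std::vector<int> selected;
    std::sample(bag.begin(), bag.end(), std::back_inserter(selected),
                static_cast<std::size_t>(data.size() * fraction), rng);
    // take a copy of the values being shuffled
    std::vector<int> old_value;
    for (int i : selected) old_value.push_back(data[i]);
    // choose a new order for the selected indices
    std::vector<int> shuffled(selected);
    std::shuffle(shuffled.begin(), shuffled.end(), rng);
    for (std::size_t i = 0; i < selected.size(); ++i)
        data[shuffled[i]] = old_value[i];
}
```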

You can use the shuffle-bag algorithm to select 10% of your array, then use a normal shuffle on that selection.

Related

How to efficiently permute an array in-place (using std::swap)

How can I apply a permutation in-place? My permutations are effectively size_t[] where perm[i] represents the target index for an input index i.
I know how to apply a permutation if I have an input and output array:
struct Permutation {
    std::vector<size_t> perm;

    template <typename T>
    void apply(const T in[], T out[]) const
    {
        for (size_t i = 0; i < size(); ++i) {
            out[i] = std::move(in[perm[i]]);
        }
    }
};
However, I would like to do this with only one array, similar to how std::sort works, so just using std::swap. My idea so far is:
struct Permutation {
    std::vector<size_t> perm;

    template <typename T>
    void apply(T data[]) const
    {
        for (size_t i = 0; i < size(); ++i) {
            std::swap(data[i], data[perm[i]]);
        }
    }
};
But this wouldn't work. For example:
Permutation perm = {{2, 1, 0}};
char data[] {'a', 'b', 'c'};
perm.apply(data);
// because indices 0 and 2 get swapped twice, we end up with the input array
data == {'a', 'b', 'c'};
So how do I correctly permute an array in-place? It is okay if additional memory is allocated, as long as this happens in a pre-computation step when the Permutation is constructed. I want the in-place permutation to happen fast and from the looks of it, demanding that no additional memory is allocated at all will lead to some severe performance sacrifices.
I am specifically referencing Algorithm to apply permutation in constant memory space, where all of the provided answers either cheat by using negative-integer space to avoid an allocation or enter "nasty" nested loops which blow up the time complexity to O(n²).
Edits
Please pay attention before suggesting std::next_permutation. I am not trying to generate all possible permutations, which I could do with std::next_permutation. I am instead trying to apply a single particular permutation to an array.
The hint to find the cycles and permute each cycle worked for me. To sum up my approach, I find the start indices of all cycles in the constructor.
Then, in apply(), I permute each cycle by just repeatedly using std::swap.
struct Permutation {
private:
    /// The single vector which stores both the permutation
    /// AND the indices of the cycle starts.
    std::vector<size_t> perm;
    /// The size of the permutation / index of the first cycle start.
    size_t permSize;

public:
    Permutation(std::vector<size_t> table)
        : perm{std::move(table)}, permSize{perm.size()} {
        findCycles();
    }

    template <typename T>
    void apply(T data[]) const {
        for (size_t cycle = permSize; cycle < perm.size(); ++cycle) {
            const size_t start = perm[cycle];
            for (size_t prev = start, next = perm[prev];
                 next != start;
                 prev = next, next = perm[next]) {
                std::swap(data[prev], data[next]);
            }
        }
    }

    size_t size() const {
        return permSize;
    }

private:
    void findCycles();
};
findCycles() is also easy to implement, but requires the temporary allocation of a bit-vector.
void Permutation::findCycles() {
    std::vector<bool> visited(size());
    for (size_t i = 0; i < size(); ++i) {
        if (visited[i]) {
            continue;
        }
        for (size_t j = i; not visited[j]; ) {
            visited[j] = true;
            j = perm[j];
        }
        perm.push_back(i);
    }
}
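For illustration, here is the same cycle walk as a compact standalone function (my own condensed sketch: it recomputes the visited bits on each call instead of caching cycle starts in a constructor, purely to keep the example short):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Applies out[i] = in[perm[i]] in place, visiting each cycle once.
template <typename T>
void apply_permutation(const std::vector<std::size_t>& perm, std::vector<T>& data)
{
    std::vector<bool> visited(perm.size(), false);
    for (std::size_t start = 0; start < perm.size(); ++start) {
        if (visited[start]) continue;
        visited[start] = true;
        for (std::size_t prev = start, next = perm[start]; next != start;
             prev = next, next = perm[next]) {
            std::swap(data[prev], data[next]);
            visited[next] = true;
        }
    }
}
```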

Maintain an unordered_map but also need the lowest of its mapped values at every step

I have an unordered_map<int, int> which is updated at every step of a for loop. But at the end of the loop, I also need the lowest of the mapped values. Traversing it to find the minimum in O(n) is too slow. I know there is a MultiIndex container in Boost, but I can't use Boost. What is the simplest way to do this using only the STL?
Question:
Given an array A of positive integers, call a (contiguous, not
necessarily distinct) subarray of A good if the number of different
integers in that subarray is exactly K.
(For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.)
Return the number of good subarrays of A.
My code:
class Solution {
public:
    int subarraysWithKDistinct(vector<int>& A, int K) {
        int left, right;
        unordered_map<int, int> M;
        for (left = right = 0; right < A.size() && M.size() < K; ++right)
            M[A[right]] = right;
        if (right == A.size())
            return 0;
        int smallest, count;
        smallest = numeric_limits<int>::max();
        for (auto p : M)
            smallest = min(smallest, p.second);
        count = smallest - left + 1;
        for (; right < A.size(); ++right)
        {
            M[A[right]] = right;
            while (M.size() > K)
            {
                if (M[A[left]] == left)
                    M.erase(A[left]);
                ++left;
            }
            smallest = numeric_limits<int>::max();
            for (auto p : M)
                smallest = min(smallest, p.second);
            count += smallest - left + 1;
        }
        return count;
    }
};
Link to the question: https://leetcode.com/problems/subarrays-with-k-different-integers/
O(n) is not slow; in fact, it is the theoretically fastest possible way to find the minimum, as it's obviously not possible to find the minimum of n items without considering each of them.
You could update the minimum during the loop. That is trivial if the loop only adds new items to the map, but it becomes much harder if the loop may change existing items (and may increase the value of the until-then minimum item!). Ultimately, this also adds O(n) of work or more, so complexity-wise it's no different from doing an extra loop at the end. (Obviously the constant can differ: the extra loop may be slower than reusing the original loop, but the complexity is the same.)
As you said, there are data structures that make it more efficient (O(log n) or even O(1)) to retrieve the minimum item, but at the cost of extra work to maintain that structure during insertion. Such structures only make sense if you frequently need the minimum while inserting or changing items, not if you only need it once at the end of the loop, as you described.
I made a simple class to make it work; although it's far from perfect, it's good enough for the question linked above.
class BiMap
{
public:
    void insert(int key, int value)
    {
        auto itr = M.find(key);
        if (itr == M.cend())
            M.emplace(key, S.insert(value).first);
        else
        {
            S.erase(itr->second);
            M[key] = S.insert(value).first;
        }
    }

    void erase(int key)
    {
        auto itr = M.find(key);
        S.erase(itr->second);
        M.erase(itr);
    }

    int operator[](int key)
    {
        return *M.find(key)->second;
    }

    int size()
    {
        return M.size();
    }

    int minimum()
    {
        return *S.cbegin();
    }

private:
    unordered_map<int, set<int>::const_iterator> M;
    set<int> S;
};
class Solution {
public:
    int subarraysWithKDistinct(vector<int>& A, int K) {
        int left, right;
        BiMap M;
        for (left = right = 0; right < A.size() && M.size() < K; ++right)
            M.insert(A[right], right);
        if (right == A.size())
            return 0;
        int count = M.minimum() - left + 1;
        for (; right < A.size(); ++right)
        {
            M.insert(A[right], right);
            while (M.size() > K)
            {
                if (M[A[left]] == left)
                    M.erase(A[left]);
                ++left;
            }
            count += M.minimum() - left + 1;
        }
        return count;
    }
};

How to optimize reusing a large std::unordered_map as a temporary in a frequently called function?

Simplified question with a working example: I want to reuse a std::unordered_map (let's call it umap) multiple times, similar to the following dummy code (which does not do anything meaningful). How can I make this code run faster?
#include <iostream>
#include <unordered_map>
#include <time.h>

unsigned size = 1000000;

void foo() {
    std::unordered_map<int, double> umap;
    umap.reserve(size);
    for (unsigned i = 0; i < size; i++) {
        // in my real program: umap gets filled with meaningful data here
        umap.emplace(i, i * 0.1);
    }
    // ... some code here which does something meaningful with umap
}

int main() {
    clock_t t = clock();
    for (int i = 0; i < 50; i++) {
        foo();
    }
    t = clock() - t;
    printf("%f s\n", ((float)t) / CLOCKS_PER_SEC);
    return 0;
}
In my original code, I want to store matrix entries in umap. In each call to foo, the key values start from 0 up to N, and N can be different in each call to foo, but there is an upper limit of 10M for indices. Also, values can be different (contrary to the dummy code here which is always i*0.1).
I tried making umap a non-local variable to avoid the repeated memory allocation of umap.reserve() in each call. This requires calling umap.clear() at the end of foo, but that turned out to be actually slower than using a local variable (I measured it).
I don't think there is any good way to accomplish what you're looking for directly -- i.e. you can't clear the map without clearing the map. I suppose you could allocate a number of maps up-front and use each one a single time as a "disposable map", moving on to the next map on the next call, but I doubt this would give you any overall speedup, since at the end you'd have to clear all of them at once. In any case it would be very RAM-intensive and cache-unfriendly: in modern CPUs, RAM access is very often the performance bottleneck, so minimizing the number of cache misses is the way to maximize efficiency.
My suggestion is that if clear-speed is so critical, you may need to move away from unordered_map entirely and use something simpler like a std::vector. In that case you can simply keep a number-of-valid-items-in-the-vector integer, and "clearing" the vector is just a matter of setting that count back to zero. (Of course, that sacrifices unordered_map's quick-lookup properties, but perhaps you don't need them at this stage of your computation?)
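A sketch of that suggestion (the function and variable names are mine): the buffer is reused across calls, and "clearing" is just resetting the logical size, which is O(1).

```cpp
#include <cstddef>
#include <vector>

// Reuse one flat buffer across calls; entries at index >= used are stale.
void fooVec(std::vector<double>& buf, std::size_t& used, std::size_t N)
{
    if (buf.size() < N) buf.resize(N); // grows only when N exceeds the past maximum
    for (std::size_t i = 0; i < N; ++i)
        buf[i] = i * 0.1;              // stand-in for the real fill
    used = N;                          // "clearing" later is just: used = 0
    // ... consume buf[0 .. used) here ...
}
```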
A simple and effective way is to reuse the same container and memory again and again by passing it by reference, as follows.
This way you can avoid the repeated memory management of std::unordered_map::reserve and std::unordered_map::~unordered_map, which both have complexity O(number of elements):
void foo(std::unordered_map<int, double>& umap)
{
    std::size_t N = ...; // set N here
    for (int i = 0; i < N; ++i)
    {
        // overwrite umap[0], ..., umap[N-1]
        // If umap does not have key=i, then it is inserted.
        umap[i] = i * 0.1;
    }
    // do something, and do not access umap[N], ..., umap[size-1]!
}
The caller side would be as follows:
std::unordered_map<int, double> umap;
umap.reserve(size);
for (int i = 0; i < 50; ++i) {
    foo(umap);
}
But since your key set is always the contiguous integers {0, 1, ..., N-1}, I think std::vector, which lets you avoid hash calculations altogether, would be preferable for storing the values umap[0], ..., umap[N-1]:
void foo(std::vector<double>& vec)
{
    int N = ...; // set N here
    for (int i = 0; i < N; ++i)
    {
        // overwrite vec[0], ..., vec[N-1]
        vec[i] = i * 0.1;
    }
    // do something, and do not access vec[N], ..., vec[size-1]!
}
Have you tried avoiding all memory allocation by using a simple array? You said above that you know the maximum size of umap over all calls to foo():
#include <iostream>
#include <unordered_map>
#include <time.h>

constexpr int size = 1000000;
double af[size];

void foo(int N) {
    // assert(N <= size);
    for (int i = 0; i < N; i++) {
        af[i] = i;
    }
    // ... af
}

int main() {
    clock_t t = clock();
    for (int i = 0; i < 50; i++) {
        foo(size /* or some other N <= size */);
    }
    t = clock() - t;
    printf("%f s\n", ((float)t) / CLOCKS_PER_SEC);
    return 0;
}
As I suggested in the comments, closed hashing would be better for your use case. Here's a quick&dirty closed hash map with a fixed hashtable size you could experiment with:
#include <array>
#include <functional>
#include <utility>
#include <vector>

template<class Key, class T, size_t size = 1000003, class Hash = std::hash<Key>>
class closed_hash_map {
    typedef std::pair<const Key, T> value_type;
    typedef typename std::vector<value_type>::iterator iterator;
    std::array<int, size> hashtable{}; // zero-initialized: 0 means "empty slot"
    std::vector<value_type> data;

public:
    iterator begin() { return data.begin(); }
    iterator end() { return data.end(); }

    iterator find(const Key& k) {
        size_t h = Hash()(k) % size;
        while (hashtable[h]) {
            if (data[hashtable[h] - 1].first == k)
                return data.begin() + (hashtable[h] - 1);
            if (++h == size) h = 0;
        }
        return data.end();
    }

    std::pair<iterator, bool> insert(const value_type& obj) {
        size_t h = Hash()(obj.first) % size;
        while (hashtable[h]) {
            if (data[hashtable[h] - 1].first == obj.first)
                return std::make_pair(data.begin() + (hashtable[h] - 1), false);
            if (++h == size) h = 0;
        }
        data.emplace_back(obj);
        hashtable[h] = data.size(); // stores index + 1, so 0 stays "empty"
        return std::make_pair(data.end() - 1, true);
    }

    void clear() {
        data.clear();
        hashtable.fill(0);
    }
};
It can be made more flexible by dynamically resizing the hashtable on demand when appropriate, and more efficient by using robin-hood replacement.

Arranging a vector by an index vector

I have two vectors: a values vector and an index vector. How can I rearrange the values vector according to the index vector? Like:
Indexes:                5 0 2 1 3 4
Values:                 a b c d e f
Values after operation: b d c e f a
The index vector will always contain the range [0, n), each index exactly once.
I need this operation to be done in place because the code is going to run on a device with little memory.
How can I do this in C++? I can use C++11.
Since you know that your index array is a permutation of [0, N), you can do this in linear time and in-place (plus one temporary) by working cycle-by-cycle. Something like this:
size_t indices[N];
data_t values[N];

for (size_t pos = 0; pos < N; ++pos) //  \
{                                    //   } this loops _over_ cycles
    if (indices[pos] == pos) continue; // /
    size_t i = pos;
    const data_t tmp = values[pos];
    while (true) // --> this loops _through_ one cycle
    {
        const size_t next = indices[i];
        indices[i] = i;
        values[i] = values[next];
        if (next == pos) break;
        i = next;
    }
    values[i] = tmp;
}
Compared with swapping at every step, this implementation has the advantage that the temporary variable is needed only once per cycle.
If the data type is move-only, this still works provided all the assignments are wrapped in std::move().
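A quick check of what this snippet computes (my own wrapper around the loop above): it applies the gather convention, i.e. afterwards values[i] holds the old values[indices[i]]. That is the inverse of the convention in the question's example table; to reproduce the question's ordering you would feed it the inverse permutation.

```cpp
#include <cstddef>
#include <vector>

// Same cycle-by-cycle loop, wrapped in a function for testing.
// After the call, values[i] == old values[indices[i]] (a "gather").
void cyclePermute(std::vector<std::size_t>& indices, std::vector<char>& values)
{
    const std::size_t N = indices.size();
    for (std::size_t pos = 0; pos < N; ++pos)
    {
        if (indices[pos] == pos) continue;
        std::size_t i = pos;
        const char tmp = values[pos];
        while (true)
        {
            const std::size_t next = indices[i];
            indices[i] = i;
            values[i] = values[next];
            if (next == pos) break;
            i = next;
        }
        values[i] = tmp;
    }
}
```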
std::vector<int> indices = { 5, 0, 2, 1, 3, 4 };
std::vector<char> values = { 'a', 'b', 'c', 'd', 'e', 'f' };

for (size_t n = 0; n < indices.size(); ++n)
{
    while (indices[n] != n)
    {
        std::swap(values[n], values[indices[n]]);
        std::swap(indices[n], indices[indices[n]]);
    }
}
EDIT:
I think this should be O(n), anyone disagree?
for (int i = 0; i < indexes.size(); ++i)
    for (int j = i + 1; j < indexes.size(); ++j)
        if (indexes[i] > indexes[j])
        {
            swap(indexes[i], indexes[j]);
            swap(values[i], values[j]);
        }
It's O(N²) complexity, but it should work fine on a small number of values.
You can also pass a comparison function to std::sort if you want O(N log N).
You can just sort the vector; your comparison operation should compare the indices. Of course, when moving the data around, you have to move the indices too.
At the end, your indices will be just 0, 1, ..., (n-1), and the data will be at the corresponding places.
As an implementation note: you can store the values and indices together in a structure:
struct SortEntry
{
    Data value;
    size_t index;
};
and define the comparison operator to look only at indices:
bool operator<(const SortEntry& lhs, const SortEntry& rhs)
{
    return lhs.index < rhs.index;
}
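Putting the sort-based approach together (a sketch; Data is char here and the helper name is mine, purely for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Pair each value with its target index, sort by index, read values back out.
struct SortEntry {
    char value;
    std::size_t index;
};

bool operator<(const SortEntry& lhs, const SortEntry& rhs) {
    return lhs.index < rhs.index;
}

std::vector<char> arrange(const std::vector<std::size_t>& indices,
                          const std::vector<char>& values) {
    std::vector<SortEntry> entries;
    for (std::size_t i = 0; i < values.size(); ++i)
        entries.push_back({values[i], indices[i]});
    std::sort(entries.begin(), entries.end());
    std::vector<char> out;
    for (const SortEntry& e : entries) out.push_back(e.value);
    return out;
}
```

With the question's example this reproduces the expected "b d c e f a" ordering.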
This solution runs in O(n) time:
for (int i = 0; i < n; i++)
    while (indexes[i] != i)
    {
        swap(values[i], values[indexes[i]]);
        int tmp = indexes[i];
        swap(indexes[i], indexes[tmp]);
    }
This will run in O(n) time without any error (note that it uses an extra result array, so it is not in-place). Check it on ideone:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int indexes[6] = { 2, 3, 5, 1, 0, 4 };
    char values[6] = { 'a', 'b', 'c', 'd', 'e', 'f' };
    const int n = sizeof(indexes) / sizeof(indexes[0]);
    char result[n]; // array of the same size as indexes and values
    int a, i;
    for (i = 0; i < n; i++)
    {
        a = indexes[i];        // the target index for position i
        result[a] = values[i]; // save the value at its target position
    }
    for (i = 0; i < n; i++)
        printf("%c", result[i]); // print the result
    system("PAUSE");
    return 0;
}

Find which number appears most in a vector

I have some numbers stored in a std::vector<int>. I want to find which number appears most in the vector.
e.g. in the vector
1 3 4 3 4 2 1 3 2 3
the element that occurs the most is 3.
Is there any algorithm (STL or otherwise) that does this?
Sort it, then iterate through it, keeping a counter that you increment when the current number equals the previous number and reset when the number changes. Also keep track of the highest value the counter has reached so far, and which number it was reached for. This solution is O(n log n) (because of the sort).
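The sort-and-count approach might look like this (a sketch; the function name is mine and it assumes a non-empty vector):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort a copy, then scan runs of equal values, remembering the longest run.
int mostFrequent(std::vector<int> v) // by value: we sort a copy
{
    std::sort(v.begin(), v.end());
    int best = v.front();
    int bestCount = 0;
    int run = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        run = (i > 0 && v[i] == v[i - 1]) ? run + 1 : 1;
        if (run > bestCount) { bestCount = run; best = v[i]; }
    }
    return best;
}
```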
Alternatively you can use a hash map from int to int (std::unordered_map since C++11; or, if you know the numbers are within a limited range, a plain array, which will also be faster) and iterate over the vector, increasing the_hashmap[current_number] by 1 for each number. Afterwards iterate through the hash map to find its largest value (and the key belonging to it).
If you want to avoid sorting your vector v, use a map:
int max = 0;
int most_common = -1;
std::map<int, int> m;
for (auto vi = v.begin(); vi != v.end(); vi++) {
    m[*vi]++;
    if (m[*vi] > max) {
        max = m[*vi];
        most_common = *vi;
    }
}
This requires more memory and has a very similar expected runtime. The memory required should be on the order of a full vector copy, less if there are many duplicate entries.
Try this
int FindMode(vector<int> value)
{
    int index = 0;
    int highest = 0;
    for (unsigned int a = 0; a < value.size(); a++)
    {
        int count = 1;
        int Position = value.at(a);
        for (unsigned int b = a + 1; b < value.size(); b++)
        {
            if (value.at(b) == Position)
            {
                count++;
            }
        }
        if (count >= index)
        {
            index = count;
            highest = Position;
        }
    }
    return highest;
}
This is how I did it:
int max = 0, mostvalue = a[0];
for (size_t i = 0; i < a.size(); i++)
{
    int co = (int)count(a.begin(), a.end(), a[i]);
    if (co > max)
    {
        max = co;
        mostvalue = a[i];
    }
}
I just don't know how fast it is, i.e. O()? If someone could calculate it and post it here, that would be fine.
Here is a generic solution for finding the most common element in an iterator range (O(n log n) as written with std::map; swapping in std::unordered_map gives expected O(n)). You use it simply by doing:
int commonest = most_common(my_vector.begin(), my_vector.end());
The value type is extracted from the iterator using iterator_traits<>.
template<class InputIt, class T = typename std::iterator_traits<InputIt>::value_type>
T most_common(InputIt begin, InputIt end)
{
    std::map<T, int> counts;
    for (InputIt it = begin; it != end; ++it) {
        ++counts[*it]; // operator[] value-initializes missing entries to 0
    }
    return std::max_element(counts.begin(), counts.end(),
        [](const std::pair<T, int>& pair1, const std::pair<T, int>& pair2) {
            return pair1.second < pair2.second;
        })->first;
}