All combinations of a binary number where only certain bits can change - c++

I was wondering if there is an algorithm to generate all the possible combinations of a binary number where only bits in certain positions can change, for example, we have the following bitstream, but only bits marked in the position x can change(this example has 8 places that can change to make different combinations, a total of 2^8):
x00x0000x000000x00x00x000x000x
one solution is to look at the number as an 8-bit number in the first place and just calculate all the combinations for xxxxxxxx.
However, this doesn't quite satisfy my needs, as I want to use the number in a Linear shift register (LFSR) later on, currently, I'm looking for an answer which utilizes std::bitset.

So, there is alsready an answer. But just dumped code, without any explanation. Not good. Anyway...
I would like to add an answer with a different approach, and explain the steps.
Basically, if you want to have all combinations of a binary number, then you could simply "count" or "increment by one". Example for a 3 bit value. This would be decimal 0, 1, 2, 3, 4, 5, 6, 7 and binary 000, 001, 010, 011, 100, 101, 110, 111. You see that it is simple counting.
If we think far back to school days, where we learned boolean algebra and a little bit of automata theory, then we rember how this counting operation is done on low level. We always flip the least significant bit, and, if there is a transition from 1 to 0, then we basically had an overflow and must also flip the next bit. That's the principle of a binary adder. We want to add always 1 in our example. So, add 1 to 0, result is 1, then no overflow. But add 1 to 1, result is 0, then we we have an overlow and must add 1 to the next bit. This will effectively flip the next bit, and so on and so on.
The advantage of this method is, that we do not always need to operate on all bits. So, the complexity is not O(n), but rather O(log n).
Additional advantage: It fits very good to your request for using a std::bitset.
3rd advantage, and maybe not that obvious: You can decouple the task of calculating the next combination from the rest of your program. No need to integrate your real-task-code in such a function. That is also the reason, why std::next_permutation is implemented like this.
And, the algorithms desribed above works for all values, no sorting or something necessary.
That part was for the algorithm you have asked for.
Next part is for your request that only certain bits can change. Of course, we need to specify these bits. And because your are working with std::bitset masking is no solution here. The better approach is to use indices. Meaning, give the bit positions of the bits that are allowed to be changed.
And then we can use the above described algorithm, with just one additional indirection. So, we do not use bits[pos], but bits[index[pos]].
The indices can easily be stored in a std::vector using an initializer list. We can also derive the indices-vector from a string or whatever. I used a std::string as example.
All the above will result in some short / compact code, with just a few lines and is easy to understand. I also added some driver code that makes use of this function.
Please see:
#include <iostream>
#include <vector>
#include <string>
#include <bitset>
#include <algorithm>
#include <cassert>
constexpr size_t BitSetSize = 32U;
void nextCombination(std::bitset<BitSetSize>& bits, const std::vector<size_t>& indices) {
for (size_t i{}; i < indices.size(); ++i) {
// Get the current index, and check, if it is valid
if (const size_t pos = indices[i]; pos < BitSetSize) {
// Flip bit at lowest positions
bits[pos].flip();
// If there is no transition of the just flipped bit, then stop
// If there is a transition from high to low, then we need to flip the next bit
if (bits.test(pos))
break;
}
}
}
// Some driver code
int main() {
// Use any kind of mechanism to indicate which index should be changed or not
std::string mask{ "x00x0000x000000x00x00x000x000x" };
// Here, we will store the indices
std::vector<size_t> index{};
// Populated the indices vector from the string
std::for_each(mask.crbegin(), mask.crend(), [&, i = 0U](const char c) mutable {if ('x' == c) index.push_back(i); ++i; });
// The bitset, for which we want to calculate the combinations
std::bitset<BitSetSize> bits(0);
// Play around
for (size_t combination{}; combination < (1 << (index.size())); ++combination) {
// This is the do something
std::cout << bits.to_string() << '\n';
// calculate the next permutation
nextCombination(bits, index);
}
return 0;
}
This software has been compiled wth MSVC 19 Community Edition using C++17
If you should have additional questions or need more clarifications, then I am happy to answer

The integers that satisfy the pattern can be enumerated by iterating with a "masked increment", which increments the variable bits but leaves the fixed bits the same. For convenience I will assume that the "fixed bits" are zero, but it they weren't it would still work with minor changes. mask is 1 for the fixed bits and 0 for the variable bits.
uint32_t x = 0;
do {
// use x
...
// masked increment
x = (x | mask) + 1 & ~mask;
} while (x != 0);
x | mask sets the fixed bits, so that the carry will "go through" the fixed bits. +1 increments the variable bits. &~mask cleans up the extra bits that were set, turning the fixed bits back into zeroes.
std::bitset cannot be incremented so it is difficult to use directly, but integers can be converted to std::bitset if necessary.

Something like this
// sample indexes
static const int indexes[8] = { 0, 4, 8, 11, 13, 16, 22, 25 };
std::bitset<32> clear_bit_n(std::bitset<32> number, int n)
{
return number.reset(indexes[n]);
}
std::bitset<32> set_bit_n(std::bitset<32> number, int n)
{
return number.set(indexes[n]);
}
void all_combinations(std::bitset<32> number, int n)
{
if (n == 8)
{
// do something with number
}
else
{
all_combinations(clear_bit_n(number, n), n + 1);
all_combinations(set_bit_n(number, n), n + 1);
}
}
all_combinations(std::bitset<32>(), 0);

Related

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, and not being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero
Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, mask_count, list_count_lo );
int sub_count_hi = list_count - sub_count_lo;
mask( list, list_count_lo, sub_count_lo );
mask( list + sub_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct an collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones) you want selected:
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]

How to calculate the sum of the bitwise xor values of all the distinct combination of the given numbers efficiently?

Given n(n<=1000000) positive integer numbers (each number is smaller than 1000000). The task is to calculate the sum of the bitwise xor ( ^ in c/c++) value of all the distinct combination of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
int n, i, sum, j;
scanf("%d", &n);
sum=0;
for(i=0;i<n;i++)
scanf("%d", &num[i]);
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
sum+=(num[i]^num[j]);
}
}
printf("%d\n", sum);
return 0;
}
But my code failed to run in 1 second. How can I write my code in a faster way, which can run in 1 second ?
Edit: Actually this is an Online Judge problem and I am getting Cpu Limit Exceeded with my above code.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I know it not. In any case, on the contemporary processors you'll be shaving off microseconds at best; that is because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important in repetitive branching, you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Despite the fact that "scanf" takes only 5 characters in your computer code, in machine language this is the bulk of your program. Unfortunately, if the data will actually change each time your run your code, there is no way around the requirement of reading from stdin, and there will be no difference whether you use scanf, std::cin >>, or even will attempt to implement your own method to read characters from input and convert them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you can be running your code via: myprogram < data.
This function grows quadratically (thanks #rici). At around 25,000 positive integers with each being 999,999 (worst case) the for loop calculation alone can finish in approximately a second. Trying to make this work with input as you have specified and for 1 million positive integers just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
std::size_t res = 0;
for (std::size_t b = 0; b != 32; ++b) {
const std::size_t count_0 =
std::count_if(v.begin(), v.end(),
[b](std::uint32_t n) { return (n >> b) & 0x01; });
const std::size_t count_1 = v.size() - count_0;
res += count_0 * count_1 << b;
}
return res;
}
Live Demo.
Explanation:
x^y = Sum_b((x&b)^(y&b)) where b is a single bit mask (from 1<<0 to 1<<32).
For a given bit, with count_0 and count_1 the respective number of count of number with bit set to 0 or 1, we have count_0 * (count_0 - 1) 0^0, count_0 * count_1 0^1 and count_1 * (count_1 - 1) 1^1 (and 0^0 and 1^1 are 0).

Equilibrium index of an array of large numbers, how to prevent overflow?

Problem statement:
An equilibrium index of an array is an index into the array such that the sum of elements at lower indices is equal to the sum of elements at higher indices.
For example, in {-7, 1, 5, 2, -4, 3, 0}, 3 is an equilibrium index, because:
-7 + 1 + 5 = -4 + 3 + 0
Write a function that, given an vector of ints, returns its equilibrium index (if any). Assume that the vector may be very long.
Question:
All solutions (efficient), that I found are based on the fact that given the sum of all elements and current running sum of one part we can obtain via deduction sum of elements of residual part.
I don't think that solutions are correct, because if we provide large vector with MAX_INT elements, sum of elements will result on an overflow.
How issue with an overflow can be solved?
references on suggested solutions, all of them fail to address overflow issue
(I am referring only to C++ implementation, in Java as far as I know BigInteger class exist that solves it)
http://blog.codility.com/2011/03/solutions-for-task-equi.html
Supplementary material:
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>
template <typename T>
std::vector<size_t> equilibrium(T first, T last)
{
typedef typename std::iterator_traits<T>::value_type value_t;
value_t left = 0;
value_t right = std::accumulate(first, last, value_t(0));
std::vector<size_t> result;
for (size_t index = 0; first != last; ++first, ++index)
{
right -= *first;
if (left == right)
{
result.push_back(index);
}
left += *first;
}
return result;
}
template <typename T>
void print(const T& value)
{
std::cout << value << "\n";
}
int main()
{
const int data[] = { -7, 1, 5, 2, -4, 3, 0 };
std::vector<size_t> indices(equilibrium(data, data + 7));
std::for_each(indices.begin(), indices.end(), print<size_t>);
}
The short answer is that ultimately it can't be completely cured/solved, unless you limit the number/magnitude of the inputs--not even with something like Java's BigInt (or equivalents for C++ such as gmp, NTL, etc.)
The problem is pretty simple: the memory in any computer is finite. There will always be some finite limit on the numbers we can represent. An arbitrary precision integer type can increase the limit to numbers far larger than most of use work with on a regular basis, but regardless of what the limit might be, there will always be dramatically larger numbers that can't be represented (at least without changing to some other representation--but if we're going to have precision to the units place for arbitrary numbers, there are distinct limits on how clever we can get in representing gargantuan numbers).
For the conditions given in the linked problem, the long long type in C and C++ is adequate. If we want to increase the limit to some ridiculous size with a solution in C++, it's pretty simple. Although they're not a required part of a C++ implementation, there are many arbitrary precision integer libraries available for C++.
I suppose there could be some way to compute an answer to this problem that doesn't involve actually summing the numbers--but at least at first glance, this idea doesn't seem very promising to me. The statement of the problem is specifically about computing sums. While you could certainly carry out various machinations to keep the summing from looking like summing, the fact is that the basic statement of the problem involves sums, which tends to suggest that solutions that don't involve sums may well be difficult to find.
Yes it is possible. Notice that if data[0] < data[len - 1], then data[1] shall belong to the "left" part; similarly if data[0] > data[len-1] then data[len-2] belongs to the "right" part. This observation allows an inductive proof of correctness of the following algorithm:
left_weight = 0; right_weight = 0
left_index = 0; right_index = 0
while left_index < right_index
if left_weight < right_weight
left_weight += data[left_index++];
else
right_weight += data[--right_index]
Still there is an accumulation, but it is easy to deal with by keeping track of imbalance and a boolean indicator of which side is heavier:
while left_index < right_index
if heavier_side == right
weight = data[left_index++]
else
weight = data[--right_index]
if weight < imbalance
imbalance = imbalance - weight
else
heavier_side = !heavier_side
imbalance = weight - imbalance
At least for unsigned data there is no possibility of overflow. Some tinkering might be required for signed values.

Recursion vs bitmasking for getting all combinations of vector elements

While practicing for programming competitions (like ACM, Code Jam, etc) I've met some problems that require me to generate all possible combinations of some vector elements.
Let's say that I have the vector {1,2,3}, I'd need to generate the following combinations (order is not important) :
1
2
3
1 2
1 3
2 3
1 2 3
So far I've done it with the following code :
void getCombinations(int a)
{
printCombination();
for(int j=a;j<vec.size();j++)
{
combination.pb(vec.at(j));
getCombinations(j+1);
combination.pop_back();
}
}
Calling getCombinations(0); does the job for me. But is there a better (faster) way? I've recently heard of bitmasking. As I understood it's simply for all numbers between 1 and 2^N-1 I turn that number into a binary where the 1s and 0s would represent whether or not that element is included in the combinations.
How do I implement this efficiently though? If I turn every number into binary the standard way (by dividing with 2 all the time) and then check all the digits, it seems to waste a lot of time. Is there any faster way? Should I keep on using the recursion (unless I run into some big numbers where recursion can't do the job (stack limit))?
The number of combinations you can get is 2^n, where n is the number of your elements. You can interpret every integer from 0 to 2^n -1 as a mask. In your example (elements 1, 2, 3) you have 3 elements and the masks would therefore be 000, 001, 010, 011, 100, 101, 110, and 111. Let every place in the mask represent one of your elements. For place that has a 1, take the corresponding element, otherwise if the place contains a 0, leave the element out. For example the the number 5 would be the mask 101 and it would generate this combination: 1, 3.
If you want to have a fast and relatively short code for it, you could do it like this:
#include <cstdio>
#include <vector>
using namespace std;
int main(){
vector<int> elements;
elements.push_back(1);
elements.push_back(2);
elements.push_back(3);
// 1<<n is essentially pow(2, n), but much faster and only for integers
// the iterator i will be our mask i.e. its binary form will tell us which elements to use and which not
for (int i=0;i<(1<<elements.size());++i){
printf("Combination #%d:", i+1);
for (int j=0;j<elements.size();++j){
// 1<<j shifts the 1 for j places and then we check j-th binary digit of i
if (i&(1<<j)){
printf(" %d", elements[j]);
}
}
printf("\n");
}
return 0;
}

What is the most efficient way to generate unique pseudo-random numbers? [duplicate]

Duplicate:
Unique random numbers in O(1)?
I want an pseudo random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random number (1024 byte+) then that method would take alot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this.
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand tou don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using GUID generator (like in the one in .NET). Granted it is not guaranteed that there will be no duplicates, however the chance getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
If you want to creating large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator, that actually has enough entropy, then the odds of generating repeats are so miniscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them not usable for cryptography and anywhere were a good RNG is needed. The second problem is that either the all zero word or the all one (in terms of bits) word is invalid depending on the LFSR implementation. The third issue which is relevant to your question is that the maximum number generated by the LFSR is always a power of 2 - 1 (or power of 2 - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can solved by reusing the LFSR until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR which has a range [1, 15] inclusive. Here's a pseudo code as to how to get number in the range [1,10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Yates-Fisher could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose and insert them into a new array that is a subset and could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentiall the Durstenfeld version of Fisher-Yates in reverse. You can then use the Dursenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero to setSize meaning there are setSize possible values. In this case, it will be the size of the array since the last index to the array is size-1.
The first algorithm is the Durstenfeld version of Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions integers from 0 to the length of the array into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective
int size = someNumber;
int[] int array = new int[size]; // here is the array to load
int location; // this will get assigned a value before used
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i <= size; i++) { // conveniently, i is also the value to load
// you can instance or acquire any object at this place in the algorithm to load
// by reference, into the array and use a pointer to it in place of j
int j = i; // in this example, j is trivially i
if (i == 0) { // first integer goes into first location
array[i] = j; // this may get swapped from here later
} else { // subsequent integers go into random locations
// the next random location will be somewhere in the locations
// already used or a new one at the end
// here we get the next random location
// to preserve true randomness without a significant bias
// it is REALLY IMPORTANT that the newest value could be
// stored in the newest location, that is,
// location has to be able to randomly have the value i
int location = nextInt(i + 1); // a random value between 0 and i
// move the random location's value to the new location
array[i] = array[location];
array[location] = j; // put the new value into the random location
} // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
int location; // this will get assigned a value before used
for (int i = arrayLength -1 ; i > 0; i--) {
int location = nextInt(i + 1);
temp = array[i];
array[i] = array[location];
array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
* not necessarily unique
* I assume it's written
*/
int random(int max);
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
static int *list = NULL; /* contains a list of numbers we haven't returned */
static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
* 1 --> we have started randomizing numbers
*/
static int count;
static prev_max = 0;
// initialize the list
if (!in_progress || (prev_max != max)) {
if (list != NULL) {
free(list);
}
list = malloc(sizeof(int) * max);
prev_max = max;
in_progress = 1;
count = max - 1;
int i;
for (i = max - 1; i >= 0; --i) {
list[i] = i;
}
}
/* now choose one from the list */
int index = random(count);
int retval = list[index];
/* now we throw away the returned value.
* we do this by shortening the list by 1
* and replacing the element we returned with
* the highest remaining number
*/
swap(&list[index], &list[count]);
/* when the count reaches 0 we start over */
if (count == 0) {
in_progress = 0;
free(list);
list = 0;
} else { /* reduce the counter by 1 */
count--;
}
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
** Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
Please check answers at
Generate sequence of integers in random order without constructing the whole list upfront
and also my answer lies there as
very simple random is 1+((power(r,x)-1) mod p) will be from 1 to p for values of x from 1 to p and will be random where r and p are prime numbers and r <> p.
I asked a similar question before but mine was for the whole range of a int see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
static std::unordered_set<long> s;
long l = 0;
for(; !l && (s.end() != s.find(l)); l = generator());
v.insert(l);
generator() being your random number generator. You roll numbers as long as the entry is not in your set, then you add what you find in it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
Alternative is to use a cryptographically secure PRNG that will have a very low probability to generate twice the same number.
If you don't mean poor statisticall properties of generated sequence, there is one method:
Let's say you want to generate N numbers, each of 1024 bits each. You can sacrifice some bits of generated number to be "counter".
So you generate each random number, but into some bits you choosen you put binary encoded counter (from variable, you increase each time next random number is generated).
You can split that number into single bits and put it in some of less significant bits of generated number.
That way you are sure you get unique number each time.
I mean for example each generated number looks like that:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where x is take directly from generator, and ys are taken from counter variable.
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers so space consumption is O(1) too. If you cycle before getting N numbers then M less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* #author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
Here is a way to random without repeating results. It also works for strings. Its in C# but the logig should work in many places. Put the random results in a list and check if the new random element is in that list. If not than you have a new random element. If it is in that list, repeat the random until you get an element that is not in that list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
For a sequence to be random there should not be any auto correlation. The restriction that the numbers should not repeat means the next number should depend on all the previous numbers which means it is not random anymore....
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
for( int n = 0; n != amount; ++n ) {
const short int increment = myrandom( spread );
numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
to have non repeated random numbers and to avoid waistingtime with checking for doubles numbers and get new numbers over and over use the below method which will assure the minimum usage of Rand:
for example if you want to get 100 non repeated random number:
1. fill an array with numbers from 1 to 100
2. get a random number using Rand function in the range of (1-100)
3. use the genarted random number as an Index to get th value from the array (Numbers[IndexGeneratedFromRandFunction]
4. shift the number in the array after that Index to the left
5. repeat from step 2 but now the the rang should be (1-99) and go on
now we have a array with different numbers!
int main() {
int b[(the number
if them)];
for (int i = 0; i < (the number of them); i++) {
int a = rand() % (the number of them + 1) + 1;
int j = 0;
while (j < i) {
if (a == b[j]) {
a = rand() % (the number of them + 1) + 1;
j = -1;
}
j++;
}
b[i] = a;
}
}