What is performance-wise the best way to generate random bools? - c++

I need to generate random Boolean values on a performance-critical path.
The code which I wrote for this is
std::random_device rd;
std::uniform_int_distribution<> randomizer(0, 1);
const int val = randomizer(std::mt19937(rd()));
const bool isDirectionChanged = static_cast<bool>(val);
But I do not think that this is the best way to do this, as I do not like doing static_cast<bool>.
On the web I have found a few more solutions
1. std::bernoulli_distribution
2. bool randbool = rand() & 1; Remember to call srand() at the beginning.
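For comparison, option 1 might look like the sketch below. The function name randomBoolBernoulli and the choice of a thread_local std::mt19937 are assumptions made for illustration; the point is only that the engine and the distribution are built once and reused, not rebuilt on every call.

```cpp
#include <random>

// Hypothetical helper (not from the question): one engine and one
// distribution per thread, created on first use and reused afterwards.
inline bool randomBoolBernoulli() {
    thread_local std::mt19937 engine{std::random_device{}()};
    thread_local std::bernoulli_distribution coin(0.5);  // P(true) = 0.5
    return coin(engine);
}
```

Calling randomBoolBernoulli() then returns true roughly half of the time. Whether this is fast enough depends on the engine, which is what the answers below discuss.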

For the purpose of performance, at a price of less "randomness" than e.g. std::mt19937_64, you can use Xorshift+ to generate 64-bit numbers and then use the bits of those numbers as pseudo-random booleans.
Quoting Wikipedia:
This generator is one of the fastest generators passing BigCrush
Details: http://xorshift.di.unimi.it/ . There is a comparison table in the middle of the page, showing that mt19937_64 is 2 times slower and is systematic.
Below is sample code (the real code should wrap it in a class):
#include <cstdint>
#include <random>
using namespace std;
random_device rd;
/* The state must be seeded so that it is not everywhere zero. */
uint64_t s[2] = { (uint64_t(rd()) << 32) ^ (rd()),
(uint64_t(rd()) << 32) ^ (rd()) };
uint64_t curRand;
uint8_t bit = 63;
uint64_t xorshift128plus(void) {
uint64_t x = s[0];
uint64_t const y = s[1];
s[0] = y;
x ^= x << 23; // a
s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
return s[1] + y;
}
bool randBool()
{
if(bit >= 63)
{
curRand = xorshift128plus();
bit = 0;
return curRand & 1;
}
else
{
bit++;
return curRand & (uint64_t(1) << bit); // shift a 64-bit one; shifting an int by more than 31 bits is undefined
}
}

Some quick benchmarks (code):
647921509 RandomizerXorshiftPlus
821202158 BoolGenerator2 (reusing the same buffer)
1065582517 modified Randomizer
1130958451 BoolGenerator2 (creating a new buffer as needed)
1140139042 xorshift128plus
2738780431 xorshift1024star
4629217068 std::mt19937
6613608092 rand()
8606805191 std::bernoulli_distribution
11454538279 BoolGenerator
19288820587 std::uniform_int_distribution
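The benchmark code itself is only linked, not reproduced here, and the unit of the figures is not stated. As a rough sketch of how such a comparison could be timed, the helper below (timeBoolGenerator is a hypothetical name, not taken from the linked code) calls a generator a fixed number of times and reports the elapsed nanoseconds.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical timing harness: calls gen() `iterations` times and returns
// the elapsed time in nanoseconds. The number of true results is written
// to trueCount so that the calls have an observable effect.
template <class BoolGen>
std::int64_t timeBoolGenerator(BoolGen&& gen, std::uint64_t iterations,
                               std::uint64_t& trueCount) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        count += gen() ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    trueCount = count;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Any callable returning bool can be passed in, for example a lambda that wraps randBool(). Results will vary with compiler, optimization flags and machine.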
For those who want ready-to-use code, I present XorShift128PlusBitShifterPseudoRandomBooleanGenerator, a tweaked version of RandomizerXorshiftPlus from the above link. On my machine, it is about as fast as @SergeRogatch's solution, but consistently about 10-20% faster when the loop count is high (≳100,000), and up to ~30% slower with smaller loop counts.
#include <climits>
#include <cstdint>
#include <random>
class XorShift128PlusBitShifterPseudoRandomBooleanGenerator {
public:
bool randBool() {
if (counter == 0) {
counter = sizeof(GeneratorType::result_type) * CHAR_BIT;
random_integer = generator();
}
return (random_integer >> --counter) & 1;
}
private:
class XorShift128Plus {
public:
using result_type = uint64_t;
XorShift128Plus() {
std::random_device rd;
state[0] = rd();
state[1] = rd();
}
result_type operator()() {
auto x = state[0];
auto y = state[1];
state[0] = y;
x ^= x << 23;
state[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
return state[1] + y;
}
private:
result_type state[2];
};
using GeneratorType = XorShift128Plus;
GeneratorType generator;
GeneratorType::result_type random_integer;
int counter = 0;
};

A way would be to generate just one unsigned long long for every 64 random calls, as stated in the comments. An example:
#include <limits>
#include <random>
class Randomizer
{
public:
    Randomizer() : gen(rd()), randomizer(0, std::numeric_limits<unsigned long long>::max()), m_rand(0), counter(0) {}
    bool RandomBool()
    {
        if (!counter)
        {
            // Reuse one engine: the distribution takes the engine by non-const
            // reference, so a temporary engine does not compile portably.
            m_rand = randomizer(gen);
            counter = sizeof(unsigned long long) * 8;
        }
        return (m_rand >> --counter) & 1;
    }
private:
    std::random_device rd;
    std::mt19937 gen;
    std::uniform_int_distribution<unsigned long long> randomizer;
    unsigned long long m_rand;
    int counter;
};

I would prefill a (long enough) circular buffer of 64-bit random values, and then very quickly take one bit at a time whenever a random boolean value is needed.
#include <stdint.h>
class BoolGenerator {
private:
static const int BUFFER_SIZE = 65536;
uint64_t randomBuffer[BUFFER_SIZE];
uint64_t mask;
int counter;
void advanceCounter() {
counter++;
if (counter == BUFFER_SIZE) {
counter = 0;
}
}
public:
BoolGenerator() {
//HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
mask = 1;
counter = 0;
}
bool generate() {
mask <<= 1;
if (!mask) { //After 64 shifts the mask becomes zero
mask = 1;//reset mask
advanceCounter();//get the next value in the buffer
}
return randomBuffer[counter] & mask;
}
};
Of course the class can be made general to the buffer size, the random generator, the base type (doesn't necessarily have to be uint64_t) etc.
Accessing the buffer only once every 64 calls:
#include <stdint.h>
#include <limits>
#include <random>
class BoolGenerator {
private:
static const int BUFFER_SIZE = 65536;
uint64_t randomBuffer[BUFFER_SIZE];
uint64_t currValue;
int bufferCounter;
int bitCounter;
void advanceBufferCounter() {
bufferCounter++;
if (bufferCounter == BUFFER_SIZE) {
bufferCounter = 0;
}
}
void getNextValue() {
currValue = randomBuffer[bufferCounter];
bitCounter = sizeof(uint64_t) * 8;
advanceBufferCounter();
}
//HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
void initializeBuffer() {
//Anything will do, taken from here: http://stackoverflow.com/a/19728404/2436175
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<uint64_t> uni(0,std::numeric_limits<uint64_t>::max());
for (int i = 0; i < BUFFER_SIZE; i++ ) {
randomBuffer[i] = uni(rng);
}
}
public:
BoolGenerator() {
initializeBuffer();
bufferCounter = 0;
getNextValue();
}
bool generate() {
if (!bitCounter) {
getNextValue();
}
//A variation of other methods seen around
bitCounter--;
bool retVal = currValue & 0x01;
currValue >>= 1;
return retVal;
}
};
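The remark above that the class can be made general to the buffer size, the random generator and the base type could be sketched as follows. BufferedBoolGenerator and its parameter names are hypothetical; the sketch assumes Word is an unsigned integer type accepted by std::uniform_int_distribution and that bufferSize is at least 1.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical generalization of the prefilled-buffer idea.
// Word:   unsigned integer type holding the random bits.
// Engine: any standard random engine used once, to fill the buffer.
template <class Word = std::uint64_t, class Engine = std::mt19937_64>
class BufferedBoolGenerator {
public:
    explicit BufferedBoolGenerator(std::size_t bufferSize = 65536,
                                   Engine engine = Engine(std::random_device{}()))
        : buffer_(bufferSize) {
        std::uniform_int_distribution<Word> dist;  // defaults to [0, max]
        for (Word& w : buffer_) w = dist(engine);
    }

    bool generate() {
        if (bitsLeft_ == 0) {  // current word is used up, fetch the next one
            current_ = buffer_[index_];
            index_ = (index_ + 1) % buffer_.size();  // wrap around (circular buffer)
            bitsLeft_ = static_cast<unsigned>(sizeof(Word) * CHAR_BIT);
        }
        const bool bit = (current_ & 1u) != 0;
        current_ >>= 1;
        --bitsLeft_;
        return bit;
    }

private:
    std::vector<Word> buffer_;
    std::size_t index_ = 0;
    Word current_ = 0;
    unsigned bitsLeft_ = 0;
};
```

Because the buffer is circular, the bit sequence repeats after bufferSize times the number of bits per word; with the defaults that is 65536 * 64 bits. Using std::vector rather than a member array also keeps the buffer off the stack.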

Unless you have further constraints on the randomness you need, the fastest way to generate a random bool is:
bool RandomBool() { return false; }
To be more specific, there are thousands of ways to generate random boolean numbers, all satisfying different constraints, and many of them do not deliver "truly" random numbers (that includes all the other answers so far). The word "random" alone does not tell anyone what properties you really need.

If performance is your only criterion, then the answer is:
bool get_random()
{
return true; // chosen by fair coin flip.
// guaranteed to be random.
}
Unfortunately, the entropy of this random number is zero, but the performance is quite fast.
Since I suspect that this random number generator is not very useful to you, you will need to quantify how random you want your booleans to be. How about a cycle length of 2048? One million? 2^19937-1? Until the end of the universe?
I suspect that, since you explicitly stated that performance is your utmost concern, then a good old fashioned linear congruential generator might be "good enough". Based on this article, I'm guessing that this generator's period is around 32*((2^31)-5), or about 68 trillion iterations. If that's not "good enough", you can drop in any C++11 compatible generator you like instead of minstd_rand.
For extra credit, and a small performance hit, modify the below code to use the biased coin algorithm to remove bias in the generator.
#include <climits>
#include <iostream>
#include <random>
bool get_random()
{
typedef std::minstd_rand generator_type;
typedef generator_type::result_type result_type;
static generator_type generator;
static unsigned int bits_remaining = 0;
static result_type random_bits;
if ( bits_remaining == 0 )
{
random_bits = generator();
bits_remaining = sizeof( result_type ) * CHAR_BIT - 1;
}
return ( ( random_bits & ( static_cast<result_type>( 1 ) << bits_remaining-- ) ) != 0 );
}
int main()
{
for ( unsigned int i = 0; i < 1000; i++ )
{
std::cout << " Choice " << i << ": ";
if ( get_random() )
std::cout << "true";
else
std::cout << "false";
std::cout << std::endl;
}
}

If performance is important, perhaps it's a good idea to generate a 32-bit random number and use each separate bit of it, something like this:
bool getRandBool() {
    static uint32_t randomnumber;
    static int i=0;
    if (i==0) {
        randomnumber = <whatever your favorite random number generator is>;
        i=32;
    }
    // shift the wanted bit down to position 0; avoids the signed overflow of 1<<31
    return (randomnumber >> --i) & 1u;
}
This way the generation only impacts every 32nd call.
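As a concrete variant of the function above, the placeholder could be filled with std::mt19937, which produces 32 random bits per call. The engine choice and the name getRandBoolMt are assumptions for illustration; like the original, this sketch relies on function-local statics and is not thread-safe.

```cpp
#include <cstdint>
#include <random>

// Hypothetical concrete version: one engine call supplies 32 booleans.
bool getRandBoolMt() {
    static std::mt19937 engine{std::random_device{}()};  // assumed generator choice
    static std::uint32_t randomNumber = 0;
    static int bitsLeft = 0;
    if (bitsLeft == 0) {
        randomNumber = static_cast<std::uint32_t>(engine());
        bitsLeft = 32;
    }
    return ((randomNumber >> --bitsLeft) & 1u) != 0;
}
```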

I think the best way is to use a precalculated random array:
uint8_t g_rand[UINT16_MAX];
bool InitRand()
{
for (size_t i = 0, n = UINT16_MAX; i < n; ++i)
g_rand[i] = ::rand() & 1;
return true;
}
bool g_inited = InitRand();
inline const uint8_t * Rand()
{
return g_rand + (::rand()&INT16_MAX);
}
It can be used to fill some array dst[size]:
const size_t size = 10000;
bool dst[size];
for (size_t i = 0; i < size; i += INT16_MAX)
memcpy(dst + i, Rand(), std::min<size_t>(INT16_MAX, size - i));
Of course, you can initialize the precalculated array using another random function.

Apparently I have to add another answer. I just figured out that, starting with the Ivy Bridge architecture, Intel added the RdRand CPU instruction, and AMD added it later, in June 2015. So if you are targeting a processor that is new enough and don't mind using (inline) assembly, the fastest way to generate random bools should be to call the RdRand CPU instruction to get a 64-bit random number as described here (scroll to approximately the middle of the page for code examples). At that link there is also a code example for checking the current CPU for RdRand support; see also Wikipedia for an explanation of how to do this with the CPUID instruction. Then use the bits of that number for booleans as described in my Xorshift+ based answer.

Related

How can I approach this CP task?

The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S_1 = A) of N corals. The size of every subsequent coral (S_i, where i > 1) is calculated using the formula (B*S_(i-1) + C) % D, where A, B, C and D are some constants. I am told that Nemo is nearby the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 10^7
0 ≤ A < D ≤ 10^18
1 ≤ C, B*D ≤ 10^18
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst case scenario I will need 10^7 * 8 B, which is about 76 MB.
The solution, if the memory available were at least 80 MB, would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
@Christian Sloper
#include <functional>
#include <iostream>
#include <queue>
#include <vector>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted. It passes all 40 tests; the largest time is 1.4 seconds, for a test with T=3 and D≤10^9. The largest time for a test with larger D (and thus T=1) is 0.7 seconds.
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60-bit number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high-20-bits pattern occurs. After that, add up the counts in increasing pattern order until you reach K. The pattern where you reach K covers the K-th smallest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
(B*S_(i-1) + C) % D
requires only the previous value, S_(i-1). So you can compute the values in pairs and store only half of them. You only need to index the even values and iterate once to get an odd value, so you can use a half-length LUT and compute the odd values on the fly. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can use a longer stride, such as 4 or 8, too. If the dataset is large, RAM latency is significant, so this is like placing checkpoints throughout the data to balance memory use against CPU work. Checkpoints 1000 elements apart would spend many CPU cycles on recalculation, but the array would then fit in the CPU's L1/L2 cache, which is not bad. When sorting, the maximum number of recalculation iterations per element would then be 1000. O(1000 x size) may have a big constant, but maybe the compiler can optimize it somewhat if some of the constants are truly constant.
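The checkpoint idea could be sketched as below. CheckpointedSequence is a hypothetical helper, indices are 0-based here (S_0 = a), and stride is assumed to be at least 1. It stores every stride-th value and recomputes the ones in between on demand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read-only view of the sequence S_0 = a,
// S_i = (b * S_(i-1) + c) % d that keeps only every `stride`-th value.
class CheckpointedSequence {
public:
    CheckpointedSequence(std::size_t n, std::size_t stride,
                         std::int64_t a, std::int64_t b,
                         std::int64_t c, std::int64_t d)
        : stride_(stride), b_(b), c_(c), d_(d) {
        std::int64_t s = a;
        for (std::size_t i = 0; i < n; ++i) {
            if (i % stride_ == 0) checkpoints_.push_back(s);
            s = step(s);
        }
    }

    // Returns S_i, using at most stride - 1 extra iterations.
    std::int64_t at(std::size_t i) const {
        std::int64_t s = checkpoints_[i / stride_];
        for (std::size_t k = 0; k < i % stride_; ++k) s = step(s);
        return s;
    }

private:
    std::int64_t step(std::int64_t s) const { return (b_ * s + c_) % d_; }

    std::size_t stride_;
    std::int64_t b_, c_, d_;
    std::vector<std::int64_t> checkpoints_;
};
```

With stride 2 this stores half of the values, matching the pair idea above. Note that the view is read-only: algorithms such as std::nth_element that swap elements in place cannot run on it directly, so it only illustrates the memory versus recomputation trade-off.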
If CPU performance becomes problem again:
write a compiling function that writes your source code with all the "constant" given by user to a string
compile the code using command-line (assuming target computer has some accessible from command line like g++ from main program)
run it and get results
Compiler should enable more speed/memory optimizations when those are really constant in compile-time rather than depending on std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for(int i = 0; i < key % 16; ++i) // key % 16 steps from the checkpoint to the requested index
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(n log K) time.
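The steps above can be sketched with std::priority_queue, which is a max-heap by default. The function name kthSmallest and the nextValue callback are hypothetical; the callback is assumed to yield the n input values one at a time, and k is assumed to satisfy 1 <= k <= n.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>

// Hypothetical helper: returns the k-th smallest (1-based) of n streamed
// values while keeping at most k of them in memory.
std::int64_t kthSmallest(std::size_t n, std::size_t k,
                         const std::function<std::int64_t()>& nextValue) {
    std::priority_queue<std::int64_t> heap;  // largest kept value at the top
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t e = nextValue();
        if (heap.size() < k) {
            heap.push(e);            // still filling the first k elements
        } else if (e < heap.top()) {
            heap.pop();              // e displaces the current k-th smallest
            heap.push(e);
        }                            // otherwise e is discarded
    }
    return heap.top();
}
```

For this task K can be as large as 10^7, so the heap alone may need about as much memory as the full array. That is why the heap code in the question switches to a min-heap of size N - K + 1 when K is past the midpoint, which keeps the heap at roughly half of N at most.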
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}

c++ My random gen randomly gens something but it does it in the same order every time u run it

So I've made a MAC address generator, but the random part is very strange. It randomly generates a number that I use to choose something from an array, but each time you run the exe it generates the same numbers.
Here is my code
#include <random>
#include <string>
//Mac Addr example -> 82-F5-4D-72-C1-EA
//6 two char sets
//Dont include spaces/dashes/dots
std::string chars[] = { "A","B","C","D","E","F" };
int nums[] = { 0,1,2,3,4,5,6,7,8,9 };
std::string GenMacAddr()
{
std::string final;
std::string CharSet;
int choice;
for (int i = 0; i < 6; i++) {
choice = 1 + rand() % 4;
if (choice == 1) { //Char Set only int
for (int x = 0; x < 2; x++) { //Makes action happen twice
final += std::to_string(nums[rand()%10]);
}
}
else if (choice == 2) { //Char set only str
for (int x = 0; x < 2; x++) { //Makes action happen twice
final += chars[rand() % 6];
}
}
else if (choice == 3) {
final += chars[rand() % 6];
final += std::to_string(nums[rand() % 10]);
}
else if (choice == 4) {
final += std::to_string(nums[rand() % 10]);
final += chars[rand() % 6] ;
}
}
return final;
}
rand() is a deterministic random number generator. In order to achieve actual pseudo-random results you should first seed it with something like srand(time(NULL)).
If you look around, you will realize that this is a bad approach and that you should give up rand() altogether and instead use <random> from C++11. Stephan T. Lavavej has a really nice talk about it; you should see it here.
Here is also the code snippet he recommends from that talk.
#include <random>
#include <iostream>
int main() {
std::random_device random_dev; // (Non?) deterministic random number generator
std::mt19937 mers_t(random_dev()); // Seed mersenne twister with it .
std::uniform_int_distribution<int> distribution(0, 100); // Bound the output.
// Print a random integer in the range [0,100] ( included ) .
std::cout << distribution(mers_t) << '\n';
}
EDIT:
As François noted, std::random_device isn't required to be non-deterministic and it's actually implementation dependent.
One indication to tell if it is or not is by checking the value of entropy() method call but then again some implementations return just a fixed value. In that case you might consider using std::chrono to generate a seed the way Ted describes.
rand() is a pseudo random number generator. It will generate numbers according to an algorithm designed to have a long period (before it starts repeating itself) - but it needs a starting point. This is called the seed. You seed rand() with the srand() function and you should only seed it once during the program's execution. Seeding is often done with time but since time commonly returns whole seconds (since the epoch) you risk using the same seed if you start the program more than once (within a second).
You could instead use std::random_device to generate the seed if it has entropy and use a time based seed only as a fallback.
Example:
#include <cstdlib>
#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <type_traits>
unsigned seed() { // a function to generate a seed
std::random_device rd;
if(rd.entropy() > 0.) return rd(); // if random_device has entropy, use it
// fallback, use duration since the epoch
auto dse = (std::chrono::steady_clock::now() -
std::chrono::steady_clock::time_point{}).count();
return static_cast<std::make_unsigned_t<decltype(dse)>>(dse);
}
std::string GenMacAddr() {
static const char chars[] = {'0','1','2','3','4','5','6','7',
'8','9','A','B','C','D','E','F'};
std::string result(6 * 2, ' ');
for(char& ch : result) ch = chars[std::rand() % std::size(chars)];
return result;
}
int main() {
std::srand(seed()); // seed rand()
// generate 10 mac addresses
for(int i = 0; i < 10; ++i) std::cout << GenMacAddr() << '\n';
}
That said, you could use one of the more modern pseudo random number generators, like std::mt19937, instead of rand() and use std::uniform_int_distribution instead of the modulus operation:
template<class PRNG = std::mt19937>
auto& prng() {
// same seed() function as in the previous example:
thread_local PRNG prng_instance(seed());
return prng_instance;
}
std::string GenMacAddr() {
static const char chars[] = {'0','1','2','3','4','5','6','7',
'8','9','A','B','C','D','E','F'};
thread_local std::uniform_int_distribution<unsigned> dist(0, std::size(chars) - 1);
std::string result(6 * 2, ' ');
for(char& ch : result) ch = chars[dist(prng())];
return result;
}

What is the fastest method to create uniform random numbers in c++

I need to generate uniform random numbers in a for loop. The for loop runs 1000000 times, and inside it there is another for loop that runs 2000 times, so I generate 2*10^9 uniform random numbers. I use the method below:
#include <chrono>
#include <functional>
#include <random>
using namespace std;
// eta and theta are assumed to be defined elsewhere in the program.
double zeta;
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
auto uniform_rand = bind(uniform_real_distribution<double>(0,1), mt19937(seed));
for(int j=0; j<1000000; j++) {
    for(int i=0; i<2000; i++) {
        zeta=-eta/2.0+uniform_rand()*eta; // in the range (-eta/2, +eta/2)
        theta[i]+=zeta;
    }
}
It's almost the same as yours. I just don't see any need for a binder or a lambda.
EDIT: I also changed the generator from mt19937 to minstd_rand, which made the code 88 times faster. With mt19937 the code has approximately the same performance as the one in the question.
#include <random>
int main() {
/*std::mt19937 gen(std::random_device{}());*/
// Changing mt19937 to minstd_rand makes the code run 88 times faster!
std::minstd_rand gen(std::random_device{}());
std::uniform_real_distribution<double> dist(0, 1);
for(unsigned int i = 0; i < 1000000; ++i) {
for(unsigned int j = 0; j < 2000; ++j) {
double anotherRandomNumber = dist(gen);
// Do whatever you want with generated random number.
}
}
}
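The size of the speedup will depend on the compiler, optimization flags, and standard library, so it is worth measuring on your own machine. Below is a minimal timing sketch; the helper name sum_uniform_draws and its parameters are hypothetical and not part of the answer above. It returns the sum of the draws so that the optimizer cannot discard the loop, which might otherwise distort the comparison.

```cpp
#include <chrono>
#include <cstddef>
#include <random>

// Hypothetical helper: draws `count` doubles from a uniform distribution on
// [0, 1) using Engine, stores the elapsed wall time in `seconds`, and returns
// the sum of the draws so the loop has an observable result.
template <class Engine>
double sum_uniform_draws(std::size_t count, unsigned seed, double& seconds) {
    Engine gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += dist(gen);
    }
    const auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return sum;
}
```

Calling sum_uniform_draws<std::mt19937>(n, 42u, t1) and sum_uniform_draws<std::minstd_rand>(n, 42u, t2) with the same n, then comparing t1 and t2, gives a rough ratio for your setup.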
Use this algorithm from here:
uint64_t s[2] = { 0x41, 0x29837592 };
static inline uint64_t rotl(const uint64_t x, int k) {
return (x << k) | (x >> (64 - k));
}
uint64_t next(void) {
const uint64_t s0 = s[0];
uint64_t s1 = s[1];
const uint64_t result = s0 + s1;
s1 ^= s0;
s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
s[1] = rotl(s1, 36); // c
return result;
}
double uniform() {
return next()*(1.0/18446744073709551616.0);
}
This is 4 times faster than your example on my machine.
Note: you need to seed s, perhaps with std::random_device.
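A minimal sketch of the seeding step mentioned in the note, assuming std::random_device is usable on your platform; seed_state and to_unit_double are hypothetical names, not part of the algorithm above. The second helper is an optional variant of uniform(): multiplying a full 64-bit value by 2^-64 can round up to exactly 1.0 for the largest outputs, whereas using only the top 53 bits keeps the result strictly below 1.0.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: fills the two-word state from std::random_device,
// retrying in the unlikely case that both words come out zero.
inline void seed_state(std::uint64_t (&state)[2]) {
    std::random_device rd;
    do {
        state[0] = (std::uint64_t(rd()) << 32) ^ rd();
        state[1] = (std::uint64_t(rd()) << 32) ^ rd();
    } while (state[0] == 0 && state[1] == 0);
}

// Hypothetical alternative to uniform(): keeps the top 53 bits of x and
// scales them by 2^-53, so the result lies in [0, 1).
inline double to_unit_double(std::uint64_t x) {
    return static_cast<double>(x >> 11) * (1.0 / 9007199254740992.0);
}
```

With these, calling seed_state(s) once at startup and then to_unit_double(next()) for each number yields values in [0, 1).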

c/c++ - random number between 0 and 1 without using rand() [duplicate]

I want to generate (pseudo) random numbers between 0 and some integer. I don't mind if they aren't too random. I have access to the current time of the day but not the rand function. Can anyone think of a sufficiently robust way to generate these? Perhaps, discarding some bits from time of day and taking modulo my integer or something?
I am using c.
If you're after an ultra-simple pseudo-random generator, you can just use a Linear Feedback Shift Register.
The Wikipedia article has some code snippets for you to look at, but basically the code for a 16-bit generator will look something like this (lightly massaged from that page...)
unsigned short lfsr = 0xACE1u;
unsigned bit;
unsigned rand()
{
bit = ((lfsr >> 0) ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5) ) & 1;
return lfsr = (lfsr >> 1) | (bit << 15);
}
For "not too random" integers, you could start with the current UNIX time, then use the recursive formula r = ((r * 7621) + 1) % 32768;. The nth random integer between 0 (inclusive) and M (exclusive) would be r % M after the nth iteration.
This is called a linear congruential generator.
The recursion formula is what bzip2 uses to select the pivot in its quicksort implementation. I wouldn't know about other purposes, but it works pretty well for this particular one...
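The recursion above can be sketched as a small struct. This is only an illustration written in C++ (it ports to C with little change); the name SimpleLcg and its interface are hypothetical. Note that r % M is slightly biased when M does not divide 32768, which is in keeping with the "not too random" requirement.

```cpp
// Hypothetical wrapper around the recursion r = (r * 7621 + 1) % 32768.
struct SimpleLcg {
    unsigned long r;  // current state, always kept below 32768

    // Seed with any value, for example the current UNIX time.
    explicit SimpleLcg(unsigned long seed) : r(seed % 32768ul) {}

    // Advances the state and returns a value in [0, m); m must be nonzero.
    unsigned long next(unsigned long m) {
        r = (r * 7621ul + 1ul) % 32768ul;
        return r % m;
    }
};
```

Because the state stays below 32768, the product r * 7621 fits comfortably in an unsigned long, which is at least 32 bits wide.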
Look at implementing a pseudo-random generator (what's "inside" rand()) of your own, for instance the Mersenne twister is highly-regarded.
#include <chrono>
int get_rand(int lo, int hi) {
auto moment = std::chrono::steady_clock::now().time_since_epoch().count();
int num = moment % (hi - lo + 1);
return num + lo;
}
The only "robust" (not easily predictable) way of doing this is writing your own pseudo-random number generator and seeding it with the current time. Obligatory wikipedia link: http://en.wikipedia.org/wiki/Pseudorandom_number_generator
You can get the "Tiny Mersenne Twister" here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html
it is pure c and simple to use. E.g. just using time:
#include "tinymt32.h"
// And if you can't link:
#include "tinymt32.c"
#include <time.h>
#include <stdio.h>
int main(int argc, const char* argv[])
{
tinymt32_t state;
uint32_t seed = time(0);
tinymt32_init(&state, seed);
for (int i=0; i<10; i++)
printf("random number %d: %u\n", i, (unsigned int)tinymt32_generate_uint32(&state));
}
The smallest and simplest random generator that works with ranges is given below, with a fully working example.
unsigned int MyRand(unsigned int start_range,unsigned int end_range)
{
static unsigned int rand = 0xACE1U; /* Any nonzero start state will work. */
/*check for valid range.*/
if(start_range == end_range) {
return start_range;
}
/*get the random in end-range.*/
rand += 0x3AD;
rand %= end_range;
/*get the random in start-range.*/
while(rand < start_range){
rand = rand + end_range - start_range;
}
return rand;
}
int main(void)
{
int i;
for (i = 0; i < 0xFF; i++)
{
printf("%u\t",MyRand(10,20));
}
return 0;
}
If you're not generating your numbers too fast (*1) and your upper limit is low enough (*2) and your "time of day" includes nanoseconds, just use those nanoseconds.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int nanorand(void) {
struct timespec p[1];
clock_gettime(CLOCK_MONOTONIC, p);
return p->tv_nsec % 1000;
}
int main(void) {
int r, x;
for (;;) {
r = nanorand();
do {
printf("please type %d (< 50 quits): ", r);
fflush(stdout);
if (scanf("%d", &x) != 1) exit(EXIT_FAILURE);
} while (x != r);
if (r < 50) break;
}
puts("");
return 0;
}
And a sample run ...
please type 769 (< 50 quits): 769
please type 185 (< 50 quits): 185
please type 44 (< 50 quits): 44
(*1) if you're using them interactively, one at a time
(*2) if you want numbers up to about 1000
You can write your own rand() function. Like:
Method 1: Using the Concept of static variable:
example code:
int random_number_gen(int min_range, int max_range){
static long long rand_number = 199198; // any seed value; long long avoids int overflow when squaring
rand_number = ((rand_number * rand_number) / 10 ) % 9890;
return (int)(rand_number % (max_range+1-min_range)) + min_range ;
}
Method 2. Using a random/unique value, for example, the current time in microseconds.
#include <chrono>
#include <cstdint>
using namespace std;
uint64_t timeSinceEpochMicrosec() {
using namespace std::chrono;
return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
}
int random_number_gen(int min_range, int max_range){
long long int current_time = timeSinceEpochMicrosec();
int current_time_in_sec = current_time % 10000000;
int rand_number = current_time_in_sec % (max_range+1-min_range) + min_range ;
return rand_number;
}
import java.io.*;
public class random{
public static class p{
}
static long reg=0;
static long lfsr()
{
if(reg==0)
{
reg=145896027340307l;
}
long bit=(reg>>0^reg>>2^reg>>3^reg>>5)&1;
reg=reg>>1|bit<<62;
return reg;
}
static long getRand()
{
String s=String.valueOf(new p());
//System.out.println(s);
long n=0;
lfsr();
for(int i=0;i<s.length();i++)
{
n=n<<8|+s.charAt(i);
}
System.out.print(n+" "+System.currentTimeMillis()+" "+reg+" ");
n=n^System.currentTimeMillis()^reg;
return n;
}
public static void main(String args[])throws IOException
{
for(int i=0;i<400;i++)
{
System.out.println(getRand());
}
}
}
This is a random number generator designed so that the sequence does not repeat itself. I have combined the current time and an object's string value (assigned unpredictably by Java) with an LFSR.
Advantages:
The sequence doesn't repeat itself
The sequence is new on every run
Disadvantages:
Only compatible with Java. In C++, a newly created object is the same on every run.
Even there, though, the time and LFSR parameters would add enough randomness.
It is slower than most PRNGs because an object needs to be created every time a number is needed.
#include <stdio.h>
#include <time.h>
int main(){
int num;
time_t sec;
sec=time(NULL);
printf("Enter the Range under which you want Random number:\n");
scanf("%d",&num);
if(num>0)
{
for(;;)
{
sec=sec%3600;
if(num>=sec)
{
printf("%ld\n",sec);
break;
}
sec=sec%num;
}
}
else
{
printf("Please Enter Positive Value!\n");
}
return 0;
}
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
int main()
{
unsigned int x,r,i;
// number of random numbers you want to generate
scanf("%u",&x);
// range of the random numbers
scanf("%u",&r);
// note: the malloc'ed memory is never initialized, so its contents are indeterminate
unsigned int *a=(unsigned int*)malloc(sizeof(unsigned int)*x);
for(i=0;i<x;i++)
printf("%u ",(a[i]%r)+1);
free(a);
getch();
return 0;
}
One of the simplest random number generators that does not always return the same value:
uint16_t simpleRand(void)
{
static uint16_t r = 5531; // the start value does not really matter
r+=941; // the increment must be coprime to 2^16 so that all values are visited
return r;
}
If you don't want the sequence to always start with the same value, you could set the start value from the current time.

Generate all combinations in bit version

I'd like to generate all possible combinations (without repetitions) in bit representation. I can't use any library like boost or stl::next_combination - it has to be my own code (computation time is very important).
Here's my code (modified from another Stack Overflow user's code):
int combination = (1 << k) - 1;
int new_combination = 0;
int change = 0;
while (true)
{
// return next combination
cout << combination << endl;
// find first index to update
int indexToUpdate = k;
while (indexToUpdate > 0 && GetBitPositionByNr(combination, indexToUpdate)>= n - k + indexToUpdate)
indexToUpdate--;
if (indexToUpdate == 1) change = 1; // move all bits to the left by one position
if (indexToUpdate <= 0) break; // done
// update combination indices
new_combination = 0;
for (int combIndex = GetBitPositionByNr(combination, indexToUpdate) - 1; indexToUpdate <= k; indexToUpdate++, combIndex++)
{
if(change)
{
new_combination |= (1 << (combIndex + 1));
}
else
{
combination = combination & (~(1 << combIndex));
combination |= (1 << (combIndex + 1));
}
}
if(change) combination = new_combination;
change = 0;
}
where n is the total number of elements and k is the number of elements in a combination.
GetBitPositionByNr returns the position of the k-th set bit.
For example, GetBitPositionByNr(13,2) = 3, because 13 is 1101 and the second set bit is at the third position.
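The question does not show GetBitPositionByNr itself. A minimal sketch consistent with the example above might look like the following; it assumes positions are 1-based and counted from the least significant bit, and the return value of 0 for a missing bit is also an assumption.

```cpp
// Hypothetical reconstruction: returns the 1-based position (counted from the
// least significant bit) of the nr-th set bit of value, or 0 if value has
// fewer than nr set bits.
inline int GetBitPositionByNr(unsigned value, int nr) {
    int seen = 0;
    for (int pos = 1; value != 0; ++pos, value >>= 1) {
        if ((value & 1u) && ++seen == nr) {
            return pos;
        }
    }
    return 0;
}
```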
It gives me correct output for n=4, k=2 which is:
0011 (3 - decimal representation - printed value)
0101 (5)
1001 (9)
0110 (6)
1010 (10)
1100 (12)
It also gives me correct output for k=1 and k=4, but gives me wrong output for k=3, which is:
0111 (7)
1011 (11)
1011 (9) - wrong, should be 13
1110 (14)
I guess the problem is in the inner (second) while condition, but I don't know how to fix it.
Maybe some of you know a better (faster) algorithm to do what I want to achieve? It can't use additional memory (arrays).
Here is code to run on ideone: IDEONE
When in doubt, use brute force. That is, generate all variations with repetition, then filter out the unnecessary patterns:
unsigned bit_count(unsigned n)
{
unsigned i = 0;
while (n) {
i += n & 1;
n >>= 1;
}
return i;
}
int main()
{
std::vector<unsigned> combs;
const unsigned N = 4;
const unsigned K = 3;
for (int i = 0; i < (1 << N); i++) {
if (bit_count(i) == K) {
combs.push_back(i);
}
}
// and print 'combs' here
}
Edit: Someone else already pointed out a solution without filtering and brute force, but I'm still going to give you a few hints about this algorithm:
Most compilers offer some sort of intrinsic population count function. GCC and Clang, for example, provide __builtin_popcount(). Using this intrinsic, I was able to double the speed of the code.
Since you seem to be working on GPUs, you can parallelize the code. I have done it using C++11's standard threading facilities, and I've managed to compute all 32-bit repetitions for arbitrarily-chosen popcounts 1, 16 and 19 in 7.1 seconds on my 8-core Intel machine.
Here's the final code I've written:
#include <vector>
#include <cstdio>
#include <thread>
#include <utility>
#include <future>
unsigned popcount_range(unsigned popcount, unsigned long min, unsigned long max)
{
unsigned n = 0;
for (unsigned long i = min; i < max; i++) {
n += __builtin_popcount(i) == popcount;
}
return n;
}
int main()
{
const unsigned N = 32;
const unsigned K = 16;
const unsigned N_cores = 8;
const unsigned long Max = 1ul << N;
const unsigned long N_per_core = Max / N_cores;
std::vector<std::future<unsigned>> v;
for (unsigned core = 0; core < N_cores; core++) {
unsigned long core_min = N_per_core * core;
unsigned long core_max = core_min + N_per_core;
auto fut = std::async(
std::launch::async,
popcount_range,
K,
core_min,
core_max
);
v.push_back(std::move(fut));
}
unsigned final_count = 0;
for (auto &fut : v) {
final_count += fut.get();
}
printf("%u\n", final_count);
return 0;
}
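For comparison, an approach that avoids filtering altogether can be sketched with a bit trick commonly known as Gosper's hack, which steps directly from one k-bit pattern to the next larger one. Whether this matches the solution referred to in the edit above is an assumption; the function names are hypothetical, and the vector is used only to show the results (the stepping loop itself needs no array).

```cpp
#include <vector>

// Returns the next larger value with the same number of set bits as x
// (x must be nonzero). This is the trick commonly known as Gosper's hack.
inline unsigned next_same_popcount(unsigned x) {
    const unsigned lowest = x & (~x + 1u);  // isolate the lowest set bit
    const unsigned ripple = x + lowest;     // carry into the next zero bit
    return (((ripple ^ x) >> 2) / lowest) | ripple;
}

// Hypothetical helper: collects all n-bit values with exactly k set bits in
// increasing order. Assumes 0 < k <= n < 32 and a 32-bit unsigned.
inline std::vector<unsigned> all_combinations(unsigned n, unsigned k) {
    std::vector<unsigned> out;
    const unsigned limit = 1u << n;
    for (unsigned c = (1u << k) - 1u; c < limit; c = next_same_popcount(c)) {
        out.push_back(c);
    }
    return out;
}
```

For n=4 and k=3 this visits 7, 11, 13 and 14, the sequence the question expects. The values come out in increasing numeric order, which differs from the order printed in the question for k=2.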