Various questions about RSA encryption - C++

I'm currently writing my own AES/RSA encryption program in C++ for Unix. I've been going through the literature for about a week now, and I've started to wrap my head around it all, but I'm still left with some pressing questions:
1) Based on my understanding, an RSA key in its most basic form is the combination of the product of the two primes used (R) and the exponents. It's obvious to me that storing the key in such a form in plaintext would defeat the purpose of encrypting anything at all. Therefore, in what form can I store my generated public and private keys? Ask the user for a password and do some "simple" shifting/replacing on the individual digits of the key with an ASCII table? Or is there some other standard I haven't run across? Also, when the keys are generated, are R and the respective exponent simply stored sequentially? i.e. ##primeproduct####exponent##? In that case, how would a decryption algorithm parse the key into the two separate values?
2) How would I go about programmatically generating the private exponent, given that I've decided to use 65537 as my public exponent for all encryptions? I've got the equation P*Q ≡ 1 (mod M), where P and Q are the exponents and M is the result of Euler's totient function. Is this simply a matter of generating random numbers and testing their relative primality to the public exponent until you hit pay dirt? I know you can't simply start from 1 and increment until you find such a number, as anyone could simply do the same thing and get your private exponent themselves.
3) When generating the character equivalence set, I understand that the numbers used in the set must be less than and relatively prime to P*Q. Again, this is a matter of testing relative primality of numbers to P*Q. Is the speed of testing relative primality independent of the size of the numbers you're working with? Or are special algorithms necessary?
Thanks in advance to anyone who takes the time to read and answer, cheers!

There are some standard formats for storing/exchanging RSA keys such as RFC 3447. For better or worse, most (many, anyway) use ASN.1 encoding, which adds more complexity than most people like, all by itself. A few use Base64 encoding, which is a lot easier to implement.
As far as what constitutes a key goes: in its most basic form, you're correct; the public key includes the modulus (usually called n) and an exponent (usually called e).
To compute a key pair, you start from two large prime numbers, usually called p and q. You compute the modulus n as p * q. You also compute a number (often called r) that's (p-1) * (q-1).
e is then a more or less randomly chosen number that's prime relative to r. Warning: you don't want e to be really small though -- log(e) >= log(n)/4 as a bare minimum.
You then compute d (the private decryption key) as a number satisfying the relation:
d * e = 1 (mod r)
You typically compute this using Euclid's algorithm, though there are other options (see below). Again, you don't want d to be really small either, so if it works out to a really small number, you probably want to try another value for e, and compute a new d to match.
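For reference, here's what that computation might look like with the extended Euclidean algorithm (a minimal sketch; the worked example below uses Gauss' method instead, and inverse_mod is my name for it):

// Sketch: find d with d * e = 1 (mod r) via the extended Euclidean algorithm.
// Assumes gcd(e, r) == 1; signed arithmetic, since intermediates go negative.
long long inverse_mod(long long e, long long r) {
    long long t = 0, new_t = 1;   // running Bezout coefficients of e
    long long g = r, new_g = e;   // running remainders
    while (new_g != 0) {
        long long q = g / new_g;
        long long tmp = t - q * new_t; t = new_t; new_t = tmp;
        tmp = g - q * new_g; g = new_g; new_g = tmp;
    }
    return t < 0 ? t + r : t;     // normalize the result into [0, r)
}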
There is another way to compute your e and d. You can start by finding some number K that's congruent to 1 mod r, then factor it. Put the prime factors together to get two factors of roughly equal size, and use them as e and d.
As far as an attacker computing your d goes: you need r to compute this, and knowing r depends on knowing p and q. That's exactly why/where/how factoring comes into breaking RSA. If you factor n, then you know p and q. From them, you can find r, and from r you can compute the d that matches a known e.
So, let's work through the math to create a key pair. We're going to use primes that are much too small to be effective, but should be sufficient to demonstrate the ideas involved.
So let's start by picking a p and q (of course, both need to be primes):
p = 9999991
q = 11999989
From those we compute n and r:
n = 119999782000099
r = 119999760000120
Next we need to either pick e or else compute K, then factor it to get e and d. For the moment, we'll go with your suggestion of e=65537 (since 65537 is prime, the only way it and r could fail to be relatively prime is if r were an exact multiple of 65537, which we can quite easily verify is not the case).
From that, we need to compute our d. We can do that fairly easily (though not necessarily very quickly) using the "Extended" version of Euclid's algorithm, (as you mentioned) Euler's Totient, Gauss' method, or any of a number of others.
For the moment, I'll compute it using Gauss' method:
// Greatest common divisor via Euclid's algorithm.
template <class num>
num gcd(num a, num b) {
    num r;
    while (b > 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Modular inverse of a mod p via Gauss' method; returns 0 if none exists.
template <class num>
num find_inverse(num a, num p) {
    num g, z;
    if (gcd(a, p) > 1) return 0;  // no inverse unless a and p are coprime
    z = 1;
    while (a > 1) {
        z += p;
        if ((g = gcd(a, z)) > 1) {
            a /= g;
            z /= g;
        }
    }
    return z;
}
The result we get is:
d = 38110914516113
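As a quick check, calling the template above with the example values reproduces that result:

// e = 65537, r = 119999760000120 from the toy primes above:
unsigned long long d = find_inverse(65537ULL, 119999760000120ULL);
// d == 38110914516113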
Then we can plug these into an implementation of RSA, and use them to encrypt and decrypt a message.
So, let's encrypt "Very Secret Message!". Using the e and n given above, that encrypts to:
74603288122996
49544151279887
83011912841578
96347106356362
20256165166509
66272049143842
49544151279887
22863535059597
83011912841578
49544151279887
96446347654908
20256165166509
87232607087245
49544151279887
68304272579690
68304272579690
87665372487589
26633960965444
49544151279887
15733234551614
And, using the d given above, that decrypts back to the original. Code to do the encryption/decryption (using hard-coded keys and modulus) looks like this:
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include <string>

typedef unsigned long long num;

const num e_key = 65537;
const num d_key = 38110914516113;
const num n = 119999782000099;

// Modular multiplication by shift-and-add, avoiding overflow of a * b.
template <class T>
T mul_mod(T a, T b, T m) {
    if (m == 0) return a * b;
    T r = T();
    while (a > 0) {
        if (a & 1)
            if ((r += b) >= m) r %= m;  // note: >=, so r stays in [0, m)
        a >>= 1;
        if ((b <<= 1) >= m) b %= m;
    }
    return r;
}

// Modular exponentiation by repeated squaring: O(log n) multiplications.
template <class T>
T pow_mod(T a, T n, T m) {
    T r = 1;
    while (n > 0) {
        if (n & 1)
            r = mul_mod(r, a, m);
        a = mul_mod(a, a, m);
        n >>= 1;
    }
    return r;
}

int main() {
    std::string msg = "Very Secret Message!";
    std::vector<num> encrypted;

    std::cout << "Original message: " << msg << '\n';

    // Encrypt one character at a time: c = m^e mod n.
    std::transform(msg.begin(), msg.end(),
                   std::back_inserter(encrypted),
                   [&](num val) { return pow_mod(val, e_key, n); });

    std::cout << "Encrypted message:\n";
    std::copy(encrypted.begin(), encrypted.end(),
              std::ostream_iterator<num>(std::cout, "\n"));
    std::cout << "\n";

    // Decrypt: m = c^d mod n.
    std::cout << "Decrypted message: ";
    std::transform(encrypted.begin(), encrypted.end(),
                   std::ostream_iterator<char>(std::cout, ""),
                   [](num val) { return pow_mod(val, d_key, n); });
    std::cout << "\n";
}
To have even a hope of security, you need to use a much larger modulus though--hundreds of bits at the very least (and perhaps a thousand or more for the paranoid). You could do that with a normal arbitrary precision integer library, or routines written specifically for the task at hand. RSA is inherently fairly slow, so at one time most implementations used code with lots of hairy optimization to do the job. Nowadays, hardware is fast enough that you can probably get away with a fairly average large-integer library fairly easily (especially since in real use, you only want to use RSA to encrypt/decrypt a key for a symmetrical algorithm, not to encrypt the raw data).
Even with a modulus of suitable size (and the code modified to support the large numbers needed), this is still what's sometimes referred to as "textbook RSA", and it's not really suitable for much in the way of real encryption. For example, right now, it's encrypting one byte of the input at a time. This leaves noticeable patterns in the encrypted data. It's trivial to look at the encrypted data above and see that the second and seventh words are identical--because both are the encrypted form of e (which also occurs a couple of other places in the message).
As it stands right now, this can be attacked as a simple substitution code. e is the most common letter in English, so we can (correctly) guess that the most common word in the encrypted data represents e (and relative frequencies of letters in various languages are well known). Worse, we can also look at things like pairs and triplets of letters to improve the attack. For example, if we see the same word twice in succession in the encrypted data, we know we're seeing a double letter, which can only be a few letters in normal English text. Bottom line: even though RSA itself can be quite strong, the way of using it shown above definitely is not.
To prevent that problem, with a (say) 512-bit key, we'd also process the input in 512-bit chunks. That means we only have a repetition if there are two places in the original input that are entirely identical for 512 bits at a time. Even if that happens, it's relatively difficult to guess where that would be, so although it's undesirable, it's not nearly as vulnerable as the byte-by-byte version shown above. In addition, you always want to pad the input to a multiple of the size being encrypted.
Reference
https://crypto.stackexchange.com/questions/1448/definition-of-textbook-rsa

Related

Problem with numbers and power of numbers

Problem:
In a given range (a, b) (a <= b, 2 <= a, b <= 1000000), find all natural numbers that can be expressed in the form x ^ n (where x and n are natural numbers). If a number can be expressed this way in more than one way, present it with the bigger exponent.
U1.txt (input):
40 110
Screen (output):
49 = 7^2; 64 = 2^6; 81 = 3^4; 100 = 10^2;
#include <iostream>
#include <fstream>
#include <cmath>

int Power(int number, int base);

int main()
{
    int a, b;
    std::ifstream fin("U1.txt");
    fin >> a >> b;
    fin.close();
    for (int i = a; i <= b; i++)
    {
        int max_power = 0;
        int min_base = 10;
        bool found = false;
        for (int j = 2; j <= 10; j++)
        {
            int power = Power(i, j);
            if (power > 0)
            {
                if (max_power < power) { max_power = power; }
                if (min_base > j) { min_base = j; }
                found = true;
            }
        }
        if (found)
        {
            std::cout << i << " = " << min_base << " ^ " << max_power << "; ";
        }
    }
    return 0;
}

int Power(int number, int base)
{
    int power = (log(number) / log(base) + 0.5);
    if (pow(base, power) == number)
    {
        return power;
    }
    return 0;
}
I solved the problem. However, I don't understand a few things:
How does the int Power(int number, int base) function work? Why is the log function used? Why is 0.5 added after dividing the two log functions? I found the idea on the Internet.
I am not sure if this solution works in all cases. I didn't know what the biggest possible value of the base could be, so my for (int j = 2; j <= 10; j++) loop goes from 2 to 10. If there is a number whose base is bigger, the solution won't work.
Are there any easier ways to solve this problem?
How does the function work?
That's something the OP should have asked the authors of that snippet (assuming it was copied verbatim or close).
The intent seems to be to check whether a whole number power exists such that, in combination with the integral arguments number and base, the following equation is satisfied:
number = base^power
The function returns it, or 0 if no such power exists (meaning that number is not an integral power of base). To do so, it uses a property of logarithms:
n = b^p
log(n) = p * log(b)
p = log(n) / log(b)
it rounds the number[1] to the "closest" integer, to avoid cases where the limited precision of floating-point types and operations would have yielded incorrect results with a simple truncation.
In the comments I've already made the example of std::log(1000)/std::log(10), which may produce a double result close to 3.0, but less than 3.0 (something like 2.9999999999999996). When stored in an int it would be truncated to 2.
It then checks whether the number found is the exact power which solves the previous equation, but that comparison has the same problems I mentioned before.
pow(base, power) == number // It compares a double with an int
Just like std::log, std::pow returns a double value, making all the calculations performed with those functions prone to subtle numerical errors (either by rounding or by accumulation when multiple operations are involved). It's often preferable to use integral types and operations, if possible, when accuracy (or absolute exactness[2]) is needed.
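For instance, an all-integer check (a sketch; is_power is my name, not from the original code) needs neither std::log nor std::pow:

// Returns true if base^exp == number, using integer arithmetic only.
// The division-based guard keeps the partial product from exceeding number.
bool is_power(int number, int base, int exp) {
    long long r = 1;
    while (exp-- > 0) {
        if (r > number / base) return false;  // r * base would exceed number
        r *= base;
    }
    return r == number;
}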
Is the algorithm correct?
I didn't know what could be the biggest value of the base number so my for loop is going from 2 to 10
That's just wrong. One of the constraints of the problem is b <= 1'000'000, but the posted solution can't find any power whose base is greater than 10 (for example, it misses 121 = 11^2).
An estimate of the greatest possible base is the square root of said b.
Are there any easier ways to solve this problem?
Easiness is subjective and we don't know all the requirements and constraints of OP's assignment. I'll describe an alternative solution without posting the code I wrote to test it[3].
OP's code considers all the numbers between a and b, checking for every base (well, up to 10) whether there exists a whole power.
My proposal uses only integral variables, of a wide enough type, say long (any 32-bit integer is enough).
The outer loop starts from base = 2 and increments it by one at every step.
Inside this loop, exponent is set to 2 and value to base * base.
If value is greater than b, the algorithm stops.
While value is less than a, update it (multiplying it by base) and the exponent (incrementing it by one). We need to find the first power of base which is greater than or equal to a.
While value is less than or equal to b, store the triplet of value, base and exponent in a suitable container.
Consider a std::map<long, std::pair<long, long>>: it lets us associate every value with the corresponding pair of base and exponent, and it can be traversed later to obtain all the values in ascending order.
The assignment requires, in case of multiple powers, presenting only the one with the bigger exponent. In the example, it shows 64 = 2^6, ignoring 64 = 4^3. Note that the needed one is also the one with the smaller base, so it's enough to ignore any further value if it's already present in the map.
value and exponent are updated as before.
Note that this algorithm only considers bases up to the square root of b (in the outer loop), and the number of iterations of the inner loop is much more limited (with base = 2 it would be fewer than 20, since 2^20 > 1'000'000; greater bases stop sooner and sooner).
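For concreteness, here is a minimal sketch of the algorithm just described (my reconstruction, not the withheld test code):

#include <iostream>
#include <map>
#include <utility>

int main() {
    const long a = 40, b = 110;  // example range from the problem statement
    std::map<long, std::pair<long, long>> powers;  // value -> (base, exponent)
    for (long base = 2; base * base <= b; ++base) {
        long exponent = 2, value = base * base;
        while (value < a) { value *= base; ++exponent; }  // first power >= a
        for (; value <= b; value *= base, ++exponent)
            powers.emplace(value, std::make_pair(base, exponent));
        // emplace never overwrites, so the entry from the smallest base
        // (i.e. the one with the largest exponent) is the one that is kept
    }
    for (const auto &p : powers)
        std::cout << p.first << " = " << p.second.first << "^" << p.second.second << "; ";
    std::cout << '\n';
}

Run against the sample input (40 110), this prints the expected four powers.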
[1] See e.g. Why do lots of (old) programs use floor(0.5 + input) instead of round(input)?
[2] See e.g. The most efficient way to implement an integer based power function pow(int, int)
[3] How do I ask and answer homework questions?

Return non-duplicate random values from a very large range

I would like a function that will produce k pseudo-random values from a set of n integers, zero to n-1, without repeating any previous result. k is less than or equal to n. O(n) memory is unacceptable because of the large size of n and the frequency with which I'll need to re-shuffle.
These are the methods I've considered so far:
Array:
Normally if I wanted duplicate-free random values I'd shuffle an array, but that's O(n) memory. n is likely to be too large for that to work.
long nextvalue(void) {
    static long array[4000000000];
    static long s = 0;  // long, since 4000000000 overflows int
    if (s == 0) {
        for (long i = 0; i < 4000000000; i++) array[i] = i;
        shuffle(array, 4000000000);
    }
    return array[s++];
}
n-state PRNG:
There are a variety of random number generators that can be designed so as to have a period of n and to visit n unique states over that period. The simplest example would be:
long nextvalue(void) {
    static long s = 0;
    static const long i = 1009; // assumed co-prime to n
    s = (s + i) % n;
    return s;
}
The problem with this is that it's not necessarily easy to design a good PRNG on the fly for a given n, and it's unlikely that that PRNG will approximate a fair shuffle if it doesn't have a lot of variable parameters (even harder to design). But maybe there's a good one I don't know about.
m-bit hash:
If the size of the set is a power of two, then it's possible to devise a perfect hash function f() which performs a 1:1 mapping from any value in the range to some other value in the range, where every input produces a unique output. Using this function I could simply maintain a static counter s, and implement a generator as:
long nextvalue(void) {
    static long s = 0;
    return f(s++);
}
This isn't ideal because the order of the results is determined by f(), rather than random values, so it's subject to all the same problems as above.
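For illustration (my example, not part of the question), one easy family of 1:1 mappings on an m-bit range is multiplication by an odd constant, which is invertible modulo a power of two:

#include <cstdint>

// Bijective on inputs 0 .. 2^20 - 1: multiplying by an odd constant is
// invertible modulo 2^20, so no two in-range inputs collide.
uint32_t f(uint32_t x) {
    const uint32_t MASK = (1u << 20) - 1;  // example: m = 20 bits
    return (x * 2654435761u) & MASK;       // Knuth's odd multiplicative constant
}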
NPOT hash:
In principle I can use the same design principles as above to define a version of f() which works in an arbitrary base, or even a composite, that is compatible with the range needed; but that's potentially difficult, and I'm likely to get it wrong. Instead a function can be defined for the next power of two greater than or equal to n, and used in this construction:
long nextvalue(void) {
    static long s = 0;
    long x = s++;
    do { x = f(x); } while (x >= n);
    return x;
}
But this still has the same problem (it is unlikely to give a good approximation of a fair shuffle).
Is there a better way to handle this situation? Or perhaps I just need a good function for f() that is highly parameterisable and easy to design to visit exactly n discrete states.
One thing I'm thinking of is a hash-like operation where I contrive to have the first j results perfectly random through carefully designed mapping, and then any results between j and k would simply extrapolate on that pattern (albeit in a predictable way). The value j could then be chosen to find a compromise between a fair shuffle and a tolerable memory footprint.
First of all, it seems unreasonable to discount anything that uses O(n) memory and then discuss a solution that refers to an underlying array. You have an array. Shuffle it. If that doesn't work or isn't fast enough, come back to us with a question about it.
You only need to perform a complete shuffle once. After that, draw from index n, swap that element with an element located randomly before it and increase n, modulo element count. For example, with such a large dataset I'd use something like this.
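The code originally linked isn't reproduced here, but one literal reading of the draw-and-swap step is the following (a sketch only, and it assumes the array does fit in memory):

#include <cstdlib>
#include <utility>
#include <vector>

// Draw the element at index n, swap it back to a random earlier position,
// then advance n modulo the element count.
long next_value(std::vector<long> &a, size_t &n) {
    long out = a[n];
    if (n > 0) std::swap(a[n], a[std::rand() % n]);
    n = (n + 1) % a.size();
    return out;
}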
Prime numbers are an option for hashes, but probably not in the way you think. Using two Mersenne primes (low and high, perhaps 0x1FFF and 0x7FFFFFFF) you should be able to come up with a much more general-purpose hashing algorithm.
size_t hash(unsigned char *value, size_t value_size, size_t low, size_t high) {
    size_t x = 0;
    while (value_size--) {
        x += *value++;
        x *= low;
    }
    return x % high;
}
#define hash(value, value_size, low, high) (hash((void *) value, value_size, low, high))
This should produce something fairly well distributed for all inputs larger than about two octets, with the minor troublesome exception of zero-byte prefixes. You might want to treat those differently.
So... what I've ended up doing is digging deeper into pre-existing methods to try to confirm their ability to approximate a fair shuffle.
I take a simple counter, which itself is guaranteed to visit every in-range value exactly once, and then 'encrypt' it with an n-bit block cypher. Rather, I round the range up to a power of two, and apply some 1:1 function; then if the result is out of range I repeat the permutation until the result is in range.
This can be guaranteed to complete eventually because there are only a finite number of out-of-range values within the power-of-two range, and they cannot enter into an always-out-of-range cycle, because that would imply that something in the cycle was mapped from two different previous states (one from the in-range set, and another from the out-of-range set), which would make the function not bijective.
So all I need to do is devise a parameterisable function which I can tune to an arbitrary number of bits. Like this one:
// BITS is the configurable width of the permutation, and BITMASK the
// corresponding all-ones mask; both are assumed to be defined elsewhere.
uint64_t mix(uint64_t x, uint64_t k) {
    const int s0 = BITS * 4 / 5;
    const int s1 = BITS / 5 + (k & 1);
    const int s2 = BITS * 2 / 5;
    k |= 1;                        // odd multipliers are invertible mod 2^BITS
    x *= k;
    x ^= (x & BITMASK) >> s0;      // xorshift steps, each individually invertible
    x ^= (x << s1) & BITMASK;
    x ^= (x & BITMASK) >> s2;
    x += 0x9e3779b97f4a7c15;
    return x & BITMASK;
}
I know it's bijective because I happen to have its inverse function handy:
uint64_t unmix(uint64_t x, uint64_t k) {
    const int s0 = BITS * 4 / 5;
    const int s1 = BITS / 5 + (k & 1);
    const int s2 = BITS * 2 / 5;
    k |= 1;
    uint64_t kp = k * k;
    while ((kp & BITMASK) > 1) {   // turn k into its multiplicative inverse
        k *= kp;
        kp *= kp;
    }
    x -= 0x9e3779b97f4a7c15;
    x ^= ((x & BITMASK) >> s2) ^ ((x & BITMASK) >> s2 * 2);
    x ^= (x << s1) ^ (x << s1 * 2) ^ (x << s1 * 3) ^ (x << s1 * 4) ^ (x << s1 * 5);
    x ^= (x & BITMASK) >> s0;
    x *= k;
    return x & BITMASK;
}
This allows me to define a simple parameterisable PRNG like this:
// ROUNDS, RANGE, and the contents of key[] are configuration, set elsewhere.
uint64_t key[ROUNDS];
uint64_t seed = 0;

uint64_t rand_no_rep(void) {
    uint64_t x = seed++;
    do {  // re-permute until the result lands inside the wanted range
        for (int i = 0; i < ROUNDS; i++) x = mix(x, key[i]);
    } while (x >= RANGE);
    return x;
}
Initialise seed and key to random values and you're good to go.
Using the inverse function lets me determine what the seed must be to force rand_no_rep() to produce a given output, making it much easier to test.
So far I've checked the cases where a constant a is followed by a constant b. For ROUNDS==1, pairs collide on exactly 50% of the keys (and each pair of collisions is with a different pair of a and b; they don't all converge on 0, 1 or whatever). That is, for various k, a specific a-followed-by-b case occurs for more than one k (this must happen at least once). Subsequent values do not collide in that case, so different keys aren't falling into the same cycle at different positions. Every k gives a unique cycle.
The 50% collision rate comes from 25% of entries not being unique when they're added to the list (counting both the entry itself and the one it ran into). That might sound bad, but it's actually lower than birthday-paradox logic would suggest. Selecting randomly, the percentage of new entries that fail to be unique looks to converge between 36% and 37%. Being "better than random" is obviously worse than random, as far as randomness goes, but that's why they're called pseudo-random numbers.
Extending that to ROUNDS==2, we want to make sure that a second round doesn't cancel out or simply repeat the effects of the first.
This is important because it would mean that multiple rounds are a waste of time and memory, and that the function cannot be parameterised to any substantial degree. It could happen trivially if mix() contained all linear operations (say, multiply and add, mod RANGE). In that case all of the parameters could be multiplied/added together to produce a single parameter for a single round that would have the same effect. That would be disappointing, as it would reduce the number of attainable permutations to the size of just that one parameter, and if the set is as small as that then more work would be needed to ensure that it's a good, representative set.
So what we want to see from two rounds is a large set of outcomes that could never be achieved by one round. One way to demonstrate this is to look for the original b-follows-a cases with an additional parameter c, where we want to see every possible c following a and b.
We know from the one-round testing that in 50% of cases there is only one c that can follow a and b, because there is only one k that places b immediately after a. We also know that 25% of the pairs of a and b were unreachable (being the gap left behind by half the pairs that went into collisions rather than new unique values), and the last 25% appear for two different k.
The result that I get is that given a free choice of both keys, it's possible to find about five eighths of the values of c following a given a and b. About a quarter of the a/b pairs are unreachable (it's less predictable now, because of the potential intermediate mappings into or out of the duplicate or unreachable cases) and a quarter have a, b, and c appear together in two sequences (which diverge afterwards).
I think there's a lot to be inferred from the difference between one round and two, but I could be wrong about that and I need to double-check. Further testing gets harder; or at least slower, unless I think more carefully about how I'm going to do it.
I haven't yet demonstrated that the permutations it can produce are all equally likely; but this is normally not guaranteed for any other PRNG either.
It's fairly slow for a PRNG, but it would fit SIMD trivially.

Montgomery Multiplication in RSA: c=m^e%n

How does Montgomery Multiplication work in speeding up the encryption process for computing c=m^e%n as used in RSA encryption?
I understand that Montgomery multiplication can efficiently compute a*b%n, but when trying to find m^e%n, is there a more efficient way than just looping through and computing a Montgomery multiplication each time?
mpz_class mod(mpz_class &m, mpz_class &exp, mpz_class &n) {
    // End goal is to return m^exp % n
    // cout << "Begin mod";
    mpz_class orig_m = m;  // the original message
    mpz_class loc_m = m;   // local value of m (to be changed as you cycle through)
    cout << "m: " << m << " exp: " << exp << " n: " << n << endl;

    // Conversion to the Montgomery world
    mpz_class mm_xp = (loc_m * r) % n;
    mpz_class mm_yp = (orig_m * r) % n;

    for (int i = 0; i < exp - 1; i++)  // repeat the multiplication exp-1 times
    {
        // Montgomery multiplication: returns m*orig_m%n in Montgomery form
        mm(mm_xp, mm_yp, n);
    }
    mm_xp = (mm_xp * r_p) % n;  // convert from the Montgomery world back to normal numbers
    return mm_xp;
}
I'm using the GMP libraries so I can work with larger numbers here. r and r_p are pre-calculated in a separate function and are global. In this example I'm working in powers of 10 (though I realize it would be more efficient to work with powers of 2).
I convert to Montgomery form prior to the multiplications and repeatedly multiply m*m in the for loop, converting back to the normal world at the end of the m^e step. I'm curious to know whether there is a way to compute the operation m^e%n other than just cycling through in a for loop. As of now, I believe this to be the bottleneck of the computation, though I could very well be wrong.
The actual Montgomery multiplication step occurs in the function below.
void mm(mpz_class &ret, const mpz_class &y, const mpz_class &n)
{
    mpz_class a = ret * y;
    while (a % r != 0)  // add multiples of n until a is divisible by r
    {
        a += n;
    }
    ret = a / r;  // ret*y%n in Montgomery form
    // cout << ret << endl;
}
Is this at all how RSA encryption works with the montgomery multiplication optimization?
No, you do not want to do e multiplications of m by itself to compute RSA.
You normally want to compute m^e mod n by repeated squaring (there are other possibilities, but this is a simple one that's adequate for many typical purposes).
In a previous post on RSA (the first answer above), I included an implementation that used a pow_mod function. That, in turn, used a mul_mod function. Montgomery multiplication is (basically) an implementation of that mul_mod function that's better suited to working with large numbers. To make it useful, however, you just about need something on at least the general order of the pow_mod function, not just a loop to make e calls to mul_mod.
Given the magnitude of the numbers involved in real use of RSA, trying to compute m^e mod n using repeated multiplication alone would probably take years (quite possibly quite a few years) to complete even a single encryption. In other words, a different algorithm isn't just a nice optimization--it's absolutely necessary for use to be practical at all.
To put this in algorithmic terms, computing A^B with plain multiplication is basically O(B). Doing it with the repeated squaring algorithm shown there, it's basically O(log B) instead. If B is very large at all, the difference between the two is immense.
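To make that concrete with the asker's own building blocks (a sketch only, reusing the global r, r_p and the mm() shown above; pow_mod_mont is a hypothetical name):

// Square-and-multiply over Montgomery products: O(log exp) calls to mm()
// instead of exp - 1 of them. Assumes r and r_p are set up as in the question.
mpz_class pow_mod_mont(const mpz_class &m, mpz_class exp, const mpz_class &n) {
    mpz_class base = (m * r) % n;  // m in Montgomery form
    mpz_class acc = r % n;         // 1 in Montgomery form
    while (exp > 0) {
        if ((exp & 1) != 0)
            mm(acc, base, n);      // multiply in the set bit's factor
        mm(base, base, n);         // square
        exp >>= 1;
    }
    return (acc * r_p) % n;        // convert back out of Montgomery form
}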

Generating e value in RSA Encryption

I've generated p, q, n, and the totient, and now need to generate e, where 1 < e < totient and e and totient are coprime. The problem I'm running into is that I first generate the totient (the normal (p-1)*(q-1) way), but when I try to generate a coprime e, this code usually runs forever:
const mpz_class RsaKeys::compute_e(mpz_class totient) const {
    dgrandint e(bits_);
    while ((e.get_mpz_class() < totient) ||
           !is_coprime(e.get_mpz_class(), totient)) {
        std::cerr << e.get_mpz_class() << " is not coprime with " << totient << std::endl;
        e.reroll();
    }
    return e.get_mpz_class();
}
I'm testing with low-bit integers (8-32) and will actually need to handle 1024-bit values, but first I need a way to check whether the generated totient admits a coprime value at all. I have only found ways of checking whether two given values are coprime, not whether a complementary coprime value exists for a number I already have.
The value of e doesn't need to be random; indeed, most RSA systems use one of a small number of common e values, with the most widely used being 65537.
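In code, that turns the choice of e into little more than a constant plus a sanity check (a sketch, assuming gmpxx's gcd; choose_e is a hypothetical name):

// Fix e = 65537 and fall back to the next odd candidates in the vanishingly
// rare case that it shares a factor with the totient.
mpz_class choose_e(const mpz_class &totient) {
    mpz_class e = 65537;
    while (gcd(e, totient) != 1)  // coprimality is the only hard requirement
        e += 2;
    return e;
}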

Simulate random iteration of array

I have an array of given size. I want to traverse it in pseudorandom order, keeping array intact and visiting each element once. It will be best if current state can be stored in a few integers.
I know you can't have full randomness without storing full array, but I don't need the order to be really random. I need it to be perceived as random by user. The solution should use sub-linear space.
One possible suggestion - using a large prime number - is given here. The problem with that solution is that there is an obvious fixed step (taken modulo the array size). I would prefer a solution which is not so obviously non-random. Is there a better solution?
How about this algorithm?
To pseudo-pseudo-randomly traverse an array of size n:
1. Create a small array of size k.
2. Use the large prime number method to fill the small array; i = 0.
3. Randomly remove a position from the small array using an RNG; i += 1.
4. If i < n - k, add a new position using the large prime number method.
5. If i < n, go to 3.
The higher k is, the more randomness you get. This approach will allow you to delay generating numbers from the prime number method.
A similar approach can be used to generate a number earlier than expected in the sequence by creating another array, a "skip-list". Randomly pick items from later in the sequence, use them to traverse to the next position, and then add them to the skip-list. When they naturally arrive, they are found in the skip-list and suppressed, then removed from the skip-list, at which point you can randomly add another item to the skip-list.
The idea of a random generator that simulates a shuffle is good if you can get one whose maximum period you can control.
A Linear Congruential Generator calculates a random number with the formula:
x[i + 1] = (a * x[i] + c) % m;
The maximum period is m, and it is achieved when the following properties hold (the Hull-Dobell theorem):
The parameters c and m are relatively prime.
For every prime number r dividing m, a - 1 is a multiple of r.
If m is a multiple of 4 then also a - 1 is multiple of 4.
My first draft involved making m the next multiple of 4 after the array length and then finding suitable a and c values. This was (a) a lot of work and (b) sometimes yielded very obvious results.
I've rethought this approach. We can make m the smallest power of two that the array length will fit in. The only prime factor of m is then 2, which will make every odd number relatively prime to it. With the exception of 1 and 2, m will be divisible by 4, which means that we must make a - 1 a multiple of 4.
Having a greater m than the array length means that we must discard all values that are illegal array indices. This will happen at most every other turn and should be negligible.
The following code yields pseudo-random numbers with a period of exactly m. I've avoided trivial values for a and c, and on my (not too numerous) spot checks, the results looked okay. At least there was no obvious cycling pattern.
So:
class RandomIndexer
{
public:
    RandomIndexer(size_t length) : len(length)
    {
        m = 8;
        while (m < length) m <<= 1;  // smallest power of two >= length
        c = m / 6 + uniform(5 * m / 6);
        c |= 1;                      // c odd, hence coprime with the power-of-two m
        a = m / 12 * uniform(m / 6);
        a = 4 * a + 1;               // a - 1 is a multiple of 4
        x = uniform(m);
    }
    size_t next()
    {
        do { x = (a * x + c) % m; } while (x >= len);  // discard illegal indices
        return x;
    }
private:
    static size_t uniform(size_t m)
    {
        double p = std::rand() / (1.0 + RAND_MAX);
        return static_cast<int>(m * p);
    }
    size_t len;
    size_t x;
    size_t a;
    size_t c;
    size_t m;
};
You can then use the generator like this:
std::vector<int> list;
for (size_t i = 0; i < 3; i++) list.push_back(i);

RandomIndexer ix(list.size());
for (size_t i = 0; i < list.size(); i++) {
    std::cout << list[ix.next()] << std::endl;
}
I am aware that this still isn't a great random number generator, but it is reasonably fast, doesn't require a copy of the array and seems to work okay.
If the approach of picking a and c randomly yields bad results, it might be a good idea to restrict the generator to some powers of two and to hard-code literature values that have proven to be good.
As pointed out by others, you can create a sort of "flight plan" upfront by shuffling an array of array indices and then following it. This violates the "it will be best if current state can be stored in a few integers" constraint, but does it really matter? Are there tight performance constraints? After all, I believe that if you don't accept repetitions, then you need to store the items you have already visited somewhere or somehow.
Alternatively, you can opt for an intrusive solution and store a bool inside each element of the array, telling you whether the element was already selected or not. This can be done in an almost clean way by employing inheritance (multiple as needed).
Many problems come with this solution, e.g. thread safety, and of course it violates the "keep the array intact" constraint.
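A bare-bones sketch of that intrusive idea (single-threaded only; the names are hypothetical):

#include <cstdlib>
#include <vector>

struct Element {
    int value;
    bool visited = false;  // intrusive marker: mutates the array
};

// Retry until an unvisited element turns up (slows down as few remain).
int draw(std::vector<Element> &a) {
    for (;;) {
        Element &e = a[std::rand() % a.size()];
        if (!e.visited) { e.visited = true; return e.value; }
    }
}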
Quadratic residues which you have mentioned ("using a large prime") are well-known, will work, and guarantee iterating each and every element exactly once (if that is required, but it seems that's not strictly the case?). Unluckily they are not "very random looking", and there are a few other requirements to the modulo in addition to being prime for it to work.
There is a page on Jeff Preshing's site which describes the technique in detail and suggests to feed the output of the residue generator into the generator again with a fixed offset.
However, since you said that you merely need "perceived as random by user", it seems that you might be able to do with feeding a hash function (say, cityhash or siphash) with consecutive integers. The output will be a "random" integer, and at least so far there will be a strict 1:1 mapping (since there are a lot more possible hash values than there are inputs).
Now the problem is that your array is most likely not that large, so you need to somehow reduce the range of these generated indices without generating duplicates (which is tough).
The obvious solution (taking the modulo) will not work, as it pretty much guarantees that you get a lot of duplicates.
Using a bitmask to limit the range to the next greater power of two should work without introducing bias, and discarding indices that are out of bounds (generating a new index) should work as well. Note that this needs non-deterministic time -- but the combination of these two should work reasonably well (a couple of tries at most) on the average.
Otherwise, the only solution that "really works" is shuffling an array of indices as pointed out by Kamil Kilolajczyk (though you don't want that).
Here is a Java solution, which can be easily converted to C++. It is similar to M Oehm's solution above, albeit with a different way of choosing the LCG parameters.
import java.util.Enumeration;
import java.util.Random;

public class RandomPermuteIterator implements Enumeration<Long> {
    int c = 1013904223, a = 1664525;  // classic LCG parameters
    long seed, N, m, next;
    boolean hasNext = true;

    public RandomPermuteIterator(long N) throws Exception {
        if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
        this.N = N;
        m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
        next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
    }

    public static void main(String[] args) throws Exception {
        RandomPermuteIterator r = new RandomPermuteIterator(100);
        while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
        // output: 50 52 3 6 45 40 26 49 92 11 80 2 4 19 86 61 65 44 27 62 5 32 82 9 84 35 38 77 72 7 ...
    }

    @Override
    public boolean hasMoreElements() {
        return hasNext;
    }

    @Override
    public Long nextElement() {
        next = (a * next + c) % m;
        while (next >= N) next = (a * next + c) % m;
        if (next == seed) hasNext = false;
        return next;
    }
}
Maybe you could use this one: http://www.cplusplus.com/reference/algorithm/random_shuffle/ ?