Generating e value in RSA Encryption - c++

I've generated p, q, n, and the totient, and now need to generate e, where 1 < e < totient and e and totient are coprime. The problem I'm running into is that I first generate the totient (the usual (p-1)*(q-1) way), but when I try to generate a coprime e, it usually runs forever with this code:
const mpz_class RsaKeys::compute_e(mpz_class totient) const {
dgrandint e(bits_);
while ((e.get_mpz_class() < totient) ||
!is_coprime(e.get_mpz_class(), totient)) {
std::cerr<<e.get_mpz_class()<< " is not coprime with "<<totient<<std::endl;
e.reroll();
}
return e.get_mpz_class();
}
I'm testing with low-bit integers (8-32 bits) and will eventually need to handle 1024-bit values, but first I need a way to check whether the totient I generated has any possible coprime values at all. I have only found ways of checking whether two values are coprime, not whether a complementary coprime value exists for a number I already have.

The value of e doesn't need to be random; indeed, most RSA systems use one of a small number of common e values, with the most widely used being 65537.
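In code, a minimal sketch of that approach using GMP's C++ interface might look like this (the helper below is mine, not the asker's class; gcd(e, totient) == 1 is the only real requirement, and a suitable e always exists for any RSA-sized totient, e.g. any prime that does not divide it):

#include <gmpxx.h>

// Try the standard choice e = 65537 first; if the totient is too small for
// that (e.g. in low-bit tests) or 65537 is not coprime, walk odd candidates
// upward. Only odd candidates matter: the totient (p-1)*(q-1) is even.
mpz_class compute_e(const mpz_class& totient) {
    mpz_class g;
    mpz_class e = (totient > 65537) ? mpz_class(65537) : mpz_class(3);
    for (; e < totient; e += 2) {
        mpz_gcd(g.get_mpz_t(), e.get_mpz_t(), totient.get_mpz_t());
        if (g == 1)
            return e;   // 1 < e < totient and gcd(e, totient) == 1
    }
    return 0;  // no valid e found (cannot happen for totient > 2)
}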

Pick a matrix cell according to its probability

I have a 2D matrix of positive real values, stored as follows:
vector<vector<double>> matrix;
Each cell can have a value greater than or equal to 0, and this value represents the probability of the cell being chosen. In particular, for example, a cell with a value equal to 3 has three times the probability of being chosen compared to a cell with value 1.
I need to select N cells of the matrix (0 <= N <= total number of cells) at random, but according to their probability of being selected.
How can I do that?
The algorithm should be as fast as possible.
I describe two methods, A and B.
A works in time approximately N * (number of cells) and uses space O(log number of cells). It is good when N is small.
B works in time approximately (number of cells + N) * O(log number of cells) and uses space O(number of cells). So it is good when N is large (or even medium), but it uses a lot more memory; in practice it might be slower in some regimes for that reason.
Method A:
The first thing you need to do is normalize the entries. (It's not clear to me whether you assume they are normalized or not.) That means summing all the entries and dividing each entry by the sum. (This part is potentially slow, so it's better if you assume or require that it has already happened.)
Then you sample like this:
Choose a random [i,j] entry of the matrix (by choosing i,j each uniformly randomly from the range of integers 0 to n-1).
Choose a uniformly random real number p in the range [0, 1].
Check if matrix[i][j] > p. If so, return the pair [i,j]. If not, go back to step 1.
Why does this work? The probability that we end at step 3 with any particular output is equal to the probability that [i,j] was selected (which is the same for each entry) times the probability that the number p was small enough. This is proportional to the value matrix[i][j], so the sampling chooses each entry with the correct proportions. It's also possible that at step 3 we go back to the start -- does that bias things? Basically, no. The reason is: suppose we arbitrarily choose a number k and then consider the distribution of the algorithm, conditioned on stopping exactly after k rounds. No matter what value k we choose, the conditional distribution has to be exactly right by the above argument, since eliminating the case that p is too large leaves all the other possibilities with their proportions intact. Since the distribution is perfect for each value of k that we might condition on, and the overall distribution (not conditioned on k) is an average of the distributions for each value of k, the overall distribution is perfect as well.
If you want to analyze the number of rounds typically needed in a rigorous way, you can do it by analyzing the probability that we actually stop at step 3 in any particular round. Since the rounds are independent, this probability is the same for every round, which means the number of rounds is geometrically distributed. That distribution is tightly concentrated around its mean, and we can determine the mean from that stopping probability.
The probability that we stop at step 3 can be determined by considering the conditional probability that we stop at step 3, given that we chose any particular entry [i][j]. By the formulas for conditional expectation, you get that
Pr[ stop at step 3 ] = sum_{i,j} ( 1/(n^2) * Matrix[i,j] )
Since we assumed the matrix is normalized, this sum reduces to just 1/n^2. So, the expected number of rounds is about n^2 (that is, n^2 up to a constant factor) no matter what the entries in the matrix are. You can't hope to do a lot better than that I think -- that's about the same amount of time it takes to just read all the entries of the matrix, and it's hard to sample from a distribution that you cannot even read all of.
Note: What I described is a way to correctly sample a single element -- to get N elements from one matrix, you can just repeat it N times.
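For concreteness, a minimal sketch of method A in C++ (my code, not part of the original answer), assuming a square n-by-n matrix whose entries are already normalized to sum to 1:

#include <random>
#include <utility>
#include <vector>

// One draw of method A: pick a uniform cell, accept with probability equal
// to the (normalized) entry, otherwise retry.
std::pair<int, int> sample_once(const std::vector<std::vector<double>>& mat,
                                std::mt19937& gen) {
    const int n = (int)mat.size();
    std::uniform_int_distribution<int> cell(0, n - 1);
    std::uniform_real_distribution<double> real(0.0, 1.0);
    while (true) {
        int i = cell(gen), j = cell(gen);  // step 1: uniform random cell
        double p = real(gen);              // step 2: uniform real in [0, 1]
        if (mat[i][j] > p)                 // step 3: accept or retry
            return std::make_pair(i, j);
    }
}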
Method B:
Basically you just want to compute a histogram and sample inversely from it, so that you know you get exactly the right distribution. Computing the histogram is expensive, but once you have it, getting samples is cheap and easy.
In C++ it might look like this:
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;

std::vector<upair> sample_n(const std::vector<std::vector<double>>& Matrix, uint N) {
    // Make histogram: map from running cumulative sum to cell position
    histogram_type histogram;
    double cumulative = 0.0;
    for (uint i = 0; i < Matrix.size(); ++i) {
        for (uint j = 0; j < Matrix[i].size(); ++j) {
            cumulative += Matrix[i][j];
            histogram[cumulative] = std::make_pair(i, j);
        }
    }
    std::vector<upair> result;
    for (uint k = 0; k < N; ++k) {
        // Do a sample (this should never repeat; if it does not find a lower
        // bound you could also assert false quite reasonably, since it would
        // mean something is wrong with the rand() implementation)
        while (1) {
            // Or, for best results, use std::mt19937 or boost::mt19937 and
            // sample a real in the range [0, 1] here.
            double p = cumulative * (std::rand() / (RAND_MAX + 1.0));
            histogram_type::const_iterator it = histogram.lower_bound(p);
            if (it != histogram.end()) {
                result.push_back(it->second);
                break;
            }
        }
    }
    return result;
}
Here the time to make the histogram is something like (number of cells) * O(log number of cells), since inserting into the map takes time O(log n). You need an ordered data structure in order to get cheap lookups, N * O(log number of cells), later when you do the repeated sampling. Possibly you could choose a more specialized data structure to go faster, but I think there's only limited room for improvement.
Edit: As #Bob__ points out in comments, in method (B) as written there is potentially going to be some error due to floating-point round-off if the matrices are quite large, even using type double, at this line:
cumulative += Matrix[i][j];
The problem is that if cumulative becomes much larger than Matrix[i][j], beyond what the floating-point precision can handle, then each time this statement is executed you may introduce significant errors, which accumulate into significant inaccuracy.
As he suggests, if that happens, the most straightforward way to fix it is to sort the values Matrix[i][j] first and sum from smallest to largest. You could even do this in the general implementation to be safe -- sorting these values isn't going to take more time asymptotically than you already spend anyway.

Simulate random iteration of array

I have an array of given size. I want to traverse it in pseudorandom order, keeping array intact and visiting each element once. It will be best if current state can be stored in a few integers.
I know you can't have full randomness without storing full array, but I don't need the order to be really random. I need it to be perceived as random by user. The solution should use sub-linear space.
One possible suggestion - using a large prime number - is given here. The problem with that solution is that there is an obvious fixed step (taken modulo the array size). I would prefer a solution which is not so obviously non-random. Is there a better solution?
How about this algorithm?
To pseudo-pseudo-randomly traverse an array of size n:
1. Create a small array of size k.
2. Use the large prime number method to fill the small array; set i = 0.
3. Randomly remove a position from the small array using an RNG; i += 1.
4. If i < n - k, add a new position using the large prime number method.
5. If i < n, go to 3.
The higher k is, the more randomness you get. This approach allows you to delay generating numbers from the prime number method.
A similar approach can be used to generate a number earlier than expected in the sequence by creating another array, a "skip-list". Randomly pick items from later in the sequence, use them as the next positions to traverse, and then add them to the skip-list. When they naturally arrive, they are looked up in the skip-list and suppressed, then removed from the skip-list, at which point you can randomly add another item to the skip-list.
The idea of a random generator that simulates a shuffle is good if you can get one whose maximum period you can control.
A Linear Congruential Generator calculates a random number with the formula:
x[i + 1] = (a * x[i] + c) % m;
The maximum period is m and it is achieved when the following properties hold:
The parameters c and m are relatively prime.
For every prime number r dividing m, a - 1 is a multiple of r.
If m is a multiple of 4 then also a - 1 is multiple of 4.
My first draft involved making m the next multiple of 4 after the array length and then finding suitable a and c values. This was (a) a lot of work and (b) sometimes yielded very obvious results.
I've rethought this approach. We can make m the smallest power of two that the array length will fit in. The only prime factor of m is then 2, which will make every odd number relatively prime to it. With the exception of 1 and 2, m will be divisible by 4, which means that we must make a - 1 a multiple of 4.
Having a greater m than the array length means that we must discard all values that are illegal array indices. This will happen at most every other turn and should be negligible.
The following code yields pseudo-random numbers with a period of exactly m. I've avoided trivial values for a and c, and on my (not too numerous) spot checks, the results looked okay. At least there was no obvious cycling pattern.
So:
#include <cstdlib>

class RandomIndexer
{
public:
    RandomIndexer(size_t length) : len(length)
    {
        m = 8;
        while (m < length) m <<= 1;   // smallest power of two >= length (min 8)
        c = m / 6 + uniform(5 * m / 6);
        c |= 1;                       // c odd, hence relatively prime to m
        a = m / 12 + uniform(m / 6);
        a = 4 * a + 1;                // a - 1 is a multiple of 4
        x = uniform(m);
    }
    size_t next()
    {
        do { x = (a * x + c) % m; } while (x >= len);
        return x;
    }
private:
    static size_t uniform(size_t m)
    {
        double p = std::rand() / (1.0 + RAND_MAX);
        return static_cast<size_t>(m * p);
    }
    size_t len;
    size_t x;
    size_t a;
    size_t c;
    size_t m;
};
You can then use the generator like this:
std::vector<int> list;
for (size_t i = 0; i < 3; i++) list.push_back(i);
RandomIndexer ix(list.size());
for (size_t i = 0; i < list.size(); i++) {
std::cout << list[ix.next()]<< std::endl;
}
I am aware that this still isn't a great random number generator, but it is reasonably fast, doesn't require a copy of the array and seems to work okay.
If the approach of picking a and c randomly yields bad results, it might be a good idea to restrict the generator to some powers of two and to hard-code literature values that have proven to be good.
As pointed out by others, you can create a sort of "flight plan" upfront by shuffling an array of array indices and then following it. This violates the "it will be best if current state can be stored in a few integers" constraint, but does it really matter? Are there tight performance constraints? After all, I believe that if you don't accept repetitions, then you need to store the items you have already visited somewhere, somehow.
Alternatively, you can opt for an intrusive solution and store a bool inside each element of the array, telling you whether the element was already selected or not. This can be done in an almost clean way by employing inheritance (multiple as needed).
Many problems come with this solution, e.g. thread safety, and of course it violates the "keep the array intact" constraint.
Quadratic residues, which you have mentioned ("using a large prime"), are well known, will work, and guarantee iterating over each and every element exactly once (if that is required, though it seems that's not strictly the case here). Unluckily they are not "very random looking", and there are a few additional requirements on the modulus, beyond being prime, for the technique to work.
There is a page on Jeff Preshing's site which describes the technique in detail and suggests to feed the output of the residue generator into the generator again with a fixed offset.
However, since you said that you merely need the order to be "perceived as random by user", it seems that you might get away with feeding consecutive integers to a hash function (say, cityhash or siphash). The output will be a "random" integer, and at least so far there will be a strict 1:1 mapping (since there are a lot more possible hash values than there are inputs).
Now the problem is that your array is most likely not that large, so you need to somehow reduce the range of these generated indices without generating duplicates (which is tough).
The obvious solution (taking the modulo) will not work, as it pretty much guarantees that you get a lot of duplicates.
Using a bitmask to limit the range to the next greater power of two should work without introducing bias, and discarding indices that are out of bounds (generating a new index) should work as well. Note that this needs non-deterministic time -- but the combination of these two should work reasonably well (a couple of tries at most) on the average.
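To illustrate, here is one way the bitmask-plus-discard idea could look in C++ (my construction, not one of the hashes named above): a small invertible mix on exactly k bits, so the consecutive inputs 0..2^k-1 map 1:1 onto the k-bit range, and out-of-bounds indices are simply discarded:

#include <cstdint>
#include <cstddef>

class HashedIndexer {
public:
    explicit HashedIndexer(std::size_t len) : len_(len), i_(0), bits_(1) {
        while ((std::uint64_t(1) << bits_) < len) ++bits_;
        mask_ = (std::uint64_t(1) << bits_) - 1;
    }
    // Returns len on exhaustion, otherwise the next index of the permutation.
    std::size_t next() {
        while (i_ <= mask_) {
            std::uint64_t x = i_++;
            // Each step is invertible modulo 2^bits_: multiplication by an
            // odd constant, then xor with a right shift of the value itself.
            x = (x * 0x9E3779B1u) & mask_;
            x ^= x >> (bits_ / 2 + 1);
            x = (x * 0x85EBCA77u) & mask_;
            if (x < len_) return (std::size_t)x;  // discard out-of-range hits
        }
        return len_;
    }
private:
    std::size_t len_;
    std::uint64_t i_;
    unsigned bits_;
    std::uint64_t mask_;
};

The state is a few integers, each call does O(1) amortized work, and every index in [0, len) is produced exactly once before exhaustion; adding more mixing rounds trades speed for a more random look.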
Otherwise, the only solution that "really works" is shuffling an array of indices as pointed out by Kamil Kilolajczyk (though you don't want that).
Here is a Java solution, which can be easily converted to C++; it is similar to M Oehm's solution above, albeit with a different way of choosing the LCG parameters.
import java.util.Enumeration;
import java.util.Random;
public class RandomPermuteIterator implements Enumeration<Long> {
int c = 1013904223, a = 1664525;
long seed, N, m, next;
boolean hasNext = true;
public RandomPermuteIterator(long N) throws Exception {
if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
this.N = N;
m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
}
public static void main(String[] args) throws Exception {
RandomPermuteIterator r = new RandomPermuteIterator(100);
while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
//output:50 52 3 6 45 40 26 49 92 11 80 2 4 19 86 61 65 44 27 62 5 32 82 9 84 35 38 77 72 7 ...
}
@Override
public boolean hasMoreElements() {
return hasNext;
}
@Override
public Long nextElement() {
next = (a * next + c) % m;
while (next >= N) next = (a * next + c) % m;
if (next == seed) hasNext = false;
return next;
}
}
Maybe you could use this one: http://www.cplusplus.com/reference/algorithm/random_shuffle/ ?

Generating random integers with a difference constraint

I have the following problem:
Generate M uniformly random integers from the range 0-N, where N >> M, where no pair has a difference less than K, and where M >> K.
At the moment the best method I can think of is to maintain a sorted list, then determine the lower bound of the currently generated integer and test it against the neighboring lower and upper elements, inserting the new element in between if it is acceptable. This is of complexity O(n log n).
Would there happen to be a more efficient algorithm?
An example of the problem:
Generate 1000 uniformly random integers between zero and 100million where the difference between any two integers is no less than 1000
A comprehensive way to solve this would be to:
Determine all the combinations of n-choose-m that satisfy the constraint; let's call it set X.
Select a uniformly random integer i in the range [0,|X|).
Select the i'th combination from X as the result.
This solution is problematic when the n-choose-m is large, as enumerating and storing all possible combinations will be extremely costly. Hence an efficient online generating solution is sought.
Note: The following is a C++ implementation of the solution provided by pentadecagon:
#include <algorithm>
#include <random>
#include <vector>

std::vector<int> generate_random(const int n, const int m, const int k)
{
    // Guard against invalid input; the last test is needed to keep the
    // distribution's upper bound from going negative.
    if ((n < m) || (m < k) || (n < (m - 1) * k))
        return std::vector<int>();
    std::random_device source;
    std::mt19937 generator(source());
    std::uniform_int_distribution<> distribution(0, n - (m - 1) * k);
    std::vector<int> result_list;
    result_list.reserve(m);
    for (int i = 0; i < m; ++i)
    {
        result_list.push_back(distribution(generator));
    }
    std::sort(std::begin(result_list), std::end(result_list));
    for (int i = 0; i < m; ++i)
    {
        result_list[i] += (i * k);
    }
    return result_list;
}
http://ideone.com/KOeR4R
EDIT: I adapted the text for the requirement to create ordered sequences, each with the same probability.
Create random numbers a_i for i=0..M-1 without duplicates. Sort them. Then create numbers
b_i=a_i + i*(K-1)
Given the construction, those numbers b_i have the required gaps, because the a_i already have gaps of at least 1. In order to make sure the b values cover exactly the required range [1..N], you must pick the a_i from the range [1..N-(M-1)*(K-1)]. This way you get truly independent numbers -- well, as independent as possible given the required gap. Because of the sorting you get O(M log M) performance again, but this shouldn't be too bad; sorting is typically very fast. In Python it looks like this:
import random

def random_list(N, M, K):
    s = set()
    while len(s) < M:
        s.add(random.randint(1, N - (M - 1) * (K - 1)))
    res = sorted(s)
    for i in range(M):
        res[i] += i * (K - 1)
    return res
First off: this will be an attempt to show that there's a bijection between the (M+1)-compositions (with the slight modification that we allow addends to be 0) of the value N - (M-1)*K and the valid solutions to your problem. After that, we only have to pick one of those compositions uniformly at random and apply the bijection.
Bijection:
Let

x_0 + x_1 + ... + x_M = N - (M-1)*K, with all x_i >= 0.

Then the x_i form an (M+1)-composition (with 0 addends allowed) of the value on the left (notice that the x_i do not have to be monotonically increasing!).

From this we get a valid solution

0 <= m_1 < m_2 < ... < m_M <= N, with m_(i+1) - m_i >= K,

by setting the values m_i as follows:

m_i = x_0 + x_1 + ... + x_(i-1) + (i-1)*K, for i = 1..M.

We see that the distance between m_i and m_(i+1) is at least K, and that m_M is at most N (compare the choice of the composition we started out with). This means that every (M+1)-composition that fulfills the conditions above defines exactly one valid solution to your problem. (You'll notice that we only use x_M as a way to make the sum turn out right; we don't use it for the construction of the m_i.)

To see that this gives a bijection, we need to see that the construction can be reversed; for this purpose, let

0 <= m_1 < m_2 < ... < m_M <= N

be a given solution fulfilling your conditions. To get the composition this is constructed from, define the x_i as follows:

x_0 = m_1, x_i = m_(i+1) - m_i - K for i = 1..M-1, and x_M = N - m_M.

Now first, all x_i are at least 0, so that's alright. To see that they form a valid composition (again, every x_i is allowed to be 0) of the value given above, consider:

x_0 + x_1 + ... + x_M = m_1 + (m_2 - m_1 - K) + ... + (m_M - m_(M-1) - K) + (N - m_M) = N - (M-1)*K.

The last equality follows since we have a telescoping sum that cancels out almost all of the m_i.
So we've seen that the described construction gives a bijection between the described compositions of N - (M-1)*K and the valid solutions to your problem. All we have to do now is pick one of those compositions uniformly at random and apply the construction to get a solution.
Picking a composition uniformly at random
Each of the described compositions can be uniquely identified in the following way (compare this for illustration): reserve N - (M-1)*K spaces for the unary notation of that value, and another M spaces for the M commas. We get an (M+1)-composition of N - (M-1)*K by choosing M of the N - (M-1)*K + M spaces, putting commas there, and filling the rest with |. Then let x_0 be the number of | before the first comma, x_M the number of | after the last comma, and every other x_i the number of | between commas i and i+1. So all we have to do is pick an M-element subset of the integer interval [1, N - (M-1)*K + M] uniformly at random, which we can do for example with the Fisher-Yates shuffle in O(N + M log M) (we need to sort the M delimiters to build the composition), since M*K needs to be in O(N) for any solutions to exist. So if N is bigger than M by at least a logarithmic factor, then this is linear in N.
Note: #DavidEisenstat suggested that there are more space-efficient ways of picking the M-element subset of that interval; I'm not aware of any, I'm afraid.
You can get an error-proof algorithm out of this by doing the simple input validation we get from the construction above that N ≥ (M-1) * K and that all three values are at least 1 (or 0, if you define the empty set as a valid solution for that case).
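For concreteness, a sketch of this construction in C++ (my code, not part of the answer; it draws the M comma positions by rejection into a std::set, which is simple though not the most space-efficient choice):

#include <random>
#include <set>
#include <vector>

// Picks M comma positions uniformly from [1, N-(M-1)K+M], then recovers
// m_i = x_0 + ... + x_(i-1) + (i-1)*K from the gaps between the commas.
// Requires N >= (M-1)*K, otherwise no valid configuration exists.
std::vector<long long> sample_by_composition(long long N, int M, long long K) {
    std::mt19937_64 gen(std::random_device{}());
    long long slots = N - (M - 1) * K + M;
    std::uniform_int_distribution<long long> dist(1, slots);
    std::set<long long> commas;
    while ((int)commas.size() < M) commas.insert(dist(gen));
    std::vector<long long> m;
    long long prev = 0, sum_x = 0;
    int i = 0;
    for (long long c : commas) {    // visited in ascending order
        sum_x += c - prev - 1;      // accumulate the next x value
        m.push_back(sum_x + (long long)i * K);
        prev = c;
        ++i;
    }
    return m;
}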
Why not do this:
for (int i = 0; i < M; ++i) {
    pick a random number between K and N/M
    add this number to (N/M) * i
}
Now you have M random numbers, distributed evenly along N, all of which have a difference of at least K. It's in O(n) time. As an added bonus, it's already sorted. :-)
EDIT:
Actually, the "pick a random number" part shouldn't be between K and N/M, but between min(K, [K - (N/M * i - previous value)]). That would ensure that the differences are still at least K, and not exclude values that should not be missed.
Second EDIT:
Well, the first case shouldn't be between K and N/M - it should be between 0 and N/M. Just as you need special casing when you get close to the N/M*i border, we need special casing at the start.
Aside from that, the issue you brought up in your comments was fair representation, and you're right. As my pseudocode is presented, it currently completely misses the excess between N/M*M and N. It's another edge case; simply change the random values of your last range.
Now, in this case, your distribution will be different for the last range. Since you have more numbers, you have slightly less chance for each number than you do for all the other ranges. My understanding is that because you're using ">>", this shouldn't really impact the distribution, i.e. the difference in size in the sample set should be nominal. But if you want to make it more fair, you divide the excess equally among each range. This makes your initial range calculation more complex - you'll have to augment each range based on how much remainder there is divided by M.
There are lots of special cases to look out for, but they're all able to be handled. I kept the pseudocode very basic just to make sure that the general concept came through clearly. If nothing else, it should be a good starting point.
Third and Final EDIT:
For those worried that the distribution has a forced evenness, I still claim that there's nothing saying it can't be uneven. The selection is uniformly distributed within each segment. There is a linear way to keep it uneven, but that also has a trade-off: if one value is selected extremely high (which should be unlikely given a very large N), then all the other values are constrained:
int prevValue = 0;
std::vector<int> result;
for (int i = 0; i < M; ++i) {
    // leave room for the remaining picks, each of which needs a gap of K
    int maxRange = N - (((M - 1) - i) * K) - prevValue;
    int nextValue = random(0, maxRange);  // placeholder: uniform int in [0, maxRange]
    prevValue += nextValue;
    result.push_back(prevValue);          // store previous value
    prevValue += K;
}
This is still linear and random and allows unevenness, but the bigger prevValue gets, the more constrained the other numbers become. Personally, I prefer my second edit answer, but this is an available option that given a large enough N is likely to satisfy all the posted requirements.
Come to think of it, here's one other idea. It requires a lot more data maintenance, but is still O(M) and is probably the most fair distribution:
What you need to do is maintain a vector of your valid data ranges and a vector of probability scales. A valid data range is just a high-low pair between which K is still valid. The idea is that you first use the scaled probabilities to pick a random data range, then randomly pick a value within that range. You remove the old valid data range and replace it with 0, 1, or 2 new data ranges in the same position, depending on how many remain valid. All of these actions are constant time other than handling the weighted probability, which is O(M), done in a loop M times, so the total is O(M^2). That should be much better than O(N log N), because N >> M.
Rather than pseudocode, let me work through the OP's original example:
0th iteration: valid data ranges are from [0...100Mill], and the weight for this range is 1.0.
1st iteration: Randomly pick one element in the one element vector, then randomly pick one element in that range.
If the element is, e.g. 12345678, then we remove the [0...100Mill] and replace it with [0...12344678] and [12346678...100Mill]
If the element is, e.g., 500, then we remove [0...100Mill] and replace it with just [1500...100Mill], since everything in [0...1499] is now within K of 500 and no longer valid. The only time we replace a range with 0 ranges is in the unlikely event that we have a range with only one number in it and that number gets picked. (In that case, you'll have 3 numbers in a row that are exactly K apart from each other.)
The weight for the ranges are their length over the total length, e.g. 12344678/(12344678 + (100Mill - 12346678)) and (100Mill - 12346678)/(12344678 + (100Mill - 12346678))
In the next iterations, you do the same thing: randomly pick a number between 0 and 1 and determine which of the ranges that scale falls into. Then randomly pick a number in that range, and replace your ranges and scales.
By the time it's done, we're no longer acting in O(M) overall, but we still depend only on M instead of N. And this actually is both a uniform and fair distribution.
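For reference, a minimal sketch of this range-splitting idea in C++ (my code and my names; it picks uniformly among all currently valid integers, which is one way to realize the length-proportional weighting described above):

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

std::vector<long long> sample_with_gaps(long long N, int M, long long K) {
    std::mt19937_64 gen(std::random_device{}());
    std::vector<std::pair<long long, long long>> ranges;
    ranges.push_back(std::make_pair(0LL, N));  // initially [0...N] is valid
    std::vector<long long> result;
    for (int pick = 0; pick < M && !ranges.empty(); ++pick) {
        // Weight each range by how many integers it contains.
        long long total = 0;
        for (std::size_t r = 0; r < ranges.size(); ++r)
            total += ranges[r].second - ranges[r].first + 1;
        std::uniform_int_distribution<long long> dist(0, total - 1);
        long long t = dist(gen);
        std::size_t idx = 0;
        while (t > ranges[idx].second - ranges[idx].first) {
            t -= ranges[idx].second - ranges[idx].first + 1;
            ++idx;
        }
        long long lo = ranges[idx].first, hi = ranges[idx].second;
        long long v = lo + t;  // the chosen value
        result.push_back(v);
        // Replace the chosen range with 0, 1 or 2 new ranges, as described.
        std::vector<std::pair<long long, long long>> repl;
        if (v - K >= lo) repl.push_back(std::make_pair(lo, v - K));
        if (v + K <= hi) repl.push_back(std::make_pair(v + K, hi));
        ranges.erase(ranges.begin() + idx);
        ranges.insert(ranges.begin() + idx, repl.begin(), repl.end());
    }
    return result;
}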
Hope one of these ideas works for you!

Various questions about RSA encryption

I'm currently writing my own AES/RSA encryption program in C++ for Unix. I've been going through the literature for about a week now, and I've started to wrap my head around it all, but I'm still left with some pressing questions:
1) Based on my understanding, an RSA key in its most basic form is the combination of the product of the two primes used (R) and the exponents. It's obvious to me that storing the key in plaintext in such a form would defeat the purpose of encrypting anything at all. Therefore, in what form can I store my generated public and private keys? Ask the user for a password and do some "simple" shifting/replacing on the individual digits of the key with an ASCII table? Or is there some other standard I haven't run across? Also, when the keys are generated, are R and the respective exponent simply stored sequentially, i.e. ##primeproduct####exponent##? In that case, how would a decryption algorithm parse the key into the two separate values?
2) How would I go about programmatically generating the private exponent, given that I've decided to use 65537 as my public exponent for all encryptions? I've got the equation P*Q = 1 (mod M), where P and Q are the exponents and M is the result of Euler's totient function. Is this simply a matter of generating random numbers and testing their relative primality to the public exponent until you hit pay dirt? I know you can't simply start from 1 and increment until you find such a number, as anyone could simply do the same thing and get your private exponent themselves.
3) When generating the character equivalence set, I understand that the numbers used in the set must be less than, and relatively prime to, P*Q. Again, this is a matter of testing the relative primality of numbers to P*Q. Is the speed of testing relative primality independent of the size of the numbers you're working with, or are special algorithms necessary?
Thanks in advance to anyone who takes the time to read and answer, cheers!
There are some standard formats for storing/exchanging RSA keys such as RFC 3447. For better or worse, most (many, anyway) use ASN.1 encoding, which adds more complexity than most people like, all by itself. A few use Base64 encoding, which is a lot easier to implement.
As far as what constitutes a key goes: in its most basic form, you're correct; the public key includes the modulus (usually called n) and an exponent (usually called e).
To compute a key pair, you start from two large prime numbers, usually called p and q. You compute the modulus n as p * q. You also compute a number (often called r) that's (p-1) * (q-1).
e is then a more or less randomly chosen number that's prime relative to r. Warning: you don't want e to be really small though -- log(e) >= log(n)/4 as a bare minimum.
You then compute d (the private decryption key) as a number satisfying the relation:
d * e = 1 (mod r)
You typically compute this using Euclid's algorithm, though there are other options (see below). Again, you don't want d to be really small either, so if it works out to a really small number, you probably want to try another value for e, and compute a new d to match.
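As a sketch (mine, not part of this answer), computing d from e and r with the extended Euclidean algorithm could look like this, assuming the values fit in a signed 64-bit type:

#include <cstdint>

// Returns d with d * e = 1 (mod r), assuming gcd(e, r) == 1. Fine for toy
// examples; real key sizes need a big-integer type instead of int64_t.
std::int64_t mod_inverse(std::int64_t e, std::int64_t r) {
    std::int64_t old_s = 1, s = 0;   // Bezout coefficient of e
    std::int64_t old_g = e, g = r;   // running remainders
    while (g != 0) {
        std::int64_t q = old_g / g;
        std::int64_t t = old_g - q * g; old_g = g; g = t;
        t = old_s - q * s; old_s = s; s = t;
    }
    if (old_g != 1) return 0;        // no inverse exists
    return ((old_s % r) + r) % r;    // normalize into [0, r)
}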
There is another way to compute your e and d. You can start by finding some number K that's congruent to 1 mod r, then factor it. Put the prime factors together to get two factors of roughly equal size, and use them as e and d.
As far as an attacker computing your d goes: you need r to compute this, and knowing r depends on knowing p and q. That's exactly why/where/how factoring comes into breaking RSA. If you factor n, then you know p and q. From them, you can find r, and from r you can compute the d that matches a known e.
So, let's work through the math to create a key pair. We're going to use primes that are much too small to be effective, but should be sufficient to demonstrate the ideas involved.
So let's start by picking a p and q (of course, both need to be primes):
p = 9999991
q = 11999989
From those we compute n and r:
n = 119999782000099
r = 119999760000120
Next we need to either pick e, or else compute K and factor it to get e and d. For the moment, we'll go with your suggestion of e = 65537 (since 65537 is prime, the only way it and r could fail to be relatively prime would be if r were an exact multiple of 65537, which we can verify is not the case quite easily).
From that, we need to compute our d. We can do that fairly easily (though not necessarily very quickly) using the "Extended" version of Euclid's algorithm, (as you mentioned) Euler's Totient, Gauss' method, or any of a number of others.
For the moment, I'll compute it using Gauss' method:
template <class num>
num gcd(num a, num b) {
    // Euclid's algorithm
    num r;
    while (b > 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

template <class num>
num find_inverse(num a, num p) {
    // Gauss' method: returns z with a * z = 1 (mod p),
    // or 0 if no inverse exists (i.e., gcd(a, p) > 1).
    num g, z;
    if (gcd(a, p) > 1) return 0;
    z = 1;
    while (a > 1) {
        z += p;
        if ((g = gcd(a, z)) > 1) {
            a /= g;
            z /= g;
        }
    }
    return z;
}
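For the toy values above, the call is simply (this usage line is mine; for numbers of this size the intermediate values stay within unsigned long long):

typedef unsigned long long num;
num d = find_inverse<num>(65537ULL, 119999760000120ULL); // (e, r)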
The result we get is:
d = 38110914516113
Then we can plug these into an implementation of RSA, and use them to encrypt and decrypt a message.
So, let's encrypt "Very Secret Message!". Using the e and n given above, that encrypts to:
74603288122996
49544151279887
83011912841578
96347106356362
20256165166509
66272049143842
49544151279887
22863535059597
83011912841578
49544151279887
96446347654908
20256165166509
87232607087245
49544151279887
68304272579690
68304272579690
87665372487589
26633960965444
49544151279887
15733234551614
And, using the d given above, that decrypts back to the original. Code to do the encryption/decryption (using hard-coded keys and modulus) looks like this:
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <functional>
typedef unsigned long long num;
const num e_key = 65537;
const num d_key = 38110914516113;
const num n = 119999782000099;
template <class T>
T mul_mod(T a, T b, T m) {
if (m == 0) return a * b;
T r = T();
while (a > 0) {
if (a & 1)
if ((r += b) > m) r %= m;
a >>= 1;
if ((b <<= 1) > m) b %= m;
}
return r;
}
template <class T>
T pow_mod(T a, T n, T m) {
T r = 1;
while (n > 0) {
if (n & 1)
r = mul_mod(r, a, m);
a = mul_mod(a, a, m);
n >>= 1;
}
return r;
}
int main() {
std::string msg = "Very Secret Message!";
std::vector<num> encrypted;
std::cout << "Original message: " << msg << '\n';
std::transform(msg.begin(), msg.end(),
std::back_inserter(encrypted),
[&](num val) { return pow_mod(val, e_key, n); });
std::cout << "Encrypted message:\n";
std::copy(encrypted.begin(), encrypted.end(), std::ostream_iterator<num>(std::cout, "\n"));
std::cout << "\n";
std::cout << "Decrypted message: ";
std::transform(encrypted.begin(), encrypted.end(),
std::ostream_iterator<char>(std::cout, ""),
[](num val) { return pow_mod(val, d_key, n); });
std::cout << "\n";
}
To have even a hope of security, you need to use a much larger modulus though--hundreds of bits at the very least (and perhaps a thousand or more for the paranoid). You could do that with a normal arbitrary precision integer library, or routines written specifically for the task at hand. RSA is inherently fairly slow, so at one time most implementations used code with lots of hairy optimization to do the job. Nowadays, hardware is fast enough that you can probably get away with a fairly average large-integer library fairly easily (especially since in real use, you only want to use RSA to encrypt/decrypt a key for a symmetrical algorithm, not to encrypt the raw data).
Even with a modulus of suitable size (and the code modified to support the large numbers needed), this is still what's sometimes referred to as "textbook RSA", and it's not really suitable for much in the way of real encryption. For example, right now, it's encrypting one byte of the input at a time. This leaves noticeable patterns in the encrypted data. It's trivial to look at the encrypted data above and see that the second and seventh words are identical--because both are the encrypted form of e (which also occurs in a couple of other places in the message).
As it stands right now, this can be attacked as a simple substitution code. e is the most common letter in English, so we can (correctly) guess that the most common word in the encrypted data represents e (and relative frequencies of letters in various languages are well known). Worse, we can also look at things like pairs and triplets of letters to improve the attack. For example, if we see the same word twice in succession in the encrypted data, we know we're seeing a double letter, which can only be a few letters in normal English text. Bottom line: even though RSA itself can be quite strong, the way of using it shown above definitely is not.
To prevent that problem, with a (say) 512-bit key, we'd also process the input in 512-bit chunks. That means we only get a repetition if there are two places in the original input that are entirely identical for 512 bits at a time. Even if that happens, it's relatively difficult to guess what that repeated content is, so although it's undesirable, it's not nearly as vulnerable as the byte-by-byte version shown above. In addition, you always want to pad the input to a multiple of the size being encrypted.
Reference
https://crypto.stackexchange.com/questions/1448/definition-of-textbook-rsa

Using Boost PRNG to make a huge lookup table of random numbers

I'm trying to use Boost's normal distribution to generate random numbers given different seeds. In other words, I need the same random numbers produced for seed1, seed2, etc.; thousands of seeds will be passed to the function over the course of the simulation. The random number generator will never be used unseeded. [Edit: "Key" is a better word than "seed"--see final description block below.] I'm not sure whether it makes the most sense to generate a single RNG and reseed it (and if so, how) or if it's easier to generate a new one each time. Here's what I have so far, which involves the construction of a new, seeded rng at each request for a random normal number:
double rnorm( int thisSeed ) {
    boost::mt19937 rng( thisSeed );
    boost::normal_distribution<> nd( 0.0, 1.0 ); // (mean, sd)
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor( rng, nd );
    return var_nor();
}
Is this dumb? I'm new to PRNGs and especially Boost's implementation.
A more thorough description of why I'm doing this:
I am creating a huge random energy landscape to simulate protein interactions: each sequence has a particular energy that's calculated as the sum of quenched Gaussian random numbers that depend on the values of particular amino acids at particular positions (and a few other sequence attributes). I want to use the PRNG to calculate what these pseudorandom values are: these values must be consistent (the same sequence should yield the same values), but there are way too many to store. As a simple example, I might have a sequence ARNDAMR and compute its total energy based on two subenergies: one is a random normal number that depends on having A in position 1 and D at position 4, and the other subenergy is a random number that depends on the last three amino acids. I'm converting the configurations into keys for use as seeds (arguments) for my PRNG. Many thousands of sequences will be constructed and mutated, so I need a way to compute energies quickly--so I need to know how best to seed and call my RNG. I will not be using the Boost RNG for anything other than these energy value "lookups."
Further (tl;dr) explanation:
I am going to have "key" values that are integers between 1 and 10^6 or 10^7. I want each to map to a Gaussian random number. There should not be any cross-correlation between the key values and their numbers (e.g., keys 145-148 should not map to autocorrelated "random" numbers).
I need a given key to return the same random number each time it (the key) is called in the simulation. I do not want to store the key-random number pairs in a lookup table.
Your approach fundamentally misunderstands how PRNGs work. If you reseed on every use, then you won't get random numbers at all; you'll just get a bad hash function of the seed. In particular, your numbers won't be normally distributed even if you're calling the PRNG's normal distribution function, because the PRNG only guarantees that the sequence of numbers generated from a single seed will be normally distributed.
If you need a large set of random numbers to be repeatable for a specific set of inputs, then generate a single number which is a function of those inputs, seed the PRNG with that, then get numbers from the PRNG in a predictable sequence; it will produce the same sequence for the same inputs, and the numbers will be properly distributed by the PRNG.
If the set of inputs you use to determine the random sequence is large (and in particular, larger that the size of the seed for your PRNG), then you won't have a unique sequence for every set of inputs. That might be OK for your application, or you might want to use a PRNG with larger seeds.
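For example, a sketch of this advice with Boost (the FNV-style mixing step is my illustrative choice, not a vetted hash): combine the inputs into one seed, seed a single generator, and draw the whole sequence from it:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/variate_generator.hpp>
#include <cstddef>
#include <vector>

// Same inputs -> same seed -> same (properly distributed) sequence.
std::vector<double> normals_for_inputs(const std::vector<unsigned>& inputs,
                                       std::size_t count) {
    unsigned seed = 2166136261u;               // FNV-style mixing, illustrative
    for (std::size_t i = 0; i < inputs.size(); ++i)
        seed = (seed ^ inputs[i]) * 16777619u;
    boost::mt19937 rng(seed);
    boost::normal_distribution<> nd(0.0, 1.0);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> >
        var_nor(rng, nd);
    std::vector<double> out(count);
    for (std::size_t i = 0; i < count; ++i) out[i] = var_nor();
    return out;
}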
Take a look at my public domain ojrandlib. It uses big seeds, and generates normally distributed numbers with the fast Ziggurat algorithm.
Edit after seeing your clarification:
Ah, now I see. There's no such thing as "a" Gaussian random number: distribution only makes sense with regard to the whole sequence from one seed. So what you need to do is create and seed a single generator, then fetch the Nth random value from that generator for each of your keys N. If you're not doing this in order (that is, if you're fetching keys totally at random and not as part of a sequence), this will be very slow, but still possible. You may want to see if you can force a sequence, say by sorting the keys before you fetch them.
ojrandlib has a function discard() for this too, so that if you need to find the 1,000,000th number in a sequence, you can seed the PRNG and discard 999,999 of them, which is faster than actually generating them, but will still be pretty slow.
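The same jump-ahead idea with Boost might look like this (my sketch, not ojrandlib; boost::mt19937 only gained a discard() in later Boost versions, so this assumes a reasonably recent one):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/variate_generator.hpp>

double normal_for_key(unsigned int masterSeed, unsigned long long key) {
    boost::mt19937 rng(masterSeed);
    // Advance by two raw outputs per key: a normal draw can consume up to
    // two uniforms, so this keeps adjacent keys from sharing engine inputs.
    rng.discard(2 * key);
    boost::normal_distribution<> nd(0.0, 1.0);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> >
        var_nor(rng, nd);
    return var_nor();  // O(key) per call; batch sorted keys to amortize
}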
Probably better: instead of using your key to seed a Gaussian generator, compute a good hash function of the key + fixed seed (which will result in uniformly distributed random bits), then interpret those hash bits as two uniform floats, then do the Box-Muller or Ziggurat with those to transform the distribution. That way, the numbers you get will be all from the same "seed" (which is the input to the hash), but normally distributed. You don't need a cryptographically secure hash, so something like MurMurHash might work well, though you would probably be better off rolling your own for such a special purpose.
Thought users of my library might have similar problems to yours, so I investigated some possibilities. Here's some code that might work for you:
#include <stdint.h>
#include <math.h>

/* Thomas Wang's 32-bit integer hash */
uint32_t nth_rand32(uint32_t a) {
a -= a << 6;
a ^= a >> 17;
a -= a << 9;
a ^= a << 4;
a -= a << 3;
a ^= a << 10;
a ^= a >> 15;
return a;
}
/* Marsaglia polar method */
double nth_normal(int index) {
double f, g, w;
int skip = 0;
uint64_t x, y;
do {
x = (uint64_t)nth_rand32((index & ~1) + skip);
y = (uint64_t)nth_rand32((index | 1) + skip);
skip += 0x40000001;
x = (x << 20) | 0x3ff0000000000000ull;
f = *(double *)(&x) * 2.0 - 3.0;
y = (y << 20) | 0x3ff0000000000000ull;
g = *(double *)(&y) * 2.0 - 3.0;
w = f * f + g * g;
} while (w >= 1.0 || w == 0.0);
w = sqrt((-2.0 * log(w)) / w);
if (index & 1) w *= f;
else w *= g;
return w;
}
The hash doesn't pass diehard, but it's pretty good. I generated 10,000,000 random normals, and got this distribution (histogram image not reproduced here):
Not perfect, but not too bad. It would be a lot better with a more expensive hash, but I'll let you decide where the speed/accuracy tradeoff is for you.