How to entrust primality test to unreliable clients?

Let's consider a two-player game in which each player must name a distinct prime number every turn. If you repeat a prime that was already said, or say a composite number, you lose. Let's call this the Prime Number Game. I want to implement this game without burdening the server with all the primality tests; instead, I want the clients to bear the cost of the primality test for each prime they say. But since the clients are unreliable, there has to be a verification step, and repeating the primality test from scratch on the server would defeat the purpose. So I want each client to send the server some hints obtained during its primality test that make the server's verification significantly cheaper.
For example, a Composite Number Game could achieve this by requiring clients to send the composite number c together with a nontrivial factor c' (1 < c' < c) of it. But I have no idea what hints clients should send to the server in the Prime Number Game. How could I achieve this? What primality test algorithm should the clients use for this purpose?
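For concreteness, the server-side check for that composite-game hint is a single division; a minimal sketch (the function name and the fixed-width integer type are my own choices):

#include <cstdint>

// The client claims c is composite and sends a nontrivial factor as the hint.
// The server verifies the claim with one modulo operation instead of factoring.
bool verify_composite_hint(std::uint64_t c, std::uint64_t factor) {
    return factor > 1 && factor < c && c % factor == 0;
}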

Related

Make random numbers without using time seed

I hope you are well!
I use random numbers that depend on time for my encryption.
But this makes my encryption hackable...
My question is: how can I generate random numbers that are different on every run of the program, without using the time as a seed?
Most of the time we randomize like this:
srand((unsigned) time(0));
cout << "the random number is: " << rand() % 11;
But this uses the time as the seed, so an attacker who knows (or can guess) when the program was run can reproduce the random numbers.
srand and/or rand aren't suited to your use at all.
Although it's not 100% guaranteed, the C++ standard library provides a random_device that's at least intended to provide access to true random number generation hardware. It has a member named entropy() that's intended to give an estimate of the actual entropy available; however, random_device itself may be implemented as a pseudo-random number generator, in which case entropy() should return 0. Assuming entropy() returns a non-zero result, use random_device to retrieve as much random data as you need.
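For example, a minimal sketch along those lines (keeping the % 11-style range from the question; whether the values are truly hardware-random is implementation-defined):

#include <iostream>
#include <random>

int main() {
    std::random_device rd;
    if (rd.entropy() == 0.0) {
        std::cerr << "warning: random_device may be a PRNG on this implementation\n";
    }
    // Each call to rd() yields an unsigned int of (ideally) real entropy.
    std::uniform_int_distribution<int> dist(0, 10);
    std::cout << "the random number is: " << dist(rd) << "\n";
}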
Most operating systems provide their own access to random number generation hardware. On Windows you'd typically use BCryptGenRandom. On UNIXesque OSes, you can read an appropriate number of bytes from /dev/random (there's also /dev/urandom, but it's non-blocking, so it could return data that's less random if there isn't enough entropy available immediately; that's probably fine if you're simulating rolling dice or shuffling cards in a game or something like that, but probably not acceptable for your purpose).
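On the UNIX side, reading from the device file can be as simple as the following sketch (shown with /dev/urandom; substitute /dev/random for the blocking behaviour described above, and note that the final % 11 introduces a small modulo bias):

#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream dev("/dev/urandom", std::ios::binary);
    std::uint32_t value = 0;
    if (!dev.read(reinterpret_cast<char*>(&value), sizeof value)) {
        std::cerr << "could not read from /dev/urandom\n";
        return 1;
    }
    std::cout << "the random number is: " << value % 11 << "\n";
}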
There are pseudo-random number generators (PRNGs) of very different quality and intended usage. Typical characteristics of PRNGs are:
The repeating period. This is the number of values a PRNG produces before the sequence repeats, and for good generators it is close to two to the power of the size of the internal state in bits.
Hidden patterns. Some bad PRNGs have hidden patterns, which obviously decrease the quality of the random numbers. There are statistical tests that can be used to quantify the quality in this regard. This property, as well as the speed, is mainly important for scientific usage such as Monte Carlo simulation.
Cryptographic security. Cryptographic operations need special PRNGs designed for that very specific use case. You can read more about them, for example, on Wikipedia.
What I am trying to show you here is that the choice of the right PRNG is just as important as the right choice of the seed. It turns out that the C rand() performs very badly in all three of the above categories (with the possible exception of speed). That means that if you seed rand() once, call it repeatedly, and print say rand() % 11, an attacker will be able to synchronize to the PRNG after a short time even if you used the most secure and random seed. rand(), as well as most of the other, better random generators in the C++ standard library, was designed for scientific calculations and is not suitable for cryptographic purposes.
If you need cryptographically secure random numbers, I would suggest you use a library that is built for exactly that purpose. A widely used crypto library that works cross-platform is OpenSSL. It includes a function called RAND_bytes() that can be used for secure random numbers. Do not forget to carefully read the manual page if you want to use it: you have to check the return value for errors!
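A minimal sketch of that usage (the 16-byte buffer and the -lcrypto link flag are assumptions; the important part is checking that RAND_bytes() returned 1):

#include <openssl/rand.h>
#include <cstdio>

// Build with something like: g++ example.cpp -lcrypto
int main() {
    unsigned char buf[16];
    if (RAND_bytes(buf, sizeof buf) != 1) {   // 1 means success
        std::fprintf(stderr, "RAND_bytes failed\n");
        return 1;
    }
    for (unsigned char b : buf) std::printf("%02x", b);
    std::printf("\n");
}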

How do I optimize the parallelization of Monte Carlo data generation with MPI?

I am currently building a Monte Carlo application in C++ and I have a question regarding parallelization with MPI.
The process I want to parallelize is the MC generation of data. To get good precision in my final results, I specify a target number of data points. Each data point is generated independently, but individual points may require vastly differing amounts of time.
How do I organize the parallelization and workload distribution of the data generation most efficiently?
What I have done so far
So far I have come up with three possible ways of organizing the MPI part of the code:
The simplest, but most likely least efficient, way: I divide the desired sample size by the number of workers and let every worker generate that amount of data in isolation. However, by the time the slowest worker finishes, all other workers may have been idling for a long time. They could have been "supporting" the slowest worker by sharing its workload.
Use a master: A master communicates with the workers who work continuously until the master process registers that we have enough data and tells everybody to stop what they are doing. The disadvantage I see is that the master process might not be necessary and could be generating data instead (especially when I don't have a lot of workers).
A "ring communication" algorithm I came up with myself: a message is continuously sent around and updated in a circle (1->2, 2->3, ..., N->1). This message contains the global number of generated data points. Once the desired goal is met, the message is tagged, circles one more time, and thereby tells everybody to stop working. Importantly, I use non-blocking communication here (probing with MPI_Iprobe before receiving via MPI_Recv, and sending via MPI_Isend). This way everybody works and no one ever idles. (A rough sketch of this idea is shown below.)
No matter which solution is chosen, in the end I reduce all data sets to one big set and continue to process the data.
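Roughly, the ring variant looks like this (a simplified, untested sketch; generate_one_point() is a placeholder for the actual Monte Carlo work, and a plain MPI_Send is used for the small token message to keep it short):

#include <mpi.h>

void ring_worker(int rank, int size, long goal) {
    const int next = (rank + 1) % size;
    const int prev = (rank + size - 1) % size;
    long local = 0;        // points generated since this rank last held the token
    long global = 0;       // global count carried by the token
    bool i_tagged = false; // true if this rank was the one that tagged the token
    bool done = false;

    if (rank == 0)         // rank 0 starts the token (tag 0 = keep working)
        MPI_Send(&global, 1, MPI_LONG, next, 0, MPI_COMM_WORLD);

    while (!done) {
        // generate_one_point();   // placeholder: one unit of MC work
        ++local;

        int flag = 0;
        MPI_Status st;
        MPI_Iprobe(prev, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
        if (!flag) continue;       // token not here yet, keep working

        MPI_Recv(&global, 1, MPI_LONG, prev, st.MPI_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        if (st.MPI_TAG == 1) {     // stop token circling one more time
            if (!i_tagged)         // forward it exactly once, then stop
                MPI_Send(&global, 1, MPI_LONG, next, 1, MPI_COMM_WORLD);
            done = true;           // the rank that tagged it absorbs it
        } else {
            global += local;
            local = 0;
            const int tag = (global >= goal) ? 1 : 0;  // tag 1 = goal reached
            i_tagged = (tag == 1);
            MPI_Send(&global, 1, MPI_LONG, next, tag, MPI_COMM_WORLD);
        }
    }
    // afterwards: reduce/gather the local data sets as usual
}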
The concrete questions:
Is there an "optimal" way of parallelizing such a fairly simple process? Would you prefer any of the proposed solutions for some reason?
What do you think of this "ring communication" solution?
I'm sure I'm not the first one to come up with, e.g., the ring communication algorithm. I have tried to google this problem, but it seems that I do not know the right terminology in this context. I'm sure there must be a lot of material and literature on such simple algorithms, but I never had a formal course on MPI/parallelization. What are the "keywords" to look for?
Any advice and tips are much appreciated.

Akka streaming - accessible examples of manual flow control, especially cycles or recursion

See my other question: How can I modify my Akka streams Prime sieve to exclude modulo checks for known primes? for an example of the kind of problem I have in mind.
In studying Akka streams so far I have been surprised by how little discussion is devoted to the specifics of flow control within the pipeline, especially in terms of 1) timing/waiting, 2) futures within GraphDSL.create, and 3) making use of existing, already-processed values.
Certain flows, for instance, have a "costly calculation step" that is limiting. For primes, that's checking all p up to Math.floor(Math.sqrt(whatever)). It would be best if I only checked divisibility by the known primes p, but I end up checking ALL numbers less than the root because I don't know how to use the data I am gathering. With a cheap lookup step in place, I could save a lot of time.
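To be concrete about the saving I mean, here is the idea stripped of any streaming, as a plain C++ sketch (just an illustration of the cheap lookup, not Akka code):

#include <iostream>
#include <vector>

int main() {
    std::vector<long> primes;               // the already-found primes
    for (long n = 2; n < 100; ++n) {
        bool is_prime = true;
        for (long p : primes) {
            if (p * p > n) break;           // only known primes up to sqrt(n)
            if (n % p == 0) { is_prime = false; break; }
        }
        if (is_prime) primes.push_back(n);
    }
    for (long p : primes) std::cout << p << " ";
    std::cout << "\n";
}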
Similarly, maybe I want to implement a flow at such a low level that it might as well be item by item. I was expecting to find examples where I could do something like... Item(n = 3) => add three seconds of runtime, Item(n = -1) => hold for 1 second... whatever. Is there a way to work at that level and remain within Akka?

Organizing a task-based scientific computation

I have a computational algebra task I need to code up. The problem is broken into well-defined individual tasks that naturally form a tree - the task is combinatorial in nature, so there's a main task which requires a small number of sub-calculations to get its result. Those sub-calculations have sub-sub-calculations, and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large.
I had previously coded this up in a functional fashion, calling the functions as needed and storing everything in RAM. This was a terrible approach, but I was more concerned about the theory then.
I'm planning to rewrite the code in C++ for a variety of reasons. I have a few requirements:
Checkpointing: The calculation takes a long time, so I need to be able to stop at any point and resume later.
Separate individual tasks as objects: This helps me keep a good handle on where I am in the computation, and offers a clean way to do checkpointing via serialization.
Multi-threading: The task is clearly embarrassingly parallel, so it'd be neat to exploit that. I'd probably want to use Boost threads for this.
I would like suggestions on how to actually implement such a system. Ways I've thought of doing it:
Implement tasks as a simple stack. When a task needs sub-calculations done, it checks whether it already has all the sub-calculations it requires. If not, it creates the subtasks and pushes them onto the stack. If it does, it calculates its result and pops itself from the stack. (A toy sketch of this idea follows after the next option.)
Store the tasks as a tree and do something like a depth-first visitor pattern. This would create all the tasks at the start and then computation would just traverse the tree.
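To make the first option concrete, here is a toy, self-contained sketch of the stack idea (the Task layout and the sum-of-children "calculation" are stand-ins for the real algebra; the results map is what I would serialize for checkpointing; needs C++17 for the vector of the still-incomplete Task type):

#include <iostream>
#include <map>
#include <stack>
#include <vector>

struct Task {
    int id;
    int leaf_value;              // used only if children is empty
    std::vector<Task> children;  // the sub-calculations this task depends on
};

long long run(const Task& root) {
    std::map<int, long long> results;   // id -> result; serialize this to checkpoint
    std::stack<const Task*> stack;
    stack.push(&root);
    while (!stack.empty()) {
        const Task* t = stack.top();
        if (results.count(t->id)) { stack.pop(); continue; }   // already computed
        bool ready = true;
        for (const Task& c : t->children)
            if (!results.count(c.id)) { ready = false; stack.push(&c); }
        if (ready) {                    // all sub-results available: compute and pop
            long long sum = t->leaf_value;
            for (const Task& c : t->children) sum += results[c.id];
            results[t->id] = sum;
            stack.pop();
        }
    }
    return results.at(root.id);
}

int main() {
    Task leaf1{1, 2, {}};
    Task leaf3{3, 5, {}};
    Task leaf4{4, 7, {}};
    Task mid2{2, 0, {leaf3, leaf4}};
    Task root{0, 0, {leaf1, mid2}};
    std::cout << run(root) << "\n";     // prints 14
}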
These don't seem quite right because of the problem of the lower levels requiring a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
I feel like I'm over-thinking it and there's already a simple, well-established way to do something like this. Is there one?
Technical details in case they matter:
The task tree has 5 levels.
Branching factor of the tree is really small (say, between 2 and 5) for all levels except the lowest which is on the order of a few million.
Each individual task would only need to store a result tens of bytes large. I don't mind using the disk as much as possible, so long as it doesn't kill performance.
For debugging, I'd have to be able to recall/recalculate any individual task.
All the calculations are discrete mathematics: calculations with integers, polynomials, and groups. No floating point at all.
there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large... blah blah resuming, multi-threading, etc.
Correct me if I'm wrong, but it seems to me that you are exactly describing a map-reduce algorithm.
Just read what Wikipedia says about map-reduce:
"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
Using an existing mapreduce framework could save you a huge amount of time.
I just googled "map reduce C++" and started to get results, notably one built on Boost: http://www.craighenderson.co.uk/mapreduce/
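Just to illustrate the shape of it in plain single-process C++ (a toy sketch, not the framework above): the "map" step solves each sub-problem independently (the part you can parallelize), and the "reduce" step combines the partial answers.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> inputs = {1, 2, 3, 4, 5};            // the sub-problems
    std::vector<long long> partial(inputs.size());

    // Map: solve each sub-problem independently.
    std::transform(inputs.begin(), inputs.end(), partial.begin(),
                   [](int x) { return static_cast<long long>(x) * x; });

    // Reduce: combine the partial answers into the final result.
    const long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    std::cout << total << "\n";                            // prints 55
}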
These don't seem quite right because of the problem of the lower levels requiring a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
You definitely do not want millions of CPU-bound threads. You want at most N CPU-bound threads, where N is the product of the number of CPUs and the number of cores per CPU on your machine. Exceed N by a little bit and you are slowing things down a bit. Exceed N by a lot and you are slowing things down a whole lot. The machine will spend almost all its time swapping threads in and out of context, spending very little time executing the threads themselves. Exceed N by a whole lot and you will most likely crash your machine (or hit some limit on threads). If you want to farm lots and lots (and lots and lots) of parallel tasks out at once, you either need to use multiple machines or use your graphics card.
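As a rough sketch of what "at most N threads" looks like in practice (using std::thread here; Boost threads, which the question mentions, serve the same purpose; the per-task "work" is a placeholder):

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    unsigned n = std::thread::hardware_concurrency();   // roughly CPUs * cores
    if (n == 0) n = 4;                                   // fallback if unknown

    const long total_tasks = 1000000;                    // a million tasks, only N threads
    std::atomic<long> next{0};
    std::atomic<long long> checksum{0};

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i)
        pool.emplace_back([&] {
            long long local = 0;
            for (long t = next++; t < total_tasks; t = next++)
                local += t;                              // placeholder for real work
            checksum += local;
        });
    for (auto& th : pool) th.join();

    std::cout << n << " threads, checksum " << checksum << "\n";
}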

Complexity of external merge sort

What is the complexity of a 2-phase multi-way external merge sort that uses quicksort (O(n log n)) as the internal sort?
Not an expert here but...
If I understand correctly, what you describe as phases is the number of passes your algorithm will make over the input, right? In that case, a rough running-time approximation would be the number of passes (2 in your case) times the time necessary to read and write the whole input to the external device.
When evaluating the complexity of such algorithms, it is hard to express it in the usual running-time terms. There are many aspects that can influence the result (sequential/non-sequential access, technology, etc.). The common approach is to state the complexity in terms of passes, which accounts for the number of devices used, the number of items in the input, and the number of items that can fit in memory.
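As a rough illustration in the usual textbook notation (my formulation, with N items in the input, M items fitting in memory at once, and a k-way merge):

\[
\text{passes} \;\approx\; 1 + \left\lceil \log_k \left\lceil N/M \right\rceil \right\rceil,
\qquad
\text{I/O cost} \;\approx\; 2N \cdot \text{passes}
\]

That is the initial run-creation pass plus the merge passes, each pass reading and writing all N items. A single merge phase (2 passes total, as in your question) only works if k is at least the number of initial runs, i.e. k >= ceil(N/M).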
The point is that the sorting algorithm is dominated by the I/O operations. Internal quicksort should be fine (despite its quadratic worst case).
Also, I'm not sure if you counted the initial distribution. This is also a pass.