Learning about multithreading. Tried to make a prime number finder - c++

I'm studying for a uni project and one of the requirements is to include multithreading. I decided to make a prime number finder and, while it works, it's rather slow. My best guess is that this has to do with the number of threads I'm creating and destroying.
My approach was to take the range of numbers below N and distribute it evenly across M threads (where M is the number of cores, 8 in my case). However, these threads are created and destroyed every time N increases.
The pseudocode looks like this:
for each core
    # new thread
    for i in (range / numberOfCores) * currentCore
        if !possiblePrimeIsntActuallyPrime
            if possiblePrime % i == 0
                possiblePrimeIsntActuallyPrime = true
                return
        else
            return
Which does work, but 8 threads being created for every possible prime seems to be slowing the system down.
Any suggestions on how to optimise this further?

Use thread pooling.
Create 8 threads once and store them in an array. Each time a thread finishes its piece of work, feed it new data and let it run again. This avoids creating and destroying threads for every candidate.
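As a rough sketch of that idea (not a general-purpose thread pool, and not the poster's actual code): the workers below are created once and each keeps claiming the next untested candidate from a shared atomic counter until none are left. The names countPrimes and isPrime are made up for this illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: plain trial division, standing in for the real test.
static bool isPrime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d <= n / d; ++d)
        if (n % d == 0) return false;
    return true;
}

// Counts the primes in [2, limit] with a fixed set of worker threads.
// The threads are started once; each one repeatedly claims the next
// candidate from a shared counter instead of being recreated per candidate.
std::uint64_t countPrimes(std::uint64_t limit, unsigned workerCount) {
    std::atomic<std::uint64_t> next{2};   // next candidate to hand out
    std::atomic<std::uint64_t> found{0};  // total primes found so far

    auto work = [&] {
        std::uint64_t local = 0;          // per-thread tally, merged once at the end
        for (;;) {
            const std::uint64_t n = next.fetch_add(1);
            if (n > limit) break;
            if (isPrime(n)) ++local;
        }
        found += local;
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workerCount; ++i) pool.emplace_back(work);
    for (auto& t : pool) t.join();
    return found.load();
}
```

Calling countPrimes(100, 8) starts exactly 8 threads for the whole run. Handing out one candidate at a time keeps the sketch short; claiming candidates in larger chunks would probably reduce contention on the counter. With GCC or Clang you may need to compile with -pthread.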
Also, when calculating your range of numbers to check, only check up to ceil(sqrt(N)). Any divisor larger than that either does not divide N or pairs with a smaller factor that has already been checked. For example, ceil(sqrt(24)) is 5.
Once you have checked up to 5 you don't need to check anything else, because 6 goes into 24 4 times and 4 has been checked, 8 goes into it 3 times and 3 has been checked, etc.
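A minimal sketch of the square-root cutoff, assuming unsigned 64-bit inputs. The function names are made up for the example, and the loop condition d <= n / d is just an overflow-safe way of writing d * d <= n.

```cpp
#include <cstdint>

// Returns the smallest divisor of n that is greater than 1, trying only
// candidates d with d * d <= n. If no such divisor exists, n is returned
// unchanged, which for n >= 2 means n is prime.
std::uint64_t smallestFactor(std::uint64_t n) {
    for (std::uint64_t d = 2; d <= n / d; ++d)
        if (n % d == 0) return d;
    return n;
}

// n is prime exactly when it is at least 2 and has no divisor up to sqrt(n).
bool isPrime(std::uint64_t n) {
    return n >= 2 && smallestFactor(n) == n;
}
```

For 97 the loop only tries the divisors 2 through 9 before concluding that it is prime, rather than every number below 97.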

Related

How to find finish times of processes in cplex

I have a machine batch-scheduling problem. The finish time of a batch is the variable "Z[b]". There are three machines (f represents machines). If a machine starts processing a specific batch at time t, X[f][b][t] equals 1.
The parameter "P[b]" is the processing time of each batch. I need to find the ending times of the batches. I tried the constraint below, where t is the time horizon, for example 48 hours.
"forall(p in B) Z[p]-(sum(n in F)sum(a in 1..t-P[p]+1)(a+P[p])*X[n][p][a])==0 ;"
I have 3 machines, but with this constraint only 2 machines are used at time 1. The Z[p] values also don't make sense. How can I fix this?
Within CPLEX you have CP Optimizer, which is good at scheduling.
To get the end of an interval, endOf(itvs) works fine.

Can the minimum time to schedule tasks be found without brute force?

If I have a list of integers representing the time it takes for a task to be completed and I have x workers that can only work on one task until the time it takes to complete is up, can I find the minimum time it could possibly take in a best case scenario? I do not need the exact permutation that makes up this minimum completion time, just the time.
For example, to make it simple, if I have a list [2, 4, 6] and I have 2 workers then if I start with 2 and 4 then when 2 finishes 6 will start meaning that it will take 8 seconds to complete all tasks. However if I start with 6 and 2 then when 2 finishes 4 will start and finish at the same time as 6, therefore the tasks only take 6 seconds if done in this order.
Is there a way of knowing that it will only take 6 seconds, with a guarantee that this is the minimum possible time, that is better than brute force with n! complexity? Thank you in advance for any help. Please feel free to ask questions if I left out any details or you're confused!
edit: please help :(
edit 2: is it even possible? Anyone know?
With a single worker, the total time required is simply the sum of all task times.
jobs = [ 2, 4, 6, etc... ]
time_required = SUM( jobs )
With two workers and a specific ordering of jobs, the total time required can be determined by assigning each task, in order, to whichever worker currently has the lowest sum of assigned task times, and then taking the highest sum among the workers:
define type worker = vector<time_t>
define type workers = min_priority_queue<worker> using worker.sum() # so the current worker.sum() (the sum of `time_t` values in `vector<time_t>`) is the priority-queue key.
define type task = int
jobs = [ 2, 4, 6, etc... ]
# Use two workers:
workers.add( new worker )
workers.add( new worker )
# Iterate once through each job:
foreach( task t in jobs ) {
    minWorker = workers.getMinWorker() # priority queue "find-min" operation
    minWorker.add( t )
}
# Determine which worker will work the longest time:
time_required = workers.getMaxWorker().sum() # priority queue "find-max" operation
Because this produces an actual schedule, time_required is a real, achievable completion time: it lies somewhere between the lower and upper bounds, though it is not necessarily the minimum. That isn't exactly what you're after, but it is cheap to compute, so it's a good starting point.
The above algorithm can be generalised to any number of workers just by adding them to the priority queue. A heap-based priority queue's find-min operation is O(1), but re-ordering the heap after a worker's sum changes costs O(log x), so the algorithm runs in O(n log x) time, where n is the number of jobs and x is the number of workers.
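Here is one way the pseudocode above might look in C++, as a sketch rather than a definitive implementation. It stores only each worker's running total in a min-heap instead of the full task list, and tracks the largest total as it goes, because std::priority_queue has no find-max. The name greedyFinishTime is invented for the example.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Assigns jobs in the given order, each to the worker with the smallest
// total so far, and returns the largest total of any worker.
long long greedyFinishTime(const std::vector<long long>& jobs, int workerCount) {
    if (workerCount <= 0) throw std::invalid_argument("need at least one worker");

    // Min-heap of the workers' current totals; every worker starts at 0.
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> loads;
    for (int i = 0; i < workerCount; ++i) loads.push(0);

    long long longest = 0;
    for (long long job : jobs) {
        long long load = loads.top();  // least-loaded worker: O(1)
        loads.pop();                   // O(log workerCount)
        load += job;
        longest = std::max(longest, load);
        loads.push(load);              // O(log workerCount)
    }
    return longest;
}
```

With the question's example, the order {2, 4, 6} with 2 workers gives 8, while {6, 2, 4} gives 6, so the result depends on the order in which the jobs are fed in.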
As for computing bounds in less time than O(n!) time... that's tricky (at least for me, as it's been a few years since I last cracked-open my copy of CLRS).
A simple lower bound for x workers, whatever the order of the jobs, is the largest single value in the job set.
An upper bound on the best achievable time for x workers is the sum of the largest 100 * (1/x) % of jobs (so given 2 workers it's the sum of the largest 50% of jobs, for 3 workers the largest 33%, for 4 workers 25%, etc.), rounding the number of jobs up when it doesn't divide evenly. This holds because the jobs can always be split so that no worker gets more than that many jobs. It requires sorting the set first, which takes O(n log n).
jobs = [ 2, 4, 6, etc... ]
worker_count = 2
jobs.sortDescending() # O(n log n)
# if there's 50 jobs and 2 workers, then take the first 25 jobs and sum them...
# ...that's an upper_bound for the time required to complete all tasks by 2 workers, as it assumes that 1 unlucky worker will get all of the longest tasks
upper_bound = jobs.take( ceil( jobs.count / worker_count ) ).sum() # round up so that no job is left out when the count doesn't divide evenly
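The two bounds could be sketched in C++ like this, assuming non-negative job times. The function names are invented for the example. Note that lowerBound also takes the average load per worker (total time divided by the number of workers, rounded up) into account. That is an addition of mine beyond the largest-job bound described above; it is safe because at least one worker must carry the average or more.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// No schedule can finish before its longest job, nor before the
// average load per worker (rounded up).
long long lowerBound(const std::vector<long long>& jobs, int workerCount) {
    if (jobs.empty() || workerCount <= 0) return 0;
    const long long largest = *std::max_element(jobs.begin(), jobs.end());
    const long long total = std::accumulate(jobs.begin(), jobs.end(), 0LL);
    const long long averageLoad = (total + workerCount - 1) / workerCount;
    return std::max(largest, averageLoad);
}

// Sum of the largest ceil(n / workerCount) jobs: the worst a single
// worker can do when nobody is given more than that many jobs.
long long upperBound(std::vector<long long> jobs, int workerCount) {
    if (jobs.empty() || workerCount <= 0) return 0;
    std::sort(jobs.begin(), jobs.end(), std::greater<long long>());
    const std::size_t take = (jobs.size() + workerCount - 1) / workerCount;
    return std::accumulate(jobs.begin(), jobs.begin() + take, 0LL);
}
```

For the jobs {2, 4, 6} and 2 workers this gives a lower bound of 6 and an upper bound of 10, and the true minimum of 6 sits inside that range.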

Running three functions at once?

C++
I'm working on a problem for school: 3 horses run a race and whoever finishes first is the winner. The 3 horses are supposed to run in sync, like this:
1|--------H------|
2|-------H-------|
3|---------H-----|
My code runs the program correctly (it generates a random number between 1 and 100 and, if that number is less than 50, moves the horse 1 space up), but it runs the first horse to the end, then the 2nd, and the 3rd last.
I tried to look this up, but none of the methods seem to work (I'm using the latest version of Code::Blocks on Windows 10 for C++).
srand(time(NULL));
Horse1();
Horse2();
Horse3();
Github file: https://gist.github.com/EthanA2020/f16a699f1b8136a1da0350ab48acdda0
I don't think your issue is with the type of function but with the structure of your program. No matter how you program, one operation must come before the next. Developers deal with this by running one step of each object's operation (in your case, one horse movement) in turn and checking the outcome afterwards.
For example, let's use your horse scenario:
while "all horses" are less than "finish"
    horse 1 moves
    horse 2 moves
    horse 3 moves
I am sure that you are familiar with loops so we'll use that here. Some set distance must exist to determine when a horse has finished. So you'll want to continue that loop while all horses have a distance less than that value. During each loop, each horse's movement value must either change or not (determined by your random movement function).
Once this while loop has ended, you can be sure that at least one horse has crossed the finish line. All operations have been completed and you have a data set of the horses' positions. This is the point where you check which horses have finished (I say horses, plural, because more than one horse, or even all 3, may finish at the same time, so be sure to factor that in at the end).
With that, your program structure should be something like:
while "all horses" are less than "finish"
    horse 1 moves
    horse 2 moves
    horse 3 moves
//movement of horses complete
check and print the horses with a movement value of "finish"
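To make that structure concrete, here is a small sketch. The function name runRace, its parameters and the use of <random> instead of rand() are my own choices, not the poster's code. Every horse gets one chance to move per pass through the loop, and the finishers are collected once the pass is complete, so ties are handled.

```cpp
#include <random>
#include <vector>

// Runs the race and returns the 1-based numbers of all horses that are at
// or past the finish when the race ends (more than one in case of a tie).
// Per pass, each horse draws a number from 1 to 100 and advances one space
// if the draw is below 50, as described in the question.
std::vector<int> runRace(int horseCount, int finish, unsigned seed) {
    std::vector<int> winners;
    if (horseCount <= 0) return winners;  // nobody to race

    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> roll(1, 100);
    std::vector<int> position(horseCount, 0);

    while (winners.empty()) {
        // One pass: every horse gets exactly one chance to move.
        for (int h = 0; h < horseCount; ++h)
            if (roll(rng) < 50) ++position[h];
        // Movement for this pass is complete; see who has finished.
        for (int h = 0; h < horseCount; ++h)
            if (position[h] >= finish) winners.push_back(h + 1);
    }
    return winners;
}
```

Something like runRace(3, 15, std::random_device{}()) would run the 3-horse race with a fresh seed each time. Drawing the track after each pass is left out to keep the sketch short.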
I think you should do:
while (!horse(rand() % 100)) {
    usleep(100);
}
Here horse(int n) moves horse n forward 1 position and returns true if that horse has reached the end (which ends the race). It does nothing if an invalid n is passed to it (only 1 to 3 are valid).

Go Worker Pool doesn't seem to be processing Concurrently

Hello, I'm brand new to Go (and concurrent programming in general :() and I'm trying to distribute a slow computation to a pool of workers.
http://play.golang.org/p/lTv4Tm75A4
func main() {
    test := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    answer := getSmallestMultiple(test)
    fmt.Println(answer)
}
I am trying to find the smallest number that is evenly divisible by all the numbers in test.
I have created a pool of workers and am sending them values until one of the goroutines finds a number that can be evenly divided by all the numbers in test:
for w := 0; w < 100; w++ {
    go divisibleByAllNumbers(&numbers, jobs, answer)
}
go func() {
    for i := max; ; i += max {
        fmt.Printf("Sending # %d\n", i)
        jobs <- i
    }
}()
The program seems to run at the same speed no matter how many workers I start. I have tried many different numbers of workers and it always takes the same number of seconds to run, which suggests the work is not being done concurrently at all.
Each worker is consuming work from the queue using range:
for j := range jobs {}
I was hoping that the more workers consuming from the jobs channel, the faster the program would execute.
I have also tried different buffer sizes for the jobs := make(chan int) channel.
I have stared at this all day and was hoping someone could see what the issue is. I would expect the computation to get faster as I add workers, but that's not what I'm seeing. I'm sure I'm missing some key concepts.
Thank you
http://golang.org/doc/effective_go.html#parallel
The current implementation of the Go runtime will not parallelize this code by default. It dedicates only a single core to user-level processing. An arbitrary number of goroutines can be blocked in system calls, but by default only one can be executing user-level code at any time. It should be smarter and one day it will be smarter, but until it is if you want CPU parallelism you must tell the run-time how many goroutines you want executing code simultaneously. There are two related ways to do this. Either run your job with environment variable GOMAXPROCS set to the number of cores to use or import the runtime package and call runtime.GOMAXPROCS(NCPU). A helpful value might be runtime.NumCPU(), which reports the number of logical CPUs on the local machine. Again, this requirement is expected to be retired as the scheduling and run-time improve.
(This applies to older Go releases: since Go 1.5, GOMAXPROCS defaults to the number of available CPUs.)

Can a process send data to itself? Using MPICH2

I have an upper triangular matrix A and the result vector b.
My program needs to solve the linear system:
Ax = b
using the pipeline method.
One of the constraints is that the number of processes is smaller than the number of
equations (let's say it can be from 2 to numberOfEquations-1).
I don't have the code right now; I'm still thinking about the pseudocode.
My idea was that one of the processes will create the random upper triangular matrix (A)
and the vector b.
Let's say this is the random matrix:
1 2 3  4  5  6
0 1 7  8  9 10
0 0 1 12 13 14
0 0 0  1 16 17
0 0 0  0  1 18
0 0 0  0  0  1
and the vector b is [10 5 8 9 10 5]
and I have fewer processes than equations (let's say 2 processes).
What I thought is that one process will send each process a row of the matrix and the corresponding entry of vector b.
So the last row of the matrix and the last entry of vector b will be sent to
process[numProcs-1] (here I mean the last process, process 1),
which then computes x and sends the result to process 0.
Now process 0 needs to compute the 5th row of the matrix, and here I'm stuck.
I have the x that was computed by process 1, but how can the process send itself
the next row of the matrix and the corresponding entry of vector b that need to be computed?
Is it possible? I don't think it's right to send to "myself"
Yes, MPI allows a process to send data to itself but one has to be extra careful about possible deadlocks when blocking operations are used. In that case one usually pairs a non-blocking send with blocking receive or vice versa, or one uses calls like MPI_Sendrecv. Sending a message to self usually ends up with the message simply being memory-copied from the source buffer to the destination one with no networking or other heavy machinery involved.
And no, communicating with self is not necessarily a bad thing. The most obvious benefit is that it makes the code more symmetric, as it removes or reduces the special logic needed to handle self-interaction. Sending to and receiving from self also happens in most collective communication calls. For example, MPI_Scatter also sends part of the data to the root process. To prevent some send-to-self cases that unnecessarily replicate data and decrease performance, MPI allows in-place mode (MPI_IN_PLACE) for most communication-related collectives.
Is it possible? I don't think it's right to send to "myself"
Sure, it is possible to communicate with oneself. There is even a communicator for it: MPI_COMM_SELF. Talking to yourself is not too uncommon.
Your setup sounds like you would rather use MPI collectives. Have a look at MPI_Scatter and MPI_Gather and see whether they provide the functionality you are looking for.