I need your help to solve this problem. I have a set of tasks, and each task has an execution time. I have two types of constraints: the first type is the precedence relationships between tasks, and the second type specifies which sets of tasks are allowed to execute at the same time. For example: I have a graph G with 6 tasks and the edges (T1,T2), (T2,T3), (T4,T3), (T4,T5) and (T6,T5). Suppose that T1 and T4 are able to execute together, and also T1 and T6, but not T4 and T6. Taking the execution time of each task into account, how do I find a schedule that satisfies the precedence relationships between tasks and also minimizes the length of the schedule, taking advantage of the parallel execution of some tasks?
If the exclusion constraints (e.g. "T4 and T6 cannot execute together") weren't there (and no other constraints were added), you could just start every task at the maximum finish time of all of its preceding tasks. That would be optimal and would scale well: you would automatically get the shortest makespan. It would not be NP-complete/hard, and not job shop scheduling.
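As a minimal sketch of that unconstrained case (the durations below are assumptions, since the question doesn't give any), each task's earliest start is just the maximum finish time of its predecessors, computed in one pass over a topological order:

    from collections import defaultdict, deque

    def earliest_schedule(durations, edges):
        """Earliest start/finish times in a precedence DAG, assuming unlimited parallelism."""
        succs = defaultdict(list)
        indegree = {t: 0 for t in durations}
        for u, v in edges:
            succs[u].append(v)
            indegree[v] += 1

        start = {t: 0 for t in durations}
        finish = {}
        ready = deque(t for t, d in indegree.items() if d == 0)
        while ready:
            t = ready.popleft()
            finish[t] = start[t] + durations[t]
            for s in succs[t]:
                start[s] = max(start[s], finish[t])   # wait for the latest predecessor
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        return start, finish

    # Graph from the question, with made-up durations; the makespan is max(finish).
    durations = {"T1": 2, "T2": 3, "T3": 1, "T4": 2, "T5": 4, "T6": 1}
    edges = [("T1", "T2"), ("T2", "T3"), ("T4", "T3"), ("T4", "T5"), ("T6", "T5")]
    start, finish = earliest_schedule(durations, edges)
    print(start, max(finish.values()))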
Unfortunately, the exclusion constraint (and potentially any other you add in the future) turns it into job shop scheduling (as mentioned by Lars), which is NP-complete/hard. See this video of a job shop scheduling variant of an open source Java implementation, which demos why some tasks start later than the time their preceding tasks finish. To solve that, look into heuristics, metaheuristics (Tabu Search, ...) or other related techniques.
To keep it simple, you can use a constructive heuristic based on priority rules along with a schedule generation scheme (SGS); see this for further reference. The heuristic generates an ordered list of activities according to some criterion, and the SGS takes this list as input and generates the schedule. In your implementation of the SGS, you decide whether two tasks may or may not be executed in parallel, based on your second constraint.
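A rough sketch of a serial SGS under those assumptions, where can_run_together(a, b) is a hypothetical helper encoding the second constraint (not something defined in the question):

    def serial_sgs(activity_list, durations, preds, can_run_together):
        """Place each activity, in the order given by the priority rule, at the
        earliest time that respects precedence and the pairwise exclusion rule."""
        start, finish = {}, {}
        for act in activity_list:                  # list must be precedence-feasible
            t = max((finish[p] for p in preds.get(act, [])), default=0)
            while True:
                # already-scheduled tasks that would overlap [t, t + durations[act])
                overlapping = [o for o in finish
                               if start[o] < t + durations[act] and finish[o] > t]
                conflicts = [o for o in overlapping if not can_run_together(act, o)]
                if not conflicts:
                    break
                t = min(finish[o] for o in conflicts)   # jump past the earliest conflicting finish
            start[act], finish[act] = t, t + durations[act]
        return start, finish

The makespan of the resulting schedule is simply max(finish.values()), which is also what a metaheuristic would use to evaluate a candidate activity list.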
If you want something more robust, you can use a metaheuristic, where you basically generate a solution (a list of tasks) and modify it using local search techniques, exploring the search space of solutions. You could generate solutions based on priority rules and evaluate them with the SGS implementation. This is just a simplified explanation of how a metaheuristic works; there are several variants. A good example of a metaheuristic is Simulated Annealing, applied to the RCPSP problem: http://www.sciencedirect.com/science/article/pii/S0377221702007610.
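For illustration, a bare-bones simulated annealing loop over activity lists could look like the sketch below; makespan (e.g. the SGS above plus max(finish)) and neighbor (e.g. a precedence-preserving swap of two activities) are hypothetical callbacks you would supply:

    import math, random

    def simulated_annealing(initial_list, makespan, neighbor,
                            temp=100.0, cooling=0.99, iters=2000):
        """Accept improving moves always, and worsening moves with a probability
        that shrinks as the temperature cools (standard simulated annealing)."""
        current = list(initial_list)
        best = list(initial_list)
        for _ in range(iters):
            candidate = neighbor(current)
            delta = makespan(candidate) - makespan(current)
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current = candidate
                if makespan(current) < makespan(best):
                    best = list(current)
            temp *= cooling
        return best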
I am trying to find an easy-to-implement algorithm for the offline scheduling of parallel jobs, each comprising ordered tasks, among workers, so as to minimize the makespan. I am interested in the special case where the workers are unique in what they can do (rather than the typical case where workers can do any task but may take different times), subject to the constraint that a worker must finish a task before it can move on to another.
I am more concerned with ease of implementation than computational complexity as the number of workers, jobs, and tasks per job are pretty small (orders: ~10, <10, and 10-30 respectively).
The specific property that the agents are distinct in what they can do, rather than in how long they take to perform a task, has made it hard for me to find an algorithm (or a near-algorithm to start from). When searching for an algorithm, I have tried recasting this as a tiling problem (as it's similar to stacking Gantt charts on top of each other) and have looked into how I would cast it as a graph problem, to no avail.
The closest I've found so far have been dos Santos 2019, Spegal 2019, and Schulz & Skutella 2002, but these require that I cast the problem as some machines taking infinite time for mismatched operations, and that I account for other scheduling properties that are not applicable to this problem; I do not know enough about these algorithms to tell whether setting those to values that effectively bypass them would break anything.
I think what you're describing is known as the (inflexible) job-shop scheduling problem (with precedence ordering and non-preemptive scheduling). If you are looking for a low barrier to implementation, I would recommend an existing module like ortools. It's even defined in a manner similar to the example you provided.
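For instance, a hedged sketch along the lines of the OR-Tools CP-SAT job-shop example (the job/worker data here is made up): each task gets an interval variable, tasks within a job are chained by precedence, each worker gets a no-overlap constraint, and the makespan is minimized.

    from ortools.sat.python import cp_model

    def solve_jobshop(jobs):
        """jobs: {job_id: [(worker, duration), ...]}, tasks in order; each task's
        worker is fixed, which matches the 'workers are unique' case."""
        model = cp_model.CpModel()
        horizon = sum(d for tasks in jobs.values() for _, d in tasks)
        per_worker = {}
        ends = []
        for job, tasks in jobs.items():
            prev_end = None
            for i, (worker, dur) in enumerate(tasks):
                start = model.NewIntVar(0, horizon, f"s_{job}_{i}")
                end = model.NewIntVar(0, horizon, f"e_{job}_{i}")
                interval = model.NewIntervalVar(start, dur, end, f"iv_{job}_{i}")
                per_worker.setdefault(worker, []).append(interval)
                if prev_end is not None:
                    model.Add(start >= prev_end)     # ordered tasks within a job
                prev_end = end
                ends.append(end)
        for intervals in per_worker.values():
            model.AddNoOverlap(intervals)            # a worker runs one task at a time
        makespan = model.NewIntVar(0, horizon, "makespan")
        model.AddMaxEquality(makespan, ends)
        model.Minimize(makespan)
        solver = cp_model.CpSolver()
        solver.Solve(model)
        return solver.Value(makespan)

    # hypothetical data: two jobs over three dedicated workers
    print(solve_jobshop({"j1": [("w1", 3), ("w2", 2)], "j2": [("w2", 2), ("w3", 4)]}))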
For designing an algorithm I need to simulate the map-reduce environment. I assume that I have a couple of jobs and each of them consists of a set of map and reduce tasks. I have to make assumptions about the processing time of the map and reduce tasks.
For example, job "j1" has 3 map tasks and 2 reduce tasks. Now, is there any constraint on the processing time of map tasks vs. reduce tasks? What is it usually like?
It would be difficult to make any assumptions without knowing what your map and reduce tasks do. The processing time of the map or reduce tasks depends entirely on what you want them to do; you can't really make a blanket assumption.
For example, your individual map function could be processing an individual file as input, or an individual line, or an individual word, all of which directly affect the processing time.
The reducer is the same way; it could do a lot of processing, a little processing, or even no processing at all. (With Hadoop's implementation of MapReduce, you don't even have to have a reducer for your MapReduce task, evidencing the fact that the amount of processing varies). It just depends what the individual task calls for.
If you have an idea of what the simulated MapReduce jobs would actually be doing, you can use that to determine what the general processing times of the different tasks would be in comparison to each other.
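If you do go ahead with a simulation, one hedged way to encode such an assumption (none of these numbers come from the question) is to parameterize each job with per-task duration distributions and sample from them:

    import random

    def sample_job(num_maps, num_reduces, map_mean=10.0, reduce_mean=30.0):
        """Draw map/reduce task durations from exponential distributions whose
        means are simulator parameters chosen to model the intended workload."""
        maps = [random.expovariate(1.0 / map_mean) for _ in range(num_maps)]
        reduces = [random.expovariate(1.0 / reduce_mean) for _ in range(num_reduces)]
        return maps, reduces

    # e.g. job "j1" with 3 map tasks and 2 reduce tasks
    print(sample_job(3, 2))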
Problem Description:
There are n tasks, and some of these tasks may depend on others: if A depends on B, then B must be finished before A can be finished.
1. How do you find a way to finish these tasks as quickly as possible?
2. If parallelism is taken into account, how do you design the program to finish these tasks?
Question:
Apparently, the answer to the first question is to topologically sort these tasks, then finish them in that order.
But how should the job be done if parallelism is taken into consideration?
My answer was: first topologically sort these tasks, then pick the tasks which are independent and finish them first, then pick and finish the independent ones among the rest...
Am I right?
Topological sort algorithms may give you various different result orders, so you cannot just take the first few elements and assume them to be the independent ones.
Instead of topological sorting I'd suggest to sort your tasks by the number of incoming dependency edges. So, for example, if your graph has A --> B, A --> C, B --> C, D --> C, you would sort it as A[0], D[0], B[1], C[3], where [i] is the number of incoming edges.
With topological sorting, you could also have gotten A, B, D, C. In that case, it wouldn't be easy to find out that you can execute A and D in parallel.
Note that after a task was completely processed you then have to update the remaining tasks, in particular, the ones that were dependent on the finished task. However, if the number of dependencies going into a task is limited to a relatively small number (say a few hundreds), you can easily rely on something like radix/bucket-sort and keep the sort structure updated in constant time.
With this approach, you can also easily start new tasks, once a single parallel task has finished. Simply update the dependency counts, and start all tasks that now have 0 incoming dependencies.
Note that this approach assumes you have enough processing power to process all tasks that have no dependencies at the same time. If you have limited resources and care for an optimal solution in terms of processing time, then you'd have to invest more effort, as the problem becomes NP-hard (as arne already mentioned).
So to answer your original question: yes, you are basically right; however, you didn't explain how to determine those independent tasks efficiently (see my example above).
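To make the bookkeeping concrete, here is a small sketch of the counting idea (task names taken from the example above); it processes whole batches of zero-dependency tasks at a time for simplicity, whereas a real scheduler would update the counts as each individual task finishes:

    def parallel_batches(tasks, preds):
        """Repeatedly take every task whose count of unfinished prerequisites is zero
        (these can run in parallel), then decrement the counts of their dependents."""
        remaining = {t: len(preds.get(t, ())) for t in tasks}
        batches = []
        while remaining:
            ready = [t for t, n in remaining.items() if n == 0]
            if not ready:
                raise ValueError("cycle detected")
            batches.append(ready)
            for t in ready:
                del remaining[t]
            for t in remaining:
                remaining[t] -= sum(p in ready for p in preds.get(t, ()))
        return batches

    # A --> B, A --> C, B --> C, D --> C  (preds maps each task to its prerequisites)
    preds = {"B": ["A"], "C": ["A", "B", "D"]}
    print(parallel_batches(["A", "B", "C", "D"], preds))   # [['A', 'D'], ['B'], ['C']]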
I would try sorting them into a directed forest structure with task execution times as edge weights. Order the arborescences from heaviest to lightest and start with the heaviest. Using this approach you can, at the same time, check for circular dependencies.
Using parallelism, you get a bin-packing-like problem, which is NP-hard. Try looking up approximation algorithms for that problem.
Have a look at the Critical Path Method, taken from the area of project management. It basically does what you need: given tasks with dependencies and durations, it tells you how much time the whole thing will take, and when to activate each task.
(*) Note that this technique assumes an infinite number of resources for the optimal solution. For limited resources there are heuristics for greedy algorithms, such as GPRW [current + following tasks' time] or MSLK [minimum total slack time].
(*) Also note that it requires knowing [or at least estimating] how long each task will take.
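A compact sketch of the CPM forward/backward pass (the durations and dependencies are whatever you feed it; the slack values are what rules like MSLK build on):

    def critical_path(durations, preds):
        """Forward pass gives earliest start/finish; backward pass gives latest
        start/finish; slack = latest start - earliest start (zero on the critical path)."""
        # crude topological order: repeatedly pick tasks whose predecessors are done
        order, done = [], set()
        while len(order) < len(durations):
            for t in durations:
                if t not in done and all(p in done for p in preds.get(t, ())):
                    order.append(t)
                    done.add(t)
        es, ef = {}, {}
        for t in order:
            es[t] = max((ef[p] for p in preds.get(t, ())), default=0)
            ef[t] = es[t] + durations[t]
        project_end = max(ef.values())
        succs = {t: [s for s in durations if t in preds.get(s, ())] for t in durations}
        ls, lf = {}, {}
        for t in reversed(order):
            lf[t] = min((ls[s] for s in succs[t]), default=project_end)
            ls[t] = lf[t] - durations[t]
        slack = {t: ls[t] - es[t] for t in durations}
        return es, project_end, slack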
So, the questions are:
1. Is mapreduce overhead too high for the following problem? Does anyone have an idea of how long each map/reduce cycle (in Disco for example) takes for a very light job?
2. Is there a better alternative to mapreduce for this problem?
In map-reduce terms, my program consists of 60 map phases and 60 reduce phases, all of which together need to be completed in 1 second. One of the problems I need to solve this way is a minimum search with about 64000 variables. The Hessian matrix for the search is a block matrix: 1000 blocks of size 64x64 along the diagonal, and one row of blocks on the extreme right and bottom. The last section of the block matrix inversion algorithm shows how this is done. Each of the Schur complements S_A and S_D can be computed in one mapreduce step. The computation of the inverse takes one more step.
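For reference, a small dense sketch of the Schur-complement block inversion (plain NumPy, no MapReduce, and with a tiny made-up matrix just to check the algebra); the point is that when A is block-diagonal, the products involving A^-1 decompose block by block, which is what makes the map steps possible:

    import numpy as np

    def block_inverse(A, B, C, D):
        """Invert M = [[A, B], [C, D]] via the Schur complement S = D - C A^-1 B.
        When A is block-diagonal, A^-1 (and A^-1 B) can be computed block by block."""
        Ainv = np.linalg.inv(A)
        S = D - C @ Ainv @ B                      # Schur complement of A
        Sinv = np.linalg.inv(S)
        top_left = Ainv + Ainv @ B @ Sinv @ C @ Ainv
        top_right = -Ainv @ B @ Sinv
        bottom_left = -Sinv @ C @ Ainv
        return np.block([[top_left, top_right], [bottom_left, Sinv]])

    # sanity check on a small symmetric positive-definite matrix
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 6))
    M = X @ X.T + 6 * np.eye(6)
    A, B, C, D = M[:4, :4], M[:4, 4:], M[4:, :4], M[4:, 4:]
    assert np.allclose(block_inverse(A, B, C, D), np.linalg.inv(M))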
From my research so far, mpi4py seems like a good bet. Each process can do a compute step and report back to the client after each step, and the client can report back with new state variables for the cycle to continue. This way the process state is not lost and the computation can be continued with any updates.
http://mpi4py.scipy.org/docs/usrman/index.html
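A minimal master/worker cycle with mpi4py might look like the following sketch (the per-rank computation is just a placeholder, and the 60-iteration loop mirrors the 60 phases mentioned above); run it with something like mpiexec -n 4 python script.py:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    state = {"step": 0} if rank == 0 else None
    for step in range(60):
        state = comm.bcast(state, root=0)          # master distributes current state
        partial = rank * 1.0                       # placeholder for the real per-rank compute step
        partials = comm.gather(partial, root=0)    # master collects the results
        if rank == 0:
            state = {"step": step + 1, "total": sum(partials)}   # update state for next cycle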
This wiki holds some suggestions, but does anyone have a direction on the most developed solution:
http://wiki.python.org/moin/ParallelProcessing
Thanks!
MPI is a communication protocol that allows for the implementation of parallel processing by passing messages between cluster nodes. The parallel processing model that is implemented with MPI depends upon the programmer.
I haven't had any experience with MapReduce but it seems to me that it is a specific parallel processing model and is designed to be simple to implement. This kind of abstraction should save you programming time and may or may not provide a suitable solution to your problem. It all depends on the nature of what you are trying to do.
The trick with parallel processing is that the most suitable solution is often problem specific and without knowing more specifics about your problem it is hard to make recommendations.
If you can tell us more about the environment that you are running your job on and where your program fits into Flynn's taxonomy, I might be able to provide some more helpful suggestions.
I have a computational algebra task I need to code up. The problem is broken into well-defined individual tasks that naturally form a tree: the task is combinatorial in nature, so there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large.
I had previously coded this up in a functional fashion, calling the functions as needed and storing everything in RAM. This was a terrible approach, but I was more concerned about the theory then.
I'm planning to rewrite the code in C++ for a variety of reasons. I have a few requirements:
Checkpointing: The calculation takes a long time, so I need to be able to stop at any point and resume later.
Separate individual tasks as objects: This helps me keep a good handle of where I am in the computations, and offers a clean way to do checkpointing via serialization.
Multi-threading: The task is clearly embarrassingly parallel, so it'd be neat to exploit that. I'd probably want to use Boost threads for this.
I would like suggestions on how to actually implement such a system. Ways I've thought of doing it:
Implement tasks as a simple stack (a sketch of this follows below). When you hit a task that needs subcalculations done, it checks if it has all the subcalculations it requires. If not, it creates the subtasks and throws them onto the stack. If it does, then it calculates its result and pops itself from the stack.
Store the tasks as a tree and do something like a depth-first visitor pattern. This would create all the tasks at the start and then computation would just traverse the tree.
These don't seem quite right because the lower levels require a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
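For what it's worth, here is a bare-bones version of the stack option (in Python just to keep the sketch short, even though the rewrite is planned in C++; children_of, combine and leaf_value are hypothetical callbacks):

    def evaluate(root, children_of, combine, leaf_value):
        """A task stays on the stack until all of its subtasks have results; it then
        computes its own result from them and pops itself off."""
        results = {}
        stack = [root]
        while stack:
            task = stack[-1]
            subs = children_of(task)
            if not subs:
                results[task] = leaf_value(task)      # leaf: compute directly
                stack.pop()
            elif all(s in results for s in subs):
                results[task] = combine(task, [results[s] for s in subs])
                stack.pop()
            else:
                stack.extend(s for s in subs if s not in results)   # expand unfinished subtasks
        return results[root]

At the lowest level, children_of could yield subtasks lazily in chunks instead of materializing millions of them at once, which is the iterator-style refinement mentioned above.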
I feel like I'm over-thinking it and there's already a simple, well-established way to do something like this. Is there one?
Technical details in case they matter:
The task tree has 5 levels.
Branching factor of the tree is really small (say, between 2 and 5) for all levels except the lowest which is on the order of a few million.
Each individual task would only need to store a result tens of bytes large. I don't mind using the disk as much as possible, so long as it doesn't kill performance.
For debugging, I'd have to be able to recall/recalculate any individual task.
All the calculations are discrete mathematics: calculations with integers, polynomials, and groups. No floating point at all.
there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large... blah blah resuming, multi-threading, etc.
Correct me if I'm wrong, but it seems to me that you are exactly describing a map-reduce algorithm.
Just read what Wikipedia says about map-reduce:
"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
Using an existing mapreduce framework could save you a huge amount of time.
I just googled "map reduce C++" and started to get results, notably one in Boost: http://www.craighenderson.co.uk/mapreduce/
These don't seem quite right because the lower levels require a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
You definitely do not want millions of CPU-bound threads. You want at most N CPU-bound threads, where N is the product of the number of CPUs and the number of cores per CPU on your machine. Exceed N by a little bit and you are slowing things down a bit. Exceed N by a lot and you are slowing things down a whole lot. The machine will spend almost all its time swapping threads in and out of context, spending very little time executing the threads themselves. Exceed N by a whole lot and you will most likely crash your machine (or hit some limit on threads). If you want to farm lots and lots (and lots and lots) of parallel tasks out at once, you either need to use multiple machines or use your graphics card.
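One common way to respect that bound (shown in Python for brevity, though the same pattern applies with a Boost/std thread pool in C++; processes rather than threads here, since CPython threads don't parallelize CPU-bound work) is a fixed-size worker pool fed from the full task list; the worker function is just a stand-in:

    import os
    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # stand-in for a CPU-bound subtask
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        n_workers = os.cpu_count()                  # roughly N = CPUs x cores
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            results = list(pool.map(crunch, range(1, 100_000), chunksize=1_000))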