I am currently using SCIP for academic research. I ran into an out-of-memory error while trying to enumerate all solutions of an ILP instance. I wonder if there is any setting that makes the solver output each solution immediately when it is found, instead of keeping all solutions in memory.
The power flow library PyPSA uses Pyomo. I am trying to reduce the cost of each linear optimal power flow simulation.
I read through the Pyomo docs. Nothing sticks out at me yet. Perhaps it is not possible to split up the processing when solving linear optimisation problems.
Ubuntu 19.04, i5-4210U @ 1.70 GHz, 8 GB RAM
When you talk about processing, there are two things to consider: the processing to write the .lp file, and the processing to solve the problem with an optimization solver.
First, writing the .lp file is, to my knowledge, not yet parallelized in Pyomo. The PyPSA developers created Linopy to parallelize the processing, reduce RAM requirements, and increase speed.
Second, parallelizing the solver processing depends on the solver. PyPSA-Eur has an example of such an integration for Gurobi and CPLEX. The performant open-source solver HiGHS can also do something like that; see here.
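For HiGHS specifically, parallelism is controlled through solver options. A minimal options-file sketch (the option names are taken from the HiGHS documentation, but check them against your version):

```
parallel = on
threads = 4
```

Here `parallel` allows the parallel variants of the algorithms and `threads` caps how many threads HiGHS may use.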
I am running a large LP (approximately 5M non-zeros) and I want to speed up the solving process. I tried a concurrent solve to test which algorithm solves my problem quickest, and found that the barrier method is the clear winner (solver = Xpress MP, but I guess it would be the same for other solvers).
However, I want to speed it up further. I noticed that the actual barrier solve takes less than 1% of the total solution time; the remainder is spent in the crossover (~40%) and the primal solve to find the corner solution in the new basis (~60%). Unfortunately, I couldn't find a setting to tell the solver to do a dual crossover (there is one in Cplex, but I don't have a Cplex license), so I couldn't compare whether that would be quicker.
Therefore I tried turning off the crossover, which yields a huge speed increase, but according to the documentation there are some disadvantages. So far, the drawbacks that I know of are:
Barrier solutions tend to be midface solutions.
Barrier without crossover does not produce a basic solution (although the solver settings mention that "The full primal and dual solution is available whether or not crossover is used.").
Without a basis, you will not be able to optimize the same or similar problems repeatedly using advanced start information.
Without a basis, you will not be able to obtain range information for performing sensitivity analysis.
My questions are simple. What other drawbacks are important enough to justify the very inefficient crossover step (both Cplex and Xpress MP enable the crossover by default)? Alternatively, is my problem exceptional, and is the crossover step very quick for other problems? Finally, what is wrong with having midface solutions (which implies that a corner optimum is not unique either)?
Sources:
http://www.cs.cornell.edu/w8/iisi/ilog/cplex101/usrcplex/solveBarrier2.html (Barrier algorithm high-level theory)
http://tomopt.com/docs/xpress/tomlab_xpress008.php (Xpress MP solver settings)
https://d-nb.info/1053702175/34 (p87)
Main disadvantage: the solution will be "ugly", that is, many values like 0.000001 and 0.9999999 in the solution. Secondly, you may get somewhat different duals. Finally, a basis is required to do fast "hot starts". One possible way to speed up large models is to use a simplex method with an advanced basis from a representative base run.
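As a tiny illustration of the "ugly values" point, near-integral entries from an interior-point solution are often snapped to their rounded value within a tolerance as a post-processing step. The helper and tolerance below are my own, not any solver's API:

```python
def clean(values, tol=1e-6):
    """Snap values within tol of an integer to that integer."""
    out = []
    for v in values:
        r = round(v)
        out.append(float(r) if abs(v - r) <= tol else v)
    return out

print(clean([0.9999999, 0.0000001, 0.5]))  # -> [1.0, 0.0, 0.5]
```

Genuinely fractional values (like the 0.5 above) are left alone; only near-integral noise is removed.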
I'm asking this question because we're really stuck trying to find the cause of a software crash. I know that questions like "Why does the software crash?" are not appreciated, but we really don't know how to find the problem.
We are currently doing a long-term test of our software. To find potential memory leaks, we used the Windows Performance Monitor to track several memory metrics, such as private bytes, working set, and virtual bytes.
The software ran for quite a long time (about 30 hours) without any problems. It does the same thing the whole time: reading an image from the hard drive, doing some inspection, and showing some results.
Then it suddenly crashes. Inspecting the memory metrics in Performance Monitor, we saw a strange, steep rise in the working-set-bytes graph at 10:17 AM. We have encountered this several times, and according to the dump files the exception code is always 0xc0000005: "the thread tried to read from or write to a virtual address for which it does not have the appropriate access". However, it appears at different positions, where no pointers are used.
Does anyone know what could cause such a steep rise of the working set, and why it could crash the software? How can we find the bug in our software when the crash occurs at a different position every time?
The application is written in C++ and runs on a Windows 7 32-bit PC.
It's actually impossible to know from the information you have provided, but I would suggest that you have some memory corruption (hence the access violation). It could be a buffer-overflow issue; for example, a missing null terminator on a string causes something to be appended indefinitely.
The recommended next step is to download the Debugging Tools for Windows suite. Set up WinDbg with the correct symbol files and analyse the stack trace to find the general area of the crash. Depending on the cause of the memory corruption, this will be more or less useful: you could have corrupted the memory long before the crash occurs.
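A typical first triage session on the dump might look like this (commands from the standard WinDbg command set; `$$` introduces a comment, and the symbol-cache path is just an example):

```
.symfix C:\symbols   $$ use the Microsoft public symbol server
.reload              $$ reload symbols for the loaded modules
!analyze -v          $$ automated analysis of the access violation
kb                   $$ stack trace with arguments at the crash site
```

`!analyze -v` alone often pinpoints the faulting module and the corrupted address.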
Ideally also run a static analysis tool on the code.
Given the information you have now, there is little chance of getting an answer. You need more information, more specifically:
Gather more intelligence (is there anything specific about the files that cause the crash? What about the last-but-one file?)
Insert more tracing and logging (as much as you can without making it 2x slower). At least you'll see where it crashes, and then you'll be able to insert more tracing/logging around that place.
As you're on Windows, consider handling c0000005 via _set_se_translator, converting it into a C++ exception, and adding even more logging along the way as this exception is unwound.
There is no silver bullet for this kind of problem; there is only gathering more information and figuring it out.
P.S. As an unlikely shot: I've seen similar things caused by a bug in the MS heap. If you're not using the LFH (low-fragmentation heap) yet (not sure; it might be the default now), there is a 1% chance that changing your default heap to the LFH will help.
I'm working on a logistics project in C++ where I have to compare the execution time of a solver that I created with that of an open-source solver.
So, what I need is to stop the solver that I created if it runs longer than the open-source solver.
The problem is that I couldn't find anything about a timer that stops the currently executing program.
Can someone help me?
You could just launch a future that sleeps for a given time and then calls std::exit.
Without further information about what you are solving, I would suggest running both solvers in a series of benchmarks with multiple objectives if possible, since each might perform differently in different situations. Running both through rigorous benchmarks will help make sure your results are valid. Also, even if your solver takes longer, knowing the time difference will help you optimize it.
So, the questions are:
1. Is MapReduce overhead too high for the following problem? Does anyone have an idea of how long each map/reduce cycle (in Disco, for example) takes for a very light job?
2. Is there a better alternative to MapReduce for this problem?
In MapReduce terms, my program consists of 60 map phases and 60 reduce phases, all of which together need to be completed in 1 second. One of the problems I need to solve this way is a minimum search with about 64,000 variables. The Hessian matrix for the search is a block matrix: 1000 blocks of size 64x64 along the diagonal, and one row of blocks on the extreme right and bottom. The last section of the block matrix inversion algorithm shows how this is done. Each of the Schur complements S_A and S_D can be computed in one MapReduce step, and the computation of the inverse takes one more step.
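For reference, these are the textbook Schur-complement identities behind that block inversion, writing the Hessian in 2x2 block form (the block labels follow the usual convention, matching the S_A and S_D above):

```latex
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad
S_A = D - C A^{-1} B, \qquad
S_D = A - B D^{-1} C,

M^{-1} = \begin{pmatrix}
S_D^{-1} & -S_D^{-1} B D^{-1} \\
-D^{-1} C S_D^{-1} & \; D^{-1} + D^{-1} C S_D^{-1} B D^{-1}
\end{pmatrix}.
```

The products with $A^{-1}$ and $D^{-1}$ are what parallelize well here, since $D$ is the block-diagonal part.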
From my research so far, mpi4py seems like a good bet. Each process can do a compute step and report back to the client after each step, and the client can report back with new state variables so the cycle continues. This way the process state is not lost, and computation can be continued with any updates.
http://mpi4py.scipy.org/docs/usrman/index.html
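The step-compute-report-update cycle described above can be sketched with the standard library's multiprocessing as a stand-in for mpi4py (which needs an MPI installation). The worker function and the update rule here are toy placeholders, not part of any real solver:

```python
from multiprocessing import Pool

def worker_step(args):
    """One compute step on a worker: refine a block given the shared state."""
    block, state = args
    # Toy update rule standing in for a real per-block computation.
    return [x - 0.5 * (x - state) for x in block]

def run(blocks, steps=3):
    """Client loop: farm out a step, gather results, publish the new state."""
    state = 0.0
    with Pool(processes=2) as pool:
        for _ in range(steps):
            blocks = pool.map(worker_step, [(b, state) for b in blocks])
            state = min(min(b) for b in blocks)  # new state for the next cycle
    return blocks, state
```

The key point is that `blocks` and `state` live in the client between cycles, so no per-process state is lost when a step finishes; mpi4py would replace `pool.map` with explicit sends and receives between ranks.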
This wiki has some suggestions, but does anyone have guidance on the most mature solution?
http://wiki.python.org/moin/ParallelProcessing
Thanks!
MPI is a communication protocol that allows for the implementation of parallel processing by passing messages between cluster nodes. The parallel processing model that is implemented with MPI depends upon the programmer.
I haven't had any experience with MapReduce, but it seems to me that it is a specific parallel-processing model designed to be simple to implement. This kind of abstraction should save you programming time and may or may not provide a suitable solution to your problem. It all depends on the nature of what you are trying to do.
The trick with parallel processing is that the most suitable solution is often problem specific and without knowing more specifics about your problem it is hard to make recommendations.
If you can tell us more about the environment that you are running your job on and where your program fits into Flynn's taxonomy, I might be able to provide some more helpful suggestions.