Speed up CPLEX C++ code

This is the result of implementing my very first model in CPLEX C++, and I am very much surprised how slow it is and how poor the solution quality is. I believe much of this can be avoided by a better formulation. Can anyone help me improve the code, please? Hints, ideas, thoughts ... everything is appreciated!
It is about scheduling exams over 5 days, each with 2 available timeslots. My input is the number of exams (first number in the first row) and the number of conflicting exam pairs (second number in the first row), followed by one row per conflicting pair: exam1, exam2, and the number of students taking both exams.
You can find the code here and the instance here.
The constraints I am including are:
each exam is scheduled exactly once
conflicting exams cannot be scheduled in the same period
penalize if conflicting exams are scheduled on the same day
penalize if conflicting exams are scheduled on consecutive days
penalize if conflicting exams are scheduled in adjacent periods overnight
I have the feeling that something is wrong in the formulation itself, because I cannot imagine that the objective value should be that high. Does anyone see the flaw? I'm stuck.
The problem might be in the loop where I try to figure out whether a soft constraint is violated or not. There I loop over the days, but I probably overwrite my variables on every iteration. Does anyone have an idea how to define the binary variable indicating whether the soft constraint is violated on any day? (Of course it can be violated on at most one day, but most probably that day is not the last one, which is all my loop ends up recording.)
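For what it's worth, here is a minimal sketch (in CPLEX Concert C++, with made-up toy data standing in for the real instance) of a formulation that avoids the overwriting problem: create a fresh indicator variable per conflict pair and per day inside the loop, instead of reusing one variable across days. Only the same-day penalty is shown; the consecutive-day and overnight penalties can be linearized the same way.

```cpp
#include <ilcplex/ilocplex.h>
ILOSTLBEGIN

int main() {
    IloEnv env;
    try {
        // Toy instance (replace with your parsed input):
        // conflicts[c] = {exam1, exam2, #students taking both}
        const int nExams = 4, nDays = 5, slotsPerDay = 2;
        const int nPeriods = nDays * slotsPerDay;
        const int nConf = 3;
        const int conflicts[nConf][3] = { {0,1,3}, {1,2,5}, {0,3,2} };

        IloModel model(env);

        // x[e][p] == 1 iff exam e is scheduled in period p
        IloArray<IloBoolVarArray> x(env, nExams);
        for (int e = 0; e < nExams; ++e)
            x[e] = IloBoolVarArray(env, nPeriods);

        // each exam is scheduled exactly once
        for (int e = 0; e < nExams; ++e) {
            IloExpr once(env);
            for (int p = 0; p < nPeriods; ++p) once += x[e][p];
            model.add(once == 1);
            once.end();
        }

        IloExpr penalty(env);
        for (int c = 0; c < nConf; ++c) {
            const int e1 = conflicts[c][0], e2 = conflicts[c][1];
            const int w  = conflicts[c][2];
            for (int d = 0; d < nDays; ++d) {
                const int p0 = slotsPerDay * d, p1 = p0 + 1;
                // hard: conflicting exams never share a period
                model.add(x[e1][p0] + x[e2][p0] <= 1);
                model.add(x[e1][p1] + x[e2][p1] <= 1);
                // soft: a FRESH indicator per conflict and per day,
                // forced to 1 only when both exams fall on day d
                IloBoolVar sameDay(env);
                model.add(sameDay >= x[e1][p0] + x[e1][p1]
                                   + x[e2][p0] + x[e2][p1] - 1);
                penalty += w * sameDay;
            }
        }
        model.add(IloMinimize(env, penalty));
        penalty.end();

        IloCplex cplex(model);
        if (cplex.solve())
            env.out() << "total penalty = " << cplex.getObjValue() << endl;
    }
    catch (IloException& ex) { env.error() << ex << endl; }
    env.end();
    return 0;
}
```

Because each sameDay variable carries a positive objective weight, minimization pushes it to 0 whenever its constraint allows, so no extra bookkeeping about "violated at most once" is needed.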

Rather than debugging C++ code, I just re-implemented the model using a modeling language (I believe that is a quicker and more pleasant way to spend a Sunday evening).
(Note: updated after fixing a bug.) Here is my solution with a total penalty of 751:
You should be able to plug this into your code and verify the results.
Note: it took about 900 seconds on an old laptop to prove optimality with CPLEX. If you just want a good solution, you can stop a little earlier:
(The blue line is the objective and the red line is the best possible bound.) To get good performance I helped CPLEX a little by setting branching priorities and using some options suggested by a tuning run; a sketch of the priorities follows below.
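In the Concert C++ API, branching priorities are set per variable. A hedged fragment, reusing the x variables from the sketch above and a hypothetical conflictWeight array (not in the original code):

```cpp
// Hypothetical: branch first on the exams with the largest total
// conflict weight, computed from the input while parsing.
for (int e = 0; e < nExams; ++e)
    for (int p = 0; p < nPeriods; ++p)
        cplex.setPriority(x[e][p], conflictWeight[e]);
```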


Job scheduling to minimise loss

I have a job scheduling problem. We are given the start time, the time to complete the order, and the deadline; it is guaranteed that start time + time to complete <= deadline.
I am also given the loss that will occur if I am not able to complete a job before its deadline. I have to design an algorithm that minimizes the total loss.
I have tried adapting the standard dynamic programming algorithm for maximizing profit in job scheduling, but without success.
What algorithm can I use to solve the question?
Dynamic programming isn't the right approach for what you're aiming to optimize. You can find an optimized schedule by using a greedy approach.
Here's a thorough guide with sample code in your desired language (C++). The guide assumes each job takes only 1 unit of time, which you can easily generalize by using time_to_complete instead.
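As a concrete illustration of that greedy idea under the guide's unit-time assumption, here is a self-contained sketch (all names are mine, not from the guide): sort jobs by loss, then place each job into the latest free slot at or before its deadline; jobs that don't fit contribute their loss.

```cpp
#include <algorithm>
#include <vector>

struct Job { int deadline; int loss; };  // unit processing time assumed

// Greedy: try to save the costliest jobs first, scheduling each one
// as late as possible (latest free slot at or before its deadline).
int minimumLoss(std::vector<Job> jobs) {
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) { return a.loss > b.loss; });
    int maxD = 0;
    for (const Job& j : jobs) maxD = std::max(maxD, j.deadline);
    std::vector<bool> taken(maxD + 1, false);  // taken[t]: slot t occupied
    int totalLoss = 0;
    for (const Job& j : jobs) {
        int t = j.deadline;
        while (t >= 1 && taken[t]) --t;  // walk back to a free slot
        if (t >= 1) taken[t] = true;     // scheduled in time: no loss
        else totalLoss += j.loss;        // missed deadline: pay the loss
    }
    return totalLoss;
}
```

Sorting dominates at O(n log n) plus the slot scans; if the scans become a bottleneck, a disjoint-set structure over the slots brings them down to near-constant time per job.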
Your problem is similar to the knapsack problem. A greedy approach is convenient if you aren't actually looking for the best possible solution, but just a "good enough" one.
The big pro of the greedy approach is that its cost is much lower than that of more thorough approaches, but if you need the best solution to your problem, I would say that backtracking is the way to go.
Since the deadline can be violated, the problem looks like a Total Weighted Tardiness Scheduling Problem. There are many flavors of it, but most problems under this umbrella are computationally hard, so Dynamic Programming (DP) would not be my first choice. In my experience, DP also poses difficulties during modeling and implementation. The same goes for mathematical programming "as-is". Some approaches that can be implemented more quickly are:
constraint programming: very small learning curve, and there are many libraries out there, including very good open source ones (most have a C++ API). Bonus: constraint programming can prove optimality.
ad hoc heuristics: (1) start with a constructive algorithm (like the greedy approach suggested by Ling Zhong and Flavio Giobergia), then (2) use some local search approach to improve it, and finally (3) embed the approach into a metaheuristic scheme. This way you can build on top of the previous step and learn a lot about the problem. Note: in general, heuristics cannot prove optimality.
special mention: LocalSolver, a hybrid between the two approaches above: it lets you model the problem using a formalism similar to constraint programming and then solves it using heuristics. It is very easy to learn, usually lets you get started quickly and, in my tests, provides remarkably good results.

When is an event too rare for predictive modelling to be worthwhile?

Background
I built a complaints management system for my company. It works fine. I'm interested in using the data it contains to do predictive modelling on complaints. We have ~40,000 customers of whom ~400 have complained.
Problem
I want to use our complaints data to model the probability that any given customer will complain. My concern is that a model giving every customer a probability of 0.000 of complaining would already be 99% accurate and thus hard to improve upon. Is it even possible to build a useful predictive model for such a rare event with so little data?
That is why there are alternative measures to plain accuracy.
Here, recall is probably what you are interested in. And in order to balance precision and recall, the F1 score is a popular measure that takes both into account.
But in general, avoid trying to boil things down to a single number.
It is a one-dimensional result, and too much of a simplification. In practice, you will want to study the errors in detail, to keep systematic errors from creeping in.
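For concreteness, with true positives TP, false positives FP and false negatives FN:

$$\text{precision} = \frac{TP}{TP+FP}, \qquad \text{recall} = \frac{TP}{TP+FN}, \qquad F_1 = 2\,\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$

The trivial model that predicts "no complaint" for all ~40,000 customers has TP = 0, so its recall and F1 are both 0 despite its 99% accuracy; any model that catches even a few of the ~400 complainers beats it on these measures.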

Two way clustering in ordered logit model, restricting rstudent to mitigate outlier effects

I have an ordered dependent variable (1 through 21) and continuous independent variables. I need to run an ordered logit model, clustering by both firm and time, and eliminate outliers with Studentized residuals < -2.5 or > 2.5. I know the ologit command and some of its options; however, I have no idea how to do two-way clustering or how to eliminate outliers based on studentized residuals:
ologit rating3 securitized retained, cluster(firm)
As far as I know, two-way clustering has only been extended to a few estimation commands (like ivreg2 from SSC and tobit/logit/probit here). Eliminating outliers can easily be done on your own; there's no automated way of doing it.
Use the logit2.ado from the link Dimitriy gave (Mitchell Petersen's website) and modify it to use the ologit command. It's simple enough to do with a little trial and error. Good luck!
If you have a variable with 21 ordinal categories, I would have no problem treating it as continuous. If you want to back that up somehow, I wrote a paper on welfare measurement with ordinal variables; see DOI:10.1111/j.1475-4991.2008.00309.x. Then you can use ivreg2. You should be aware of all the issues involved with that estimator, in particular that it implicitly assumes the correlations are fully modeled by this two-way structure, so that observations for firms i and j at times t and s are treated as uncorrelated whenever i != j and t != s. Sometimes this is a strong assumption to make: New York and New Jersey may be correlated in 2010, but New York 2010 is assumed uncorrelated with New Jersey 2009.
I have no idea what you might mean by outliers in an ordinal variable. Somebody must have piled up a bunch of dissertation advice (or, worse, analysis requests) without really trying to make sense of every bit.

Is there an Integer Linear Programming software that returns also non-optimal solutions?

I have an integer linear optimisation problem and I'm interested in feasible, good solutions. As far as I know, the GNU Linear Programming Kit, for example, only returns the optimal solution (given that one exists).
This takes endless time and is not exactly what I'm looking for: I would be happy with any good solution, not only the optimal one.
So an ILP solver that, say, stops after some time limit and returns the best solution it has found so far would do the job.
Is there any such software? It would be great if that software was open source or at least free as in beer.
Alternatively: Is there any other way that usually speeds up Integer LP problems?
Is this the right place to ask?
Many solvers provide a time limit parameter; if you set it, they will stop once the time limit is reached. If an integer feasible solution has been found by then, the best feasible solution found up to that point is returned.
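Since the question mentions GLPK specifically: its branch-and-cut accepts both a time limit and a relative gap, after which the incumbent is kept. A minimal sketch using the C API (the limit values are arbitrary; mip is assumed to already hold the loaded problem):

```cpp
#include <glpk.h>
#include <cstdio>

// Stop GLPK's branch-and-cut early and keep the best incumbent found.
// Assumes 'mip' already contains the loaded integer program.
void solveWithLimits(glp_prob* mip) {
    glp_iocp parm;
    glp_init_iocp(&parm);
    parm.tm_lim   = 60000;   // time limit in milliseconds
    parm.mip_gap  = 0.05;    // accept anything within 5% of optimal
    parm.presolve = GLP_ON;  // solve the LP relaxation internally too
    int ret = glp_intopt(mip, &parm);
    int status = glp_mip_status(mip);
    // GLP_ETMLIM / GLP_EMIPGAP mean a limit was hit; GLP_FEAS means a
    // feasible (possibly non-optimal) incumbent is still available.
    if ((ret == 0 || ret == GLP_ETMLIM || ret == GLP_EMIPGAP) &&
        (status == GLP_OPT || status == GLP_FEAS))
        std::printf("best objective so far: %g\n", glp_mip_obj_val(mip));
    else
        std::printf("no feasible solution found within the limits\n");
}
```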
As you may know, integer programming is NP-hard, and there is a real art to finding optimal solutions as well as good feasible solutions quickly. To compare the different solvers, see Prof. Hans Mittelmann's Benchmarks for Optimization Software. The MILP benchmarks - particularly MIPLIB2010 and the Feasibility Benchmark should be most relevant.
In addition to selecting a good solver, there are many things that can be done to improve solve times including tuning the parameters of the solver and model reformulation. Many people in research and industry - including myself - spend our careers working on improving the solve times of MIP models, both in general and for specific models.
If you are an academic user, note that the top commercial systems like CPLEX and Gurobi are free for academic use. See the respective websites for details.
Finally, you may want to look at OR-Exchange, a sister site to Stack Overflow that focuses on the field of operations research.
(Disclaimer: I currently work for Gurobi Optimization and formerly worked for ILOG, which provided CPLEX).
If you would like to get a feasible integer solution fast and you don't need the optimal solution, you can try the following:
Increase the relative or absolute gap. Solvers usually have small default gaps, say 0.0001% for the relative gap. This means the solver will keep searching for MIP solutions until the incumbent is no farther than 0.0001% from the optimal solution. Increase this gap to, e.g., 1%. You then get a good solution, but the solver will not spend a long time proving optimality.
Try different values for the solver parameters concerning MIP heuristics.
CPLEX and Gurobi have parameters to control MIP emphasis. This determines whether the solver puts more effort into looking for feasible solutions or into proving optimality. Set the emphasis to feasible MIP solutions.
Most solvers like CPLEX, Gurobi, MOPS or GLPK have settings for the gap and the heuristics. MIP emphasis can be set - as far as I know - only in CPLEX and Gurobi. A sketch of these settings for CPLEX follows below.
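In CPLEX's C++ (Concert) interface, for instance, those settings look roughly like this (a sketch assuming model is an already-built IloModel; the limit values are arbitrary):

```cpp
IloCplex cplex(model);
cplex.setParam(IloCplex::TiLim, 60);       // stop after 60 seconds
cplex.setParam(IloCplex::EpGap, 0.01);     // accept a 1% relative gap
cplex.setParam(IloCplex::MIPEmphasis, 1);  // 1 = emphasize feasibility
if (cplex.solve())  // IloTrue also for a feasible, non-proven solution
    std::cout << "objective: " << cplex.getObjValue() << std::endl;
```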
A usual approach for solving ILPs is branch-and-bound, which solves many LP subproblems (the problem with the integrality requirements relaxed). The final optimal result is the best of all the subproblems, so as soon as at least one integer solution has been found, you could stop at any time and keep the best-so-far.
One package that can do this is the free lp_solve. Look there at set_timeout for setting a time limit; when solving an ILP, the solve function can then return SUBOPTIMAL along with the best solution found so far.
As far as I know, CPLEX can. It can return the solution pool, which contains the primal feasible solutions found during the search, and if you set the search emphasis on feasibility rather than optimality, more feasible solutions can be generated. At the end you can simply export the pool, or use it to warm-start a later solve, so it's pretty much up to you. CPLEX is free now, at least in my country, since you can sign up as a researcher.
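A sketch of reading the pool through Concert C++ (method names from the classic API; assuming cplex already wraps your model):

```cpp
cplex.setParam(IloCplex::SolnPoolIntensity, 4);  // collect aggressively
if (cplex.populate()) {  // like solve(), but actively fills the pool
    for (int s = 0; s < cplex.getSolnPoolNsolns(); ++s)
        std::cout << "pool solution " << s << ", objective "
                  << cplex.getObjValue(s) << std::endl;
}
```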
You could also consider Microsoft Solver Foundation. The only restriction is the technology stack: as you might guess, you should use Microsoft technologies (C#, VB.NET, etc.). Here is an example of how to use it with Excel: http://channel9.msdn.com/posts/Modeling-with-Solver-Foundation-30 .
Regarding your question: it is possible to get not-fully-optimized solutions by setting an efficiency target (for example 85%, or 0.85). In the outcome you can then see all solutions meeting that restriction.

Recommended Open Source Profilers [closed]

I'm trying to find open source profilers rather than paying $$$ for one of the commercial profilers. When I searched on SourceForge, I came across these four C++ profilers that I thought were quite promising:
Shiny: C++ Profiler
Low Fat Profiler
Luke Stackwalker
FreeProfiler
I'm not sure which of these profilers would be the best one for learning about the performance of my program. It would be great to hear some suggestions.
You could try Windows Performance Toolkit. Completely free to use. This blog entry has an example of how to do sample-based profiling.
Valgrind (And related tools like cachegrind, etc.)
Google performance tools
There's more than one way to do it.
Don't forget the no-profiler method.
Most profilers assume you need 1) high statistical precision of timing (lots of samples), and 2) low precision of problem identification (functions & call-graphs).
Those priorities can be reversed. I.e. the problem can be located to the precise machine address, while cost precision is a function of the number of samples.
Most real problems cost at least 10%, where high precision is not essential.
Example: If something is making your program take 2 times as long as it should, that means there is some code in it that costs 50%. If you take 10 samples of the call stack while it is being slow, the precise line(s) of code will be present on roughly 5 of them. The larger the program is, the more likely the problem is a function call somewhere mid-stack.
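To put a number on "roughly 5 of them": with 10 independent samples and the line on the stack 50% of the time, the hit count is binomial, so the chance of seeing the line at least twice is 1 - (1 + 10)/2^10, about 98.9%. A handful of pauses almost always fingers the expensive code; large sample counts are only needed to measure its cost precisely, not to locate it.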
It's counter-intuitive, I know.
NOTE: xPerf is nearly there, but not quite (as far as I can tell). It takes samples of the call stack and saves them - that's good. Here's what I think it needs:
It should only take samples when you want them. As it is, you have to filter out the irrelevant ones.
In the stack view it should show specific lines or addresses at which calls take place, not just whole functions. (Maybe it can do this, I couldn't tell from the blog.)
If you click to get the butterfly view, centered on a single call instruction, or leaf instruction, it should show you not the CPU fraction, but the fraction of stack samples containing that instruction. That would be a direct measure of the cost of that instruction, as a fraction of time. (Maybe it can do this, I couldn't tell.)
So, for example, even if an instruction were a call to file-open or something else that idles the thread, it still costs wall clock time, and you need to know that.
NOTE: I just looked over Luke Stackwalker, and the same remarks apply. I think it is on the right track but needs UI work.
ADDED: Having looked over Luke Stackwalker more carefully, I'm afraid it falls victim to the assumption that measuring functions is more important than locating statements. On each sample of the call stack it updates the function-level timing info, but all it does with the line-number info is keep track of the minimum and maximum line number seen in each function - and the more samples it takes, the farther apart those get. So it basically throws away the most important information: the line-number information. The reason that matters is that if you decide to optimize a function, you need to know which lines in it need work, and those lines were on the stack samples (before they were discarded).
One might object that retaining the line-number information would quickly run out of storage. Two answers: 1) there are only so many distinct lines that show up on the samples, and they show up repeatedly; 2) not that many samples are needed - the necessity of high statistical precision of measurement has always been assumed, but never justified.
I suspect other stack samplers, like xPerf, have similar issues.
It's not open source, but AMD CodeAnalyst is free. It also works on Intel CPUs despite the name. There are versions available for both Windows (with Visual Studio integration) and Linux.
Of the ones listed, I have found Luke Stackwalker to work best - I liked its GUI, and it was easy to get running.
A similar tool is Very Sleepy - similar functionality, the sampling seems more reliable, and the GUI is perhaps a little harder to use (not as graphical).
After spending some more time with them, I have found one quite important drawback. While both try to sample at 1 ms resolution, in practice they do not achieve it, because their sampling method (StackWalk64 on the attached process) is far too slow. For my application it takes something like 5-20 ms to get a call stack. Not only does this make the results imprecise, it also makes them skewed: short call stacks are walked faster and therefore tend to get more hits.
We use LtProf and have been happy with it. Not open source, but only $$, not $$$ :-)