How do I add a dependency constraint in LPsolve with binary variables

I am using the Java wrapper for LPsolve, and my cost function has the traditional form: an initial fixed cost plus operating costs.
How can I add the constraint that if y_i = 0, then p_i must also be zero? Here y_i is the binary variable indicating whether or not I choose item i (and so pay its initial purchase cost), and p_i is the amount produced.
I am aware I could technically use less-than-or-equal-to constraints to mimic this behavior, but this adds potential extra optima.

Adding a less-than-or-equal-to constraint with a sufficiently high upper bound does indeed solve the issue. As Erwin stated, the added extrema are irrelevant and will never be an optimum.
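Written out, the constraint described above might look like the following sketch. The symbol M is an assumption introduced here for illustration: any constant at least as large as the biggest value p_i could ever need to take.

```latex
p_i \le M \, y_i, \qquad y_i \in \{0, 1\}, \qquad p_i \ge 0
```

With y_i = 0 this reduces to p_i <= 0, which together with p_i >= 0 forces p_i = 0. With y_i = 1 it reduces to p_i <= M, which does not restrict production as long as M is large enough.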

Related

What bucket_count value should I use if I know the intended number of map keys?

I'm creating an std::unordered_map which I will immediately proceed to populate with n key-value pairs - and I know n. After that no more elements will be added - I will only be performing lookups.
What, therefore, should I pass as bucket_count to the constructor?
Notes:
I know it's not terribly critical and I could simply not specify anything and it will work.
This is related to, but not a dupe of, "What should I pass to unordered_map's bucket count argument if I just want to specify a hash function?"
If it helps your answer, you may assume I want to have a load factor between f_1 and f_2 (known in advance).
I'm using the default hash function, and I don't know what the input is like, but it's unlikely to be adversarial to the hashing.
According to n4296 in 23.5.4.2 [unord.map.cnstr] (this is the final draft of C++14)
by default the max_load_factor for an unordered_map is 1.0, so you could just set the bucket_count to n.
There is obviously a space-time trade-off between increasing the bucket count for improved speed and decreasing it (and raising the max load factor) for improved space.
I would either not worry about it, or if it is a large map, set the bucket count to n. Then you can worry about optimizing when profiling shows you have a problem.
If you know the range of load factors you want, then you just set the bucket count to std::ceil(n / std::max(f_1, f_2)) (and set the load factor before you fill the map).
Given the fact you have a range for your load factor, the only missing information is the collision rate. You can simply use nb_buckets = n / f_2 and you will be sure to get a load factor less than or equal to f_2. Ensuring correctness about f_1 requires data about the collision rate.
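As a minimal sketch of the sizing rule above: the helper names buckets_for and make_presized_map are hypothetical, and int keys and values are chosen only for illustration. The standard calls used are max_load_factor and reserve.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>

// Smallest bucket count that keeps n elements at or below max_load.
// (Hypothetical helper, not part of the standard library.)
std::size_t buckets_for(std::size_t n, float max_load) {
    return static_cast<std::size_t>(
        std::ceil(static_cast<double>(n) / max_load));
}

// Build an empty map that is already sized for n keys.
std::unordered_map<int, int> make_presized_map(std::size_t n, float max_load) {
    std::unordered_map<int, int> m;
    m.max_load_factor(max_load);  // set the load factor before filling
    m.reserve(n);                 // asks for at least ceil(n / max_load) buckets
    return m;
}
```

The implementation is free to round the bucket count up (for example to a prime or a power of two), so bucket_count() may end up larger than the value requested.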

CPLEX MIP early termination with relative gap, getBestObjValue vs getObjValue

I'm using C++ to model a (maximization) MIP with CPLEX, and I specify a relative gap using
cplex.setParam(IloCplex::EpGap, gap);
I'm puzzled by the difference between
cplex.getBestObjValue();
and
cplex.getObjValue();
in case of early termination because of the gap.
If I understand correctly, the value from getBestObjValue() will always correspond to an integer feasible solution, and a lower bound to the optimal value. On the other hand, the value from getObjValue() (may? will always?) correspond to a non-feasible solution and is an upper bound to the optimal value. Am I understanding this correctly?
I also have another question: the value returned by getBestObjValue() is, in the case of maximization problems, 'the maximum objective function value of all remaining unexplored nodes' (from the CPLEX docs). Is there a way to query the objective values of these unexplored nodes? I'm asking because I would like to get the minimal value that satisfies my relative gap, not the maximum.
According to the manual:
Cplex.GetBestObjValue Method:
It is computed for a minimization problem as the minimum objective function value of all remaining unexplored nodes. Similarly, it is computed for a maximization problem as the maximum objective function value of all remaining unexplored nodes.
For a regular MIP optimization, this value is also the best known bound on the optimal solution value of the MIP problem. In fact, when a problem has been solved to optimality, this value matches the optimal solution value.
It corresponds to an upper bound on the objective value (when maximising). There is a gap when you stop the solver before reaching optimality. Behind a MIP solve there is a branch-and-bound tree; as more nodes are explored, the upper bound decreases. When you stop because of EpGap, there may or may not be any solution matching the upper bound.
Therefore your assumption below is wrong:
If I understand correctly, the value from getBestObjValue() will always correspond to an integer feasible solution.
GetObjValue(), on the other hand, is the objective value of the current best solution (corresponding to a feasible solution that has been found). It is a lower bound, and it is the value that you want to use in your second question.
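To make the relationship between the two values concrete, here is a small sketch that computes a relative gap from a best bound and an incumbent. The function names are hypothetical, and the formula (absolute difference divided by the incumbent's magnitude plus a tiny constant) is stated as an assumption about how the relative gap is defined; check the documentation of your CPLEX version before relying on it.

```cpp
#include <cmath>

// Relative gap between the best bound (the kind of value getBestObjValue()
// reports) and the incumbent (the kind of value getObjValue() reports).
// The small constant guards against division by zero.
double relative_gap(double best_bound, double incumbent) {
    return std::fabs(best_bound - incumbent) / (1e-10 + std::fabs(incumbent));
}

// True when the incumbent is already within the requested relative gap.
bool within_gap(double best_bound, double incumbent, double gap) {
    return relative_gap(best_bound, incumbent) <= gap;
}
```

For a maximization problem with an incumbent of 100 and a best bound of 105, the gap is about 5%, so a solve with a 5% tolerance could stop there even though the true optimum may lie anywhere between 100 and 105.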

c++ discrete distribution sampling with frequently changing probabilities

Problem: I need to sample from a discrete distribution constructed of certain weights e.g. {w1,w2,w3,..}, and thus probability distribution {p1,p2,p3,...}, where pi=wi/(w1+w2+...).
Some of the wi's change very frequently, but only a very low proportion of all wi's. The distribution therefore has to be renormalised every time this happens, so I believe the alias method does not work efficiently, because one would need to rebuild the whole distribution from scratch every time.
The method I am currently thinking is a binary tree (heap method), where all wi's are saved in the lowest level, and then the sum of each two in higher level and so on. The sum of all of them will be in the highest level, which is also a normalisation constant. Thus in order to update the tree after change in wi, one needs to do log(n) changes, as well as the same amount to get the sample from the distribution.
Question:
Q1. Do you have a better idea on how to achieve it faster?
Q2. The most important part: I am looking for a library which has already done this.
explanation: I have done this myself several years ago, by building heap structure in a vector, but since then I have learned many things including discovering libraries ( :) ), and containers such as map... Now I need to rewrite that code with higher functionality, and I want to make it right this time:
so Q2.1 is there a nice way to make a c++ map ordered and searched not by index, but by a cumulative sum of its elements (this is how we sample, right?..). (that is my current theory how I would like to do it, but it doesn't have to be this way...)
Q2.2 Maybe there is some even nicer way to do the same? I would believe this problem is so frequent that I am very surprised I could not find some sort of library which would do it for me...
Thank you very much, and I am very sorry if this has been asked in some other form, please direct me towards it, but I have spent a good while looking...
-z
Edit: There is a possibility that I might need to remove or add the elements as well, but I think I could avoid it, if that makes a huge difference, thus leaving only changing the value of the weights.
Edit2: weights are reals in general, I would have to think if I could make them integers...
I would actually use a hash set of strings (don't remember the C++ container for it, you might need to implement your own though). Put wi elements for each i, with the values "w1_1", "w1_2",... all through "w1_[w1]" (that is, w1 elements starting with "w1_").
When you need to sample, pick an element at random using a uniform distribution. If you picked w5_*, say you picked element 5. Because of the number of elements in the hash, this will give you the distribution you were looking for.
Now, when wi changes from A to B, just add B-A elements to the hash (if B>A), or remove the last A-B elements of wi (if A>B).
Adding new elements and removing old elements is trivial in this case.
Obviously the problem is 'pick an element at random'. If your hash is a closed hash, you pick an array cell at random, if it's empty - just pick one at random again. If you keep your hash 3 or 4 times larger than the total sum of weights, your complexity will be pretty good: O(1) for retrieving a random sample, O(|A-B|) for modifying the weights.
Another option, since only a small part of your weights change, is to split the weights into two - the fixed part and the changed part. Then you only need to worry about changes in the changed part, and the difference between the total weight of changed parts and the total weight of unchanged parts. Then for the fixed part your hash becomes a simple array of numbers: 1 appears w1 times, 2 appears w2 times, etc..., and picking a random fixed element is just picking a random number.
Updating your normalisation factor when you change a value is trivial. This might suggest an algorithm.
w_sum = w_sum_old - w_i_old + w_i_new;
If you leave p_i as a computed property p_i = w_i / w_sum you would avoid recalculating the entire p_i array at the cost of calculating p_i every time they are needed. You would, however, be able to update many statistical properties without recalculating the entire sum
expected_something = (something_1 * w_1 + something_2 * w_2 + ...) / w_sum;
With a bit of algebra you can update expected_something by subtracting the contribution with the old weight and add the contribution with the new weight, multiplying and dividing with the normalization factors as required.
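The incremental update described above can be sketched as follows. The struct WeightedMean and its member names are hypothetical; it keeps both the normalisation constant and the weighted sum of the "something" values, so changing one weight costs O(1).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Keeps w_sum and sum(w_i * x_i) up to date in O(1) per weight change.
struct WeightedMean {
    std::vector<double> w;  // weights w_i
    std::vector<double> x;  // values something_i
    double w_sum = 0.0;     // normalisation constant
    double wx_sum = 0.0;    // sum of w_i * x_i

    WeightedMean(std::vector<double> weights, std::vector<double> values)
        : w(std::move(weights)), x(std::move(values)) {
        for (std::size_t i = 0; i < w.size(); ++i) {
            w_sum += w[i];
            wx_sum += w[i] * x[i];
        }
    }

    // Subtract the old contribution and add the new one.
    void set_weight(std::size_t i, double new_w) {
        w_sum += new_w - w[i];
        wx_sum += (new_w - w[i]) * x[i];
        w[i] = new_w;
    }

    double probability(std::size_t i) const { return w[i] / w_sum; }  // p_i on demand
    double expected() const { return wx_sum / w_sum; }
};
```

After very many updates, floating-point rounding can accumulate in the running sums, so an occasional full recomputation may be worthwhile.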
If, during the sampling, you keep track of which outcomes are part of the sample, it would be possible to propagate how the probabilities were updated to the generated sample. Would this make it possible for you to update rather than recalculate values related to the sample? I think a bitmap could provide an efficient way to store an index of which outcomes were used to build the sample.
One way of storing the probabilities together with the sums is to start with all probabilities. In the next N/2 positions you store the sums of the pairs, after that the N/4 sums of those pairs, and so on. Where the sums are located can, obviously, be calculated in O(1) time. This data structure is sort of a heap, but upside down.
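A minimal sketch of such a sum tree follows, assuming non-negative real weights. It uses the common layout with the root at index 1 and the leaves at the end of the array, which is the mirror image of the layout described above. The class name SumTree and its methods set, find and sample are hypothetical names, not taken from any library.

```cpp
#include <cstddef>
#include <random>
#include <vector>

class SumTree {
public:
    explicit SumTree(const std::vector<double>& weights) {
        while (cap_ < weights.size()) cap_ *= 2;       // pad to a power of two
        tree_.assign(2 * cap_, 0.0);
        for (std::size_t i = 0; i < weights.size(); ++i) tree_[cap_ + i] = weights[i];
        for (std::size_t i = cap_ - 1; i >= 1; --i)    // parents = sum of children
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
    }

    double total() const { return tree_[1]; }           // normalisation constant

    // Change one weight and repair the O(log n) sums above it.
    void set(std::size_t i, double w) {
        std::size_t node = cap_ + i;
        tree_[node] = w;
        for (node /= 2; node >= 1; node /= 2)
            tree_[node] = tree_[2 * node] + tree_[2 * node + 1];
    }

    // Index whose cumulative-weight interval contains target,
    // for 0 <= target < total().
    std::size_t find(double target) const {
        std::size_t node = 1;
        while (node < cap_) {
            node *= 2;                                   // try the left child
            if (target >= tree_[node]) {                 // otherwise go right
                target -= tree_[node];
                ++node;
            }
        }
        return node - cap_;
    }

    // Draw an index with probability w_i / total().
    template <class Rng>
    std::size_t sample(Rng& rng) const {
        std::uniform_real_distribution<double> u(0.0, total());
        return find(u(rng));
    }

private:
    std::size_t cap_ = 1;        // number of leaves (power of two)
    std::vector<double> tree_;   // tree_[1] is the root
};
```

Both set and find touch one node per level, so each costs O(log n). Adding or removing elements would need extra bookkeeping (for example reusing zero-weight slots), which this sketch leaves out.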

how does IF affect complexity?

Let's say we have an array of 1.000.000 elements and we go through all of them to check something simple, for example if the first character is "A". From my (very little) understanding, the complexity will be O(n) and it will take some X amount of time. If I add another IF (not else if) to check, let's say, if the last character is "G", how will it change complexity? Will it double the complexity and time? Like O(2n) and 2X?
I would like to avoid taking into consideration the number of calculations different commands have to make. For example, I understand that Len() requires more calculations to give us the result than a simple char comparison does, but let's say that the commands used in the IFs will have (almost) the same amount of complexity.
O(2n) = O(n). Generalizing, O(kn) = O(n), with k being a constant. Sure, with two IFs it might take twice the time, but execution time will still be a linear function of input size.
Edit: Here and Here are explanations, with examples, of the big-O notation which is not too mathematic-oriented
Asymptotic complexity (which is what big-O uses) is not dependent on constant factors, more specifically, you can add / remove any constant factor to / from the function and it will remain equivalent (i.e. O(2n) = O(n)).
Assuming an if-statement takes a constant amount of time, it will only add a constant factor to the complexity.
A "constant amount of time" means:
The time taken for that if-statement for a given element is not dependent on how many other elements there are in the array
So basically if it doesn't call a function which looks through the other elements in the array in some way or something similar to this
Any non-function-calling if-statement is probably fine (unless it contains a statement that goes through the array, which some languages allow)
Thus 2 (constant-time) if-statements called for each element will be O(2n), but this is equal to O(n) (well, it might not really be 2n, more on that in the additional note).
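A small sketch of the situation in the question, with hypothetical names (Counts, scan): one pass over the array with two constant-time checks per element. The checks counter only makes the "2n" visible; it is not something you would keep in real code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Counts {
    std::size_t starts_with_a = 0;
    std::size_t ends_with_g = 0;
    std::size_t checks = 0;  // how many if-conditions were evaluated
};

// One pass, two independent constant-time checks per non-empty element.
Counts scan(const std::vector<std::string>& items) {
    Counts c;
    for (const std::string& s : items) {
        if (s.empty()) continue;          // skip entries with no characters
        ++c.checks;
        if (s.front() == 'A') ++c.starts_with_a;
        ++c.checks;
        if (s.back() == 'G') ++c.ends_with_g;
    }
    return c;
}
```

For n non-empty strings, checks ends up at exactly 2n: doubling n doubles the work, so the growth is still linear.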
See Wikipedia for more details and a more formal definition.
Note: Apart from not being dependent on constant factors, it is also not dependent on asymptotically smaller terms (terms which remain smaller regardless of how big n gets), e.g. O(n) = O(n + sqrt(n)). And big-O is just an upper bound, so saying it is O(n^9999) would also be correct (though saying that in a test / exam will probably get you 0 marks).
Additional note: The problem when not ignoring constant factors is - what classifies as a unit of work? There is no standard definition here. One way is to use the operation that takes the longest, but determining this may not always be straight-forward, nor would it always be particularly accurate, nor would you be able to generically compare complexities of different algorithms.
Some key points about time complexity:
Theta notation - Exact bound, hence if a piece of code which we are analyzing contains conditional if/else and either part has some more code which grows based on input size then exact bound can't be obtained since either of branch might be taken and Theta notation is not advisable for such cases. On the other hand, if both of the branches resolve to constant time code, then Theta notation can be applicable in such case.
Big O notation - Upper bound, so if a piece of code has conditionals where either of the conditional branches might grow with input size n, then we assume the max or upper bound to calculate the time consumed by the code; hence we use Big O for such conditionals, assuming we take the path that has the max time consumption. So, the path which has the lower time can be assumed to be O(1) in amortized analysis (including the fact that we assume this path has no recursions that may grow with the input size), and we calculate the Big O time complexity for the lengthiest path.
Big Omega notation - Lower bound, This is the minimum guaranteed time that a piece of code can take irrespective of the input. Useful for cases where the time taken by code doesn't grow based on input size n, but it consumes a significant amount of time k. In these cases, we can use the lower bound analysis.
Note: None of these notations depend on the input being best/avg/worst, and all of them can be applied to any piece of code.
So as discussed above, Big O doesn't care about the constant factors such as k and only sees how time increases with respect to growth in n, in which case here it is O(kn) = O(n) linear.
PS: This post was about the relation of big O and conditionals evaluation criteria for amortized analysis.
It's related to a question I posted myself today.
In your example it depends on whether you can jump from the first to the last element and if you can't then it also depends on the average length of each entry.
If, as you went down through the array, you had to read each full entry in order to evaluate your two if statements, then your order would be O(1,000,000 x N), where N is the average length of each entry. If N is variable then it will affect the order. An example would be standard multiplication, where we perform Log(N) additions of an entry which is Log(N) in length, so the order is O(Log^2(N)) or, if you prefer, O((Log(N))^2).
On the other hand if you can just check the first and last character then N = 2 and is constant so can be ignored.
This is an IMPORTANT point. You have to be careful, though, because how can you decide whether your multiplier can be ignored? For example, say we were doing Log(N) additions of a Log(N/100) number. Just because Log(N/100) is the smaller term doesn't mean we can ignore it. The multiplying factor cannot be ignored if it is variable.

What does the 'lower bound' in circulation problems mean?

Question: Circulation problems allow you to have both a lower and an upper bound on the flow through a particular arc. The upper bound I understand (like pipes, there's only so much stuff that can go through). However, I'm having a difficult time understanding the lower bound idea. What does it mean? Will an algorithm for solving the problem...
try to make sure every arc with a lower bound will get at least that much flow, failing completely if it can't find a way?
simply disregard the arc if the lower bound can't be met? This would make more sense to me, but would mean there could be arcs with a flow of 0 in the resulting graph, i.e.
Context: I'm trying to find a way to quickly schedule a set of events, which each have a length and a set of possible times they can be scheduled at. I'm trying to reduce this problem to a circulation problem, for which efficient algorithms exist.
I put every event in a directed graph as a node, and supply it with the amount of time slots it should fill. Then I add all the possible times as nodes as well, and finally all the time slots, like this (all arcs point to the right):
The first two events have a single possible time and a length of 1, and the last event has a length of 4 and two possible times.
Does this graph make sense? More specifically, will the amount of time slots that get 'filled' be 2 (only the 'easy' ones) or six, like in the picture?
(I'm using a push-relabel algorithm from the LEMON library if that makes any difference.)
Regarding the general circulation problem:
I agree with @Helen; even though it may not be as intuitive to conceive of a practical use of a lower bound, it is a constraint that must be met. I don't believe you would be able to disregard this constraint, even when that flow is zero.
The flow = 0 case appeals to the more intuitive max flow problem (as pointed out by @KillianDS). In that case, if the flow between a pair of nodes is zero, then they cannot affect the "conservation of flow sum":
When no lower bound is given then (assuming flows are non-negative) a zero flow cannot influence the result, because
It cannot introduce a violation to the constraints
It cannot influence the sum (because it adds a zero term).
A practical example of a minimum flow could exist because of some external constraint (an associated problem requires at least X water go through a certain pipe, as pointed out by @Helen). Lower bound constraints could also arise from an equivalent dual problem, which minimizes the flow such that certain edges have a lower bound (and finds an optimum equivalent to a maximization problem with an upper bound).
For your specific problem:
It seems like you're trying to get as many events done in a fixed set of time slots (where no two events can overlap in a time slot).
Consider the sets of time slots that could be assigned to a given event:
E1 -- { 9:10 }
E2 -- { 9:00 }
E3 -- { 9:20, 9:30, 9:40, 9:50 }
E3 -- { 9:00, 9:10, 9:20, 9:30 }
So you want to maximize the number of task assignments (i.e. events incident to edges that are turned "on") s.t. the resulting sets are pairwise disjoint (i.e. none of the assigned time slots overlap).
I believe this is NP-Hard because if you could solve this, you could use it to solve the maximal set packing problem (i.e. maximal set packing reduces to this). Your problem can be solved with integer linear programming, but in practice these problems can also be solved very well with greedy methods / branch and bound.
For instance, in your example problem, event E1 "conflicts" with E3 and E2 conflicts with E3. If E1 is assigned (there is only one option), then there is only one remaining possible assignment of E3 (the later assignment). If this assignment is taken for E3, then there is only one remaining assignment for E2. Furthermore, disjoint subgraphs (sets of events that cannot possibly conflict over resources) can be solved separately.
If it were me, I would start with a very simple greedy solution (assign tasks with fewer possible "slots" first), and then use that as the seed for a branch and bound solver (if the greedy solution found 4 task assignments, then bound if your recursive subtree of assignments cannot exceed 3). You could even squeeze out some extra performance by creating the graph of pairwise intersections between the sets and only informing the adjacent sets when an assignment is made. You can also update your best number of assignments as you continue the branch and bound (I think this is normal), so if you get lucky early, you converge quickly.
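A rough sketch of that idea, under simplifying assumptions: time slots are numbered from 0 and stored as bits in a 32-bit mask, each event is a list of alternative placements, and the "greedy" part is reduced to trying the most constrained events first. The names Mask, Event, search_assignments and max_scheduled are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::uint32_t;        // bit k set = time slot k is occupied
using Event = std::vector<Mask>;   // alternative placements of one event

// Depth-first search over events i, i+1, ...; 'best' holds the largest
// number of events scheduled so far.
void search_assignments(const std::vector<Event>& events, std::size_t i,
                        Mask used, std::size_t assigned, std::size_t& best) {
    // Bound: even scheduling every remaining event cannot beat 'best'.
    if (assigned + (events.size() - i) <= best) return;
    if (i == events.size()) {
        best = assigned;
        return;
    }
    for (Mask option : events[i]) {
        if ((option & used) == 0)  // placement overlaps nothing chosen so far
            search_assignments(events, i + 1, used | option, assigned + 1, best);
    }
    search_assignments(events, i + 1, used, assigned, best);  // skip event i
}

std::size_t max_scheduled(std::vector<Event> events) {
    // Most constrained events (fewest placements) are tried first.
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) { return a.size() < b.size(); });
    std::size_t best = 0;
    search_assignments(events, 0, 0, 0, best);
    return best;
}
```

For the example above, with 9:00 as bit 0 through 9:50 as bit 5, E1 is {0x02}, E2 is {0x01}, and E3 has the two placements 0x3C (9:20 to 9:50) and 0x0F (9:00 to 9:30). All three events can be scheduled by giving E3 the later placement.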
I've used this same idea to find the smallest set of proteins that would explain a set of identified peptides (protein pieces), and found it to be more than enough for practical problems. It's a very similar problem.
If you need bleeding edge performance:
When rephrased, integer linear programming can do nearly any variant of this problem that you'd like. Of course, in very bad cases it may be slow (in practice, it's probably going to work for you, especially if your graph is not very densely connected). If it doesn't, regular linear programming relaxations approximate the solution to the ILP and are generally quite good for this sort of problem.
Hope this helps.
The lower bound on the flow of an arc is a hard constraint. If the constraints can't be met, then the algorithm fails. In your case, they definitely can't be met.
Your problem cannot be modeled with a pure network-flow model, even with lower bounds. You are trying to express the constraint that a flow is either 0 or at least some lower bound. That requires integer variables. However, the LEMON package does have an interface where you can add integer constraints. The flow out of each of the first layer of arcs must be either 0 or n, where n is the number of required time slots; alternatively, you could say that at most one arc out of each "event" has nonzero flow.
Your "disjunction" constraint (either f = 0, or lower <= f <= upper)
can be modeled as
f >= y * lower
f <= y * upper
with y restricted to being 0 or 1. If y is 0, then f can only be 0. If y is 1, then f can be any value between lower and upper. The mixed-integer programming algorithms will be orders of magnitude slower than the network-flow algorithms, but they will model your problem.
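To see that the two inequalities really encode "zero, or between the bounds", here is a tiny checker. The function name satisfies_disjunction is hypothetical, and the sketch assumes 0 < lower <= upper.

```cpp
// True when (f, y) satisfies f >= y * lower and f <= y * upper
// with y restricted to 0 or 1.
bool satisfies_disjunction(double f, int y, double lower, double upper) {
    if (y != 0 && y != 1) return false;  // y must be binary
    return f >= y * lower && f <= y * upper;
}
```

With lower = 2 and upper = 5, the pair (f = 0, y = 0) is accepted and (f = 3, y = 1) is accepted, while (f = 1, y = 1) and (f = 3, y = 0) are rejected.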