Find the number of all possible combinations with conflicts

I am trying to solve an optimization problem, but first I have to find the number of all possible combinations of n elements, taking some conflicts into account. For example:
elements: {1,2,3,4}
conflicts: {1,2},{3,4}
The term "conflict" means that numbers belonging to the same conflict set must not appear in the same combination. The conflict sets are not always disjoint, and each conflict set always contains exactly two elements.
So far I have only found how to count all possible combinations when there are no conflicts, which is 2^n.
Thank you.

The conflict sets can be modeled as edges in a graph. You are asking for the number of independent vertex sets in that graph:
An independent vertex set of a graph G is a subset of the vertices such that no two vertices in the subset represent an edge of G
- http://mathworld.wolfram.com/IndependentVertexSet.html
The above link also refers to something called the independence polynomial which can be used to count such things -- though this is useful only if the conflict graph has a nice structure. The general problem of determining the number of independent sets is known to be #P-complete (see https://en.wikipedia.org/wiki/Sharp-P-complete for a definition of this complexity class) so there is little chance that your question has a simple answer. Markov-chain techniques have been applied to approximate this number in some cases. See http://www.researchgate.net/publication/221590282_Approximately_Counting_Up_To_Four_(Extended_Abstract)
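For small inputs an exact count is still easy to obtain by exhaustive branching. The sketch below is one way to do it, under these assumptions: the function names are made up for illustration, elements are renumbered 0 to n-1, and n is at most 63 so that bitmasks fit in 64 bits. It uses the standard recurrence I(G) = I(G - v) + I(G - N[v]), where N[v] is v together with its conflict partners. Like the 2^n figure, it counts the empty combination. Its worst-case running time is exponential, as expected for a #P-complete problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Counts the subsets of `remaining` that contain no conflicting pair.
// adj[v] is a bitmask of the elements that conflict with v.
std::uint64_t countIndependent(std::uint64_t remaining,
                               const std::vector<std::uint64_t>& adj) {
    if (remaining == 0) return 1;  // only the empty combination is left
    int v = 0;
    while (((remaining >> v) & 1ULL) == 0) ++v;  // lowest remaining element
    const std::uint64_t rest = remaining & ~(1ULL << v);
    // Either v is left out, or v is taken and all its conflict partners are removed.
    return countIndependent(rest, adj) + countIndependent(rest & ~adj[v], adj);
}

// Hypothetical entry point: elements are 0..n-1, conflicts are unordered pairs.
std::uint64_t countConflictFreeCombinations(
        int n, const std::vector<std::pair<int, int>>& conflicts) {
    assert(n >= 0 && n <= 63);  // keeps every shift and the result in range
    std::vector<std::uint64_t> adj(static_cast<std::size_t>(n), 0);
    for (const auto& c : conflicts) {
        adj[c.first] |= 1ULL << c.second;
        adj[c.second] |= 1ULL << c.first;
    }
    return countIndependent((1ULL << n) - 1, adj);
}
```

For the example in the question, renumber {1,2,3,4} as 0..3 with conflicts {0,1} and {2,3}. The call returns 9, because each conflict pair allows "neither", "first only" or "second only". Subtract 1 if the empty combination should not count.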

Related

Generating "unique" matrices

This may be more of a math question than a programming question, but since I am specifically working in C++ I figured there might be a library or something I didn't know about.
Anyway, I'm working on a game where I generate some X by X arrays of booleans and randomly set Y of the elements to true. Think Tetris blocks. What I need to know is whether there's a clever way to generate "unique" arrays without having to rotate each array 4 times and compare each time. To use Tetris as an example again, an "L" piece is an "L" piece no matter how it's rotated, but a "J" piece would be a different, unique piece. As a side question, is there a way to determine the maximum number of unique configurations for an X by X array with Y filled-in elements?
You could sum (x - c)^2 + (y - c)^2 over every true grid cell (x, y), where c = (X - 1)/2 is the centre of the grid when rows and columns are numbered 0 to X - 1. This effectively gives the squared distances from the centre of your grid to each "true" cell. Two grids that are the same up to rotation have their "true" cells at the same distances from the centre, so this sum will also be the same. Hence, if the grids all have distinct sums of squares, they are unique under rotation.
Note that although distinct sums guarantee that there are no rotational duplicates, the converse isn't true: two non-matching grids can have the same sum of squares.
If your grids are quite small and you are struggling to maximize the number of different patterns, you'll probably want to test those with equal sums. Otherwise, if your generator spits out a grid with a sum of squares that matches a previously created grid, reject it.
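Here is a minimal sketch of this fingerprint. It assumes a hypothetical grid representation: strings, with '#' marking a true cell. With rows and columns numbered 0 to X - 1, the centre sits at (X - 1)/2. The sketch therefore doubles every coordinate to stay in integers. That multiplies every sum by 4, which does not change which grids compare equal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: X rows of X characters, '#' = true cell.
using Grid = std::vector<std::string>;

// Sum of squared distances from the grid centre to every true cell, using
// doubled coordinates: (2r - (X-1))^2 + (2c - (X-1))^2 = 4 * ((r-ctr)^2 + (c-ctr)^2).
long long rotationFingerprint(const Grid& g) {
    const long long m = static_cast<long long>(g.size()) - 1;  // X - 1
    long long sum = 0;
    for (std::size_t r = 0; r < g.size(); ++r) {
        assert(g[r].size() == g.size());  // the grid must be square
        for (std::size_t c = 0; c < g[r].size(); ++c) {
            if (g[r][c] != '#') continue;
            const long long dr = 2 * static_cast<long long>(r) - m;
            const long long dc = 2 * static_cast<long long>(c) - m;
            sum += dr * dr + dc * dc;
        }
    }
    return sum;
}
```

In a 3 x 3 grid, an "L" ({"#..", "#..", "##."}) and its 90° rotation ({"###", "#..", "..."}) both give 24. So does the mirror-image "J" ({"..#", "..#", ".##"}), which is exactly why grids with equal sums still need a closer check.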
What you can do is define a basic form: decide uniquely which of the 4 possible orientations is the basic one, and then compare shapes via their basic forms only.
How to decide which form is the basic one? It doesn't really matter as long as it is consistent. Say, pick the highest one according to lexicographical comparison.
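Here is a minimal sketch of the basic-form idea, picking the lexicographically greatest rotation. The grid representation (strings with '#' for true cells) and the function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: X rows of X characters, '#' = true cell.
using Grid = std::vector<std::string>;

// Rotate a square grid 90 degrees clockwise: result[i][j] = g[n-1-j][i].
Grid rotateClockwise(const Grid& g) {
    const std::size_t n = g.size();
    Grid r(n, std::string(n, '.'));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            assert(g[n - 1 - j].size() == n);  // the grid must be square
            r[i][j] = g[n - 1 - j][i];
        }
    return r;
}

// Basic (canonical) form: the lexicographically greatest of the 4 rotations.
Grid canonicalForm(Grid g) {
    Grid best = g;
    for (int k = 0; k < 3; ++k) {
        g = rotateClockwise(g);
        best = std::max(best, g);
    }
    return best;
}

// Two grids show the same shape up to rotation iff their basic forms are equal.
bool sameUpToRotation(const Grid& a, const Grid& b) {
    return canonicalForm(a) == canonicalForm(b);
}
```

A generator can keep the basic forms it has already produced in a std::set and reject any new grid whose basic form is already present. Each grid is then rotated once when it is created, rather than compared against every stored grid in all four orientations.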
Edit:
About the number of unique shapes: for an n by n grid with k filled cells, it is roughly the binomial coefficient C(n^2, k) divided by 4. However, this does not take into account symmetrical shapes that are preserved by a 180° rotation, though there are only a few such shapes in comparison (at least for large n, k).
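If an exact figure is needed instead of the rough C(n^2, k)/4, Burnside's lemma gives it. This is a standard counting technique swapped in here; it is not part of the answer above. It averages, over the four rotations, the number of patterns each rotation leaves unchanged:

(C(n^2, k) + F180 + 2 * F90) / 4

F180 and F90 count the patterns fixed by a half turn and a quarter turn. The sketch assumes rotations only (no reflections, no shifts) and small inputs, because it uses plain 64-bit arithmetic. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k); exact as long as intermediate values fit in 64 bits.
std::uint64_t binom(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    std::uint64_t r = 1;
    for (std::uint64_t i = 1; i <= k; ++i) r = r * (n - k + i) / i;  // r = C(n-k+i, i)
    return r;
}

// Number of k-cell patterns on an n-by-n grid that are distinct up to rotation.
std::uint64_t countUpToRotation(std::uint64_t n, std::uint64_t k) {
    const std::uint64_t cells = n * n;
    if (k > cells) return 0;
    const bool odd = (n % 2 == 1);          // odd n: the centre cell is fixed by every rotation
    const std::uint64_t pairs = cells / 2;  // 2-cycles of the 180-degree rotation
    const std::uint64_t quads = cells / 4;  // 4-cycles of the 90-degree rotation
    // Identity: every pattern is fixed.
    const std::uint64_t fixed0 = binom(cells, k);
    // 180 degrees: the pattern is a union of 2-cycles (plus the centre when n is odd).
    std::uint64_t fixed180 = 0;
    if (odd || k % 2 == 0) fixed180 = binom(pairs, k / 2);
    // 90 and 270 degrees: a union of 4-cycles (plus the centre when n is odd).
    std::uint64_t fixed90 = 0;
    if (k % 4 == 0 || (odd && k % 4 == 1)) fixed90 = binom(quads, k / 4);
    return (fixed0 + fixed180 + 2 * fixed90) / 4;
}
```

For example, a 2 x 2 grid with two filled cells has C(4, 2) = 6 patterns but only 2 shapes up to rotation ("adjacent" and "diagonal"). The rough estimate gives 1.5.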
Side note: you should also consider the case of shapes that differ by shift only.

Linear Programming - Constraints

I am trying to encode this (a small part of a project) as a linear program:
For each package p we know its length (xDimp) and width (yDimp). Also, we have the length (xTruck) and width (yTruck) of the Truck. All the numbers are integers.
Due to the design of the packages, they cannot be rotated when placed in a truck.
The Truck is represented as a matrix of 2 dimensions, only with x and y coordinates. We ignore the height.
Decision variables:
– pxy[p,x,y] = package p is in the cell with upper-right coordinates (x, y)
– pbl[p,x,y] = the bottom left cell of p has upper-right coordinates (x, y)
How do I write constraints that set the pbl and pxy variables? I suppose I should use the pbl variables to ensure that the package fits in the truck, and that the value of each pxy variable depends on the values of pbl.
Thank you,
This is a variant of the bin packing problem: a two-dimensional packing of multiple rectangles of varying widths and heights into an enclosing rectangle (2BP). If the rectangles may be rotated by 90°, we get the orthogonal-orientation rectangular packing problem. In your case, we have a non-rotatable rectangular packing problem. The problem is NP-hard, but that does not make it infeasible to solve in practice.
From your description, the problem is already discretised, restricting the possible placements to the grid, which means that the optimum of the continuous version may not be available anymore.
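One approach is to compute a conflict graph in advance. It represents your search space and holds information about the overlap of the rectangles:
$$G = (V, E),$$
where $V$ contains one node for every placement $(p, x, y)$ of a package $p$ with its bottom-left cell at $(x, y)$ inside the truck, and $E$ contains an edge for every pair of conflicting placements.
Every edge represents a conflict and every node represents a possible placement within your truck. Two packages p and q, placed at $(x_p, y_p)$ and $(x_q, y_q)$, intersect iff
$$x_p < x_q + \mathit{xDim}_q \;\wedge\; x_q < x_p + \mathit{xDim}_p \quad\text{and}\quad y_p < y_q + \mathit{yDim}_q \;\wedge\; y_q < y_p + \mathit{yDim}_p,$$
and two different placements of the same package also conflict pairwise.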
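Here is a minimal sketch of building such a conflict graph. The Package and Placement structs are hypothetical. Each placement is identified by its bottom-left cell (x, y) and covers the half-open ranges [x, x + xDim) by [y, y + yDim). The number of placements grows with the truck area, so this is only practical for small grids.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Package { int xDim, yDim; };  // hypothetical: size of one package
struct Placement { int p, x, y; };   // package p with bottom-left cell (x, y)

struct ConflictGraph {
    std::vector<Placement> nodes;
    std::vector<std::pair<int, int>> edges;  // indices into nodes
};

// One node per placement that fits in the truck; one edge per conflicting pair
// (same package placed twice, or overlapping rectangles).
ConflictGraph buildConflictGraph(const std::vector<Package>& pkgs,
                                 int xTruck, int yTruck) {
    assert(xTruck >= 0 && yTruck >= 0);
    ConflictGraph g;
    for (int p = 0; p < static_cast<int>(pkgs.size()); ++p)
        for (int x = 0; x + pkgs[p].xDim <= xTruck; ++x)
            for (int y = 0; y + pkgs[p].yDim <= yTruck; ++y)
                g.nodes.push_back({p, x, y});
    for (std::size_t i = 0; i < g.nodes.size(); ++i)
        for (std::size_t j = i + 1; j < g.nodes.size(); ++j) {
            const Placement& a = g.nodes[i];
            const Placement& b = g.nodes[j];
            const Package& pa = pkgs[a.p];
            const Package& pb = pkgs[b.p];
            const bool overlap = a.x < b.x + pb.xDim && b.x < a.x + pa.xDim &&
                                 a.y < b.y + pb.yDim && b.y < a.y + pa.yDim;
            if (a.p == b.p || overlap)
                g.edges.emplace_back(static_cast<int>(i), static_cast<int>(j));
        }
    return g;
}
```

Consider two 1 x 1 packages in a 2 x 1 truck. Each package has two placements, so there are 4 nodes. There are 4 edges: two join the placements of the same package, and two join overlapping placements of different packages.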
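Now, the packing problem on the grid is a maximum independent set (MIS) problem on the conflict graph, assuming you want to maximize the number of packages on the truck. The MIS, in turn, has the following ILP formulation, with one binary variable $z_v$ per node:
$$\max \sum_{v \in V} z_v \quad \text{s.t.} \quad z_u + z_v \le 1 \;\; \forall \{u, v\} \in E, \qquad z_v \in \{0, 1\} \;\; \forall v \in V.$$
This is an exact integer formulation of the MIS, but its LP relaxation is weak, which is bad for the branch and bound solving method. If C is a clique in G, then any independent set can pick at most one node from C, so use the following constraint instead:
$$\sum_{v \in C} z_v \le 1 \quad \text{for every clique } C \text{ of } G.$$
The resulting linear program's number of constraints grows exponentially.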
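For tiny conflict graphs you can sanity-check what the MIS ILP should return by exhaustive search. The sketch below is an illustration, not a solver, and its name is hypothetical. It enumerates every choice of nodes and keeps those where no edge has both endpoints chosen. It returns the largest number of chosen nodes, so it is only usable for about 20 nodes or fewer.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Exhaustive maximum independent set size: maximize the number of chosen nodes
// subject to "at most one endpoint of every edge is chosen".
int maxIndependentSetSize(int n, const std::vector<std::pair<int, int>>& edges) {
    assert(n >= 0 && n <= 20);  // 2^n subsets are enumerated
    int best = 0;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        bool feasible = true;
        for (const auto& e : edges)
            if (((mask >> e.first) & 1UL) && ((mask >> e.second) & 1UL)) {
                feasible = false;  // both endpoints of a conflict edge chosen
                break;
            }
        if (!feasible) continue;
        int size = 0;
        for (int v = 0; v < n; ++v) size += static_cast<int>((mask >> v) & 1UL);
        if (size > best) best = size;
    }
    return best;
}
```

For two 1 x 1 packages in a 2 x 1 truck, the conflict graph is a 4-cycle (edges 0-1, 0-2, 1-3, 2-3). The search returns 2, meaning both packages fit.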
In order to go further, you can try a meta constraint satisfaction approach.
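Firstly, use the following constraints to make sure your packages are within the truck, where $(x_p, y_p)$ is the bottom-left corner of package p:
$$0 \le x_p \le \mathit{xTruck} - \mathit{xDim}_p, \qquad 0 \le y_p \le \mathit{yTruck} - \mathit{yDim}_p \quad \text{for every package } p.$$
Secondly, use a set of disjunctive constraints to prevent overlap:
$$x_p + \mathit{xDim}_p \le x_q \;\vee\; x_q + \mathit{xDim}_q \le x_p \;\vee\; y_p + \mathit{yDim}_p \le y_q \;\vee\; y_q + \mathit{yDim}_q \le y_p \quad \text{for all } p \ne q.$$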
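The two constraint families can be checked directly for a candidate placement. In this minimal sketch, the PlacedPackage struct and function names are hypothetical. Each package is given by its bottom-left corner (x, y) and its xDim/yDim. Two packages do not overlap if at least one of the four disjuncts holds: one lies completely to the left of, right of, below, or above the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PlacedPackage { int x, y, xDim, yDim; };  // hypothetical record

// Containment: 0 <= x <= xTruck - xDim and 0 <= y <= yTruck - yDim.
bool insideTruck(const PlacedPackage& p, int xTruck, int yTruck) {
    return p.x >= 0 && p.x + p.xDim <= xTruck && p.y >= 0 && p.y + p.yDim <= yTruck;
}

// Disjunctive non-overlap: p left of q, q left of p, p below q, or q below p.
bool noOverlap(const PlacedPackage& p, const PlacedPackage& q) {
    return p.x + p.xDim <= q.x || q.x + q.xDim <= p.x ||
           p.y + p.yDim <= q.y || q.y + q.yDim <= p.y;
}

// A packing is feasible iff every package is inside and every pair is disjoint.
bool feasiblePacking(const std::vector<PlacedPackage>& ps, int xTruck, int yTruck) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        if (!insideTruck(ps[i], xTruck, yTruck)) return false;
        for (std::size_t j = i + 1; j < ps.size(); ++j)
            if (!noOverlap(ps[i], ps[j])) return false;
    }
    return true;
}
```

In a solver the disjunction cannot be written this directly. It is usually encoded with extra binary variables, or handled by branching on which disjunct holds, as in the meta constraint satisfaction approach.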
From that point on, you can start to formulate a meta program, as described here.
I think this should be enough for a start :-)
You can find more information in the literature about combinatorial optimization.
Sources:
http://www.staff.uni-mainz.de/schoemer/publications/ESA03.pdf
https://kluedo.ub.uni-kl.de/frontdoor/index/index/docId/2046

Algorithm to find maximum independent subgroup of hashmaps

I need an algorithm to find a maximum independent subgroup of hashmaps, which are stored in an array.
I tried iterating over the array of hashmaps, passing an index each time and checking which hashmaps in the array are not independent of the hashmap at that index. It worked, but not in the case where
A and B are independent,
B and C are independent,
but A and C are not independent.
Definition of maximum independent subgroup of hashmaps:
I have an array of hashmaps, each containing keys. Two hashmaps are called independent if no key of the first hashmap is contained in the second one. I have to find a subgroup of those hashmaps that are all pairwise independent.
First of all, this problem is NP-complete.
To prove this, take an arbitrary graph and number its edges.
Create a HashMap for every vertex and fill it with the indices of all edges incident to that vertex.
Then, if two HashMaps are independent, they do not contain the same edge, so the respective vertices are not adjacent, i.e. independent as well.
Finding a maximum independent subset of these HashMaps hence gives you the maximum independent set in the graph, which we know is NP-complete.
You can solve this problem by constructing a graph with a vertex for each HashMap and an edge between every two dependent HashMaps, and then using an algorithm for independent sets.
Another way is to take the complement graph and find a maximum clique.
Since exact algorithms are inefficient (exponential time in the worst case), you may consider using an approximation algorithm.
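Here is a minimal sketch of the graph-based approach. It assumes each hashmap is a std::unordered_map<std::string, int> (the value type is an arbitrary placeholder, since only the keys matter). The function names are hypothetical. It builds the dependency graph and then applies a simple greedy heuristic: repeatedly keep a map with the fewest remaining dependencies and discard everything that depends on it. The result is always pairwise independent, but it is not guaranteed to be maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using KeyMap = std::unordered_map<std::string, int>;  // hypothetical value type

// Two maps are dependent if they share at least one key.
bool dependent(const KeyMap& a, const KeyMap& b) {
    const KeyMap& small = a.size() <= b.size() ? a : b;
    const KeyMap& large = a.size() <= b.size() ? b : a;
    for (const auto& kv : small)
        if (large.count(kv.first)) return true;
    return false;
}

// Greedy heuristic on the dependency graph (minimum remaining degree first).
std::vector<std::size_t> greedyIndependentMaps(const std::vector<KeyMap>& maps) {
    const std::size_t n = maps.size();
    std::vector<std::vector<bool>> dep(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            const bool d = dependent(maps[i], maps[j]);
            dep[i][j] = d;
            dep[j][i] = d;
        }
    std::vector<bool> alive(n, true);
    std::vector<std::size_t> chosen;
    while (true) {
        std::size_t best = n, bestDegree = n;
        for (std::size_t i = 0; i < n; ++i) {
            if (!alive[i]) continue;
            std::size_t degree = 0;
            for (std::size_t j = 0; j < n; ++j)
                if (alive[j] && dep[i][j]) ++degree;
            if (best == n || degree < bestDegree) { best = i; bestDegree = degree; }
        }
        if (best == n) break;  // no map left to choose
        chosen.push_back(best);
        alive[best] = false;
        for (std::size_t j = 0; j < n; ++j)
            if (dep[best][j]) alive[j] = false;  // drop every dependent map
    }
    return chosen;
}
```

Take the case from the question: A = {a, b}, B = {c}, C = {b, d}, where A and C share the key b. The heuristic keeps B first (no dependencies), then A, and discards C.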

Splitting a graph into cycles and then into paths

First of all, I should say that I am not familiar with graph theory and my mathematics knowledge is also very poor. Still, I am using graph concepts for my analysis.
Basically, I am decomposing an undirected graph (say G) into cycles (closed paths). What is special about my cycles is that they are the shortest cycles one can traverse between two vertices (since they are cycles, the start and end vertex are the same). In my example graph, the cycles are (1,4,5,1), (1,2,3,4,1) and (7,9,8,7) (I ignore cycles whose length is less than 3).
Edit: I use depth-first search to find the cycles and then keep the smallest ones.
Later, I break those cycles further into directed paths. I break the cycles at certain edges (the red lines in the figure) and insert start and end nodes for my new path graphs. So for the cycle (7,9,8,7), the new directed paths are (a,9,c) and (d,8,7,b).
Edit: this further breaking is done only for selected cycles. It is just inserting a new vector and updating the elements; no graph-theory algorithm is involved here.
Then I do some analysis with my data.
I have done all of the above. So my problem is how to describe the entire procedure with mathematical notation (without using examples as I did above). This is very hard for me, as I do not even have the basics.
I have tried searching, but I still cannot find a way to describe what I did. I hope what I did is clear to you.
So, could you please help me describe the following with mathematical notation (according to graph theory):
decomposing an undirected graph into cycles (shortest cycles)
breaking the cycles at edges to make directed path graphs (as shown in the figure)
I have seen many authors use different notations and symbols to define graphs and their subgraphs, but I cannot define such things myself because my basics are too weak. So please help me say these things in a formal, mathematical way. Thanks in advance.
I have also inserted sample figures to give an idea.
Note: I have added the C++ tag because many computer scientists use graph theory and might be able to respond.
The first problem you might encounter when putting your operations into a mathematical description is your definition of "shortest cycles". Cycles are typically defined as a sequence of vertices connected by edges in which the first vertex is also the last one.
Math crash course
In math, a graph is typically described by two sets, V (for vertices) and E (for edges).
The set E consists of sets with two elements, each of them being a vertex.
For example:
V = { v1, v2, ..., vn }
E = { ..., {vi, vk}, ... }
Every set in E corresponds to one edge in your graph.
As such, a (connected) path is typically defined as:
A sequence of vertices v1, ..., vn with the property that for every two consecutive vertices vi and vi+1 in the sequence, the set { vi, vi+1 } is an element of the set E.
(practically speaking: there is an edge from vertex vi to vertex vi+1)
A cycle is typically defined as a path with the property: v1 = vn (thus the first vertex is also the last one)
With this definition, the sequence 1, 4, 1 in your example already forms a cycle (in the mathematical sense).
As such, every edge in your graph would count as a "shortest" cycle, while the examples you gave are definitely longer!
You said that you
... neglect the cycles whose length is less than 3
this doesn't look too bad as a starting point for your description. Unfortunately, I didn't completely understand the next steps you want to perform.
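The definitions above translate almost literally into code. In this minimal sketch, vertices are ints and E is stored as a set of unordered pairs, kept as (min, max) so that {vi, vk} and {vk, vi} are the same edge. The names are hypothetical. isCycle is the textbook definition. isSimpleCycle adds the usual extra requirements (no repeated vertex except the closing one, and at least 3 distinct vertices), which rule out the degenerate cycle 1, 4, 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // stored as (min, max): the unordered pair {u, v}
using EdgeSet = std::set<Edge>;

Edge makeEdge(int u, int v) { return {std::min(u, v), std::max(u, v)}; }

// Path: for every two consecutive vertices v_i, v_{i+1}, {v_i, v_{i+1}} is in E.
bool isPath(const std::vector<int>& seq, const EdgeSet& E) {
    for (std::size_t i = 0; i + 1 < seq.size(); ++i)
        if (!E.count(makeEdge(seq[i], seq[i + 1]))) return false;
    return !seq.empty();
}

// Cycle in the textbook sense: a path whose first and last vertex coincide.
bool isCycle(const std::vector<int>& seq, const EdgeSet& E) {
    return isPath(seq, E) && seq.front() == seq.back();
}

// Simple cycle: at least 3 distinct vertices, none repeated except the closing one.
bool isSimpleCycle(const std::vector<int>& seq, const EdgeSet& E) {
    if (!isCycle(seq, E) || seq.size() < 4) return false;
    const std::set<int> seen(seq.begin(), seq.end() - 1);
    return seen.size() == seq.size() - 1;
}
```

The figure is not available. Using only the edges implied by the cycles listed in the question, 1, 4, 1 passes isCycle but fails isSimpleCycle. (1,4,5,1), (1,2,3,4,1) and (7,9,8,7) pass both.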
Advice
My advice, or at least the way I would approach the problem, is to convert the rather long description into a shorter algorithmic description, refining exactly how you perform each step. This way, getting to your final description shouldn't be too hard. Especially, don't forget to state exactly what the input to your algorithm is. Even that isn't entirely clear from your description.
are you starting with an already known set of "shortest" cycles?
or are you just given a graph as input and have to determine the "shortest" cycles yourself?
if you detect them yourself, how exactly is this done?
Especially don't forget to explain this part if it applies, as it seems to be one of the most crucial parts of your problem.

What does the 'lower bound' in circulation problems mean?

Question: Circulation problems allow you to have both a lower and an upper bound on the flow through a particular arc. The upper bound I understand (like pipes, there's only so much stuff that can go through). However, I'm having a difficult time understanding the lower bound idea. What does it mean? Will an algorithm for solving the problem...
try to make sure every arc with a lower bound will get at least that much flow, failing completely if it can't find a way?
simply disregard the arc if the lower bound can't be met? This would make more sense to me, but would mean there could be arcs with a flow of 0 in the resulting graph.
Context: I'm trying to find a way to quickly schedule a set of events, which each have a length and a set of possible times they can be scheduled at. I'm trying to reduce this problem to a circulation problem, for which efficient algorithms exist.
I put every event in a directed graph as a node, and supply it with the number of time slots it should fill. Then I add all the possible times as nodes as well, and finally all the time slots, like this (all arcs point to the right):
The first two events have a single possible time and a length of 1, and the last event has a length of 4 and two possible times.
Does this graph make sense? More specifically, will the number of time slots that get 'filled' be 2 (only the 'easy' ones) or six, like in the picture?
(I'm using a push-relabel algorithm from the LEMON library if that makes any difference.)
Regarding the general circulation problem:
I agree with @Helen: even though it may not be as intuitive to conceive of a practical use for a lower bound, it is a constraint that must be met. I don't believe you would be able to disregard this constraint, even when that flow is zero.
The flow = 0 case appeals to the more intuitive max flow problem (as pointed out by @KillianDS). In that case, if the flow on the arc between a pair of nodes is zero, that arc cannot affect the "conservation of flow" sum at any node v:
$$\sum_{u} f(u, v) = \sum_{w} f(v, w)$$
When no lower bound is given (assuming flows are non-negative), a zero flow cannot influence the result, because
It cannot introduce a violation of the constraints
It cannot influence the sum (because it adds a zero term).
A practical example of a minimum flow could exist because of some external constraint (an associated problem requires at least X water to go through a certain pipe, as pointed out by @Helen). Lower bound constraints could also arise from an equivalent dual problem, which minimizes the flow such that certain edges have a lower bound (and finds an optimum equivalent to a maximization problem with an upper bound).
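To make concrete that a lower bound is a constraint that must be met, here is a minimal sketch of what "feasible circulation" means. The Arc struct and function name are hypothetical, and nodes are numbered 0..n-1. The function only checks a given flow and does not compute one. Every arc must satisfy lower <= flow <= upper, and every node must have equal inflow and outflow. An all-zero flow passes conservation but fails as soon as some arc has a positive lower bound.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Arc { int from, to; long long lower, upper, flow; };  // hypothetical

// Checks lower <= flow <= upper on every arc and inflow == outflow at every node.
bool isFeasibleCirculation(int n, const std::vector<Arc>& arcs) {
    std::vector<long long> balance(static_cast<std::size_t>(n), 0);  // inflow - outflow
    for (const Arc& a : arcs) {
        if (a.flow < a.lower || a.flow > a.upper) return false;  // bound violated
        balance[a.to] += a.flow;
        balance[a.from] -= a.flow;
    }
    for (long long b : balance)
        if (b != 0) return false;  // conservation violated
    return true;
}
```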
For your specific problem:
It seems like you're trying to get as many events done in a fixed set of time slots (where no two events can overlap in a time slot).
Consider the sets of time slots that could be assigned to a given event:
E1 -- { 9:10 }
E2 -- { 9:00 }
E3 -- { 9:20, 9:30, 9:40, 9:50 }
E3 -- { 9:00, 9:10, 9:20, 9:30 }
So you want to maximize the number of task assignments (i.e. events incident to edges that are turned "on") such that the resulting sets are pairwise disjoint (i.e. none of the assigned time slots overlap).
I believe this is NP-hard, because if you could solve this, you could use it to solve the maximum set packing problem (i.e. maximum set packing reduces to this). Your problem can be solved with integer linear programming, but in practice these problems can also be solved very well with greedy methods / branch and bound.
For instance, in your example problem, event E1 "conflicts" with E3 and E2 conflicts with E3. If E1 is assigned (there is only one option), then there is only one remaining possible assignment of E3 (the later one). If this assignment is taken for E3, then there is only one remaining assignment for E2. Furthermore, disjoint subgraphs (sets of events that cannot possibly conflict over resources) can be solved separately.
If it were me, I would start with a very simple greedy solution (assign tasks with fewer possible "slots" first), and then use that as the seed for a branch and bound solver (if the greedy solution found 4 task assignments, then prune any recursive subtree of assignments that cannot exceed 4). You could even squeeze out some extra performance by building the graph of pairwise intersections between the sets and only informing the adjacent sets when an assignment is made. You can also update your best number of assignments as you continue the branch and bound (I think this is normal), so if you get lucky early, you converge quickly.
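Here is a minimal sketch of that strategy, under several assumptions. Time slots are numbered 0..63 and stored as bitmasks. Each event lists its alternative slot sets. All names are hypothetical. Events are ordered by number of options (fewest first), and a greedy pass provides the initial best. The recursion then prunes a subtree whenever the assignments so far plus the events still to visit cannot beat the best found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Slots = std::uint64_t;       // bit t set = time slot t is used
using Event = std::vector<Slots>;  // the alternative slot sets of one event

namespace {
void branch(const std::vector<Event>& ev, std::size_t i, Slots used,
            int assigned, int& best) {
    const int remaining = static_cast<int>(ev.size() - i);
    if (assigned + remaining <= best) return;  // bound: this subtree cannot improve
    if (i == ev.size()) { best = assigned; return; }
    for (Slots option : ev[i])  // assign event i to a compatible option ...
        if ((option & used) == 0)
            branch(ev, i + 1, used | option, assigned + 1, best);
    branch(ev, i + 1, used, assigned, best);  // ... or leave it unassigned
}
}  // namespace

// Maximum number of events that can be given pairwise disjoint slot sets.
int maxAssignedEvents(std::vector<Event> events) {
    // Heuristic order: events with fewer options first.
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) { return a.size() < b.size(); });
    // Greedy seed: take the first compatible option of each event.
    int best = 0;
    Slots used = 0;
    for (const Event& e : events)
        for (Slots option : e)
            if ((option & used) == 0) { used |= option; ++best; break; }
    branch(events, 0, 0, 0, best);
    return best;
}
```

For the example, number 9:00 as slot 0 through 9:50 as slot 5. Then E1 = {bit 1}, E2 = {bit 0}, and E3 has the options bits 2-5 and bits 0-3. maxAssignedEvents({{2}, {1}, {60, 15}}) returns 3, since E3 takes the later option.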
I've used this same idea to find the smallest set of proteins that would explain a set of identified peptides (protein pieces), and found it to be more than enough for practical problems. It's a very similar problem.
If you need bleeding edge performance:
When rephrased, integer linear programming can do nearly any variant of this problem that you'd like. Of course, in very bad cases it may be slow (in practice, it's probably going to work for you, especially if your graph is not very densely connected). If it doesn't, regular linear programming relaxations approximate the solution to the ILP and are generally quite good for this sort of problem.
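One possible ILP for the scheduling version is sketched below. It assumes a hypothetical binary variable $y_{e,o}$ that is 1 if event e uses its option o, and $S_{e,o}$, the set of time slots that option covers. Relaxing $y_{e,o} \in \{0,1\}$ to $0 \le y_{e,o} \le 1$ gives the linear programming relaxation mentioned above.

$$
\begin{aligned}
\max \quad & \sum_{e}\sum_{o} y_{e,o} \\
\text{s.t.} \quad & \sum_{o} y_{e,o} \le 1 && \text{for every event } e,\\
& \sum_{(e,o)\,:\,t \in S_{e,o}} y_{e,o} \le 1 && \text{for every time slot } t,\\
& y_{e,o} \in \{0, 1\}.
\end{aligned}
$$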
Hope this helps.
The lower bound on the flow of an arc is a hard constraint. If the constraints can't be met, then the algorithm fails. In your case, they definitely can't be met.
Your problem cannot be modeled with a pure network-flow model, even with lower bounds. You are trying to impose the constraint that a flow is either 0 or at least some lower bound, and that requires integer variables. However, the LEMON package does have an interface where you can add integer constraints. The flow on each arc in the first layer must be either 0 or n, where n is the number of required time slots. Alternatively, you could say that at most one arc out of each "event" has nonzero flow.
Your "disjunction" constraint,
f = 0 or lower <= f <= upper,
can be modeled as
f >= y * lower
f <= y * upper
with y restricted to being 0 or 1. If y is 0, then f can only be 0. If y is 1, then f can be any value between lower and upper. The mixed-integer programming algorithms will be orders of magnitude slower than the network-flow algorithms, but they will model your problem.
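A quick way to convince yourself that the two linear inequalities encode the disjunction is to enumerate y. The sketch below uses hypothetical function names and assumes 0 <= lower <= upper. It checks whether some binary y lets a given f satisfy f >= y * lower and f <= y * upper. This is not how a MIP solver works internally.

```cpp
#include <cassert>

// The two linear constraints for a fixed binary y.
bool satisfiesWithY(double f, int y, double lower, double upper) {
    return f >= y * lower && f <= y * upper;
}

// "f == 0 or lower <= f <= upper" holds iff some y in {0, 1} satisfies both constraints.
bool satisfiesDisjunction(double f, double lower, double upper) {
    assert(0 <= lower && lower <= upper);
    return satisfiesWithY(f, 0, lower, upper) || satisfiesWithY(f, 1, lower, upper);
}
```

With lower = 4 and upper = 10, f = 0, 4 and 10 are accepted, while f = 2 (between 0 and the lower bound) and f = 11 are rejected.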