How to build my own global constraint with the CP-SAT solver of OR-Tools? - data-mining

I am a PhD candidate in data mining, and I have to create a global constraint with OR-Tools for a data mining purpose.
The problem is that there is little documentation online about creating your own global constraint with CP-SAT, and I don't know how to start.

It is possible, but very tedious and very complex.
Writing a new constraint implies:
extending the proto to support the constraint
writing the input validation
writing the solution checker
writing the loading (into CP-SAT engine) code
writing the presolve rules
writing the propagation code, which is complex because every deduction needs to be fully explained
writing the linearization/cut generation code
The last 3 items are extremely error-prone and very hard to debug, as cuts and explanations only take effect later, and are sometimes never used at all.
For these reasons, I recommend expanding the constraint into smaller ones. In fact, most of the CP constraints are expanded (alldiff, element, table, reservoir, inverse, automaton, some products, some modulos).
You can also submit a feature request for a new constraint. It may be added if it is useful and general enough.
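The answer doesn't show what "expanding" a constraint looks like. As a minimal, solver-independent sketch (plain Python, not the OR-Tools API; all function names are hypothetical), here is one common decomposition of AllDifferent into pairwise "not equal" constraints, checked by brute force on a tiny domain to confirm that both formulations accept exactly the same assignments:

```python
from itertools import combinations, product

def all_different(values):
    """The 'global' view: every value in the assignment is distinct."""
    return len(set(values)) == len(values)

def decompose_all_different(n):
    """Expansion: one binary x_i != x_j constraint per pair of variable indices."""
    return list(combinations(range(n), 2))

def satisfies_pairs(values, pairs):
    """Check the decomposed model: every pairwise constraint must hold."""
    return all(values[i] != values[j] for i, j in pairs)

def solutions(n, domain, check):
    """Enumerate every assignment of n variables over domain accepted by check."""
    return [vals for vals in product(domain, repeat=n) if check(vals)]

pairs = decompose_all_different(3)
global_sols = solutions(3, range(3), all_different)
decomposed_sols = solutions(3, range(3), lambda vals: satisfies_pairs(vals, pairs))
print(len(pairs), len(global_sols), global_sols == decomposed_sols)  # prints: 3 6 True
```

In a real CP-SAT model you would post the small constraints through the normal modeling API instead. The usual trade-off is propagation strength: a pairwise decomposition typically prunes less than a dedicated AllDifferent propagator, so expansion is a judgment call rather than a free win.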
Thanks

Related

What is the difference between Classic and Constraint Model Assertions in Nunit?

I am learning NUnit 2.6.3 by reading the documentation, and I have a few questions about it.
What is the difference between the classical model and the constraint model assertion?
Which model of assertions is the best one, and why?
The main difference is syntactic. It's the difference between (classic):
Assert.AreEqual("expected", someString);
And (constraint)
Assert.That(someString, Is.EqualTo("expected"));
The classic model has been around longer, and some people believe that it's more explicit and easier to follow.
Other people believe the constraint based approach is closer to the way that you might say the constraint if you were explaining it to somebody else.
If you're just getting started, then the constraint-based assertions are probably the better ones to learn, since they're the direction NUnit appears to be heading in. They're also closer to FluentAssertions. The constraint-based assertions also have more explicit support for extension through the IResolveConstraint interface.
You should, however, also gain an awareness of the classic assertions, since code you encounter in different places may use either, depending on which style its authors adopted first.
Although the syntax is different, what they're doing is very similar, so if you understand one set of assertions, converting them back and forth is pretty straightforward.
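To make the extension point concrete, here is a deliberately tiny Python re-creation of the two styles. None of these names are NUnit's API: `are_equal`, `assert_that`, `Is.equal_to` and the constraint classes are all hypothetical. The sketch only illustrates why the constraint model extends more naturally: a new kind of check is a new object passed to the same `assert_that` entry point, loosely analogous to implementing `IResolveConstraint`.

```python
class EqualConstraint:
    """Hypothetical constraint object: knows how to test and describe itself."""
    def __init__(self, expected):
        self.expected = expected
    def matches(self, actual):
        return actual == self.expected
    def describe(self):
        return f"equal to {self.expected!r}"

class Is:
    """Hypothetical factory, mimicking the Is.EqualTo(...) reading order."""
    @staticmethod
    def equal_to(expected):
        return EqualConstraint(expected)

def are_equal(expected, actual):
    # Classic style: one dedicated assertion method per kind of check.
    if expected != actual:
        raise AssertionError(f"Expected {expected!r} but was {actual!r}")

def assert_that(actual, constraint):
    # Constraint style: a single entry point; the logic lives in the constraint.
    if not constraint.matches(actual):
        raise AssertionError(f"Expected {constraint.describe()} but was {actual!r}")

class StartsWithConstraint:
    """A user-defined constraint that plugs into assert_that unchanged."""
    def __init__(self, prefix):
        self.prefix = prefix
    def matches(self, actual):
        return isinstance(actual, str) and actual.startswith(self.prefix)
    def describe(self):
        return f"a string starting with {self.prefix!r}"

some_string = "expected"
are_equal("expected", some_string)                # classic
assert_that(some_string, Is.equal_to("expected"))  # constraint
assert_that(some_string, StartsWithConstraint("exp"))
```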

CPLEX C++ Interface: How to get the index of a violated constraint?

I attempt to solve an integer linear program (ILP) using the solver IBM ILOG CPLEX in C++. The solver states that the problem is infeasible and points out the index of a violated constraint. My question concerns the identification and analysis of this constraint in C++.
A manual approach to analyzing the constraint would be to export the problem to a text file using the function extractModel and look up the violated constraint in this file.
Preferably, I would like to get the index of the violated constraint in C++ and get as much information about this conflict as possible.
Currently, I am using the conflict refiner but do not get any useful information out of it. Specifically, I keep an IloRangeArray of all constraints I ever add to the model, call refineConflict for this array and then use the function getConflict to query (possibly) violated constraints. The result is that all constraints I ever added are possibly violated and no constraint is proved to be violated.
How can I access the index of the one constraint reported in the error message that states that the problem is infeasible?
Also, am I using the conflict refiner incorrectly? E.g. am I doing something wrong when I make copies of constraints that I add to the model in a separate array? (The copy constructor and assignment operator of certain classes in Cplex seem to have non-standard behavior that I do not understand.)
Any help is appreciated.
I've not tried to use the conflict refiner API. Probably should look into it... but I use the conflict refiner a lot in the standalone interactive CPLEX. I am not aware of any issues with keeping copies of the constraints in your own code; I have done it before in CPLEX & Concert with C++. It may be a conceptual misunderstanding of what the conflict refiner does...
Remember that it is very rare to have a single identifiable infeasible constraint. It is much more common that there is a set of constraints that cannot be satisfied together, but if any of that set of constraints is removed then the rest are then feasible. This is usually called the "irreducible infeasible set".
Think for example of three constraints:
a >= b + 1
b >= c + 1
c >= a + 1
Clearly these three constraints cannot be satisfied simultaneously, but take any one away and the other two are then OK. It can be very hard to decide which constraint is wrong in some cases, and really depends on a deeper understanding of the problem and its model.
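A quick way to convince yourself of this: adding the three inequalities gives a + b + c >= a + b + c + 3, which is impossible, while each pair on its own has a solution. The sketch below (plain Python, brute force over a small integer box chosen arbitrarily) checks the pairs. Brute force over a finite box can only demonstrate feasibility, so the infeasibility of all three together rests on the summing argument.

```python
from itertools import combinations, product

# The three constraints from the example, each as a predicate on (a, b, c).
constraints = {
    "c1: a >= b + 1": lambda a, b, c: a >= b + 1,
    "c2: b >= c + 1": lambda a, b, c: b >= c + 1,
    "c3: c >= a + 1": lambda a, b, c: c >= a + 1,
}

def feasible(preds, domain=range(-5, 6)):
    """Search a small integer box for an (a, b, c) satisfying every predicate."""
    return any(
        all(p(a, b, c) for p in preds)
        for a, b, c in product(domain, repeat=3)
    )

print("all three:", feasible(constraints.values()))
for pair in combinations(constraints, 2):
    print(pair, feasible([constraints[name] for name in pair]))
```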
Anyway, try exporting the model as an LP, MPS or SAV format file and read it into the standalone CPLEX optimiser. Then optimise it - it should also fail with a reported infeasibility. Then run the conflict refiner and then display the computed (irreducible) infeasible set:
read fred.lp
optimize
conflict
display conflict all
I find that MPS files are better at preserving the full precision of the problem and are probably more portable to try with other solvers, but LP files are much more human-readable. The SAV file format is supposed to be the most accurate copy of what CPLEX has in memory, but is very opaque and rather CPLEX-specific. If your problem is clearly infeasible, the LP format is probably nicer to work with, but if the problem is borderline infeasible you may get different behaviour from the LP file. It would probably help you a great deal if you name all your variables and constraints too. Maybe just do the naming in debug builds, or add a flag to control whether or not to do the additional naming.
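The answer doesn't say how CPLEX's conflict refiner works internally. One classic way to compute an irreducible infeasible set is a "deletion filter": try removing each constraint in turn, and drop it permanently if the remainder is still infeasible. The sketch below implements that technique, not CPLEX's own algorithm. `feasible` is a brute-force stand-in for a solver call, the constraint names are hypothetical, and `bound_a` is an extra constraint that is not part of the conflict. It also shows why naming constraints pays off: the result is a list of names you can actually read.

```python
from itertools import product

# Hypothetical named constraints over integer variables a, b, c.
# c1..c3 form the cycle from the example; bound_a is harmless on its own.
CONSTRAINTS = {
    "c1": lambda a, b, c: a >= b + 1,
    "c2": lambda a, b, c: b >= c + 1,
    "c3": lambda a, b, c: c >= a + 1,
    "bound_a": lambda a, b, c: a <= 3,
}

def feasible(names, domain=range(-5, 6)):
    """Brute-force stand-in for asking a solver whether a subset is feasible."""
    return any(
        all(CONSTRAINTS[n](a, b, c) for n in names)
        for a, b, c in product(domain, repeat=3)
    )

def deletion_filter(names):
    """Shrink an infeasible constraint set to an irreducible infeasible subset."""
    kept = list(names)
    if feasible(kept):
        raise ValueError("input must be infeasible")
    for name in list(kept):
        trial = [n for n in kept if n != name]
        if not feasible(trial):
            kept = trial  # still infeasible without it, so it is not needed
    return kept

print(deletion_filter(CONSTRAINTS))
```

Each constraint that survives was needed at the moment it was tested, and removing later constraints can only make the set easier to satisfy, so the result is irreducible.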

Arenas where core.logic dominates [soft]

Community Wiki
I don't care about the reputation points; I just want good answers. Feel free to mark this question as community wiki.
Context
I've been working through The Reasoned Schemer, and have made the following observations:
Logic programming is very interesting.
Logic programming is sometimes counter-intuitive
Logic programming is often "inefficient" (or at least the code I write).
It seems like in going from
Assembly -> C++, I "give up" control of writing my own machine code
C++ -> Clojure, I give up control of memory management
Clojure -> core.logic/prolog/minikanren, I lose partial control of how computations are done
Question:
Besides (1) solving logic puzzles and (2) type inference, what are the domains of problems that logic programming dominates?
Thanks!
Constraint logic programming can be really useful for solving various scheduling, resource allocation, and other nontrivial constraint satisfaction / combinatorial optimization problems. All you have is declarative: the constraints (e.g. only one aircraft can be on the runway at a time), and maybe something you want to minimize/maximize (throughput/waiting).
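As an illustration of that split between constraints and objective, here is a toy runway problem in plain Python (not core.logic or CLP(FD)). The flight data, horizon and names are made up. Each constraint is a separate predicate and the objective is total waiting time. The "solver" is naive generate-and-test over every combination of start times, whereas a CLP(FD) system would prune the search with constraint propagation instead of enumerating every candidate.

```python
from itertools import combinations, product

# Hypothetical flights: name -> (earliest start minute, runway occupancy in minutes).
FLIGHTS = {"A": (0, 3), "B": (1, 2), "C": (2, 1)}
HORIZON = range(8)  # candidate start minutes 0..7

def no_overlap(starts):
    """Only one aircraft may occupy the runway at any time."""
    spans = [(starts[n], starts[n] + FLIGHTS[n][1]) for n in FLIGHTS]
    return all(e1 <= s2 or e2 <= s1 for (s1, e1), (s2, e2) in combinations(spans, 2))

def after_arrival(starts):
    """No aircraft can use the runway before it arrives."""
    return all(starts[n] >= FLIGHTS[n][0] for n in FLIGHTS)

CONSTRAINTS = [no_overlap, after_arrival]

def total_wait(starts):
    """Objective to minimize: minutes waited beyond each earliest start."""
    return sum(starts[n] - FLIGHTS[n][0] for n in FLIGHTS)

# Naive generate-and-test: build every assignment, keep those meeting all constraints.
candidates = (dict(zip(FLIGHTS, combo)) for combo in product(HORIZON, repeat=len(FLIGHTS)))
feasible = [s for s in candidates if all(c(s) for c in CONSTRAINTS)]
best = min(feasible, key=total_wait)
print(best, total_wait(best))  # prints: {'A': 0, 'B': 4, 'C': 3} 4
```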
There are various well-known flavors of that in Prolog, including CLP(FD), which works over finite integer domains, and CLP(R), which works over the real numbers. At least CLP(FD) seems to be on core.logic's immediate roadmap.
I believe such Prolog-derived solutions are actively being used in air traffic control and other logistics tasks, although it's hard to get precise information about exactly which technologies such mission- and life-critical companies are using under the hood.
Research in artificial intelligence, and in particular cognitive robotics and other applications of logic-based knowledge representation, are areas where Prolog is used a lot for its close relation to logic theory. This relation is very useful because it basically brings theory to life. Theorems can be proven on paper, then implemented almost trivially in Prolog and executed, and the executing programs have the proven properties. This allows for programs that are "correct by construction", which is the opposite of first writing programs and then trying to prove properties about them (as is done in formal methods, using, e.g., model checking).
The semantic web is another place where logic programming plays a growing role.

Why are relational databases needed?

Specifically thinking of web apps,
(1) Why are relationships (i.e., foreign keys) in an RDBMS even useful?
The web apps I write have logic built-in that validates user input against required fields. I see no real use for foreign keys and thus no real use for relational databases.
Besides, if I were to put all the required-field validation logic in the RDBMS (i.e., MySQL), it would simply return a vague error. At least with PHP-based validation I know which field is missing and I can notify the user (though with JavaScript-based validation this would almost NEVER happen anyway).
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
I really need some insight on this topic. I simply can't come up with a good answer.
I will come at this from a different angle.
I work at a place where we had a database that had no foreign key constraints, default values, or other data checks whatsoever in their initial records database. The lead engineer's excuse for this was something similar to what you have described above. "The application will ensure the referential integrity".
The problem is, we did not have a standard data layer (like an object relational mapping) over the top of the database. We had multiple programmatic sources that fed into the same tables. It was funny because after a while, you could tell which parts of the code created which rows in the table. Sometimes the links lined up, sometimes they didn't. Sometimes the links were NULL (when they shouldn't be), and sometimes they were 0. We even had a few cyclic records which was fun.
My point is, you never know when you are going to need to write a quick script to batch import records, or write a new subsystem that references the same tables. It behooves us as programmers to program as defensively as possible. We can't assume that those who come after us will know as much (if anything) about how our schema should be used.
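Here is that scenario in miniature, using Python's built-in `sqlite3` module with hypothetical `customers` and `orders` tables. `try_insert` plays the role of a quick import script that never goes through the application, and the database still refuses both the "0" link and the "NULL" link. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other DBMSs configure this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

def try_insert(customer_id):
    """Simulate a batch script that writes directly to the table."""
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(0))     # a "0 instead of a real id" link
print(try_insert(None))  # a "NULL when it shouldn't be" link
```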
I'm not much of an SQL lover, but even I must say that the relational structure has its advantages.
It doesn't only allow validation. By providing the database with metadata describing the relations between the actual pieces of information stored, a great number of optimizations become possible.
This makes it possible to quickly retrieve large, complex datasets. It also reduces the number of queries needed to make modifications and keep the data coherent, since most of the "book-keeping" is carried out automatically on the DB side of the connection.
One incredibly useful feature of foreign keys in most relational databases is cascading actions.
Suppose you have a families table and a persons table. Each family can have multiple people, but a person can only belong in one family (one-to-many relationship). If you have foreign keys and you delete a family row, the database can automatically update all the related people, either by deleting them or setting their foreign keys to null.
If you do not have this constraint, you must handle this situation yourself, in your own code.
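A minimal sketch of the families/persons example with Python's built-in `sqlite3` module (the rows are made up). `ON DELETE CASCADE` removes the deleted family's people; declaring `ON DELETE SET NULL` instead would keep them with a NULL `family_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to act on FKs
conn.executescript("""
    CREATE TABLE families (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,
        name TEXT,
        family_id INTEGER REFERENCES families(id) ON DELETE CASCADE
    );
    INSERT INTO families VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO persons VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cid', 2);
""")

# Deleting the family row: the database removes Ann and Bob on its own.
conn.execute("DELETE FROM families WHERE id = 1")
remaining = conn.execute("SELECT name FROM persons ORDER BY id").fetchall()
print(remaining)  # prints: [('Cid',)]
```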
RDBMSs are still very useful. Not sure why you wouldn't think so. Foreign key constraints can be used to maintain referential integrity (in other words, to provide a simple way to express 1:1, 1:many and many:many relationships). RDBMSs are also useful because there was a rich theory accompanying practical developments, unlike previous DBMSs. In particular, relational calculus/algebra are nice since they allow for good query optimization, normalization, etc.
Not sure if that really answers your question. Wikipedia might list some advantages of RDBMSs.
(1) why are relationships (i.e., foreign keys) in RDBMS even useful?
First off, I think you are talking about foreign key CONSTRAINTS. Foreign keys are just a logical design feature that says that this entity matches up with that one.
The reason foreign key constraints are useful are:
They help you adhere to the DRY (Don't Repeat Yourself) principle. Sure, your app validates the relationship, but does it do it in several places? Are there multiple apps that access the same DB? Do you have to repeat the logic in each app? Hey, you could pull that logic out and use a common DLL for access to that data that enforces that logic. Better yet, what if that was built into the RDBMS so I didn't have to write custom code to do something so routine? Bam. Foreign key constraints.
If your app enforces the foreign key validations, how do you force users who are working directly in the DB to honor your rules? I know, I know. You shouldn't let users into the back-end directly, but you just try telling that to the data analysts when they have a project for corporate and you are the bottleneck.
As to the vague error. Wouldn't your argument be better stated as RDBMS X has vague errors when data fails foreign key constraint checks? The way you have generalized it, you could also argue that we should use paper ledgers instead of computers because the constraint had a vague error.
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
Yeah, that would be now, yesterday and probably long into the future.
I could go on forever about the reasons, but here is the big one...
It provides a common structured file format that is easy to extend and to leverage from other applications. You may be too young to remember when every dang system had its own proprietary structured file format, but it sucked. Plus, it forced you to reinvent the wheel constantly in terms of things like indexing, a query language, locking, etc.
"I see no real use for foreign keys and thus no real use for
relational databases"
Judging by this remark, you seem to be underestimating what a relational database is for. Foreign key constraints aren't a defining feature of relational databases and certainly aren't the only reason for using such databases. The relational database model is a powerful and effective way to represent data and it remains so even if you decide you don't want to implement a foreign key constraint. I will therefore assume the question you really meant to ask is: Why are foreign keys useful in relational databases?
A foreign key constraint is just one kind of data integrity constraint. You can of course implement integrity rules outside the database but the DBMS is designed and optimised to do the job for you and is generally the most efficient place to do it because it is closest to the data structures. If you did it outside the database then you would have at least an extra round trip to retrieve the necessary data. You would also have to replicate the DBMS's locking/concurrency model in your application code.
The database optimiser can take advantage of constraints in the database to improve the performance of queries. It can't do that if the rules only exist in your application code.
If you have many applications sharing the same database then implementing data integrity rules in every application is impractical and expensive to maintain. Centralising the constraint logic makes more sense.
Various CASE tools and DBA tools will take advantage of database constraints, can reverse engineer them and use them to assist development and maintenance tasks.
In practice the meaning and function of a database constraint versus some procedural code that validates data only on entry is very different. If X is implemented in a database constraint then I know it is valid for every piece of data in the database. If X is implemented in the application when data is entered then I only know it applies to future data - I can't be sure it applies to everything already in the database (maybe X was only implemented today and didn't apply to the data entered yesterday).
Because they maintain the integrity of the database. If you have all your business logic in the application then in theory they are not needed, but are still useful as a safeguard against bad data.

Need refactoring ideas for Arrow Anti-Pattern

I have inherited a monster.
It is masquerading as a .NET 1.1 application that processes text files conforming to Healthcare Claim Payment (ANSI 835) standards, but it's a monster. The information being processed relates to healthcare claims, EOBs, and reimbursements. These files consist of records that have an identifier in the first few positions and data fields formatted according to the specs for that type of record. Some record ids are Control Segment ids, which delimit groups of records relating to a particular type of transaction.
To process a file, my little monster reads the first record, determines the kind of transaction that is about to take place, then begins to process other records based on what kind of transaction it is currently processing. To do this, it uses a nested if. Since there are a number of record types, there are a number of decisions that need to be made. Each decision involves some processing and 2-3 other decisions that need to be made based on previous decisions. That means the nested if has a lot of nests. That's where my problem lies.
This one nested if is 715 lines long. Yes, that's right. Seven-Hundred-And-Fif-Teen Lines. I'm no code analysis expert, so I downloaded a couple of freeware analysis tools and came up with a McCabe Cyclomatic Complexity rating of 49. They tell me that's a pretty high number. High as in pollen count in the Atlanta area, where 100 is the standard for high and the news says "Today's pollen count is 1,523". This is one of the finest examples of the Arrow Anti-Pattern I have ever been privileged to see. At its deepest, the indentation goes 15 tabs deep.
My question is, what methods would you suggest to refactor or restructure such a thing?
I have spent some time searching for ideas, but nothing has given me a good foothold. For example, substituting a guard condition for a level is one method. I have only one of those. One nest down, fourteen to go.
Perhaps there is a design pattern that could be helpful. Would Chain of Command be a way to approach this? Keep in mind that it must stay in .NET 1.1.
Thanks for any and all ideas.
I just had some legacy code at work this week that was similar to (although not as dire as) what you are describing.
There is no one thing that will get you out of this. The state machine might be the final form your code takes, but that's not going to help you get there, nor should you decide on such a solution before untangling the mess you already have.
The first step I would take is to write a test for the existing code. This test isn't to show that the code is correct but to make sure you have not broken something when you start refactoring. Get a big wad of data to process, feed it to the monster, and get the output. That's your litmus test. If you can run this under a code coverage tool, you will see what your test does not cover. If you can, construct some artificial records that will also exercise that code, and repeat. Once you feel you have done what you can with this task, the output data becomes your expected result for your test.
Refactoring should not change the behavior of the code. Remember that. This is why you have known input and known output data sets to validate you are not going to break things. This is your safety net.
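A minimal sketch of such a characterization ("golden master") test in Python. `legacy_process`, `refactored_process` and the record strings are hypothetical stand-ins for the real parser and a real 835 file; in practice you would write the golden output to a file once and compare against it on every run.

```python
import json

def legacy_process(records):
    """Hypothetical stand-in for the 715-line monster."""
    out = []
    for rec in records:
        if rec.startswith("CLP"):
            out.append(("claim", rec[3:].strip()))
        elif rec.startswith("SVC"):
            out.append(("service", rec[3:].strip()))
    return out

def record_golden_master(process, records):
    # Capture today's behaviour, bugs and all, as the expected result.
    return json.dumps(process(records))

def matches_golden_master(process, records, golden):
    # Any difference in output means the refactoring changed behaviour.
    return json.dumps(process(records)) == golden

SAMPLE = ["ST 835", "CLP 001", "SVC A1", "SE"]
golden = record_golden_master(legacy_process, SAMPLE)

def refactored_process(records):
    """A rewritten version that must reproduce the legacy output exactly."""
    kinds = {"CLP": "claim", "SVC": "service"}
    return [(kinds[r[:3]], r[3:].strip()) for r in records if r[:3] in kinds]

print(matches_golden_master(refactored_process, SAMPLE, golden))  # prints: True
```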
Now Refactor!
A couple of things I did that I found useful:
Invert if statements
A huge problem I had was simply reading the code: I couldn't find the corresponding else statements. I noticed that a lot of the blocks looked like this:
if (someCondition)
{
100+ lines of code
{
...
}
}
else
{
simple statement here
}
By inverting the if, I could see the simple case first and then move on to the more complex block knowing what the other branch already did. Not a huge change, but it helped my understanding.
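The same inversion, sketched in Python with a hypothetical record handler (the original code is .NET 1.1; the `*` element separator and the log messages are assumptions for illustration). Both versions produce identical output; only the shape changes:

```python
def process_before(record, log):
    if record.strip():
        fields = record.split("*")
        # ...imagine 100+ more lines here...
        log.append(f"processed {fields[0]}")
    else:
        log.append("skipped blank record")

def process_after(record, log):
    # Inverted: the short branch comes first and returns early,
    # so the long branch loses one level of indentation.
    if not record.strip():
        log.append("skipped blank record")
        return
    fields = record.split("*")
    # ...the 100+ lines now sit at the top level of the function...
    log.append(f"processed {fields[0]}")

before, after = [], []
for rec in ["CLP*1", "   ", "SVC*2"]:
    process_before(rec, before)
    process_after(rec, after)
```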
Extract Method
I used this a lot. Take some complex multi-line block, grok it, and shove it aside into its own method. This made it much easier to see where code was duplicated.
Now, hopefully, you haven't broken your code (the test still passes, right?), and you have more readable and better understood procedural code. Look, it's already improved! But that test you wrote earlier isn't really good enough... it only tells you that you are duplicating the functionality (bugs and all) of the original code, and only for the lines you had coverage on, as I'm sure you will find blocks of code that you can't figure out how to hit or that simply can never be hit (I've seen both in my work).
The big changes, where all the big-name patterns come into play, begin when you look at how to refactor this in a proper OO fashion. There is more than one way to skin this cat, and it will involve multiple patterns. Not knowing details about the format of these files you're parsing, I can only toss around some helpful suggestions that may or may not be the best solutions.
Refactoring to Patterns is a great book to assist in explaining patterns that are helpful in these situations.
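A small before/after sketch of Extract Method in Python. The field layout (an amount at index 3) and the function names are hypothetical; the point is that the extracted version reads like a summary of what it does, and the loop at the bottom checks that behaviour did not change:

```python
# Before: one block does the parsing and the bookkeeping inline.
def handle_claim_inline(fields, totals):
    if len(fields) > 3 and fields[3]:
        amount = float(fields[3])
        if amount < 0:
            totals["reversals"] += amount
        else:
            totals["payments"] += amount

# After: the same logic, with each chunk extracted into a named function.
def paid_amount(fields):
    """Return the amount field as a float, or None if it is missing or empty."""
    return float(fields[3]) if len(fields) > 3 and fields[3] else None

def add_to_totals(amount, totals):
    """Negative amounts count as reversals, everything else as payments."""
    key = "reversals" if amount < 0 else "payments"
    totals[key] += amount

def handle_claim(fields, totals):
    amount = paid_amount(fields)
    if amount is not None:
        add_to_totals(amount, totals)

inline_totals = {"payments": 0.0, "reversals": 0.0}
extracted_totals = {"payments": 0.0, "reversals": 0.0}
for fields in (["CLP", "1", "1", "100.5"], ["CLP", "2", "1", "-20"], ["CLP", "3"]):
    handle_claim_inline(fields, inline_totals)
    handle_claim(fields, extracted_totals)
```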
You're trying to eat an elephant, and there's no other way to do it but one bite at a time. Good luck.
A state machine seems like the logical place to start, and using WF if you can swing it (sounds like you can't).
You can still implement one without WF; you just have to do it yourself. However, thinking of it as a state machine from the start will probably give you a better implementation than creating a procedural monster that checks internal state on every action.
Diagram your states and what causes each transition. The actual code to process a record should be factored out, and called when the state executes (if that particular state requires it).
So State1's execute calls your "read a record", then based on that record transitions to another state.
The next state may read multiple records and call record processing instructions, then transition back to State1.
One thing I do in these cases is to use the 'Composed Method' pattern. See Jeremy Miller's Blog Post on this subject. The basic idea is to use the refactoring tools in your IDE to extract small meaningful methods. Once you've done that, you may be able to further refactor and extract meaningful classes.
I would start with uninhibited use of Extract Method. If you don't have it in your current Visual Studio IDE, you can either get a 3rd-party addin, or load your project in a newer VS. (It'll try to upgrade your project, but you will carefully ignore those changes instead of checking them in.)
You said that you have code indented 15 levels. Start about 1/2-way out, and Extract Method. If you can come up with a good name, use it, but if you can't, extract anyway. Split in half again. You're not going for the ideal structure here; you're trying to break the code in to pieces that will fit in your brain. My brain is not very big, so I'd keep breaking & breaking until it doesn't hurt any more.
As you go, look for any new long methods that seem to be different from the rest; make these into new classes. Just use a simple class that has only one method for now. Heck, making the method static is fine. Not because you think they're good classes, but because you are so desperate for some organization.
Check in often as you go, so you can checkpoint your work, understand the history later, be ready to do some "real work" without needing to merge, and save your teammates the hassle of hard merging.
Eventually you'll need to go back and make sure the method names are good, that the set of methods you've created make sense, clean up the new classes, etc.
If you have a highly reliable Extract Method tool, you can get away without good automated tests. (I'd trust VS in this, for example.) Otherwise, make sure you're not breaking things, or you'll end up worse than you started: with a program that doesn't work at all.
A pairing partner would be helpful here.
Judging by the description, a state machine might be the best way to deal with it. Have an enum variable to store the current state, and implement the processing as a loop over the records, with a switch or if statements to select the action to take based on the current state and the input data. You can also easily dispatch the work to separate functions based on the state using function pointers, too, if it's getting too bulky.
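A sketch of that approach in Python, where a dictionary of functions plays the role of the switch or function-pointer table. The states, handler names and transition rules are simplified assumptions. The segment ids ST, CLP, SVC and SE do come from the X12 835 format, but a real 835 contains many more segments (BPR, TRN, N1 and so on) that this sketch would reject as unexpected:

```python
from enum import Enum, auto

class State(Enum):
    OUTSIDE = auto()         # between transactions
    IN_TRANSACTION = auto()  # after ST, before the first claim
    IN_CLAIM = auto()        # inside a claim (CLP) and its services (SVC)

def process(records):
    """Loop over records; the (state, segment id) pair selects the handler."""
    state, events = State.OUTSIDE, []

    def start_txn(rec):
        events.append("begin transaction")
        return State.IN_TRANSACTION

    def start_claim(rec):
        events.append(f"claim {rec.split('*')[1]}")
        return State.IN_CLAIM

    def service(rec):
        events.append(f"service {rec.split('*')[1]}")
        return State.IN_CLAIM

    def end_txn(rec):
        events.append("end transaction")
        return State.OUTSIDE

    handlers = {
        (State.OUTSIDE, "ST"): start_txn,
        (State.IN_TRANSACTION, "CLP"): start_claim,
        (State.IN_CLAIM, "CLP"): start_claim,
        (State.IN_CLAIM, "SVC"): service,
        (State.IN_TRANSACTION, "SE"): end_txn,
        (State.IN_CLAIM, "SE"): end_txn,
    }
    for rec in records:
        seg_id = rec.split("*")[0]
        handler = handlers.get((state, seg_id))
        if handler is None:
            raise ValueError(f"unexpected {seg_id!r} in state {state.name}")
        state = handler(rec)  # each handler returns the next state
    return events
```

Unknown (state, segment) combinations raise immediately, which tends to surface the cases the old nested if silently fell through.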
There was a pretty good blog post about it at Coding Horror. I've only come across this anti-pattern once, and I pretty much just followed his steps.
Sometimes I combine the state pattern with a stack.
It works well for hierarchical structures; a parent element knows what state to push onto the stack to handle a child element, but a child doesn't have to know anything about its parent. In other words, the child doesn't know what the next state is, it simply signals that it is "complete" and gets popped off the stack. This helps to decouple the states from each other by keeping dependencies uni-directional.
It works great for processing XML with a SAX parser (the content handler just pushes and pops states to change its behavior as elements are entered and exited). EDI should lend itself to this approach too.
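A minimal sketch of the state-plus-stack idea in Python, using the X12 envelope pairs ISA/IEA, GS/GE and ST/SE as the hierarchy. The `State` class and the trace strings are hypothetical. Each parent state lists which child states it may push; a child only knows its own closing segment, so when it sees that segment it is simply popped and control returns to whatever is underneath:

```python
class State:
    """One nesting level: the child states it may push, and its own closer."""
    def __init__(self, name, closer, children=None):
        self.name, self.closer, self.children = name, closer, children or {}

TRANSACTION = State("transaction", "SE")
GROUP = State("group", "GE", {"ST": TRANSACTION})
INTERCHANGE = State("interchange", "IEA", {"GS": GROUP})
ROOT = State("file", None, {"ISA": INTERCHANGE})

def walk(segments):
    """Push a child state on its opening segment, pop it on its closer."""
    stack, trace = [ROOT], []
    for seg in segments:
        seg_id = seg.split("*")[0]
        top = stack[-1]
        if seg_id in top.children:    # the parent decides which child to push
            stack.append(top.children[seg_id])
            trace.append(f"enter {stack[-1].name}")
        elif seg_id == top.closer:    # the child only signals that it is complete
            trace.append(f"exit {stack.pop().name}")
        else:                         # ordinary segment: the top state handles it
            trace.append(f"{top.name} handles {seg_id}")
    if len(stack) != 1:
        raise ValueError(f"unclosed envelope: {stack[-1].name}")
    return trace
```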