Advice about how to make Z3 evaluate simple constraints faster - c++

I'm trying to use Z3 (with C++ API) to check if lots of variable configurations satisfy my constraints, but I'm having big performance issues.
I'm looking for advice about which logic or parameter setting I might be able to use to improve the runtime, or hints about how I could try and feed the problem to Z3 in a different way.
Short description of what I'm doing and how I'm doing it:
//_______________Pseudocode and example_______________
context ctx()
solver s(ctx)
// All my variables are finite domain, maybe some 20 values at most, but usually less.
// They can only be ints, bools, or enums.
// There are not that many variables, maybe 10 or 20 for now.
//
// Since I need to be able to solve constraints of the type (e == f), where
// e and f are two different enum variables, all my
// enum types are actually contained in only one enumeration_sort(), populated
// with all the different values.
sort enum_sort = {"green", "red", "yellow", "blue", "null"}
expr x = ctx.int_const("x")
expr y = ctx.int_const("y")
expr b = ctx.bool_const("b")
expr e = ctx.constant("e", enum_sort)
expr f = ctx.constant("f", enum_sort)
// now I assert the finite domains, for each variable
// enum_value(s) is a helper function, that returns the matching enum expression
//
// Let's say that these are the domains:
//
// int x is from {1, 3, 4, 7, 8}
// int y is from {1, 2, 3, 4}
// bool b is from {0, 1}
// enum e is from {"green", "red", "yellow"}
// enum f is from {"red", "blue", "null"}
s.add(x == 1 || x == 3 || x == 3 || x == 7 || x == 8)
s.add(y == 1 || y == 2 || y == 3 || y == 4)
s.add(b == 0 || b == 1)
s.add(e == enum_value("green") || e == enum_value("red") || enum_value("yellow"))
s.add(f == enum_value("red") || f == enum_value("blue") || enum_value("null"))
// now I add in my constraints. There are also about 10 or 20 of them,
// and each one is pretty short
s.add(b => (x + y >= 5))
s.add((x > 1) => (e != f))
s.add((y == 4 && x == 1) || b)
// setup of the solver is now done. Here I start to query different combinations
// of values, and ask the solver if they are "sat" or "unsat"
// some values are left empty, because I don't care about them
expr_vector vec1 = {x == 1, y == 3, b == 1, e == "red"}
print(s.check(vec1))
expr_vector vec2 = {x == 4, e == "green", f == "null"}
print(s.check(vec2))
....
// I want to answer many such queries.
Of course, in my case this isn't hardcoded, but I read and parse the constraints, variables and their domains from files, then feed the info to Z3.
But it's slow.
Even for something like ten thousand queries, my program is already running over 10s. All of this is inside s.check(). Is it possible to make it run faster?
Hopefully it is, because what I'm asking of the solver doesn't look like it's overly difficult.
No quantifiers, finite domain, no functions, everything is a whole number or an enum, domains are small, the values of the numbers are small, there's only simple arithmetic, constraints are short, etc.
If I try to use parameters for parallel processing, or set the logic to "QF_FD", the runtime doesn't change at all.
Thanks in advance for any advice.

Is it always slow? Or does it get progressively slower as you query for more and more configurations using the same solver?
If it's the former, then your problem is just too hard and this is the price to pay. I don't see anything obviously wrong in what you've shown; though you should never use booleans as integers. (Just looking at your b variable in there. Stick to booleans as booleans, and integers as integers, and unless you really have to, don't mix the two together. See this answer for some further elaboration on this point: Why is Z3 slow for tiny search space?)
If it's the latter, you might want to create a solver from scratch for each query to clean-up all the extra stuff the solver created. While additional lemmas always help, they could also hurt performance if the solver cannot make good use of them in subsequent queries. And if you follow this path, then you can simply "parallelize" the problem yourself in your C++ program; i.e., create many threads and call the solver separately for each problem, taking advantage of many-cores your computer no doubt has and OS-level multi-tasking.
Admittedly, this is very general advice and may not apply directly to your situation. But, without a particular "running" example that we can see and inspect, it's hard to be any more specific than this.

Some Ideas:
1. Replace x == 1 || x == 3 || x == 3 || x == 7 || x == 8 with (1 <= x && x <= 8) && (x <= 1 || (3 <= x) && (x <= 4 || 7 <= x). Similar change with y.
rationale: the solver for linear arithmetic now knows that x is always confined in the interval [1,8], this can be useful information for other linear equalities/inequalities; it may be useful to also learn the trivial mutual exclusion constraints not(x <= 1) || not(3 <= x) and not(x <= 4) || not(7 <= x); there are now exactly 3 boolean assignments that cover your original 5 cases, this makes the reasoning of the linear arithmetic solver more cost-efficient because each invocation deals with a larger chunk of the search space. (Furthermore, it is more likely that clauses learned from conflicts are going to be useful with subsequent calls to the solver)
(Your queries may also contain set of values rather than specific assignments of values; this may allow one to prune some unsatisfiable ranges of values with fewer queries)
2. Just like #alias mentioned, Boolean variables ought to be Booleans and not 0/1 Integer variables. The example you provided is a bit confusing, b is declared as a bool const but then you state b == 0 || b == 1
3. I am not familiar with the enum_sort of z3, meaning that I don't know how it is internally encoded and what solving techniques are applied to deal with it. Therefore, I am not sure whether the solver may try to generate trivially inconsistent truth-assignments in which e == enum_value("green") e e == enum_value("red") are both assigned to true at the same time. This might be worth a bit of investigation. For instance, another possibility could be to declare e and f as Int and give them an appropriate interval domain (as contiguous as possible) with the same approach shown in 1., that will be interpreted by your software as a list of enum values. This should remove a number of Boolean assignments from the search space, make conflict clauses more effective and possibly speed-up the search.
4. Given the small number of problem variables, values and constraints, I would suggest you to try to encode everything using just the Bit-Vector theory and nothing else (using small-but-big-enough domains). If you then configure the solver to encode Bit-Vectors eagerly, then everything is bit-blasted into SAT, and z3 should only use Boolean Constraint Propagation for satisfiability, which is the cheapest technique.
This might be an X Y problem, why are you performing thousands of queries, what are you trying to achieve? Are you trying to explore all possible combination of values? Are you trying to perform model counting?

Related

Linear Programming: Depict logical expression in a boolean variable

I have a mixed integer linear program (MIP or MILP).
In the end I want a boolean variable im my linear program, that has the following properties.
I have two variables:
boolean b.
real x, with x being 0 or larger.
What I want to achieve is:
b == false if x == 0.
b == true if x > 0.
I found a way to depict if x is in specific range (e.g. between 2 and 3) via:
2*b <= x
x <= 3*b
The problem with the above testing formula is, that b will be true if x is in the given range and false if outside that range.
Does anybody know a way to set a boolean variable to false if x == 0 and to true if x is larger than 0?
If U is an upper bound of x then
if x > 0 ==> b == 1
can be made as
x <= U*b
The second part (x == 0 => b == 0) needs to be modified to
x < epsilon ==> b == 0
which can be made as
b <= 1 + x - epsilon
where epsilon is a small number. Other than good practice this is necessary, because solvers do not work in rational arithmetic (although there are some research efforts to make them do so), but with certain precision thresholds, and therefore quantities such as 10e-12 are treated as zero.
I hope this helps!
You could use the signum function http://en.wikipedia.org/wiki/Signum_function take the absolute value and negate it. Since you didn't name a specific programming language I keep it general.

difference between 'when' and 'if' in OpenModelica?

I'm new to OpenModelica and I've a few questions regarding the code of 'BouncingBall.mo' which is distributed with the software as example code.
1) what's the difference between 'when' and 'if'?
2)what's the purpose of variable 'foo' in the code?
3)in line(15) - "when {h <= 0.0 and v <= 0.0,impact}",, shouldn't the expression for 'when' be enough as "{h <= 0.0 and v <= 0.0}" because this becomes TRUE when impact occurs, what's the purpose of impact(to me its redundant here) and what does the comma(,) before impact means?
model BouncingBall
parameter Real e = 0.7 "coefficient of restitution";
parameter Real g = 9.81 "gravity acceleration";
Real h(start = 1) "height of ball";
Real v "velocity of ball";
Boolean flying(start = true) "true, if ball is flying";
Boolean impact;
Real v_new;
Integer foo;
equation
impact = h <= 0.0;
foo = if impact then 1 else 2;
der(v) = if flying then -g else 0;
der(h) = v;
when {h <= 0.0 and v <= 0.0,impact} then
v_new = if edge(impact) then -e * pre(v) else 0;
flying = v_new > 0;
reinit(v, v_new);
end when;
end BouncingBall;
OK, that's quite a few questions. Let me attempt to answer them:
What is the difference between when and if.
The questions inside a when clause are only "active" at the instant that the conditional expressions used in the when clause becomes active. In contrast, equations inside an if statement are true as long as the conditional expression stays true.
What's the purpose of foo?
Probably for visualization. It has no clear impact on the model that I can see.
Why is impact listed in the when clause.
One of the problems you have so-called Zeno systems like this is that it will continue to bounce indefinitely with smaller and smaller intervals. I suspect the impact flag here is meant to indicate when the system has stopped bouncing. This is normally done by checking to make sure that the conditional expression h<=0.0 actually becomes false at some point. Because event detection includes numerical tolerancing, at some point the height of the bounces never gets outside of the tolerance range and you need to detect this or the ball never bounces again and just continues to fall. (it's hard to explain without actually running the simulation and seeing the effect).
What does the , do in the when clause.
Consider the following: when {a, b} then. The thing is, if you want to have a when clause trigger when either a or b become true, you might think you'll write it as when a or b then. But that's not correct because that will only trigger when the first one becomes true. To see this better, consider this code:
a = time>1.0;
b = time>2.0;
when {a, b} then
// Equation set 1
end when;
when a or b then
// Equation set 2
end when;
So equation set 1 will get executed twice here because it will get executed when a becomes true and then again when b becomes true. But equation set 2 will only get executed once when a becomes true. That's because the whole expression a or b only becomes true at one instant.
These are common points of confusion about when. Hopefully these explanations help.

How to assign binary variable in AMPL in respect to another variable

I have a problem with AMPL modelling. Can you help me how to define a binary variable u that suppose to be equall to 0 when another variable x is also equall to 0 and 1 when x is different than 0?
I was trying to use logical expressions but solver that I am working with (cplex and minos) doesn't allow it.
My idea was:
subject to:
u || x != u && x
Take M a 'big' constant such as x < M holds, and assume x is an integer (or x >= 1 if x is continuous). You can use the two constraints:
u <= x (if x=0, then u=0)
x <= M*u (if x>0, then u=1)
with u a binary variable.
If now x is continuous and not necessarily greater than 1, you will have to adapt the constraints above (for example, the first constraint here would not be verified with x=0.3 and u=1).
The general idea is that you can (in many cases) replace those logical constraints with inequalities, using the fact that if a and b are boolean variables, then the statement "a implies b" can be written as b>=a (if a=1, then b=1).

Solving a linear equation in one variable

What would be the most efficient algorithm to solve a linear equation in one variable given as a string input to a function? For example, for input string:
"x + 9 – 2 - 4 + x = – x + 5 – 1 + 3 – x"
The output should be 1.
I am considering using a stack and pushing each string token onto it as I encounter spaces in the string. If the input was in polish notation then it would have been easier to pop numbers off the stack to get to a result, but I am not sure what approach to take here.
It is an interview question.
Solving the linear equation is (I hope) extremely easy for you once you've worked out the coefficients a and b in the equation a * x + b = 0.
So, the difficult part of the problem is parsing the expression and "evaluating" it to find the coefficients. Your example expression is extremely simple, it uses only the operators unary -, binary -, binary +. And =, which you could handle specially.
It is not clear from the question whether the solution should also handle expressions involving binary * and /, or parentheses. I'm wondering whether the interview question is intended:
to make you write some simple code, or
to make you ask what the real scope of the problem is before you write anything.
Both are important skills :-)
It could even be that the question is intended:
to separate those with lots of experience writing parsers (who will solve it as fast as they can write/type) from those with none (who might struggle to solve it at all within a few minutes, at least without some hints).
Anyway, to allow for future more complicated requirements, there are two common approaches to parsing arithmetic expressions: recursive descent or Dijkstra's shunting-yard algorithm. You can look these up, and if you only need the simple expressions in version 1.0 then you can use a simplified form of Dijkstra's algorithm. Then once you've parsed the expression, you need to evaluate it: use values that are linear expressions in x and interpret = as an operator with lowest possible precedence that means "subtract". The result is a linear expression in x that is equal to 0.
If you don't need complicated expressions then you can evaluate that simple example pretty much directly from left-to-right once you've tokenised it[*]:
x
x + 9
// set the "we've found minus sign" bit to negate the first thing that follows
x + 7 // and clear the negative bit
x + 3
2 * x + 3
// set the "we've found the equals sign" bit to negate everything that follows
3 * x + 3
3 * x - 2
3 * x - 1
3 * x - 4
4 * x - 4
Finally, solve a * x + b = 0 as x = - b/a.
[*] example tokenisation code, in Python:
acc = None
for idx, ch in enumerate(input):
if ch in '1234567890':
if acc is None: acc = 0
acc = 10 * acc + int(ch)
continue
if acc != None:
yield acc
acc = None
if ch in '+-=x':
yield ch
elif ch == ' ':
pass
else:
raise ValueError('illegal character "%s" at %d' % (ch, idx))
Alternative example tokenisation code, also in Python, assuming there will always be spaces between tokens as in the example. This leaves token validation to the parser:
return input.split()
ok some simple psuedo code that you could use to solve this problem
function(stinrgToParse){
arrayoftokens = stringToParse.match(RegexMatching);
foreach(arrayoftokens as token)
{
//now step through the tokens and determine what they are
//and store the neccesary information.
}
//Use the above information to do the arithmetic.
//count the number of times a variable appears positive and negative
//do the arithmetic.
//add up the numbers both positive and negative.
//return the result.
}
The first thing is to parse the string, to identify the various tokens (numbers, variables and operators), so that an expression tree can be formed by giving operator proper precedences.
Regular expressions can help, but that's not the only method (grammar parsers like boost::spirit are good too, and you can even run your own: its all a "find and recourse").
The tree can then be manipulated reducing the nodes executing those operation that deals with constants and by grouping variables related operations, executing them accordingly.
This goes on recursively until you remain with a variable related node and a constant node.
At the point the solution is calculated trivially.
They are basically the same principles that leads to the production of an interpreter or a compiler.
Consider:
from operator import add, sub
def ab(expr):
a, b, op = 0, 0, add
for t in expr.split():
if t == '+': op = add
elif t == '-': op = sub
elif t == 'x': a = op(a, 1)
else : b = op(b, int(t))
return a, b
Given an expression like 1 + x - 2 - x... this converts it to a canonical form ax+b and returns a pair of coefficients (a,b).
Now, let's obtain the coefficients from both parts of the equation:
le, ri = equation.split('=')
a1, b1 = ab(le)
a2, b2 = ab(ri)
and finally solve the trivial equation a1*x + b1 = a2*x + b2:
x = (b2 - b1) / (a1 - a2)
Of course, this only solves this particular example, without operator precedence or parentheses. To support the latter you'll need a parser, presumable a recursive descent one, which would be simper to code by hand.

Is it possible to define a variable in expression in C++?

I have this insane homework where I have to create an expression to validate date with respect to Julian and Gregorian calendar and many other things ...
The problem is that it must be all in one expression, so I can't use any ;
Are there any options of defining variable in expression? Something like
d < 31 && (bool leapyear = y % 4 == 0) || (leapyear ? d % 2 : 3) ....
where I could define and initialize one or more variables and use them in that one expression without using any ;?
Edit: It is explicitly said, that it must be a one-line expression. No functions ..
The way I'm doing it right now is writing macros and expanding them, so I end up with stuff like this
#define isJulian(d, m, y) (y < 1751 || (y == 1752 && (m < 9) || (m == 9 && d <= 2)))
#define isJulianLoopYear(y) (y % 4 == 0)
#define isGregorian(d, m, y) (y > 1573 || (y == 1752 && (m > 9) || (m == 9 && d > 13)))
#define isGregorianLoopYear(y) ((y % 4 == 0) || (y % 400 = 0))
// etc etc ....
looks like this is the only suitable way to solve the problem
edit: Here is original question
Suppose we have variables d m and y containing day, month and year. Task is to write one single expression which decides, if date is valid or not. Value should be true (non-zero value) if date is valid and false (zero) if date is not valid.
This is an example of expression (correct expression would look something like this):
d + 4 == y ^ 85 ? ~m : d * (y-2)
These are examples of wrong answers (not expressions):
if ( log ( d ) == 1752 ) m = 1;
or:
for ( i = 0; i < 32; i ++ ) m = m / 2;
Submit only file containing only one single expression without ; at the end. Don't submit commands or whole program.
Until 2.9.1752 was Julian calendar, after that date is Gregorian calendar
In Julian calendar is every year dividable by 4 a leap year.
In Gregorian calendar is leap year ever year, that is dividible by 4, but is not dividible by 100. Years that are dividable by 400 are another exception and are leap years.
1800, 1801, 1802, 1803, 1805, 1806, ....,1899, 1900, 1901, ... ,2100, ..., 2200 are not loop years.
1896, 1904, 1908, ..., 1996, 2000, 2004, ..., 2396,..., 2396, 2400 are loop years
In september 1752 is another exception, when 2.9.1752 was followed by 14.9.1752, so dates 3.9.1752, 4.9.1752, ..., 13.9.1752 are not valid.
((m >0)&&(m<13)&&(d>0)&&(d<32)&&(y!=0)&&(((d==31)&&
((m==1)||(m==3)||(m==5)||(m==7)||(m==8)||(m==10)||(m==12)))
||((d<31)&&((m!=2)||(d<29)))||((d==29)&&(m==2)&&((y<=1752)?((y%4)==0):
((((y%4)==0)&&((y%100)!=0))
||((y%400)==0)))))&&(((y==1752)&&(m==9))?((d<3)||(d>13)):true))
<evil>
Why would you define a new one, if you can reuse an existing one? errno is a perfect temporary variable.
</evil>
I think the intent of the homework is to ask you to do this without using variables, and what you are trying to do might defeat its purpose.
In standard C++, this is not possible. G++ has an extension known as statement expressions that can do that.
I don't believe you can, but even if you could, it would only have scope inside of the parentheses they are defined in (in your example) and cannot be used outside of them.
Your solution, which I will not provide fully for you, will probably go along these lines:
isJulian ? isJulianLeapyear : isGregorianLeapyear
To make it more specific, it could be like this:
isJulian ? (year % 4) == 0 : ((year % 4) == 0 || (year % 400) == 0)
You'll have to just make sure your algorithm is correct. I'm not a calender expert, so I wouldn't know about that.
First: Don't. It may be cute, but even if there's an extension that allows it, code golf is a dangerous game that will almost always end up causing mmore grief than it solves.
Okay, back to the 'real' question as defined by the homework. Can you make additional functions? If so, instead of capturing whether or not it's a leap year in a variable, make a function isLeapYear(int year) that returns the correct value.
Yes, that means you'll calculate it more than once. If that ends up being a performance issue, I'd be incredibly surprised... and it's a premature optimization to worry about that in the first place.
I'd be very surprised if you weren't allowed to write functions as part of doing this. It seems like that'd be half the point of an exercise like this.
......
Okay, so here's a quick overview of what you'll need to do.
First, basic verification - that month, day, year are possible values at all - month 0-11 (assuming 0-based), day 0-30, year non-negative (assuming that's a constraint).
Once you're past that, I'd probably check for the 1752 special cases.
If that's not relevant, the 'regular' months can be handled pretty simply.
This leaves us with the leap year cases, which can be broken down into two expressions - whether something is a leap year (which will be broken down additionally based on gregorian/julian), and whether the date is valid at that point.
So, at the highest level, your expression looks something like this:
areWithinRange(d,m,y) && passes1752SpecialCases(d,m,y) && passes30DayMonths(d,m,y) && passes31DayMonths(d,m,y) && passesFebruaryChecks(d,m,y)
If we assume that we only return false from our sub-expressions if we actively detect a rule break (31 days in June for the 30DayMonth rule returns false, but 30 days in February is irrelevant and so passes true), then we can pretty much say that the logic at that level is correct.
At this point, I'd write separate functions for the individual pieces (as pure expressions, a single return ... statement). Once you've gotten those in place, you can replace the method call in your top-level expression with the expanded version. Just make sure you parenthesize (is that a word?) everything sufficiently.
I'd also make a test harness program that uses the expression and has a number of valid and invalid inputs, and verifies that you're doing the right thing. You can write that in a function for ease of cut and paste for the final turn-in by doing something like:
bool isValidDate(int d, int m, int y)
{
return
// your expression here
}
Since the expression will be on a line by itself, it'll be easy to cut and paste.
You may find other ways to simplify your logic - excepting the 1752 special cases, days between 1 and 28 are always valid, for instance.
Considering it is homework, I think the best advice would be an approach to deriving your own solution.
If I were tackling this assignment, I would start by breaking the rules.
I would write a c++ function that given the variables d, m, and y, returns a boolean result on the validity of the date. I would use as many non-recursive helper functions as needed, and feel free to use if, else if, and else, but no looping aloud.
I would then Inline all helper functions
I would reduce all if, else if, and else statement to ? : notation
If I was successful at limiting my use of variables, I might be able to reduce all this to a single function with no variables - the body of which will contain the single expression I seek.
Good Luck.
You clearly have to have the date passed in somehow. Beyond that, all you're really going to be doing is chaining && and || (assuming we get the date as a tm struct):
#include <ctime>
bool validate(tm date)
{
return (
// sanity check that all values are positive
date.tm_mday >= 1 && date.tm_mon >= 0 && date.tm_year >= 0
// check for valid days
&& ((date.tm_mon == 0 && date.tm_mday <= 31)
|| (date.tm_mon == 1 && date.tm_mday <= (date.tm_year % 4 ? 28 : 29))
|| (date.tm_mon == 2 && date.tm_mday <= 31)
// yadda yadda for the other months
|| (date.tm_mon == 11 && date.tm_mday <= 31))
);
}
The parenthesis around date.tm_year % 4 ? 28 : 29 actually aren't needed, but I'm including them for readability.
UPDATE
Looking at a comment, you'll also need similar rules to validate dates that don't exist in the Gregorian calendar.
UPDATE II
Since you're dealing with dates in the past you will need to implement a more correct leap year test. However, I generally deal with dates in the future, and this incorrect leap year test will give correct results in 2012, 2016, 2020, 2024, 2028, 2032, 2036, 2040, 2044, 2048, 2052, 2056, 2060, 2064, 2068, 2072, 2076, 2080, 2084, 2088, 2092, and 2096. I will make a prediction that before this test fails in 2100 silicon-based computers will be forgotten relics. I seriously doubt we will use C++ on the quantum computers in use then. Besides, I won't be the programmer assigned to fix the bug.