Build a parse tree from an if statement

My goal is to be able to say whether a particular JSON object matches a particular set of rules that are stored in a .yaml file.
A rule is similar to something that we can find in an if statement. For example:
if (A and B OR C):
    # do something
where A, B and C are some conditions, for example to check if one attribute of the JSON object is greater than 0 (just an example).
I would like to build the corresponding parse tree of a particular if statement.
In such a tree, the nodes would be the logical operators, namely AND and OR and the leaves would be the actual conditions. Let's consider the following condition:
(((A and B) OR (C AND B AND D AND E)) AND F). That would give us the following parse tree:
After building such a tree, one would iterate from the far left of the tree (or far right, does not matter) and start performing the tests. If we start for example in the far left of the tree, that would be the leaf A. One verifies if A is satisfied, if it is not, then the JSON object does not match the rule since its parent is an AND operator, if it does, one would move to the second test, namely the condition B. Then check the conditions C, B and D and so on until you perform all the tests.
What approach can I use to build such trees? Are there existing parsers that I can use to achieve this?
If there is a better way to achieve this, I would like to hear it.
Thank you!

You will indeed need to parse the text in some way.
I would suggest relying on a regular expression to tokenise the input, so that you get the following tokens:
(
)
AND
OR
Anything else will be considered an atomic condition.
Then iterate those tokens and use a stack or recursion to build the tree.
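To make the tokenising step concrete, here is a minimal sketch of it in isolation. The names TOKEN_RE and tokenise are only illustrative, and the sketch assumes that conditions never contain parentheses or the standalone words AND/OR.

```python
import re

# One capturing group: a parenthesis or a whole-word operator.
# Surrounding whitespace is consumed but not captured.
TOKEN_RE = re.compile(r"\s*([()]|\bAND\b|\bOR\b)\s*", flags=re.IGNORECASE)

def tokenise(text):
    # re.split keeps the captured separators; empty strings between
    # adjacent separators are filtered out.
    return [tok for tok in TOKEN_RE.split(text) if tok]

print(tokenise("(A and B) OR C"))
# ['(', 'A', 'and', 'B', ')', 'OR', 'C']
```

Everything that is not a parenthesis or an operator comes out as a single token, which is what lets a condition such as "price > 0" survive as one leaf.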
From your other questions I see you use Python, so in Python you could create a class for each operator (OR, AND) and let it inherit from list. Such a list will either have as members strings (leaves of the tree) or more OR or AND list instances. So the whole tree is actually a nested list structure.
Here is how you would implement those list-based classes:
# A generic class from which all operators derive
class Node(list):
    @property
    def label(self):
        return self.__class__.__name__

    def __repr__(self):
        return self.label + super().__repr__()

# Subclass for each operator: no additional logic is needed
class AND(Node): pass
class OR(Node): pass
Here is the parser:
import re

def parse(s):
    # Rely completely on the operators that have been defined as subclasses of Node
    Operators = { cls.__name__ : cls for cls in Node.__subclasses__() }
    # Create a regular expression based on those operator names (AND, OR)
    regex = r"\s*([()]|" + "|".join(fr"\b{operator}\b" for operator in Operators.keys()) + r")\s*"
    # Tokenise the input
    tokens = iter((token for token in re.split(regex, s, flags=re.IGNORECASE) if token))

    def dfs(end=")"):
        operator = ""
        operands = []
        while True:
            # An operand is expected: an atomic condition or a parenthesised group
            token = next(tokens, "")
            if not token or token == ")":
                raise ValueError(f"Operand expected, but got '{token}'")
            operands.append(dfs() if token == "(" else token)
            # After an operand: either the end of this group or an operator
            token = next(tokens, "")
            if token == end:
                return Operators[operator](operands) if operator else operands[0]
            utoken = token.upper()
            if utoken in Operators:
                if operator and utoken != operator:
                    raise ValueError("Use parentheses to indicate operator precedence")
                operator = utoken
            else:
                raise ValueError(f"{', '.join(Operators.keys())} expected, but got '{token}'")

    return dfs("")
Use it as follows:
# Example run
s = "(((A and B) OR (C AND B AND D AND E)) AND F)"
tree = parse(s)
print(tree)
# Demo on how to address specific nodes in the tree
print(tree.label, tree[0].label, tree[0][0].label, tree[0][0][0])
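Once the tree exists, the left-to-right checking described in the question can be sketched as a small recursive function. This is only one possible approach: the evaluate function, the conditions dictionary and the sample attributes (price, stock, featured) are hypothetical, and the node classes are repeated so that the snippet runs on its own.

```python
class Node(list):
    @property
    def label(self):
        return self.__class__.__name__

class AND(Node): pass
class OR(Node): pass

def evaluate(tree, obj, conditions):
    """Return True when obj satisfies the rule tree."""
    if isinstance(tree, AND):
        # all() stops at the first child that fails
        return all(evaluate(child, obj, conditions) for child in tree)
    if isinstance(tree, OR):
        # any() stops at the first child that succeeds
        return any(evaluate(child, obj, conditions) for child in tree)
    # Leaf: look up the named condition and apply it to the object
    return conditions[tree](obj)

# Hypothetical conditions on a decoded JSON object (a dict)
conditions = {
    "A": lambda o: o["price"] > 0,
    "B": lambda o: o["stock"] > 0,
    "C": lambda o: o.get("featured", False),
}

tree = OR([AND(["A", "B"]), "C"])  # same shape as parse("(A and B) OR C")
print(evaluate(tree, {"price": 5, "stock": 0}, conditions))                    # False
print(evaluate(tree, {"price": 5, "stock": 0, "featured": True}, conditions))  # True
```

Because all() and any() consume the generator lazily, a failing condition under an AND (or a passing one under an OR) means the remaining siblings are never tested.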

Related

What is the idiomatic (and fast) way of treating the empty list/Seq as failure in a short-circuiting operation?

I have a situation where I am using functions to model rule applications, with each function returning the actions it would take when applied, or, if the rule cannot be applied, the empty list. I have a number of rules that I would like to try in sequence and short-circuit. In other languages I am used to, I would treat the empty sequence as false/None and chain them with orElse, like this:
def ruleOne(): Seq[Action] = ???
def ruleTwo(): Seq[Action] = ???
def ruleThree(): Seq[Action] = ???
def applyRules(): Seq[Action] = ruleOne().orElse(ruleTwo).orElse(ruleThree)
However, as I understand the situation, this will not work and will, in fact, do something other than what I expect.
I could use return which feels bad to me, or, even worse, nested if statements. if let would have been great here, but AFAICT Scala does not have that.
What is the idiomatic approach here?
You have different approaches here.
One of them is combining all the actions inside a Seq (so creating a Seq[Seq[Action]]) and then using find (it will return the first element that matches a given condition). So, for instance:
Seq(ruleOne, ruleTwo, ruleThree).find(_.nonEmpty).getOrElse(Seq.empty[Action])
I do not know your application domain well, but the final getOrElse converts the Option produced by the find method into a Seq. Note, though, that this approach evaluates all the rules when the outer Seq is built, so there is no short-circuiting.
Another approach consists of enriching Seq with a method that simulates your idea of orElse, using the "pimp my library" pattern (extension methods):
implicit class RichSeq[T](left: Seq[T]) {
  def or(right: => Seq[T]): Seq[T] = if(left.isEmpty) { right } else { left }
}
The by-name parameter enables short-circuit evaluation: the right sequence is computed only if the left sequence is empty.
Scala 3 has a better syntax for this kind of abstraction:
extension [T](left: Seq[T]) {
  def or(right: => Seq[T]): Seq[T] = if(left.nonEmpty) { left } else { right }
}
In this way, you can call:
ruleOne or ruleTwo or ruleThree

Evaluate constraint expression as boolean

I want to evaluate if a constraint is respected or not in Pyomo when the values of the variables contained in constraint expression are known.
Use case: We know that one particular constraint sometimes makes the problem infeasible, depending on the value of the variable. Instead of sending the problem to the solver to test if the problem is feasible, converting the constraint expression to a boolean type would be enough to determine if the constraint is the culprit.
For the sake of providing a feasible example, here would be the code:
from pyomo.environ import ConcreteModel, Var, Constraint
model = ConcreteModel()
model.X = Var()
def constraint_rule(model):
    return model.X <= 1
model.a_constraint = Constraint(rule=constraint_rule)
Now, let's try to work with the expression to evaluate:
# Let's define the expression in this way:
expression = constraint_rule(model)
# Let's show that the expression is what we expected:
print(str(expression))
The previous statement should print X <= 1.0.
Now, the tricky part is how to evaluate the expression.
if expression == True:
    print("feasible")
else:
    print("infeasible")
raises a TypeError exception (TypeError: Cannot create an EqualityExpression where one of the sub-expressions is a relational expression: X <= 1.0).
The last example doesn't work because constraint_rule doesn't return a boolean but a Pyomo expression.
Finally, I know that something like
def evaluate_constraint_a_expression(model):
    return value(model.X) <= 1
would work, but I can't assume that I will always know the content of my constraint expression, so I need a robust way of evaluating it.
Is there a clever way of achieving this? Like, evaluating the expression as a boolean and evaluating the left hand side and right hand side of the expression at the same time?
The solution is to use the value function. Even though it is described as evaluating an expression to a numeric value, it also converts the expression to a boolean value if it is an equality/inequality expression, like the rule of a constraint.
Let's suppose that the model is defined the way it is in the question, then the rest of the code should be:
from pyomo.environ import value
if value(expression) == True:
    print("feasible")
else:
    print("infeasible")
where expression is defined as written in the question.
However, be advised that the numeric precision of this method in Python can differ from the one used by the solver. It is therefore possible that this method reports a constraint as violated when the violation is only a matter of numeric imprecision of less than 1e-10. So, while it is useful for finding out whether most constraints are satisfied, it also generates some false positives.
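One way to reduce such false positives is to compare the evaluated constraint body against its bounds with an explicit tolerance instead of relying on an exact comparison. The helper below is a minimal sketch in plain Python. The name is_satisfied and the default tolerance are assumptions, and it expects the numbers to have been extracted already (for example with value() on the constraint body and its bounds).

```python
def is_satisfied(body, lower=None, upper=None, tol=1e-8):
    """Check lower <= body <= upper, allowing a violation of up to tol.

    A bound of None means that side is unbounded.
    """
    if lower is not None and body < lower - tol:
        return False
    if upper is not None and body > upper + tol:
        return False
    return True

# X <= 1 with X marginally above 1 because of round-off
print(is_satisfied(1.0 + 1e-10, upper=1.0))  # True
# A genuine violation is still reported
print(is_satisfied(1.1, upper=1.0))          # False
```

The tolerance should be chosen to match the feasibility tolerance of the solver in use; 1e-8 here is only a placeholder.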

How to get a value from multiple functions in Pyomo

Let's suppose that the objective function is
max z(x,y) = f1(x) - f2(y)
where f1 is a function of the variables x and f2 is a function of the variables y.
This could be written in Pyomo as
def z(model):
    return f1(model) - f2(model)
def f1(model):
    return [some summation of x variables with some coefficients]
def f2(model):
    return [some summation of y variables with some coefficients]
model.objective = Objective(rule=z)
I know it is possible to get the numeric value of z(x,y) easily, since it is the objective function, by calling:
print(model.objective())
but is there a way to get the numeric value of any of these sub-functions separately after the optimization, even if they are not explicitly defined as objectives?
I'll answer your question in terms of a ConcreteModel, since rules in Pyomo, for the most part, are nothing more than a mechanism to delay building a ConcreteModel. For now, they are also required to define indexed objects, but that will likely change soon.
First, there is nothing stopping you from defining those "rules" as standard functions that take in some argument and return a value. E.g.,
def z(x, y):
    return f1(x) - f2(y)
def f1(x):
    return x + 1
def f2(y):
    return y**2
Now if you call any of these functions with a built-in type (e.g., z(1, 5)), you will get a number back. However, if you call them with Pyomo variables (or Pyomo expressions) you will get a Pyomo expression back, which you can assign to an objective or constraint. This works because Pyomo modeling components, such as variables, overload the standard algebraic operators like +, -, *, etc. Here is an example of how you can build an objective with these functions:
import pyomo.environ as aml
m = aml.ConcreteModel()
m.x = aml.Var()
m.y = aml.Var()
m.o = aml.Objective(expr= z(m.x, m.y))
Now if m.x and m.y have a value loaded into them (i.e., the .value attribute is something other than None), then you can call one of the sub-functions with them and evaluate the returned expression (slower)
aml.value(f1(m.x))
aml.value(f2(m.y))
or you can extract the value from them and pass that to the sub-functions (faster)
f1(m.x.value)
f2(m.y.value)
You can also use the Expression object to store sub-expressions that you want to evaluate on the fly or share inside multiple other expressions on a model (all of which you can update by changing what expression is stored under the Expression object).
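To illustrate why the same plain functions can return either a number or an expression, here is a toy sketch of operator overloading. It is not Pyomo's implementation: the classes Expr, Sym and BinOp are hypothetical stand-ins that only support +, - and **.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "**": operator.pow}

class Expr:
    """Arithmetic on an Expr builds a larger Expr instead of a number."""
    def __add__(self, other): return BinOp("+", self, other)
    def __sub__(self, other): return BinOp("-", self, other)
    def __pow__(self, other): return BinOp("**", self, other)

class Sym(Expr):
    """A named variable that may hold a value."""
    def __init__(self, name, value=None):
        self.name, self.value = name, value
    def evaluate(self):
        return self.value
    def __repr__(self):
        return self.name

class BinOp(Expr):
    """A deferred binary operation on two operands."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def evaluate(self):
        # Operands are either nested expressions or plain numbers
        left, right = (e.evaluate() if isinstance(e, Expr) else e
                       for e in (self.left, self.right))
        return OPS[self.op](left, right)
    def __repr__(self):
        return f"({self.left!r} {self.op} {self.right!r})"

def f1(x): return x + 1
def f2(y): return y ** 2
def z(x, y): return f1(x) - f2(y)

print(z(1, 5))          # plain numbers in, plain number out: -23
expr = z(Sym("x", 1), Sym("y", 5))
print(expr)             # ((x + 1) - (y ** 2))
print(expr.evaluate())  # -23
```

The functions f1, f2 and z contain no special handling at all; whether they produce a number or an expression tree depends only on the type of the arguments.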

implement set union by typing a+b where a and b are two dictionaries

I want to implement the built-in data type set in Python using a class and a dictionary. I have included certain basic functions, but I could not perform the union and intersection operations defined on it. I wish to just write c=a+b, where a and b are two dictionaries and c is yet another dictionary whose keys give the union of 'a' and 'b'. I tried with try and except as given in my code below, but I want a better solution. Can anyone help me with this?
class My_Set:
    def __init__(self,listt):
        if listt:
            self.dictionary={}
            i=0
            for x in listt:
                self.dictionary[x]=len(x)
                i=i+1
        else:
            self.dictionary={}
    def is_element(self,element):
        if element in self.dictionary:
            return True
        else:
            return False
    def remove(self,element):
        if element in self.dictionary:
            self.dictionary.pop(element)
        else:
            print('element missing')
    def add_element(self,element):
        self.dictionary.update({element:len(element)})
        #return self.dictionary
    def union(self,other):
        self.dictionary.update(other.dictionary)
        return self.dictionary.keys()
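One possible way to get c = a + b is to define the special method __add__, which Python calls for the + operator. Below is a minimal sketch under that assumption. MySet is a hypothetical, simplified stand-in for the class above, and the & operator for intersection is an extra illustration that was not part of the question's code.

```python
class MySet:
    """Toy set that stores its elements as dictionary keys."""
    def __init__(self, items=()):
        self._data = dict.fromkeys(items, True)

    def __add__(self, other):
        # a + b: a new MySet with the keys of both; neither operand is modified
        return MySet(list(self._data) + list(other._data))

    def __and__(self, other):
        # a & b: a new MySet with the keys present in both
        return MySet(k for k in self._data if k in other._data)

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __repr__(self):
        return "MySet(" + repr(list(self._data)) + ")"

a = MySet(["x", "y"])
b = MySet(["y", "z"])
print(a + b)  # MySet(['x', 'y', 'z'])
print(a & b)  # MySet(['y'])
```

Unlike the union method above, which updates self.dictionary in place, __add__ here returns a new object, so a and b keep their original contents.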

What's the difference between `::` and `+:` for prepending to a list?

List has 2 methods that are specified to prepend an element to an (immutable) list:
+: (implementing Seq.+:), and
:: (defined only in List)
+: technically has a more general type signature—
def +:[B >: A, That](elem: B)(implicit bf: CanBuildFrom[List[A], B, That]): That
def ::[B >: A](x: B): List[B]
—but ignoring the implicit, which according to the doc message merely requires That to be List[B], the signatures are equivalent.
What is the difference between List.+: and List.::? If they are in fact identical, I assume +: would be preferred to avoid depending on the concrete implementation List. But why was another public method defined, and when would client code call it?
Edit
There is also an extractor for :: in pattern matching, but I'm wondering about these particular methods.
See also: Scala list concatenation, ::: vs ++
The best way to determine the difference between the two methods is to look at the source code.
The source of `::`:
def ::[B >: A] (x: B): List[B] =
  new scala.collection.immutable.::(x, this)
The source of `+:`:
override def +:[B >: A, That](elem: B)(implicit bf: CanBuildFrom[List[A], B, That]): That = bf match {
  case _: List.GenericCanBuildFrom[_] => (elem :: this).asInstanceOf[That]
  case _ => super.+:(elem)(bf)
}
As you can see, for List, both methods do one and the same (the compiler will choose List.canBuildFrom for the CanBuildFrom argument).
So, which method should you use? Normally one would choose the interface (+:) rather than the implementation (::), but because List is a general-purpose data structure in functional languages, it has its own methods, which are widely used. Many algorithms are built around the way List works. For example, you will find a lot of methods that prepend single elements to a List or call the convenient head or tail methods, because all these operations are O(1). Therefore, if you work locally with a List (inside a single method or class), there is no problem in choosing the List-specific methods. But if you want to communicate between classes, i.e. you want to write some interfaces, you should choose the more general Seq interface.
+: is more generic, since it allows the result type to be different from the type of the object it is called on. For example:
scala> Range(1,4).+:(0)
res7: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2, 3)