For various toy projects I'd like to be able to embed object languages into the PolyML top level, like the backtick syntax for HOL, where expressions between backticks are parsed by a custom parser.
I don't mind the specific delimiting syntax: backticks `...`, guillemets <<...>>, or something like {|...|}. I just want to be able to write expressions at the top-level and have them parsed by a custom parser.
For example if I had a datatype like
datatype expression =
Add of expression * expression
| Int of int
| Mul of expression * expression
I'd like to be able to type the following:
> `3 + 2 * 5`;
val it = Add (Int 3, Mul (Int 2, Int 5)): expression
Is this possible (in a simple way)?
For the case you have, you could approximate this with something like this
val op + = Add
val op * = Mul
val ` = Int
val it = `3 + `2 * `5
However, this isn't going to use a custom parser or anything, and will just rely on the existing parser.
If you wanted to use a custom parser the most straightforward way would simply be to write a function parse : string -> expression and apply it manually on the top level.
I want to build a complicated filter:
queryset.filter(
(Q(k__contains="1") & Q(k__contains="2") & (~Q(k__contains="3"))) |
(Q(k1__contains="2") & (~Q(k4__contains="3")))
)
The structure is fixed, but the query is dynamic and depends on a case specified by given input.
The input could be, for example:
(k=1&k=2&~k=3) | (k1=1&~k4=3)
or
(k=1&~k=3) | (k1=1&~k4=3) | (k=4&~k=3)
How can I add parentheses when building this query so that it runs as expected?
Two answers are presented: a simple one and a perfect one.
1) A simple answer for similarly simple expressions:
If your expressions are simple, it seems better to parse them with a short piece of Python code. The most useful simplification is that parentheses are not nested. A less important simplification is that the OR operator ("|") never appears inside parentheses. The NOT operator is used only inside parentheses and is never doubled ("~~").
Syntax: I express these simplifications as EBNF syntax rules, which could be useful later when discussing the Python code.
expression = term, { "|", term };
term = "(", factor, { "&", factor }, ")";
factor = [ "~" ], variable, "=", constant;
variable = "a..z_0..9"; # anything except "(", ")", "|", "&", "~", "="
constant = "0-9_a-z... ,'\""; # anything except "(", ")", "|", "&", "~", "="
Optional white space is handled easily by the .strip() method, so expressions can be written freely, as in mathematics. White space inside constants is supported.
Solution:
from django.db.models import Q

def compile_q(input_expression):
    q_expression = ~Q()  # selected empty
    for term in input_expression.split('|'):
        q_term = Q()  # selected all
        for factor in term.strip().lstrip('(').rstrip(')').split('&'):
            left, right = factor.strip().split('=', 1)
            negated, left = left.startswith('~'), left.lstrip('~')
            q_factor = Q(**{left.strip() + '__contains': right.strip()})
            if negated:
                q_factor = ~q_factor
            q_term &= q_factor
        q_expression |= q_term
    return q_expression
The superfluous empty and full Q expressions ~Q() and Q() are finally optimized and eliminated by Django.
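The splitting logic can be checked without a database. The following is only a sketch: `split_expression` is a hypothetical helper, not part of Django, that mirrors the loop above but collects plain tuples instead of Q objects.

```python
def split_expression(input_expression):
    """Return a list of OR-terms; each term is a list of (negated, field, value)."""
    terms = []
    for term in input_expression.split('|'):
        factors = []
        # Same stripping and splitting as in compile_q above
        for factor in term.strip().lstrip('(').rstrip(')').split('&'):
            left, right = factor.strip().split('=', 1)
            negated, left = left.startswith('~'), left.lstrip('~')
            factors.append((negated, left.strip(), right.strip()))
        terms.append(factors)
    return terms


print(split_expression("(k=1&k=2&~k=3) | ( k1 = 1 & ~ k4 = 3 )"))
# [[(False, 'k', '1'), (False, 'k', '2'), (True, 'k', '3')],
#  [(False, 'k1', '1'), (True, 'k4', '3')]]
```

Each inner list corresponds to one parenthesized group joined by "&", and the outer list to the groups joined by "|".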
Example:
>>> expression = "(k=1&k=2&~k=3) | ( k1 = 1 & ~ k4 = 3 )"
>>> qs = queryset.filter(compile_q(expression))
Check:
>>> print(str(qs.values('pk').query)) # a simplified more readable sql print
SELECT id FROM ... WHERE
((k LIKE %1% AND k LIKE %2% AND NOT (k LIKE %3%)) OR (k1 LIKE %1% AND NOT (k4 LIKE %3%)))
>>> sql, params = qs.values('pk').query.get_compiler('default').as_sql()
>>> print(sql); print(params) # a complete parametrized escaped print
SELECT... k LIKE %s ...
[... '%2%', ...]
The first print uses Django's simplified, more readable SQL rendering, without apostrophes or escaping, because quoting is actually delegated to the database driver. The second print shows the complete parametrized SQL command together with its parameter list, which is the safely escaped form.
2) Perfect, but longer solution
This answer can compile any combination of the boolean operators "|", "&", "~", any level of nested parentheses, and the comparison operator "=" to a Q() expression:
Solution: (not much more complicated)
import ast  # builtin Python parser
from django.db.models import Q

def q_combine(node: ast.AST) -> Q:
    if isinstance(node, ast.Module):
        assert len(node.body) == 1 and isinstance(node.body[0], ast.Expr)
        return q_combine(node.body[0].value)
    if isinstance(node, ast.BoolOp):
        if isinstance(node.op, ast.And):
            q = Q()
            for val in node.values:
                q &= q_combine(val)
            return q
        if isinstance(node.op, ast.Or):
            q = ~Q()
            for val in node.values:
                q |= q_combine(val)
            return q
    if isinstance(node, ast.UnaryOp):
        assert isinstance(node.op, ast.Not)
        return ~q_combine(node.operand)
    if isinstance(node, ast.Compare):
        assert isinstance(node.left, ast.Name)
        assert len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
        assert len(node.comparators) == 1 and isinstance(node.comparators[0], ast.Constant)
        return Q(**{node.left.id + '__contains': str(node.comparators[0].value)})
    raise ValueError('unexpected node {}'.format(type(node).__name__))

def compile_q(expression: str) -> Q:
    # Rewrite the operators into Python syntax so that the builtin parser can be used.
    std_expr = (expression.replace('=', '==').replace('~', ' not ')
                .replace('&', ' and ').replace('|', ' or '))
    # strip() avoids an IndentationError when the expression starts with "~"
    return q_combine(ast.parse(std_expr.strip()))
Example: the same as in the simple answer above, or a more complex one such as the following:
>>> expression = "~(~k=1&(k1=2|k1=3|(k=5 & k4=3)))"
>>> qs = queryset.filter(compile_q(expression))
The same example gives the same result, and a more deeply nested example gives a correspondingly nested result.
EBNF syntax rules are not important in this case, because no parser is implemented in this solution and the standard Python parser (the ast module) is used. The grammar differs slightly because of recursion:
expression = term, { "|", term };
term = factor, { "&", factor };
factor = [ "~" ], variable, "=", constant | [ "~" ], "(", expression, ")";
variable = "a..z_0..9"; # any identifier acceptable by Python, e.g. not a Python keyword
constant = "0-9_a-z... ,'\""; # any numeric or string literal acceptable by Python
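To see which structure the Python parser produces for such an expression, the same operator rewriting can be tried without Django. This is a minimal sketch under that assumption: `to_tree` and `parse_tree` are hypothetical names, and nested tuples stand in for Q objects.

```python
import ast


def to_tree(node):
    """Turn the parsed expression into nested tuples instead of Q objects."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BoolOp):
        op = 'and' if isinstance(node.op, ast.And) else 'or'
        return (op,) + tuple(to_tree(value) for value in node.values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return ('not', to_tree(node.operand))
    if (isinstance(node, ast.Compare) and isinstance(node.left, ast.Name)
            and len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
            and isinstance(node.comparators[0], ast.Constant)):
        return ('contains', node.left.id, str(node.comparators[0].value))
    raise ValueError('unexpected node {}'.format(type(node).__name__))


def parse_tree(expression):
    std_expr = (expression.replace('=', '==').replace('~', ' not ')
                .replace('&', ' and ').replace('|', ' or '))
    # A leading "~" becomes " not "; strip() removes the leading space
    # that the parser would otherwise reject as unexpected indentation.
    return to_tree(ast.parse(std_expr.strip(), mode='eval'))


print(parse_tree("~(~k=1&(k1=2|k1=3|(k=5 & k4=3)))"))
```

The nesting of the tuples follows the nesting of the parentheses, with "~" binding tighter than "&" and "&" tighter than "|", as in the grammar above.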
In the end I could not make this work using Django Q objects.
Instead I use extra() to build a query like
queryset.extra(where =
["(k1 like '%1%' and k2 like '%2%' and (k3 not like '%3%')) or (k1 like '%4%' and (k3 not like '%3%'))"]
)
and it works.
😂
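If the values come from user input, they can be kept out of the SQL string by using placeholders. The sketch below assumes the input has already been split into groups of (negated, column, value) tuples; `build_where` is a hypothetical helper, and the column names are interpolated directly, so they must come from a trusted list.

```python
def build_where(terms):
    """terms: list of AND-groups, each a list of (negated, column, value).

    Returns a WHERE fragment with %s placeholders and the matching params.
    Column names are NOT escaped here; only pass trusted column names.
    """
    params = []
    groups = []
    for factors in terms:
        parts = []
        for negated, column, value in factors:
            parts.append('{} {}LIKE %s'.format(column, 'NOT ' if negated else ''))
            params.append('%{}%'.format(value))
        groups.append('(' + ' AND '.join(parts) + ')')
    return ' OR '.join(groups), params


where_sql, params = build_where([
    [(False, 'k1', '1'), (False, 'k2', '2'), (True, 'k3', '3')],
    [(False, 'k1', '4'), (True, 'k3', '3')],
])
print(where_sql)
print(params)
# Possible use with Django: queryset.extra(where=[where_sql], params=params)
```

The extra() method accepts a params list alongside where, so the driver does the quoting of the values.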
I need to create a regular expression (for a program in Haskell) that matches strings consisting of "X" and ".", with exactly four "X" and exactly one ".". It must not match strings with any other proportion of X to dot.
I have thought about something like
[X\.]{5}
But it also matches "XXXXX" or ".....", so it isn't what I need.
That's called permutation parsing, and while "pure" regular expressions can't parse permutations it's possible if your regex engine supports lookahead. (See this answer for an example.)
However I find the regex in the linked answer difficult to understand. It's cleaner in my opinion to use a library designed for permutation parsing, such as megaparsec.
You use the Text.Megaparsec.Perm module by building a PermParser in a quasi-Applicative style using the <||> operator, then converting it into a regular MonadParsec action using makePermParser.
So here's a parser which recognises any combination of four Xs and one .:
import Control.Applicative
import Data.Ord
import Data.List
import Text.Megaparsec
import Text.Megaparsec.Perm
fourXoneDot :: Parsec Dec String String
fourXoneDot = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where mkFive a b c d e = [a, b, c, d, e]
        x = char 'X'
        dot = char '.'
I'm applying the mkFive function, which just stuffs its arguments into a five-element list, to four instances of the x parser and one dot, combined with <||>.
ghci> parse fourXoneDot "" "XXXX."
Right "XXXX."
ghci> parse fourXoneDot "" "XX.XX"
Right "XXXX."
ghci> parse fourXoneDot "" "XX.X"
Left {- ... -}
This parser always returns "XXXX." because that's the order I combined the parsers in: I'm mapping mkFive over the five parsers and it doesn't reorder its arguments. If you want the permutation parser to return its input string exactly, the trick is to track the current position within the component parsers, and then sort the output.
fourXoneDotSorted :: Parsec Dec String String
fourXoneDotSorted = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where mkFive a b c d e = map snd $ sortBy (comparing fst) [a, b, c, d, e]
        x = withPos (char 'X')
        dot = withPos (char '.')
        withPos = liftA2 (,) getPosition
ghci> parse fourXoneDotSorted "" "XX.XX"
Right "XX.XX"
As the megaparsec docs note, the implementation of the Text.Megaparsec.Perm module is based on Parsing Permutation Phrases; the idea is described in detail in the paper and the accompanying slides.
The other answers look quite complicated to me, given that there are only five strings in this language. Here's a perfectly fine and very readable regex for this:
\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.
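A quick way to convince yourself that this alternation accepts exactly the five wanted strings is to enumerate every five-character string over "X" and ".". The check below is a sketch in Python rather than Haskell, chosen only because the pattern syntax is the same there; `fullmatch` is used so that the whole string must match.

```python
import re
from itertools import product

PATTERN = re.compile(r'\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.')

# All 2**5 = 32 strings of length five made of 'X' and '.'
candidates = [''.join(chars) for chars in product('X.', repeat=5)]
accepted = sorted(s for s in candidates if PATTERN.fullmatch(s))
print(accepted)
# ['.XXXX', 'X.XXX', 'XX.XX', 'XXX.X', 'XXXX.']
```

If the regex engine searches inside longer strings instead of matching the whole string, the pattern needs anchors around the alternation.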
Are you attached to regex, or did you just end up at regex because this was a question you didn't want to try answering with applicative parsers?
Here's the simplest possible attoparsec implementation I can think of:
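-- Assuming Text input; use Data.Attoparsec.ByteString.Char8 for ByteString.
import Data.Attoparsec.Text (Parser, count, inClass, satisfy)
import Data.List (sort)

parseDotXs :: Parser ()
parseDotXs = do
  -- Read exactly five characters, each of which must be '.' or 'X'.
  dotXs <- count 5 (satisfy (inClass ".X"))
  -- '.' sorts before 'X', so after sorting the dots come first.
  let (dots, xS) = span (== '.') . sort $ dotXs
  if length dots == 1 && length xS == 4
    then return ()
    else fail "Mismatch between dots and Xs"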
You may need to adjust slightly depending on your input type.
There are tons of fancy ways to do stuff in applicative parsing land, but there is no rule saying you can't just do things the rock-stupid simple way.
Try the following regex:
(?<=^| )(?=[^. ]*\.)(?=(?:[^X ]*X){4}).{5}(?=$| )
Demo here
If you have one word per string, you can simplify the regex to this one:
^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$
Demo here
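The one-word-per-string variant can be checked exhaustively over a small alphabet. This is a sketch in Python, assuming its re module treats the lookaheads the same way as the target engine; the extra letter "Y" is included only to show that other characters are rejected.

```python
import re
from itertools import product

LOOKAHEAD = re.compile(r'^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$')

# Every five-character string over 'X', '.' and 'Y' (3**5 = 243 candidates)
candidates = [''.join(chars) for chars in product('X.Y', repeat=5)]
accepted = sorted(s for s in candidates if LOOKAHEAD.match(s))
print(accepted)
# ['.XXXX', 'X.XXX', 'XX.XX', 'XXX.X', 'XXXX.']
```

The first lookahead requires at least one dot, the second at least four X, and `.{5}` fixes the length, which together leave only the five permutations.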
I have an array of numbers stored as strings:
val original_array = Array("-0,1234567",......)
and I want to convert it to a numeric Array.
val new_array = Array("1234567", ........)
How can I achieve this in Scala?
Using original_array.toDouble gives an error.
The simple answer is ...
val arrNums = Array("123", "432", "99").map(_.toDouble)
... but this is a little dangerous because it will throw if any of the strings are not proper numbers.
This is safer...
val arrNums = Array("123", "432", "99").collect {
  case n if n matches """\d+""" => n.toDouble
}
... but you'll want to use a regex pattern that covers all cases. This example won't recognize floating point numbers ("1.1") or negatives ("-4"). Something like """-?\d*\.?\d+""" might fit your requirements.
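The suggested pattern can be tried against a few inputs before committing to it. The check below is a sketch in Python rather than Scala, on the assumption that the pattern behaves the same in both regex engines; `fullmatch` plays the role of Scala's `matches`, which also requires the whole string to match, and `parse_numbers` is a hypothetical helper.

```python
import re

NUMBER = re.compile(r'-?\d*\.?\d+')


def parse_numbers(strings):
    """Keep only the strings that fully match the pattern and convert them."""
    return [float(s) for s in strings if NUMBER.fullmatch(s)]


print(parse_numbers(["123", "-4", "1.1", ".5", "abc", "-0,1234567"]))
# [123.0, -4.0, 1.1, 0.5]
```

Note that the sample value from the question, "-0,1234567", uses a decimal comma and is dropped by this pattern; replacing "," with "." before matching would be one way to accept it.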
Is it possible to transform expressions like this, using R:
IF(expr.bool, expr1, expr2) into if (expr.bool) expr1 else expr2
AND(expr.bool1, expr.bool2) (or &&) into expr.bool1 & expr.bool2
OR(expr.bool1, expr.bool2) (or ||) into expr.bool1 | expr.bool2
NOT(expr.bool) into !expr.bool
TRUE into 1
FALSE into 0
and so on.
I have tried the ast package and using substitute to build an expression tree and then adapt it to the new syntax, but neither seems to work.
What I want to do is to read an expression string using the syntax on the left, parse it and then use eval to get a float result.
p.s. I am completely new to R.
Everything that does something is a function in R. Just do something like this:
IF <- `if`
IF(FALSE, 1, 2)
#[1] 2
NOT <- `!`
NOT(TRUE)
#[1] FALSE
Then eval/parse your strings.
Coercing logical values to integers can be done with as.integer.