Quotation mechanism for PolyML top level - sml

For various toy projects I'd like to be able to embed object languages into the PolyML top level, like the backtick syntax for HOL, where expressions between backticks are parsed by a custom parser.
I don't mind the specific delimiting syntax: backticks `...`, guillemets <<...>>, or something like {|...|}. I just want to be able to write expressions at the top-level and have them parsed by a custom parser.
For example if I had a datatype like
datatype expression =
    Add of expression * expression
  | Int of int
  | Mul of expression * expression
I'd like to be able to type the following:
> `3 + 2 * 5`;
val it = Add (Int 3, Mul (Int 2, Int 5)): expression
Is this possible (in a simple way)?

For your example, you could approximate this by rebinding the operators and using the backtick as an ordinary identifier:
val op + = Add
val op * = Mul
val ` = Int
val it = `3 + `2 * `5
However, this does not use a custom parser; it relies entirely on the existing SML parser.
If you want a custom parser, the most straightforward way is to write a function parse : string -> expression and apply it manually at the top level.
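A minimal sketch of such a parse function, written in Java rather than SML purely for illustration. The class ExprParser, its record types and the tiny grammar (non-negative integers, + and *, no parentheses) are assumptions made for this example and are not part of Poly/ML.

```java
// Hypothetical Java mirror of the SML datatype from the question (requires Java 16+ for records).
final class ExprParser {
    interface Expression {}
    record Int(int value) implements Expression {}
    record Add(Expression left, Expression right) implements Expression {}
    record Mul(Expression left, Expression right) implements Expression {}

    private final String src;
    private int pos = 0;

    private ExprParser(String src) { this.src = src; }

    // Entry point: the analogue of parse : string -> expression.
    static Expression parse(String text) {
        ExprParser p = new ExprParser(text);
        Expression e = p.sum();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return e;
    }

    // sum ::= product ('+' product)*
    private Expression sum() {
        Expression left = product();
        while (peek() == '+') {
            pos++;
            left = new Add(left, product());
        }
        return left;
    }

    // product ::= number ('*' number)*   (binds tighter than '+')
    private Expression product() {
        Expression left = number();
        while (peek() == '*') {
            pos++;
            left = new Mul(left, number());
        }
        return left;
    }

    // number ::= digit+
    private Expression number() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return new Int(Integer.parseInt(src.substring(start, pos)));
    }

    // Returns the next non-space character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    public static void main(String[] args) {
        // Builds the tree corresponding to Add (Int 3, Mul (Int 2, Int 5)).
        System.out.println(parse("3 + 2 * 5"));
    }
}
```

The same two-level structure (one function per precedence level) carries over directly to an SML function of type string -> expression.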

How to use match with regular expressions in Scala

I am starting to learn Scala and want to use regular expressions to match a character from a string so I can populate a mutable map of characters and their value (String values, numbers etc) and then print the result.
I have looked at several answers on SO and gone over the Scala Docs but can't seem to get this right. I have a short Lexer class that currently looks like this:
class Lexer {
  private val tokens: mutable.Map[String, Any] = collection.mutable.Map()
  private def checkCharacter(char: Character): Unit = {
    val Operator = "[-+*/^%=()]".r
    val Digit = "[\\d]".r
    val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => tokens(c) = "Operator"
      case Digit(c) => tokens(c) = Integer.parseInt(c)
      case Other(c) => tokens(c) = "Other" // Temp value, write function for this
    }
  }
  def lex(input: String): Unit = {
    val inputArray = input.toArray
    for (s <- inputArray)
      checkCharacter(s)
    for ((key, value) <- tokens)
      println(key + ": " + value)
  }
}
I'm pretty confused by the somewhat strange method-like syntax, Operator(c), that I have seen used to handle the value being matched, and I'm also unsure whether this is the correct way to use regex in Scala. I think it is clear what I want this code to do; I'd really appreciate some help understanding this. If more info is needed I will supply what I can.
The official documentation has lots of examples: https://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html. What might be confusing is the type of the regular expression and its use in pattern matching.
You can construct a regex from any string by using .r:
scala> val regex = "(something)".r
regex: scala.util.matching.Regex = (something)
Your regex becomes an object with a few useful methods for finding matches and their groups, such as findAllIn.
In Scala it is idiomatic to use pattern matching for safe extraction of values, so the Regex class also has an unapplySeq method to support pattern matching. This makes it an extractor object. You can call it directly (not common):
scala> regex.unapplySeq("something")
res1: Option[List[String]] = Some(List(something))
or you can let the Scala compiler call it for you when you pattern match:
scala> "something" match {
| case regex(x) => x
| case _ => ???
| }
res2: String = something
You might ask why unapply/unapplySeq has exactly this return type. The documentation explains it very well:
The return type of an unapply should be chosen as follows:
If it is just a test, return a Boolean. For instance case even().
If it returns a single sub-value of type T, return an Option[T].
If you want to return several sub-values T1,...,Tn, group them in an optional tuple Option[(T1,...,Tn)].
Sometimes, the number of values to extract isn’t fixed and we would like to return an arbitrary number of values, depending on the input. For this use case, you can define extractors with an unapplySeq method which returns an Option[Seq[T]]. Common examples of these patterns include deconstructing a List using case List(x, y, z) => and decomposing a String using a regular expression Regex, such as case r(name, remainingFields @ _*) =>
In short, your regex might match one or more groups, so it needs to return a list/Seq. The result has to be wrapped in an Option to comply with the extractor contract.
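The way you are using regex is essentially correct, with one caveat: a pattern such as Operator(c) binds c to a capture group, so each regex needs a group in parentheses (the versions in your question have none, so those cases never match). I would also map your function over the input array to avoid creating mutable maps. Perhaps something like this: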
class Lexer {
  private def getCharacterType(char: Character): Any = {
    val Operator = "([-+*/^%=()])".r
    val Digit = "([\\d])".r
    //val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => "Operator"
      case Digit(c) => Integer.parseInt(c)
      case _ => "Other" // Temp value, write function for this
    }
  }
  def lex(input: String): Unit = {
    val inputArray = input.toArray
    val tokens = inputArray.map(x => x -> getCharacterType(x))
    for ((key, value) <- tokens)
      println(key + ": " + value)
  }
}
scala> val l = new Lexer()
l: Lexer = Lexer@60f662bd
scala> l.lex("a-1")
a: Other
-: Operator
1: 1
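To make the extractor mechanics concrete, here is a rough Java sketch of what the pattern match does with a regex: the whole input must match, and the capture groups are handed back as a list. The class RegexExtractorDemo and its methods are hypothetical names for this illustration; only java.util.regex is real library API, and the Java method named unapplySeq merely imitates the Scala one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; none of these names come from the Scala library.
final class RegexExtractorDemo {
    static final Pattern OPERATOR = Pattern.compile("([-+*/^%=()])");
    static final Pattern DIGIT = Pattern.compile("(\\d)");

    // Imitates the extractor contract: empty if the whole input does not match,
    // otherwise the list of capture groups (which may be an empty list).
    static Optional<List<String>> unapplySeq(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return Optional.of(groups);
    }

    // Counterpart of `case Operator(c) => ...`: a case succeeds only when
    // exactly one group was captured, because the pattern binds one variable.
    static String classify(char c) {
        String s = String.valueOf(c);
        Optional<List<String>> op = unapplySeq(OPERATOR, s);
        if (op.isPresent() && op.get().size() == 1) return "Operator";
        Optional<List<String>> digit = unapplySeq(DIGIT, s);
        if (digit.isPresent() && digit.get().size() == 1) return digit.get().get(0);
        return "Other";
    }

    public static void main(String[] args) {
        for (char c : "a-1".toCharArray()) {
            System.out.println(c + ": " + classify(c));
        }
    }
}
```

With a group-less pattern such as [-+*/^%=()], this unapplySeq returns an empty list for "-", which is why a one-variable pattern like Operator(c) cannot bind anything to it.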

How to split expression containing brackets correctly

I am trying to write an expression handler that will correctly split brackets. Until today it has worked very well, but I've now encountered a problem I hadn't thought of.
I split the expression by the content of brackets first; once these are evaluated, I replace the original content with the results and repeat until there are no brackets remaining.
The expression may contain macros/variables. Macros are denoted by text wrapped in dollar signs: $macro$.
A typical expression:
($exampleA$ * 3) + ($exampleB$ / 2)
Macros are replaced before the expression is evaluated. The above works fine because the process is as follows:
Split expression by brackets, this results in two expressions:
$exampleA$ * 3
$exampleB$ / 2
Each expression is then evaluated, if exampleA = 3 and exampleB = 6:
$exampleA$ * 3 = 3 * 3 = 9
$exampleB$ / 2 = 6 / 2 = 3
The expression is then rebuilt using the results:
9 + 3
The final expression without any brackets is then evaluated to:
12
This works fine until an expression with nested brackets is used:
((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)
This breaks completely because the regular expression I'm using:
regex("(?<=\\()[^)]*(?=\\))");
Results in:
($exampleA$ * 3
$exampleB$ / 2
So how can I decode this correctly? I want the above to be broken down to:
$exampleA$ * 3
$exampleB$ / 2
I am not exactly sure what you are trying to do. If you want to match the innermost expressions, wouldn't this help?:
regex("(?<=\\()[^()]*(?=\\))");
By the way, are the parentheses in your example unbalanced on purpose?
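A quick way to check the innermost-bracket pattern is to run it over the nested example from the question. This is a sketch in Java (the question does not say which regex engine is in use, so Java's java.util.regex is an assumption), and the class name InnermostBrackets is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the innermost-bracket regex.
final class InnermostBrackets {
    // Text that is preceded by '(' and followed by ')' and contains no bracket itself.
    private static final Pattern INNERMOST = Pattern.compile("(?<=\\()[^()]*(?=\\))");

    // Collects every innermost bracket content, left to right.
    static List<String> innermost(String expression) {
        List<String> found = new ArrayList<>();
        Matcher m = INNERMOST.matcher(expression);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The nested expression from the question, unbalanced brackets included.
        System.out.println(innermost("((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)"));
        // prints [$exampleA$ * 3, $exampleB$ / 2]
    }
}
```

Because [^()] excludes both kinds of bracket, only groups with nothing nested inside them can match; the outer groups would be picked up on later passes, once the inner ones have been replaced by their results.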
Traditional regex cannot handle recursive structures like nested brackets.
Depending on which regex flavor you are using, you may be able to use regex recursion. Otherwise, you will probably need a new method for parsing the groups. I think the traditional way is to represent the expression as a stack: start with an empty stack, push when you find a '(', pop when you find a ')'.
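The push/pop idea can be sketched as follows, in Java as an assumption about the target language. BracketStack and groups are hypothetical names, and the example input is a balanced variant of the expression from the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: finds the content of every bracket pair with a stack.
final class BracketStack {
    // Returns the text inside each bracket pair, in the order the pairs close,
    // so nested groups always come before the group that encloses them.
    static List<String> groups(String expression) {
        Deque<Integer> open = new ArrayDeque<>();   // indices of unmatched '('
        List<String> result = new ArrayList<>();
        for (int i = 0; i < expression.length(); i++) {
            char c = expression.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("unmatched ')' at " + i);
                }
                int start = open.pop();
                result.add(expression.substring(start + 1, i));
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("unmatched '(' at " + open.peek());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String g : groups("(($exampleA$ * 3) + ($exampleB$ / 2))")) {
            System.out.println(g);
        }
    }
}
```

Since inner groups are reported first, they can be evaluated and substituted before the enclosing group is processed. Unbalanced input is rejected with an exception instead of silently producing wrong groups.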
You can't really do this with regex. You really need a recursive method, like this:
using System;
using System.Data;
public class Program
{
    public static void Main() {
        Console.WriteLine(EvaluateExpression("(1 + 2) * 7"));
    }
    public static int EvaluateExpression(string expression) {
        // Recursively evaluate parentheses as sub expressions
        var expr = expression.ToLower();
        while (expr.Contains("(")) {
            // Find first opening bracket
            var count = 1;
            var pStart = expr.IndexOf('(');
            var pos = pStart + 1;
            // Find matching closing bracket; pos ends one past it
            while (pos < expr.Length && count > 0) {
                if (expr[pos] == '(') count++;
                if (expr[pos] == ')') count--;
                pos++;
            }
            // Error if no matching closing bracket
            if (count > 0) throw new InvalidOperationException("Closing parentheses not found.");
            // Divide expression into the text before, inside and after the brackets
            var pre = expr.Substring(0, pStart);
            var subexpr = expr.Substring(pStart + 1, pos - pStart - 2);
            var post = expr.Substring(pos, expr.Length - pos);
            // Recursively evaluate the sub expression
            expr = string.Format("{0} {1} {2}", pre, EvaluateExpression(subexpr), post);
        }
        // Replace this line with your own logic to evaluate 'expr', a sub expression with any brackets removed.
        // Compute returns a boxed object; Convert.ToInt32 avoids an InvalidCastException
        // when the result is not a boxed int (for example, a double).
        return Convert.ToInt32(new DataTable().Compute(expr, null));
    }
}
I'm assuming you're using C# here, but you should get the idea and be able to translate it into whatever language you need.
If you use the following regex, you can capture the bracket contents as group(1); group(0) will include the parentheses.
"\\(((?:\"\\(|\\)\"|[^()])+)\\)"
Hope it helps!

how to elegantly handle rule with multiple components in bison

I used to program in OCaml and use ocamlyacc to generate parsers. One very useful feature of OCaml is its variant types, like this:
type exp = Number of int
         | Addexp of exp * exp
With such a type, I can construct an AST data structure very elegantly in the parser to represent an exp, like this:
exp :
number {Number($1)}
| exp1 + exp2 {Addexp($1,$3)}
Is there a similar mechanism in C++ and Bison?
Yes, just match against exp '+' exp. Note that all actions of a given rule must assign to $$ a value of the same declared %type. Also, $2 refers to the '+' token, so the right-hand operand is $3. In your case it would look something like this:
exp: number { $$ = PrimaryExp($1); }
   | exp '+' exp { $$ = AddExp($1, $3); }

Regular Expression to mask given number either from beginning or end

I have a scenario where I have to mask two numbers returned from my application, based on configured regular expression patterns. I have the following two numbers and need to mask them as shown below.
20128569 --> 2012****
40953186 --> ****3186
I need two regular expression patterns to achieve this, using String.replaceAll(...) or some other possible way.
public static void main(String[] args) {
    String value = "20128569";
    String pattern = "(?<=.{4}).?" ;
    String formattedValue = value.replaceAll(pattern, "*");
    System.out.println(formattedValue);
}
Note: I need two regular expression patterns in order to mask the numbers as shown above.
For now I have worked around the issue temporarily with the following code, but it would be nice to solve it with a regular expression alone.
String maskedAccountNumber = Pattern.compile(aRegexPattern).matcher(aKey).replaceFirst(MASK_CHARACTER);
StringBuilder maskBuffer = new StringBuilder();
for (int i = 0; i <= aKey.length() - maskedAccountNumber.length(); i++) {
    maskBuffer.append(MASK_CHARACTER);
}
return maskedAccountNumber.replace(MASK_CHARACTER, maskBuffer.toString());
Below are the two regular expressions I have used so far:
^.\d{1,3}
.\d{1,3}$
That's fairly easy to do, although it's still hard to tell under what circumstances you want which regex.
Either way, to mask the last four digits, match (\d*)\d{4} and replace it with $1****, as seen at https://regex101.com/r/uI0zJ6/2
Or, to mask the first four digits, match \d{4}(\d*) and replace it with ****$1, as seen at https://regex101.com/r/uI0zJ6/3 .
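Applied with String.replaceAll, as the question asks, the two patterns can be tried out like this. The class NumberMasker and its method names are hypothetical; the patterns and replacements are the ones given above, with backslashes doubled for Java string literals.

```java
// Hypothetical helper wrapping the two pattern/replacement pairs from the answer.
final class NumberMasker {
    // Keeps the leading digits in group 1 and masks the last four.
    static String maskTail(String value) {
        return value.replaceAll("(\\d*)\\d{4}", "$1****");
    }

    // Masks the first four digits and keeps the rest in group 1.
    static String maskHead(String value) {
        return value.replaceAll("\\d{4}(\\d*)", "****$1");
    }

    public static void main(String[] args) {
        System.out.println(maskTail("20128569")); // prints 2012****
        System.out.println(maskHead("40953186")); // prints ****3186
    }
}
```

Both patterns assume the value is a single run of at least four digits. If the number can appear inside longer text, anchoring the patterns with ^ and $ may be worth considering.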

Hadoop Pig: Extract all substrings matching a given regular expression

I am parsing some data of the form:
(['L123', 'L234', 'L1', 'L253764'])
(['L23', 'L2'])
(['L5'])
...
where the phrases inside the parens, including the brackets, are encoded as a single chararray.
I want to extract just the L+(digits) tags to obtain tuples of the form:
((L123, L234, L1, L253764))
((L23, L2))
((L5))
I have tried using REGEX_EXTRACT_ALL using the regular expression '(L\d+)', but it only seems to extract a single tag per line, which is useless to me. Is there a way to create tuples in the way I have described above?
If order does not matter, then this will work:
-- foo is the tuple, and bar is the name of the chararray
B = FOREACH A GENERATE TOKENIZE(foo.bar, ',') AS values: {T: (value: chararray)} ;
C = FOREACH B {
    clean_values = FOREACH values GENERATE
        REGEX_EXTRACT(value, '(L[0-9]+)', 1) AS clean_value: chararray ;
    GENERATE clean_values ;
}
The schema and output are:
C: {clean_values: {T: (clean_value: chararray)}}
({(L123),(L234),(L1),(L253764)})
({(L23),(L2)})
({(L5)})
Generally, if you don't know how many elements the array will have, a bag is a better fit than a tuple.
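For comparison, this is what "extract every match" looks like in plain Java, the language Pig UDFs are commonly written in. It is only a sketch of the matching loop: the class TagExtractor is hypothetical, and the Pig UDF wrapper that would be needed to call it from a script is deliberately not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls every L<digits> tag out of one input line.
final class TagExtractor {
    private static final Pattern TAG = Pattern.compile("L\\d+");

    // Repeated find() calls walk through all matches, not just the first one.
    static List<String> tags(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tags("(['L123', 'L234', 'L1', 'L253764'])"));
        // prints [L123, L234, L1, L253764]
    }
}
```

Unlike the TOKENIZE approach, this keeps the tags in their original order and returns a variable-length list, which corresponds to a bag rather than a fixed-size tuple on the Pig side.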