Antlr4 C++ visit ambiguous branch - c++

So let's say I have a rule like this:
rule : '(' rule ')' | '!' rule '!';
Now in my runtime I have this method:
antlrcpp::Any runtimeVisitor::visitRule(tinycParser::RuleContext *ctx) {
...
}
How can I check whether I am in the first or the second case? Would something like this work?
if(ctx->rule(0)) visitRule(ctx->rule(0))

You can label your alternatives like this:
rule
: '(' rule ')' #ParenthesizedRule
| '!' rule '!' #ExclamationMarkRule
| ...
;
Then you can define specific visitor methods for each alternative (i.e. visitParenthesizedRule, visitExclamationMarkRule etc.) instead of visitRule.
If you don't want to add anything to your grammar, you can also just check whether the first child of the rule is an opening parenthesis or an exclamation mark:
// ctx and its children are pointers in the C++ runtime, so use -> rather than .
if (ctx->children[0]->getText() == "(") {
...
} else if (ctx->children[0]->getText() == "!") {
...
} else {
...
}

Regex or Wildcard in Kotlin's when statement?

I'm working on a RESTful app in Kotlin and for the router, I'm using a when statement, as it's the most readable and good looking conditional.
Is there a way to use Regex or a wildcard in the when statement for a string?
(So that URIs like "/article/get/" would all be passed to the same controller)
The structure of my router is as follows:
when(uri) {
"some/url" -> return SomeController(config).someAction(session)
}
Yes.
import kotlin.text.Regex
val regex1 = Regex( /* pattern */ )
val regex2 = Regex( /* pattern */ )
/* etc */
when {
regex1.matches(uri) -> /* do stuff */
regex2.matches(uri) -> /* do stuff */
/* etc */
}
You could also use containsMatchIn if that suits your needs better than matches.
Explanation:
The test expression of a when statement is optional. If no test expression is included, then the when statement functions like an if-else if chain, where the whenCondition of each whenEntry shall independently evaluate to a boolean.
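The same first-match-wins idea, and the difference between matches (the whole input must match) and containsMatchIn (a match anywhere is enough), can be sketched outside Kotlin as well. The following is only an illustrative Python sketch; the route patterns and handler names are made up for the example and are not from the question.

```python
import re

# Hypothetical routes: (compiled pattern, handler name). Branches are tried
# top to bottom and the first match wins, like a subject-less `when`.
ROUTES = [
    (re.compile(r"/article/get/\d+"), "article.get"),
    (re.compile(r"/article/.*"), "article.other"),
]


def route(uri: str, whole: bool = True) -> str:
    """Return the handler name for `uri`, or "default" when nothing matches."""
    for pattern, handler in ROUTES:
        # fullmatch plays the role of Regex.matches (entire input),
        # search plays the role of Regex.containsMatchIn (anywhere in input).
        found = pattern.fullmatch(uri) if whole else pattern.search(uri)
        if found:
            return handler
    return "default"
```

With whole=True a URI such as "/api/article/get/42" falls through to the default branch, while whole=False lets the first pattern match inside it.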
EDIT:
So I thought about it for a while, and I came up with a different approach that might be closer to what you want.
import kotlin.text.Regex
// The cast to Any is needed so that Regex branch conditions type-check against the subject.
when (RegexWhenArgument(uri) as Any) {
Regex(/* pattern */) -> /* do stuff */
Regex(/* pattern */) -> /* do stuff */
/* etc */
}
Where RegexWhenArgument is minimally defined as:
class RegexWhenArgument(val whenArgument: CharSequence) {
// when compares the subject with each branch via equals(Any?), so the Regex case is handled here.
override fun equals(other: Any?): Boolean =
if (other is Regex) other.matches(whenArgument) else whenArgument == other
override fun hashCode(): Int = whenArgument.hashCode()
}
This approach lets you get as close as possible to the "argument-ful" when expression syntax. I think it's about as streamlined and readable as it can be (assuming that you define the RegexWhenArgument class elsewhere).
This approach uses something similar to the visitor design pattern in combination with Kotlin's operator overloading to redefine what constitutes a "match" between a when expression argument and a whenEntry. If you really wanted to, I suppose you could take this approach a step further and generify RegexWhenArgument into a general-purpose WhenArgument and WhenArgumentDecorator that allows you to specify custom "match" criteria in a when expression for any sort of type, not just Regex.
The type checking of a when expression requires the whenSubject and the whenEntries to have compatible types, so we cannot compare a String whenSubject with a Regex directly.
We can use when with no subject; the branch conditions are then plain boolean expressions.
fun main() {
val uri: String? = "http://my.site.com/a/b/c"
val res = when {
uri == null -> "NULL"
uri == "http://my.site.com/" -> "ROOT"
uri.startsWith("http://my.site.com/a/") -> "A STUFF"
uri.matches(Regex("http://my.site.com/b/.*")) -> "B STUFF"
else -> "DEFAULT"
}
/* do stuff */
}
Alternatively, we can emulate a kind of when+regex with a dedicated class and a few helper functions.
fun main() {
val uri: String? = "http://my.site.com/a/b/c"
val res2 = when(matching(uri)) {
null -> "NULL"
matchesLiteral("http://my.site.com/") -> "ROOT"
matchesRegex("http://my.site.com/a/.*") -> "A STUFF"
else -> "DEFAULT"
}
/* do stuff */
}
class MatchLiteralOrPattern(val value: String, val isPattern: Boolean) {
override fun equals(other: Any?): Boolean {
if (other !is MatchLiteralOrPattern) return false
if (isPattern && !other.isPattern) return Regex(this.value).matches(other.value)
if (!isPattern && other.isPattern) return Regex(other.value).matches(this.value)
return value == other.value
}
}
fun matching(whenSubject: String?) = whenSubject?.let { MatchLiteralOrPattern(it, false) }
fun matchesLiteral(value: String) = MatchLiteralOrPattern(value, false)
fun matchesRegex(value: String) = MatchLiteralOrPattern(value, true)
I tried the following on the kotlin playground and it seems to work as expected.
class WhenArgument (val whenArg: CharSequence) {
override operator fun equals(other: Any?): Boolean {
return when (other) {
is Regex -> other.matches(whenArg)
else -> whenArg.equals(other)
}
}
}
fun what(target: String): String {
return when (WhenArgument(target) as Any) {
Regex("source-.*") -> "${target} is-a-source"
Regex(".*-foo") -> "${target} is-a-foo"
"target-fool" -> "${target} is-the-target-fool"
else -> "nothing"
}
}
fun main() {
println(what("target-foo"))
println(what("source-foo"))
println(what("target-bar"))
println(what("target-fool"))
}
It works around the type compatibility problem by making the 'when' argument of type Any.

Added plus sign before number input in angularjs

I am using this directive to restrict an input tag so that the user can only type numbers.
app.directive('validNumber', function () {
return {
require: '?ngModel',
link: function (scope, element, attrs, ngModelCtrl) {
if (!ngModelCtrl) {
return;
}
ngModelCtrl.$parsers.push(function (val) {
if (angular.isUndefined(val)) {
var val = '';
}
var clean = val.replace(/[^0-9\.]/g, '');
var decimalCheck = clean.split('.');
if (!angular.isUndefined(decimalCheck[0])) {
decimalCheck[0] = decimalCheck[0].slice(0, 10);
if (!angular.isUndefined(decimalCheck[1])) {
clean = decimalCheck[0] + '.' + decimalCheck[1];
}
else {
clean = decimalCheck[0];
}
//console.log(decimalCheck[0][0]);
}
if (!angular.isUndefined(decimalCheck[1])) {
decimalCheck[1] = decimalCheck[1].slice(0, 3);
clean = decimalCheck[0] + '.' + decimalCheck[1];
}
if (val !== clean) {
ngModelCtrl.$setViewValue(clean);
ngModelCtrl.$render();
}
return clean;
});
element.bind('keypress', function (event) {
if (event.keyCode === 32) {
event.preventDefault();
}
});
}
};
});
But now I want to customize this so that the user can type ONLY ONE "+" or "-", and only as the first character. I think I have to change this pattern:
var clean = val.replace(/[^0-9\.]/g, '');
I also tried changing it to val.replace(/[^0-9.+-]/g, ''). It works, but incorrectly: with this pattern the user can type several "+" and "-" characters at any position in the input field. I just want to allow ONLY ONE "+" or "-" at the start, like "+1234" or "-1234".
This is more of a regex problem than an AngularJS one, so you might have more luck there: https://stackoverflow.com/questions/tagged/regex
I'll try to help you though. I think the regex you want matches an optional single + or -, then any number of digits, then optionally a decimal point, then any number of digits. A single regex to match that, anchored at both ends so that nothing else may follow, is:
^[+-]?[0-9]*\.?[0-9]*$
Have a read about character classes and the '?' operator. This regex also allows:
+.
-.
which don't make sense as input. You could design cleverer regexes to exclude those results, but I think it would be easier to check the entry programmatically.
Finally, for almost any regex problem you come across, there is very likely a regex online that solves it more comprehensively than one you would write yourself. Just google an English description next time, and check out this for what you want:
http://www.regular-expressions.info/floatingpoint.html
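Checking the entry programmatically, as suggested above, could look roughly like the sketch below. It is written in Python rather than as an AngularJS parser, and the function names clean_number and is_valid_number are invented for the illustration. The idea is to peel off one leading sign first, strip everything else that is not a digit or a dot, and keep only the first decimal point.

```python
import re

# Optional sign, then either digits with an optional fraction, or a bare fraction.
# Unlike the permissive pattern above, this rejects "+." and "-.".
VALID_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def clean_number(raw: str) -> str:
    """Keep one leading sign, the digits, and the first decimal point."""
    raw = raw.strip()
    sign = raw[0] if raw[:1] in ("+", "-") else ""
    # Everything after the sign: drop any character that is not a digit or a dot.
    body = re.sub(r"[^0-9.]", "", raw[len(sign):])
    whole, dot, frac = body.partition(".")
    # Any further dots end up in `frac` and are removed.
    return sign + whole + dot + frac.replace(".", "")


def is_valid_number(text: str) -> bool:
    """True when the whole text is a signed decimal number."""
    return VALID_NUMBER.fullmatch(text) is not None
```

In a directive, the cleaning step would replace the single val.replace(...) call, and the validity check could drive the model's validity flag. The digit-count limits from the original directive are left out here.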

ANTLR4: Code generation for if statement

How do I emit branching code for an "if" statement defined like this in ANTLR4?
statement
: // stuff
| If LPar cond=expression RPar trueBlock=statement (Else falseBlock=statement)? # IfStatement
;
Basically, it's just like in the Java.g4 example I used as a reference (see "statement" and "expression" rules).
The problem is that I can't figure out how to emit branching code for that in a listener and I'm trying to avoid adding any {code} in the grammar file. For example, if I EnterIfStatement, then it's too early to emit branching because the condition code is yet to be generated. And when I ExitIfStatement, it's too late because the whole if block code has already been created. ANTLR4 doesn't create any EnterTrueBlock event or something like that.
I think of a couple of possible workarounds using dictionaries to remember contexts and generate jump instructions when I catch related expressions but it just doesn't feel natural.
Today I learned the visitor pattern fits compilation tasks better.
public override string VisitIfStatement(MylangParser.IfStatementContext context)
{
var hash = context.GetHashCode();
Visit(context.cond);
var elseLabel = "else" + hash;
var endIfLabel = "end_if" + hash;
emitter.Emit(OpCode.Jiz, elseLabel);
if (context.trueBlock != null)
{
Visit(context.trueBlock);
}
emitter.Emit(OpCode.Jmp, endIfLabel);
emitter.Mark(elseLabel);
if (context.falseBlock != null)
{
Visit(context.falseBlock);
}
emitter.Mark(endIfLabel);
return null;
}
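The ordering that makes the visitor work here (condition first, then the conditional jump, then each block followed by its label) can be shown on a toy tree. This is a minimal Python sketch, not ANTLR code: the Expr, If and Emitter classes and the JIZ/JMP mnemonics are stand-ins invented for the example. It uses a running counter for label names instead of a hash code, on the assumption that hash codes are not guaranteed to be unique.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Expr:
    """Hypothetical leaf: something already reduced to one instruction."""
    code: str


@dataclass
class If:
    """Hypothetical node mirroring cond / trueBlock / falseBlock."""
    cond: Expr
    true_block: object
    false_block: Optional[object] = None


@dataclass
class Emitter:
    out: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def visit(self, node):
        if isinstance(node, If):
            n = next(self._ids)  # unique suffix for this statement's labels
            else_label, end_label = f"else{n}", f"end_if{n}"
            self.visit(node.cond)                # 1. condition code
            self.out.append(f"JIZ {else_label}")  # 2. skip true block if zero
            self.visit(node.true_block)          # 3. true block
            self.out.append(f"JMP {end_label}")   # 4. skip the else part
            self.out.append(f"{else_label}:")
            if node.false_block is not None:
                self.visit(node.false_block)     # 5. optional else block
            self.out.append(f"{end_label}:")
        else:
            self.out.append(node.code)
        return self.out
```

Because the visitor decides when each child is visited, the jump instructions can be emitted between children, which is exactly what the listener's enter/exit events do not allow.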

Regex for strings in Bibtex

I'm trying to parse BibTeX files using lex/yacc. Strings in the BibTeX database can be surrounded by quotes "..." or by braces {...}.
But every entry is also enclosed in braces. How do I differentiate between an entry and a string surrounded by braces?
@Book{sweig42,
Author = { Stefan Sweig },
title = { The impossible book },
publisher = { Dead Poet Society},
year = 1942,
month = mar
}
you have various options:
lexer start conditions (from a Lex tutorial)
building on the ideas from greg ward, enhance your lex rules with start conditions ('modes' as they are called in the referenced source).
specifically, you would have the start conditions BASIC ENTRY STRING and the following rules (example taken and slightly enhanced from here):
%START BASIC ENTRY STRING
%%
/* Lexical grammar, mode 1: top-level */
<BASIC>AT \@ { BEGIN ENTRY; }
<BASIC>NEWLINE \n
<BASIC>COMMENT \%[^\n]*\n
<BASIC>WHITESPACE [\ \r\t]+
<BASIC>JUNK [^\@\n\ \r\t]+
/* Lexical grammar, mode 2: in-entry */
<ENTRY>NEWLINE \n
<ENTRY>COMMENT \%[^\n]*\n
<ENTRY>WHITESPACE [\ \r\t]+
<ENTRY>NUMBER [0-9]+
<ENTRY>NAME [a-z0-9\!\$\&\*\+\-\.\/\:\;\<\>\?\[\]\^\_\`\|]+ { if (stricmp(yytext, "comment")==0) { BEGIN STRING; } }
<ENTRY>LBRACE \{ { if (delim == '\0') { delim='}'; } else { blevel=1; BEGIN STRING; } }
<ENTRY>RBRACE \} { BEGIN BASIC; }
<ENTRY>LPAREN \( { BEGIN STRING; delim=')'; plevel=1; }
<ENTRY>RPAREN \)
<ENTRY>EQUALS =
<ENTRY>HASH \#
<ENTRY>COMMA ,
<ENTRY>QUOTE \" { BEGIN STRING; blevel=0; plevel=0; }
/* Lexical grammar, mode 3: strings */
<STRING>LBRACE \{ { if (blevel>0) {blevel++;} }
<STRING>RBRACE \} { if (blevel>0) { blevel--; if (blevel == 0) { BEGIN ENTRY; } } }
<STRING>LPAREN \( { if (plevel>0) { plevel++;} }
<STRING>RPAREN \) { if (plevel>0) { plevel--; if (plevel == 0) { BEGIN ENTRY; } } }
<STRING>QUOTE \" { BEGIN ENTRY; }
please note that the rule set is by no means complete but should get you started. more details to be found here.
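The core of the start-condition idea is that the first brace after the entry type delimits the entry, while a brace that follows an '=' opens a string whose end is found by counting nesting depth. Here is a rough Python sketch of just that distinction. The function name parse_entry is invented, and the sketch deliberately ignores @string/@comment entries, # concatenation, parenthesized entries and braces inside quoted strings.

```python
def parse_entry(text: str) -> dict:
    """Parse a single BibTeX entry into a dict (simplified sketch)."""
    i = text.index("@") + 1
    open_at = text.index("{", i)        # first brace after '@' delimits the entry
    entry = {"type": text[i:open_at].strip().lower()}
    key_end = text.index(",", open_at)
    entry["key"] = text[open_at + 1:key_end].strip()
    i = key_end + 1
    while True:
        while text[i] in " \t\r\n,":    # skip separators between fields
            i += 1
        if text[i] == "}":              # brace at field level closes the entry
            return entry
        eq = text.index("=", i)
        name = text[i:eq].strip().lower()
        i = eq + 1
        while text[i] in " \t\r\n":
            i += 1
        if text[i] == "{":              # brace after '=' opens a string
            depth, start = 1, i + 1
            i += 1
            while depth:                # count nested braces until balanced
                depth += {"{": 1, "}": -1}.get(text[i], 0)
                i += 1
            value = text[start:i - 1]
        elif text[i] == '"':            # quoted string
            end = text.index('"', i + 1)
            value, i = text[i + 1:end], end + 1
        else:                           # bare number or macro name such as mar
            start = i
            while text[i] not in ",}\n":
                i += 1
            value = text[start:i]
        entry[name] = value.strip()
```

Malformed input simply raises an exception in this sketch; a lex-based scanner would report an error token instead.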
btparse
These docs explain in a fairly detailed fashion the intricacies of parsing the bibtex formats and come with a python parser.
biblex
you might also be interested in employing the unix toolchain of biblex and bibparse. these tools generate and parse a bibtex token stream, respectively.
more info can be found here.
best regards, carsten

splitting function attributes

Hi, would it be possible to correctly split function attributes using regex?
I want an expression that splits all attributes in a comma-separated list. But the attributes themselves could be an Array or Object or something else that can also contain commas:
ex :
'string1','string2',sum(1,5,8), ["item1","item2","item3"], "string3" , {a:"text",b:"text2"}
this should be split up as :
'string1'
'string2'
sum(1,5,8)
["item1","item2","item3"]
"string3"
{a:"text",b:"text2"}
So the expression should split on all commas, but not on commas that are surrounded by (), {} or [].
I am trying this in AS3, by the way.
Here is some code that splits on all the commas (which is of course not what I want):
var attr:String = "'string1','string2',sum(1,5,8), ['item1','item2','item3'], 'string3' , {a:'text',b:'text2'}";
var result:Array = attr.match(/([^,]+),/g);
trace(attr);
for(var a:int=0;a<result.length;a++){
trace(a,result[a]);
}
Here is an expression that allows nested round brackets, but not the others:
/([^,]+\([^\)]+\)|[^,]+),*/g
I've created a little example how to tackle a problem like this, only tested on your input so it might contain horrible mistakes. It only takes into account the parentheses and not the (curly) braces, but those can be easily added.
The basic idea is that you iterate over the characters in the input and add them to the current token if they are not a separator char, and push the current token into the result array when encountering a separator. You have to add a stack that keeps track of how 'deep' you are nested, to determine whether a comma is a separator or part of a token.
For any issue more complicated than this you'll probably be better off using a 'real' parser (and probably a parser-generator), but in this case I think you'll be ok using some custom code.
As you can see parsing code like this quickly becomes quite hard to understand/debug. In a real-case scenario I'd recommend adding more comments, but also a good batch of tests to explain your expected behavior.
package {
import flash.display.Sprite;
public class parser extends Sprite
{
public function parser()
{
var input:String = "'string1','string2',sum(1,5,8), [\"item1\",\"item2\",\"item3\"], \"string3\" , {a:\"text\",b:\"text2\"}"
var result:Array = parseInput(input);
for each (var item:String in result)
{
trace(item);
}
}
// this function only takes into account the '(' and ')' - adding the others is similar.
private function parseInput(input:String):Array
{
var result:Array = [];
trace("parsing: " + input);
var token:String = "";
var parenthesesStack:Array = [];
var currentChar:String;
for (var i:int = 0; i < input.length; i++)
{
currentChar = input.charAt(i)
switch (currentChar)
{
case "(":
parenthesesStack.push("(");
break;
case ")":
if (parenthesesStack.pop() != "(")
{
throw new Error("Parse error at index " + i);
}
break;
case ",":
if (parenthesesStack.length == 0)
{
result.push(token);
token = "";
}
break;
}
// add character to the token if it is not a separating comma
if (currentChar != "," || parenthesesStack.length != 0)
{
token = token + currentChar;
}
}
// add the last token
if (token != "")
{
result.push(token);
}
return result;
}
}
}
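Adding the other bracket types mostly means remembering which closing character is expected for each opener. The sketch below shows one way that could look; it is in Python rather than AS3, the name split_args is invented, and it additionally skips over quoted strings (without handling escaped quotes) and trims whitespace around each token, which the code above does not do.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}


def split_args(text: str) -> list:
    """Split on commas that are outside (), [], {} and outside quotes."""
    parts, token, stack, quote = [], [], [], None
    for pos, ch in enumerate(text):
        if quote:                       # inside a string literal: copy verbatim
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in PAIRS:
            stack.append(PAIRS[ch])     # remember which closer we expect
        elif ch in PAIRS.values():
            if not stack or stack.pop() != ch:
                raise ValueError(f"unbalanced {ch!r} at index {pos}")
        elif ch == "," and not stack:   # top-level comma: end of token
            parts.append("".join(token).strip())
            token = []
            continue
        token.append(ch)
    if stack or quote:
        raise ValueError("unterminated bracket or string")
    last = "".join(token).strip()
    if last:                            # add the last token
        parts.append(last)
    return parts
```

Storing the expected closer on the stack also catches mismatches such as "(]" that a simple depth counter would let through.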