ANTLR grammar: unexpected behavior - regex

I've begun experimenting with ANTLR3 today. There seems to be a discrepancy in the expressions that I use.
I want my class name to start with a capital letter, followed by mixed-case letters and numbers. For instance, Car is valid, but 8Car is invalid.
CLASS_NAME : ('A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')*;
This works fine when I test it on its own. However, when I use it in the following rule,
model
: '~model' CLASS_NAME model_block
;
CLASS_NAME starts accepting class names that begin with numbers as well: ANTLR picks up Car, 8Car, and even #Car as valid tokens. I'm missing something silly. Any pointers would be appreciated. Thanks.

CLASS_NAME will not match 8Car or #Car. You're probably using ANTLRWorks' interpreter (or the Eclipse plugin, which uses the same interpreter), which prints errors on a UI tab you may not have noticed and displays the wrong characters in the tokens. Use ANTLRWorks' debugger instead, or write a small test class yourself:
T.g
grammar T;
parse : CLASS_NAME EOF;
CLASS_NAME : ('A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')*;
Main.java
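import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
// Feed the invalid input straight to the generated lexer and parser
TLexer lexer = new TLexer(new ANTLRStringStream("8Car"));
TParser parser = new TParser(new CommonTokenStream(lexer));
// Lexer and parser errors are reported on stderr by default
parser.parse();
}
}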
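As an extra sanity check that does not depend on any ANTLR tooling, the character sets in CLASS_NAME map one-to-one onto the Java regex [A-Z][a-zA-Z0-9]*. The sketch below (the class and method names are illustrative, not part of ANTLR) tests whole strings, which corresponds to the parse : CLASS_NAME EOF rule above:

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative check: ('A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')* expressed as a Java regex.
class ClassNameRule {
    private static final Pattern CLASS_NAME = Pattern.compile("[A-Z][a-zA-Z0-9]*");

    // True only when the whole input is one CLASS_NAME token (like CLASS_NAME EOF).
    static boolean isClassName(String text) {
        return CLASS_NAME.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Prints: Car -> true, 8Car -> false, #Car -> false, Car8 -> true (one per line)
        for (String candidate : List.of("Car", "8Car", "#Car", "Car8")) {
            System.out.println(candidate + " -> " + isClassName(candidate));
        }
    }
}
```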

Related

Regex for finding the name of a method containing a string

I've got a Node module file containing about 100 exported methods, which looks something like this:
exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
Goal: What I'd like to do is figure out how to grab the name of any method which contains a call to fooMethod, and return the correct method names: methodTwo and methodThree. I wrote a regex which gets kinda close:
exports\.(\w+).*(\n.*?){1,}fooMethod
Problem: with my example code above, though, it effectively matches methodOne and methodThree, because it finds the first instance of exports, then the first instance of fooMethod, and carries on from there. Here's a regex101 example.
I suspect I could make use of lookaheads or lookbehinds, but I have little experience with those parts of regex, so any guidance would be much appreciated!
Edit: It turns out regex is poorly suited for this type of task. @ctcherry advised using a parser, and using that as a springboard, I was able to learn about Abstract Syntax Trees (ASTs) and the recast tool, which lets you traverse the tree after using various tools (acorn and others) to parse your code into tree form.
With these tools in hand, I successfully built a script to parse and traverse my node app's files, and was able to find all methods containing fooMethod as intended.
Regex isn't the best tool to tackle every part of this problem; ideally, we could rely on something higher level: a parser.
One way to do this is to let JavaScript parse itself during load and execution. If your Node module doesn't include anything that would execute on its own (or at least anything that would conflict with the line below), you can put this at the bottom of your module and then run the module with node mod.js.
console.log(Object.keys(exports).filter(fn => exports[fn].toString().includes("fooMethod(")));
(In the comments below it is revealed that the above isn't possible.)
Another option would be to use a library like https://github.com/acornjs/acorn (there are others) to write some other JavaScript that parses your original target JavaScript. You would then have a tree structure you could use to perform your matching and eventually return the function names you are after. I'm not an expert in that library, so unfortunately I don't have sample code for you.
This regex matches (only) the method names that contain a call to fooMethod();
(?<=exports\.)\w+(?=[^{]+\{[^}]+fooMethod\(\)[^}]+};)
See live demo.
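The lookaround pattern can also be tried outside regex101, for example with Java's java.util.regex, which supports the fixed-length lookbehind (?<=exports\.). This is a sketch with hypothetical class and method names; like the pattern itself, it assumes each method body contains no nested braces and is closed by }; and it uses the question's module as input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FooCallers {
    // Lookbehind anchors the match right after "exports."; the lookahead then
    // requires a body "{ ... fooMethod() ... };" without nested braces.
    private static final Pattern METHOD_CALLING_FOO = Pattern.compile(
            "(?<=exports\\.)\\w+(?=[^{]+\\{[^}]+fooMethod\\(\\)[^}]+\\};)");

    static List<String> namesCallingFoo(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = METHOD_CALLING_FOO.matcher(source);
        while (m.find()) {
            names.add(m.group()); // lookarounds consume nothing, so this is just the name
        }
        return names;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "exports.methodOne = async user_id => {",
                "// other method contents",
                "};",
                "exports.methodTwo = async user_id => {",
                "// other method contents",
                "fooMethod();",
                "};",
                "exports.methodThree = async user_id => {",
                "// other method contents",
                "fooMethod();",
                "};");
        System.out.println(namesCallingFoo(source)); // prints [methodTwo, methodThree]
    }
}
```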
Assuming that every method has its body enclosed within { and }, I would build up the final regex like this:
First, find a regex to get the individual methods. This can be done using this regex:
exports\.(\w+)(\s|.)*?\{(\s|.)*?\}
Next, we are interested in those methods that have fooMethod in them before they close. So, look for } or fooMethod.*}, in that order. So, let us name the group searching for fooMethod as FOO and the name of the method calling it as METH. When we iterate the matches, if group FOO is present in a match, we will use the corresponding METH group, else we will reject it.
exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})
Explanation:
exports\.(?<METH>\w+): Till the method name (you have already covered this)
(\s|.)*?\{(\s|.)*?: Some code before { and after, non-greedy so that the subsequent group is given preference
(\}|(?<FOO>fooMethod)(\s|.)*?\}): This has 2 parts:
\}: Match the method close delimiter, OR
(?<FOO>fooMethod)(\s|.)*?\}: The call to fooMethod, followed by optional code and the method close delimiter.
Here's JavaScript code that demonstrates this:
// The "g" flag makes each exec() call resume from the previous match
const p = /exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})/g;
let input = `exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
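};`;
// exec() returns null once no further method matches
let match = p.exec(input);
while (match !== null) {
// FOO is only set when fooMethod appears before the method's closing brace
if (match.groups.FOO !== undefined) console.log(match.groups.METH);
match = p.exec(input);
}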
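The same named-group technique carries over to Java, which also supports (?<name>...) groups; there, Matcher.group("FOO") returns null when the fooMethod branch did not take part in the match. In this sketch (class and method names are hypothetical), (\s|.) is swapped for [\s\S], which likewise matches any character including line breaks but keeps backtracking shallow, and, as in the answer, bodies are assumed to have no nested braces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupFilter {
    // Same structure as the JavaScript pattern, with (\s|.) replaced by [\s\S].
    private static final Pattern METHOD = Pattern.compile(
            "exports\\.(?<METH>\\w+)[\\s\\S]*?\\{[\\s\\S]*?(\\}|(?<FOO>fooMethod)[\\s\\S]*?\\})");

    static List<String> namesCallingFoo(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = METHOD.matcher(source);
        while (m.find()) {
            // group("FOO") is null when the plain closing-brace branch matched instead
            if (m.group("FOO") != null) {
                names.add(m.group("METH"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "exports.methodOne = async user_id => {",
                "// other method contents",
                "};",
                "exports.methodTwo = async user_id => {",
                "// other method contents",
                "fooMethod();",
                "};",
                "exports.methodThree = async user_id => {",
                "// other method contents",
                "fooMethod();",
                "};");
        System.out.println(namesCallingFoo(source)); // prints [methodTwo, methodThree]
    }
}
```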

How to find and replace SkippedTokensTrivia using Roslyn

I'm trying to fix the following VBA statement (converting some old code just for fun and to learn Roslyn, not at all looking for anything perfect) to remove the Set keyword so it's a valid VB.NET statement:
Set f = New Foo()
When I look at it through the Syntax Visualizer, I see it turns into trailing trivia.
I'm trying to figure out how to find it using a query. I tried several approaches but all of the following came up empty:
var attempt1 = root.DescendantTokens().Where(t=>t.IsKind(SyntaxKind.SkippedTokensTrivia));
var attempt2 = root.DescendantTokens().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt3 = root.DescendantTrivia().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt4 = root.DescendantNodes()
.OfType<EmptyStatementSyntax>()
.Where(e => e.DescendantTokens().Any(t => t.IsKeyword()));
(Yes, I'm using C# to work with a VisualBasicSyntaxTree)
I can't seem to just find the SetKeyword token that appears in the visualizer, so I thought maybe it's doing some more heavy lifting to piece together what it really is (is that what's meant by structured trivia?). I read something in the documentation that mentioned the compiler can choose to represent it a couple of different ways, so I thought that may be what's going on here.
The query was just the first thing I tried, but in reality I have a SyntaxRewriter I'm using to visit the code to find and fix all such problems (I'm already able to fix missing parentheses around ArgumentLists, for example) but in this case I can't seem to figure out which Visit method to override.
So again, 1) how to query for these from the root and 2) the best override to select from a rewriter. I've been beating my face on the keyboard for two days on this which exponentially increases the likelihood that I'm having a cranio/recto-insertion moment and I need one of you kind souls to pull me out of it.
Cheers!
Brian
Edit: Fixed typo in query attempt1
So it appears that when the compiler reaches an error condition, it will skip all tokens up to the next point where it can recover and continue parsing (the end of the line in this case). The node representing this error condition is an EmptyStatement with trailing syntax trivia containing the rest of the text as parsed tokens.
So if you're going to rewrite a node, you'll want to rewrite EmptyStatements. But you don't want to rewrite just any empty statement, only the ones with the "BC30807" diagnostic code.
public override SyntaxNode VisitEmptyStatement(EmptyStatementSyntax node)
{
var diagnostic = GetLetSetDiagnostic(node);
if (diagnostic == null)
return base.VisitEmptyStatement(node);
return RewriteLetSetStatement(node);
}
private Diagnostic GetLetSetDiagnostic(EmptyStatementSyntax node)
{
//'Let' and 'Set' assignment statements are no longer supported.
const string code = "BC30807";
return node.GetDiagnostics().SingleOrDefault(n => n.Id == code);
}
The implementation of the RewriteLetSetStatement() method is a bit of a mystery to me: I'm not sure how it can be implemented using the compiler services effectively, and I don't think this is a use case they cover well. The trivia retains the parsed tokens, but there's not much you can do with those tokens AFAIK.
Ideally, we'd just drop the Set token and hand the remaining tokens back to the parser to be reparsed. As far as I can tell, that's not possible; we can only parse from text.
So, I guess the next best thing to do would be to take the text, rewrite it to remove the Set and parse the text again.
private SyntaxNode RewriteLetSetStatement(EmptyStatementSyntax node)
{
var letSetTokens = node.GetTrailingTrivia()
.Where(triv => triv.IsKind(SyntaxKind.SkippedTokensTrivia))
.SelectMany(triv => triv.GetStructure().ChildTokens())
.TakeWhile(tok => new[] {SyntaxKind.LetKeyword, SyntaxKind.SetKeyword}
.Contains(tok.Kind())); // Kind() was called VisualBasicKind() in pre-release Roslyn
var span = new RelativeTextSpan(node.FullSpan);
var newText = node.GetText().WithChanges(
// replacement spans must be relative to the text
letSetTokens.Select(tok => new TextChange(span.GetSpan(tok.Span), ""))
);
return SyntaxFactory.ParseExecutableStatement(newText.ToString());
}
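// Regular constructor: the primary-constructor syntax from the C# 6 previews did not ship
private class RelativeTextSpan
{
private readonly TextSpan span;
public RelativeTextSpan(TextSpan span)
{
this.span = span;
}
// Converts an absolute span in the tree into a span relative to the node's full text
public TextSpan GetSpan(TextSpan token)
{
return new TextSpan(token.Start - span.Start, token.Length);
}
}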

find class name using regex.match in vb.net

I'm developing a very simple Java compiler in Visual Basic. I want to parse the class name in Java code that I paste into a textbox of my VB program. For example:
class MyPro { // in this case I need to get "MyPro"
I tried Regex.Match for this, but it didn't work. Below is the code I tried:
Dim regex As Regex = New Regex("(class)*{")
Dim match As Match = regex.Match("class mypro{")
If match.Success Then
Console.WriteLine(match.ToString)
End If
Edit:
Sometimes the source code looks like:
class Football extends Sports{
In this case I want to get Football.
And sometimes it is:
class Dog implements ISpeak{
In this case I want to get Dog.
Sometimes a class both implements and extends, like this:
class hello implements Serializable extends Object {
There are 4 patterns:
class MyPro { //I want to get "MyPro"
class Football extends Sports{ //"Football"
class Dog implements ISpeak{ //"Dog"
class hello implements Serializable extends Object { //"hello"
You could use the following regular expression:
class\s+([^\s]+)[\s\r\n{]
The class name is captured in the group with index 1. In VB.NET you read it with match.Groups(1).Value; in C#, it would be match.Groups[1].Value.
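The same pattern works in Java's java.util.regex, which makes it easy to check against the four declaration styles from the question plus the original class mypro{ input. The class and method names below are illustrative; Matcher.group(1) plays the role of Groups(1).Value:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaClassName {
    // The answer's pattern; group 1 holds the class name.
    private static final Pattern CLASS_DECL = Pattern.compile("class\\s+([^\\s]+)[\\s\\r\\n{]");

    // Returns the first declared class name, or null if none is found.
    static String className(String line) {
        Matcher m = CLASS_DECL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "class MyPro {",
            "class Football extends Sports{",
            "class Dog implements ISpeak{",
            "class hello implements Serializable extends Object {",
            "class mypro{"
        };
        // Prints MyPro, Football, Dog, hello, mypro (one per line)
        for (String s : samples) {
            System.out.println(className(s));
        }
    }
}
```

In "class mypro{" the greedy [^\s]+ first swallows the brace, then backtracks one character so that { can satisfy the final character class.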

Configuring C# out parameters with Foq in F#

I am using F# and Foq to write unit tests for a C# project.
I am trying to set up a mock of an interface whose method has an out parameter, and I have no idea how to even start. It probably has to do with code quotations, but that's where my understanding ends.
The interface is this:
public interface IGetTypeNameString
{
bool For(Type type, out string typeName);
}
In C#, Foq usage for the interface looks like this:
[Fact]
public void Foq_Out()
{
// Arrange
var name = "result";
var instance = new Mock<IGetTypeNameString>()
.Setup(x => x.For(It.IsAny<Type>(), out name))
.Returns(true)
.Create();
// Act
string resultName;
var result = instance.For(typeof(string), out resultName);
// Assert
Assert.True(result);
Assert.Equal("result", resultName);
}
As for how to achieve that with F#, I am completely lost. I tried something along the lines of
let name = "result"
let instance = Mock<IGetTypeNameString>().Setup(<@ x.For(It.IsAny<Type>(), name) @>).Returns(true).Create();
which results in the quotation expression being underlined with an error message of
This expression was expected to have type IGetTypeNameString -> Quotations.Expr<'a> but here has type Quotations.Expr<'b>
Without any indication of what types 'a and 'b are supposed to be, I have no clue how to correct this.
(It gets even wilder when I use open Foq.Linq: the Error List window then starts telling me about possible overloads with stuff like Action<'TAbstract> -> ActionBuilder<'TAbstract>, and I get even more lost...)
Any assistance or explanation greatly appreciated!
Edit:
So, as stated here, byref/out parameters cannot be used in code quotations. Can this be set up at all in F#, then?
Foq supports setting up of C# out parameters from C# using the Foq.Linq namespace.
The IGetTypeNameString interface can be easily setup in F# via an object expression:
let mock =
{ new IGetTypeNameString with
member __.For(t,name) =
name <- "Name"
true
}
For declarations that have no analog in F#, like C#'s protected members and out parameters, you can also use the SetupByName overload, i.e.:
let mock =
Mock<IGetTypeNameString>()
.SetupByName("For").Returns(true)
.Create()
let success, _ = mock.For(typeof<int>)

Load 2 different input models in Acceleo

I'd like to load 2 different input models (a .bpel and a .wsdl) in my main template of Acceleo.
I loaded the ecore metamodels for both bpel and wsdl and I'd like to be able to use something like this:
[comment encoding = UTF-8 /]
[module generate('http:///org/eclipse/bpel/model/bpel.ecore','http://www.eclipse.org/wsdl/2003/WSDL')/]
[import org::eclipse::acceleo::module::sample::files::processJavaFile /]
[template public generate(aProcess : Process, aDefinition : Definition)]
[comment @main /]
Process Name : [aProcess.name/]
Def Location : [aDefinition.location/]
[/template]
but when I run the acceleo template I get this error:
An internal error occurred during: "Launching Generate".
Could not find public template generate in module generate.
I think I have to modify the Java launcher (Generate.java), because right now it can't take two models as arguments. Do you know how?
Thanks!
Edit, following Kellindil's suggestions:
Just to check that I understood correctly before I start modifying things:
I'm trying to modify the Generate() constructor.
I changed it to:
//MODIFIED CODE
public Generate(URI modelURI, URI modelURI2, File targetFolder,
List<? extends Object> arguments) {
initialize(modelURI, targetFolder, arguments);
}
In the generic case, I can see it calls AbstractAcceleoGenerator.initialize(URI, File, List<?>). Should I call it twice, once per model? Like:
initialize(modelURI, targetFolder, arguments);
initialize(modelURI2, targetFolder, arguments);
Then I would mimic, in my Generate() constructor, the code that is in the super-implementation:
//NON MODIFIED ACCELEO CODE (AbstractAcceleoLauncher#generate)
Map<String, String> generate(Monitor monitor) throws IOException {
File target = getTargetFolder();
if (!target.exists() && !target.mkdirs()) {
throw new IOException("target directory " + target + " couldn't be created."); //$NON-NLS-1$ //$NON-NLS-2$
}
AcceleoService service = createAcceleoService();
String[] templateNames = getTemplateNames();
Map<String, String> result = new HashMap<String, String>();
for (int i = 0; i < templateNames.length; i++) {
result.putAll(service.doGenerate(getModule(), templateNames[i], getModel(), getArguments(),
target, monitor));
}
postGenerate(getModule().eResource().getResourceSet());
originalResources.clear();
return result;
}
What should I do? Should I try to mimic what this method does in my Generate() constructor, after the initialize() calls?
What you wish to do is indeed possible with Acceleo, but it is not the "default" case that the generated launcher expects.
You'll have to mark the "generate" method of the generated Java class as "@generated NOT" (or remove the "@generated" annotation from its javadoc altogether). In this method, you need to mimic what the super-implementation (in AbstractAcceleoLauncher) does, loading two models instead of one and passing them on to AcceleoService#doGenerate.
In other words, you will need to look at the API Acceleo provides to generate code, and use it in the way that fits your need. Our generated java launcher and the AcceleoService class are there to provide an example that fits the general use case. Changing the behavior can be done by following these samples.
You shouldn't need to modify the Generate.java class. By default, it should allow you to perform the code generation.
You need to create a launch config and provide the right arguments (process and definition) in this launch config, that's all.
I don't understand the 'client.xmi' URI that is the first argument of your module. It looks like it is your model file; if so, remove it from the arguments, which must only contain your metamodel URIs.