ANTLR - Keep a block unchanged - c++

I am a beginner with ANTLR, and I need to modify an existing (and complex) grammar.
I want to create a rule that keeps a block unchanged, without it being parsed by other rules.
To be clearer, I need to insert code written in C++ into the interpreted code.
Edit 11/02/2013
After many tests, here is my grammar, my test, the result I get, and the result I want:
Grammar
cppLiteral
: cppBegin cppInnerTerm cppEnd
;
cppBegin
: '//$CPP_IN$'
;
cppEnd
: '//$CPP_OUT$'
;
cppInnerTerm
: ( ~('//$CPP_OUT$') )*
;
Test
//$CPP_IN$
txt1 txt2
//$CPP_OUT$
Result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1' 'txt2'
cppEnd = '//$CPP_OUT$'
Expected result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1 txt2'
cppEnd = '//$CPP_OUT$'
(Sorry, I can't post the image of the AST because I don't have 10 reputations)
The three rules "cppBegin", "cppInnerTerm" and "cppEnd" could be merged into a single rule, like this:
cppLiteral
: '//$CPP_IN$'( ~('//$CPP_OUT$') )*'//$CPP_OUT$'
;
to have this result:
cppLiteral = '//$CPP_IN$\n txt1 txt2\n //$CPP_OUT$'

I want to create a rule to keep a block without parsing with other rules.
Parse it like a multiline comment, e.g. /* foobar */. Below is a small example using the keywords specified in your question.
Note that most of the work is done with lexer rules (those that start with a capital letter). Any time you want to deal with blocks of text, particularly if you want to avoid other rules as in this case, you're probably thinking in terms of lexer rules rather than parser rules.
CppBlock.g
grammar CppBlock;
document: CPP_LITERAL* EOF;
fragment CPP_IN:'//$CPP_IN$';
fragment CPP_OUT:'//$CPP_OUT$';
CPP_LITERAL: CPP_IN .* CPP_OUT
{
String t = getText();
t = t.substring(10, t.length() - 11); //10 = length of CPP_IN, 11 = length of CPP_OUT
setText(t);
}
;
WS: (' '|'\t'|'\f'|'\r'|'\n')+ {skip();};
Here is a simple test case:
Input
//$CPP_IN$
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
//$CPP_OUT$
//$CPP_IN$
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
//$CPP_OUT$
Output Tokens
[CPP_LITERAL :
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
]
[CPP_LITERAL :
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
]
Rule CPP_LITERAL preserves newlines at the beginning and end of the input (after //$CPP_IN$ and before //$CPP_OUT$). If you don't want those, just update the action to strip them out. Otherwise, I think this grammar does what you're asking for.
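For comparison, here is a minimal hand-written C++ sketch of the same idea: scan for the begin marker, take everything up to the next end marker, and strip one leading and one trailing newline. It is not ANTLR output; the function name extract_cpp_blocks is hypothetical, and the sketch assumes plain '\n' line endings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the text between each "//$CPP_IN$" ... "//$CPP_OUT$" pair.
// Assumption: an unterminated block at the end of the input is ignored.
std::vector<std::string> extract_cpp_blocks(std::string const& input) {
    static std::string const begin = "//$CPP_IN$";
    static std::string const end = "//$CPP_OUT$";
    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = input.find(begin, pos)) != std::string::npos) {
        std::size_t const body = pos + begin.size();
        std::size_t const stop = input.find(end, body);
        if (stop == std::string::npos) break;
        std::string text = input.substr(body, stop - body);
        // Strip the newline right after the begin marker and right before the end marker.
        if (!text.empty() && text.front() == '\n') text.erase(0, 1);
        if (!text.empty() && text.back() == '\n') text.pop_back();
        blocks.push_back(text);
        pos = stop + end.size();
    }
    return blocks;
}
```

As in the lexer rule, a stray //$CPP_IN$ inside a block is kept as ordinary text, because only the end marker terminates the block.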

Related

How to count assignment operators in a text file?

My task is to create a program in C++ that processes a text file sequentially. The data must be read from the file one line at a time; the entire contents of the file must not be loaded into RAM. The text file contains syntactically correct C++ code, and I have to count how many assignment operators it contains.
The only thing I could think of was making a function that searches for patterns and then counts how many times they appear. I insert every assignment operator as a pattern and then sum all the counts together. But this does not work because if I insert the pattern "=" many operators such as "%=" or "+=" also get counted in. And even operators like "!=" or "==" get counted, but they shouldn't because they are comparison operators.
My code gives the answer 7 but the real answer should be 5.
#include <iostream>
#include <fstream>
using namespace std;
int patternCounting(string pattern, string text){
int x = pattern.size();
int y = text.size();
int rez = 0;
for(int i=0; i<=y-x; i++){
int j;
for(j=0; j<x; j++)
if(text[i+j] !=pattern[j]) break;
if(j==x) rez++;
}
return rez;
}
int main()
{
fstream file ("test.txt", ios::in);
string rinda;
int skaits=0;
if(!file){cout<<"Nav faila!"<<endl; return 47;}
while(file.good()){
getline(file, rinda);
skaits+=patternCounting("=",rinda);
skaits+=patternCounting("+=",rinda);
skaits+=patternCounting("*=",rinda);
skaits+=patternCounting("-=",rinda);
skaits+=patternCounting("/=",rinda);
skaits+=patternCounting("%=",rinda);
}
cout<<skaits<<endl;
return 0;
}
Contents of the text file:
#include <iostream>
using namespace std;
int main()
{
int z=3;
int x=4;
for(int i=3; i<3; i++){
int f+=x;
float g%=3;
}
}
Note that, as a torture test, the following code has 0 assignments under C++ standards before C++17 and one from C++17 on, because C++17 removed trigraphs.
// = Torture test
int a = 0; int b = 1;
int main()
{
// The next line is part of this comment until C++17 ??/
a = b;
struct S
{
virtual void foo() = 0;
void foo(int, int x = 1);
S& operator=(const S&) = delete;
int m = '==';
char c = '=';
};
const char* s = [=]{return "=";}();
sizeof(a = b);
decltype(a = b) c(a);
}
There are multiple issues with the code.
The first, rather mundane issue, is your handling of file reading. A loop such as while (file.good()) … is virtually always an error: you need to test the return value of getline instead!
std::string line;
while (getline(file, line)) {
// Process `line` here.
}
Next, your patternCounting function fundamentally won’t work since it doesn’t account for comments and strings (nor any of C++’s other peculiarities, but these seem to be out of scope for your assignment). It also doesn’t really make sense to count different assignment operators separately.
The third issue is that your test case misses lots of edge cases (and is invalid C++). Here’s a better test case that (I think) exercises all interesting edge cases from your assignment:
int main()
{
int z=3; // 1
int x=4; // 2
// comment with = in it
"string with = in it";
float f = 3; // 3
f = f /= 4; // 4, 5
for (int i=3; i != 3; i++) { // 6
int f=x += z; // 7, 8
bool g=3 == 4; // 9
}
}
I’ve annotated each line with a comment indicating up to how many occurrences we should have counted by now.
Now that we have a test case, we can start implementing the actual counting logic. Note that, for readability, function names generally follow the pattern “verb subject”. So instead of patternCounting a better function name would be countPattern. But we won’t count arbitrary patterns, we will count assignments. So I’ll use countAssignments (or, using my preferred C++ naming convention: count_assignments).
Now, what does this function need to do?
It needs to count assignments (incl. initialisations), duh.
It needs to discount occurrences of = that are not assignments:
inside strings
inside comments
inside comparison operators
Without a dedicated C++ parser, that’s a rather tall order! You will need to implement a rudimentary lexical analyser (short: lexer) for C++.
First off, you will need to represent each of the situations we care about with its own state:
enum class state {
start,
comment,
string,
comparison
};
With this in hand, we can start writing the outline of the count_assignments function:
int count_assignments(std::string const& str) {
auto count = 0;
auto state = state::start;
auto prev_char = '\0';
for (auto c : str) {
switch (state) {
case state::start:
break;
case state::comment:
break;
case state::string:
break;
case state::comparison:
break;
}
prev_char = c;
}
// Useful for debugging:
// std::cerr << count << "\t" << str << "\n";
return count;
}
As you can see, we iterate over the characters of the string (for (auto c : str)). Next, we handle each state we could be currently in.
The prev_char is necessary because some of our lexical tokens are more than one character in length (e.g. comments start by //, but /= is an assignment that we want to count!). This is a bit of a hack — for a real lexer I would split such cases into distinct states.
So much for the function skeleton. Now we need to implement the actual logic — i.e. we need to decide what to do depending on the current (and previous) character and the current state.
To get you started, here’s the case state::start:
switch (c) {
case '=':
++count;
state = state::comparison;
break;
case '<': case '>': case '!':
state = state::comparison;
break;
case '"' :
state = state::string;
break;
case '/' :
if (prev_char == '/') {
state = state::comment;
}
break;
}
Be very careful: the above will over-count the comparison ==, so we will need to adjust that count once we’re inside case state::comparison and see that the current and previous character are both =.
I’ll let you take a stab at the rest of the implementation.
Note that, unlike your initial attempt, this implementation doesn’t distinguish the separate assignment operators (=, +=, etc.) because there’s no need to do so: they’re all counted automatically.
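For reference, here is one possible complete version, as a sketch rather than the intended solution to the exercise. It swaps the explicit state enum for index-based lookahead, which keeps it short. It assumes the input is a single line and ignores /* */ comments, raw string literals, digit separators and line continuations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts assignments and '=' initialisations in one line of C++ code.
// Skips "..." and '...' literals and everything after "//".
int count_assignments(std::string const& line) {
    int count = 0;
    std::size_t const n = line.size();
    for (std::size_t i = 0; i < n; ++i) {
        char const c = line[i];
        if (c == '"' || c == '\'') {
            // Skip to the matching closing quote, honouring backslash escapes.
            for (++i; i < n && line[i] != c; ++i) {
                if (line[i] == '\\') ++i;
            }
        } else if (c == '/' && i + 1 < n && line[i + 1] == '/') {
            break; // the rest of the line is a comment
        } else if (c == '=') {
            char const prev = i > 0 ? line[i - 1] : '\0';
            char const prev2 = i > 1 ? line[i - 2] : '\0';
            char const next = i + 1 < n ? line[i + 1] : '\0';
            if (next == '=') {
                ++i; // "==" is a comparison: skip its second '=' too
                continue;
            }
            // "<<=" and ">>=" are assignments, while "<=", ">=" and "!=" are not.
            bool const shift = (prev == '<' || prev == '>') && prev2 == prev;
            bool const comparison =
                prev == '!' || ((prev == '<' || prev == '>') && !shift);
            if (!comparison) ++count;
        }
    }
    return count;
}
```

Calling this once per line from the getline loop and summing the results gives 9 for the test case above.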
The clang compiler has a feature to dump the syntax tree (also called AST). If you have syntactically correct C++ code (which you don't have), you can count the number of assignment operators for example with the following command line (on a unixoid OS):
clang++ -Xclang -ast-dump -c my_cpp_file.cpp | egrep "BinaryOperator.*'='" | wc -l
Note, however, that this will only match real assignments, not copy initializations, which can also use the = character but are syntactically different (for example, an overloaded = operator is not called in that case).
If you want to count the compound assignments and/or the copy initializations as well, you can try to look for the corresponding lines in the output AST and add them to the egrep search pattern.
In practice, your task is incredibly difficult.
Think for example of C++ raw string literals (you could have one spanning dozens of source lines, with arbitrary = inside them). Or of asm statements doing some addition....
Think also of increment operators like x++ (for some declared int x;), which is equivalent to x = x+1; for a simple variable, and is semantically an assignment, but not syntactically.
My suggestion: choose one open source C++ compiler. I happen to know GCC internals.
With GCC, you can write your own GCC plugin which would count the number of Gimple assignments.
Think also of Quine programs coded in C++...
NB: budget months of work.

Automatically modify C++ Code: Transform code from a parse tree back to source code

I'm trying to modify C++ code. I get a piece of code and line numbers, and I need to insert code at the given line numbers.
Like this:
1 void foo(){
2 int a = 5;
3 int b = 10;
4 }
And the line numbers: 2, 3. Now I want to automatically insert code after those lines:
1 void foo(){
2 int a = 5;
3 newcode();
4 int b = 10;
5 newcode();
6 }
In another thread, people said ANTLR is a good way to do this, so I tried using the ANTLR runtime API. It's easy to generate a parse tree, and I also found ways to modify it. But now I don't know how to get the source code back from the parse tree.
I don't really need the source code; it would also be enough to compile the parse tree to an executable program. How can I do this?
Is there maybe an easier way to solve my problem? Maybe just read the code, count the \n characters, and insert my code after the 2nd and 3rd \n?
Edit:
For my bachelor thesis, I get a piece of parallel code and I need to force it to execute a given interleaving. Therefore my job is to write a tool that automatically inserts instructions like "EnterCriticalSection(...)" and "LeaveCriticalSection(...)" at given lines in the code. Now I have another job: to rename the main function and insert my own main function. I think this won't work with counting lines.
A possible solution could be to use the generated parse tree for token positions (each TerminalNode has a Token instance attached, with the information about where it is located in the original source code). With that at hand, you can start copying the unmodified text from the original source stream and then insert your own text, which belongs at this position. After that, copy the next unmodified code part and then insert your next modification. Do this in a loop until you reach EOF.
This scenario doesn't care about the final formatting, but I think that's probably not relevant: your task sounds like you are instrumenting code for some measurements.
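The copy-and-insert loop itself does not depend on ANTLR, so it can be sketched separately. In this minimal sketch, Insertion and apply_insertions are hypothetical names; the offsets are assumed to be character offsets into the original source, such as those you would read from the tokens' start or stop indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: put `text` in front of the character at `offset`.
struct Insertion {
    std::size_t offset;
    std::string text;
};

// Copies the unmodified spans of `source` and splices in each insertion.
std::string apply_insertions(std::string const& source, std::vector<Insertion> edits) {
    // Process insertions in source order; equal offsets keep their given order.
    std::stable_sort(edits.begin(), edits.end(),
                     [](Insertion const& a, Insertion const& b) { return a.offset < b.offset; });
    std::string out;
    std::size_t pos = 0;
    for (Insertion const& e : edits) {
        assert(e.offset <= source.size());
        out.append(source, pos, e.offset - pos); // unmodified part
        out += e.text;                           // our own text
        pos = e.offset;
    }
    out.append(source, pos, std::string::npos);  // remainder up to EOF
    return out;
}
```

Because every offset refers to the original text, earlier insertions never shift the positions of later ones.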
Here's code I use to retrieve the original source code given two parse tree nodes:
std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
: dynamic_cast<ParserRuleContext *>(start)->start;
Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
: dynamic_cast<ParserRuleContext *>(stop)->stop;
return sourceTextForRange(startToken, stopToken, keepQuotes);
}
//----------------------------------------------------------------------------------------------------------------------
std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
CharStream *cs = start->getTokenSource()->getInputStream();
size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
if (keepQuotes || result.size() < 2)
return result;
char quoteChar = result[0];
if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
if (quoteChar == '"' || quoteChar == '\'') {
// Replace any double occurrence of the quote char by a single one.
replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
}
return result.substr(1, result.size() - 2);
}
return result;
}
This code has been taken from the MySQL Workbench parser module.

Preprocessor directive to simulate a notation

In C++, is there a way to use the preprocessor to replace a variable name followed by an index number with the name followed by '[', the index number, and ']'?
As an example, if I write:
int main(void)
{
int var[64];
var0 = 0;
var1 = 1;
var2 = 2; // etc...
return 0;
}
Translation:
int main(void)
{
int var[64];
var[0] = 0;
var[1] = 1;
var[2] = 2; // etc...
return 0;
}
#define is going to be the closest you can get.
#define var0 var[0]
#define var1 var[1]
...
However, it should be noted if you're going to use the above, you may as well just do it manually in the first place, since you're typing everything out anyway.
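A minimal sketch of what that looks like in a complete translation unit, with one #define per name. The helper fill_first_three is hypothetical and only exists to exercise the macros.

```cpp
#include <cassert>

int var[64];

// One macro per indexed name; the preprocessor cannot derive these from a pattern.
#define var0 var[0]
#define var1 var[1]
#define var2 var[2]

// Hypothetical helper: assigns through the macros and sums the three elements.
int fill_first_three() {
    var0 = 0;
    var1 = 1;
    var2 = 2;
    return var[0] + var[1] + var[2];
}
```

Each name has to be listed by hand, which is why this saves no typing compared with writing var[0] directly.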
No, and thanks to the language creators for that! There is no way to perform syntax analysis (string parsing) inside a macro definition.
Such programs would be hell to read.

passing a substring to a function

I am working on building a Lisp interpreter. The problem I am stuck on is that I need to pass the entire remaining substring to a function as soon as I encounter a "(".
For example, if I have,
( begin ( set x 2 ) (set y 3 ) )
then I need to pass
begin ( set x 2 ) (set y 3 ) )
and when I encounter "(" again
I need to pass
set x 2 ) (set y 3 ) )
then
set y 3 ) )
I tried doing so with substr by calculating length, but that didn't quite work. If anyone could help, that'd be great.
Requested code
int a=0;
listnode *makelist(string t) //t is the substring
{
//some code
istringstream iss(t);
string word;
while(iss>>word){
if(word=="(")//I used strcmp here. Just for the sake for time saving I wrote this
//some operations
int x=word.size();
a=a+x;
word=word.substr(a);
p->down=makelist(word);//function called again and word here should be the substring
}}
Have you thought of using an intermediate representation? That is, first parse the whole string into a data structure and then execute it? After all, Lisps have traditionally used applicative order, which means they evaluate the arguments first before calling the function. The data structure could look something like a struct that holds the first part of the string (i.e. begin or set in your example) and the rest of the string to process as a second property (head and rest, if you want). Also consider that trees are more easily constructed through recursion than through iteration, the base case here being reaching the ')' character.
If you are interested in Lisp interpreters and compilers, you should check out Lisp in Small Pieces; it is well worth the price.
I would have thought something like this:
string str = "( begin ( set x 2 ) (set y 3 ) )";
func(str);
...
void func(string s)
{
int i = 0;
while(s.size() > i)
{
if (s[i] == '(')
{
func(s.substr(i));
}
i++;
}
}
would do the job. [Obviously, you'll perhaps want to do something else in there too!]
Normally, lisp parsing is done by recursively calling a reader and let the reader "consume" as much data as is necessary. If you're doing this on strings, it may be handy to pass the same string around, by reference, and return a tuple of "this is what I read" and "this is where I finished reading".
So something like this (obviously, in actual code, you may want to pass pointers to offset rather than have a pair-structure and needing to deal with memory-management of that, I elided that to make the code more readable):
struct readthing {
Node *data;
int offset;
};
struct readthing *read (char *str, int offset) {
if (str[offset] == '(')
return read_delimited(str, offset+1, ')'); /* Read a list, consume the start */
...
}
struct readthing *read_delimited (char *str, int offset, char terminator) {
Node *list = NULL;
offset = skip_to_next_token(str, offset);
while (str[offset] != terminator) {
struct readthing *foo = read(str, offset);
offset = skip_to_next_token(str, foo->offset); /* skip blanks before the next element or the terminator */
list = do_cons(foo->data, list);
}
return make_readthing(do_reverse(list), offset+1);
}
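The same reader idea can be sketched in C++ by passing the string and the current position by reference, so that each call consumes exactly what it reads. The names Node, skip_spaces and read_expr are hypothetical, and malformed input (such as a missing ')') is tolerated silently rather than reported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node: an atom carries text, a list carries children.
// (std::vector of the enclosing type is formally allowed from C++17 on.)
struct Node {
    std::string atom;
    std::vector<Node> children;
    bool is_list = false;
};

void skip_spaces(std::string const& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Reads one expression starting at pos and leaves pos just past it.
Node read_expr(std::string const& s, std::size_t& pos) {
    skip_spaces(s, pos);
    Node node;
    if (pos < s.size() && s[pos] == '(') {
        node.is_list = true;
        ++pos; // consume '('
        skip_spaces(s, pos);
        while (pos < s.size() && s[pos] != ')') {
            node.children.push_back(read_expr(s, pos)); // recurse per element
            skip_spaces(s, pos);
        }
        if (pos < s.size()) ++pos; // consume ')'
    } else {
        // An atom runs up to the next blank or parenthesis.
        while (pos < s.size() && s[pos] != '(' && s[pos] != ')' &&
               !std::isspace(static_cast<unsigned char>(s[pos]))) {
            node.atom += s[pos++];
        }
    }
    return node;
}
```

With this shape there is no need to compute substrings at all: the nested call for ( set x 2 ) simply starts where the outer call currently stands and returns once it has consumed the matching ')'.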

how to auto generate eclipse-indigo method block comment?

I want to generate a block comment like this using Eclipse Indigo. I'm a C++ programmer.
/**
*
* #param bar
* #return
*/
int foo(int bar);
How can I do this?
If your input is pretty much static, you can write a simplified lexer that only requires simple string munging. std::string has plenty of editing capabilities, such as .substr() and .find(); all you have to do is figure out where the parentheses are. You can optionally process the line as a stringstream, which makes this far easier (don't forget to use std::skipws to skip whitespace).
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct ARG {
    string sVarArgDataType, sVarArg;
};

// Reads one prototype such as "int foo(int bar);" from filein and writes it
// to fileout, preceded by a generated block comment.
bool writeBlockComment(istream& filein, ostream& fileout) {
    string sline;
    if (!getline(filein, sline)) {
        return false; //failure
    }
    size_t firstSpacePos = sline.find(' ');
    size_t openPerenPos = sline.find('(');
    size_t closePerenPos = sline.find(");");
    if (string::npos == firstSpacePos ||
        string::npos == openPerenPos ||
        string::npos == closePerenPos ||
        firstSpacePos > openPerenPos ||
        openPerenPos > closePerenPos) {
        return false; //failure
    }
    string sReturnDataType = sline.substr(0, firstSpacePos);
    string sFuncName = sline.substr(firstSpacePos + 1, openPerenPos - (firstSpacePos + 1));

    vector<ARG> va;
    size_t argBegin = openPerenPos + 1;
    while (argBegin < closePerenPos) {
        //use the next comma, or ) for the last argument, as a string terminator
        size_t argEnd = sline.find(',', argBegin);
        if (string::npos == argEnd || argEnd > closePerenPos) {
            argEnd = closePerenPos;
        }
        //skip the blanks that usually follow a comma
        while (argBegin < argEnd && ' ' == sline[argBegin]) {
            ++argBegin;
        }
        //assume all keywords are globs of text: "<type> <name>"
        size_t spacePos = sline.find(' ', argBegin);
        if (string::npos != spacePos && spacePos < argEnd) {
            ARG a;
            a.sVarArgDataType = sline.substr(argBegin, spacePos - argBegin);
            a.sVarArg = sline.substr(spacePos + 1, argEnd - (spacePos + 1));
            va.push_back(a); //add structure to list
        }
        argBegin = argEnd + 1; //move index to next argument
    }

    fileout << "/**\n *\n";
    for (size_t i = 0; i < va.size(); i++) {
        fileout << " * #param " << va[i].sVarArg << "\n";
    }
    fileout << " * #return\n */\n" << sReturnDataType << " " << sFuncName << '(';
    for (size_t i = 0; i < va.size(); i++) {
        fileout << va[i].sVarArgDataType << " " << va[i].sVarArg;
        if (i != va.size() - 1) {
            fileout << ", "; //don't show a comma-space after the last item
        }
    }
    fileout << ");" << endl;
    return true;
}
This will handle any number of arguments except ... (the variable argument list), but you can add your own detection code for that, with an if statement that switches between ... and the 2-keyword argument types. Here I only support 2 keywords in my struct. You can support more by using a while loop to search for all the spaces before the next comma or right parenthesis, and adding that variable number of strings to a vector<string> inside the struct. Or, instead of the struct, just make a vector<vector<string> >; or use just one vector and call va.clear() after every function is done.
I just noticed the eclipse tag. I don't know much about eclipse. I can't even get it to work. some program.