antlr visitor: lookup of reserved words efficiently

I'm learning ANTLR. At this point, I'm writing a little stack-based language as part of my learning process -- think PostScript or Forth. An RPN language. For instance:
10 20 mul
This would push 10 and 20 on the stack and then perform a multiply, which pops two values, multiplies them, and pushes 200. I'm using the visitor pattern. And I find myself writing some code that's kind of insane. There has to be a better way.
Here's a section of my WaveParser.g4 file:
any_operator:
    value_operator |
    stack_operator |
    logic_operator |
    math_operator |
    flow_control_operator;
value_operator:
    BIND | DEF
    ;
stack_operator:
    DUP |
    EXCH |
    POP |
    COPY |
    ROLL |
    INDEX |
    CLEAR |
    COUNT
    ;
BIND is just the bind keyword, etc. So my visitor has this method:
antlrcpp::Any WaveVisitor::visitAny_operator(WaveParser::Any_operatorContext *ctx);
And now here's where I'm getting to the very ugly code I'm writing, which leads to the question.
Value::Operator op = Value::Operator::NO_OP;
WaveParser::Value_operatorContext * valueOp = ctx->value_operator();
WaveParser::Stack_operatorContext * stackOp = ctx->stack_operator();
WaveParser::Logic_operatorContext * logicOp = ctx->logic_operator();
WaveParser::Math_operatorContext * mathOp = ctx->math_operator();
WaveParser::Flow_control_operatorContext * flowOp = ctx->flow_control_operator();
if (valueOp) {
    if (valueOp->BIND()) {
        op = Value::Operator::BIND;
    }
    else if (valueOp->DEF()) {
        op = Value::Operator::DEF;
    }
}
else if (stackOp) {
    if (stackOp->DUP()) {
        op = Value::Operator::DUP;
    }
    ...
}
...
I'm supporting approximately 50 operators, and it's insane that I'm going to have this series of if statements to figure out which operator this is. There must be a better way to do this. I couldn't find a field on the context that mapped to something I could use as a key in a hash map.
I don't know if I should give every one of my operators a separate rule and use the corresponding method in my visitor, or what else I'm missing.
Is there a better way?

With ANTLR, it's usually very helpful to label components of your rules, as well as the high level alternatives.
If part of a parser rule can only be one thing with a single type, usually the default accessors are just fine. But if you have several alternatives that are essentially alternatives for the "same thing", or perhaps you have the same sub-rule reference in a parser rule more than one time and want to differentiate them, it's pretty handy to give them names. (Once you start doing this and see the impact to the Context classes, it'll become pretty obvious where they provide value.)
Also, when rules have multiple top-level alternatives, it's very handy to give each of them a label. This will cause ANTLR to generate a separate Context class for each alternative, instead of dumping everything from every alternative into a single class.
(making some stuff up just to get a valid compile)
grammar WaveParser;
any_operator
    : value_operator        # val_op
    | stack_operator        # stack_op
    | logic_operator        # logic_op
    | math_operator         # math_op
    | flow_control_operator # flow_op
    ;
value_operator: op = (BIND | DEF);
stack_operator
    : op = (
          DUP
        | EXCH
        | POP
        | COPY
        | ROLL
        | INDEX
        | CLEAR
        | COUNT
      )
    ;
logic_operator: op = (AND | OR);
math_operator: op = (ADD | SUB);
flow_control_operator: op = (FLOW1 | FLOW2);
AND: 'and';
OR: 'or';
ADD: '+';
SUB: '-';
FLOW1: '>>';
FLOW2: '<<';
BIND: 'bind';
DEF: 'def';
DUP: 'dup';
EXCH: 'exch';
POP: 'pop';
COPY: 'copy';
ROLL: 'roll';
INDEX: 'index';
CLEAR: 'clear';
COUNT: 'count';

Related

rust how to collapse if let - clippy suggestion

I ran cargo clippy to get some feedback on my code, and clippy told me that I can somehow collapse an if let.
Here is the exact "warning":
warning: this `if let` can be collapsed into the outer `if let`
--> src\main.rs:107:21
|
107 | / if let Move::Normal { piece, from, to } = turn {
108 | | if i8::abs(from.1 - to.1) == 2 && piece.getColor() != *color && to.0 == x {
109 | | let offsetX = x - to.0;
110 | |
... |
116 | | }
117 | | }
| |_____________________^
I thought I could maybe just append the inner if using &&, but then I get a warning (`let` expressions in this position are experimental; I am using Rust version 1.57.0, not nightly).
Any idea what clippy wants me to do?
Edit:
The outer if let is itself inside another if let:
if let Some(turn) = board.getLastMove() {
And it seems you can indeed combine them like so:
if let Some(Move::Normal { piece, from, to }) = board.getLastMove() {
In my opinion, the clippy lint should include the line above, as otherwise it is, at least for me, somewhat confusing.
Edit 2:
Turns out I just can't read: below the warning listed above was some more information telling me exactly what to do.
= note: `#[warn(clippy::collapsible_match)]` on by default
help: the outer pattern can be modified to include the inner pattern
--> src\main.rs:126:29
|
126 | if let Some(turn) = board.getLastMove() {
| ^^^^ replace this binding
127 | if let Move::Normal { piece, from, to } = turn {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ with this pattern
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_match

Let a variable equal multiple values in an if-statement [duplicate]

This question already has an answer here:
Generating a new variable using conditional statements
(1 answer)
I am doing data clean-up in Stata and I need to recode a variable to equal 1 if a whole set of other variables are equal to 1, 6, or 7.
I can do this using the code below:
replace anyadl = 1 if diffdress==1 | diffdress==6 | diffdress==7 | ///
diffwalk==1 | diffwalk==6 | diffwalk==7 | ///
diffbath==1 | diffbath==6 | diffbath==7 | ///
diffeat==1 | diffeat==6 | diffeat==7 | ///
diffbed==1 | diffbed==6 | diffbed==7 | ///
difftoi==1 | difftoi==6 | difftoi==7
However, this is very inefficient to type out and it is easy to make errors.
Is there a simpler way to do this?
For example, something along the following lines:
replace anyadl = 1 if diff* == (1 | 6 | 7)

Your fantasy syntax wouldn't do what you want even if it were legal: for example, 1|6|7 would be evaluated as 1. That is, in Stata 1 OR 6 OR 7 is in effect true OR true OR true, so true, and thus 1, given the rules that non-zero is true as input and true is 1 as output. The expression 1|6|7 is legal; it's the wildcard in an equality or inequality that isn't.
Stepping back, your code is producing an indicator (some people say dummy) variable with values 1 or missing. In practice such a variable is much more useful if created with values 0 and 1 (and in some instances missing too).
generate anyad1 = 0
foreach v in dress walk bath eat bed toi {
    replace anyad1 = 1 if inlist(diff`v', 1, 6, 7)
}
is one approach. In general, note both inlist(foo, 1, 6, 7) and inlist(1, foo, bar, bazz) as useful constructs.
Reading:
This paper on generating indicators
This one on useful functions
This one on inlist() and inrange()
FAQ on true and false in Stata

"Borrowed Value Does Not Live Long Enough" when pushing into a vector

I am trying a daily programmer problem to shuffle a list of arguments and output them.
I'm not sure if this is the correct approach but it sounded like a good idea: remove the element from the args vector so it doesn't get repeated, and insert it into the result vector.
extern crate rand; // 0.7.3
use std::io;
use std::cmp::Ordering;
use std::env;
use rand::Rng;
fn main() {
    let mut args: Vec<_> = env::args().collect();
    let mut result: Vec<_> = Vec::with_capacity(args.capacity());
    if args.len() > 1 {
        println!("There are(is) {} argument(s)", args.len() - 1)
    }
    for x in args.iter().skip(1) {
        let mut n = rand::thread_rng().gen_range(1, args.len());
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
I get the error:
error[E0716]: temporary value dropped while borrowed
--> src/main.rs:18:22
|
18 | result.push(&args.swap_remove(n));
| ^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary which is freed while still in use
...
21 | for y in result.iter() {
| ------ borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
Older compilers said:
error[E0597]: borrowed value does not live long enough
--> src/main.rs:18:42
|
18 | result.push(&args.swap_remove(n));
| ------------------- ^ temporary value dropped here while still borrowed
| |
| temporary value created here
...
24 | }
| - temporary value needs to live until here
|
= note: consider using a `let` binding to increase its lifetime

Let's start with a smaller example. This is called a Minimal, Reproducible Example, and it is very valuable both for you as a programmer and for us when answering your question. Additionally, it can run on the Rust Playground, which is convenient.
fn main() {
    let mut args = vec!["a".to_string()];
    let mut result = vec![];
    for _ in args.iter() {
        let n = args.len() - 1; // Pretend this is a random index
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
The problem arises because when you call swap_remove, the item is moved out of the vector and given to you: ownership is transferred. You then take a reference to the item and try to store that reference in the result vector. The problem is that nothing owns the item, so it is dropped at the end of that statement. If you were allowed to keep that reference, it would be a dangling reference, one that points to invalid memory. Using that reference could cause a crash, so Rust prevents it.
The immediate fix is to not take a reference, but instead transfer ownership from one vector to the other. Something like:
for _ in args.iter() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
The problem with this is that you will get:
error[E0502]: cannot borrow `args` as mutable because it is also borrowed as immutable
--> src/main.rs:7:21
|
5 | for _ in args.iter() {
| -----------
| |
| immutable borrow occurs here
| immutable borrow later used here
6 | let n = args.len() - 1;
7 | result.push(args.swap_remove(n));
| ^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
See the args.iter()? That creates an iterator that refers to the vector. If you changed the vector, the iterator would become invalid and could allow access to an item that may no longer be there, another potential crash that Rust prevents.
I'm not making any claim that this is a good way to do it, but one solution would be to iterate while there are still items:
while !args.is_empty() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
I'd solve the overall problem by using shuffle:
use rand::seq::SliceRandom; // 0.8.3
use std::env;
fn main() {
    let mut args: Vec<_> = env::args().skip(1).collect();
    args.shuffle(&mut rand::thread_rng());
    for y in &args {
        println!("{}", y);
    }
}

Operating on an F# List of Union Types

This is a continuation of my question at F# List of Union Types. Thanks to the helpful feedback, I was able to create a list of Reports, with Report being either Detail or Summary. Here's the data definition once more:
module Data
type Section = { Header: string;
                 Lines: string list;
                 Total: string }
type Detail = { State: string;
                Divisions: string list;
                Sections: Section list }
type Summary = { State: string;
                 Office: string;
                 Sections: Section list }
type Report = Detail of Detail | Summary of Summary
Now that I've got the list of Reports in a variable called reports, I want to iterate over those Report objects and perform operations based on each one. The operations are the same except for the cases of dealing with either Detail.Divisions or Summary.Office. Obviously, I have to handle those differently. But I don't want to duplicate all the code for handling the similar State and Sections of each.
My first (working) idea is something like the following:
for report in reports do
    let mutable isDetail = false
    let mutable isSummary = false
    match report with
    | Detail _ -> isDetail <- true
    | Summary _ -> isSummary <- true
    ...
This will give me a way to know when to handle Detail.Divisions rather than Summary.Office. But it doesn't give me an object to work with. I'm still stuck with report, not knowing which it is, Detail or Summary, and also unable to access the attributes. I'd like to convert report to the appropriate Detail or Summary and then use the same code to process either case, with the exception of Detail.Divisions and Summary.Office. Is there a way to do this?
Thanks.

You could do something like this:
for report in reports do
    match report with
    | Detail { State = s; Sections = l }
    | Summary { State = s; Sections = l } ->
        // common processing for state and sections (using bound identifiers s and l)
        match report with
        | Detail { Divisions = l } ->
            // unique processing for divisions
        | Summary { Office = o } ->
            // unique processing for office

The answer by kvb is probably the approach I would use if I had the data structure you described. However, I think it makes sense to consider whether your data types are the best possible representation.
The fact that both Detail and Summary share two of the properties (State and Sections) perhaps implies that there is some common part of a Report that is shared regardless of the kind of report (and the report can either add Divisions if it is detailed or just Office if it is a summary).
Something like that would be better expressed using the following (Section stays the same, so I did not include it in the snippet):
type ReportInformation =
    | Divisions of string list
    | Office of string
type Report =
    { State : string;
      Sections : Section list;
      Information : ReportInformation }
If you use this style, you can just access report.State and report.Sections (to do the common part of the processing) and then you can match on report.Information to do the varying part of the processing.
EDIT - In answer to Jeff's comment - if the data structure is already fixed, but the view has changed, you can use F# active patterns to write an "adaptor" that provides access to the old data structure using the view I described above:
let (|Report|) = function
    | Detail dt -> dt.State, dt.Sections
    | Summary st -> st.State, st.Sections
let (|Divisions|Office|) = function
    | Detail dt -> Divisions dt.Divisions
    | Summary st -> Office st.Office
The first active pattern always succeeds and extracts the common part. The second allows you to distinguish between the two cases. Then you can write:
let processReport report =
    let (Report(state, sections)) = report
    // Common processing
    match report with
    | Divisions divs -> // Divisions-specific code
    | Office ofc -> // Offices-specific code
This is actually an excellent example of how F# active patterns provide an abstraction that allows you to hide implementation details.

kvb's answer is good, and probably what I would use. But the way you've expressed your problem sounds like you want classic inheritance.
type ReportPart(state, sections) =
    member val State = state
    member val Sections = sections
type Detail(state, sections, divisions) =
    inherit ReportPart(state, sections)
    member val Divisions = divisions
type Summary(state, sections, office) =
    inherit ReportPart(state, sections)
    member val Office = office
Then you can do precisely what you expect:
for report in reports do
    match report with
    | :? Detail as detail -> //use detail.Divisions
    | :? Summary as summary -> //use summary.Office
    //use common properties

You can pattern match on the Detail or Summary record in each of the union cases and handle the Divisions or Office value with a separate function, e.g.:
let blah =
    for report in reports do
        let out =
            match report with
            | Detail({ State = state; Divisions = divisions; Sections = sections } as d) ->
                Detail({ d with Divisions = (handleDivisions divisions) })
            | Summary({ State = state; Office = office; Sections = sections } as s) ->
                Summary( { s with Office = handleOffice office })
        //process out

You can refactor the code to have a utility function for each common field and use nested pattern matching:
let handleReports reports =
    reports |> List.iter (function
        | Detail {State = s; Sections = ss; Divisions = ds} ->
            handleState s
            handleSections ss
            handleDivisions ds
        | Summary {State = s; Sections = ss; Office = o} ->
            handleState s
            handleSections ss
            handleOffice o)
You can also filter Detail and Summary to process them separately in different functions:
let getDetails reports =
    List.choose (function Detail d -> Some d | _ -> None) reports
let getSummaries reports =
    List.choose (function Summary s -> Some s | _ -> None) reports

Simple Text Analysis library for C

I'm in the midst of creating my school project for our programming class.
I'm making a Medical Care system console app and I want to implement this kind of feature:
When a user enters what they are feeling (for example, that they feel sick or have a sore throat), I want a C text analysis library to help me analyze and parse the information the user gives (which has been saved into a string) and determine the medicine to be given. (I'll be the one to decide which medicine is for which; I just want the library to help me analyze the information given by the user.)
Thanks!
A good example would be this one:
http://www.codeproject.com/Articles/32175/Lucene-Net-Text-Analysis
Unfortunately, it's for C#.
Update:
Is there any C library that can help me with even simple tokenizing and indexing of words? I know I could do it by brute-force coding, but a reliable and stable API would be better. Thanks!

Analyzing natural language text is one of the most difficult problems you could possibly pick.
Most likely your solution will come down to simply looking for keywords like "sick", "sore throat", etc., which can be accomplished with a simple dictionary of keywords and results.
As far as truly "understanding" what the user typed though - good luck with that.
EDIT:
A few technologies worth pointing out:
Regarding your question about a lexer - you can easily use flex if you feel you need something like that. Probably faster (in terms of execution speed AND development speed) than trying to code the multi-token search by hand.
On Mac there is a very cool framework called Latent Semantic Mapping. There is a WWDC 2011 video on it - and it's awesome. You basically feed it a ton of example inputs and train it on what result you want. It may be as close as you're going to get. It is C-based.
http://en.wikipedia.org/wiki/Latent_semantic_mapping
https://developer.apple.com/library/mac/#documentation/TextFonts/Reference/LatentSemanticMapping/index.html

This is what wakkerbot makes of your question. (The scores are low, because wakkerbot/Hubert is all Dutch.)
But the tokeniser seems to do fine on English:
[ 6]: | 29/ 27| 4.792 | weight |
------|--------+----------+---------+--------+
0 11| 15645 | 10/ 9 | 0.15469 | 0.692 |'to'
1 0| 19416 | 10/10 | 0.12504 | 0.646 |'i'
2 10| 10483 | 4/ 3 | 0.10030 | 0.84 |'and'
3 3| 3292 | 5/ 5 | 0.09403 | 1.4 |'be'
4 7| 27363 | 3/ 3 | 0.06511 | 1.4 |'one'
5 12| 36317 | 3/ 3 | 0.06511 | 8.52 |'this'
6 2| 35466 | 2/ 2 | 0.05746 | 10.7 |'just'
7 4| 12258 | 2/ 2 | 0.05301 | 0.56 |'info'
8 18| 81898 | 2/ 2 | 0.04532 | 20.1 |'ll'
9 20| 67009 | 3/ 3 | 0.04124 | 48.8 |'text'
10 13| 70575 | 2/ 2 | 0.03897 | 156 |'give'
11 19| 16806 | 2/ 2 | 0.03426 | 1.13 |'c'
12 14| 5992 | 2/ 2 | 0.03376 | 0.914 |'for'
13 1| 3940 | 1/ 1 | 0.02561 | 1.12 |'my'
14 5| 7804 | 1/ 1 | 0.02561 | 2.94 |'class'
15 17| 7920 | 1/ 1 | 0.02561 | 7.35 |'feeling'
16 15| 20429 | 3/ 2 | 0.01055 | 3.93 |'com'
17 16| 36544 | 2/ 1 | 0.00433 | 4.28 |'www'
To support my lex/nonlex tokeniser argument, this is the relevant part of wakkerbot's tokeniser:
for(pos=0; str[pos]; ) {
    switch(*sp) {
    case T_INIT: /* initial */
        if (myisalpha(str[pos])) {*sp = T_WORD; pos++; continue; }
        if (myisalnum(str[pos])) {*sp = T_NUM; pos++; continue; }
        /* if (strspn(str+pos, "-+")) { *sp = T_NUM; pos++; continue; }*/
        *sp = T_ANY; continue;
        break;
    case T_ANY: /* either whitespace or meuk: eat it */
        pos += strspn(str+pos, " \t\n\r\f\b" );
        if (pos) {*sp = T_INIT; return pos; }
        *sp = T_MEUK; continue;
        break;
    case T_WORD: /* inside word */
        while ( myisalnum(str[pos]) ) pos++;
        if (str[pos] == '\0' ) { *sp = T_INIT;return pos; }
        if (str[pos] == '.' ) { *sp = T_WORDDOT; pos++; continue; }
        *sp = T_INIT; return pos;
        ...
As you can see, most of the time will be spent in the line while ( myisalnum(str[pos]) ) pos++;, which catches all the words. myisalnum() is a static function, which will probably be inlined. (There are similar tight loops for numbers and whitespace, of course.)
UPDATE: for completeness, the definition for myisalpha():
static int myisalpha(int ch)
{
    /* with <ctype.h>, this is a table lookup, too */
    int ret = isalpha(ch);
    if (ret) return ret;
    /* don't parse, just assume valid utf8 */
    if (ch == -1) return 0;
    if (ch & 0x80) return 1;
    return 0;
}

Yes, there's a C++ data science toolkit called MeTA (ModErn Text Analysis Toolkit). Its features include:
text tokenization, including deep semantic features like parse trees
inverted and forward indexes with compression and various caching strategies
a collection of ranking functions for searching the indexes
topic models
classification algorithms
graph algorithms
language models
CRF implementation (POS-tagging, shallow parsing)
wrappers for liblinear and libsvm (including libsvm dataset parsers)
UTF8 support for analysis on various languages
multithreaded algorithms
It comes with tests and examples. In your case I think statistical classifiers, such as Bayes, would do the job well, but you can also do manual classification. It was the best fit for my own case. Hope it helps.
Here's the link https://meta-toolkit.org/
Best Regards,
