I have a problem with my ANTLR grammar.
In SQL there are declaration types like UNSIGNED INT or UNSIGNED BIGINT. If I run my grammar with ANTLRWorks in the TestRig, the parser has a problem with UNSIGNED.
This is my grammar part for declare_type:
declare_type
: BIT
| BOOLEAN
| CHAR ('(' expression ')')?
| CHARACTER ('(' expression ')')?
...
Attempt 1:
...
| UNSIGNED? INT
| UNSIGNED? BIGINT
;
Attempt 2:
...
| INT
| BIGINT
| UNSIGNED INT
| UNSIGNED BIGINT
;
Attempt 3:
...
| INT
| BIGINT
| unsigned_separable_element
;
unsigned_separable_element
: UNSIGNED INT
| UNSIGNED BIGINT
;
I hope you guys can see what my problem is, thanks.
EDITED:
I uploaded the full grammar to GitHub
Example: DECLARE value UNSIGNED INT; doesn't work because the grammar doesn't recognize UNSIGNED.
If I use only INT then it works.
I went through the grammar and the problem is nothing exotic at all, just a typo in the UNSIGNED lexer rule... It was "U N S I G N D", missing the E.
Could you specify what kind of problem you have? Or at least post the whole grammar file? The way lexer rules are specified is important in solving a lot of problems in ANTLR. Thanks.
Related
I'm reading the BPF (Berkeley Packet Filter) core part in the Linux kernel, and I'm a little bit confused by the following.
This is the part of the code:
static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
u64 *stack)
{
u64 tmp;
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */
/* 32 bit ALU operations */
[BPF_ALU | BPF_ADD | BPF_X] = &&ALU_ADD_X,
[BPF_ALU | BPF_ADD | BPF_K] = &&ALU_ADD_K,
[BPF_ALU | BPF_SUB | BPF_X] = &&ALU_SUB_X,
[BPF_ALU | BPF_SUB | BPF_K] = &&ALU_SUB_K,
So, what I am wondering about is the role of the double ampersand. I already know about rvalue references in C++, but this is C, not C++, isn't it?
I would appreciate the help!
Even if this were C++, &&ALU_ADD_X and so on are expressions, not types, so the && couldn't indicate an rvalue reference.
If you scroll down a bit, you will find that all the ALU_* things, and default_label, are labels.
You will also find a goto *jumptable[op];, where op is a number.
GCC has an extension where you can take the "address" of a label as a value and use it as the target for goto.
&& is the operator that produces such a value.
A shorter example:
void example()
{
void* where = test_stuff() ? &&here : &&there;
goto *where;
here:
do_something();
return;
there:
do_something_else();
}
There's more information in the documentation (which is pretty much impossible to find unless you know what you're looking for).
see this document.
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
It looks like GCC non-standard syntax.
While trying to come up with a scheme for a bitboard class, I decided to use global compile-time variables to represent key bitboard configurations, e.g. the initial position of all black rooks.
constexpr uint64_t BLACK_ROOK_INIT = 0x1 | (0x1 << 56);
However I am getting compiler errors. The compiler appears to be treating this value as a 32-bit value, and type casting or adding additional 0's does not seem to make a difference. The type definition is from <cstdint>.
As soon as I drop constexpr from this expression it compiles, but it still produces the equivalent warning. Why is this happening? I thought it might be a limitation of the preprocessor, but the problem persists without constexpr.
chess.cpp:16:64: error: right operand of shift expression ‘(1 << 56)’ is >= than the precision of the left operand [-fpermissive]
FYI, this also does not compile
constexpr int64_t BLACK_ROOK_INIT = (int64_t)0x1 | (int64_t)(0x1 << 32);
This is what you want:
#include <iostream>
int main(){
constexpr uint64_t BLACK_ROOK_INIT = 0x1ULL | (0x1ULL << 56);
std::cout<<BLACK_ROOK_INIT<<std::endl;
}
Your 0x1 value is, by default, an int, which is usually implemented as a 32-bit integer.
The suffixes are discussed here. If they make you a bit uncomfortable, as they do me, you can cast as follows:
#include <iostream>
int main(){
constexpr uint64_t BLACK_ROOK_INIT = (uint64_t)(0x1) | ((uint64_t)(0x1) << 56);
std::cout<<BLACK_ROOK_INIT<<std::endl;
}
Here 1 is an int, and after shifting it crosses the limit of int.
So first we need to convert the int to long long int (or whatever type your requirements call for):
#include <bits/stdc++.h>
using namespace std;
int main(){
    int n;
    cin >> n;
    long long l = (long long)1 << n;
    cout << l << endl;
}
This question already has answers here:
Type of #define variables
(7 answers)
Closed 7 years ago.
If I define a macro as #define LOGIC_ONE 1 and want to use the LOGIC_ONE in a case statement, what type is LOGIC_ONE considered?
Is it recognized as an int since I am defining it for value 1?
C++ macros are simple text replacements.
By the time the compiler starts, your LOGIC_ONE has already been replaced by 1 by the preprocessor. It's just the same as if you had written 1 right away. (Which, in this case, is an int literal...)
Edit to include the discussion in the comments:
If you (or someone else with access to your code) changes your #define LOGIC_ONE 1 to #define LOGIC_ONE "1", it would change its behaviour in your program: LOGIC_ONE would then be a string literal of type const char[2].
Edit:
Since this post got more attention than I expected, I thought I'd add the references to the C++14 Standard for those curious:
2.2 Phases of translation [lex.phases]
(...)
4. Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. (...) All preprocessing directives are then deleted.
(...)
7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. (2.6). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
As stated, macros are replaced in phase 4 and are no longer present afterwards. Syntactic and semantic analysis takes place in phase 7, where the code gets compiled ("translated").
Integer literals are specified in
2.13.2 Integer literals [lex.icon]
(...)
An integer literal is a sequence of digits that has no period or exponent part, with optional separating single quotes that are ignored when determining its value. An integer literal may have a prefix that specifies its base and a suffix that specifies its type.
(...)
Table 5 — Types of integer literals
Suffix       | Decimal literal        | Binary, octal, or hexadecimal literal
-------------|------------------------|--------------------------------------
none         | int                    | int
             | long int               | unsigned int
             | long long int          | long int
             |                        | unsigned long int
             |                        | long long int
             |                        | unsigned long long int
-------------|------------------------|--------------------------------------
u or U       | unsigned int           | unsigned int
             | unsigned long int      | unsigned long int
             | unsigned long long int | unsigned long long int
-------------|------------------------|--------------------------------------
l or L       | long int               | long int
             | long long int          | unsigned long int
             |                        | long long int
             |                        | unsigned long long int
-------------|------------------------|--------------------------------------
Both u or U  | unsigned long int      | unsigned long int
and l or L   | unsigned long long int | unsigned long long int
-------------|------------------------|--------------------------------------
ll or LL     | long long int          | long long int
             |                        | unsigned long long int
-------------|------------------------|--------------------------------------
Both u or U  | unsigned long long int | unsigned long long int
and ll or LL |                        |
String literals are specified in
2.13.5 String literals [lex.string]
(...)
1 A string-literal is a sequence of characters (as defined in 2.13.3) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.
(...)
6 After translation phase 6, a string-literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
7 A string-literal that begins with u8, such as u8"asdf", is a UTF-8 string literal.
8 Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration (3.7).
Preprocessor defines have no type - they are fundamentally just "pasted" into the code where they appear. If, for example, you use it in the statement:
int foo = LOGIC_ONE;
Then it'll be interpreted as an integer. (The compiler, which runs after the preprocessor, just sees that code as int foo = 1;.) You can even use it in a grotty token-pasting macro such as:
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
int PASTE(foo, LOGIC_ONE);
Then you'll be creating a variable foo1. Yuk! (Note that ## is only processed during macro expansion, and the extra level of indirection is needed so LOGIC_ONE is expanded to 1 before the paste happens.)
Take an alternative example of a macro definition:
#define LOGIC_ONE hello
int LOGIC_ONE = 5;
printf("%d\n", hello);
That's perfectly valid, and declares an int called hello, but shows that there is no "type" for defines - hello was merely substituted wherever LOGIC_ONE was encountered in the code.
Avoid using preprocessor macros unless absolutely necessary. Professional coding standards often prohibit or severely restrict use of the preprocessor. There are almost always better ways to do things than macros. For example, consider these alternatives:
static const int LOGIC_ONE = 1;
enum { LOGIC_ONE = 1 };
The preprocessor is a quick way for a learner to get in a real mess in C.
LOGIC_ONE is replaced by 1 everywhere it appears. As far as the compiler is concerned, LOGIC_ONE doesn't exist; it just sees 1. So your question is really 'is 1 an int?', and the answer is: it depends on where you type the 1.
A macro is a text replacement. The literal 1 is of type int (and is a constant expression).
Assigning a 64-bit constant as
int64_t foo = 0x1234LL;
is not portable, because long long isn't necessarily int64_t. This post, Which initializer is appropriate for an int64_t?, discusses use of the INT64_C() macro from <stdint.h>, but isn't it also possible to use static_cast as
int64_t foo = static_cast<int64_t>(0x1234);
?
Which one should I prefer and why, or do both of them work well?
I have searched on the internet and on SO, but did not find any place where the static_cast option is explored. I have also done tests using sizeof() to confirm that it works in the simple cases.
Actually, long long is guaranteed to be at least 64 bits by the C implementation limits header <climits>. The minimum required range for an object of type long long is given as:
LLONG_MIN -9223372036854775807 // −(2^63 − 1)
LLONG_MAX +9223372036854775807 // 2^63 − 1
This corresponds to a signed 64 bit integer. You cannot store such a range of values without at least 64 information bits.
So go ahead and use 0x1234LL. In fact, you can even use no suffix at all, because the first of the following types that can fit the value will be chosen:
Suffix | Decimal constants | Octal or hexadecimal constant
-------|-------------------|------------------------------
none | int | int
| long int | unsigned int
| long long int | long int
| | unsigned long int
| | long long int
| | unsigned long long int
... | ... | ...
Is it possible to generate a parser for a scripting language that uses the Reverse Polish notation (and a Postscript-like syntax) using bison/yacc?
The parser should be able to parse code similar to the following one:
/fib
{
dup dup 1 eq exch 0 eq or not
{
dup 1 sub fib
exch 2 sub fib
add
} if
} def
Given the short description above and the notes on Wikipedia:
http://en.wikipedia.org/wiki/Stack-oriented_programming_language#PostScript_stacks
A simple bison grammar for the above could be:
%token ADD
%token DUP
%token DEF
%token EQ
%token EXCH
%token IF
%token NOT
%token OR
%token SUB
%token NUMBER
%token IDENTIFIER
%%
program : action_list_opt
action_list_opt : action_list
| /* No Action */
action_list : action
| action_list action
action : param_list_opt operator
param_list_opt : param_list
| /* No Parameters */
param_list : param
| param_list param
param : literal
| name
| action_block
operator : ADD
| DUP
| DEF
| EQ
| EXCH
| IF
| NOT
| OR
| SUB
literal : NUMBER
name : '/' IDENTIFIER
action_block : '{' program '}'
%%
Yes. Assuming you mean one that also uses postscript notation, it means you'd define your expressions something like:
expression: operand operand operator
Rather than the more common infix notation:
expression: operand operator operand
but that hardly qualifies as a big deal. If you mean something else by "Postscript-like", you'll probably have to clarify before a better answer can be given.
Edit: Allowing an arbitrary number of operands and operators is also pretty easy:
operand_list:
| operand_list operand
;
operator_list:
| operator_list operator
;
expression: operand_list operator_list
;
As it stands, this doesn't attempt to enforce the proper number of operators being present for any particular operand -- you'd have to add those checks separately. In a typical case, a postscript notation is executed on a stack machine, so most such checks become simple stack checks.
I should add that although you certainly can write such parsers in something like yacc, languages using postscript notation generally require so little parsing that you frequently feed them directly to some sort of virtual-machine interpreter that executes them quite directly (mostly, the parsing comes down to throwing an error if you attempt to use a name that hasn't been defined).