Display valid LR(0) items - c++

I have to create a C++ program that displays the valid LR(0) items for SLR parsing (compiler design). So far I can read the grammar from the user and compute its closure, but I cannot proceed with the GOTO implementation. Can anyone provide links or code showing how to display the valid LR(0) items of a grammar?
-Thanks in advance

You're able to take the closure of the grammar? Technically, the closure function is defined on sets of items (which are sets of productions with a position associated with each production).
Now, you ask for how to display the valid LR(0) items of a grammar. You either mean displaying all the items, as defined in the paragraph above, or displaying all states of the LR(0) automaton. The first is trivial because all possible items are valid, so I'm guessing you want all states. This is what you do (straight from the dragon book).
Set<SetOfItems> getValidStates(Grammar G) {
    // S' -> S is the "first" production of G (which must be augmented)
    Set<SetOfItems> C = { closure({[S' -> *S]}) };
    bool added;  // declared outside the loop so the while condition can see it
    do {
        added = false;
        for (SetOfItems I : C) {
            for (Symbol X : G.symbols()) {
                SetOfItems L = GOTO(I, X);
                if (L.size() > 0 && !C.contains(L)) {
                    added = true;
                    C.add(L);
                }
            }
        }
    } while (added);
    return C;
}
The only question is how to implement GOTO(SetOfItems, Symbol).
So,
SetOfItems GOTO(SetOfItems S, Symbol X) {
    SetOfItems ret = {};
    for (Item I : S)
        if (I.nextSymbol() == X)  // never true when the dot is already at the end
            ret.add(I.moveDotByOne());
    return closure(ret);
}
Each item in the set has the form [A -> a*Yb], where A is the head of some production and aYb is the body of the production (a and b are strings of grammar symbols, possibly empty; Y is a single symbol). The '*' is the position I mentioned; it's not part of the grammar. Item.nextSymbol() returns whatever symbol is immediately to the right of the dot, so [A->a*Yb].nextSymbol() is Y, while an item whose dot is at the end, such as [A->aYb*], has no next symbol and is never moved by GOTO. [A->a*Yb].moveDotByOne() returns [A->aY*b].
Now, I just finished the parsing chapter in the compiler book, and I'm not completely happy with my understanding, so be careful with what I've written.
As for a link to real code: http://ftp.gnu.org/gnu/bison/ is where you'll find bison's source, but that's an LALR(1) parser generator. It does not emit LR(0) parsers, although its LALR construction starts from the LR(0) states.
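For reference, here is a self-contained C++17 sketch of the construction described above (closure, GOTO, and the loop that collects the states). It is not the answerer's code: the Production/Grammar/Item types, the string-based symbols, and the expressionGrammar() example are assumptions made for illustration. It also uses a worklist index instead of the do/while fixpoint loop, which yields the same collection. States are numbered in discovery order, so the numbering may not match the textbook's I0 to I11.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A production A -> X1 ... Xn; grammar symbols are plain strings.
struct Production {
    std::string head;
    std::vector<std::string> body;
};

// An LR(0) item is (production index, dot position in 0..body.size()).
using Item = std::pair<std::size_t, std::size_t>;
using ItemSet = std::set<Item>;

struct Grammar {
    std::vector<Production> prods;  // prods[0] must be the augmented S' -> S
    bool isNonterminal(const std::string& s) const {
        for (const auto& p : prods) if (p.head == s) return true;
        return false;
    }
    // Symbol right after the dot, or "" when the dot is at the end.
    std::string nextSymbol(const Item& it) const {
        const auto& body = prods[it.first].body;
        return it.second < body.size() ? body[it.second] : "";
    }
    std::set<std::string> symbols() const {
        std::set<std::string> out;
        for (const auto& p : prods) {
            out.insert(p.head);
            out.insert(p.body.begin(), p.body.end());
        }
        return out;
    }
};

// CLOSURE: for every nonterminal B right after a dot, add every item B -> .gamma.
ItemSet closure(const Grammar& g, ItemSet items) {
    for (bool added = true; added;) {
        added = false;
        for (const Item& it : ItemSet(items)) {  // iterate over a snapshot
            const std::string b = g.nextSymbol(it);
            if (b.empty() || !g.isNonterminal(b)) continue;
            for (std::size_t k = 0; k < g.prods.size(); ++k)
                if (g.prods[k].head == b && items.insert({k, 0}).second) added = true;
        }
    }
    return items;
}

// GOTO(I, X): advance the dot over X wherever X follows it, then take the closure.
ItemSet gotoSet(const Grammar& g, const ItemSet& items, const std::string& x) {
    ItemSet moved;
    for (const Item& it : items)
        if (g.nextSymbol(it) == x) moved.insert({it.first, it.second + 1});
    return closure(g, moved);
}

// Canonical LR(0) collection, i.e. the states of the LR(0) automaton.
std::vector<ItemSet> canonicalCollection(const Grammar& g) {
    assert(!g.prods.empty());
    std::vector<ItemSet> states{closure(g, ItemSet{Item{0, 0}})};
    const std::set<std::string> syms = g.symbols();
    for (std::size_t i = 0; i < states.size(); ++i) {  // `states` grows during the scan
        for (const std::string& x : syms) {
            ItemSet next = gotoSet(g, states[i], x);
            if (!next.empty() && std::find(states.begin(), states.end(), next) == states.end())
                states.push_back(std::move(next));
        }
    }
    return states;
}

// Prints each state as "I<n>:" followed by its items, with '.' marking the dot.
void printStates(const Grammar& g, const std::vector<ItemSet>& states) {
    for (std::size_t i = 0; i < states.size(); ++i) {
        std::cout << "I" << i << ":\n";
        for (const Item& it : states[i]) {
            const Production& p = g.prods[it.first];
            std::cout << "  " << p.head << " ->";
            for (std::size_t k = 0; k <= p.body.size(); ++k) {
                if (k == it.second) std::cout << " .";
                if (k < p.body.size()) std::cout << " " << p.body[k];
            }
            std::cout << "\n";
        }
    }
}

// Example input: the classic expression grammar, augmented with E' -> E.
Grammar expressionGrammar() {
    return Grammar{{{"E'", {"E"}}, {"E", {"E", "+", "T"}}, {"E", {"T"}},
                    {"T", {"T", "*", "F"}}, {"T", {"F"}},
                    {"F", {"(", "E", ")"}}, {"F", {"id"}}}};
}
```

Calling printStates(g, canonicalCollection(g)) with g = expressionGrammar() from main prints every state with its items; for that grammar the collection has 12 states, as in the dragon book.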

Related

Safely unwrap consecutively

I have an if statement which needs to check for the existence of a value in a nested Option. The statement currently looks like
if my_vec.get(0).is_some() && my_vec.get(0).unwrap().is_some() {
// My True Case
} else {
// My Else Case
}
I feel like this is an amateurish way of checking if this potential nested value exists. I want to maintain safety when fetching the Option from the array as it may or may not exist, and also when unwrapping the Option itself. I tried using and_then and similar operators but haven't had any luck.
I would check the length first and access it like a regular array instead of using .get(x) unless there is some benefit in doing so (like passing it to something which expects an option).
if my_vec.len() > x && my_vec[x].is_some() {
// etc
}
Another option is to just match the value with an if let or a full match statement.
if let Some(Some(_)) = my_vec.get(x) {
// etc
}
The matches! macro can also be used in this situation similarly to the if let when you don't need to take a reference to the data.
if matches!(my_vec.get(x), Some(Some(_))) {
// etc
}
Or the and_then version, though it is probably my least favorite, since it is longer and obscures the intention.
if my_vec.get(x).and_then(|y| y.as_ref()).is_some() {
// etc
}
You can pick whichever one is your favorite. They all compile down to the same thing (probably, I haven't checked).

define-fun macro & regex in Z3 C++ binding

I'm writing some code that uses Z3 strings to evaluate permissions in ACLs. So far, with SMT2 this has been relatively easy. Example code of what I'm trying to achieve:
(declare-const Group String)
(declare-const Resource String)
(define-fun acl1() Bool
(or (and
(= Group "employee")
(str.prefixof "shared/News_" Resource))
(and
(= Group "manager")
(or (str.prefixof "shared/Internal_" Resource)
(str.prefixof "shared/News_" Resource))
)))
(define-fun acl2() Bool
(and (and (str.prefixof "shared/" Resource)
(str.in.re Group re.allchar))
(not (and (str.prefixof "shared/Internal_" Resource)
(= Group "employee")))))
;; perm(acl1) <= perm(acl) iff acl1 => acl2
(define-fun conjecture() Bool
(=> (= acl1 true)
(= acl2 true)))
(assert (not conjecture))
(check-sat)
Reading the z3 c++ bindings, I can't figure out how to stick a z3::function to this yet. So far, assuming that define-fun is just a lisp macro, I have this.
#include <iostream>
#include <string>
#include <z3++.h>
z3::expr acl1(z3::context& c, z3::expr& G, z3::expr& R)
{
return (((G == c.string_val("employee")) &&
z3::prefixof(c.string_val("shared/News_"), R)) ||
((G == c.string_val("manager")) &&
(z3::prefixof(c.string_val("shared/Internal_"), R) ||
z3::prefixof(c.string_val("shared/News_"), R))));
}
z3::expr acl2(z3::context& c, z3::expr& G, z3::expr& R)
{
return ((z3::prefixof(c.string_val(""), G) &&
z3::prefixof(c.string_val("shared/"), R)) &&
!((G == c.string_val("employee")) &&
(z3::prefixof(c.string_val("shared/Internal"), R))));
}
z3::expr MakeStringFunction(z3::context* c, std::string s) {
z3::sort sort = c->string_sort();
z3::symbol name = c->str_symbol(s.c_str());
return c->constant(name, sort);
}
void acl_eval()
{
z3::context c;
auto Group = MakeStringFunction(&c, "Group");
auto Resource = MakeStringFunction(&c, "Resource");
auto acl1_f = acl1(c, Group, Resource);
auto acl2_f = acl2(c, Group, Resource);
auto conjecture = implies(acl1_f == c.bool_val(true),
acl2_f == c.bool_val(true));
z3::solver s(c);
s.add(!conjecture);
std::cout << s.to_smt2() << std::endl;
switch(s.check()){
case z3::unsat: std::cout<< "Valid Conjecture" << std::endl; break;
case z3::sat: std::cout << "Invalid Conjecture" << std::endl; break;
case z3::unknown: [[fallthrough]];
default:
std::cout << "Unknown" << std::endl;
}
}
int main(){
acl_eval();
return 0;
}
Is this how this is to be done wrt functions in C++ bindings?
While the SMT2 code generated by the C++ bindings doesn't look exactly like the original, I see a whole expr inside an assert with let bindings, which roughly does what I want. Additionally, I want to know whether the C++ bindings support regex functions like the ones Z3's SMTLib interface exposes. I can't find any examples and the docs aren't very clear.
In general, you do not need to create "functions" in SMTLib when you're using the C++ (or any other high-level) API. Instead, you simply write functions in those languages, which generate the required code directly. This does sound confusing at first, but it is the intended use case: SMTLib functions get replaced by functions in the host language. Running them in the host language then produces the necessary syntax trees in the object language; i.e., Z3's internal AST representation. Especially in your case, you do not need any "arguments" passed to these functions, so you shouldn't be creating any at all. So, what you did here is correct.
(Side note: There can be scenarios where you do want to spit out functions in SMTLib. For instance if you want to use uninterpreted functions. Or perhaps you want to use the recursive function definitions, which you cannot really do in the host language. But let's not conflate the matters here. If you do feel you actually do need them, please ask a separate question about that. From your description, I see no reason for them.)
Regarding regular-expression expressions: They're all available in the C++ API, take a look here: https://z3prover.github.io/api/html/z3_09_09_8h_source.html#l03334
In particular, the functions you're looking for are:
in_re: For checking membership
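re_full: Regular expression accepting all strings. Note that this corresponds to SMTLib's re.all, not to re.allchar: re.allchar matches exactly one character, so (str.in.re Group re.allchar) in the SMTLib version actually restricts Group to single-character strings.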
Hopefully that'll get you started!

Can I use virtual tokens (tokens with identical return value) in ANTLR4 similar to c++?

In C++ I can use virtual functions to process data from similar classes that have the same parent/ancestor, does ANTLR4 support this and how would I have to set up the grammar?
I have tried to set up a grammar, using arguments that have the same return value and use that value in a token that contains the different "subclassed" tokens.
Here is some code I have tried to work with:
amf_group
: statements=amf_statements (GROUPSEP WS? LINE_COMMENT? EOL? | EOF)
;
amf_statements returns [amf::AmfStatements stmts]
: ( WS? ( stmt=amf_statement { stmts.emplace_back(std::move($stmt.stmtptr)); } WS? EOL) )*
;
amf_statement returns [amf::AmfStatementPtr stmtptr]
: (
stmt = jsonparent_statement
| stmt = jsonvalue_statement
)
{
$stmtptr = std::move($stmt.stmtptr);
}
;
jsonparent_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
:
(T_JSONPAR { $lineno = $T_JSONPAR.line;} ) WS (arg=integer_const)
{
$stmtptr = std::make_shared<amf::JSONParentStatement>($lineno, nullptr);
}
;
jsonvalue_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
: ( T_JSONVALUE { $lineno = $T_JSONVALUE.line; } ) WS (arg=integer_const) (WS fmt=integer_const)?
{
$stmtptr = std::make_shared<amf::JSONValueStatement>($lineno, std::move($arg.argptr), std::move($fmt.argptr));
}
;
I receive the following error:
error(75): amf1.g4:23:10: label stmt=jsonvalue_statement type mismatch with previous definition: stmt=jsonparent_statement
This error is of course quite logical, because the tokens are indeed of different types, but their return value types are identical. For two (virtual) tokens I can write all the code separately, but in my case I have some 40+ different tokens that represent either arguments or statements, and writing all the combinations would be cumbersome. The above code did work in ANTLR3, by the way.
Is there another way to get around these errors using ANTLR4? Does anybody have any suggestions?
What's specified as a rule's return value is not really a return value in the functional sense. Instead, the context class representing the rule gets a new member field that holds the "return" value. Given that, it makes no sense to treat parser rules like C++ functions; they are simply not comparable.
Instead of handling all the fields in your grammar, I recommend a different approach: with ANTLR4 you get a parse tree (if enabled), which represents the matched rules using parser rule contexts (a superset of the AST generated previously). These contexts contain all the values that have been parsed. You just need a listener in a second step after the parse run (often called the semantic phase) to walk over this tree, pick those values up, and create your own data structures from them. This separation also lets you use your parser for quick syntax checks, since you don't do all the heavy work during the parse run.
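A minimal sketch of that two-phase idea in plain C++ follows. ParseNode is a hypothetical stand-in for the rule contexts ANTLR4 generates, and the statement classes only borrow their names from the question (their members are assumed). In a real program the body of buildStatement would live in listener (or visitor) callbacks for the individual statement rules. The point is that the virtual dispatch happens in your own class hierarchy, so the grammar needs no typed labels at all.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parse-tree node. With ANTLR4 this information
// would come from the generated rule contexts (token line numbers, children).
struct ParseNode {
    std::string rule;        // which rule matched, e.g. "jsonvalue_statement"
    int line = 0;            // line of the leading token
    std::vector<long> ints;  // values of the integer_const children
};

// Polymorphic statement classes; names follow the question, bodies are assumed.
struct AmfStatement {
    explicit AmfStatement(int l) : lineno(l) {}
    virtual ~AmfStatement() = default;
    virtual std::string describe() const = 0;
    int lineno;
};
using AmfStatementPtr = std::shared_ptr<AmfStatement>;

struct JSONParentStatement : AmfStatement {
    JSONParentStatement(int l, long a) : AmfStatement(l), arg(a) {}
    std::string describe() const override { return "JSONPARENT " + std::to_string(arg); }
    long arg;
};

struct JSONValueStatement : AmfStatement {
    JSONValueStatement(int l, long a, std::optional<long> f) : AmfStatement(l), arg(a), fmt(f) {}
    std::string describe() const override {
        std::string s = "JSONVALUE " + std::to_string(arg);
        if (fmt) s += " " + std::to_string(*fmt);
        return s;
    }
    long arg;
    std::optional<long> fmt;
};

// Semantic phase: one function maps each matched rule to its concrete class.
AmfStatementPtr buildStatement(const ParseNode& n) {
    if (n.rule == "jsonparent_statement")
        return std::make_shared<JSONParentStatement>(n.line, n.ints.at(0));
    if (n.rule == "jsonvalue_statement") {
        std::optional<long> fmt;
        if (n.ints.size() > 1) fmt = n.ints[1];
        return std::make_shared<JSONValueStatement>(n.line, n.ints.at(0), fmt);
    }
    return nullptr;  // unknown rule: the caller decides how to report it
}

// Visits the statement nodes in order, as a listener's exit callbacks would.
std::vector<AmfStatementPtr> buildAll(const std::vector<ParseNode>& nodes) {
    std::vector<AmfStatementPtr> out;
    for (const ParseNode& n : nodes)
        if (AmfStatementPtr s = buildStatement(n)) out.push_back(std::move(s));
    return out;
}
```

Adding one of the 40+ statement kinds then means one new class and one new branch (or callback), rather than new label combinations in the grammar.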

C++ Running actual code from file without compiling it [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I am trying to make a program which can read code from a file (similar to an interpreter, but not as complex), store it in a linear list and then execute it when needed.
This is what I want this "interpreter" to know:
variable declaration
array declaration
for, while, if control structures
input and output files (output files can be opened in append mode)
non recursive functions
Because I execute it command by command, I can stop/start after a specific number of executed commands, which is helpful when trying to do several things at the same time (similar to multithreading). For this reason I will name the class _MINI_THREAD. Here are the declarations of the struct COMMAND and the class _MINI_THREAD:
struct COMMAND
{
unsigned int ID;
char command_text[151];
COMMAND* next;
COMMAND* prev;
};
class _MINI_THREAD
{
public:
void allocate_memory()
{
while (start_of_list == NULL)
start_of_list = new (std::nothrow) COMMAND;
while (end_of_list == NULL)
end_of_list = new (std::nothrow) COMMAND;
start_of_list -> prev = NULL;
start_of_list -> next = end_of_list;
end_of_list -> prev = start_of_list;
end_of_list -> next = NULL;
}
void free_memory()
{
for(COMMAND* i=start_of_list -> next;i!=end_of_list;i=i->next)
delete i -> prev;
delete end_of_list -> prev;
delete end_of_list;
}
bool execute_command(unsigned int number_of_commands)
{
for(unsigned int i=0;i<number_of_commands;i++)
{
/*match id of current command pointed by the cursor with a function from the map*/
if (cursor==end_of_list) return false;
else cursor=cursor->next;
}
return true;
}
bool if_finished()
{
if (cursor==end_of_list)return true;
else return false;
}
unsigned int get_ticks()
{
return ticks_per_loop;
}
void set_ticks(unsigned int ticks)
{
ticks_per_loop = ticks;
}
private:
unsigned int ticks_per_loop;
COMMAND* cursor=NULL;
COMMAND* start_of_list=NULL;
COMMAND* end_of_list=NULL;
};
I also try to keep the syntax of the "invented code" in the source files as close as possible to C/C++ syntax, but sometimes I added a new parameter because it makes verification a lot easier. Notice that even the while has a name, so that I can manage nested loops faster.
Here is an example I came up with:
Source_file.txt
int a;
input_file fin ("numbers.in");
output_file fout ("numbers.out");
while loop_one ( fin.read(a,int,skipws) )
{
fout.print(a,int);
fout.print(32,char); /*prints a space after each number*/
}
close_input_file fin;
close_output_file fout;
/*This code is supposed to take all numbers from the input file and */
/* move them into the output file */
In the real program, the object thread1 of class _MINI_THREAD contains a dynamically allocated list (I will display it as an array for simpler understanding):
_MINI_THREAD thread1;
/*read from Source_file.txt each command into thread1 command array*/
thread1.commandarr={
define_integer("a"),
open_input_file("numbers.in",fin),
open_output_file("numbers.out",fout),
define_label_while("loop_one",fin.read()), /*if the condition is false the `cursor` will jump to label_while_end*/
type_to_file(fout,a,int),
type_to_file(fout,32,char),
label_while_return("loop_one"), /*returns the cursor to the first line after the while declaration*/
label_while_end("loop_one"), /*marks the line after the while return point*/
close_input_file("numbers.in",fin),
close_output_file("numbers.out",fout),
};
/*the cursor is already pointing at the first command (define_integer("a"))*/
/*this will execute commands until the cursor reaches the end_of_list*/
while(thread1.execute_commands(1))NULL;
thread1.free_memory();
Now my problem is actually implementing the IF_CONTROL_STRUCTURE, because you may want to write if (a==b) or if (foo()) etc., and I don't know how to test all of this.
I somehow managed to make the cursor move according to any structure (while, do ... while, for, etc.) with the idea of labels, but I still cannot check the condition each structure has.
You really want to write some interpreter (probably using some bytecode). Read more about semantics.
Writing a good interpreter well is not a trivial task. Consider using an existing one, e.g. Lua, Guile, Neko, Python, OCaml, ..., and take some time to study their free software implementations.
Otherwise, spend several months reading stuff, notably:
SICP is an absolute must to read (and freely downloadable).
the Dragon Book
Programming Language Pragmatics
Lisp In Small Pieces
about the SECD machine
the GC handbook
Notice that an entire book (at least) is needed to explain how an interpreter works. See also relevant SIGPLAN conferences.
Many (multi-thread friendly) interpreters have some GIL. A genuinely multi-threaded interpreter (without any GIL) is very difficult to design (what exactly would be its REPL ???), and a multi-threaded garbage collector is also very difficult to implement and debug (consider using an existing one, perhaps MPS or Boehm's GC).
So "your simple work" could require several years of full-time work (and could get you a PhD).
a simpler approach
After having read SICP and becoming familiar with some Lisp-like language (probably some Scheme, e.g. thru Guile), you could decide on some simpler approach (basically a tiny Lisp interpreter which you could code in a few hundred lines of C++; not as serious as full-fledged interpreters mentioned before).
You first need to define on paper, at least in English, the syntax and the semantics of your scripting language. Take a strong inspiration from Lisp and its S-expressions. You probably want your scripting language to be homoiconic (so your AST would be values of your language), and it would have (like Lisp) only expressions and no statements, so the conditional is ternary, like C++'s ?: operator.
You would represent the AST of your scripting language as some C++ data structure (probably some class with a few virtual methods). Parsing some script file into an AST (or a sequence of ASTs, maybe fed to some REPL) is so classical that I won't even explain it; you might use some parser generator, improperly called a compiler-compiler (like bison or lemon).
You would then at least implement some eval function. It takes two arguments Exp and Env: the first one, Exp, is the AST of the expression to be evaluated, and the second one, Env is some binding environment (defining the binding of local variables of your scripting language, it could be as simple as a stack of mapping from variables to values). And that eval function returns some value. It could be a member function of your AST class (then Exp is this, the receiver ....). Of course ASTs and values of your scripting language are some tagged union (which you might, if so wished, represent as a class hierarchy).
Implementing recursively such an eval in C++ is quite simple. Here is some pseudo code:
eval Exp Env :
if (Exp is some constant) {
return that constant }
if (Exp is a variable Var) {
return the bound value of that Var in Env }
if (Exp is some primitive binary operator Op /* like + */
with operands Exp1 Exp2) {
compute V1 = eval Exp1 Env
and V2 = eval Exp2 Env
return the application of Op /* eg addition */ on V1 and V2
}
if (Exp is a conditional If Exp1 Exp2 Exp3) {
compute V1 = eval Exp1 Env
if (V1 is true) {
compute V2 = eval Exp2 Env
return V2
} else { /*V1 is false*/
compute V3 = eval Exp3 Env
return V3
}
}
.... etc....
There are many other cases to consider (e.g. some While, some Let or LetRec which probably would augment Env, primitive operations of different arities, Apply of an arbitrary functional value to some sequence of arguments, etc ...) that are left as an exercise to the reader. Of course some expressions have side effects when evaluated.
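To make the pseudocode concrete, here is a small C++17 sketch of such an eval, using the class-with-virtual-methods representation mentioned above. The node names (Const, Var, BinOp, If, Let), the integer-only values, and the std::map environment are simplifying assumptions; a real interpreter would need a tagged union of values, functions, and side effects.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Values are plain integers in this sketch (0 is false, anything else is true).
using Value = long;
// Env binds variable names to values; Let evaluates its body in an extended copy.
using Env = std::map<std::string, Value>;

struct Expr {
    virtual ~Expr() = default;
    virtual Value eval(const Env& env) const = 0;  // "eval Exp Env"
};
using ExprPtr = std::shared_ptr<const Expr>;

struct Const : Expr {
    Value v;
    explicit Const(Value x) : v(x) {}
    Value eval(const Env&) const override { return v; }
};

struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    Value eval(const Env& env) const override {
        auto it = env.find(name);
        if (it == env.end()) throw std::runtime_error("unbound variable: " + name);
        return it->second;
    }
};

struct BinOp : Expr {
    char op;
    ExprPtr lhs, rhs;
    BinOp(char o, ExprPtr l, ExprPtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    Value eval(const Env& env) const override {
        Value v1 = lhs->eval(env), v2 = rhs->eval(env);
        switch (op) {
            case '+': return v1 + v2;
            case '-': return v1 - v2;
            case '*': return v1 * v2;
            case '<': return v1 < v2;
        }
        throw std::runtime_error(std::string("unknown operator: ") + op);
    }
};

// Only the selected branch is evaluated, which is what makes If a special form.
struct If : Expr {
    ExprPtr cond, thenE, elseE;
    If(ExprPtr c, ExprPtr t, ExprPtr e)
        : cond(std::move(c)), thenE(std::move(t)), elseE(std::move(e)) {}
    Value eval(const Env& env) const override {
        return cond->eval(env) != 0 ? thenE->eval(env) : elseE->eval(env);
    }
};

// let name = init in body: the body sees the new binding, the outer Env does not.
struct Let : Expr {
    std::string name;
    ExprPtr init, body;
    Let(std::string n, ExprPtr i, ExprPtr b)
        : name(std::move(n)), init(std::move(i)), body(std::move(b)) {}
    Value eval(const Env& env) const override {
        Env inner = env;
        inner[name] = init->eval(env);
        return body->eval(inner);
    }
};

// Small helpers so ASTs can be written inline.
ExprPtr num(Value v) { return std::make_shared<Const>(v); }
ExprPtr var(std::string n) { return std::make_shared<Var>(std::move(n)); }
ExprPtr bin(char op, ExprPtr l, ExprPtr r) { return std::make_shared<BinOp>(op, std::move(l), std::move(r)); }
ExprPtr iff(ExprPtr c, ExprPtr t, ExprPtr e) { return std::make_shared<If>(std::move(c), std::move(t), std::move(e)); }
ExprPtr let(std::string n, ExprPtr i, ExprPtr b) { return std::make_shared<Let>(std::move(n), std::move(i), std::move(b)); }
```

For example, let("x", num(3), iff(bin('<', var("x"), num(5)), bin('*', var("x"), num(10)), num(0)))->eval(Env{}) evaluates to 30. Each case of the pseudocode maps to one eval override, and adding While or function application means adding further node classes.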
Both SICP and Lisp In Small Pieces explain the idea quite well. Read about meta-circular evaluators. Don't code without having read SICP ...
The code chunk in your question is a design error (even the MINI_THREAD thing is a mistake). Take a few weeks to read more, throw your code in the trash bin, and start again. Be sure to use some version control system (I strongly recommend git).
Of course you want to be able to interpret recursive functions. They are not harder to interpret than non-recursive ones.
PS. I am very interested in your work. Please send me some email, and/or publish your tentative source code.

Function with a custom return type and the "false" return conditions?

I have a function that returns a custom class structure, but how should I handle the cases where I wish to inform the user that the function has failed, as in return false.
My function looks something like this:
Cell CSV::Find(std::string segment) {
Cell result;
// Search code here.
return result;
}
So when successful, it returns the proper result, but how should I handle the case when it fails?
I thought about adding a boolean method to Cell that checks whether Cell.data is empty (Cell.IsEmpty()). But am I thinking about this issue in a way that's too complicated?
There are three general approaches:
Use exceptions. This is what's in Bathsheba's answer.
Return std::optional<Cell> (or some other type which may or may not hold an actual Cell).
Return bool, and add a Cell & parameter.
Which of these is best depends on how you intend this function to be used. If the primary use case is passing a valid segment, then by all means use exceptions.
If part of the design of this function is that it can be used to tell if a segment is valid, exceptions aren't appropriate, and my preferred choice would be std::optional<Cell>. This may not be available on your standard library implementation yet (it's a C++17 feature); if not, boost::optional<Cell> may be useful (as mentioned in Richard Hodges's answer).
In the comments, instead of std::optional<Cell>, user You suggested expected<Cell, error> (not standard C++ at the time, but proposed for a future standard; it was later standardized as std::expected in C++23, and it can be implemented outside the std namespace for older standards). This may be a good option to add some indication of why no Cell could be found for the segment parameter passed in, if there are multiple possible reasons.
The third option I include mainly for completeness. I do not recommend it. It's a popular and generally good pattern in other languages.
Is this function a query, which could validly not find the cell, or is it an imperative, where the cell is expected to be found?
If the former, return an optional (or nullable pointer to) the cell.
If the latter, throw an exception if not found.
Former:
boost::optional<Cell> CSV::Find(std::string segment) {
boost::optional<Cell> result;
// Search code here.
return result;
}
Latter:
as you have it.
And of course there is the c++17 variant-based approach:
#include <variant>
#include <string>
struct CellNotFound {};
struct Cell {};
using CellFindResult = std::variant<CellNotFound, Cell>;
CellFindResult Find(std::string segment) {
CellFindResult result { CellNotFound {} };
// Search code here.
return result;
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
void cellsAndStuff()
{
std::visit(overloaded
{
[&](CellNotFound)
{
// the not-found code
},
[&](Cell c)
{
// code on cell found
}
}, Find("foo"));
}
The C++ way of dealing with abject failures is to define an exception class of the form:
struct CSVException : std::exception{};
In your function you then throw one of those in the failure branch:
Cell CSV::Find(std::string segment) {
Cell result;
// Search code here.
if (fail) throw CSVException();
return result;
}
You then handle the fail case with a try catch block at the calling site.
If however the "fail" branch is normal behaviour (subjective indeed but only you can be the judge of normality), then do indeed imbue some kind of failure indicator inside Cell, or perhaps even change the return type to std::optional<Cell>.
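A self-contained sketch of the exception approach, including the try/catch at the calling site, might look like this. The map-backed CSV class and the lookup() caller are hypothetical stand-ins for the real search code; the what() override is optional but gives the catch block a readable message.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>
#include <utility>

struct Cell {
    std::string data;
};

// Exception thrown on the failure branch; what() gives a readable message.
struct CSVException : std::exception {
    const char* what() const noexcept override { return "CSV segment not found"; }
};

// Hypothetical CSV: a name-to-cell map stands in for the real search code.
class CSV {
public:
    explicit CSV(std::map<std::string, Cell> cells) : cells_(std::move(cells)) {}

    Cell Find(const std::string& segment) const {
        auto it = cells_.find(segment);
        if (it == cells_.end()) throw CSVException();  // the failure branch
        return it->second;
    }

private:
    std::map<std::string, Cell> cells_;
};

// Calling site: the try/catch block decides what a failure means here.
std::string lookup(const CSV& csv, const std::string& segment) {
    try {
        return "found: " + csv.Find(segment).data;
    } catch (const CSVException& e) {
        return std::string("error: ") + e.what();
    }
}
```

Catching CSVException specifically, rather than std::exception, keeps unrelated errors (such as std::bad_alloc) propagating to a higher level.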
If you can use C++17, another approach would be to use an std::optional type as your return value. That's a wrapper that may or may not contain a value. The caller can then check whether your function actually returned a value and handle the case where it didn't.
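std::optional<Cell> CSV::Find(std::string segment) {
    Cell result;
    bool found = false;
    // Search code here; set found = true on success.
    if (!found)
        return std::nullopt;  // tells the caller there is no such cell
    return result;
}
void clientCode(CSV& csv) {
    auto cell = csv.Find("foo");
    if (cell) {
        // do stuff when found, e.g. with *cell
    } else {
        // handle not found
    }
}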
A further option is using multiple return values:
std::pair<Cell, bool> CSV::Find(std::string segment) {
    Cell result;
    bool found = false;
    // Search code here; set found = true on success.
    return {result, found};
}
// ...
auto cell = csv.Find("foo");
if (cell.second) {
    // do stuff with cell.first
}
The boolean flag says whether the requested Cell was found or not.
PROs
well known approach (e.g. std::map::insert);
quite direct: value and success indicator are return values of the function.
CONs
obscureness of first and second, which requires always remembering the relative positions of the values within the pair. C++17 structured bindings / if statement with initializer partially resolve this issue:
if (auto [result, found] = csv.Find("foo"); found) {
    // do stuff with `result`
}
possible loss of safety (the calling code has to check if there is a result value, before using it).
Details
Returning multiple values from functions in C++
C++ Error Handling - downside of using std::pair or std::tuple for returning error codes and function returns
For parsing, it is generally better to avoid std::string and instead use std::string_view; if C++17 is not available, minimally functional versions can be whipped up easily enough.
Furthermore, it is also important to track not only what was parsed but also the remainder.
There are two possibilities to track the remainder:
taking a mutable argument (by reference),
returning the remainder.
I personally prefer the latter, as in case of errors it guarantees that the caller has in its hands an unmodified value, which is useful for error reporting.
Then, you need to examine what potential errors can occur, and what recovery mechanisms you wish for. This will inform the design.
For example, if you wish to be able to parse ill-formed CSV documents, then it is reasonable that Cell be able to represent ill-formed CSV cells, in which case the interface is rather simple:
std::pair<Cell, std::string_view> consume_cell(std::string_view input) noexcept;
Where the function always advances and the Cell may contain either a proper cell, or an ill-formed one.
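As an illustration of that interface, here is one possible consume_cell. The Cell layout and the quoting rules (double-quoted cells with "" as an escaped quote, loosely following RFC 4180) are assumptions; the point is that the function always advances and reports ill-formed cells through the Cell itself rather than by throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical Cell that can also represent an ill-formed cell.
struct Cell {
    std::string_view raw;  // the cell's characters exactly as they appear in the input
    bool well_formed = true;
};

// Consumes one comma-terminated cell and returns it with the remainder after the
// comma. The caller's view is untouched, which helps when reporting errors.
std::pair<Cell, std::string_view> consume_cell(std::string_view input) noexcept {
    std::size_t end = 0;
    bool ok = true;
    if (!input.empty() && input.front() == '"') {
        // Quoted cell: find the closing quote, treating "" as an escaped quote.
        bool closed = false;
        std::size_t i = 1;
        while (i < input.size() && !closed) {
            if (input[i] == '"' && i + 1 < input.size() && input[i + 1] == '"') {
                i += 2;  // escaped quote inside the cell
            } else {
                closed = input[i] == '"';
                ++i;
            }
        }
        // Text between the closing quote and the next comma makes the cell ill-formed.
        end = input.find(',', i);
        if (end == std::string_view::npos) end = input.size();
        ok = closed && end == i;
    } else {
        end = input.find(',');
        if (end == std::string_view::npos) end = input.size();
    }
    Cell cell{input.substr(0, end), ok};
    std::string_view rest = end < input.size() ? input.substr(end + 1) : std::string_view{};
    return {cell, rest};
}
```

An unterminated quote swallows the rest of the input and is flagged as ill-formed, so a caller looping on the remainder always terminates.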
On the other hand, if you only wish to support well-formed CSV documents, then it is reasonable to signal errors via exceptions and that Cell only be able to hold actual cells:
std::pair<std::optional<Cell>, std::string_view> consume_cell(...);
And finally, you need to think about how to signal end-of-row conditions. It may be a simple marker on Cell, though at this point I personally prefer to create an iterator, as it presents a more natural interface since a row is a range of Cell.
The C++ interface for iterators is a bit clunky (as you need an "end", and the end is unknown before parsing), however I recommend sticking to it to be able to use the iterator with for loops. If you wish to depart from it, though, at least make it work easily with while, such as std::optional<Cell> cell; while ((cell = row.next())) { ... }.
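A hypothetical Row with such a next() member could be as small as the following sketch. Quoting is deliberately ignored to keep it short (the Cell here is just a view of the text), and a begin()/end() iterator could later wrap the same logic.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical minimal Cell: a view of the cell's text (quoting is ignored here).
struct Cell {
    std::string_view text;
};

// A row hands out its cells one at a time; next() returns std::nullopt at end of row.
class Row {
public:
    explicit Row(std::string_view line) : rest_(line) {}

    std::optional<Cell> next() {
        if (done_) return std::nullopt;
        std::size_t comma = rest_.find(',');
        if (comma == std::string_view::npos) {
            done_ = true;  // the text after the last comma is the final cell
            return Cell{rest_};
        }
        Cell cell{rest_.substr(0, comma)};
        rest_.remove_prefix(comma + 1);
        return cell;
    }

private:
    std::string_view rest_;
    bool done_ = false;
};
```

The separate done_ flag is what lets a trailing empty cell (as in "a,") be reported before end of row, since an empty remainder alone cannot distinguish that case from a finished row.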