I'm writing some code that uses Z3 strings to evaluate permissions in ACLs. So far, with SMT2, this has been relatively easy. An example of what I'm trying to achieve:
(declare-const Group String)
(declare-const Resource String)
(define-fun acl1 () Bool
  (or (and (= Group "employee")
           (str.prefixof "shared/News_" Resource))
      (and (= Group "manager")
           (or (str.prefixof "shared/Internal_" Resource)
               (str.prefixof "shared/News_" Resource)))))
(define-fun acl2 () Bool
  (and (and (str.prefixof "shared/" Resource)
            (str.in.re Group re.allchar))
       (not (and (str.prefixof "shared/Internal_" Resource)
                 (= Group "employee")))))
;; perm(acl1) <= perm(acl2) iff acl1 => acl2
(define-fun conjecture () Bool
  (=> (= acl1 true)
      (= acl2 true)))
(assert (not conjecture))
(check-sat)
Reading the Z3 C++ bindings, I can't figure out how to express define-fun with them. So far, assuming that define-fun is essentially a Lisp macro, I have this:
#include <iostream>
#include <string>
#include <z3++.h>
z3::expr acl1(z3::context& c, z3::expr& G, z3::expr& R)
{
    return (((G == c.string_val("employee")) &&
             z3::prefixof(c.string_val("shared/News_"), R)) ||
            ((G == c.string_val("manager")) &&
             (z3::prefixof(c.string_val("shared/Internal_"), R) ||
              z3::prefixof(c.string_val("shared/News_"), R))));
}
z3::expr acl2(z3::context& c, z3::expr& G, z3::expr& R)
{
    // prefixof("", G) is always true, i.e. "any group"
    return ((z3::prefixof(c.string_val(""), G) &&
             z3::prefixof(c.string_val("shared/"), R)) &&
            !((G == c.string_val("employee")) &&
              (z3::prefixof(c.string_val("shared/Internal_"), R))));
}
z3::expr MakeStringFunction(z3::context* c, std::string s) {
    z3::sort sort = c->string_sort();
    z3::symbol name = c->str_symbol(s.c_str());
    return c->constant(name, sort);
}
void acl_eval()
{
    z3::context c;
    auto Group = MakeStringFunction(&c, "Group");
    auto Resource = MakeStringFunction(&c, "Resource");
    auto acl1_f = acl1(c, Group, Resource);
    auto acl2_f = acl2(c, Group, Resource);
    auto conjecture = implies(acl1_f == c.bool_val(true),
                              acl2_f == c.bool_val(true));
    z3::solver s(c);
    s.add(!conjecture);
    std::cout << s.to_smt2() << std::endl;
    switch (s.check()) {
    case z3::unsat: std::cout << "Valid Conjecture" << std::endl; break;
    case z3::sat: std::cout << "Invalid Conjecture" << std::endl; break;
    case z3::unknown: [[fallthrough]];
    default:
        std::cout << "Unknown" << std::endl;
    }
}
int main() {
    acl_eval();
    return 0;
}
Is this how functions are meant to be handled with the C++ bindings?
The SMT2 code generated by the C++ bindings doesn't look exactly like mine, but I see one whole expression inside an assert, with let bindings, which roughly does what I want. Additionally, I'd like to know whether the C++ bindings support regex functions like the ones Z3's SMT-LIB interface exposes. I can't find any examples, and the docs aren't very clear.
In general, you do not need to create "functions" in SMT-LIB when you're using the C++ (or any other high-level) API. Instead, you write ordinary functions in the host language, and calling them builds the required terms directly. This sounds confusing at first, but it is the intended use: SMT-LIB functions are replaced by functions in the host language, and running those functions produces the corresponding syntax trees in the object language, i.e., Z3's internal AST representation. In your case especially, the define-funs take no arguments (acl1 and acl2 are just named formulas), so there is no reason to create SMT-level functions at all. So what you did here is correct.
(Side note: there are scenarios where you do want to emit functions in SMT-LIB, for instance if you want to use uninterpreted functions, or recursive function definitions, which you cannot really express in the host language. But let's not conflate matters here. If you feel you actually need them, please ask a separate question about that. From your description, I see no reason for them.)
Regarding regular expressions: they're all available in the C++ API; take a look here: https://z3prover.github.io/api/html/z3_09_09_8h_source.html#l03334
In particular, the functions you're looking for are:
in_re: For checking membership
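re_full: regular expression accepting all strings. Note that this corresponds to SMT-LIB's re.all, not re.allchar: re.allchar matches exactly one character, so (str.in.re Group re.allchar) only holds when Group has length 1. For "any group", re_full (re.all) is the one you want.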
Hopefully that'll get you started!
I'm planning a bunch of refactorings on a large code base that I'd like to automate using Clang tooling. For this, I'm trying to write a Clang AST Matcher expression.
Specifically, I'm trying to match pairs of statements that I'd like to replace with something else, like
a(); => a_and_b(x);
b(x);
So I'm trying to match a call to a() (a callExpr()) followed by a call to b() (though the second could really be any statement). I have constructed matchers for the first and the second statement independently, let's call them aMatcher() and bMatcher(), but haven't found how to combine them so that they match only if the statements are back-to-back, something like bMatcher(follows(aMatcher())). None of the existing matchers seems to be pertinent (I looked for "next", "prev", "position", ...).
How do I go about this the correct way, please?
The implementation of UseAnyOfAllOfCheck contains a private nextStmt matcher:
/// Matches a Stmt whose parent is a CompoundStmt, and which is directly
/// followed by a Stmt matching the inner matcher.
AST_MATCHER_P(Stmt, nextStmt, ast_matchers::internal::Matcher<Stmt>,
              InnerMatcher) {
  DynTypedNodeList Parents = Finder->getASTContext().getParents(Node);
  if (Parents.size() != 1)
    return false;
  auto *C = Parents[0].get<CompoundStmt>();
  if (!C)
    return false;
  const auto *I = llvm::find(C->body(), &Node);
  assert(I != C->body_end() && "C is parent of Node");
  if (++I == C->body_end())
    return false; // Node is last statement.
  return InnerMatcher.matches(**I, Finder, Builder);
}
I don't know how robust that is, but I'll try to use it and report back.
I wonder whether it is possible to change a parser at runtime, given that the change does not affect the compound attribute.
Let's say I want to be able to change, at runtime, the character my parser uses to detect whether it has to join a line, from ; to ~. Both are just characters, and since the C++ types and template instantiations don't vary (in both cases we are talking about a char), I think there must be some way, but I can't find it. Is this possible?
My concrete situation is that I am calling the X3 parser via C++/CLI and need the character to be adjustable from .NET. I hope the following example is enough to understand my problem.
http://coliru.stacked-crooked.com/a/1cc2f2836dbfaa46
Kind regards
You cannot change the parser at runtime (except via a DSO trick I described under your other question, https://stackoverflow.com/a/56135824/3621421), but you can make your parser context-sensitive via semantic actions and/or stateful parsers (like x3::symbols).
The state for semantic actions (or for your custom parser) can also be stored in the parser context. However, folks usually use global or function-local variables for this purpose.
A simple example:
#include <boost/spirit/home/x3.hpp>
#include <cstring>
#include <iostream>
namespace x3 = boost::spirit::x3;
int main()
{
    char const* s = "sep=,\n1,2,3", * e = s + std::strlen(s);
    // Store the separator read after "sep=" in the context slot tagged sep_tag,
    // then accept a list separator only if it equals that stored character.
    auto p = "sep=" >> x3::with<struct sep_tag, char>('\0')[
        x3::char_[([](auto& ctx) { x3::get<struct sep_tag>(ctx) = _attr(ctx); })] >> x3::eol
        >> x3::int_ % x3::char_[([](auto& ctx) { _pass(ctx) = x3::get<struct sep_tag>(ctx) == _attr(ctx); })]
    ];
    if (parse(s, e, p) && s == e)
        std::cout << "OK\n";
    else
        std::cout << "Failed\n";
}
I am trying to make a program that can read code from a file (similar to an interpreter, but not as complex), store it in a linked list, and then execute it when needed.
This is what I want this "interpreter" to support:
variable declaration
array declaration
for, while, if control structures
input and output files (output files can be opened in append mode)
non recursive functions
Because I execute the code command by command, I can stop/start after a specific number of executed commands, which is helpful when trying to do several things at the same time (similar to multithreading); for this reason, I named the class _MINI_THREAD. Here are the declarations of struct COMMAND and class _MINI_THREAD:
struct COMMAND
{
    unsigned int ID;
    char command_text[151];
    COMMAND* next;
    COMMAND* prev;
};
class _MINI_THREAD
{
public:
    void allocate_memory()
    {
        while (start_of_list == NULL)
            start_of_list = new (std::nothrow) COMMAND;
        while (end_of_list == NULL)
            end_of_list = new (std::nothrow) COMMAND;
        start_of_list -> prev = NULL;
        start_of_list -> next = end_of_list;
        end_of_list -> prev = start_of_list;
        end_of_list -> next = NULL;
    }
    void free_memory()
    {
        for(COMMAND* i=start_of_list -> next;i!=end_of_list;i=i->next)
            delete i -> prev;
        delete end_of_list -> prev;
        delete end_of_list;
    }
    bool execute_command(unsigned int number_of_commands)
    {
        for(unsigned int i=0;i<number_of_commands;i++)
        {
            /*match id of current command pointed by the cursor with a function from the map*/
            if (cursor==end_of_list) return false;
            else cursor=cursor->next;
        }
        return true;
    }
    bool if_finished()
    {
        if (cursor==end_of_list)return true;
        else return false;
    }
    unsigned int get_ticks()
    {
        return ticks_per_loop;
    }
    void set_ticks(unsigned int ticks)
    {
        ticks_per_loop = ticks;
    }
private:
    unsigned int ticks_per_loop;
    COMMAND* cursor=NULL;
    COMMAND* start_of_list=NULL;
    COMMAND* end_of_list=NULL;
};
I also try to keep the syntax of the "invented" code in the source files as close as possible to C/C++ syntax, but sometimes I added an extra parameter because it makes verification a lot easier. Note that even the while loop has a name, so that I can manage nested loops faster.
Here is an example I came up with:
Source_file.txt
int a;
input_file fin ("numbers.in");
output_file fout ("numbers.out");
while loop_one ( fin.read(a,int,skipws) )
{
fout.print(a,int);
fout.print(32,char); /*prints a space after each number*/
}
close_input_file fin;
close_output_file fout;
/*This code is supposed to take all numbers from the input file and */
/* move them into the output file */
In the real program, the object thread1 of class _MINI_THREAD contains a dynamically allocated list (I will show it as an array to keep things simple):
_MINI_THREAD thread1;
/*read from Source_file.txt each command into thread1 command array*/
thread1.commandarr={
define_integer("a"),
open_input_file("numbers.in",fin),
open_output_file("numbers.out",fout),
define_label_while("loop_one",fin.read()), /*if the condition is false the `cursor` will jump to label_while_end*/
type_to_file(fout,a,int),
type_to_file(fout,32,char),
label_while_return("loop_one"), /*returns the cursor to the first line after the while declaration*/
label_while_end("loop_one"), /*marks the line after the while return point*/
close_input_file("numbers.in",fin),
close_output_file("numbers.out",fout),
};
/*the cursor is already pointing at the first command (define_integer("a"))*/
/*this will execute commands until the cursor reaches the end_of_list*/
while (thread1.execute_command(1)) {}
thread1.free_memory();
My actual problem is implementing the IF control structure, because you may want to write if (a==b) or if (foo()), etc., and I don't know how to evaluate all of these conditions.
Using labels, I somehow managed to make the cursor move correctly for every structure (while, do ... while, for, etc.), but I still cannot evaluate the condition of each structure.
You really want to write some interpreter (probably using some bytecode). Read more about semantics.
Writing a good interpreter is not a trivial task. Consider using an existing one, e.g. Lua, Guile, Neko, Python, OCaml, ..., and take some time to study their free software implementations.
Otherwise, spend several months reading stuff, notably:
SICP is an absolute must to read (and freely downloadable).
the Dragon Book
Programming Language Pragmatics
Lisp In Small Pieces
about the SECD machine
the GC handbook
Notice that an entire book (at least) is needed to explain how an interpreter works. See also relevant SIGPLAN conferences.
Many (multi-thread friendly) interpreters have a global interpreter lock (GIL). A genuinely multi-threaded interpreter (without any GIL) is very difficult to design (what exactly would its REPL be?), and a multi-threaded garbage collector is also very difficult to implement and debug (consider using an existing one, perhaps MPS or Boehm's GC).
So "your simple work" could require several years of full-time work (and could get you a PhD).
a simpler approach
After reading SICP and becoming familiar with some Lisp-like language (probably some Scheme, e.g. through Guile), you could decide on a simpler approach (basically a tiny Lisp interpreter that you could code in a few hundred lines of C++; not as serious as the full-fledged interpreters mentioned before).
You first need to define on paper, at least in English, the syntax and the semantics of your scripting language. Take strong inspiration from Lisp and its S-expressions. You probably want your scripting language to be homoiconic (so your AST would be values of your language), and it would have (like Lisp) only expressions and no statements. So the conditional would be an expression, like C++'s ternary ?: operator.
You would represent the AST of your scripting language as some C++ data structure (probably some class with a few virtual methods). Parsing a script file into an AST (or a sequence of ASTs, perhaps fed to some REPL) is so classical that I won't even explain it; you might use a parser generator (sometimes improperly called a compiler-compiler), such as bison or lemon.
You would then at least implement some eval function. It takes two arguments Exp and Env: the first one, Exp, is the AST of the expression to be evaluated, and the second one, Env is some binding environment (defining the binding of local variables of your scripting language, it could be as simple as a stack of mapping from variables to values). And that eval function returns some value. It could be a member function of your AST class (then Exp is this, the receiver ....). Of course ASTs and values of your scripting language are some tagged union (which you might, if so wished, represent as a class hierarchy).
Implementing such an eval recursively in C++ is quite simple. Here is some pseudocode:
eval Exp Env :
  if (Exp is some constant) {
    return that constant }
  if (Exp is a variable Var) {
    return the bound value of that Var in Env }
  if (Exp is some primitive binary operator Op /* like + */
      with operands Exp1 Exp2) {
    compute V1 = eval Exp1 Env
    and V2 = eval Exp2 Env
    return the application of Op /* eg addition */ on V1 and V2
  }
  if (Exp is a conditional If Exp1 Exp2 Exp3) {
    compute V1 = eval Exp1 Env
    if (V1 is true) {
      compute V2 = eval Exp2 Env
      return V2
    } else { /*V1 is false*/
      compute V3 = eval Exp3 Env
      return V3
    }
  }
  .... etc....
There are many other cases to consider (e.g. some While, some Let or LetRec which probably would augment Env, primitive operations of different arities, Apply of an arbitrary functional value to some sequence of arguments, etc ...) that are left as an exercise to the reader. Of course some expressions have side effects when evaluated.
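To make the pseudocode concrete, here is a minimal C++ sketch under my own simplifying assumptions: values are just long integers (nonzero counts as true), Env is a std::map from variable names to values that Let copies and extends, and the AST is a small class hierarchy with a virtual eval, as suggested above. Const, Var, BinOp, If and Let are illustrative names, not a standard API.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

using Value = long;                          // nonzero means "true"
using Env = std::map<std::string, Value>;    // variable name -> bound value

struct Expr {
    virtual ~Expr() = default;
    virtual Value eval(const Env& env) const = 0;
};
using ExprPtr = std::shared_ptr<const Expr>;

struct Const : Expr {                        // "Exp is some constant"
    Value v;
    explicit Const(Value val) : v(val) {}
    Value eval(const Env&) const override { return v; }
};

struct Var : Expr {                          // "Exp is a variable Var"
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    Value eval(const Env& env) const override {
        auto it = env.find(name);
        if (it == env.end()) throw std::runtime_error("unbound variable: " + name);
        return it->second;
    }
};

struct BinOp : Expr {                        // primitive binary operator Op
    char op;
    ExprPtr lhs, rhs;
    BinOp(char o, ExprPtr l, ExprPtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    Value eval(const Env& env) const override {
        Value v1 = lhs->eval(env), v2 = rhs->eval(env);
        switch (op) {
            case '+': return v1 + v2;
            case '-': return v1 - v2;
            case '*': return v1 * v2;
            case '<': return v1 < v2;
            default: throw std::runtime_error(std::string("unknown operator ") + op);
        }
    }
};

struct If : Expr {                           // only the chosen branch is evaluated
    ExprPtr cond, then_branch, else_branch;
    If(ExprPtr c, ExprPtr t, ExprPtr e)
        : cond(std::move(c)), then_branch(std::move(t)), else_branch(std::move(e)) {}
    Value eval(const Env& env) const override {
        return cond->eval(env) != 0 ? then_branch->eval(env) : else_branch->eval(env);
    }
};

struct Let : Expr {                          // binds name in a new, extended Env
    std::string name;
    ExprPtr init, body;
    Let(std::string n, ExprPtr i, ExprPtr b)
        : name(std::move(n)), init(std::move(i)), body(std::move(b)) {}
    Value eval(const Env& env) const override {
        Env extended = env;
        extended[name] = init->eval(env);
        return body->eval(extended);
    }
};
```

Built from these nodes, (let x 5 (if (< x 10) (+ x 1) 0)) evaluates to 6. Copying the map in Let is the simplest way to get lexical scoping; a real interpreter would use a chain of environment frames instead.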
Both SICP and Lisp In Small Pieces explain the idea quite well. Read about meta-circular evaluators. Don't code without having read SICP ...
The code chunk in your question is a design error (even the MINI_THREAD thing is a mistake). Take a few weeks to read more, throw your code in the trash bin, and start again. Be sure to use a version control system (I strongly recommend git).
Of course you want to be able to interpret recursive functions. They are not harder to interpret than non-recursive ones.
PS. I am very interested in your work. Please send me some email, and/or publish your tentative source code.
I have read that GOTO is bad, but how do I avoid it? I don't know how to program without GOTO. In BASIC I used GOTO for everything. What should I use instead in C and C++?
I used GOTO in BASIC like this:
MainLoop:
INPUT string$
IF string$ = "game" THEN
GOTO game
ENDIF
Consider the following piece of C++ code:
void broken()
{
    int i = rand() % 10;
    if (i == 0) // 1 in 10 chance.
        goto iHaveABadFeelingAboutThis;
    std::string cake = "a lie";
    // ...
    // lots of code that prepares the cake
    // ...
iHaveABadFeelingAboutThis:
    // 1 time out of ten, the cake really is a lie.
    eat(cake);
    // maybe this is where "iHaveABadFeelingAboutThis" was supposed to be?
    std::cout << "Thank you for calling" << std::endl;
}
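Ultimately, goto is not much different from C++'s other flow-control keywords (break, continue, throw, etc.); functionally, though, it introduces scope-related issues like the one demonstrated above. (Strictly speaking, a conforming C++ compiler rejects this particular function, because the goto jumps past the initialization of cake; in C, or with an uninitialized local such as int n; in C++, the same kind of jump compiles and silently leaves the variable indeterminate.)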
Relying on goto will teach you bad habits that produce difficult to read, difficult to debug and difficult to maintain code, and it will generally tend to lead to bugs. Why? Because goto is free-form in the worst possible way, and it lets you bypass structural controls built into the language, such as scope rules, etc.
Few of the alternatives are particularly intuitive, and some of them are arguably as ambiguous as "goto", but at least you are operating within the structure of the language - referring back to the above sample, it's much harder to do what we did in the above example with anything but goto (of course, you can still shoot yourself in the foot with for/while/throw when working with pointers).
Your options for avoiding it and using the language's natural flow control constructs to keep code humanly readable and maintainable:
Break your code up into subroutines.
Don't be afraid of small, discrete, well-named functions, as long as you are not perpetually hauling a massive list of arguments around (if you are, then you probably want to look at encapsulating with a class).
Many novices use "goto" because they write ridiculously long functions and then find that they want to get from line 2 of a 3000-line function to line 2998. In the above code, the bug created by goto is much harder to create if you split the function into two parts: the control logic and the actual work.
void haveCake() {
    std::string cake = "a lie";
    // ...
    // lots of code that prepares the cake
    // ...
    eat(cake);
}
void foo() {
    int i = rand() % 10;
    if (i != 0) // 9 times out of 10
        haveCake();
    std::cout << "Thanks for calling" << std::endl;
}
Some folks refer to this as "hoisting" (I hoisted everything that needed to be scoped with 'cake' into the haveCake function).
One-shot for loops.
These are not always obvious to programmers starting out: it looks like a for/while/do loop, but it's actually only intended to run once.
for ( ; ; ) { // 1-shot for loop.
    int i = rand() % 10;
    if (i == 0) // 1 time in 10
        break;
    std::string cake = "a lie";
    // << all the cakey goodness.
    // And here's the weakness of this approach.
    // If you don't "break" you may create an infinite loop.
    break;
}
std::cout << "Thanks for calling" << std::endl;
Exceptions.
These can be very powerful, but they can also require a lot of boilerplate. Plus, you can throw exceptions to be caught further up the call stack, or not at all (which terminates the program).
struct OutOfLuck {};
try {
    int i = rand() % 10;
    if (i == 0)
        throw OutOfLuck();
    std::string cake = "a lie";
    // << did you know: cake contains no fat, sugar, salt, calories or chemicals?
    if (cake.size() < MIN_CAKE)
        throw CakeError("WTF is this? I asked for cake, not muffin");
}
catch (OutOfLuck&) {} // we don't catch CakeError, that's Someone Else's Problem(TM).
std::cout << "Thanks for calling" << std::endl;
Formally, you should try to derive your exceptions from std::exception, but I'm sometimes kind of partial to throwing const char* strings, enums and occasionally struct Rock.
try {
    if (creamyGoodness.index() < 11)
        throw "Well, heck, we ran out of cream.";
} catch (const char* wkoft /*what kind of fail today*/) {
    std::cout << "CAKE FAIL: " << wkoft << std::endl;
    throw std::runtime_error(wkoft);
}
The biggest problem here is that exceptions are intended for handling errors, as in the second of the two examples immediately above, rather than for ordinary control flow.
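The first exception example uses CakeError and MIN_CAKE without defining them. Here is one hedged way to define them, following the advice above to derive from std::exception (here via std::runtime_error); the value of MIN_CAKE and the helper bake() are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

constexpr std::size_t MIN_CAKE = 10;         // hypothetical minimum "cake" size

struct OutOfLuck {};                         // plain tag type, used for control flow

// Deriving from std::runtime_error (and therefore std::exception) means callers
// can catch it generically as std::exception& and read the message via what().
class CakeError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error; // inherit the string constructors
};

// Mirrors the try block above: may throw OutOfLuck or CakeError.
std::string bake(int roll, const std::string& cake)
{
    if (roll == 0)
        throw OutOfLuck();
    if (cake.size() < MIN_CAKE)
        throw CakeError("WTF is this? I asked for cake, not muffin");
    return cake;
}
```

A caller that only knows about std::exception can still report a CakeError through what(), while OutOfLuck, which is not derived from std::exception, has to be caught by its own type.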
There are several reasons to use goto; the main ones are conditional execution, loops, and an "exit" routine.
Conditional execution is generally handled by if/else, and that should be enough.
Loops are handled by for, while and do while, and are further reinforced by continue and break.
The most difficult would be the "exit" routine; in C++, however, it is replaced by the deterministic execution of destructors. So to call a routine when exiting a function, you simply create an object that performs the action you need in its destructor. The immediate advantages are that you cannot forget to execute the action when adding another return, and that it works even in the presence of exceptions.
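A minimal sketch of such an exit object, assuming C++17 (for class template argument deduction); ScopeExit and work are illustrative names, not standard library types.

```cpp
#include <stdexcept>
#include <utility>

// Calls the stored action from its destructor, i.e. whenever the enclosing
// scope is left: by falling off the end, by an early return, or by an exception.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ~ScopeExit() { f_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    F f_;
};

// Example: the "exit routine" runs on every path out of the function.
void work(int& exit_count, bool fail)
{
    ScopeExit on_exit([&exit_count] { ++exit_count; });  // F deduced from the lambda
    if (fail)
        throw std::runtime_error("something went wrong");
    // ... normal processing ...
}
```

Whichever way work() leaves its scope, the lambda runs exactly once, so there is no label to jump to and nothing to forget when a new return is added.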
Loops like for, while and do while, together with functions, have more or less eliminated the need for GOTO. Learn to use those, and after a few examples you won't think about goto anymore.
:)
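For example, the BASIC MainLoop from the question maps onto a while loop plus a function call. In this sketch, play_game is a hypothetical stand-in for whatever code followed the game label, and the "quit" command is my own addition so the loop has an explicit exit.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for the code that lived at the BASIC "game" label.
void play_game(std::ostream& out)
{
    out << "Playing the game...\n";
}

// The while loop replaces the backward GOTO to MainLoop, and the function
// call replaces the jump to "game": control returns here when the game ends.
void main_loop(std::istream& in, std::ostream& out)
{
    std::string command;
    while (std::getline(in, command)) {   // INPUT string$; stops at end of input
        if (command == "game")
            play_game(out);
        else if (command == "quit")
            break;                        // leave the loop instead of jumping past it
        else
            out << "Unknown command: " << command << '\n';
    }
}
```

Calling main_loop(std::cin, std::cout) gives the interactive version; passing string streams makes the same function easy to test.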
Edsger Dijkstra published a famous letter titled Go To Statement Considered Harmful. You should read about it; he advocated for structured programming. The Wikipedia article on structured programming describes what you need to know. You can write structured programs with goto, but that is not a popular view these days; for that perspective, read Donald Knuth's Structured Programming with goto Statements.
goto is now displaced by other programming constructs like for, while, do-while, etc., which are easier to read. But goto still has its uses. I use it in a situation where different code blocks in a function (e.g., blocks that involve different conditional checks) share a single exit point. Apart from this one use, you should use the appropriate programming constructs for everything else.
goto is not inherently bad; it has its uses, just like any other language feature. You can completely avoid goto by using exceptions (try/catch) and loops, as well as appropriate if/else constructs.
However, if you find yourself going far out of your way just to avoid it, that may be an indication that it would be better to use it.
Personally, I use goto to implement functions with a single entry and exit point, which makes the code much more readable. This is the only place where I still find goto useful and where it actually improves the structure and readability of the code.
As an example:
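#include <fcntl.h>   /* POSIX open() */
#include <unistd.h>  /* POSIX close() */
int foo(const char* path1, const char* path2, const char* path3)
{
    int result = -1;
    int fd1 = -1;
    int fd2 = -1;
    int fd3 = -1;
    fd1 = open(path1, O_RDONLY);
    if (fd1 == -1)
        goto Quit;
    fd2 = open(path2, O_RDONLY);
    if (fd2 == -1)
        goto Quit;
    fd3 = open(path3, O_RDONLY);
    if (fd3 == -1)
        goto Quit;
    /* ... do your stuff here ... */
    result = 0;
Quit:
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    return result;
}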
In C++ you will find that the need for such structures is drastically reduced if you properly implement classes that encapsulate access to resources; smart pointers are one example.
In the above sample, you would implement/use a file class in C++, so that when it gets destroyed, the file handle is closed as well. Using classes also has the advantage that it works when exceptions are thrown, because local objects are destroyed as the stack unwinds. So in C++ you should definitely use classes with destructors to achieve this.
When you code in C, keep in mind that extra blocks also add complexity, which in turn makes the code harder to understand and to control. I would prefer a well-placed goto any time over a series of artificial if/else clauses just to avoid it. And if you later have to revisit the code, you can still understand it without following all the extra blocks.
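A sketch of such a file class, wrapping the C stdio FILE* (the names File and process are illustrative, and it uses fopen/fclose rather than POSIX file descriptors). Each early return closes exactly the files that were already opened, with no Quit label.

```cpp
#include <cstdio>

// Owns a FILE* and closes it in the destructor, so every exit path
// (return or exception) releases the handle.
class File {
public:
    File(const char* path, const char* mode) : fp_(std::fopen(path, mode)) {}
    ~File() { if (fp_) std::fclose(fp_); }
    File(const File&) = delete;              // one owner per handle
    File& operator=(const File&) = delete;
    bool is_open() const { return fp_ != nullptr; }
    std::FILE* get() const { return fp_; }
private:
    std::FILE* fp_;
};

// Same shape as foo() above, without a Quit label: returning early
// destroys (and thus closes) whichever File objects were already opened.
int process(const char* path1, const char* path2, const char* path3)
{
    File f1(path1, "r");
    if (!f1.is_open()) return -1;
    File f2(path2, "r");
    if (!f2.is_open()) return -1;            // closes f1
    File f3(path3, "r");
    if (!f3.is_open()) return -1;            // closes f2 and f1
    // ... do your stuff here, e.g. std::fgetc(f1.get()) ...
    return 0;
}
```

In real code, std::unique_ptr with a custom deleter or std::fstream would give the same guarantee without writing the class by hand.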
Maybe instead of
if (something_happens)
    goto err;
err:
print_log();
one can use:
bool seterrbool = false;
do {
    if (something_happens)
    {
        seterrbool = true;
        break; // leaves the block without using goto
    }
} while (false); // the body runs only once anyway
if (seterrbool)
    print_log();
It may not look friendlier here, because the example above has only one goto, but it becomes more readable when there are many gotos.
This implementation of the function above avoids gotos. Note that it does NOT actually loop: the do/while body runs exactly once, and the compiler will optimize the loop construct away. I prefer this implementation.
Using 'break' and 'continue', goto statements can (almost?) always be avoided.
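#include <fcntl.h>   /* POSIX open() */
#include <unistd.h>  /* POSIX close() */
int foo(const char* path1, const char* path2, const char* path3)
{
    int result = -1;
    int fd1 = -1;
    int fd2 = -1;
    int fd3 = -1;
    do
    {
        fd1 = open(path1, O_RDONLY);
        if (fd1 == -1)
            break;
        fd2 = open(path2, O_RDONLY);
        if (fd2 == -1)
            break;
        fd3 = open(path3, O_RDONLY);
        if (fd3 == -1)
            break;
        /* ... do your stuff here ... */
        result = 0;
    }
    while (0);
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    return result;
}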
BASIC was originally an interpreted language. Early BASIC had no block structures, so programs relied on GOTO to jump to a specific line, much like a jump in assembly. This makes the program flow hard to follow and debugging more complicated.
Pascal, C and all modern high-level programming languages, including Visual Basic (which was based on BASIC), are structured, with statements grouped into blocks. For example, VB has Do...Loop, While...End While and For...Next. Even some old derivatives of BASIC, like Microsoft QuickBASIC, support such structures:
DECLARE SUB PrintSomeStars (StarCount!)
REM QuickBASIC example
INPUT "What is your name: ", UserName$
PRINT "Hello "; UserName$
DO
INPUT "How many stars do you want: ", NumStars
CALL PrintSomeStars(NumStars)
DO
INPUT "Do you want more stars? ", Answer$
LOOP UNTIL Answer$ <> ""
Answer$ = LEFT$(Answer$, 1)
LOOP WHILE UCASE$(Answer$) = "Y"
PRINT "Goodbye "; UserName$
END
SUB PrintSomeStars (StarCount)
REM This procedure uses a local variable called Stars$
Stars$ = STRING$(StarCount, "*")
PRINT Stars$
END SUB
Another example in Visual Basic .NET
Public Module StarsProgram
Private Function Ask(prompt As String) As String
Console.Write(prompt)
Return Console.ReadLine()
End Function
Public Sub Main()
Dim userName = Ask("What is your name: ")
Console.WriteLine("Hello {0}", userName)
Dim answer As String
Do
Dim numStars = CInt(Ask("How many stars do you want: "))
Dim stars As New String("*"c, numStars)
Console.WriteLine(stars)
Do
answer = Ask("Do you want more stars? ")
Loop Until answer <> ""
Loop While answer.StartsWith("Y", StringComparison.OrdinalIgnoreCase)
Console.WriteLine("Goodbye {0}", userName)
End Sub
End Module
C++ uses similar constructs, such as if, for, do, while and switch, which together define the program flow. You don't need goto to jump to the next statement. In specific cases you can still use goto if it makes the control flow clearer, but in general there's no need for it.
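For comparison, here is a rough C++ translation of the Visual Basic .NET example, using only do/while loops and no goto. It reads from and writes to streams so it can be driven by std::cin/std::cout or by string streams; like CInt, std::stoi throws on non-numeric input, and clamping negative counts to zero is my own choice.

```cpp
#include <cctype>
#include <iostream>
#include <string>

// Writes the prompt and reads one line (empty string at end of input).
std::string ask(std::istream& in, std::ostream& out, const std::string& prompt)
{
    out << prompt;
    std::string line;
    std::getline(in, line);
    return line;
}

void stars_program(std::istream& in, std::ostream& out)
{
    std::string user_name = ask(in, out, "What is your name: ");
    out << "Hello " << user_name << '\n';
    std::string answer;
    do {
        int num_stars = std::stoi(ask(in, out, "How many stars do you want: "));
        if (num_stars < 0)
            num_stars = 0;
        out << std::string(num_stars, '*') << '\n';
        do {                                   // re-ask until non-empty (or end of input)
            answer = ask(in, out, "Do you want more stars? ");
        } while (answer.empty() && in);
    } while (!answer.empty() &&
             std::toupper(static_cast<unsigned char>(answer[0])) == 'Y');
    out << "Goodbye " << user_name << '\n';
}
```

Calling stars_program(std::cin, std::cout) runs it interactively, with the same nested-loop structure as the VB version.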
I have to create a C++ program that displays the valid LR(0) items for SLR parsing (compiler design). So far I am able to take the grammar as input from the user and find its closure, but I am not able to proceed further with the goto implementation in SLR. Can anyone please provide links or code showing how to display the valid LR(0) items of a grammar?
-Thanks in advance
You're able to take the closure of the grammar? Technically, the closure function is defined on sets of items (an item is a production with a position marked somewhere in its body).
Now, you ask for how to display the valid LR(0) items of a grammar. You either mean displaying all the items, as defined in the paragraph above, or displaying all states of the LR(0) automaton. The first is trivial because all possible items are valid, so I'm guessing you want all states. This is what you do (straight from the dragon book).
Set<SetOfItems> getValidStates(Grammar G) {
    // S' -> S is the "first" production of G (which must be augmented)
    Set<SetOfItems> C = { closure({[S' -> *S]}) };
    bool added;
    do {
        added = false;
        for (SetOfItems I : C) {
            for (Symbol X : G.symbols()) {
                SetOfItems L = GOTO(I, X);
                if (L.size() > 0 && !C.contains(L)) {
                    added = true;
                    C.add(L);
                }
            }
        }
    } while (added);
    return C;
}
The only question is how to implement GOTO(SetOfItems, Symbol).
So,
SetOfItems GOTO(SetOfItems S, Symbol X) {
    SetOfItems ret = {};
    for (Item I : S)
        if (I.nextSymbol().equals(X))
            ret.add(I.moveDotByOne());
    return closure(ret);
}
Each item in the set has the form [A -> a*Yb], where A is the head of some production and aYb is the body of the production (a and b are just strings of grammar symbols, and Y is a single symbol). The '*' is just the position I mentioned; it's not in the grammar, and [A->a*Yb].nextSymbol() is Y. Basically, Item.nextSymbol() just returns whatever symbol is to the right of the dot, and [A->a*Yb].moveDotByOne() returns [A->aY*b].
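Putting CLOSURE, GOTO and the loop above together, here is a compact C++ sketch. The types Production, Item and Grammar are my own (an item is stored as a production index plus a dot position), and instead of the "while anything was added" flag it simply scans the growing list of states until no new ones appear, which computes the same collection. With the textbook expression grammar E' -> E, E -> E + T | T, T -> T * F | F, F -> ( E ) | id, it yields the 12 states I0 to I11 shown in the dragon book.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Production {
    std::string head;
    std::vector<std::string> body;
};

// An LR(0) item [A -> a * b], stored as (production index, dot position).
using Item = std::pair<std::size_t, std::size_t>;
using ItemSet = std::set<Item>;

struct Grammar {
    std::vector<Production> prods;   // prods[0] must be the augmented S' -> S
    std::set<std::string> symbols;   // all grammar symbols (terminals and nonterminals)

    bool isNonterminal(const std::string& s) const {
        for (const Production& p : prods)
            if (p.head == s) return true;
        return false;
    }
    // Symbol right after the dot ("nextSymbol"), or "" if the dot is at the end.
    std::string nextSymbol(const Item& it) const {
        const std::vector<std::string>& body = prods[it.first].body;
        return it.second < body.size() ? body[it.second] : std::string();
    }
};

// CLOSURE: for every item [A -> a * B b] with nonterminal B, add all [B -> * g].
ItemSet closure(const Grammar& g, ItemSet items) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Item& it : ItemSet(items)) {          // iterate over a snapshot
            std::string B = g.nextSymbol(it);
            if (B.empty() || !g.isNonterminal(B)) continue;
            for (std::size_t p = 0; p < g.prods.size(); ++p)
                if (g.prods[p].head == B && items.insert(Item{p, 0}).second)
                    changed = true;
        }
    }
    return items;
}

// GOTO(I, X): move the dot over X wherever X follows it, then take the closure.
ItemSet gotoSet(const Grammar& g, const ItemSet& items, const std::string& X) {
    ItemSet moved;
    for (const Item& it : items)
        if (g.nextSymbol(it) == X)
            moved.insert(Item{it.first, it.second + 1});   // moveDotByOne
    return closure(g, moved);
}

// Canonical collection of LR(0) item sets, i.e. the states of the LR(0) automaton.
std::vector<ItemSet> canonicalCollection(const Grammar& g) {
    ItemSet start{Item{0, 0}};                             // [S' -> * S]
    std::vector<ItemSet> C{closure(g, start)};
    for (std::size_t i = 0; i < C.size(); ++i)             // C grows while we scan it
        for (const std::string& X : g.symbols) {
            ItemSet next = gotoSet(g, C[i], X);
            if (!next.empty() && std::find(C.begin(), C.end(), next) == C.end())
                C.push_back(next);
        }
    return C;
}
```

To display the states, iterate over the result and print each item as its production with a '*' inserted at the dot position.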
Now, I just finished the parsing chapter in the compiler book, and I'm not completely happy with my understanding, so be careful with what I've written.
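As for a link to real code: http://ftp.gnu.org/gnu/bison/ is where you'll find Bison's source, but that's an LALR parser generator, so it won't map directly onto an LR(0) exercise (though, as far as I know, it builds the LR(0) automaton first and then computes the LALR lookaheads on top of it).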