I'm currently embedding Lua and using it as a glorified intelligent config file. However, I think I'm missing something since people rave about the uses of Lua.
For example, I can easily explain why you might use shell scripting instead of C by showing this example (admittedly, Boost.Regex is overkill):
#include <dirent.h>
#include <stdio.h>
#include <boost/regex.hpp>

int main(int argc, char *argv[]) {
    DIR *d;
    struct dirent *dir;
    boost::regex re(".*\\.cpp$");
    if (argc == 2) d = opendir(argv[1]); else d = opendir(".");
    if (d) {
        while ((dir = readdir(d)) != NULL) {
            if (boost::regex_match(dir->d_name, re)) printf("%s\n", dir->d_name);
        }
        closedir(d);
    }
    return 0;
}
and compare it to:
for foo in *.cpp; do echo "$foo"; done
Are there any examples that you can give in Lua which can make it 'click' for me?
EDIT: Maybe my problem is that I don't know Lua well enough yet to use it fluently as I'm finding it easier to write C code.
EDIT2:
One example is a toy factorial program in C++ and Lua:
#include <iostream>

int fact(int n) {
    if (n == 0) return 1;
    else return n * fact(n - 1);
}

int main() {
    int input;
    using namespace std;
    cout << "Enter a number: ";
    cin >> input;
    cout << "factorial: " << fact(input) << endl;
    return 0;
}
Lua:
function fact(n)
    if n == 0 then
        return 1
    else
        return n * fact(n - 1)
    end
end

print("enter a number")
a = io.read("*number")
print("Factorial: ", fact(a))
Here the programs look alike, but there's clearly some cruft in the include, namespace, and main() declarations that you can get rid of. Lua also removes the variable declarations and static typing.
Now are people saying this is the advantage which adds up over a larger program, or is there more to it? This doesn't stand out in the same way as the bash example.
Using a scripting language such as Lua has many other benefits.
A couple of advantages to Lua vs. C++:
It's often shorter in terms of development time due to the high-level nature, as in your example.
It doesn't require recompilation to change behavior.
Behavior can be changed on non-development machines.
Prototyping is very fast and easy, since you can just tweak logic at runtime.
Scripting languages reduce the effort required to build complex GUIs that otherwise require a lot of framework glue and repetition of code. Several GUI toolkits are available with Lua bindings, including wxWidgets and the IUP toolkit.
In both of those bindings, first class function values and full closures make event callbacks easy to code and easy to use.
A large application using Lua at its core (such as Adobe Photoshop Lightroom) has an outer C/C++ program that hosts the Lua interpreter and provides access to its core features by registering C functions with that interpreter. It typically implements compute-intensive core functions in C functions, but leaves the overall flow, operation, and even the GUI layout to Lua scripts.
I have found in my own projects that it is often the case that the stock standalone Lua interpreter (lua.exe or wlua.exe) is sufficient for the outer application when combined with IUP loaded at run time along with one or two custom DLL-based Lua modules coded in C that implement features that require that level of performance, or features that are implemented via other C-callable libraries.
The important points for my projects have included:
True tail calls allow for an easy expression of finite state machines.
Garbage collected memory management.
Anonymous functions, closures, first class function values.
Hash tables.
Rich enough string library.
Userdata extends the garbage collector to C side allocations.
Metatables allow a rich variety of object oriented and functional techniques.
Small but sufficiently powerful C API.
Good documentation, with open source as a backup.
Good user to user support through the mailing list and wiki.
Powerful modules such as a PEG parser available from the authors and from the community.
One of my favorite examples to cite is a test jig I built for an embedded system that required about 1000 lines of Lua and 1000 lines of C, ran under lua.exe, and used IUP to present a full Windows GUI. The first version was running in about a day. In C++ with MFC, it would have been at least a week's work, and many thousands of lines of code.
I don't know if I can make it 'click' for you, but I'll try.
One of the advantages of embedding Lua is that you can use it not only as a config file, but actually expose your C/C++ interfaces to Lua and 'script' their usage in the Lua language.
If you want to change the behaviour/logic of your application, you just change the code in the Lua script, without having to recompile the whole application.
Prominent uses are game logic such as AI or state machines, where a fast round-trip time from change to play is essential for developing the game.
Of course, for this to be used effectively, the main logic then has to live in the Lua script, not in the C/C++ code.
Try to implement a Lua table in C/C++ and you'll see the strength of Lua right there.
In Lua:
a["index"] = "value"
In C, start by reading up on linked lists...
The C++ STL may help, but it is going to be a lot more verbose than Lua.
Also, Lua makes great glue. It is so easy (IMHO) to interface to C.
Programming just in C can be a very tedious and redundant task; this certainly applies when you compare it to more abstract, high-level languages.
In that sense, you can get started and finish things much more quickly than by doing everything directly in C, because many things that need to be set up, done, and cleaned up explicitly and manually in C are handled implicitly and automatically by a scripting language such as Lua (imagine memory management).
Similarly, many abstract data structures and algorithms are provided directly by such high-level languages, so you don't have to reinvent the wheel and re-implement one if all you need is a standard container (think linked list, tree, map, etc.).
So you can get a fairly good ROI when using a fairly abstract scripting language such as Lua or even Python, especially if the language comes with a good library of core functionality.
Scripting is thus ideal for prototyping ideas and projects, because that's when you need to concentrate on your actual effort rather than on the mechanical redundancies that are likely to be identical for most projects.
Once you have a basic prototype done, you can always see how to improve and optimize it further, possibly re-implementing essential key functionality in C in order to improve runtime performance.
Lua has closures, and closures rock. For example:
function newCounter()
    local i = 0
    return function () -- anonymous function
        i = i + 1
        return i
    end
end

c1 = newCounter()
print(c1()) --> 1
print(c1()) --> 2
You can create a function and pass it around. Sometimes that is handier than defining a separate class and instantiating it.
For an example of where Lua fits better than C++, look at distributing scripts. MUSHclient offers Lua as a scripting language, and as shown by the link above, you can do a lot with Lua to extend the program. Unlike C++, though, Lua doesn't have to be compiled, and it can be restricted: for example, you can sandbox Lua so that it can't access the file system. This means that if you get a script from someone else, it is incapable of destroying your data, since it can't write to the disk.
The main advantages of Lua as a programming language (apart from the embeddability) are
Powerful, efficient hash table as the main data structure
String-processing library with an excellent balance of complexity and expressive power
First-class functions and generic for loop
Automatic memory management!!
It's hard to find a short example that illustrates all these. I have 191 Lua scripts in my ~/bin directory; here's one that takes the output of pstotext and joins up lines that end in a hyphen:
local function printf(...) return io.stdout:write(string.format(...)) end
local function eprintf(...) return io.stderr:write(string.format(...)) end

local strfind, strlen = string.find, string.len

function joined_lines(f)
  return coroutine.wrap(function()
    local s = ''
    for l in f:lines() do
      s = s .. l
      local n = strlen(s)
      if strfind(s, '[%-\173]$', n-1) then
        s = string.sub(s, 1, n-1)
      else
        coroutine.yield(s)
        s = ''
      end
    end
  end)
end

-- printf('hyphen is %q; index is %d\n', '', string.byte(''))

for _, f in ipairs(arg) do
  for l in joined_lines(io.popen('pstotext ' .. f, 'r')) do
    printf('%s\n', l)
  end
end
This example shows several features to advantage but does nothing interesting with tables.
Here's a short snippet from a Key Word In Context indexing program, which fetches context from a table and formats the key word in context. This example makes more extensive use of nested functions and shows some more table and string stuff:
local function showpos(word, pos, lw, start)
  -- word is the key word in which the search string occurs
  -- pos is its position in the document
  -- lw is the width of the context around the word
  -- start is the position of the search string within the word
  local shift = (start or 1) - 1 -- number of cols to shift word to align keys
  lw = lw - shift                -- 'left width'
  local rw = cols - 20 - 3 - lw - string.len(words[pos]) -- right width
  local data = assert(map:lookup(pos)[1], "no map info for position")
  -- data == source of this word

  local function range(lo, hi)
    -- return words in the range lo..hi, but only in the current section
    if lo < data.lo then lo = data.lo end
    if hi > data.hi then hi = data.hi end
    local t = { }
    for i = lo, hi-1 do table.insert(t, words[i]) end
    return table.concat(t, ' ')
  end

  -- grab words on left and right,
  -- then format and print as many as we have room for
  local left  = range(pos-width, pos)
  local right = range(pos+1, pos+1+width)
  local fmt = string.format('[%%-18.18s] %%%d.%ds %%s %%-%d.%ds\n',
                            lw, lw, rw, rw)
  printf(fmt, data.title, string.sub(left, -lw), word, right)
end
I use a game engine called Love2D, which uses Lua for writing games. All the system calls and heavy lifting are done in a C program which reads a Lua script.
Writing a game in C or C++, you find yourself trying to work with the subtleties of the system rather than just implementing your ideas.
Lua allows for "clean" dirty-style coding.
Here's an example of a game object written in pure lua:
local GameObj = {} -- {} is an empty table
GameObj.position = {x = 0, y = 0}
GameObj.components = {}

function GameObj:update()
    for i, v in ipairs(self.components) do -- for each component...
        v:update(self)                     -- ...call its update method
    end
end
To instantiate:
myObj = setmetatable({},{__index=GameObj})
-- tables can have a meta table which define certain behaviours
-- __index defines a table that is referred to when the table
-- itself doesn't have the requested index
Let's define a component, how about keyboard control?
Assuming we have an object that does input for us (that would be supplied C-side)
KeyboardControl = {}

function KeyboardControl:update(caller)
    -- assuming "Input", an object that has an isKeyDown function that
    -- returns a boolean
    if Input.isKeyDown("left") then
        caller.position.x = caller.position.x - 1
    end
    if Input.isKeyDown("right") then
        caller.position.x = caller.position.x + 1
    end
    if Input.isKeyDown("up") then
        caller.position.y = caller.position.y - 1
    end
    if Input.isKeyDown("down") then
        caller.position.y = caller.position.y + 1
    end
end
-- Instantiate a new KeyboardControl and add it to our components
table.insert(myObj.components, setmetatable({}, {__index = KeyboardControl}))
Now when we call myObj:update(), it will check the input and move the object accordingly.
Let's say we'll be using plenty of these GameObjs with a KeyboardControl; we can instantiate a prototype KeyObj and use THAT like an inherited object:
KeyObj = setmetatable({}, {__index = GameObj})
table.insert(KeyObj.components, setmetatable({}, {__index = KeyboardControl}))
myKeyObjs = {}
for i = 1, 10 do
    myKeyObjs[i] = setmetatable({}, {__index = KeyObj})
end
Now we have a table of KeyObj that we can play with.
Here we can see how Lua provides us with a powerful, easy to extend, flexible object system which allows us to structure our program in accordance with the problem we're trying to solve, rather than having to bend the problem to fit into our language.
Also, Lua has other nice features such as functions as first-class types, allowing for lambda programming, anonymous functions, and other stuff that usually has comp-sci teachers smiling creepily.
I'm working with CUDD (https://github.com/ivmai/cudd) with the goal of the following repetitive process:
(1) Input: a (coherent, non-decreasing) Boolean function expression, e.g. top = a_1a_2a_3... + x_1x_2x_3... + z_1z_2z_3.... The Booleans I'm working with have thousands of vars (a_i ... z_j) and hundreds of terms.
(2) Processing: convert the Boolean to a BDD to simplify the calculation of the minterms, or mutually exclusive cut sets (as we call them in the reliability world).
(3) Output: take the set of mutually exclusive minimal cut sets (minterms) and calculate the top event probability by adding up all the minterms found in (2).
I've found a way to do this with a labor-intensive, manual C interface to build the Boolean. I've also found how to do it with the excellent tulip-dd Python interface, but I was unable to make it scale the way CUDD does.
Now I'm hoping that with the C++ interface to CUDD I can get the best of both worlds (am I asking for too much?), namely the convenience of tulip-dd with the scalability of CUDD. So here's some sample code. Where I'm failing is in step 3, printing out the minterms, which I used to be able to do in C. How do I do it with the C++ interface? Please see the comments in the code for my specific thoughts and attempts.
int main()
{
    /* DdManager* gbm;  global BDD manager -- I suppose we do not use
       this if we use the Cudd type below. */

    /* (1-2) Declare the vars, build the Boolean, and convert it to a BDD. */
    Cudd mgr(0, 0);
    BDD a = mgr.bddVar();
    BDD b = mgr.bddVar();
    BDD c = mgr.bddVar();
    BDD d = mgr.bddVar();
    BDD e = mgr.bddVar();
    BDD top = a*(b + c + d*e);

    /* How do I print the equivalent of the call below, which prints all
       minterms and their relevant vars in C? The mgr argument has to be
       a DdManager*; if so, how do I convert? */
    Cudd_PrintDebug(mgr, BDD, 2, 4);

    return 0;
}
Thanks,
Gui
The CUDD C++ classes are very little more than a wrapper around the "DdManager*" and "DdNode*" data types. They make sure that you don't accidentally forget to Cudd_Ref(..) or Cudd_RecursiveDeref(...) *DD nodes that you are using.
As such, these classes have functions that you can use to access the underlying data types. So for instance, if you want to call the "Cudd_PrintDebug" function on the "top" BDD, then you can do that with:
Cudd_PrintDebug(mgr.getManager(), top.getNode(), 2, 4);
The modification to your code was minimal.
Note that when using a plain CUDD DdNode* that you obtain with the "getNode" function, you have to make sure manually that you don't introduce reference count leaks. This does not happen, though, if you use the DdNodes in a read-only fashion, only store DdNode* pointers that correspond to BDD objects that you also store, and make sure that the BDD objects always live longer than the DdNode* pointers.
I'm only mentioning this because at some point you may want to iterate through the cubes of a BDD. These are essentially not-guaranteed-to-be-minimal minterms, and there are special iterators in CUDD for this. However, if you really want the minterms, this may not be the right approach. There is other software using CUDD that comes with its own functions for enumerating the minterms.
As a final note (outside of the scope of StackOverflow), you wrote that "The Booleans I'm working with have thousands of vars (ai...zj) and hundreds of terms." - There is no guarantee that using BDDs with so many variables is the way to go here. But please try it out. Having thousands of variables is often problematic for BDD-based approaches. Your application may or may not be an exception to this observation. An alternative approach may be to encode the search problem for all minterms of your original expression as an incremental satisfiability (SAT) solving problem.
I'm trying to write a simple interpreted programming language in C++. I've read that a lot of people use tools such as Lex/Flex and Bison to avoid "reinventing the wheel", but since my goal is to understand how these little beasts work and improve my knowledge, I've decided to write the lexer and the parser from scratch. At the moment I'm working on the parser (the lexer is complete) and I was asking myself what its output should be. A tree? A linear vector of statements with a "depth" or "shift" parameter? How should I manage loops and if statements? Should I replace them with invisible goto statements?
A parser should almost always output an AST. An AST is simply, in the broadest sense, a tree representation of the syntactical structure of the program. A Function becomes an AST node containing the AST of the function body. An if becomes an AST node containing the AST of the condition and the body. A use of an operator becomes an AST node containing the AST of each operand. Integer literals, variable names, and so on become leaf AST nodes. Operator precedence and such is implicit in the relationship of the nodes: Both 1 * 2 + 3 and (1 * 2) + 3 are represented as Add(Mul(Int(1), Int(2)), Int(3)).
Many details of what's in the AST depend on your language (obviously) and what you want to do with the tree. If you want to analyze and transform the program (i.e. spit out altered source code at the end), you might preserve comments. If you want detailed error messages, you might add source locations (as in: this integer literal was on line 5, column 12).
A compiler will proceed to turn the AST into a different format (e.g. a linear IR with gotos, or data flow graphs). Going through the AST is still a good idea, because a well-designed AST has a good balance of being syntax-oriented but only storing what's important for understanding the program. The parser can focus on parsing while the later transformations are protected from irrelevant details such as the amount of white space and operator precedence. Note that such a "compiler" might also output bytecode that's later interpreted (the reference implementation of Python does this).
A relatively pure interpreter might instead interpret the AST. Much has been written about this; it is about the easiest way to execute the parser's output. This strategy benefits from the AST in much the same way as a compiler; in particular most interpretation is simply top-down traversal of the AST.
The formal and most properly correct answer is going to be that you should return an Abstract Syntax Tree. But that is simultaneously the tip of an iceberg and no answer at all.
An AST is simply a structure of nodes describing the parse; a visualization of the paths your parse took thru the token/state machine.
Each node represents a path or description. For example, you would have nodes which represents language statements, nodes which represent compiler directives and nodes which represent data.
Consider a node which describes a variable, and let's say your language supports variables of int and string and the notion of "const". You may well choose to make the type a direct property of the Variable node struct/class, but typically in an AST you make properties - like constness - a "mutator", which is itself some form of node linked to the Variable node.
You could implement the C++ concept of "scope" by having locally-scoped variables as mutations of a BlockStatement node; the constraints of a "Loop" node (for, do, while, etc) as mutators.
When you closely tie your parser/tokenizer to your language implementation, it can become a nightmare making even small changes.
While this is true, if you actually want to understand how these things work, it is worth going through at least one first implementation where you begin to implement your runtime system (vm, interpreter, etc) and have your parser target it directly. (The alternative is, e.g., to buy a copy of the "Dragon Book" and read how it's supposed to be done, but it sounds like you are actually wanting to have the full understanding that comes from having worked thru the problem yourself).
The trouble with being told to return an AST is that an AST actually needs a form of parsing.
struct Node
{
    enum class Type {
        Variable,
        Condition,
        Statement,
        Mutator,
    };

    Node*  m_parent;
    Node*  m_next;
    Node*  m_child;
    Type   m_type;
    string m_file;
    size_t m_lineNo;
};

struct VariableMutatorNode : public Node
{
    enum class Mutation {
        Const
    };

    Mutation m_mutation;
    // ...
};

struct VariableNode : public Node
{
    VariableMutatorNode* m_mutators;
    // ...
};

Node* ast; // top-level node of the AST
This sort of AST is probably OK for a compiler that is independent of its runtime, but you'd need to tighten it up a lot for a complex, performance-sensitive language down the road (at which point there is less 'A' in 'AST').
The way you walk this tree is to start with the first node of 'ast' and act according to it. If you're writing in C++, you can do this by attaching behaviors to each node type. But again, that's not so "abstract", is it?
Alternatively, you have to write something which works its way thru the tree.
switch (node->m_type) {
case Node::Type::Variable:
declareVariable(node);
break;
case Node::Type::Condition:
evaluate(node);
break;
case Node::Type::Statement:
execute(node);
break;
}
And as you write this, you'll find yourself thinking "wait, why didn't the parser do this for me?" because processing an AST often feels a lot like you did a crap job of implementing the AST :)
There are times when you can skip the AST and go straight to some form of final representation, and (rare) times when that is desirable; then there are times when you could go straight to some form of final representation but now you have to change the language and that decision will cost you a lot of reimplementation and headaches.
This is also generally the meat of building your compiler - the lexer and parser are the lesser parts of such an undertaking. Working with the abstract/post-parse representation is a much more significant part of the work.
That's why people often go straight to flex/bison or antlr or some such.
And if that's what you want to do, looking at .NET or LLVM/Clang can be a good option, but you can also fairly easily bootstrap yourself with something like this: http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/4/
Best of luck :)
I would build a tree of statements. After that, yes, the goto statements (jumps and calls) are how the majority of it works. Are you translating to something low-level like assembly?
The output of the parser should be an abstract syntax tree, unless you know enough about writing compilers to directly produce byte-code, if that's your target language. It can be done in one pass but you need to know what you're doing. The AST expresses loops and ifs directly: you're not concerned with translating them yet. That comes under code generation.
People don't use lex/yacc to avoid re-inventing the wheel; they use them to build a more robust compiler prototype more quickly, with less effort, to focus on the language, and to avoid getting bogged down in other details. From personal experience with several VM projects, compilers and assemblers, I suggest that if you want to learn how to build a language, do just that -- focus on building a language (first).
Don't get distracted with:
Writing your own VM or runtime
Writing your own parser generator
Writing your own intermediate language or assembler
You can do these later.
This is a common thing I see when a bright young computer scientist first catches the "language fever" (and it's a good thing to catch), but you need to be careful and focus your energy on the one thing you want to do well, and make use of other robust, mature technologies like parser generators, lexers, and runtime platforms. You can always circle back later, once you have slain the compiler dragon.
Just spend your energy learning how a LALR grammar works, write your language grammar in Bison or Yacc++ if you can still find it, don't get distracted by people who say you should be using ANTLR or whatever else, that isn't the goal early on. Early on, you need to focus on crafting your language, removing ambiguities, creating a proper AST (maybe the most important skillset), semantic checking, symbol resolution, type resolution, type inference, implicit casting, tree rewriting, and of course, end program generation. There is enough to be done making a proper language that you don't need to be learning multiple other areas of research that some people spend their whole careers mastering.
I recommend you target an existing runtime like the CLR (.NET). It is one of the best runtimes for crafting a hobby language. Get your project off the ground using a textual output to IL, and assemble with ilasm. ilasm is relatively easy to debug, assuming you put some time into learning it. Once you get a prototype going, you can then start thinking about other things like an alternate output to your own interpreter, in case you have language features that are too dynamic for the CLR (then look at the DLR). The main point here is that CLR provides a good intermediate representation to output to. Don't listen to anyone that tells you you should be directly outputting bytecode. Text is king for learning in the early stages and allows you to plug and play with different languages / tools. A good book is by the author John Gough, titled Compiling for the .NET Common Language Runtime (CLR) and he takes you through the implementation of the Gardens Point Pascal Compiler, but it isn't a book about Pascal, it is a book about how to build a real compiler on the CLR. It will answer many of your questions on implementing loops and other high level constructs.
Related to this, a great tool for learning is to use Visual Studio and ildasm (the disassembler) and .NET Reflector. All available for free. You can write small code samples, compile them, then disassemble them to see how they map to a stack based IL.
If you aren't interested in the CLR for whatever reason, there are other options out there. You will probably run across llvm, Mono, NekoVM, and Parrot (all good things to learn) in your searches. I was an original Parrot VM / Perl 6 developer, and wrote the Perl Intermediate Representation language and imcc compiler (which is quite a terrible piece of code I might add) and the first prototype Perl 6 compiler. I suggest you stay away from Parrot and stick with something easier like .NET CLR, you'll get much further. If, however, you want to build a real dynamic language, and want to use Parrot for its continuations and other dynamic features, see the O'Reilly Books Perl and Parrot Essentials (there are several editions), the chapters on PIR/IMCC are about my stuff, and are useful. If your language isn't dynamic, then stay far away from Parrot.
If you are bent on writing your own VM, let me suggest you prototype the VM in Perl, Python or Ruby. I have done this a couple of times with success. It allows you to avoid too much implementation early, until your language starts to mature. Perl+Regex are easy to tweak. An intermediate language assembler in Perl or Python takes a few days to write. Later, you can rewrite the 2nd version in C++ if you still feel like it.
All this I can sum up with: avoid premature optimizations, and avoid trying to do everything at once.
First you need to get a good book. So I refer you to the book by John Gough in my other answer, but emphasize, focus on learning to implement an AST for a single, existing platform first. It will help you learn about AST implementation.
How to implement a loop?
Your language parser should return a tree node during the reduce step for the WHILE statement. You might name your AST class WhileStatement; it has a ConditionExpression, a BlockStatement, and several labels as members (these could be inherited, but I added them inline for clarity).
The grammar pseudocode below shows how the reduction creates a new WhileStatement object in a typical shift-reduce parser.
How does a shift-reduce parser work?
WhileStatement
    : WHILE '(' ConditionExpression ')' BlockStatement
      {
          $$ = new WhileStatement($3, $5);
          statementList.Add($$); // your statement list (AST nodes), not the parse stack
      }
    ;
As your parser sees "WHILE", it shifts the token on the stack. And so forth.
parseStack.push(WHILE);
parseStack.push('(');
parseStack.push(ConditionalExpression);
parseStack.push(')');
parseStack.push(BlockStatement);
The instance of WhileStatement is a node in a linear statement list. So behind the scenes, the "$$ =" represents a parse reduce (though if you want to be pedantic, $$ = ... is user-code, and the parser is doing its own reductions implicitly, regardless). The reduce can be thought of as popping off the tokens on the right side of the production, and replacing with the single token on the left side, reducing the stack:
// shift-reduce
parseStack.pop_n(5); // pop off the top 5 tokens ($1 = WHILE, $2 = (, $3 = ConditionExpression, etc.)
parseStack.push(currToken); // replace with the current $$ token
You still need to add your own code to add statements to a linked list, with something like "statements.add(whileStatement)" so you can traverse this later. The parser has no such data structure, and its stacks are only transient.
During parse, synthesize a WhileStatement instance with its appropriate members.
In a later phase, implement the visitor pattern to visit each statement, resolve symbols, and generate code. A while loop might then be implemented with the following AST C++ class:
class WhileStatement : public CompoundStatement {
public:
    ConditionExpression* condExpression; // the conditional check
    Label* startLabel;                   // Label can simply be a Symbol
    Label* redoLabel;
    Label* endLabel;
    BlockStatement* loopStatement;       // the loop body

    bool ResolveSymbolsAndTypes();
    bool SemanticCheck();
    bool Emit();                         // emit code
};
Your code generator needs to have a function that generates sequential labels for your assembler. A simple implementation is a function to return a string with a static int that increments, and returns LBL1, LBL2, LBL3, etc. Your labels can be symbols, or you might get fancy with a Label class, and use a constructor for new Labels:
class Label : public Symbol {
public:
    Label() {
        this->name = newLabel(); // incrementing: LBL1, LBL2, LBL3, ...
    }
};
A loop is implemented by emitting the redoLabel, then the code for condExpression with a conditional branch to the endLabel, then the blockStatement, and at the end of the blockStatement a branch back to the redoLabel.
A sample from one of my compilers to generate code for the CLR.
// Generate code for .NET CLR for a While statement
//
void WhileStatement::clr_emit(AST *ctx)
{
    redoLabel = compiler->mkLabelSym();
    startLabel = compiler->mkLabelSym();
    endLabel = compiler->mkLabelSym();
    // Emit the redo label, which marks the beginning of each iteration
    compiler->out("%s:\n", redoLabel->getName());
    if(condExpr) {
        condExpr->clr_emit_handle();
        condExpr->clr_emit_fetch(this, t_bool);
        // Test the condition: if false, branch to endLabel, else fall through
        compiler->out("brfalse %s\n", endLabel->getName());
    }
    // The body of the loop
    compiler->out("%s:\n", startLabel->getName()); // start label, only for clarity
    loopStmt->clr_emit(this);                      // generate code for the block
    // End of the body: jump back to re-test the condition
    compiler->out("br %s\n", redoLabel->getName());
    compiler->out("%s:\n", endLabel->getName());   // endLabel: exit point of the loop
}
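Put together, the code emitted for a loop like while (i < 10) i = i + 1; would have roughly this shape (illustrative CIL-style pseudocode; the label names and instruction selection are assumptions, not actual output of the compiler above):

```
LBL1:                  // redoLabel: top of every iteration
    ldloc i            // evaluate the condition i < 10
    ldc.i4 10
    clt
    brfalse LBL3       // condition false -> leave the loop
LBL2:                  // startLabel (for clarity only)
    ldloc i            // body: i = i + 1
    ldc.i4 1
    add
    stloc i
    br LBL1            // jump back to re-test the condition
LBL3:                  // endLabel: first instruction after the loop
```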
As the topic indicates, my program needs to read several function expressions and plug in different values many times. Re-parsing the whole expression every time I plug in a new value would be far too ugly, so I need a way to store the parsed expression.
The expression may look like 2x + sin(tan(5x)) + x^2. Oh, and the very important point -- I'm using C++.
Currently I have three ideas on it, but all not very elegant:
Storing the expression as a tree (an S-expression) and evaluating it by recursion. It may be the old-school way to handle this, but it's ugly, and I would have to deal with different numbers of parameters (like + vs. sin).
Composing anonymous functions with boost::lambda. It may work nicely, but personally I don't like boost.
Writing a small Python/Lisp script, using its native lambda expressions, and calling it over IPC... Well, this is crazy.
So, any ideas?
UPDATE:
I did not try to implement support for parentheses and functions with only one parameter, like sin().
I tried the second way first, but instead of boost::lambda I used a GCC feature that can be used to create (fake) anonymous functions, which I found here. The resulting code is 340 lines and does not work correctly because of scoping and a subtle stack issue.
Using boost::lambda would not have made it better, and I don't know whether it handles scoping correctly, so sorry for not testing boost::lambda.
Storing the parsed string as S-expressions would definitely work, but the implementation would be even longer, maybe ~500 lines. My project is not one of those gigantic projects with tens of thousands of lines of code, so devoting that much energy to maintaining twisted code that would not be used very often does not seem like a good idea.
So finally I tried the third method, and it's awesome! The Python script has only 50 lines, pretty neat and easy to read. But, on the other hand, it would also make Python a prerequisite of my program. That's not so bad on *nix machines, but on Windows... I guess it would be very painful for non-programmers to install Python. The same goes for Lisp.
However, my final solution was to open bc as a subprocess. Maybe it's a bad choice for most situations; however, it fits me well.
On the other hand, for projects that run only under *nix or that already have Python as a prerequisite, I personally recommend the third way if the expression is simple enough to be parsed with a hand-written parser. If it's very complicated, as Hurkyl said, you could consider creating a mini-language.
Why not use a scripting language designed for exactly this kind of purpose? There are several such languages floating around, but my experience is with Lua.
I use Lua to do this kind of thing "all the time". The code to embed and parse an expression like that is very small. It would look something like this (untested):
std::string my_expression = "2*x + math.sin( math.tan( x ) ) + x * x";

// Initialise Lua and load the standard libraries (including math).
lua_State * L = luaL_newstate();
luaL_openlibs(L);

// Create your function and load it into Lua
std::string fn = "function myfunction(x) return " + my_expression + " end";
luaL_dostring( L, fn.c_str() );

// Use your function
for(int i=0; i<10; ++i)
{
    // add the function to the stack
    lua_getfield(L, LUA_GLOBALSINDEX, "myfunction");
    // add the argument to the stack
    lua_pushnumber(L, i);
    // Make the call, using one argument and expecting one result.
    // stack looks like this : FN ARG
    lua_pcall(L, 1, 1, 0);
    // stack looks like this now : RESULT
    // so get the result and print it
    double result = lua_tonumber(L, -1);
    std::cout << i << " : " << result << std::endl;
    // The result is still on the stack, so clean it up.
    lua_pop(L, 1);
}
I have an app in C++ which actually processes a binary file. The binary file is a collection of events say A/B/C, and on detecting event A in the file, the app handles the event in "handler A".
Now I need to write another script in a custom language, which gets executed orthogonally to the binary file processing. The script can have something like:
define proc onA
{
    c = QueryVariable(cat)
    print ( c )
}
So when the app handles the event "A" from the binary file, it has to parse this script file, check for onA, and convert the statements in the onA proc into routines supported by the app. For example, QueryVariable should copy the value of the variable "cat" defined in the app into the variable c. The app should also check the syntax and semantics of the script language. Where can I get the best information for deciding on the design? My knowledge of parse trees and grammars has really weakened.
Thanks
An easy way to build an interpreter:
Define a parser for the language from its syntax.
Build an abstract syntax tree (AST).
Apply a visitor function to traverse the AST in preorder and "execute" the actions suggested by the AST nodes.
Some AST nodes will be "definitional", e.g., will declare the existence of some named entity such as your "define proc onA " phrase above. Typically the action is to associate the named entity with the content, e.g., form a triplet <onA,proc,<body>> and store this away in a symbol table indexed by the first entry. This makes finding such definitions easier.
Later, when your event process encounters an A event, your application knows to look up "onA" in this symbol table. When found, the AST is traversed by the visitor function to execute its content. You'll usually need a value stack to record intermediate expression values: AST leaves representing operands (variables, constants) push values onto that stack, and operators (+, -, <=) pop values off and push newly computed results. Assignment operations take the top stack value and store it in the symbol table entry associated with the identifier name. Control operators (if, do) take values off the top of the stack and use them to decide which part of the program (e.g., which subtree) to execute next.
All of this is well known and can be found in most books on compilers and interpreters. Peter Brown's book on this is particularly nice even though it seems relatively old:
Writing Interactive Interpreters and Compilers.
There must be some interpreter or compiler for the scripting language. Check if it supports embedding in C or C++. Most script languages do.
Next choice, or perhaps first, would be to just run the script externally, using the existing compiler/interpreter.
I can't think of any reason why one of the first two options won't do, but if not, consider building an interpreter using ANTLR or, for a small language, Boost Spirit. Disclaimer: I haven't used the first, and I've only tried out Boost Spirit on a small toy example.
Cheers & hth.,
PS: If you can choose the script language, consider JavaScript and just use Google's reportedly excellent embedding API.
I am pretty new to scripting languages (Perl in particular), and most of the code I write is an unconscious effort to convert C code to Perl.
Reading about Perl, one of the things often mentioned as the biggest difference is that Perl is a dynamic language. It can do things at runtime that static languages can only do at compile time, and it can do them better because it has access to runtime information.
All that is okay, but what specific features should I, with some experience in C and C++, keep in mind while writing code in Perl to use all the dynamic programming features that it has, to produce some awesome code?
This question is more than enough to fill a book. In fact, that's precisely what happened!
Mark Jason Dominus' excellent Higher-Order Perl is available online for free.
Here is a quote from its preface that really grabbed me by the throat when I first read the book:
Around 1993 I started reading books about Lisp, and I discovered something important: Perl is much more like Lisp than it is like C. If you pick up a good book about Lisp, there will be a section that describes Lisp's good features. For example, the book Paradigms of Artificial Intelligence Programming, by Peter Norvig, includes a section titled What Makes Lisp Different? that describes seven features of Lisp. Perl shares six of these features; C shares none of them. These are big, important features, features like first-class functions, dynamic access to the symbol table, and automatic storage management.
A list of C habits not to carry over into Perl 5:
Don't declare your variables at the top of the program/function. Declare them as they are needed.
Don't assign empty lists to arrays and hashes when declaring them (they are empty already and don't need to be initialized).
Don't use if (!(complex logical statement)) {}, that is what unless is for.
Don't use goto to break deeply nested loops, next, last, and redo all take a loop label as an argument.
Don't use global variables (this is a general rule even for C, but I have found a lot of C people like to use global variables).
Don't create a function where a closure will do (callbacks in particular). See perldoc perlsub and perldoc perlref for more information.
Don't use in/out parameters; return multiple values instead.
Things to do in Perl 5:
Always use the strict and warnings pragmas.
Read the documentation (perldoc perl and perldoc -f function_name).
Use hashes the way you used structs in C.
Use the features that solve your problem with the best combination of maintainability, developer time, testability, and flexibility. Talking about any technique, style, or library outside of the context of a particular application isn't very useful.
Your goal shouldn't be to find problems for your solutions. Learn a bit more Perl than you plan on using immediately (and keep learning). One day you'll come across a problem and think "I remember something that might help with this".
You might want to see some of these book, however:
Higher-Order Perl
Mastering Perl
Effective Perl Programming
I recommend that you slowly and gradually introduce new concepts into your coding. Perl is designed so that you don't have to know a lot to get started, but you can improve your code as you learn more. Trying to grasp lots of new features all at once usually gets you in trouble in other ways.
I think the biggest hurdle will not be the dynamic aspect but the 'batteries included' aspect.
I think the most powerful aspects of perl are
hashes: they allow you to express very effective data structures easily
regular expressions: they're really well integrated
the use of default variables like $_
the libraries, and CPAN for whatever doesn't come standard
Something I noticed with C converts is the overuse of for loops. Many of them can be replaced with grep and map.
Another motto of Perl is "there is more than one way to do it". In order to climb the learning curve you have to tell yourself often: "There has got to be a better way of doing this; I cannot be the first one wanting to do ...". Then you can typically turn to Google and the CPAN with its ridiculous number of libraries.
The learning curve of perl is not steep, but it is very long... take your time and enjoy the ride.
Two points.
First, in general, I think you should be asking yourself two slightly different questions:
1) Which dynamic programming features of Perl can be used in which situations/to solve which problems?
2) What are the trade-offs, pitfalls and downsides of each feature.
Then the answer to your question becomes extremely obvious: you should use the features that solve your problem better (performance- or maintainability-wise) than a comparable non-DP solution, and that incur less than the maximum acceptable level of downsides.
As an example, to quote from FM's comment, string form of eval has some fairly nasty downsides; but it MIGHT in certain cases be an extremely elegant solution which is orders of magnitude better than any alternate DP or SP approach.
Second, please be aware that a lot of "dynamic programming" features of Perl are actually packaged for you into extremely useful modules that you might not even recognize as being of the DP nature.
I'll have to think of a set of good examples, but one that immediately springs to mind is text templating modules, many of which are implemented using the above-mentioned string form of eval; or the Try::Tiny exception mechanism, which uses the block form of eval.
Another example is aspect-oriented programming, which can be achieved via Moose (I can't find the relevant StackOverflow link now - if someone has it please edit in the link) - which underneath uses the symbol-table-access feature of DP.
Most of the other comments are complete here and I won't repeat them. I will focus on my personal bias about excessive or not enough use of language idioms in the language you are writing code in. As a quip goes, it is possible to write C in any language. It is also possible to write unreadable code in any language.
I was trained in C and C++ in college and picked up Perl later. Perl is fabulous for quick solutions and some really long-lived solutions. I built a company on Perl and Oracle delivering logistics solutions for the DoD, with about 100 active programmers. I also have some experience in managing the habits of other Perl programmers, new and old. (I was the founder / CEO and not in technical management directly, however...)
I can only comment on my transition to a Perl programmer and what I saw at my company. Many of our engineers shared my background of primarily being C / C++ programers by training and Perl programmers by choice.
The first issue I have seen (and had myself) is writing code that is so idiomatic that it is unreadable, unmaintainable, and unusable after a short period of time. Perl, and C++ share the ability to write terse code that is entertaining to understand at the moment but you will forget, not be around, and others won't get it.
We hired (and fired) many programmers over the 5 years I had the company. A common interview question was the following: write a short Perl program that prints all the odd numbers between 1 and 50 inclusive, separated by a space between each number and terminated with a CR. Do not use comments. They could take a few minutes on their own and do it on a computer to check the output.
After they wrote the script and explained it, we would ask them to modify it to print only the evens (in front of the interviewer), then to produce a pattern of results - for example, every single-digit even, or every odd except every seventh and 11th. Another potential mod would be every even in this range, odd in that range, and no primes, etc. The purpose was to see whether their original small script withstood being modified, debugged, and discussed by others, and whether they had thought in advance that the spec might change.
While the test did not say 'in a single line', many took the challenge to make it a single terse line at the cost of readability. Others made a full module that just took too long given the simple spec. Our company needed to deliver solid code very quickly; that is why we used Perl. We needed programmers who thought the same way.
The following submitted code snippets all do exactly the same thing:
1) Too C-like, but very easy to modify. Because of the C-style 3-argument for loop, it takes more bug-prone modifications to get alternate cycles. Easy to debug, and a common submission. Any programmer in almost any language would understand this. Nothing particularly wrong with it, but not a killer:
for($i=1; $i<=50; $i+=2) {
    printf("%d ", $i);
}
print "\n";
2) Very Perl like, easy to get evens, easy (with a subroutine) to get other cycles or patterns, easy to understand:
print join(' ',(grep { $_ % 2 } (1..50))), "\n"; #original
print join(' ',(grep { !($_ % 2) } (1..50))), "\n"; #even
print join(' ',(grep { suba($_) } (1..50))), "\n"; #other pattern
3) Too idiomatic, and getting a little weird - why does it get spaces between the results? The interviewee made a mistake getting the evens. Harder to debug or read:
print "@{[grep{$_%2}(1..50)]}\n"; #original
print "@{[grep{$_%2+1}(1..50)]}\n"; #even - WRONG!!!
print "@{[grep{~$_%2}(1..50)]}\n"; #second try for even
4) Clever! But also too idiomatic. You have to think about what happens to the anonymous hash created from a range-operator list and why that yields the odds and evens. Impossible to modify to another pattern:
print "$_ " for (sort {$a<=>$b} keys %{{1..50}}), "\n"; #orig
print "$_ " for (sort {$a<=>$b} keys %{{2..50}}), "\n"; #even
print "$_ " for (sort {$a<=>$b} values %{{1..50}}), "\n"; #even alt
5) Kinda C like again but a solid framework. Easy to modify beyond even/odd. Very readable:
for (1..50) {
    print "$_ " if ($_%2);
} #odd
print "\n";

for (1..50) {
    print "$_ " unless ($_%2);
} #even
print "\n";
6) Perhaps my favorite answer. Very Perl-like yet readable (to me, anyway), step-wise in its formation, and right-to-left in flow: the list is on the right and can be changed, the processing is immediately to its left, formatting again to the left, and the final operation, print, is on the far left.
print map { "$_ " } grep { $_ & 1 } 1..50; #original
print "\n";
print map { "$_ " } grep { !($_ & 1) } 1..50; #even
print "\n";
print map { "$_ " } grep { suba($_) } 1..50; #other
print "\n";
7) This is my least favorite credible answer. Neither C nor Perl, impossible to modify without gutting the loop, and mostly showing that the applicant knew Perl array syntax. He wanted to have a case statement really badly...
for (1..50) {
    if ($_ & 1) {
        $odd[++$#odd] = "$_ ";
        next;
    } else {
        push @even, "$_ ";
    }
}
print @odd, "\n";
print @even;
Interviewees with answers 5, 6, 2 and 1 got jobs and did well. Answers 7,3,4 did not get hired.
Your question was about using dynamic constructs like eval or others that you cannot do in a purely compiled language such as C. This last example is "dynamic" with the eval in the regex but truly poor style:
$t='D ' x 25;
$i=-1;
$t=~s/D/$i+=2/eg;
print "$t\n"; # don't let the door hit you on the way out...
Many will tell you "don't write C in Perl." I think this is only partially true. The error and mistake is to rigidly write new Perl code in C style even when there are so many more expressive forms in Perl. Use those. And yes, don't write NEW Perl code in C style because C syntax and idiom is all you know. (bad dog -- no biscuit)
Don't write dynamic code in Perl just because you can. There are certain algorithms that you will run across that you will say 'I don't quite know how I would write THAT in C' and many of these use eval. You can write a Perl regex to parse many things (XML, HTML, etc) using recursion or eval in the regex, but you should not do that. Use a parser just like you would in C. There are certain algorithms though that eval is a gift. Larry Wall's file name fixer rename would take a lot more C code to replicate, no? There are many other examples.
Don't rigidly avoid C stye either. The C 3 argument form of a for loop may be the perfect fit to certain algorithms. Also, remember why you are using Perl: assumably for high programmer productivity. If I have a completely debugged piece of C code that does exactly what I want and I need that in Perl, I just rewrite the silly thing C style in Perl! That is one of the strengths of the language (but also its weakness for larger or team projects where individual coding styles may vary and make the overall code difficult to follow.)
By far the killer verbal response to this interview question (from the applicant who wrote answer 6) was: This single line of code fits the spec and can easily be modified. However, there are many other ways to write this. The right way depends on the style of the surrounding code, how it will be called, performance considerations, and if the output format may change. Dude! When can you start?? (He ended up in management BTW.)
I think that attitude also applies to your question.
At least IME, the "dynamic" nature isn't really that big of a deal. I think the biggest difference you need to take into account is that in C or C++, you're mostly accustomed to there being only a fairly minor advantage to using library code. What's in the library is already written and debugged, so it's convenient, but if push comes to shove you can generally do pretty much the same thing on your own. For efficiency, it's mostly a question of whether your ability to write something a bit more specialized outweighs the library author's ability to spend more time on polishing each routine. There's little enough difference, however, that unless a library routine really does what you want, you may be better off writing your own.
With Perl, that's no longer true. Much of what's in the (huge, compared to C) library is actually written in C. Attempting to write anything very similar at all on your own (unless you write a C module, of course) will almost inevitably come out quite a bit slower. As such, if you can find a library routine that does even sort of close to what you want, you're probably better off using it. Using pre-written library code is much more important than in C or C++.
Good programming practices aren't specific to individual languages; they are valid across all languages. In the long run, you may find it best not to rely on tricks possible in dynamic languages (for example, functions that can return either integer or text values), as they make the code harder to maintain and to understand quickly. So ultimately, to answer your question, I don't think you should be looking for features specific to dynamically typed languages unless you have some compelling reason to need them. Keep things simple and easy to maintain - that will be far more valuable in the long run.
There are many things you can do only in a dynamic language, but the coolest one is eval. See here for more detail.
With eval, you can execute a string as if it were a pre-written command. You can also access variables by name at runtime.
For example,
$Double = "print (\$Ord1 * 2);";
$Ord1 = 8;
eval $Double; # Prints 8*2 => 16.
$Ord1 = 7;
eval $Double; # Prints 7*2 => 14.
The variable $Double is a string, but we can execute it as if it were a regular statement. This cannot be done in C/C++.
The cool thing is that the string can be manipulated at run time, so we can create a command at runtime.
# The operand and operator are concatenated into a string, which eval calculates and print displays.
$Cmd = "print (eval (\"(\$Ord1 \".\$Opr.\" \$Ord2)\"));";
$Opr = "*";
$Ord1 = "5";
$Ord2 = "2";
eval $Cmd; # Prints 5*2 => 10.
$Ord2 = 3;
eval $Cmd; # Prints 5*3 => 15.
$Opr = "+";
eval $Cmd; # Prints 5+3 => 8.
eval is very powerful, so (as with Spider-Man) great power comes with great responsibility. Use it wisely.
Hope this helps.