I came across the following post on the boost mailing lists (emphasis mine):
hello all,
does anybody know of an existing spirit/lisp implementation, and is there any interest in developing such a project in open source?
None yet, AFAIK.
I'll be writing an example for Spirit2 to complement the tiny-C virtual machine in there. What's equally interesting though is that scheme (or at least a subset of it) can be implemented in pure C++. No parsing, just pure DSEL in C++. Now, imagine a parser that targets this DSEL (through C++) -- a source to source translator. Essentially, your scheme code will be compiled into highly efficient C++.
Has anyone actually done this? I would be very interested in such a DSEL.
I wrote a Lisp-like language called Funky using Spirit in C++. An Open Source version is available at http://funky.vlinder.ca. It wouldn't take too much to turn that into a Lisp-like to C++ translator.
Actually, what it would take is a run-time support library to provide generic closure types and such: if you want to turn the Lisp code into efficient C++, you will basically need C++ classes (functors, etc.) to do the heavy lifting once you get to run-time, so your Lisp-to-C++ translator would need to:
parse the Lisp
create an AST from the Lisp
transform the AST to optimize it, if possible (optimizations in Lisp are different from optimizations in C++, so if you want really fast C++, you have to optimize the Lisp and let your C++ compiler optimize the generated C++)
generate the C++, for which you'd rely on your run-time support library for things like built-in functions, functor types, etc.
If you were to start from Funky, you'd already have the parser and the AST (though Funky doesn't optimize the AST), so you could go from there and create the run-time and generate the C++...
It wouldn't be overly complicated to write one from scratch either: Lisp grammar isn't that difficult, so most of the work would go into the AST and the run-time support.
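To make the run-time support idea a bit more concrete, here is a minimal sketch (my own illustration, not code from Funky) of the kind of generic closure type and built-in function such a library could provide; the generated C++ would then mostly just compose these pieces:

#include <functional>
#include <vector>

// In a real translator "Value" would be a variant of number/symbol/cons/...;
// a plain double keeps the sketch short.
using Value   = double;
using Closure = std::function<Value(const std::vector<Value>&)>;

// built-in (+ ...) that generated code can call directly
inline Value builtin_add(const std::vector<Value>& args) {
    Value sum = 0;
    for (Value v : args) sum += v;
    return sum;
}

// (lambda (x) (+ x 1)) could be emitted roughly as:
inline Closure make_add1() {
    return [](const std::vector<Value>& args) {
        return builtin_add({args.at(0), 1.0});
    };
}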
If I weren't writing an object-oriented DSL right now, I might try my hand at this.
How about this: scheme to (readable) c++
http://www.suri.cs.okayama-u.ac.jp/servlets/APPLICATION.rkt
Not sure if this is what you want, but:
http://howtowriteaprogram.blogspot.com/2010/11/lisp-interpreter-in-90-lines-of-c.html
It looks like a start, at least.
Related
I am currently working on a source-to-source compiler that transforms code written in an OpenCL superset to "ordinary" OpenCL. I would really like to use clang as a library to parse and analyze the source code. In particular, I really need all the available type information, and I would like to have an AST so I can make use of clang's Rewrite capabilities.
Fortunately, the OpenCL superset that needs to be parsed is really a "mixture" between OpenCL and C++, i.e. the code is basically OpenCL extended with some C++ stuff. In detail, there are possibly template annotations before a function definition and there may be structs containing methods (including operator definitions).
I was hoping that I could use clang to parse this language, since the clang parser is capable of parsing all these constructs. However, I am not sure how (if it is possible at all) to tell clang to parse OpenCL and C++ constructs at once. If possible, I really want to avoid touching the clang code base and would prefer to use clang as a library instead. Maybe it is possible to set up an appropriate instance of clang's LangOptions class that tells clang to accept all these constructs?
Any ideas on how to make clang parse this mixture between OpenCL and C++? Any help is appreciated, and thanks in advance!
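A minimal sketch of what I have in mind, assuming the relevant language flags are the public bitfields that clang::LangOptions exposes; whether the parser actually honours this combination is exactly what I am unsure about:

// Hypothetical sketch only: request both C++ and OpenCL modes in one LangOptions.
// Whether clang's parser accepts this mixture is the open question of this post.
#include "clang/Basic/LangOptions.h"

clang::LangOptions makeMixedLangOpts() {
    clang::LangOptions opts;
    opts.CPlusPlus = 1;  // allow templates and structs with methods
    opts.OpenCL = 1;     // allow OpenCL qualifiers and kernels
    return opts;
}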
You're trying to mix two different front ends, involving both parsing and name resolution.
I think you are in for a rough trip. The key problem is that you are trying to glue together things on which no effort was expended to make them gluable. This usually leads to integration hell. You don't see people doing this with Fortran and C++, for the same reasons.
To start with, you'll discover you will have to define the semantics of how the C++ extensions interact with those of OpenCL. If you check out the C++ standard, you'll discover 600 pages of results from committee arguments on how C++ interacts with itself. So unless you can define a radically simple interaction, you'll have a tough time knowing what your mixed OpenCL/C++ program means.
Your second problem will be interleaving the Clang parsing machinery for C++ (AFAIK hand-written code) with the Clang parsing machinery for OpenCL (I don't know anything about it, but assume it follows the C++ style). There's no obviously good reason to believe you can just pick and choose these to interleave easily. It may work out fine; it's just not a bet I'd care to make.
The next place you are likely to have trouble is in building an AST for the joint language. Maybe it is the case that Clang has defined AST nodes for both C++ and OpenCL in a way that easily composes into a joint C++/OpenCL tree. Since the node types are chosen by hand, and there was no specific reason to design them to work together, it is also not obvious they will compose nicely.
Your last task, given a "valid" OpenCL/C++ tree, is to transform it to OpenCL. How in fact will you expand a C++ template (or any general C++ code) to OpenCL code?
[Check my bio for another system, DMS, that might be a bit better for this task; it provides uniform infrastructure for multiple languages that would make some of this easier. Somewhat similar to what you are trying to do, we have used DMS to mix C++ with F90 and APL concepts for easy expression of vector operations in a prototype Vector C++, but we did not try to preserve F90 and APL syntax and semantics exactly for all the above reasons].
It isn't my purpose to rain on your parade; progress is made by the unreasonable man. Just be sure you understand how big a task you are taking on.
I want to write a compiler for a custom markup language, I want to get optimum performance and I also want to have a good scalable design.
A multi-paradigm programming language (C++) is more suitable for implementing modern design patterns, but I think that will degrade performance a little bit (think of RTTI, for example), which more or less might make C a better choice.
I wonder what the best language is (C, C++ or even Objective-C) if someone wants to create a modern compiler (in the sense of complying with modern software engineering principles as software) that is fast, efficient, and well designed.
The "expensive" features of C++ (e.g., exceptions, virtual functions, RTTI) simply don't exist in C. By the time you simulate them in C, you're likely to end up with something at least as expensive as it is in C++, but less known, less documented, etc. (let's face it: compiler writers aren't stupid -- while it's possible you can implement a feature "better" than them, it's not really particularly likely).
In the other direction, templates (for one example) often make it relatively easy to write code that is considerably faster than is practical in C. Just for one obvious example, C++ code using std::sort will often be two to three times as fast as equivalent C code using qsort.
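As a rough illustration of why the template version can win, here is a minimal sketch of the two calls side by side; the std::sort comparison can be inlined through the template, while qsort makes an indirect call through a function pointer for every comparison:

#include <algorithm>
#include <cstdlib>
#include <vector>

static int cmp_int(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

void sort_both(std::vector<int>& a, std::vector<int>& b) {
    std::sort(a.begin(), a.end());                         // comparison inlined
    std::qsort(b.data(), b.size(), sizeof(int), cmp_int);  // indirect call per compare
}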
Bottom line: the only reason for a C++ program to be slower than an equivalent written in C is if you've decided (for whatever reason) to write slower code. Common reasons are simplicity and readability -- and in most cases, those are more important than execution speed. Nonetheless, using C++ doesn't necessarily carry any speed penalty. It's completely up to you to decide whether to do something that might run more slowly.
C++ adheres to a "pay only for what you use" policy. You are not going to see performance hits due to the language choice; the performance of your application will be purely dependent upon your implementation.
Have you considered OCaml? Functional languages are well-suited for compiler writing. Pattern matching is an extremely useful construct, and the lack of side effects will make parallelization easy.
OCaml can be compiled to native code, and its performance is comparable to C and C++. Its standard library is somewhat lacking, but you don't really need much else to write a compiler.
F# is a very similar language if you prefer a .NET environment.
People who write compilers in C as their basic language usually have the good sense to use tools for certain parts of it.
Specifically, go find out about lex and yacc (in their free implementations, flex and bison).
This advice almost certainly applies to any other language you choose, be it C++, Java or whatever.
I don't have any links, but from what I hear and from experience, C/C++ is a poor language to write a compiler in. First of all, do you really honestly need it to be scalable? Or scalable at this stage? Especially for a markup language? You're not compiling 60+ MB of source, so I don't think you actually need it to be scalable.
Anyway, for my programming language I used bison for the parser (reading up on bison+flex is a must; try to avoid all conflicts, my language has none). Then I use both C and C++ for the code: C because bison uses C, and I just call a simple C function which creates and fills in a struct to build the abstract syntax tree (see the sketch below). Then, when it's done, it calls my C++ code that runs through the AST and generates the binary.
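A minimal sketch (illustrative names, not my original code) of the kind of C-callable node constructor a bison action can use, with the tree later walked from C++:

// Shared between the bison-generated C parser and the C++ back end.
struct AstNode {
    int kind;               // e.g. NUMBER, IDENT, BINARY_OP, ...
    double value;           // payload for literals
    AstNode* children[2];   // left/right operands, null when unused
};

// bison actions store the returned pointer in $$ to build the tree bottom-up
extern "C" AstNode* make_node(int kind, AstNode* lhs, AstNode* rhs) {
    return new AstNode{kind, 0.0, {lhs, rhs}};
}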
Standard ML is supposed to be really good for creating a language. If you don't use that, a functional language is still a good choice because it fits the mindset (parsing may be left to right, but your function calls won't be in that order). So I recommend one of those if you don't use bison (or don't know how to call it from C/C++).
Note: I have tried writing a compiler twice, the first time in C without bison, the second time with bison. There's no question that it would have taken me exponentially longer without it, because bison finds the conflicts for me and I am not doomed to debugging land (in fact, I would probably have tried to figure out a way to report conflicts before writing the code, which is exactly what bison does).
Forget about which programming language you use: given the huge amounts of memory available in the modern computer era, you can write good and fast programs in an interpreted language, and very bad and slow programs in C/C++ (compiled languages), and vice versa.
What is important is to use the right data structures and algorithms, and to follow the style/patterns of the programming language you use to implement it. Remember that someone said "OO is not a panacea", and at the other extreme someone else said "show me your data structures and I will code up the algorithm for the problem you are trying to solve".
The problems with parsing C++ are well known. It can't be parsed purely based on syntax, it can't be done with an LALR parser (whatever the term is, I'm not a language theorist), the language spec is a zillion pages, etc. For those and other reasons I'm deciding on an alternative language for my personal projects.
Vala looks like a good language. Although it provides many improvements over C++, is it just as troublesome to parse? Or does it have a neat, reasonable-length formal grammar, or some logical description, suitable for building parsers for compilers, source analyzers and other tools?
Whatever the answer, does that go for the Genie alternative syntax?
(I also wonder albeit less intensely about D and other post-C++ non-VM languages.)
C++ is one of the most complex (if not the most complex) programming languages in common use to parse. Of particular difficulty are its name lookup rules and template instantiation rules. C++ is not parsable using an LALR(1) parser (such as the parsers generated by Bison and Yacc), but it is by all means parsable (after all, people use parsers which have no problem parsing C++ every day). (In fact, early versions of G++ were built on top of Bison's Generalized LR parser framework (actually not, see comments) before the parser was more recently replaced with a hand-written recursive descent parser.)
On the other hand, I'm not sure I see what "improvements" Vala offers over C++; the two languages seem to be trying to accomplish the same goals. What's more, you're probably not going to find much outside of GTK+ written with Vala interfaces, so you're going to be using C interfaces to everything else, which really defeats the point of using such a language.
If you don't like C++ due to its complexity, it might be a good idea to consider Objective-C instead, because it is a simple extension of C (like Vala), but has a much larger community of programmers for you to draw upon, given its foundation for everything in Mac land.
Finally, I don't see what the difficulty of parsing the language itself has to do with what a programmer should care about in order to use the language. Just my 2 cents.
It's pretty simple. You can use libvala to do the parsing, semantic analysis and code generation instead of writing your own.
I would like to create a domain specific language as an augmented-C++ language. I will need mostly two types of constructs:
Top-level constructs for specialized types or declarations
In-code constructs, i.e. added primitives to make function calls or idioms easier
The language will be used for scientific computing purposes, and will ultimately be translated into plain C++. C++ has been chosen as it seems to offer a good compromise between: ease of use, efficiency and availability of a wide range of libraries.
A previous attempt using flex and bison failed due to the complexity of the C++ syntax. The existing parser can still fail on some constructs. So we want to start over, but on better bases.
Do you know about similar projects? And if you attempted to do so, what tools would you use? What would be the main pitfalls? Would you have recommendations in terms of syntax?
There are many (clever) attempts to have domain specific languages within the C++ language.
It's usually called DSEL for Domain Specific Embedded Language. For example, you could look up the Boost.Spirit syntax, or Boost.rdb (in the boost vault).
Those are fully compliant C++ libraries which make use of C++ syntax.
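As a small taste of what such a DSEL looks like, here is a minimal Boost.Spirit (Qi) sketch, written purely by way of illustration, that parses a comma-separated list of doubles using nothing but overloaded C++ operators:

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>

bool parse_numbers(const std::string& input, std::vector<double>& out) {
    namespace qi = boost::spirit::qi;
    std::string::const_iterator first = input.begin();
    bool ok = qi::phrase_parse(first, input.end(),
                               qi::double_ % ',',   // grammar: double (',' double)*
                               qi::space, out);
    return ok && first == input.end();
}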
If you want to hide some complexity, you might add in a few macros.
I would be happy to provide some examples if you gave us something to work with :)
You can try extending the open source Elsa C++ parser (it is now part of Mozilla's Pork project):
https://wiki.mozilla.org/Pork
The way to extend C++ is not to try to extend the language, which will be extremely difficult and probably break as new base compiler releases implement new features, but to write class libraries to support your problem domain. This has been what C++ programming has been all about since the language's inception.
If you really want to extend C++, you'll need a full C++ parser plus name and type resolution. As you've found out, this is pretty hard. Your best solution is to get an existing one and modify it.
Our DMS Software Reengineering Toolkit is an infrastructure for implementing language processors. It is designed to support the construction of tools that parse languages, carry out transformations, and spit out the same language (with enhanced code) or a different language/dialect.
DMS has a full C++ Front End, that parses C++, builds abstract syntax trees and symbol tables (e.g., all that name and type resolution stuff).
The DMS/C++ front end is provided with DMS in source form, so that it can be customized to achieve the kind of effect you want. You'd define your DSL as an extension of the C++ front end, and then write transformations that convert your special constructs into "vanilla" C++ constructs, and then spit out compilable result.
DMS/C++ has been used for a wide variety of transformation tasks, including ones that involved extending C++ as you've described, and including tasks that carry out massive reorganizations of large C++ applications. (See the Publications at that website.)
To address your first bullet, maybe you can use the new C++0x features "initializer lists" and "user defined literals", avoiding the need for a new parser. They may help for the second bullet, too.
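A minimal sketch of what I mean, with illustrative names only (Vector, _km): an initializer list gives a specialized type a literal-like syntax, and a user-defined literal makes a domain unit read naturally while remaining plain C++:

#include <initializer_list>
#include <vector>

struct Vector {
    std::vector<double> data;
    // allows Vector v{1.0, 2.0, 3.0};
    Vector(std::initializer_list<double> xs) : data(xs) {}
};

// user-defined literal: 5.0_km reads like a domain construct but is plain C++
constexpr long double operator"" _km(long double x) { return x * 1000.0L; }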
A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in Visual Studio. It reads over my source files and makes a list of classes, their members, their functions, etc., which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output as plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things, so I can decide whether I should continue to use my own or switch to some better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
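For a flavour of the Boost.Serialization route, here is a minimal sketch (illustrative types, not from the question): each class exposes a serialize() member and the library drives both reading and writing, so there is nothing left to generate:

#include <boost/archive/text_oarchive.hpp>
#include <fstream>
#include <string>

struct Settings {
    int width = 0;
    std::string name;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & width;   // the same member handles saving and loading
        ar & name;
    }
};

void save(const Settings& s, const char* path) {
    std::ofstream out(path);
    boost::archive::text_oarchive oa(out);
    oa << s;  // writes the object to the text archive
}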
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
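For example, here is a minimal Boost.Python sketch (class and module names are illustrative only) that exposes a C++ class to Python without any hand-written parsing or code generation:

#include <boost/python.hpp>
#include <string>

struct Greeter {
    std::string hello() const { return "hi"; }
};

// builds a Python extension module named "example" exposing Greeter.hello()
BOOST_PYTHON_MODULE(example) {
    boost::python::class_<Greeter>("Greeter")
        .def("hello", &Greeter::hello);
}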
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser generator, clumsy and difficult to learn but very powerful. Nowadays, you'd use GNU Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost.Serialization or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies, you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've already used the NVelocity engine combined with C# professionally as a step prior to coding, with very good results.