The very first clojure compiler? - clojure

Clojure is written mainly in Clojure, but there had to be a "first" version of a clojure compiler that was written in something else, presumably Java.
Is the code of that compiler available anywhere?
My interest is purely academic, not productional, I'd like to see the way that Rich Hickey handled the chicken/egg problem.

The clojure compiler is written in java, not clojure. So the current version is the one that will satisfy your curiosity. Of course it's a reasonable point of view to say that macros are part of the compiler, and those are indeed written in clojure, but they are not relevant to the chicken/egg problem you mention, which is solved by having the compiler in Java.

Compiler bootstrapping is a common issue when you write your compiler in the same language as that which you are compiling.
In the case of Clojure, however, the compiler is written in Java, so no tricky games required.
For fun historcal reference, GHC, the Haskell compiler (written in Haskell), was originally compiled via Lazy ML.

Not sure if this relates to your interests, but Rich had originally worked on a language called DotLisp and for that he began with a study of JScheme, which he used as a basis for the original code and eventually replaced completely.
DotLisp is here: http://dotlisp.sourceforge.net/dotlisp.htm
JScheme is here: http://jscheme.sourceforge.net/jscheme/main.html
(Trivia: one of the authors of JScheme is Brandeis professor Tim Hickey, no known relation to Rich.)

Related

Where are `Malformed and `Uchar defined?

NB: This question is more about how to navigate through/decipher OCaml source code, than about the specifics of the items listed below.1
I've come across the expressions
`Malformed
`Uchar
in some OCaml source code, but I can't find definitions for them.
Are they somehow built-in, i.e. part of the standard language?
If not, where does one go when one wants to find out such definitions? (Actually, it's not clear to me whether the leading backtick is part of the name or is a separate operator.)
1I'm having a horrible time making sense of OCaml source code. Usually I have no problem picking up new programming languages, but with OCaml I can't find anything where I expect to find it, and when I do find something it comes along with 100x more undefined/differently-defined/outright bizarre names/concepts, so it's a losing battle.
It's a polymorphic variant, you can read about it in the OCaml manual here
For other basic language features you would not understand, well, you should first read the whole part 1 of the manual
Here is how the manual is organized:
Quick introduction through the language
The whole language (two parts: first the legacy language, then the features added through time)
The standards shell tools manpages (compilers, interprets, ocamlbuild etc.)
The standard libraries
I noted from your previous questions that you use non-standard tools, such as opam and several libraries, make sure to check their documentations and note that some can extend your syntax.
I have to admit that the doc generally assumes people reading it already know about the concepts presented but you don't need to use them at first to master the language.
Happy hacking !
It is a polymorhic variant, that do bot need to be defined before usage. It is something between lisp's symbols and common ADT's. You think of it as a syntactically lighter version version of common ADTs.
I would suggest you to use utop as a playing ground to learn OCaml. Indeed, it is a damn easy language, it looks like to me that you have atacked it from a wrong side, or working with a program that is obfuscated.

C++ From the Compiler's Point of View [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
What are the stages of compilation of a C++ program?
I find that understanding how a given software language is compiled can be key to understanding best practices and getting the most out of that language. This seems to be doubly true with C++. Is there a good primer or document (for mortals) that describes C++ from the point of view of the compiler? (Obviously every compiler is a little different.)
I thought there may be something along those lines in the beginning of Stroustrup's book.
Personally I like this one. Not a compiler's eye view exactly, but it tells you what's going on "under the hood" of a C++ program.
Inside the C++ Object Model
It depends on what you want to obtain. I find that the Itaium ABI is a good document to understand some of the intricacies of the C++ object model. It will not deal with optimizations or the like, but I found it quite useful to understand how things like virtual inheritance can be implemented, or things that seemed much more simple as constructors and destructors (did you know that the compiler might generate up to 3 versions of each constructor that you provide? 2 destructors?)
Disclaimer: the document is quite dense, you will probably need to go over the sections more than once, at least I did. And you will need a good understanding of the semantics of the language to actually grasp why the solutions are so complex.
I've head Compilers: Principles, Techniques, and Tools is solid.
http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_1?s=books&ie=UTF8&qid=1328013630&sr=1-1
I don't know of any such book, but if you wanted an understanding of how C++ is treated by the compiler the easiest way would be to write some code and get the compiler to spit out the annotated assembly listing and inspect that. This would give you an idea of how a particular compiler treats the code.
You could also get involved in a compiler project, perhaps something like the llvm's clang project?

Scripting language interpreter source code to learn from

I want to read, and learn from, the source code of a scripting language's interpreter/compiler. What scripting language interpreter/compiler has the simplest, cleanest, and easiest to read source code? I would prefer it to be written in C/C++ (what else are compilers written in anyway?) because I'm planning on writing a compiler in C.
Take a look at lua, you can go through the firsts versions of the programming language and see how it has evolved. It's written in C and has a clean and nice code. You can write a compiler in almost every programming language, but C has been the one that most programmers chose.
The CPython interrupter has been around for quite some time and I would imagine that it would be very useful to you.
AngelScript is a very good option for learning about compilers. This is a language with C/C++ familiar syntax, garbage collection, it is object-oriented with inheritance and polymorphism, cross-platform and compiles to byte-code.
My second choice would be Lua.
I would recommend, as a gentle introduction, having a look at the LLVM Tutorial.
Chris Lattner creates a simple toy language Kaleidoscope to show the various phases of compilation:
Lexing
Parsing
Code Generation
He then demonstrates how to add JIT capabilities (essential for an interpreter).
The toy language is extremely simple, and thus the resulting code is simple as well, and demonstrates nicely the architecture without drowning you in implementation details.
I am not sure that the tutorial is fully up-to-date and can be used as is against a recent LLVM version, but I do advise at least reading it.
(And of course, reading the Dragon Book).
Take a look on V8 for JavaScript. Every interpeter has a component called tokenizer. GNU has one whose name is bison. Take on look on it too. It can be helpful. Chromium uses some tokenizer for interpreting html on the Webkit too, but V8 is the javascript interpreter.
Claudio M. Souza Junior
a famous language, but not simple (PHP Source Code).
You can take advantage of the source code .
PHP Source Code

Good tools for creating a C/C++ parser/analyzer [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
What are some good tools for getting a quick start for parsing and analyzing C/C++ code?
In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.
Here's what I've found so far, but haven't looked at them in detail (thoughts?):
CScope - Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
GCC - Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but hasn't been updated since GCC 4.1 (2006).
PUMA - The PUre MAnipulator. (from the page: "The intention of this project is to
provide a library of classes for the analysis and manipulation of C/C++ sources. For this
purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++
sources."). This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
Any others?
I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.
Thanks!
-Matt
(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42", but remain attached to the name "foo". This, unfortunately, excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree)
Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:
Outstandingly complicated grammar
"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
You can look at clang that uses llvm for parsing.
Support C++ fully now link
The ANTLR parser generator has a grammar for C/C++ as well as the preprocessor. I've never used it so I can't say how complete its parsing of C++ is going to be. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.
Depending on your problem GCCXML might be your answer.
Basically it parses the source using GCC and then gives you easily digestible XML of parse tree.
With GCCXML you are done once and for all.
pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST backend, so it's being used as a basis for any kind of language processing you might need.
Doesn't support C++, though. Granted, it's much harder than C.
Update (2012): at this time the answer, without any doubt, would be Clang - it's modular, supports the full C++ (with many C++-11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (i.e. for Python).
Have a look at how doxygen works, full source code is available and it's flex-based.
A misleading candidate is GOLD which is a free Windows-based parser toolkit explicitly for creating translators. Their list of supported languages refers to the languages in which one can implement parsers, not the list of supported parse grammars.
They only have grammars for C and C#, no C++.
Parsing C++ is a very complex challenge.
There's the Boost/Spirit framework, and a couple of years ago they did play with the idea of implementing a C++ parser, but it's far from complete.
Fully and properly parsing ISO C++ is far from trivial, and there were in fact many related efforts. But it is an inherently complex job that isn't easily accomplished, without rewriting a full compiler frontend understanding all of C++ and the preprocessor. A pre-processor implementation called "wave" is available from the Spirit folks.
That said, you might want to have a look at pork/oink (elsa-based), which is a C++ parser toolkit specifically meant to be used for source code transformation purposes, it is being used by the Mozilla project to do large-scale static source code analysis and automated code rewriting, the most interesting part is that it not only supports most of C++, but also the preprocessor itself!
On the other hand there's indeed one single proprietary solution available: the EDG frontend, which can be used for pretty much all C++ related efforts.
Personally, I would check out the elsa-based pork/oink suite which is used at Mozilla, apart from that, the FSF has now approved work on gcc plugins using the runtime library license, thus I'd assume that things are going to change rapidly, once people can easily leverage the gcc-based C++ parser for such purposes using binary plugins.
So, in a nutshell: if you the bucks: EDG, if you need something free/open source now: else/oink are fairly promising, if you have some time, you might want to use gcc for your project.
Another option just for C code is cscout.
The grammar for C++ is sort of notoriously hairy. There's a good thread at Lambda about it, but the gist is that C++ grammar can require arbitrarily much lookahead.
For the kind of thing I imagine you might be doing, I'd think about hacking either Gnu CC, or Splint. Gnu CC in particular does separate out the language generation part pretty thoroughly, so you might be best off building a new g++ backend.
Actually, PUMA and AspectC++ are still both actively maintained and updated. I was looking into using AspectC++ and was wondering about the lack of updates myself. I e-mailed the author who said that both AspectC++ and PUMA are still being developed. You can get to source code through SVN https://svn.aspectc.org/repos/ or you can get regular binary builds at http://akut.aspectc.org. As with a lot of excellent c++ projects these days, the author doesn't have time to keep up with web page maintenance. Makes sense if you've got a full time job and a life.
how about something easier to comprehend like tiny-C or Small C
Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There's a module that prints out C++, so that may be a good starting point for your toy project.
See our C++ Front End
for a full-featured C++ parser: builds ASTs, symbol tables, does name
and type resolution. You can even parse and retain the preprocessor
directives. The C++ front end is built on top of our DMS Software Reengineering
Toolkit, which allows you to use that information to carry out arbitrary
source code changes using source-to-source transformations.
DMS is the ideal engine for implementing such a translator.
Having said that, I don't see much point in your imagined task; I don't
see much value in trying to replace C++, and you'll find building
a complete translator an enormous amount of work, especially if your
target is a "toy" language. And there is likely little point in
parsing C++ using a robust parser, if its only purpose is to produce
an isomorphic version of C++ that is easier to parse (wait, we postulated
a robust C++ already!).
EDIT May 2012: DMS's C++ front end now handles GCC3/GCC4/C++11,Microsoft VisualC 2005/2010. Robustly.
EDIT Feb 2015: Now handles C++14 in GCC and MS dialects.
EDIT August 2015: Now parses and captures both the code and the preprocessor directives in a unified tree.
EDIT May 2020: Has been doing C++17 for the past few years. C++20 in process.
A while back I attempted to write a tool that will automatically generate unit tests for c files.
For preprosessing I put the files thru GCC. The output is ugly but you can easily trace where in the original code from the preprocessed file. But for your needs you might need somthing else.
I used Metre as the base for a C parser. It is open source and uses lex and yacc. This made it easy to get up and running in a short time without fully understanding lex & yacc.
I also wrote a C app since the lex & yacc solution could not help me trace functionality across functions and parse the structure of the entire function in one pass. It became unmaintainable in a short time and was abandoned.
What about using a tool like GNU's CFlow, that can analyse the code and produce charts of call-graphs, here's what the opengroup(man page) has to say about cflow. The GNU version of cflow comes with source, and open source also ...
Hope this helps,
Best regards,
Tom.

Isn't saying "C/C++" wrong? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I've seen a lot of questions around that use improperly the expression "C/C++".
The reasons in my opinion are:
Newbie C and C++ programmers probably don't understand the difference between the two languages.
People don't really care about it since they want a generic, quick and "dirty" answer
While C/C++ could sometimes be interpreted as "either C or C++", I think it's a big error. C and C++ offer different approaches to programming, and even if C code can be easily implemented into C++ programs I think that referring to two separate languages with that single expression ( C/C++ ) is wrong.
It's true that some questions can be considered either as C or C++ ones, anyway.
What do you think about it?
C/C++ is a holdout from the early days of C++, where they were much more similar than they were today. It's something that wasn't really wrong at first, but is getting more-so all the time.
The basic structure is similar enough that most simple questions do still work between the two, though. There is an entire Wikipedia article on this topic: http://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B
The biggest fallacy that comes from this is that because someone is well-versed in C, they will be equally good at C++.
Please remember that the original implementations of C++ were simply as a pre-compiler that output C code for the 'real' compiler. All C++ concepts can be manually coded (but not compiler-enforced) in plain C.
"C/C++" is also valid when referring to compilers and other language/programming tools. Virtually every C++ compiler available will compile either - and are thus referred to as "C/C++" compilers. Most have options on whether to treat .C and .CPP files based on the extension, or compile all of them as C or all of them as C++.
Also note that mixing C and C++ source in a single compiler project was possible from the very first C/C++ compiler. This is probably the key factor in blurring the line between the languages.
Many language/programming tools that are created for C++ also work on C because the language syntax is virtually identical. Many language tools have separate Java, C#, Python versions - but they have a single "C/C++" version that works for C and C++ due to the strong similarities.
We in our company have noticed the following curious fact: if a job applicant writes in his CV about "advanced C/C++ knowledge", there is usually a good chance that he really knows neither ;)
The two languages are distinct, but they have a lot in common. A lot of C code would compile just fine on a C++ compiler. At the early-student level, a lot of C++ code would still work on a C compiler.
Note that in some circumstances the meaning of the code may differ in very subtle ways between the two compilers, but I suppose that's true in some circumstances even between different brands of C++ compiler if you're foolish enough to rely on undefined or contested/non-conformant behavior.
Yes and no.
C and C++ share a lot in common (in fact, the majority of C is a subset of C++).
But C is more oriented "imperating programming", whereas C++, in addition to C paradigm, has more paradigms easily accessible, like functional programing, generic programing, object oriented programing, metaprograming.
So I see the "C/C++" item saying either as "the intersection of C and C++" or "familiarity with C programing as well as C++ programing", depending on the context.
Now, the two languages are really different, and have different solutions to similar problems. A C developer would find it difficult to "parse/understand" a C++ source, whereas a C++ developer would not easily recognize the patterns used in a C source.
Thus, if you want to see how far the C is from the C++ in the "C/C++" expression, a good comparison would be the GTK+ C tutorials, and the same in C++ (GTKmm):
C : GTK+ Hello World: http://library.gnome.org/devel/gtk-tutorial/stable/c39.html#SEC-HELLOWORLD
C++ : GTKmm Hello World: http://www.gtkmm.org/docs/gtkmm-2.4/docs/tutorial/html/sec-helloworld.html
Reading those sources is quite enlightening, as they are, as far as I parsed them, producing exactly the same thing, the "same" way (as far as the languages are concerned).
Thus, I guess the C/C++ "expression" can quite be expressed by the comparison of those sources.
:-)
The conclusion of all this is that it is Ok if used on the following contexts:
describing the intersection of C and C++
describing familiarity with C programing as well as C++ programing
describing compatible code
But it would not be for:
justifying keeping to code in a subset of C++ (or C) for candy compatibility with C (or C++) when compatibility is not desired (and in most C++ project, it is not desired because quite limitating).
asserting that C and C++ can/should be coded the same way (as NOT shown by the GTK+/GTKmm example above)
I think it's more of the second answer - they want something that's easily integrated into their project.
While a C answer may not be idiomatic C++ (and vice versa), I think that's one of C++'s big selling points - you can basically embed C into it. If an idiomatic answer is important, they can always specify C/C++/C++ with STL/C++ with boost/etc.
An answer in lisp is going to be pretty unusable. But an answer in either C or C++ will be directly usable.
Yeah, C/C++ is pretty useless. It seems to be a term mostly used by C++ newbies. We C-only curmudgeons just say "C" and the experienced C++ folks know how much it has diverged from C and so they properly say "C++".
Even if C is (nearly) a subset of C++, this doesn't really have any bearing on their actual usage. Practically every interesting C feature is frowned upon in modern C++ code:
C pointers (use iterators/smart pointers/references instead), macros (use templates and inline functions instead), stdio (use iostreams instead), etc. etc.
So, as Alex Jenter put it, it's unlikely that anyone who knows either language well would say C/C++. Saying that you know how to program in "C/C++" is like saying you know how to program in "Perl/PHP"... sure they've got some significant similarities, but the differences in how they are actually used are vast.
C/C++ often means a programiing style, which is like C and classes, or C and STL :-) Technicaly it is C++, but minimum of its advantages are used.
I agree. I read the C tag RSS feed, and I see tons of C++ questions come through that really don't have anything to do with C.
I also see this exchange a lot:
Asker: How do you do this in C?
Answer: Use the X library for C++.
Asker: OK, how about someone actually answer my question in C?
I use that term myself, and it is because it is my style, I don't use boost, stl or some other things, not even standard C++ libs, like "cout" and "cin", I program C but using classes, templates and other (non-library) features to my advantage.
I can say that I am not a master of C, neither a master of C++, but I am really good at that particular style that I use since 10 years ago. (and I am still improving!)
I was under the impression that all c code is valid c++ code.
Isn’t saying “C/C++” wrong?
No, it isn't. Watcom International Corporation for example, founded more than 25 years ago, called their set of C and C++ compilers and tools "Watcom C/C++", and this product is still developed and available in the open-source form as OpenWatcom C/C++
If it is a complex question needing to write more than one function, yes, it can be wrong.
If it is just to ask a detail about sprintf or bit manipulation, I think it can be legitimate (the latter can even be tagged C/C++/Java/C#, I suppose...).
Whoever is asking the question should write C, C++ or C/C++ depending on the question.
From what I've seen, you can write C++ code the C way or the C++ way. Both work, but C++ code that is not written C style is usually easier to maintain in the long run.
In the end, it all depends on the particular question. If someone is asking how to concatenate strings, than it is very important whether he wants C or C++ solution. Or, another example, if someone is asking for a qsort algorithm. To support different types, you might want to use macros with C and templates with C++, etc.
This was too long for a comment, so I had to make it an answer, but it's in response to Jeff B's answer.
Please remember that the original
implementations of C++ were simply as
a pre-compiler that output C code for
the 'real' compiler.
I have a friend (who writes C++ compilers -- yes, plural), who would take offense to your first sentence. A compiler whose object code is C source code is every bit as much a compiler as any other. The essence of a compiler is that it understands that syntax of the language, and generates new code based on that. A pre-processor has no knowledge of the language and merely reformats its input.
Remember that the C Compilers which would compile the output of those C++ compilers, would themselves output ASM code would would then be run through an assembler.
I tend to put C / C++ in my questions.
Typically I am looking for something that I can use in my c++ application.
If the code is in C or in C++ then I can use it, so I would rather not just limit the possible answers to one or the other.
Not only these two languages are different, but also the approaches are different. C++ is an OO language, while C is procedural language.
Do I have to mention templates?
Also, there are differences in C and C++ standards. If something is good in C, doesn't have to compile in C++