I am creating an LLVM backend for a compiler. I am wondering if there is any downside to having my backend write IR code in files instead of using the APIs. The APIs are complicated (especially if one is using a language other than C++, in my case Haskell) and hard to use. The IR is much easier to understand. I don't need JIT compilation, the output code will be compiled to machine code by the standard command line tools.
The IR format changes from version to version, while the API changes much less frequently. There have been cases in the past where the IR format changed dramatically, so you would need to invest a significant amount of time to keep up with those changes.
Using the API is the preferable method. If it's not clear which API calls you will need, you can use the cpp backend as a source of inspiration :)
As Anton said, there's a definite advantage in using the API as opposed to spitting out textual IR. I just want to address the point you raise regarding the complexity of the API and its usage from Haskell.
Note that LLVM has a C API, which (apart from being more stable) is suitable for foreign language interfaces. Python bindings exist for LLVM built on this API, as do Haskell bindings (easily found via Google), as well as bindings for other languages.
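To give a sense of scale, here is a minimal, hedged sketch of what building a trivial function through the C API looks like (written in C++ here; the Haskell bindings wrap these same entry points, so the shape is similar). Treat it as illustrative rather than exact for your LLVM version:

    // Minimal sketch: build "int add(int a, int b) { return a + b; }" with the
    // LLVM-C API (llvm-c/Core.h) and dump the textual IR.
    #include <llvm-c/Core.h>

    int main() {
      LLVMModuleRef module = LLVMModuleCreateWithName("demo");

      // Declare: int add(int, int)
      LLVMTypeRef paramTypes[] = {LLVMInt32Type(), LLVMInt32Type()};
      LLVMTypeRef fnType = LLVMFunctionType(LLVMInt32Type(), paramTypes, 2, 0);
      LLVMValueRef fn = LLVMAddFunction(module, "add", fnType);

      // Single basic block: return a + b
      LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fn, "entry");
      LLVMBuilderRef builder = LLVMCreateBuilder();
      LLVMPositionBuilderAtEnd(builder, entry);
      LLVMValueRef sum =
          LLVMBuildAdd(builder, LLVMGetParam(fn, 0), LLVMGetParam(fn, 1), "sum");
      LLVMBuildRet(builder, sum);

      LLVMDumpModule(module); // prints the IR to stderr

      LLVMDisposeBuilder(builder);
      LLVMDisposeModule(module);
      return 0;
    }

The builder calls mirror the textual IR almost one-to-one, so moving from emitting text to emitting API calls is mostly mechanical.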
Related
I intend to put/get files to/from an iRODS server. iRODS provides well-documented Java and PHP APIs, however I'm looking for a C/C++ library providing such functions.
Are there libraries or examples of code I could use?
Not exactly what you've requested, but you might be interested in Baton, which uses the iRODS C API, and has had a lot of work put into it, so you might be able to use it as-is, depending on your use case, rather than writing your own from scratch. Failing that, it provides a lot of examples of using the API.
If you have reasons that you must write your own, then in 4.x the iRODS docs for the C API are much improved over the earlier 3.3.1 code path.
Good luck and do try the mailing list as a previous commenter mentioned - the developers respond often.
So I found that nice video on Clang tooling... and could not help but wonder: is there any sample codebase/compiled tooling suite for full project beautification and cleanup (akin to ReSharper for C#)?
Code formatting at project scale, such as: removing extra spaces at line ends, unifying member/class naming, standardizing how {} brackets are placed after if, etc.?
Clang's libtooling is fairly new, so there's not too much based on it currently.
Also, in my experience it's a pain to link against (there's no Clang version of llvm-config, and in the tutorials the devs seem to assume people will build their tools inside the full Clang repo rather than as nice separate projects). The Ubuntu builds of Clang only include libtooling as a static .a, no .so, and the official LLVM nightly builds for Ubuntu don't seem to include the static libclangTooling.a at all.
There is include-what-you-use which is designed to remove unused header files.
There is clReflect which generates reflection bindings. (Not sure if this actually uses libtooling or just libclang, but it's the same kind of thing.)
There is also refactorial that supports some other operations.
There are some tools included as part of Clang, most notably a C++11 migration tool. There is also a tool for modules (a feature being worked on for a future version of C++).
This stuff should be very useful and powerful once it takes off.
Personally, I'm trying (unsuccessfully so far) to build a simple CLI refactoring tool, cppmv, which is designed to let you rename classes, functions, and variables, and move them around namespaces, while keeping their uses in sync, but I don't have anything useful at this stage. Other tools could be cppls (to list namespaces, classes, functions and so on). Maybe cppcp, if you want to copy something for some reason (you could have a 'template' class, for example), but that seems less useful.
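For what it's worth, the skeleton of such a tool is not huge. Below is a rough, hedged sketch of where something like cppmv would start — the class name OldName is made up, and the exact CommonOptionsParser/newFrontendActionFactory signatures vary between Clang releases — just to show the ASTMatchers shape:

    // Rough sketch only: report every declaration of a class named "OldName",
    // i.e. the first step of a rename tool. Signatures vary across Clang versions.
    #include "clang/ASTMatchers/ASTMatchFinder.h"
    #include "clang/ASTMatchers/ASTMatchers.h"
    #include "clang/Tooling/CommonOptionsParser.h"
    #include "clang/Tooling/Tooling.h"
    #include "llvm/Support/CommandLine.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace clang;
    using namespace clang::ast_matchers;
    using namespace clang::tooling;

    static llvm::cl::OptionCategory ToolCategory("cppmv options");

    class ReportDecl : public MatchFinder::MatchCallback {
    public:
      void run(const MatchFinder::MatchResult &Result) override {
        if (const auto *D = Result.Nodes.getNodeAs<RecordDecl>("cls"))
          llvm::outs() << "found " << D->getNameAsString() << " at "
                       << D->getLocation().printToString(*Result.SourceManager)
                       << "\n";
      }
    };

    int main(int argc, const char **argv) {
      // Newer releases use CommonOptionsParser::create() instead of this constructor.
      CommonOptionsParser Options(argc, argv, ToolCategory);
      ClangTool Tool(Options.getCompilations(), Options.getSourcePathList());

      ReportDecl Callback;
      MatchFinder Finder;
      Finder.addMatcher(recordDecl(hasName("OldName")).bind("cls"), &Callback);
      return Tool.run(newFrontendActionFactory(&Finder).get());
    }

The hard part is not this skeleton but generating correct replacements for every use across translation units.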
I was also looking at making a FUSE userspace filesystem that would let you mount and browse your project so you could use traditional 'mv' and 'cp' commands, but that was more of an excuse to learn FUSE than because it would be useful to do things that way. It might be possible to edit the source code of specific classes and functions as their own separate individual 'files', although that wouldn't be useful for many things like IDEs, since you would lose information about headers and such.
It would also be nice to have a live, 'see as you edit', ASTMatcher-based tool, or some simple refactoring scripting language bindings.
EDIT:
There is now also clang-format for code style formatting and (as of 3.4) a clang-format.py script for Vim integration, as well as clang-apply-replacements, which "finds files containing serialized Replacements and applies those changes after deduplication and detecting conflicts."
It might be worth looking at this video where some of this stuff is demoed.
I need a good, stable and, ideally, easy-to-use C++ parser library with a C/C++ interface (C is preferred).
I hear that CINT is a good C++ interpreter. Can I use it (or some part of it) for this purpose?
Any suggestions?
See: http://clang.llvm.org/
It has both a C++ and a C interface (libclang).
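To give a feel for the C interface, here is a small hedged sketch that parses a file and prints every cursor in its AST (the file name input.cpp is just a placeholder; you would normally also pass real compiler arguments):

    // Sketch: walk a translation unit's AST with libclang's stable C interface.
    #include <clang-c/Index.h>
    #include <cstdio>

    static enum CXChildVisitResult visit(CXCursor cursor, CXCursor /*parent*/,
                                         CXClientData /*data*/) {
      CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(cursor));
      CXString spelling = clang_getCursorSpelling(cursor);
      std::printf("%s: %s\n", clang_getCString(kind), clang_getCString(spelling));
      clang_disposeString(kind);
      clang_disposeString(spelling);
      return CXChildVisit_Recurse;
    }

    int main() {
      CXIndex index = clang_createIndex(0, 0);
      CXTranslationUnit tu = clang_parseTranslationUnit(
          index, "input.cpp", nullptr, 0, nullptr, 0, CXTranslationUnit_None);
      if (tu) {
        clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, nullptr);
        clang_disposeTranslationUnit(tu);
      }
      clang_disposeIndex(index);
      return 0;
    }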
C++ parsing is famously hard. AFAIK there are only three parsers that are acceptable by today's standards: EDG (widely used as a frontend in popular C++ compilers), GCC's, and Microsoft's. And apparently, Microsoft has started using EDG's parser in VS2010, for IntelliSense.
When you're looking at the free options, you're pretty much stuck at GCC. It can produce XML, though, so the easy part is there. (Easy by C++ parsing standards, that is)
Clang is the most up-to-date and mature option, with a decent C++ API (but no plain C). Elsa is a bit out of date and unmaintained, but still a usable choice. Both could be used as libraries as well as standalone XML frontends.
If you want to parse C or C++ code, there are some options:
http://bellard.org/tcc/
http://students.ceid.upatras.gr/~sxanth/ncc/
If you want to create a parser using C/C++, you can try:
http://boost-spirit.com/home/
http://dinosaur.compilertools.net/ Lex and Yacc
http://www.codeguru.com/csharp/.net/net_general/patterns/article.php/c12805 Flex and Bison
Our C++ Front End is able to parse a variety of C++ dialects (ANSI, GCC, MSVS), automatically builds ASTs whose nodes are marked with precise source positions and are decorated with any nearby comment text, and builds a full symbol table. (EDIT Jan 2013: the C++ front end has been able to handle C++11 for quite a while now.)
The C++ front end is built on top of our DMS Software Reengineering Toolkit, generalized compiler technology for program analysis and transformation, designed to support custom tool building. The C++ front end includes a preprocessor, in which the preprocessor directives can be expanded or not collectively or individually as appropriate for the task. It also includes full symbol construction with all the nasty Koenig lookup stuff.
DMS accepts explicit language definitions (that's how it understands C++; there are also front ends for C, C#, Java, COBOL, and a variety of other languages). DMS provides general parsing, symbol table building, flow analysis machinery, procedural APIs for tree navigation/inspection/modification, source-to-source transformation, and AST-to-source text regeneration including the original comments, number radices, etc. All of these capabilities are available for use by the C++ Front End.
DMS is also designed to handle the scale required for serious tasks. Often you need not just one compilation unit (which is what GCC will give you at best) but access to an entire set. DMS has been used to analyze/transform thousands of C++ compilation units, and literally tens of thousands of C compilation units (on a 25 million line application).
"Easy to use library" is an oxymoron when it comes to program manipulation tools. The langauges themselves are complex (C++ being one of the most difficult and getting worse with C++0X) and that induces complexity in the nature of the questions you can ask and what the answers look like (e.g. "are there any template instantions that can modify local variable X in method Y in class C in any namespace N?"). The questions themselves are hard.
What you want is a library with the necessary complexity to let you carry off your task. DMS has been under continuous development for the last 15 years, to provide that necessary complexity. If you want to do serious program processing, I claim you will need that information.
As proof, DMS has been used to carry out massive automated reengineering of C++-based mission avionics software for Boeing. I don't believe there are any other tools that can do this. (Clang looks to be trying, but only for C++. YMMV).
I don't know about CINT, but I've heard people use gcc-xml for this.
I have been looking for a good stand-alone library too, but haven't found any.
If you're feeling brave the links in the answer to "is there a yacc-able C++ grammar?" might be helpful. Gcc-xml and clang have already been suggested and Swig also has an XML output which depending on what you're trying to achieve might be relevant.
I have not tried it, but I think the best choice would be to take the parsing modules from a popular open-source compiler such as GCC.
Maybe you'll find something interesting here http://www.nobugs.org/developer/parsingcpp/
I came here to ask this question because this site has been very useful to me in the past, seems to have very knowledgeable users who are willing to discuss a question even if it is metaphysical at times. And also because googling it did not work.
Java has a compiler, and it also has the JDT library, which can compile Java on the fly (for example, JasperReports uses it to turn a report template into Java code).
My question: does anyone know of a library/project that offers compilation as a set of library classes in C/C++? For example, a suite of classes to perform preprocessing, parsing, code optimization and, of course, binary rendering to executable images such as ELF or Windows formats. Basically, something that would allow one to compile C or C++ scriptlets as part of an application.
Yes: llvm. In particular, clang. At least, that's how they advertise the projects. Also, check this question. It might be relevant if you decide to use llvm.
You might be able to adapt something from the LLVM project to your needs.
You can just require that a compiler be installed, then call it. This is a fairly hefty requirement, but about the only way to truly "embed" C or C++. There are interpreters that you may be able to embed, but that currently seems a poor choice, not the least because any libraries used in the script must have development versions (i.e. headers and source/compiled-libraries) installed, and those libraries could be restricted to the feature set supported by the interpreter (e.g. quality of template implementation).
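As an illustration of that approach, here is a hedged, POSIX-only sketch; the compiler name c++, the file names, and the exported entry function are my own placeholders, not part of any library:

    // Compile a C++ "scriptlet" with whatever compiler is installed, then
    // dlopen() the result and call into it. Link this host program with -ldl.
    #include <cstdio>
    #include <cstdlib>
    #include <fstream>
    #include <dlfcn.h>

    int main() {
      // Write the scriptlet to disk.
      std::ofstream("scriptlet.cpp")
          << "extern \"C\" int entry(int x) { return x * 2; }\n";

      // Shell out to the installed compiler.
      if (std::system("c++ -shared -fPIC scriptlet.cpp -o scriptlet.so") != 0)
        return 1;

      // Load the resulting shared object and call the exported function.
      void *handle = dlopen("./scriptlet.so", RTLD_NOW);
      if (!handle) return 1;
      auto entry = reinterpret_cast<int (*)(int)>(dlsym(handle, "entry"));
      if (entry) std::printf("entry(21) = %d\n", entry(21));
      dlclose(handle);
      return 0;
    }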
You're better off using a language like Python or Lua to embed.
There is the Ch interpreter, but I have not used it. Generally, for scripting-type applications, a more natural scripting language is used.
Great. It looks like LLVM is what I was after.
Thanks a lot for your feedback.
I am not primarily after C++ as a scripting language. I have noticed that Python is used as an embedded script engine.
My primary reason is twofold:
Get rid of Make, CMake and the hell that is Autoconf, and replace them with something like SCons that binds into and interacts with all phases of compiling.
Hook into the compiling process after parsing and auto-generate code, specifically meta-related code. In my case I have been able to implement almost every feature of Java in C++ except one: reflection.
Why impose unneeded bloat like RTTI on your code, when it is often inadequate anyway? Instead, one could selectively generate the added features, but the developer would have the choice of when and how to use this extra code.
LLVM is very modular and allows you to fairly easily define new backends. However most of the documentation/tutorials on creating an LLVM backend focus on adding a new processor instruction set and registers. I'm wondering what it would take to create a VHDL backend for LLVM? Are there examples of using LLVM to go from one higher level language to another?
Just to clarify: are there examples of translating LLVM IR to a higher level language instead of to an assembly language? For example: you could read in C with Clang, use LLVM to do some optimization and then write out code in another language like Java or maybe Fortran.
Yes!
There are many LLVM back-ends targeting VHDL/Verilog around:
(open source) Legup paper
(commercial) Xilinx HLS
(online) C-to-verilog
And I know there are many others...
The interesting thing about such low-level representations as LLVM or GIMPLE (also called RTL, by the way) is that they expose static single assignment (SSA) form: this can be translated to hardware quite directly, as SSA can be seen as a tree of multiplexers...
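To make that last point concrete, here is a hedged sketch (my own names, not from any of the projects above) that uses the LLVM C++ API to read each phi node in a function as an n-way mux over its incoming values:

    // Sketch: describe every SSA phi node as a multiplexer selecting among
    // its incoming values, keyed by predecessor block.
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/Support/raw_ostream.h"

    void describeMuxes(const llvm::Function &F) {
      for (const llvm::BasicBlock &BB : F)
        for (const llvm::Instruction &I : BB)
          if (const auto *Phi = llvm::dyn_cast<llvm::PHINode>(&I)) {
            llvm::outs() << Phi->getName() << " = mux(";
            for (unsigned i = 0; i < Phi->getNumIncomingValues(); ++i)
              llvm::outs() << (i ? ", " : "")
                           << Phi->getIncomingValue(i)->getName() << " from "
                           << Phi->getIncomingBlock(i)->getName();
            llvm::outs() << ")\n";
          }
    }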
There's nothing really special about the LLVM IR. It's a standard DAG with variable arity. Decompiling LLVM IR is a lot like decompiling machine language.
You might be able to leverage some frontend optimizations such as constant folding, but that sounds pretty minor compared to the whole task.
My only experience with LLVM was writing a binary translator for a class project, from a toy CISC to a custom RISC.
I'd say, since it's the closest thing to a standard IR (well, GCC GIMPLE is a close second), see if it fits with your algorithms and style and evaluate it as one alternative.
Note that GCC also started out prioritizing portability above all, and has also accomplished a lot.
I'm not sure I follow how the parts of your question relate to one another.
Targeting LLVM to a high-level language like C is very possible, and you seem to have found one reference point.
VHDL is a whole other business, however. Do you consider VHDL a high-level language? It may be, but for describing hardware/logic. Sure, VHDL has some constructs that you can employ to actually program in it, but it's hardly a fruitful endeavor. VHDL describes hardware, and that makes translating LLVM IR into it a very hard problem, unless of course you design a CPU with a custom instruction set in VHDL and translate LLVM IR into your instructions.
This thread was one of the first things I found while looking for the same thing.
I found a project that's rather far along that cleanly builds under/with LLVM 3.5. It's pretty darn cool. It spits out HDL and does various other cool FPGA-related things. While it's designed to work with TTAs and generate images for FPGAs (or simulate them), it can probably also be made to do some trivial HDL generation from C functions.
It was perfect for my purposes because I wanted to upload to an Altera FPGA, and the fpga_stdout example even spits out Quartus build scripts and project files.
TTA-Based Co-design Environment
I also tried the things listed in the accepted answer and a couple of others, and found that they weren't going to work for me or weren't very high quality (usually both). TCE feels professional, though I believe it is purely academic. Very nice all the way around.
It seems the question was partially answered, so I'd like to give it a shot:
1. What would it take to create a VHDL backend for LLVM?
2. What would it take to translate LLVM IR to a higher-level language (presumably with the intention of converting between high-level languages)?
I will give you some background on 2, and expand on 1 at a later date.
If you want to convert LLVM IR to a high-level language such as C or Java:
You would have to take the LLVM instructions and abstract them out into equivalent C code. Then you need to take the remaining features that LLVM has no equivalent for (like classes and abstractions for C++) and write a routine that finds those patterns in the LLVM IR (like reused blocks) and writes the C. For the basic stuff, it's pretty straightforward. But just follow that train of thought and you quickly realize the true difficulty of the problem; after all, not everyone writes simple C. To compound the difficulty further, you may not get the same LLVM IR when compiling the generated C! (Consider the resulting feedback loop.)
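To make the "basic stuff" concrete, here is a hedged sketch (my own function name, nothing like the real CBackend) of walking a function with the LLVM C++ API and printing C-ish text for a couple of opcodes:

    // Sketch: emit a C-like line per instruction for a few easy opcodes.
    // Everything outside these cases is where the real difficulty lives.
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/Support/raw_ostream.h"

    void emitCLike(const llvm::Function &F) {
      for (const llvm::BasicBlock &BB : F)
        for (const llvm::Instruction &I : BB) {
          switch (I.getOpcode()) {
          case llvm::Instruction::Add:
            llvm::outs() << "  int " << I.getName() << " = "
                         << I.getOperand(0)->getName() << " + "
                         << I.getOperand(1)->getName() << ";\n";
            break;
          case llvm::Instruction::Ret:
            if (I.getNumOperands())
              llvm::outs() << "  return " << I.getOperand(0)->getName() << ";\n";
            else
              llvm::outs() << "  return;\n";
            break;
          default: // control flow, types, aggregates, intrinsics...
            llvm::outs() << "  /* unhandled: " << I.getOpcodeName() << " */\n";
          }
        }
    }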
As for Java, you are in for an even harder battle going directly from LLVM IR, and in either case you still have the problem that you likely won't get the same code when compiling back to LLVM IR, if one can even do that. Rather, you would translate LLVM IR to JVM bytecode, then use a reverse compiler to get your Java.
A group of Chinese students was apparently able to do this, but they wondered why there was so little interest in their research. I would say it's because they don't fully understand just what the LLVM guys have done, and how it is better than the JVM. (In fact, LLVM arguably makes the JVM obsolete ;)
Even though this seems useful in that one can use LLVM as an intermediary between C and Java to convert bidirectionally, this solution is actually of little use because we are asking the wrong question. See, the entire reason you would want that for practical purposes is to have a common code base and increase performance.
But the real problem is that we need a language that has abstracted the common features of modern languages, and that gives you a central language you can build from. http://julialang.org/ has answered that question.
Looks like the best place to start is with the CBackend in the LLVM source:
llvm/lib/Target/CBackend/CBackend.cpp
tl;dr: I don't think LLVM is the right tool.
What you are looking for is a way to translate LLVM code to a higher-level language; that's what Emscripten does for JavaScript.
But it looks like you're missing the point of LLVM a bit: it's meant to generate static code, and to achieve that it uses a specific intermediate language built for that purpose.
As you can see, the way Emscripten works is by implementing a stack, but without using JavaScript the way a human would have written it.
There are several projects that try to achieve what your original question asked for, like MyHDL, which turns Python into VHDL or Verilog.