Alternative to LLVM for C++ (bytecode generation)

I have written my lexer and parser in flex and bison. My project is based on C++ and I would love to stick with it. Currently my interpreter is written in C++, but I want to achieve faster execution by converting to bytecode (some form of VM-level bytecode) once my interpreter works. I know this can be achieved through LLVM, but I had problems using it from an x64 OS while developing in Visual Studio 2012 (32-bit); some of these are described at # LLVM linker errors on VS. The other tool I came across is ANTLR, and if I understand correctly, the latest release does not integrate easily with C++ yet. Many references can be found for this, but a quick one is # ANTLR integration with C++ issue. Also, I do not want to dispose of my lexer and parser written in flex and bison. What are my options if I want to generate bytecode from my AST?
EDIT: My aim is to generate bytecode from my AST (for the target architecture) so the code can be executed at the virtual-machine level. Currently I have an interpreter that executes the AST directly using C++ library code. I want to generate bytecode straight from my AST and then execute that bytecode instead.
Any help would be appreciated.

Generating native machine code directly from your AST is not really feasible (well, actually it is possible, but it would be extremely difficult). You need some kind of intermediate step, such as emitting LLVM bitcode or code in some programming language of your choice. Please note that LLVM bitcode is not the same as native machine code for the target: LLVM bitcode still has to be compiled to native binaries for the target machine, which is done by the respective back end. So you could just as well generate C++ code from your AST using a handwritten code emitter that traverses your syntax tree, and then use a C++ compiler for the target platform to compile it to the desired native binary.
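To illustrate that last suggestion, here is a minimal sketch of such a handwritten emitter. The tiny two-node AST is purely hypothetical (not your grammar); it just shows the shape of a traversal that prints an equivalent C++ program, which you would then hand to a C++ compiler for the target platform:

#include <iostream>
#include <memory>

// A deliberately tiny, hypothetical AST: just numbers and binary operators.
struct Expr {
    virtual ~Expr() = default;
    virtual void emitCpp(std::ostream &os) const = 0;  // print equivalent C++ source
};

struct Number : Expr {
    double value;
    explicit Number(double v) : value(v) {}
    void emitCpp(std::ostream &os) const override { os << value; }
};

struct BinaryOp : Expr {
    char op;                        // '+', '-', '*', '/'
    std::unique_ptr<Expr> lhs, rhs;
    BinaryOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void emitCpp(std::ostream &os) const override {
        os << '(';
        lhs->emitCpp(os);
        os << ' ' << op << ' ';
        rhs->emitCpp(os);
        os << ')';
    }
};

int main() {
    // AST for (3 + 4) * 2, as your bison actions might have built it.
    auto ast = std::make_unique<BinaryOp>('*',
        std::make_unique<BinaryOp>('+',
            std::make_unique<Number>(3),
            std::make_unique<Number>(4)),
        std::make_unique<Number>(2));

    // Emit a complete C++ translation unit around the expression.
    std::cout << "#include <cstdio>\n"
              << "int main() { std::printf(\"%f\\n\", (double)";
    ast->emitCpp(std::cout);
    std::cout << "); }\n";
    return 0;
}

The same traversal structure works if you later decide to emit LLVM IR or your own VM bytecode instead of C++ text; only the emit method changes.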

Related

Is it possible to compile Clang LibTooling into standalone library?

I'm trying to find the best solution for parsing C++ code for my refactoring support tool. I decided to use the Clang frontend because it is meant to be the frontend of a C++ compiler. I found the documentation page that describes parsing C++ code using Clang's LibTooling. But, unfortunately, it looks to me like LibTooling cannot be used separately from the Clang infrastructure installed on the tool user's side.
So, my question: is it possible to compile Clang LibTooling into a standalone parser that can be reused across C++ parsing projects without the Clang infrastructure? The ideal solution for me is shown in the picture: I want to compile LibTooling, link it with my wrapping library (which may simplify/adapt the LibTooling API for my purposes), and then reuse it across different projects.
P.S.: Maybe something like this was done in Firolino's clang project, but it looks like a full Clang installation is required for that too.
P.P.S.: As for me, the solution I want may be useful for the whole C++ community. So if somebody wants to cooperate with me on this project - you are welcome!

How is clang able to steer C/C++ code optimization?

I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a; instead, on Debian, the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is translating C++ source code (or C, or Objective C, etc.) into LLVM IR, a textual, lower-level representation of what that code should do. In order to do this Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser plus a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}
the core LLVM passes should optimize that LLVM IR code, for example folding the redundant assignments down to a single multiplication.
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing since there are many papers that would define the LLVM libraries a "middle end" in a compiler chain and define the "back end" as the final module which does the code generation (LLVM IR to executable code or something else which no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever the language you're targeting (C++, C, Objective C, Python, etc..) if you have a front-end which translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc.), Clang also leverages many LLVM libraries during its front-end work. You can't really tear every LLVM module away from Clang, since the latter is built on the former.
As for the reason why Clang is called a "compilation driver": Clang handles interpreting the command-line parameters (descriptions and many declarations are TableGen'd and may require a bit more than a simple grep to swim through the sources), decides which jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen, in BackendUtil.cpp, is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
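To get a feel for what that pass-manager setup looks like on the LLVM side, here is a minimal sketch using the legacy PassManager/PassManagerBuilder API from the LLVM 3.x era (newer releases have moved to a different pass-manager interface, so treat the exact headers and names as version-dependent):

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

int main() {
    llvm::LLVMContext context;
    llvm::Module module("demo", context);      // normally produced by a front end such as Clang

    llvm::PassManagerBuilder builder;           // mirrors roughly what -O2 would select
    builder.OptLevel = 2;

    llvm::legacy::PassManager passes;           // module-level pass manager
    builder.populateModulePassManager(passes);  // fill it with the standard pipeline
    passes.run(module);                         // optimize the IR in place
    return 0;
}

This is roughly the step that the answer above attributes to clangCodeGen/BackendUtil.cpp, done by hand on an already-built module.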
Finally, I would suggest reading the Clang and LLVM documentation; it is pretty thorough and should be the first place you look for answers to most of these questions.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language: LLVM IR. What Clang does is compile C++ code to LLVM IR, largely without optimizing it. Clang can then invoke the LLVM libraries to compile that IR to optimized assembly.

How to cross compile Flash SWF file(s) into Keil Compilable C/C++ file(s)

I have been researching how to cross-compile Flash SWF files into C/C++ source files.
There are plenty of tools for decompiling SWF files into plain-text file formats; the Free SWF Decompiler page lists plenty of open-source solutions for this.
To decompile an SWF into C/C++ source files, I have tried the following solutions:
1) Haxe: the Haxe compiler is responsible for translating the Haxe programming language into the target platform's native source code or binary.
To do this, I need to
a) Decompile the SWF into ActionScript files
b) Convert the ActionScript to Haxe
c) Compile the Haxe into C++ source files
d) Recompile the C++ source files with Keil MDK-ARM
Drawback: the output C++ is huge and contains many Flex-SDK-like resources in C source form, which are hard to recompile with Keil MDK-ARM. It seems quite an inefficient way to get Keil-compilable binary code.
Recently I found another possible solution: the Adobe Flash C++ Compiler, i.e. FlasCC (a complete BSD-like C/C++ development environment with a GCC-based cross-compiler capable of targeting the Adobe Flash Runtime).
But I am not sure whether it would work as I expect. Since FlasCC can compile C/C++ code into ActionScript bytecode (ABC) as well as LLVM bytecode, my thought is:
a) Parse the SWF's ActionScript bytecode (ABC)
b) Read the ActionScript bytecode (ABC) into FlasCC (not sure if this can be done?)
c) From the equivalent ActionScript bytecode (ABC) in FlasCC, output its LLVM bytecode (not sure if this can be done?)
d) Convert the LLVM bytecode to C++ code with llc
In this way, the ActionScript bytecode could be optimized through an LLVM LTO (link-time optimization) build.
Since I am not an LLVM expert, I need some advice on this.
Is this workable? Or is there any other way to do this?
Generally speaking, no. Auto-compiling Flash to a binary would be quite a complex thing to do, and an ineffective one too.
I'm surprised the first approach you used actually worked; I'd expect the resulting code to fail here and there randomly. Anyhow, there is not a single chance AS3-to-Haxe conversion can be effective in terms of resulting code performance.
As for the second approach, compilers are a one-way thing: you can't use FlasCC to decompile ABC bytecode, nor to compile AS3; it can only be used to compile C/C++ to ABC.
Compiling ABC to LLVM and then to a binary seems much more feasible and doable, but it will require quite a lot of effort, and again I doubt the result will be at all fast. I'm almost sure it will be far slower than an SWF simply running in the Flash player on the target platform. (Still, considering the things you've already done, I'd estimate you could build a prototype in two or three months of hard work, so it may be worth a shot.)
It seems much clearer to me now how to decompose the whole job.
The job to do is: decompose SWF files into Keil-compilable C/C++ source files.
I found an open-source solution, Tamarin on LLVM, that was designed to convert:
ABC to TESSA (Type Enriched SSA)
TESSA to LLVM IR
LLVM IR to C (by llc)
I will give it a try!

LLVM for parsing math expressions

I have some trouble wrapping my head around what LLVM actually does...
Am I right to assume that it could be used to parse mathematical expressions at runtime in a C++ program?
Right now, at runtime, I take the math expressions and build a C program out of them, compiling it on the fly via a system call to gcc. Then I dynamically load the .so produced by gcc and extract my eval function...
I'd like to replace this workflow by something simpler, maybe even faster...
Can LLVM help me out? Any resources out there to get me started?
You're describing using LLVM as a JIT compiler, which is absolutely possible. If you generate LLVM IR code (in memory) and hand it off to the library, it will generate machine code for you (still in memory). You can then run that code however you like.
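To make that concrete, here is a minimal sketch of JIT-compiling a tiny hand-built function with the ExecutionEngine/MCJIT C++ API. The hard-coded expression x * 2 + 1 stands in for whatever your parser produces, and the exact headers and EngineBuilder signatures vary between LLVM releases, so treat this as a sketch rather than copy-paste code:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetSelect.h"
#include <cstdio>
#include <memory>

int main() {
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();

    llvm::LLVMContext context;
    auto module = std::make_unique<llvm::Module>("expr", context);
    llvm::IRBuilder<> builder(context);

    // Build IR for: double eval(double x) { return x * 2.0 + 1.0; }
    auto *fnType = llvm::FunctionType::get(builder.getDoubleTy(),
                                           {builder.getDoubleTy()}, false);
    auto *fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage,
                                      "eval", module.get());
    builder.SetInsertPoint(llvm::BasicBlock::Create(context, "entry", fn));
    llvm::Value *x = &*fn->arg_begin();
    llvm::Value *result = builder.CreateFAdd(
        builder.CreateFMul(x, llvm::ConstantFP::get(context, llvm::APFloat(2.0))),
        llvm::ConstantFP::get(context, llvm::APFloat(1.0)));
    builder.CreateRet(result);

    // Hand the module to MCJIT and fetch a callable function pointer.
    llvm::ExecutionEngine *engine =
        llvm::EngineBuilder(std::move(module)).create();
    engine->finalizeObject();
    auto eval = reinterpret_cast<double (*)(double)>(
        engine->getFunctionAddress("eval"));
    std::printf("%f\n", eval(3.0));   // prints 7.000000
    delete engine;
    return 0;
}

Linking this needs the LLVM libraries (e.g. the flags reported by llvm-config --cxxflags --ldflags --libs), and since the API has shifted between releases you should check the calls against the version you build with.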
If you want to generate LLVM IR from C code, you can also link clang as a library.
Here is a PDF I found at this answer, which has some examples of how to use LLVM as a JIT.

How to embed LLVM?

The LLVM Core project consists of:
Compiler - converts source code to LLVM IR
VM - executes compiled IR code
How can I embed the VM to a C++ application?
LLVM is really a collection of libraries that you can link against, so it's pretty easy to embed. Most often, LLVM takes IR that you generate and compiles it directly to machine code. There is also a library available to interpret and execute IR on platforms that do not support JIT compilation.
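As a rough sketch of that interpreter fallback (assuming the ExecutionEngine C++ API of LLVM 3.x or later; exact headers and the EngineBuilder constructor differ by version), you can ask EngineBuilder for the interpreter explicitly:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/ExecutionEngine/Interpreter.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include <cstdio>
#include <memory>
#include <vector>

int main() {
    llvm::LLVMContext context;
    auto module = std::make_unique<llvm::Module>("embedded", context);
    llvm::IRBuilder<> builder(context);

    // Build IR for: int forty_two() { return 42; }
    auto *fnType = llvm::FunctionType::get(builder.getInt32Ty(), false);
    auto *fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage,
                                      "forty_two", module.get());
    builder.SetInsertPoint(llvm::BasicBlock::Create(context, "entry", fn));
    builder.CreateRet(builder.getInt32(42));

    // Request the interpreter explicitly; no JIT/target support required.
    llvm::ExecutionEngine *engine =
        llvm::EngineBuilder(std::move(module))
            .setEngineKind(llvm::EngineKind::Interpreter)
            .create();

    std::vector<llvm::GenericValue> noArgs;
    llvm::GenericValue result = engine->runFunction(fn, noArgs);
    std::printf("%llu\n",
                (unsigned long long)result.IntVal.getZExtValue());  // prints 42
    delete engine;
    return 0;
}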
There's a pretty good tutorial available on the LLVM website here: http://llvm.org/docs/tutorial/. I suggest that you go through that and then ask more specific questions if you have them.
Take a look at the HowToUseJIT example in LLVM.