Converting GCC IR to LLVM IR - c++

I have written an LLVM pass that modifies the Intermediate Representation (IR) code. To increase portability, I also want it to work with a gcc compiler. So I was wondering if there is any tool which can convert some Intermediate Representation (IR) of gcc to LLVM IR.

You probably want dragonegg (which is using the GCC front-end to build an LLVM IR).
And if you wanted to work on the GCC internal representations, MELT (a high level domain specific language to extend GCC) is probably the right tool.

It will probably be much easier to simply write another version of your code that works with gcc IR. What you want to do is likely not possible, and if it is possible, it's probably extremely difficult. (More so than writing the LLVM pass in the first place.)

Related

Customize Clang to convert C++ code to C

Here is my scenario:
I,m working on an embedded linux system and getting a shared library written in C++. It works well except that libstdc++ is required, which means an extra 1M memory is occupied. I want to convert the shared library to C so that 1M memory will be saved.
I know how to convert C++ code to C by hand but it will be really boring. So I searched for solution and getting a similar question: Use Clang to convert C++ to C code. However the generated code is not readable. I want to get maintainable C source code to obsolete the original C++ code.
I'm a newbie at Clang. I have learnt that Clang can be used to build a tool that processes code. My question is:
Is it possible using Clang to achieve my goal?
If it is possible, how can I do that? To be more specific, how can I use Clang to remove code blocks wrapped by macro as the first step?
In practice converting (semi-automatically) genuine C++ code to maintainable C code is not realistic.
I want to get maintainable C source code to obsolete the original C++ code.
You certainly won't get maintainable and readable and portable C code (for instance, as soon as standard containers are used in C++; their template expansion is not readable, and probably not portable to something with different word size, alignment, endianness ...). You could transform LLVM IR to some unportable and unreadable subset of C.
It works well except that libstdc++ is required, which means an extra 1M memory is occupied
Perhaps you could try linking (everything) statically; maybe only a part of libstdc++ is used in your particular application.
BTW, you could get GIMPLE from GCC, and convert that GIMPLE to unreadable C code (perhaps by customizing GCC with a plugin or a GCC MELT extension).
You might also try to compile and link with link time optimization, e.g. with -flto -Os (with recent GCC or Clang).
Don't forget that development efforts also has some costs. Is it worth spending a whole year of work (or more) for a team of a few developers to win a few hundred kilobytes? In most cases upgrading the hardware to something with slightly more memory would cost a lot less. YMMV

C++ Native to Intermediate

Is it theoretically and/or practically possible to compile native c++ to some sort of intermediate language which will then be compiled at run time?
Along the same lines, is "portable" the term used to denote this?
LLVM which is a compiler infrastructure parses C++ code, transforming it to an intermediate language called LLVM IR (IR stands for Intermediate Representation) which looks like high-level assembly language. It is a machine independent language. Generating IR is one phase. In the next phase, it passes through various optimizers (called pass). which then reaches to third phase which emits machine code (i.e machine dependent code).
It is a module-based design; output of one phase (module) becomes input of another. You could save IR on your disk, so that the remaining phases can resume later, maybe on entirely different machine!
So you could generate IR and then do rest of the things on runtime? I've not done that myself, but LLVM seems really promising.
Here is the documentation of LLVM IR:
LLVM Language Reference Manual
This topic on Stackoverlow seems interesting, as it says,
LLVM advantages:
JIT - you can compile and run your code dynamically.
And these articles are good read:
The Design of LLVM (on drdobs.com)
Create a working compiler with the LLVM framework, Part 1

How can I generate metadata output with clang similar to that of gcc-xml?

I found that GCCXML is not being maintained anymore (I think the last version is from 2009 from their CVS repository). People usually suggest to check out clang, but I couldn't find a comprehensive documentation that described how to generate a similar output. Not necessarily XML, but the same information in a parsable (documented, if binary or obscure) format. If there is a way to get the same information from a recent gcc version, that is also fine.
This is for a hobby project for dynamic invocation of C++ code. I know about similar projects (pygccxml, xrtti, openc++), but the point is to make it, for fun.
There used to be a way to print an xml dump with Clang but it was more or less supported and has been removed. There are developers options to get a dump at various stages, but the format is for human consumption, and unstable.
The recommendation for Clang users has always been code integration:
Either directly using Clang from C++, and for example use a RecursiveASTVisitor implementation
Or use libclang from C or C++.
Unlike gcc, clang is conceived as a set of a libraries to be reused, so it does not make much sense to try and write a parser for some clang output: it is just much more error-prone than just consuming the information right at the source.
You could use the Clang based CastXML to generate XML file.
https://github.com/CastXML/CastXML
You could code a GCC plugin, or better yet, a MELT extension, to do that. You'll need a GCC 4.6 or later version (4.7 is coming out soon).
However, extending GCC, either thru a plugin coded in C or better yet with an extension coded in MELT (a domain specific language suited to extend GCC) takes some time, because you need to understand and handle most of main GCC internal representations (Gimple and Tree-s).
If you want to use MELT, I'll be delighted to help you.

llvm ir back to human-readable source language?

Is there an easy way of going from llvm ir to working source code?
Specifically, I'd like to start with some simple C++ code that merely modifies PODs (mainly arrays of ints, floats, etc), convert it to llvm ir, perform some simple analysis and translation on it and then convert it back into C++ code?
It don't really mind about any of the names getting mangled, I'd just like to be able to hack about with the source before doing the machine-dependent optimisations.
There are number of options actually. The 2 that you'll probably be interested in are -march=c and -march=cpp, which are options to llc.
Run:
llc -march=c -o code.c code.ll
This will convert the LLVM bitcode in code.ll back to C and put it in code.c.
Also:
llc -march=cpp -o code.cpp code.ll
This is different than the C output engine. It actually will write out C++ code that can be run to reconstruct the IR. I use this personal to embed LLVM IR in a program without having to deal with parsing bitcode files or anything.
-march=cpp has more options you can see with llc --help, such as -cppgen= which controls how much of the IR the output C++ reconstructs.
CppBackend was removed. We have no -march=cpp and -march=c option since 2016-05-05, r268631.
There is an issue here... it might not be possible to easily represent the IR back into the language.
I mean, you'll probably be able to get some representation, but it might be less readable.
The issue is that the IR is not concerned with high-level semantic, and without it...
I'd rather advise you to learn to read the IR. I can read a bit of it without that much effort, and I am far from being a llvm expert.
Otherwise, you can C code from the IR. It won't be much more similar to your C++ code, but you'll perhaps feel better without ssa and phi nodes.

LLVM what is it and how can i use it to cross platform compilations

I was reading here and there about llvm that can be used to ease the pain of cross platform compilations in c++ , i was trying to read the documents but i didn't understand how can i
use it in real life development problems can someone please explain me in simple words how can i use it ?
The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.
The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.
In many cases, you have more information at the time the program is running that you do back at head office, so the program can be much optimized.
To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.
You can also try the Kaleidoscope demo, which walks you through creating a new language without having to actually write a compiler, just write the IR.
In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
Why don't you go to the LLVM website and check out all the documentation there. They explain in great detail what LLVM is and how to use it. For example they have a Getting Started page.
LLVM is, as its name says a low level virtual machine which have code generator. If you want to compile to it, you can use either gcc front end or clang, which is c/c++ compiler for LLVM which is still work in progress.
It's important to note that a bunch of information about the target comes from the system header files that you use when compiling. LLVM does not defer resolving things like "size of pointer" or "byte layout" so if you compile with 64-bit headers for a little-endian platform, you cannot use that LLVM source code to target a 32-bit big-endian assembly output pater.
There is a good chapter in a book explaining everything nicely here: www.aosabook.org/en/llvm.html