C++ Source-to-Source Transformation with Clang - c++

I am working on a project for which I need to "combine" code distributed over multiple C++ files into one file. Due to the nature of the project, I only need one entry function (the function that will be defined as the top function in the Xilinx High-Level-Synthesis software -> see context below). The signature of this function needs to be preserved in the transformation. Whether other functions from other files simply get copied into the file and get called as a subroutine or are inlined does not matter. I think due to variable and function scopes simply concatenating the files will not work.
Since I did not write the C++ code myself and it is quite comprehensive, I am looking for a way to do the transformation automatically. The possibilities I am aware of to do this are the following:
Compile the code to LLVM IR with inlining flags and use a C++/C backend to turn the LLVM code into the source language again. This will result in bad source code and require either an old release of Clang and LLVM or another backend like JuliaComputing. 1
The other option would be developing a tool that relies on using the AST and a library like LibTooling to restructure the code. This option would probably result in better code and put everything into one file without the unnecessary inlining. 2 However, this options seems too complicated to put the all the code into one file.
Hence my question: Are you aware of a better or simply alternative approach to solve this problem?
Context: The project aims to run some of the code on a Xilinx FPGA and the Vitis High-Level-Synthesis tool requires all code that is to be made into a single IP block to be contained in a single file. That is why I need to realise this transformation.

Related

C++ code to input function as string and then use the function ahead in the code

I need to define functions in c++ code to be user defined. Basically that he writes the function in form of a string which is exact c++ code, then use that function in the very next line of code.
I have tried to append output to a file which is imported, but it obviously failed
You simply cannot do it. C++ code can not be interpreted at run-time. You may want to try Qt/QML which will give an opportunity to run a javascript code or an entire QML file from network/string or any other method which can deliver your code to the host application.
I assume you are talking about a pure function such as a mathematical formula.
To my knowledge, what you ask is not possible without
a) writing your own parser, that effectively creates functions from strings or
b) using external libraries - a quick google search brought be to this library that seems to provide the functionality you are looking for. I have no personal experience with it, though.
As #Useless pointed out, "editing" the code after compilation is not intended in a compiled language as c++. This could be tricked by having a second code compiled and executed in the background; this, however, seems rather unelegant and would rely on additional threads, compilers and the operating system.

Creating a modular language in LLVM?

I'm developing a new language in LLVM using the C++ API which compiles down to target the C ABI.
I would like to support modular compilation by allowing end users to build what are effectively static libraries. I noticed the LLVM C++ API has a llvm::Linker class that I can use during compilation to combine source files (llvm::Module), however I wanted to guarantee library compatibility via metadata version numbers or at least the publicly exposed interface between separate compilation runs.
Much of the information available on metadata in LLVM suggest that it should only be used for extended information that would not break correctness when silently removed.
llvm
blog
IntrinsicsMetadataAttributes
pdf
I wouldn't think this would be a deal breaker as it could be global metadata, but it would be good to get a second opinion on that point.
I also know there is a method in IRReader to parseIRFile so I can load some previously built bc files. I would be curious if it would be reasonable practice to include size and CRC information for comparison when loading these files.
My language has concepts similar to C# including interfaces. I figure I could allow modular compilation by importing/exporting an interface type along with external functions (Much like C++, I don't restrict the language to only methods of classes).
This approach allows me to include language specific information in the interface without needing to encode it in the IR as both the library and the calling code would be required to build with the interface. This again requires the interfaces to be compatible.
One language feature that would require extended information would be named parameters in functions.
My language is very type-safe and also mandates named parameters so there is no predetermined function parameter order. This allows call sites to be more explicit, the compiler to catch erroneous parameter usage, and authors have more liberty in determining default parameters as they are not restricted to the last parameters to the function.
The compiler will need to know names, modifiers, defaults, etc. of these parameters to correctly map calls at compile time, so I figure the interface approach would work well here.
TL;DR
Does LLVM have any predefined facilities for building static libraries?
Is version number, size, and CRC information reasonable use cases for LLVM's metadata?
This is probably not QUITE an answer... Or at least not a complete answer.
I like this question, as I'm going to need a solution in the future too (some time in the next few months or years) for my Pascal compiler. It supports "units" which is meant to be a separately compiled object, but currently what I do is simply drag in the source file and compile it into the main llvm::Module - that's neither efficient nor flexible (can't use the linker to choose between the "Linux" and "Windows" version of some code, for example - not that I think there is 5% chance that my compiler will work on Windows without modification anyway...)
However, I'm not sure storing the "object" file as LLVM IR would be the right thing to do. I was thinking that a better way would be to store your AST in some serialized form - then
you don't depend on LLVM versions changing the IR format.
You can add whatever metadata you like. There won't be much
difference in generating LLVM-IR from this during your link phase or
building the IR at compile and then reading the IR to figure out if
the metadata is correct. [The slow part, as you may have already found out, is the optimisation and MC generation, and you'd still have to do that either way]
Like I started out, I'm not sure this is an answer, but it's my thoughts so far on the subject. Now I'll go back to adding debug symbol stuff to my Pascal compiler... Before Christmas, I couldn't see the source in GDB. Now I can step, but no viewing of variables yet...

How to hide a C++ function definition before releasing source code

I am building a Windows (Visual Studio, C++ based) console application and would like to release the source code for it. However, I do not want the definition of a particular function be visible to it. Is there a way to pre-compile (just the file containing the definition) it so that no one can view it, but the rest of the source is visible and can be built/ran using the 'pre-compiled' function definition?
You have two basic choices:
have that function in a compiled form
if you only want to ship source code, you could compile a shared library, then find or write a utility to generate C/C++ source code with a character array containing the binary file content, then your program can ifstream::write() to a file and link (e.g. dlopen()) the shared library on the fly at runtime (this doesn't buy you much over just shipping the shared library unless you have pressing reasons for needing a "source-only" release, and the above qualifies for your purposes)
obfuscate the source code in some way that removes the value or insight potentially gained by reading it
there are number of source code obfuscation utilities around (specific recommendations are off-topic for S.O.)
one form of obfuscation is to generate an assembly language version (e.g. g++ -S, cl /S) of the C++ sources, which would be verbose and harder for most programmers to understand and modify further
No you can't. When you release the source code, then the information about the function is there. If you release it in some way 'pre-compiled' and the function is still called then the information is still there. If you require the function to stay a secret you must not release it.

clang libTooling: How to find which header an AST item came out of?

Examples found on the web for clang tools are always run on toy examples, which are usually all really trivial C programs.
I am building a tool which performs source-to-source transformations on C++ code, which is obviously a very, very challenging task, but clang is up to this task.
The issue I am facing now is that the AST that clang generates for any C++ code that utilizes the STL is enormous. For example I have some C++ code for which clang++ -ast-dump ... | wc -l is 67,018 lines of horrifying AST gobbledygook!
99% of this is standard library stuff, which I aim to ignore in my source-to-source metaprogramming task. So, to achieve this I want to simply filter out files. Suppose I want to look at only the class definitions in the headers of the project that I'm analyzing (and ignore all standard library headers's stuff), I will need to just figure out which header each of my CXXRecordDecl's came from!
Can this be done?
Edit: Hopefully this is a way to go about it. Trying this out now... The important bit is that it has to tell me the header that the decls came out of, not the cpp file corresponding to the translation unit.
In my experience so far, the "source" of some given AST node is best retrieved by using Locations. For example every node at least has a start location, and when you print this out it will contain the header file path.
Then it's possible to use this path to decide whether it is a system library or part of your application code that you still are interested in examining.
One route I'm looking at is to narrow matches with things like hasName() (as found here. For example:
recordDecl(hasName("MyBaseClass")) // etc.
However your comment above using -ast-dump is something I tried as well to get a lay of the land on my own CLang tool. I found this post to be extremely helpful. Armed with their suggestion, I used clang-check to filter to a specific class name and fed it my top-level CPP file. The output was a much more manageable few hundred lines representing the class declarations and definitions of interest.

llvm: strategies to build JIT content incrementally

I want my language backend to build functions and types incrementally but don't pollute the main module and context when functions and types fail to build successfully (due to problems with the user input).
I ask an earlier question regarding this.
One strategy i can see for this would be building everything in temp module and LLVMContext, migrating to main context only after success, but i am not sure if that is possible with the current API. For instance, i wouldn't know know to migrate that content between different contexts, as they are supposed to represent isolated islands of LLVM functionality, but maybe there is always the alternative to save everything to .bc and load somewhere else?
what other strategies would you suggest for achieving this?
Assuming you have two modules - source and destination, it's possible to copy a function from source to destination. The code in LLVM you can use as an example is the body of the LLVM linker, in lib/linker/LinkModules.cpp.
In particular, look at the linkFunctionProto and linkFunctionBody methods in that file. linkFunctionBody copies the function definition, and uses the llvm::CloneFunctionInto utility for the heavy lifting.
As for LLVMContext, unless you specifically need to run several LLVM instances simultaneously in different threads, don't worry about it too much and just use getGlobalContext() everywhere a context is required. Read this doc page for more information.