Customize Clang to convert C++ code to C

Here is my scenario:
I'm working on an embedded Linux system and have been given a shared library written in C++. It works well, except that libstdc++ is required, which means an extra 1 MB of memory is occupied. I want to convert the shared library to C so that this 1 MB can be saved.
I know how to convert C++ code to C by hand, but it would be really tedious. So I searched for a solution and found a similar question: Use Clang to convert C++ to C code. However, the generated code is not readable. I want maintainable C source code that can replace the original C++ code.
I'm a newbie at Clang. I have learnt that Clang can be used to build tools that process code. My questions are:
Is it possible to achieve my goal using Clang?
If it is, how can I do that? To be more specific, how can I use Clang to remove code blocks wrapped by a macro as a first step?

In practice, converting genuine C++ code (even semi-automatically) to maintainable C code is not realistic.
I want to get maintainable C source code to obsolete the original C++ code.
You certainly won't get maintainable, readable, and portable C code; for instance, as soon as standard containers are used in C++, their template expansion is not readable, and probably not portable to a target with a different word size, alignment, or endianness. At best you could transform LLVM IR into some unportable and unreadable subset of C.
It works well except that libstdc++ is required, which means an extra 1M memory is occupied
Perhaps you could try linking (everything) statically; maybe only a part of libstdc++ is used in your particular application.
BTW, you could get GIMPLE from GCC, and convert that GIMPLE to unreadable C code (perhaps by customizing GCC with a plugin or a GCC MELT extension).
You might also try to compile and link with link time optimization, e.g. with -flto -Os (with recent GCC or Clang).
Don't forget that development effort also has a cost. Is it worth spending a whole year of work (or more) for a team of a few developers to save a few hundred kilobytes? In most cases, upgrading the hardware to something with slightly more memory would cost a lot less. YMMV

Related

How to call an assembly function from C++ dynamically?

REQUIREMENT: For a certain project we have a unique requirement. The application supports an expression language that allows the user to define their own complex expressions that can be evaluated at run time (many hundreds of times a second), and they need to be executed at machine level for performance.
WORKING: Our expression parser translates the script into a corresponding assembly language routine perfectly. We checked it by statically linking the generated object files with our C test program, and they produce the correct result.
Since the client can change the script at any time, our program (at run time) detects the change and calls the parser, which generates the corresponding assembly routine. We then call the assembler from the back end to create the object code.
PROBLEM
How can we call this assembly routine dynamically from the C++ program (Loader)?
We are not supposed to call the C++ compiler to link it with the loader, because the loader would already have other subroutines running and we cannot take the loader down, recompile, and then execute the new loader program.
I tried searching for a solution online but every time the results are littered with .NET assembly dynamic calling. Our app has nothing to do with .NET.
First, the "generated plugin" approach (on Linux; my answer focuses on Linux but could be adapted to Windows with some effort; you could use many-platform frameworks like Qt or POCO or Glib from GTK; then all wrap plugin loading abilities à la dlopen with a common API that you could use on Windows, on Linux, on MacOSX, on Android) :
generate C (or assembly) code in some file /tmp/generated01.c (you might even generate C++ code using standard C++ containers, but its compilation would be significantly slower; beware of name mangling so emit and use extern "C" functions; read the C++ dlopen mini HowTo). See this answer explaining why generating C is worthwhile (and could be better, and more portable, than generating assembler code).
run (using fork+execve+waitpid, or simply system) a compilation of that generated file into a shared object /tmp/generated01.so, e.g. the command gcc -fPIC -Wall -O /tmp/generated01.c -shared -o /tmp/generated01.so; you practically need position-independent code, hence the -fPIC flag. If using dlopen on your generated assembler code, you'll need to improve your assembler generator to emit PIC code.
dlopen that new /tmp/generated01.so (so use the dynamic linker), see dlopen(3); you could even remove the now useless generated C file /tmp/generated01.c
dlsym the relevant symbols to get function pointers to the generated code, see dlsym(3); your application would simply call the generated code using these function pointers.
when you are sure that you don't need any functions from it and that no call frame uses it, you could dlclose that shared object library (but you might accept leaking some address space by not calling dlclose at all). A minimal sketch of these steps is shown just after this list.
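Concretely, the whole cycle looks roughly like this (Linux only; the generated expression, the file names, and the compilation command line are placeholders, not the asker's actual parser output):

// sketch: generate C code, compile it to a plugin, load it, call it (Linux)
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

int main() {
    // 1. emit C code for the user's expression into a temporary file
    std::FILE *src = std::fopen("/tmp/generated01.c", "w");
    if (!src) return 1;
    std::fprintf(src, "double generated_expr(double x) { return 3.0 * x + 1.0; }\n");
    std::fclose(src);

    // 2. compile it into a position-independent shared object
    if (std::system("gcc -fPIC -Wall -O /tmp/generated01.c -shared -o /tmp/generated01.so") != 0)
        return 1;

    // 3. load the freshly built plugin
    void *handle = dlopen("/tmp/generated01.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

    // 4. fetch a pointer to the generated function and call it
    typedef double (*expr_fn)(double);
    expr_fn f = reinterpret_cast<expr_fn>(dlsym(handle, "generated_expr"));
    if (!f) { std::fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }
    std::printf("f(2.0) = %f\n", f(2.0));

    // 5. unload once nothing uses the generated code any more
    dlclose(handle);
    return 0;
}

Build the host program with g++ and (on older glibc) link with -ldl; add -rdynamic if the generated code needs to call back into the host application.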
The above approach is worthwhile and can be used a great many times (my manydl.c demonstrates that you could dlopen a million different shared objects), and is in practice even compatible (even when emitting C code!) with an interactive read-eval-print loop on most current desktops, laptops, and servers, since most of the time the generated /tmp/generated01.c is small enough (e.g. a few hundred lines at most) to be generated and compiled very quickly (by gcc, etc.). I am even using this in MELT for its REPL mode. On Linux this plugin approach generally requires linking the main application with -rdynamic (so that dlopen-ed plugins can reference and call functions from the main application).
Then, other approaches could be to use some Just-In-Time compilation library, like
GNU lightning (which emits slow machine code very quickly - so very short JIT emission time, but the generated code is running slowly since it is very unoptimized)
asmjit; it is x86-64 specific, and enables you to generate individual x86-64 machine instructions
GNU libjit is available for several platforms, and offers an "interpreter" mode for other platforms
LLVM (part of Clang/LLVM compiler, usable as a JIT library)
GCCJIT (a new JIT library front-end to GCC)
Roughly speaking, the first elements of that list are able to emit JIT machine code fairly quickly, but that code won't run as fast as the equivalent generated C code compiled with gcc -fPIC -O1 or -O2 (it would typically run 2x to 5x slower); the last two elements (LLVM & GCCJIT) are compiler-based, so they are able to optimize and emit efficient code, at the expense of slower JIT code emission. All the JIT libraries are able (like dlsym does for plugins) to give function pointers to newly JIT-constructed functions.
Notice that there is a trade-off to be made: some techniques are able to generate machine code quickly, if you accept that the generated code will later run a bit slowly; other techniques (notably GCCJIT or LLVM) spend time optimizing the generated machine code, so they take more time to emit the machine code, but that code will later run quickly. You should not expect both (small generation time, quick execution time), since there is no such thing as a free lunch.
I believe that generating assembler code manually is practically not worthwhile. You won't be able to generate very optimized code (because optimization is a very difficult art, and both GCC and Clang have millions of source lines of code for their optimization passes), unless you spend many years of work on that. Using some JIT library is easier, and "compiling" to C or C++ is also quite easy (you leave the burden of optimization to the C compiler you are calling).
You could also consider rewriting your application into some language with homoiconicity and metaprogramming abilities (e.g. multi-stage programming), such as Common Lisp (and many others, e.g. those providing eval). Its SBCL implementation is always emitting machine code...
You could also embed an interpreter like Lua (perhaps even LuaJIT) or Guile in your application. The main advantage of embedding an existing language is that there are resources (books, modules, ...) and communities of people who know them (designing a good language is difficult!). Also, the embedded interpreter library is well designed and probably well debugged (since it is used a lot), and some of them are fast enough (since they use bytecode techniques).
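For instance, embedding Lua takes only a few lines. A minimal sketch, assuming the Lua development headers are installed and you link against the Lua library (the script string is just a placeholder):

// sketch: run a user-supplied Lua script from a C++ host
#include <lua.hpp>   // C++ header shipped with Lua; wraps lua.h, lauxlib.h, lualib.h in extern "C"
#include <cstdio>

int main() {
    lua_State *L = luaL_newstate();   // create a fresh interpreter state
    luaL_openlibs(L);                 // load the standard Lua libraries
    // in a real application this string would come from the user and simply be
    // run again whenever it changes
    if (luaL_dostring(L, "print(2 + 3 * 4)") != 0)
        std::fprintf(stderr, "lua error: %s\n", lua_tostring(L, -1));
    lua_close(L);
    return 0;
}

Exposing your own C++ functions to the scripts is then a matter of lua_pushcfunction / lua_register calls.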
As the comments already suggest, LoadLibrary (Windows) and dlopen (Linux/POSIX) are by far the easiest solution. These are specifically intended to dynamically load code. Equally important, they both allow unloading as well, and there are functions to then get a function entry point by name.
You can do it dynamically. I will take the Linux case as an example. Since your parser works fine and generates machine code, you should be able to generate a .so (for Linux) or a .dll (for Windows).
Next, load the library:
handle = dlopen(so_file_name, RTLD_LAZY);
Next, get a function pointer:
func = dlsym(handle, "function_name");
Then you should be able to execute it as func().
One thing you may need to experiment with (in case you do not get the desired result) is closing and reopening the .so or .dll file (do this only if required, since it may reduce performance).
It sounds like you can generate the proper byte code. So you could just ensure that you generate position-independent code, write it into an executable piece of memory, and then call it or spawn a thread to run it. The simplest way would be to cast a pointer to the base of the memory holding the code to a function pointer, and then call it.
If you write your bytecode to avoid referencing different sections, and instead reference offsets from its loaded base, 'loading' the code is as easy as writing it to executable memory. You could do a call/pop/jmp to find the base of the code once it begins executing.
Alternatively, and probably the easiest solution, would be to write the code out as a function expecting arguments; that way you could pass the code's base address and any other arguments to it, as you would with any other function, as long as you use the proper typedef for your function pointer and the generated assembly handles the arguments properly. As long as you avoid creating absolute jumps or data references to absolute addresses, you shouldn't have any issues.
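Here is a minimal sketch of that idea for Linux/x86-64; the byte sequence just encodes mov eax, 42 ; ret and stands in for a real generated routine (on hardened systems you may have to map the page writable first and mprotect it to executable afterwards):

// sketch: copy generated machine code into executable memory and call it (Linux, x86-64)
#include <sys/mman.h>
#include <cstring>
#include <cstdio>

int main() {
    const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };  // mov eax, 42 ; ret

    // allocate a page that is both writable and executable
    void *mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    std::memcpy(mem, code, sizeof code);

    // treat the base of the buffer as a function and call it
    typedef int (*fn)();
    std::printf("result = %d\n", reinterpret_cast<fn>(mem)());   // prints 42

    munmap(mem, 4096);
    return 0;
}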
Too late, but I think it may help someone else.
In case you want to dynamically execute a piece of code, you can create an interpreter for it:
compile your expressions into some byte code, then write the interpreter that executes it.
Here is a tutorial about writing interpreters, although in Python:
https://ruslanspivak.com/lsbasi-part1/
You can write yours in C/C++.
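A minimal sketch of such a stack-based interpreter in C++ (the opcode set and the example program are made up purely for illustration):

// sketch: tiny stack-based bytecode interpreter for arithmetic expressions
#include <cstdint>
#include <cstdio>
#include <vector>

enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

double run(const std::vector<std::uint8_t> &code) {
    std::vector<double> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {
        case PUSH:                              // next byte is a small integer constant
            stack.push_back(code[pc++]);
            break;
        case ADD: { double b = stack.back(); stack.pop_back(); stack.back() += b; break; }
        case MUL: { double b = stack.back(); stack.pop_back(); stack.back() *= b; break; }
        case HALT:
            return stack.back();
        }
    }
    return stack.empty() ? 0.0 : stack.back();
}

int main() {
    // bytecode your parser might emit for (2 + 3) * 4
    std::vector<std::uint8_t> program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
    std::printf("%g\n", run(program));          // prints 20
    return 0;
}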

C++ add new code during runtime

I am using C++ (in Xcode and Code::Blocks); I don't know much.
I want to make something compilable during runtime.
for eg:
char prog []={"cout<<"helloworld " ;}
It should compile the contents of prog.
I read a bit about quines , but it didn't help me .
It's sort of possible, but not portably, and not simply.
Basically, you have to write the code out to a file, then compile it to a dll (invoking the compiler with system), and then load the dll. The first is simple, the last isn't too difficult (but will require implementation-specific code), but the middle step can be challenging: obviously, it only works if the compiler is installed on the system, but you have to find where it is installed, verify that it is the same version (or at least a version which generates binary-compatible code), invoke it with the same options that were used when your code was compiled, and process any errors.
C++ wasn't designed for this. (Compiled languages generally aren't.)
The short answer is "no, you can't do that". C and C++ were never designed to do this.
That's pretty much also the long answer to the actual question, but I'll expand a bit on a few ideas.
The code, as compiled by the compiler, is almost certainly not trivial to add things to. There are a few techniques that can be used to "add more code" to a program:
Add a dynamic shared library (DLL), which contains code that has been compiled separately to the existing code. You could of course also have code in your program to output some code, compile this code with the compiler, link it into a dynamic library, and load it in your code.
You could build your own little code generator that generates machine code in a chunk of memory. Note that you probably need to call a "special" memory allocation function, as memory from "normal" allocations is typically not executable; you need to allocate "with execute permission". VirtualAlloc on Windows has such a flag, and mmap on Linux/Unix flavours does too. And of course, you pretty much have to "be a compiler" to achieve this.
You could naturally also invent your own interpreted language, which would allow your program to load in for example a text-file with commands/instructions to be executed, or contain text inside the program for execution with this language.
But like I said to start with, this is not what C and C++ (and most other compiled languages) were meant for, so it's not going to be as simple as "stick some C++ code in a string, and make it run".
It depends why you want to do this.
If it's for efficiency reasons - you know what a function does only at run time, but it has to be very efficient - then what was already suggested (writing to a file, compiling to a dll / so and dynamically loading it) is your best option.
BUT if the reason you want this is to allow for user-supplied behaviour, say a general function you read from a database (behaviour of a unit in-game? value of a field in a plot?) - or more generally you just want to change / augment behaviour at runtime with little concern for efficiency - I recommend using an outside scripting language like Lua, which easily interacts with your compiled C++ code.
The C and C++ languages compile to binary machine code, unlike Java and C# which generate instructions for a 'virtual machine' or interpreted scripting languages such as JavaScript. The compilation of C++ is performed by a separate executable, the compiler, which is not incorporated into the resulting executable.
So the language does not have any built in "eval" capability to translate further code once compilation is finished.
It's not uncommon for new C/C++ programmers to think they need to do this, but they typically don't. Perhaps you could expand further on what you're actually looking to do.
But if you do actually need to be able to do this, your options are:
Write code to compile a new executable with the new code and then run the resulting program (a rough sketch of this is shown after this list),
Write a simple parser and "virtual machine" of your own,
Look at incorporating an embedded scripting/interpreted language such as Lua,
Try and wrap your head around integrating CINT,
See also: Scripting language for C++
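The first option in that list amounts to something like the following rough sketch (POSIX-ish; the file names and the g++ invocation are placeholder assumptions):

// sketch: write a new program out, compile it, run it
#include <cstdlib>
#include <fstream>

int main() {
    {
        std::ofstream src("/tmp/snippet.cpp");
        src << "#include <iostream>\n"
               "int main() { std::cout << \"helloworld\\n\"; }\n";
    }   // close the file before compiling it
    if (std::system("g++ /tmp/snippet.cpp -o /tmp/snippet") != 0)
        return 1;                               // compilation failed
    return std::system("/tmp/snippet") == 0 ? 0 : 1;   // run the freshly built program
}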

Get optimized source code from GCC

I have a task to create optimized C++ source code and give it to a friend for compilation. That means I do not control the final compilation; I just write the source code of the C++ program.
I know that I can enable optimization during compilation with the -O1 (and -O2 and other) options of GCC. But how can I get this optimized source code instead of a compiled program? I am not able to configure the parameters of my friend's compiler, which is why I need to produce good source on my side.
The optimizations performed by GCC are low level; that means you won't get C++ code back, but assembly code at best, and you won't be able to convert that back into source.
In sum: optimize at the source-code level, not at the object level.
You could ask GCC to dump its internal (Gimple, ...) representations, at various "stages". The middle-end of GCC is made of hundreds of passes, and you could ask GCC to dump them, with arguments like -fdump-tree-all or -fdump-gimple-all; beware that you can get hundreds of dump files for a single compilation!
However, GCC internal representations are quite low level, and you should not expect to understand them without reading a lot of material.
The dump options I am mentioning are mostly useful to those working inside GCC, or extending it through plugins coded in C or extensions coded in MELT (a high-level domain-specific language for extending GCC). I am not sure they will be very useful to your friend. However, they can be useful for making you understand that optimization passes do a lot of complex processing.
And don't forget that premature optimization is evil: you should first make your program run correctly, then benchmark and profile it, and only then optimize the few parts worth your effort. You probably won't be able to write correct and efficient programs without testing and running them yourself before giving them to your friend.
Easy - choose the best algorithm possible, let the rest be handled by the optimizer.
Optimizing the source code is different from optimizing the binary. You optimize the source code; the compiler optimizes the binary.
For anything more than algorithm choice, you'll need to do some profiling. Sure, there are practices that can speed up code, but some make it less readable. Only optimize when you have to, and after you measure.

Are there conclusive studies/experiments on C compilation using a C++ compiler?

I've seen a lot of arguments over the general performance of C code compiled with a C++ compiler -- I'm curious as to whether there are any solid experimental studies buried beneath all the anecdotal flame wars you find in web searches. I'm particularly interested in the GCC suite, but any data points would be interesting. (Comparing the assembly of "Hello, World!" is not as robust as I'd like. :-)
I'm generally assuming you use the "embedded style" flags -- no exceptions or RTTI. I also wouldn't mind knowing if there are studies on the compilation time itself. TIA!
Adding a datapoint (or at least an anecdote):
We were recently writing a math library for a small embedded-like target, and started writing it in C. About halfway through the project, we switched some of the files to C++, largely in order to use templates for some of the functions where we'd otherwise be writing many nearly-identical pieces of code (or else embedding 40-line functions in preprocessor macros).
At the point where we started switching over, we had a very careful look at the generated assembly code (using GCC) on a number of the functions, and confirmed that it was in fact essentially identical whether the file was compiled as C or C++ -- where by "essentially identical" I mean the differences were in things like symbol names and the stuff at the beginning and end of the assembly file; the actual instructions in the middle of the functions were exactly identical.
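To give a feel for the kind of switch described above, here is a hypothetical example (not from that library) of one template replacing a family of nearly-identical C functions:

// hypothetical example: one template instead of clamp_i8/clamp_i16/clamp_float/... written by hand in C
template <typename T>
static inline T clamp(T value, T lo, T hi) {
    return value < lo ? lo : (hi < value ? hi : value);
}

// each instantiation compiles to the same machine code a hand-written C function
// for that type would, which matches the "essentially identical assembly" observation above
int   clamp_int(int v)    { return clamp(v, 0, 255); }
float clamp_unit(float v) { return clamp(v, 0.0f, 1.0f); }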
Sorry that I don't have a more solid answer.
Edit to add, 2013-03-24: Recently I came across an article where Rusty Russell compared the performance of GCC compiled with a C compiler and with a C++ compiler, in response to the recent switch to compiling GCC as C++: http://rusty.ozlabs.org/?p=330. The conclusions are interesting: the version compiled with a C++ compiler was very slightly slower; the difference was about 0.3%. However, that was entirely explained by load-time differences caused by larger debug info; when he stripped the binaries and removed the debug info, the difference was less than 0.1%, i.e., essentially indistinguishable from measurement noise.
I don't know of any studies off-hand, but given the C++ philosophy that you don't pay the price for features you don't use, I doubt there'd be any significant difference between compiling C code with the C compiler and with the C++ compiler.
I don't know of any studies and I doubt that anyone will spend the time to do them. Basically, when compiled with a C++ compiler, the code has the same semantics as when compiled with a C compiler, so it comes down to optimization and code generation. But IMO these are much too compiler-specific to allow any general statements about C vs. C++.
What you mainly gain when you compile C code with a C++ compiler is much stricter checking (function declarations etc.). IMO this makes compiling C code with a C++ compiler quite attractive. But note that, if you have a large C code base that has never been run through a C++ compiler, you're likely facing a very steep uphill battle until the code compiles cleanly enough to be able to see any meaningful warnings.
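As a small illustration of that stricter checking, the following is accepted by many C compilers (with warnings at most, under older C standards) but rejected outright by a C++ compiler; the example is mine, not from any particular code base:

/* compiles as (old-style) C, fails as C++ */
#include <stdlib.h>

int main(void) {
    int *p = malloc(10 * sizeof *p);  /* C: implicit void* -> int* conversion; C++: error without a cast */
    double d = sqrt(2.0);             /* <math.h> missing: C89 implicitly declares sqrt; C++: hard error */
    free(p);
    return d > 1.0 ? 0 : 1;
}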
The GCC project is currently in transition from C to C++ - that is, GCC may be implemented in C++ in the future; it is currently written in C. The next release of GCC will be written in the subset of C which is also valid C++.
Some performance tests were performed on g++ vs gcc, on GCC's codebase. They compared the "bootstrap" time, which means compiling GCC with the system compiler, then compiling it again with the resulting compiler, then repeating and checking that the results are the same.
Summary: using g++ was 20% slower. The compiler versions were slightly different, but it was thought that this wouldn't account for the 20% difference.
Note that this measures different programs, gcc vs g++, which although they mostly use the same code, have different front-ends.
I've not tried it from a performance standpoint, but I think compiling C applications with the C++ compiler is a good idea, as it will prevent you from doing "naughty" things such as using functions not declared.
However, the output won't be the same - at the very least, you'll get different symbols, which will render it (mostly) unlinkable with code from the C compiler.
So I think what you really mean is "Is it ok from a performance standpoint, to write C++ code which is very C-like and compile it with the C++ compiler" ?
You would also have to avoid some C99 things, such as _Bool / <stdbool.h>, which C++ handles differently since it has its own bool type.
Don't do it if the code has not been designed for it. The same valid language constructs can lead to different behavior if interpreted as C or as C++. You would potentially introduce very difficult-to-understand bugs. Less problematic, but still a maintainability nightmare: some C constructs (especially from C99) are not valid in C++.
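A classic illustration of "same construct, different behavior" (not a bug in either language, just different rules):

#include <stdio.h>

int main(void) {
    /* In C a character constant has type int; in C++ it has type char.
       On a typical platform this prints 4 when compiled as C and 1 as C++. */
    printf("%zu\n", sizeof('a'));
    return 0;
}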
In the past I have done things like look at the size of the binary, which for C++ was huge; that doesn't necessarily mean it simply linked in a bunch of unused libraries. The easiest might be to use gcc -S myprog.cpp vs gcc -S myprog.c and diff the assembler output.

How to convert C++ Code to C [closed]

I have some C++ code. In the code there are many classes defined, along with their member functions, constructors and destructors, a few template classes, and lots of other C++ stuff. Now I need to convert the source to plain C code.
I have the following questions:
Is there any tool to convert C++ code and header files to C code?
Will I have to do a total rewrite of the code (will I have to remove the constructors and destructors and move that code into some init() and deinit() functions; change classes to structures; make the existing member functions function pointers in those newly defined structures; and then invoke those functions through the function pointers, etc.)?
If I have to convert it manually myself, what C++-specific code/data constructs/semantics do I need to pay attention to while doing the conversion from C++ to C?
There is indeed such a tool: Comeau's C++ compiler. It will generate C code which you can't maintain manually, but that's no problem. You'll maintain the C++ code, and just convert to C on the fly.
http://llvm.org/docs/FAQ.html#translatecxx
It handles some code, but will fail for more complex implementations as it hasn't been fully updated for some of the modern C++ conventions. So try compiling your code frequently until you get a feel for what's allowed.
Usage syntax from the command line is as follows for version 9.0.1:
clang -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
clang -march=c CPPtoC.bc -o CPPtoC.c
For older versions (unsure of transition version), use the following syntax:
llvm-g++ -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
llc -march=c CPPtoC.bc -o CPPtoC.c
Note that it creates a GNU flavor of C and not true ANSI C. You will want to test that this is useful for you before you invest too heavily in your code. For example, some embedded systems only accept ANSI C.
Also note that it generates functional but fairly unreadable code. I recommend commenting and maintain your C++ code and not worrying about the final C code.
EDIT: although official support for this functionality was removed, users can still use the unofficial support from the Julia language devs to achieve the functionality mentioned above.
While you can do OO in C (e.g. by adding a theType *this first parameter to methods, and manually handling something like vtables for polymorphism) this is never particularly satisfactory as a design, and will look ugly (even with some pre-processor hacks).
I would suggest at least looking at a re-design to compare how this would work out.
Overall a lot depends on the answer to the key question: if you have working C++ code, why do you want C instead?
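To give a feel for what that hand conversion looks like, here is a made-up Shape/Circle example (not the asker's code) showing one constructor and one virtual call done by hand in C:

/* sketch: a C++ class with one virtual function, translated by hand to C */
#include <stdio.h>

struct Shape;                                    /* the "class" */
struct ShapeVtable {                             /* hand-rolled vtable */
    double (*area)(const struct Shape *self);
};
struct Shape {
    const struct ShapeVtable *vtable;
};

struct Circle {                                  /* a "derived class" */
    struct Shape base;                           /* base subobject must come first */
    double radius;
};

static double circle_area(const struct Shape *self) {
    const struct Circle *c = (const struct Circle *)self;
    return 3.14159265358979 * c->radius * c->radius;
}
static const struct ShapeVtable circle_vtable = { circle_area };

static void circle_init(struct Circle *c, double r) {   /* the "constructor" */
    c->base.vtable = &circle_vtable;
    c->radius = r;
}

int main(void) {
    struct Circle c;
    circle_init(&c, 2.0);
    /* the virtual call, routed through the vtable by hand */
    printf("%f\n", c.base.vtable->area(&c.base));
    return 0;
}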
Maybe good ol' cfront will do?
A compiler consists of two major blocks: the 'front end' and the 'back end'.
The front end of a compiler analyzes the source code and builds some form of an 'intermediary representation' of said source code, which is much easier for a machine algorithm to analyze than the source code itself (i.e. whereas the source code, e.g. C++, is designed to help the human programmer write code, the intermediary form is designed to make the algorithm that analyzes it simpler).
The back end of a compiler takes the intermediary form and then converts it to a 'target language'.
Now, the target language for general-use compilers are assembler languages for various processors, but there's nothing to prohibit a compiler back end to produce code in some other language, for as long as said target language is (at least) as flexible as a general CPU assembler.
Now, as you can probably imagine, C is definitely as flexible as a CPU's assembler, so a C++ to C compiler is really no problem to implement from a technical point of view.
So you have: C++ ---frontEnd---> someIntermediaryForm ---backEnd---> C
You may want to check these guys out: http://www.edg.com/index.php?location=c_frontend
(the above link is just informative for what can be done, they license their front ends for tens of thousands of dollars)
PS
As far as I know, there is no such C++-to-C compiler from GNU, and this totally beats me (if I'm right about this). Because the C language is fairly small and its internal mechanisms are fairly rudimentary, a C compiler requires something like one man-year of work (I can tell you this first hand because I wrote such a compiler myself many years ago, and it produces a [virtual] stack machine intermediary code), and being able to have a maintained, up-to-date C++ compiler while only having to write a C compiler once would be a great thing to have...
This is an old thread but apparently the C++ Faq has a section (Archived 2013 version) on this. This apparently will be updated if the author is contacted so this will probably be more up to date in the long run, but here is the current version:
Depends on what you mean. If you mean, Is it possible to convert C++ to readable and maintainable C code? then sorry, the answer is No — C++ features don't directly map to C, plus the generated C code is not intended for humans to follow. If instead you mean, Are there compilers which convert C++ to C for the purpose of compiling onto a platform that doesn't yet have a C++ compiler? then you're in luck — keep reading.
A compiler which compiles C++ to C does full syntax and semantic checking on the program, and just happens to use C code as a way of generating object code. Such a compiler is not merely some kind of fancy macro processor. (And please don't email me claiming these are preprocessors — they are not — they are full compilers.) It is possible to implement all of the features of ISO Standard C++ by translation to C, and except for exception handling, it typically results in object code with efficiency comparable to that of the code generated by a conventional C++ compiler.
Here are some products that perform compilation to C:
Comeau Computing offers a compiler based on Edison Design Group's front end that outputs C code.
LLVM is a downloadable compiler that emits C code. See also here and here. Here is an example of C++ to C conversion via LLVM.
Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T, generates C code. However it has two problems: it's been difficult to obtain a license since the mid 90s when it started going through a maze of ownership changes, and development ceased at that same time and so it doesn't get bug fixes and doesn't support any of the newer language features (e.g., exceptions, namespaces, RTTI, member templates).
Contrary to popular myth, as of this writing there is no version of g++ that translates C++ to C. Such a thing seems to be doable, but I am not aware that anyone has actually done it (yet).
Note that you typically need to specify the target platform's CPU, OS and C compiler so that the generated C code will be specifically targeted for this platform. This means: (a) you probably can't take the C code generated for platform X and compile it on platform Y; and (b) it'll be difficult to do the translation yourself — it'll probably be a lot cheaper/safer with one of these tools.
One more time: do not email me saying these are just preprocessors — they are not — they are compilers.