JIT compilation of C++ templates at compile time

Would JIT-compiling a C++ template at compile time be a viable strategy for faster compile times? Is this maybe already done in large compilers like LLVM, and if not, what are the (maybe obvious) downsides making this non-viable?
For clarification, what I mean is taking the C++ template language not as an interpreted system for generating a C++ AST, but as a JIT-compilable language that one passes to e.g. LLVMJit or similar systems, which emit binary blobs that in turn generate the resulting AST of the template application, given the template arguments.
Would this theoretically speed up some compilation times? AFAIK, JIT/interpretation speedup depends heavily on how frequently the code is called, but I can imagine some templates being applied many times.
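For concreteness, here is a hedged illustration (mine, not the question's) of the kind of template "program" at issue: the compiler re-runs its instantiation machinery for every distinct instantiation, and that interpretation overhead is what the question proposes to JIT away.

    // Illustrative only: the compiler "interprets" this template program,
    // instantiating Fib<N> recursively for each distinct N.
    template <unsigned N>
    struct Fib {
        static const unsigned value = Fib<N - 1>::value + Fib<N - 2>::value;
    };
    template <> struct Fib<1> { static const unsigned value = 1; };
    template <> struct Fib<0> { static const unsigned value = 0; };

    // The question: could compiling Fib itself to native code first make
    // evaluating many such instantiations cheaper?
    static_assert(Fib<20>::value == 6765, "evaluated during compilation");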

Would JIT-compiling a C++ template at compile time be a viable strategy for faster compile times?
Certainly yes, but implementing that idea will take you several years of full-time work. Consider writing your PhD thesis on that topic.
In practice, every modern C++ compiler (e.g. GCC or Clang/LLVM) evolved from a simpler C (or old C++) compiler.
Another (perhaps related) research topic is to make a C++ compiler with both JIT-compiling techniques and multi-threading.
Both GCC and Clang/LLVM are (as of late 2021) single-threaded compilers. You might consider adding several pthreads inside them.
The issue is to find a qualified PhD advisor; I am not qualified enough for that role.
A related book is of course Jacques Pitrat's Artificial Beings: The Conscience of a Conscious Machine.
On Linux, a possibility could be to generate GCC plugins tailored to compiling a known set of C++ templates, dlopen-ed at compile time. My old Bismon could be a starting point. Or the RefPerSys project....

Related

Use clang as a library to parse OpenCL code extended with some C++ elements

I am currently working on a source-to-source compiler that transforms code written in an OpenCL superset into "ordinary" OpenCL. I would really like to use clang as a library to parse and analyze the source code. In particular, I need all the available type information, and I would like an AST so I can make use of clang's Rewrite capabilities.
Fortunately, the OpenCL superset that needs to be parsed is really a "mixture" of OpenCL and C++, i.e. the code is basically OpenCL extended with some C++ stuff. Specifically, there may be template annotations before a function definition, and there may be structs containing methods (including operator definitions).
I was hoping that I could use clang to parse this language, since the clang parser is capable of parsing all these constructs. However, I am not sure how (if it is possible at all) to tell clang to parse OpenCL and C++ constructs at once. I would prefer to use clang as a library rather than touch the clang code base. Maybe it is possible to set up an appropriate instance of clang's LangOptions class that tells clang to parse all these constructs?
Any ideas on how to make clang parse this mixture between OpenCL and C++? Any help is appreciated, and thanks in advance!
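For concreteness, the LangOptions idea might look roughly like this (a hedged sketch; OpenCL, CPlusPlus, and OpenCLVersion are fields of clang::LangOptions, but whether the parser honors this combination is exactly what is in question):

    #include "clang/Basic/LangOptions.h"

    // Hypothetical configuration: turn on both language modes at once.
    void configureMixedMode(clang::LangOptions &Opts) {
        Opts.OpenCL = 1;          // accept OpenCL qualifiers (__kernel, __global, ...)
        Opts.CPlusPlus = 1;       // accept C++ constructs (classes, templates, ...)
        Opts.OpenCLVersion = 120; // assumed target: OpenCL C 1.2
    }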
You're trying to mix two different front ends, involving both parsing and name resolution.
I think you are in for a rough trip. The key problem is that you are trying to glue together things on which no effort was expended to make them gluable. This usually leads to integration hell. You don't see people doing this with Fortran and C++, for the same reasons.
To start with, you'll discover you will have to define the semantics of how the C++ extensions interact with those of OpenCL. If you check out the C++ standard, you'll discover 600 pages of results from committee arguments on how C++ interacts with itself. So unless you can define a radically simple interaction, you'll have a tough time knowing what your mixed OpenCL/C++ program means.
Your second problem will be interleaving the Clang parsing machinery for C++ (AFAIK hand written code) with the Clang parsing machinery for OpenCL (don't know anything about it, but assumed it follows the C++ style). There's no obviously good reason to believe you can just pick and choose these to interleave easily. It may work out fine; just not a bet I'd care to make.
The next place you are likely to have trouble is in building an AST for the joint language. Maybe it is the case that Clang has defined AST nodes for both C++ and OpenCL in a way that easily composes to a joint Clang/OpenCL tree. Since the node types are chosen by hand, and there was no specific reason to design them to work together, it is also not obvious they will compose nicely.
Your last task, given a "valid" OpenCL/C++ tree, is to transform it to OpenCL. How, in fact, will you expand a C++ template (or any general C++ code) into OpenCL code?
[Check my bio for another system, DMS, that might be a bit better for this task; it provides uniform infrastructure for multiple languages that would make some of this easier. Somewhat similar to what you are trying to do, we have used DMS to mix C++ with F90 and APL concepts for easy expression of vector operations in a prototype Vector C++, but we did not try to preserve F90 and APL syntax and semantics exactly for all the above reasons].
It isn't my purpose to rain on your parade; progress is made by the unreasonable man. Just be sure you understand how big a task you are taking on.

Compiler memory consumption with template libraries (boost + Eigen)

I am writing a template algorithm that makes use of boost::accumulators and the Eigen linear algebra library.
While compiling, the Visual Studio compiler's (cl.exe) memory consumption peaks at over 2.5 GB of RAM, and my PC (Windows 7 32-bit with 3 GB of virtual address space) becomes unresponsive for quite a long time (~1 minute). The binary files (.obj) are 10-20 MB for these compilation units.
My questions (not directed towards these specific libraries):
Is this normal behavior for code that heavily uses templates?
Is there something that can be done to reduce the memory demands and compile time?
If there is no good solution to the problem, why isn't this addressed by the people that design the programming language? The more people understand C++, the more likely they are to use templates, generating hard-to-compile code and bloated binaries.
If there is no good solution to the problem, why isn't this addressed by the people that design the programming language?
Because there is no good solution, full stop.
The problem you are talking about has nothing to do with C++. It's an artefact inherited from C: the old "translation unit" compilation model. Fixing this problem would require redoing the compilation model. The C++ committee has been trying for years to make this happen without breaking every single line of existing C++ out there (which is a bigger consideration), but it's not a trivial problem. Fixing it would require vast changes.
Also, Clang has much better performance here, and newer, variadic-template-equipped versions of GCC can do well too.
Yes. Compiling templates is memory consuming. Some implementations suck more than others. In my personal experience, out of GCC, MSVC and Clang, the latter is the best at managing its memory use.
You can split your huge source files into several smaller ones. That would even out the load over several compile steps.
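Another standard mitigation worth mentioning here (a minimal sketch of my own, not from the answer above): C++11 explicit instantiation declarations, which stop every translation unit from re-instantiating the same templates.

    // big_templates.h -- illustrative names only.
    #include <vector>

    template <typename T>
    T accumulate_squares(const std::vector<T>& v) {
        T sum{};
        for (const T& x : v) sum += x * x;
        return sum;
    }

    // C++11: tell every including TU *not* to instantiate for double...
    extern template double accumulate_squares<double>(const std::vector<double>&);

    // big_templates.cpp -- ...and pay the instantiation cost exactly once:
    // template double accumulate_squares<double>(const std::vector<double>&);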
The people who designed the programming language cared very little about the implementation, so as to give compiler writers enough freedom to excel and compete. Or suck.

Explanation of CUDA C and C++

Can anyone give me a good explanation as to the nature of CUDA C and C++? As I understand it, CUDA is supposed to be C with NVIDIA's GPU libraries. As of right now CUDA C supports some C++ features but not others.
What is NVIDIA's plan? Are they going to build upon C and add their own libraries (e.g. Thrust vs. STL) that parallel those of C++? Are they eventually going to support all of C++? Is it bad to use C++ headers in a .cu file?
CUDA C is a programming language with C syntax. Conceptually it is quite different from C.
The problem it is trying to solve is coding multiple (similar) instruction streams for multiple processors.
CUDA offers more than Single Instruction Multiple Data (SIMD) vector processing, but it requires data streams to vastly outnumber instruction streams, or there is much less benefit.
CUDA gives some mechanisms to do that, and hides some of the complexity.
CUDA is not optimised for multiple diverse instruction streams like a multi-core x86.
CUDA is not limited to a single instruction stream like x86 vector instructions, or limited to specific data types like x86 vector instructions.
CUDA supports 'loops' which can be executed in parallel. This is its most critical feature. The CUDA system will partition the execution of 'loops', and run the 'loop' body simultaneously across an array of identical processors, while providing some of the illusion of a normal sequential loop (specifically CUDA manages the loop "index"). The developer needs to be aware of the GPU machine structure to write 'loops' effectively, but almost all of the management is handled by the CUDA run-time. The effect is hundreds (or even thousands) of 'loops' complete in the same time as one 'loop'.
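A minimal CUDA C++ sketch of the "parallel loop" described above (illustrative; the kernel and names are mine, not the answer's):

    // Each thread executes one iteration's body; CUDA supplies the "index".
    __global__ void scale(float* data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // the managed loop "index"
        if (i < n)                                      // grid may be larger than n
            data[i] *= factor;
    }
    // Launch enough threads for n elements; hundreds of "iterations"
    // complete in roughly the time of one:
    // scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);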
CUDA supports what looks like if branches. Only processors running code which match the if test can be active, so a subset of processors will be active for each 'branch' of the if test. As an example this if... else if ... else ..., has three branches. Each processor will execute only one branch, and be 're-synched' ready to move on with the rest of the processors when the if is complete. It may be that some of the branch conditions are not matched by any processor. So there is no need to execute that branch (for that example, three branches is the worst case). Then only one or two branches are executed sequentially, completing the whole if more quickly.
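And a hedged sketch of the three-branch if described above, again with invented names:

    __global__ void classify(const float* x, int* label, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (x[i] > 1.0f)      label[i] = 2;  // only the matching subset is active
        else if (x[i] > 0.0f) label[i] = 1;
        else                  label[i] = 0;
        // threads reconverge here, ready to move on together
    }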
There is no 'magic'. The programmer must be aware that the code will be run on a CUDA device, and write code consciously for it.
CUDA does not take old C/C++ code and auto-magically run the computation across an array of processors. CUDA can compile and run ordinary C and much of C++ sequentially, but there is very little (nothing?) to be gained by that because it will run sequentially, and more slowly than a modern CPU. This means the code in some libraries is not (yet) a good match with CUDA capabilities. A CUDA program could operate on multi-kByte bit-vectors simultaneously. CUDA isn't able to auto-magically convert existing sequential C/C++ library code into something which would do that.
CUDA does provide a relatively straightforward way to write code, using familiar C/C++ syntax plus a few extra concepts, and it generates code that will run across an array of processors. It has the potential to give much more than a 10x speedup vs. e.g. a multi-core x86.
Edit - Plans: I do not work for NVIDIA
For the very best performance CUDA wants information at compile time.
So template mechanisms are the most useful, because they give the developer a way to say things at compile time which the CUDA compiler can use. As a simple example, if a matrix is defined (instantiated) at compile time to be 2D and 4 x 8, then the CUDA compiler can work with that to organise the program across the processors. If that size is dynamic, and changes while the program is running, it is much harder for the compiler or run-time system to do a very efficient job.
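A hedged sketch of that point (names invented for illustration): dimensions passed as template parameters are known to the CUDA compiler at instantiation time.

    template <int Rows, int Cols>
    __global__ void addMatrices(const float* a, const float* b, float* out) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < Rows * Cols)          // Rows * Cols is a compile-time constant
            out[i] = a[i] + b[i];
    }
    // addMatrices<4, 8><<<1, 32>>>(d_a, d_b, d_out);  // 4 x 8 fixed at compile time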
EDIT:
CUDA has class and function templates.
I apologise if people read this as saying CUDA does not. I agree I was not clear.
I believe the CUDA GPU-side implementation of templates is not complete w.r.t. C++.
User harrism has commented that my answer is misleading. harrism works for NVIDIA, so I will wait for advice. Hopefully this is already clearer.
The hardest stuff to do efficiently across multiple processors is dynamic branching down many alternate paths because that effectively serialises the code; in the worst case only one processor can execute at a time, which wastes the benefit of a GPU. So virtual functions seem to be very hard to do well.
There are some very smart whole-program-analysis tools which can deduce much more type information than the developer might understand. Existing tools might deduce enough to eliminate virtual functions, and hence move analysis of branching to compile time. There are also techniques for instrumenting program execution which feeds directly back into recompilation of programs which might reach better branching decisions.
AFAIK (modulo feedback) the CUDA compiler is not yet state-of-the-art in these areas.
(IMHO it is worth a few days, for anyone interested who has a CUDA- or OpenCL-capable system, to investigate these tools and do some experiments. I also think, for people interested in these areas, it is well worth the effort to experiment with Haskell, and to have a look at Data Parallel Haskell.)
CUDA is a platform (architecture, programming model, assembly virtual machine, compilation tools, etc.), not just a single programming language. CUDA C is just one of a number of language systems built on this platform (CUDA C, CUDA C++, CUDA Fortran, and PyCUDA are others).
CUDA C++
Currently CUDA C++ supports the subset of C++ described in Appendix D ("C/C++ Language Support") of the CUDA C Programming Guide.
To name a few:
Classes
__device__ member functions (including constructors and destructors)
Inheritance / derived classes
virtual functions
class and function templates
operators and overloading
functor classes
Edit: As of CUDA 7.0, CUDA C++ includes support for most language features of the C++11 standard in __device__ code (code that runs on the GPU), including auto, lambda expressions, range-based for loops, initializer lists, static assert, and more.
Examples and specific limitations are also detailed in the same appendix linked above. As a very mature example of C++ usage with CUDA, I recommend checking out Thrust.
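For instance, a minimal Thrust-style use of the functor-class feature from the list above (a sketch; the saxpy functor is an invented example, not part of the Thrust API):

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>

    struct saxpy {                        // functor class with a __device__ operator()
        float a;
        __host__ __device__ float operator()(float x, float y) const {
            return a * x + y;
        }
    };

    // Usage, given thrust::device_vector<float> x(n), y(n):
    // thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy{2.0f});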
Future Plans
(Disclosure: I work for NVIDIA.)
I can't be explicit about future releases and timing, but I can point out the trend: almost every release of CUDA has added language features to bring CUDA C++ support to its current (in my opinion, very useful) state. We plan to continue this trend in improving support for C++, but naturally we prioritize features that are useful and performant on a massively parallel computational architecture (GPU).
What many do not realize is that CUDA is actually two new programming languages, both derived from C++. One is for writing code that runs on GPUs and is a subset of C++. Its function is similar to HLSL (DirectX) or Cg (OpenGL), but with more features and better compatibility with C++. Various GPGPU/SIMT/performance-related concerns apply to it that I need not mention. The other is the so-called "Runtime API," which is hardly an "API" in the traditional sense. The Runtime API is used to write code that runs on the host CPU. It is a superset of C++ and makes it much easier to link to and launch GPU code. It requires the NVCC pre-compiler, which then calls the platform's C++ compiler. By contrast, the Driver API (and OpenCL) is a pure, standard C library, and is much more verbose to use (while offering few additional features).
Creating a new host-side programming language was a bold move on NVIDIA's part. It makes getting started with CUDA easier and writing code more elegant. The truly brilliant move, though, was not marketing it as a new language.
Sometimes you hear that CUDA would be C and C++, but I don't think it is, for the simple reason that that is impossible. To cite from their programming guide:
For the host code, nvcc supports whatever part of the C++ ISO/IEC 14882:2003 specification the host c++ compiler supports.
For the device code, nvcc supports the features illustrated in Section D.1 with some restrictions described in Section D.2; it does not support run-time type information (RTTI), exception handling, and the C++ Standard Library.
As far as I can see, it only refers to C++, and only supports C where it happens to fall within the intersection of C and C++. So better to think of it as C++ with extensions for the device part, rather than C. That will spare you a lot of headaches if you are used to C.
What is NVIDIA's plan?
I believe the general trend is that CUDA and OpenCL are regarded as too low-level for many applications. Right now, Nvidia is investing heavily in OpenACC, which could roughly be described as OpenMP for GPUs. It follows a declarative approach and tackles the problem of GPU parallelization at a much higher level. So that is my totally subjective impression of what Nvidia's plan is.

Do all C++ compilers generate C code?

Probably a pretty vague and broad question, but do all C++ compilers compile code into C first before compiling them into machine code?
Because C compilers are available on nearly every platform, a lot of (compiled) languages go through this phase in their development to bootstrap the process.
In the early phases of language development, to see whether the language is feasible, the easiest way to get a working compiler out is to build one that converts your language to C, then let the native C compiler build the actual binary.
The trouble with this is that language-specific constructs are lost, and thus potential opportunities for optimization may be missed. So most languages, in phase two, get their own dedicated compiler front end that understands language-specific constructs and can provide optimization strategies based on them.
C++ went through phase 1 and phase 2 over two decades ago. So it is easy to find a compiler front end dedicated to C++ that generates an intermediate format passed directly to a back end. But you can still find versions of C++ that are translated into C (as an intermediate format) before being compiled.
Nope. GCC for example goes from C++ -> assembler. You can see this by using the -S option with g++.
Actually, now that I think about it, I don't think any modern compiler goes to C before ASM.
No. C++ -> C was used only in the earliest phases of C++'s development and evolution. Most C++ compilers today compile directly to assembler or machine code. Borland C++ compiles directly to machine code, for example.
No. This is a myth, based around the fact that a very early version of Stroustrup's work was implemented that way. C++ compilers generate machine code in almost exactly the same way that C compilers do.
As of this writing in 2010, the only C++ compiler that I was aware of that created C code was Comeau*. However, that compiler hasn't been heard from in over 5 years now (2022). There may be one or two more for embedded targets, but it is certainly not a mainstream thing.
* - There's a link to their old website on this WP page. I'd suggest not clicking that unless your computer has all its shots up to date
This is not defined by the standard. Certainly, compiling to C-source is a reasonable way to do it. It only requires the destination platform to have a C-compiler with a reasonable degree of compliance, so it is a highly portable way of doing things.
The downside is speed. Compilation speed, and perhaps also execution speed (due to loads of casts, e.g. for virtual functions, that prevent the compiler from optimising fully), will probably suffer.
Not that long ago there was a company that had a very nice C++ compiler doing exactly that. Unfortunately, I do not remember the name of the company, and a short google did not bring the name back. The owner of the company was an active participant in the ISO C++ committee, and you could test your code directly on the homepage, which also had some quite decent resources about C++.
Edit: one of my fellow posters just reminded me. I was talking about Comeau, of course.

Developing embedded software library, C or C++?

I'm in the process of developing a software library to be used for embedded systems like an ARM chip or a TI DSP (mostly embedded systems, but it would also be nice if it could be used in a PC environment). Obviously this is a pretty broad range of target systems, so being able to easily port to different systems is a priority. The library will be used for interfacing with specific hardware and running some algorithms.
I am thinking C++ is the best option, over C, because it is much easier to maintain and read. I think the additional overhead is worth it for being able to work in the object oriented paradigm. If I was writing for a very specific system, I would work in C but this is not the case.
I'm assuming that these days most compilers for popular embedded systems can handle C++. Is this correct?
Are there any other factors I should consider? Is my line of thinking correct?
If portability is very important for you, especially on an embedded system, then C is certainly a better option than C++. While C++ compilers on embedded platforms are catching up, there's simply no match for the widespread use of C, for which any self-respecting platform has a compliant compiler.
Moreover, I don't think C is inferior to C++ where it comes to interfacing hardware. The amount of abstraction is sufficiently low (i.e. no deep class hierarchies) to make C just as good an option.
There is certainly good support of C++ for ARM. ARM have their own compiler and g++ can also generate EABI compliant ARM code. When it comes to the DSPs, you will have to look at their toolchain to decide what you are going to do. Be aware that the library that comes with a DSP may well not implement the full C or C++ standard library.
C++ is suitable for low-level embedded development and is used in the SymbianOS Kernel. Having said that, you should keep things as simple as possible.
Avoid exceptions, which may demand more library support than is present (therefore use new (std::nothrow) Foo instead of new Foo; see the sketch after this list).
Avoid memory allocations as much as possible and do them as early as possible.
Avoid complex patterns.
Be aware that templates can bloat your code.
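A minimal sketch of the nothrow-new advice above (Foo is a stand-in type of my own):

    #include <new>

    struct Foo { int x; };

    Foo* make_foo() {
        Foo* p = new (std::nothrow) Foo;  // reports failure as a null pointer,
        if (!p)                           // no exception machinery required
            return 0;                     // handle allocation failure explicitly
        return p;
    }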
I have seen many complaints that C++ is "bloated" and inappropriate for embedded systems.
However, in an interview with Stroustrup and Sutter, Bjarne Stroustrup mentioned that he'd seen heavily templated C++ code going into (IIRC) the braking systems of BMWs, as well as in missile guidance systems for fighter aircraft.
What I take away from this is that experts of the language can generate sophisticated, efficient code in C++ that is most certainly suitable for embedded systems. However, a "C With Classes"[1] programmer that does not know the language inside out will generate bloated code that is inappropriate.
The question boils down to, as always: in which language can your team deliver the best product?
[1] I know that sounds somewhat derogatory, but let me say that I know an awful lot of these guys, and they churn out an awful lot of relatively simple code that gets the job done.
C++ compilers for embedded platforms are much closer to '83's C with Classes than to '98's C++ standard, let alone C++0x. For instance, some platforms we use still compile with a special version of gcc made from gcc-2.95!
This means that your library interface will not be able to provide interfaces with containers/iterators, streams, or other such advanced C++ features. You'll have to stick with simple C++ classes that can easily be expressed as a C interface with a pointer to a structure as the first parameter.
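A hedged sketch of that pattern (invented names): a simple C++ class mirrored as a C interface taking a structure pointer as its first parameter.

    // C++ side:
    class Filter {
    public:
        void push(int sample);
        int  average() const;
    private:
        int sum_, count_;
    };

    // Equivalent C-style interface:
    extern "C" {
        typedef struct filter filter_t;   /* opaque handle */
        void filter_push(filter_t* f, int sample);
        int  filter_average(const filter_t* f);
    }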
This also means that within your library, you won't be able to use templates to their full power. If you want portability, you will still be restricted to generic-container uses of templates, which is, I'm sure you'll admit, only a very tiny part of the power of C++ templates.
C++ has little or no overhead compared to C if used properly in an embedded environment. C++ has many advantages for information hiding, OO, etc. If your embedded processor is supported by gcc in C then chances are it will also be supported with C++.
On the PC, C++ isn't a problem at all -- high quality compilers are extremely widespread and almost every C compiler is directly associated with a C++ compiler that's quite good, though there are a few exceptions such as lcc and the newly revived pcc.
Larger embedded systems like those based on the ARM are generally quite similar to desktop systems in terms of tool chain availability. In fact, many of the same tools available for desktop machines can also generate code to run on ARM-based machines (e.g., lots of them use ports of gcc/g++). There's less variety for TI DSPs (and a greater emphasis on quality of generated code than source code features), but there are still at least a couple of respectable C++ compilers available.
If you want to work with smaller embedded systems, the situation changes in a hurry. If you want to be able to target something like a PIC or an AVR, C++ isn't really much of an option. In theory, you could get (for example) Comeau to produce a custom port that generated code you could compile with that target's C compiler -- but chances are pretty good that even if you did, it wouldn't work out very well. These systems are really just too limited (especially in memory size) for C++ to fit them well.
Depending on what your intended use is for the library, I think I'd suggest implementing it first as C - but the design should keep in mind how it would be incorporated into a C++ design. Then implement C++ classes on top of and/or along side of the C implementation (there's no reason this step cannot be done concurrently with the first). If your C design is done with a C++ design in mind, it's likely to be as clean, readable and maintainable as the C++ design would be. This is somewhat more work, but I think you'll end up with a library that's useful in more situations.
While you'll find C++ used more and more on various embedded projects, there are still many that restrict themselves to C (and I'd guess this is more often the case than not) - regardless of whether or not the tools support C++. It would be a shame to have a nice library of routines that you could bring to a new project you're working on, but be unable to use them because C++ isn't being used on that particular project.
In general, it's much easier to use a well-designed C library from C++ than the other way around. I've taken this approach with several sets of code including parsing Intel Hex files, a simple command parser, manipulating synchronization objects, FSM frameworks, etc. I'm planning on doing a simple XML parser at some point.
Here's an entirely different C++-vs-C argument: stable ABIs. If your library exports a C ABI, it can be compiled with any compiler that works on the system, because C ABIs are generally platform standards. If your library exports a C++ ABI, it can only be compiled with a matching compiler -- because C++ ABIs are usually not platform standards, and often differ from compiler to compiler and even version to version.
Interestingly, one of the rare exceptions to this is ARM; there's an ARM C++ ABI specification, and all compliant ARM compilers follow it. This is not true on x86; on x86, you're lucky if a C++ library compiled with a 4.1 version of GCC will link correctly with an application compiled with GCC 4.4, and don't even ask about 3.4.6.
Even if you export a C ABI, you can have problems. If your library uses C++ internally, it will then link to libstdc++ for things in the C++ std:: namespace. If your user compiles a C++ application that uses your library, they'll also link to libstdc++ -- and so the overall application gets linked to libstdc++ twice, and their libstdc++ may not be compatible with your libstdc++, which can (or so I understand) lead to odd errors from the intersection of the two. Considerably less likely, but still possible.
All of these arguments only apply because you're writing a library, and they're not showstoppers. But they are things to be aware of.