Writing a new JIT - C++

I'm interested in starting my own JIT project in C++. I'm not all that unfamiliar with assembly, compiler design, etc. But I am very unfamiliar with the resulting machine-code format: like, what does a mov instruction actually look like when all is said and done and it's time to call that function pointer? So, what are the best resources for creating such a thing?
Edit: Right now, I'm only interested in x86 on Windows, stretching a tiny bit to 64-bit Windows in the future.

You want to have a look at the processor manuals for the architecture you are interested in. Those manuals describe the opcode encoding. For x86 processors, the manuals can be downloaded from this page.
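To make that concrete, here is a minimal sketch of the whole round trip you describe (my own example, assuming Windows with MSVC or MinGW per your edit; error handling omitted). It emits the raw bytes for "mov eax, 42 / ret" into executable memory and calls them through a function pointer. B8 is the opcode for "mov eax, imm32", the next four bytes are the little-endian immediate, and C3 is "ret"; this is exactly the encoding the manuals document.

    #include <windows.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00,   // mov eax, 42
                                 0xC3 };                          // ret
        // Ask the OS for memory we are allowed to execute.
        void* mem = VirtualAlloc(nullptr, sizeof(code),
                                 MEM_COMMIT | MEM_RESERVE,
                                 PAGE_EXECUTE_READWRITE);
        std::memcpy(mem, code, sizeof(code));
        int (*fn)() = reinterpret_cast<int (*)()>(mem);
        std::printf("%d\n", fn());        // prints 42
        VirtualFree(mem, 0, MEM_RELEASE);
        return 0;
    }

The same bytes work on 32-bit and 64-bit Windows, since the integer return register is EAX/RAX in both.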

Starting your project on top of LLVM might shield you from the platform details.
http://llvm.org/
LLVM is used by several dynamic language JIT compilers.

GNU lightning is a multi-architecture (x86, SPARC, PPC) library for generating code within another program. You'll need to understand general assembly-language concepts, but not at a very deep level, and you won't have to write anything architecture-specific at all. The downside to lightning (at least the last time I used it) is that the interface presented is the intersection of the features available on the supported targets: the small register set of x86, a RISC instruction set like SPARC's, and so on. The single-pass code generation is easy to use but has its own quirks: you can't relocate your output buffer (because of address references), so if you run out of space you generally have to start over. The good thing is that you will probably get a working example going very quickly.

Older versions of NASM come with a fairly concise opcode reference that has x86 instruction encodings. (Looks like there's no 64-bit info, though.) I found this one using Google:
http://alien.dowling.edu/~rohit/nasmdocb.html
The official manuals say basically the same thing (and a lot more besides), but not quite so conveniently.
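As a taste of what those references document, here is one encoding decoded by hand (my own worked example, 32-bit mode, not taken from the NASM doc):

    // "mov eax, ebx" assembles to the two bytes 89 D8:
    //   89   opcode: MOV r/m32, r32 (destination lives in ModRM.r/m)
    //   D8   ModRM byte = binary 11 011 000
    //          mod = 11  -> register-direct addressing
    //          reg = 011 -> EBX (the source)
    //          r/m = 000 -> EAX (the destination)
    unsigned char mov_eax_ebx[] = { 0x89, 0xD8 };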

Related

Microarchitectural profiling of C++ and assembly code on MIPS

As part of a course project, I need to analyze a piece of C++ code for performance and find out which parts of the computer architecture (MIPS or x86) are most heavily utilized while running the code and are possible performance bottlenecks. I am looking at various profilers for analyzing the performance and came across SimpleScalar, which is a great tool but sadly only works with C code.
Since I am more familiar with the MIPS architecture, it would be great if there were a tool like SimpleScalar for simulating and profiling C++ code on MIPS. I am looking at the performance-critical parts like branching, cache, instruction set, addressing modes, etc. If not, mention of any tool which can do a similar kind of analysis for x86 architectures would be great as well.
(Just to clarify, I'm not looking for any old profiler, but for one that understands the CPU microarchitecture and knows what parts of the CPU are taken advantage of or underused.)
CACTI does detailed low-level simulation of caches.
SESC is a cycle-accurate computer architecture simulator that supports MIPS.
SESC includes CACTI.
I doubt that what you want is possible. C++ is the language, but it still needs to be compiled for the target architecture. The optimisations (or the lack of them) will determine a lot of your performance criteria like cache use, etc. So I guess you need to look for machine-level profilers (hopefully ones that support your compiler's debug format, so you can see source-code context).
My understanding is that SimpleScalar can simulate and profile MIPS machine code, no matter what language it was originally compiled from.
(The source-level debugger "DLite!" that comes with SimpleScalar may only support a few languages, but it sounds like you don't need to "debug" your code.)

OpenHMPP in GCC

The gist of the question is:
Do you know any projects that aim to bring OpenHMPP support to GCC? I could also possibly live with affordable commercial compilers, but it's very unlikely, because I prefer Linux, and I would like the compiler to support non-x86 architectures as well.
And the background story:
I know OpenCL and CUDA people will bash me, but here goes my experience/opinion: I've been pursuing some toy projects to get into many-core processing using CUDA and OpenCL. I feel that it's such a mess to set up those development environments (especially under Linux, and especially if you have the slightest bit of irregularity in your system). Even when you set them up, it's still a mess to run them anywhere other than your development environment. Finally (and probably most importantly), these languages are very verbose and tiresome. I feel like they're the assembler of many-core processing. Compare them to OpenMP, and you see how they could actually be.
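To show what I mean by that comparison: this one pragma (a toy sketch of mine, using plain OpenMP; OpenHMPP's directives look similar but target accelerator devices) is the entire setup a directive-based model asks for, where CUDA or OpenCL would need pages of context/buffer/kernel boilerplate for the same loop.

    #include <cstddef>

    // One directive parallelizes the loop; the serial code is untouched.
    void saxpy(long n, float a, const float* x, float* y) {
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }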
At this point, OpenHMPP comes onto the scene. It uses #pragma statements like OpenMP and seems to be a very good step in the right direction. However, it's very hard to find compilers for it. CAPS entreprise and PathScale do have OpenHMPP support, but they're very expensive (€4000 for CAPS; I couldn't find the price for PathScale). And correct me if I'm wrong, but CAPS seems to support C, not C++.
So, we return to the gist. It would be like a dream to have OpenHMPP support in GCC. Do you know of any open-source projects or any affordable alternatives? Or maybe you know of alternatives to OpenHMPP that are easier to find support for?
If I understand you correctly, you are looking for ways to simplify access to accelerator devices, which may be GPUs as well as multi-core CPUs.
This is a field with a lot of academic work happening right now, resulting in many publications describing such frameworks; however, only a few are actually available.
In fact, the reasons you state are the basis of my research, which is also far from complete or in a state usable by anyone else...
The only thing I know of that comes close to what you seek (using #pragmas to access accelerators) would be MGP from the Virtual OpenCL package.
All other solutions are more intrusive, requiring the use of their APIs.
I have not yet had a closer look at C++ AMP, but it might be interesting if it picks up some pace.

Creating a VHDL backend for LLVM?

LLVM is very modular and allows you to fairly easily define new backends. However most of the documentation/tutorials on creating an LLVM backend focus on adding a new processor instruction set and registers. I'm wondering what it would take to create a VHDL backend for LLVM? Are there examples of using LLVM to go from one higher level language to another?
Just to clarify: are there examples of translating LLVM IR to a higher level language instead of to an assembly language? For example: you could read in C with Clang, use LLVM to do some optimization and then write out code in another language like Java or maybe Fortran.
Yes!
There are many LLVM back-ends targeting VHDL/Verilog around:
(open source) LegUp (paper)
(commercial) Xilinx HLS
(online) C-to-Verilog
And I know there are many others...
The interesting thing about low-level representations like LLVM IR or GCC's GIMPLE (GCC's even lower-level IR is called RTL, by the way) is that they expose static single assignment (SSA) form: this can be translated to hardware quite directly, as SSA can be seen as a tree of multiplexers...
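To illustrate (my own sketch): a C++ conditional becomes a single SSA value, which in hardware is exactly a 2:1 multiplexer.

    int pick(bool c, int a, int b) {
        return c ? a : b;
        // LLVM IR (simplified):  %r = select i1 %c, i32 %a, i32 %b
        // VHDL equivalent:       r <= a when c = '1' else b;
    }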
There's nothing really special about the LLVM IR. It's a standard DAG with variable arity. Decompiling LLVM IR is a lot like decompiling machine language.
You might be able to leverage some frontend optimizations such as constant folding, but that sounds pretty minor compared to the whole task.
My only experience with LLVM was writing a binary translator for a class project, from a toy CISC to a custom RISC.
I'd say, since it's the closest thing to a standard IR (well, GCC GIMPLE is a close second), see if it fits with your algorithms and style and evaluate it as one alternative.
Note that GCC also started out prioritizing portability above all, and has also accomplished a lot.
I'm not sure I follow how the parts of your question relate to one another.
Targeting LLVM to a high-level language like C is very possible, and you seem to have found one reference point.
VHDL is a whole other business, however. Do you consider VHDL a high-level language? It may be, but one for describing hardware/logic. Sure, VHDL has some constructs you can employ to actually program in it, but that's hardly a fruitful endeavor. VHDL describes hardware, and that makes translating LLVM IR into it a very hard problem, unless of course you design a CPU with a custom instruction set in VHDL and translate LLVM IR into your instructions.
This thread was one of the first things I found while looking for the same thing.
I found a project that's rather far along and that cleanly builds under LLVM 3.5. It's pretty darn cool. It spits out HDL and does various other cool FPGA-related things. While it's designed to work with TTAs and generate images for FPGAs (or simulate them), it can probably also be made to do some trivial HDL generation from C functions.
It was perfect for my purposes because I wanted to upload to an Altera FPGA, and the fpga_stdout example even spits out Quartus build scripts and project files.
TTA-Based Co-design Environment
I also tried the things listed in the accepted answer and a couple of others, and found that they weren't going to work for me or weren't very high quality (usually both). TCE feels professional, but is purely academic, I believe. Very nice all the way around.
It seems the question was partially answered, so I'd like to give it a shot:
1. What would it take to create a VHDL backend for LLVM?
2. What would it take to translate LLVM IR to a higher-level language (presumably with the intention of converting between high-level languages)?
I will give you some background on #2 and expand on #1 at a later date.
If you want to convert LLVM IR to a high-level language such as C or Java:
You would have to take the LLVM instructions and abstract them into equivalent C code. Then you would need to take the remaining features that LLVM has no equivalent for (like classes and other C++ abstractions), write a routine that finds those patterns in the LLVM IR (like reused blocks), and emit C. For the basic stuff, it's pretty straightforward. But just follow that train of thought and you quickly realize the true difficulty of the problem; after all, not everyone writes simple C. To compound the difficulty further, you may not get the same LLVM IR back when compiling the generated C! (Consider the resulting feedback loop.)
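To show what "pretty straightforward" means for the basic stuff, here is a hand-made sketch of one function and the IR it corresponds to (IR simplified in the comments; names are illustrative):

    // Each simple LLVM instruction has an obvious C spelling:
    int positive_sum(int a, int b) {
        int sum = a + b;      // %sum = add i32 %a, %b
        if (sum > 0)          // %cmp = icmp sgt i32 %sum, 0
            return sum;       //        br i1 %cmp, label %yes, label %no
        return 0;
    }
    // The hard part is the reverse of the branches: recovering structured
    // if/while from a soup of labels and gotos.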
As for Java, you are in for an even harder battle going directly from LLVM IR, and in either case you still have the problem that you likely won't get the same code when compiling back to LLVM IR, if that is even possible. Rather, you would translate LLVM IR to JVM bytecode, and then use a reverse compiler to get your Java.
A group of Chinese students was apparently able to do this, but they wondered why there was so little interest in their research. I would say it's because they don't fully understand just what the LLVM guys have done and how it is better than the JVM. (In fact, LLVM arguably makes the JVM obsolete. ;)
Even though this seems useful, in that one could use LLVM as an intermediary to convert bidirectionally between C and Java, this solution is actually of little use, because we are asking the wrong question. See, the entire reason you would want that for practical purposes is to have a common code base and increase performance.
But the real problem is that we need a language that abstracts the common features of modern languages and gives you a central language that you can build from. http://julialang.org/ has answered the question 😉
Looks like the best place to start is with the CBackend in the LLVM source:
llvm/lib/Target/CBackend/CBackend.cpp
tl;dr: I don't think LLVM is the right tool.
What you are looking for is a way to translate LLVM code to a higher-level language; that's what Emscripten does for JavaScript.
But it looks like you're missing the point of LLVM a bit: it's meant to generate static code, and to achieve that it uses a specific intermediate representation built for that purpose.
As you can see, the way Emscripten works is by implementing a stack, but not using JavaScript the way a human would have written it.
There are several projects that try to achieve what your original question asked for, like MyHDL, which turns Python into VHDL or Verilog.

Why does the chip control the language choice?

I've asked before what language I should learn for embedded development. Most embedded engineers said C and C++ are a must, but they also pointed out that it depends on the chip.
Can someone clarify? Is it a compiler issue or what? Do chips come with their own specific compilers (like a C compiler or C++ compiler), and is that why you have to use the language the compiler knows? Is it not possible to code and compile it elsewhere, then burn it to the chip directly in its compiled state? (I think I heard an acquaintance say something to this effect.)
I'm not sure how this works, as clearly I don't know much about embedded systems or how they work. It's probably an easy answer for those of you who know.
Probably they meant that some toolchains do not support C++. Yes, many chips and boards do come with their own toolchains. Different processors have different instruction sets, which means a different compiler (or, more specifically, a different backend). That doesn't mean you always have to relearn everything; many of these are based on GCC (often considered the most-ported compiler). The final executable/image formats also vary, so you need a specific linker. Most likely, you will be (cross-)compiling for the chip on a "regular" computer, then burning the result to the chip. However, that doesn't mean you can use a typical compiler and linker targeted at a desktop operating system.
It "depends on the chip" in three possible ways:
Some very constrained architectures are not suited to C++, or at least C++ provides constructs not suited to such architectures and so offers no benefit over C. Most 8-bit devices fall into this category, but by no means all; I have seen useful C++ code implemented on MegaAVR, for example.
Some devices are not supported by a C++ compiler. For example Microchip's dsPIC/PIC24 compiler is C only (third-party tools may have C++ support).
The chip architecture may be designed specifically for a particular language; for example, the INMOS Transputer invariably ran occam.
As well as C and C++, other possibilities are assembler, Forth, Ada, Pascal, and many others, but C is almost ubiquitous; few chip vendors will release a new architecture or device without a C compiler available from day one. For other languages you will generally have to wait until a third party decides to develop one, and that wait may be forever for a niche architecture.
Is it not possible to code and compile it elsewhere, then burn it to the chip directly in its compiled state?
That is called cross-compilation or cross-development, and it is the usual development method for embedded systems. Most embedded systems lack the OS, file system, performance, and memory resources to self-host a compiler, and most developers want the comfort of a sophisticated development environment with IDEs, debuggers, etc. in a familiar user-oriented desktop OS.
"I'm not sure how this works, as clearly I don't know much about embedded systems or how they work."
Get up to speed with some of these:
http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf
http://www.eetimes.com/design/embedded
http://www.amazon.com/exec/obidos/ASIN/020179523X
http://www.amazon.com/Embedded-Systems-Firmware-Demystified-CD-ROM/dp/1578200997
Yes, there are many architectures for which a C compiler exists but a C++ compiler does not. The smaller and less fully-featured a processor you choose, the more likely this situation is to occur.
For embedded development, you almost always compile the code 'elsewhere', as you say, and then send it to the chip for execution/debugging. The process of compiling code for a different architecture than the compiler itself is built for is called 'cross-compiling'.
You are correct: chips have variations on compilers. Most modern chips have a GCC port, but not all.
The term 'embedded' is used to describe a vast range of hardware. Most embedded software engineering consists of writing C/C++ code to produce a binary for a target microprocessor, but there are devices you may work with that are not programmed with a compiled binary.
One example is a Programmable Logic Controller (PLC). These devices use a language called "Ladder Logic". It's a wonderful language. I have enjoyed working with it in the past.
Another thing you may encounter, as I have in the past, is devices that have interpreted BASIC emulators. Hopefully that is rare today.
C/C++ is a very good choice for firmware development, so the software you make will run on an embedded CPU/microcontroller. In order to program the device properly, you will need to know the language and the device architecture.
The same code probably will not work on different devices, so you have to learn the language and each device's architecture.
Another option is FPGAs, which are not microcontrollers. FPGAs are devices with specialized cells capable of transforming themselves into any type of synchronous circuit, including a microcontroller. FPGAs are programmed with hardware description languages (HDLs), like Verilog and VHDL. The "compiled" (synthesized) version of the software is called gateware.
The HDLs are the same languages used for ASIC design as well. The path to properly learning such a language is long, so I recommend starting with C/C++ on a PIC from Microchip, which is a low-cost and widely accepted microcontroller.
If you intend to do FPGA development, the knowledge gained with C/C++/PIC will be helpful and important, because most FPGAs have an embedded CPU/microcontroller inside.
There is no direct scientific reason for it. In a lot of cases it has to do with the management and politics of the specific company.
Some companies are driven to create a turnkey system and force you to buy that system and pay for maintenance. It locks out individual developers, but many companies, and especially government agencies, prefer this model because the support is often much better and you can often drive the direction of the vendor's products to suit your needs.
Other companies do not have the staff or the talent, outsource the solution, and sometimes take whatever they can get. You might end up with a one-time-developed tool that, after the contractor leaves, is never updated or fixed again; or if it is fixed, it is a patch job by someone else. It takes money to make money, but if you run out of money before you can sell your product, you still fail.
Sometimes you have companies that both maintain their in-house, must-buy-from-them tool AND have individuals who also contribute to open tools like GCC.
Sometimes the politics or management in the company include individuals who have a strong opinion of how the world must be and only allow tools to be developed for a specific language. Or perhaps they are owned by, partner with, or simply like a company that has a specific language, and this chip product came to be simply to support that language.
On top of all of this you have the very real technical problems of memory space, the quality and efficiency of the instruction set, and how compiler-friendly it is. Some architectures may be fine for assembler, but higher-level compiled code chews up the limited memory resources too quickly.
GCC in particular has a lot of problems internally (not as people, but in the software/source code itself). I challenge you to write a backend, even with the tutorials that are out there. A company requires specialized talent in order to create and then maintain a GCC backend year after year; otherwise you get dumped. If your chip architecture is not 32-bit or bigger, you are already fighting a losing battle with GCC: your chip architecture might be compiler-friendly, just not friendly to the popular compilers' design.
In the near future LLVM is going to shine as a cross-compiler relative to GCC because it has not yet built up this internal bulk, and perhaps because its internal guts are themselves a defined language/system, it may never suffer what has happened to GCC. As more folks get comfortable with LLVM, we will see a number of architectures ported to it. The MSP430 backend was done specifically to demonstrate that you can add a target literally in an afternoon. By the end of next month, some motivated individual could have all of the targets most of us have ever heard of ported to LLVM. And you don't have to build a cross-compiler; it is always a cross-compiler. I only mention LLVM because the door is now open for targets that have suffered from bad tools to recover.
Some companies, microcontroller vendors in particular, can and will make the programming interface proprietary so that you must use their programming tool (or hack it and take your chances with publishing those results, and/or play cat-and-mouse as they change it to defeat you). And they may have made tools only for Windows, leaving the Linux and Apple folks hanging in the wind. Or they make it so that the only binaries the chip will load are the ones generated by their tools; here again you may hack through the binary format to allow an alternate compiler, and they may or may not work to defeat you.
Despite the technical problems, the biggest factors are the company's politics, management, marketing teams, and the supply of (or lack of) talent in the engineering staff. The bottom line: follow the dollars, not the technology or science, to understand why this language is supported and not that one, or why the support for a given language is good, bad, or marginal.
What language should you learn as a result of all of this? Start with assembler on at least three different architectures. Then C, and then C++ if you feel you really need it. C and assembler are your primary languages for embedded (depending on your definition of embedded). In practice, we write assembler mostly for initial boot code and to support C: interrupt handling or special instructions that are needed and that the compiler cannot generate. There are places like microcontrollers where it may very well make sense to use assembler for various reasons: tools, limited chip resources, etc. Even if you don't use assembler, knowing it makes you a much better high-level programmer.
You do need to decide what your definition of embedded is. Is it API and library calls for an application on a(n embedded) Linux system (indistinguishable from the same program/calls on a desktop system)? Or, at the other end of the spectrum, are you talking about a microcontroller with maybe 256 or 1024 bytes (not mega or giga, but bytes) of program space? Or something in the middle? The majority of the "embedded" folks out there are closer to the API-calls-on-an-operating-system end (an RTOS, Linux, WinCE, etc.) than to the deeply embedded end, so that means C, maybe C++ (always be able to fall back on C), while trying to avoid Python and other scripty languages that are resource hogs.
Some 8-bit parts cannot efficiently access data on a stack. Instead of using a stack to pass parameters, auto variables and parameters are statically allocated; typically, a linker allocates the automatic variables for main() at one end of memory, then allocates the variables for functions that are called only by main(), then the variables for functions that are called only by those functions, and so on. This yields an optimal allocation fairly easily, subject to some caveats:
Recursion can only be supported by adding code to explicitly copy variables onto some sort of stack arrangement; in many compilers, it's simply not supported at all.
If a function looks like it "might" call another function, the linker will assume it can do so in all cases (e.g. it may be that when 'foo' calls 'bar', one of its parameters might always have a value such that 'bar' won't call 'boz', but the linker won't know that).
Any call to a function pointer with a certain signature will be regarded as a call to all functions with the same signature whose address is taken.
If the evaluation of more than one parameter to a function requires making additional function calls, additional temporary storage must generally be pessimistically allocated even if optimal placement of the parameter storage could have avoided that.
There are many types of C programs for which the above restrictions pose no problem at all, and many more for which they are a nuisance but not a huge one (e.g. one can add dummy parameters or return values to ensure that different classes of indirectly-called functions have different signatures). Unfortunately, the code generated by a C++-to-C pre-compiler will almost always involve function pointers whose call graph cannot be reasonably divined, so using C++ on such a platform is apt to be difficult if not impossible.
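A hypothetical sketch of the scheme described above (function names invented for illustration): b() and c() sit on disjoint call paths, so a linker that sees the whole call graph can place their locals at the same fixed address instead of on a stack.

    // main() -> a() -> b(), and main() -> c(). Since b() and c() can never
    // be live at the same time, their statically allocated locals can
    // overlay each other in memory.
    static int b(int n) { int t = n * 2; return t; }  // t at, say, address 0x60
    static int c(int n) { int u = n + 1; return u; }  // u may reuse 0x60
    static int a(int n) { return b(n); }
    int main() { return a(1) + c(2); }
    // Recursion breaks this: if b() could call itself, two live copies of t
    // would need the same address, which is why such compilers forbid it.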

Need book & website suggestions for advanced low-level programming

I want to learn all the advanced details of low-level programming, so I want to be able to:
Learn advanced C/C++
Optimize my code with and without inline assembly
Understand the internals of an exe, dll, thread, and process
Efficiently make use of technologies like SSE, 3DNow!, and MMX
Debug & disassemble executables/libraries and understand what's going on inside
Understand the differences/features of different CPUs/platforms like x86, MIPS, ARM, and PowerPC
My first target is an x86 Windows-based system. After that come Linux-based platforms, and embedded systems follow.
Any books, websites, tutorials, forums, or communities that give me what I'm looking for DIRECTLY are fine.
Thanks.
What you are asking for cannot be found in a single book. Much of what you have mentioned is best found in the user manuals or functional specifications for the various processors. I recommend starting with an understanding of the core x86 architecture and working up from there. One of the old Intel 386 or 486 manuals might be a good start.
I know of no websites for this type of info.
A few recommendations from among my personal favourites to get you started:
“Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition)”
-- Scott Meyers
“Inside the Machine” -- John Stokes
“Hacker’s Delight” -- Henry S. Warren
“The Software Optimization Cookbook” -- Richard Gerber
“Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M” (253666-021)
“Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B: Instruction Set Reference, N-Z” (253667-021)
Maybe it's time for you to get an account on http://my.safaribooksonline.com/, unplug the phone for a couple of weeks, load the refrigerator up with Jolt and Funyuns, say goodbye to your family and friends, and then read as many books as you can. They have a pretty substantial library on there that covers most of the topics that you're looking for.
That is a bit too much to want to learn all at once. :)
I would suggest starting with the basic ARMv4 core architecture; it is simple enough to understand.
Then move on to the 8086, then build up to later versions of ARM and x86.
ARM is of the RISC type, and x86 of the CISC type.
You can never learn all of the processors (just as you won't ever be able to learn all the programming languages), but having a knowledge of one or two will enable you to grasp any other you come across.
There is not much that is object-oriented about low-level programming, so it doesn't matter whether you use C++ or C.
Get a full-system simulator like GXemul or QEMU.
Try to execute a hello-world assembly program (without using the processor runtime libraries; you want it hard, right?).
Others might be able to guide you with respect to SSE, MMX, etc.
Check out infocenter.arm.com for the ARM assembly language and architecture specifications.
I've always found Computer Systems: A Programmer's Perspective (http://www.amazon.com/Computer-Systems-Programmers-Randal-Bryant/dp/013034074X) to be a very good book. It has a large amount of information about computer architecture, and it taught me about memory management, compilation and linking (as well as how to debug linking errors), optimization, and relocatable object code, and more generally how to approach computing from a low level (e.g. what the internals of the processor are like). There are a lot of good exercises, ranging from optimization examples to implementing buffer overflows. It discusses how to write inline assembly code (and make it work). There's even a section on writing code for a fictional (Y86) processor.
One caveat, though, is that it tends to focus too heavily on the Intel processor line (in my opinion). If you want something more along the lines of working with, say, the ARM line, then you'll probably want to take the recommendations from others above.