What are the differences between MSIL and LLVM bitcode?

I'm new to .NET and I'm trying to understand the basics first. What is the difference between MSIL and LLVM bitcode?

Both LLVM bitcode and MSIL are intermediate languages. Essentially, they are generic assembly code languages: not as high-level as most source languages (e.g., Swift, C#) but also not as low-level as real assembly (e.g., ARM, x86). There are a number of technical implementation differences between the two languages, but most developers don't need to know the small stuff*. They just need to know how they are used in their respective platforms' distribution models.
The LLVM bitcode format is a serialized version of the intermediate representation code used within the LLVM compiler. The "front end" of the compiler translates the source language (such as Swift) into LLVM bitcode, and then the "back end" of the compiler translates the bitcode into the target instruction set (such as ARM machine code). (Note: A previous version of this answer implied LLVM bitcode was processor-agnostic. That is not the case: the front end bakes target-specific details, such as pointer sizes and ABI assumptions, into the bitcode it emits.)
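To make the front end/back end split concrete, here's a trivial function and, in comments, the kind of textual IR clang's front end emits for it. This is only a sketch; the exact output varies by clang version, target, and optimization level, and the file name is just for illustration.

    // square.cpp -- the front end alone can stop at the IR stage:
    //   clang++ -S -emit-llvm square.cpp    (writes square.ll, textual IR)
    //   clang++ -c -emit-llvm square.cpp    (writes square.bc, bitcode)
    // The back end then lowers that IR/bitcode to target machine code.
    int square(int x) { return x * x; }

    // Roughly the IR the front end emits (attributes elided;
    // @_Z6squarei is the C++-mangled name):
    //   define i32 @_Z6squarei(i32 %x) {
    //     %mul = mul nsw i32 %x, %x
    //     ret i32 %mul
    //   }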
Apple allows iOS developers to submit their apps as either fully-compiled ARM code or as LLVM bitcode, the latter of which:
[...] will allow Apple to re-optimize your app binary in the future without the need to submit a new version of your app to the store.
Essentially, you run the LLVM front end on your development machine and pass the bitcode to Apple, who run the LLVM back end on their servers. This process is known as ahead-of-time (AOT) compilation (the Wikipedia article is of two minds as to whether the non-bitcode case is also AOT or if that's just "standard" compilation).
But whether or not you use bitcode, iOS end users always get the app as ARM machine code.
Things are a bit different in .NET. Most .NET code is compiled to MSIL, which is packaged in files called assemblies. The .NET runtime on an end user's device loads and executes assemblies, compiling the MSIL to machine code for the device's processor at runtime. This is called just-in-time (JIT) compilation.
Normally, MSIL is processor-agnostic, so most developers can think of .NET apps as also being processor-agnostic. However, there are a number of ways that processor-specific code can be packaged before the end user runs the app through the JIT:
Some tools, like the Native Image Generator and .NET Native, allow AOT compilation. In fact, Universal Windows Platform (UWP) apps uploaded to the Microsoft Store are AOT compiled - you submit the MSIL version of your app to Microsoft, then their servers use .NET Native to compile it for the various architectures Windows 10 supports.
It's also possible to include native code with assemblies themselves; these are called mixed assemblies.
MSIL itself can be processor-specific, if the source language uses "unsafe" operations (e.g., pointer math in C#).
But these are typically the exception, rather than the rule. Usually, .NET apps are distributed in MSIL, and end users' devices are where the native code is generated.
So in summary:
LLVM bitcode is processor-specific, but not quite as low-level as actual machine code. Apple allows iOS developers to submit apps as bitcode, to allow for future re-compilations when optimizations can be introduced. The end user runs native executables.
MSIL is usually processor-agnostic. The end user typically runs this processor-agnostic code, with .NET compiling the MSIL to native code at runtime. However, there are some cases where some or all of the app could be native code.
* Of course, if you are interested in the technical details, there are standards for LLVM bitcode and for MSIL, under its ECMA name CIL. I'm moderately knowledgeable in the latter; after a cursory glance at the former, the most notable technical difference is the memory model: LLVM bitcode is register-based, while MSIL/CIL uses an evaluation stack.
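To give a feel for that difference, here's the same trivial function with hand-written sketches of both compiled forms in comments. Neither listing is exact tool output.

    int add(int a, int b) { return a + b; }

    // LLVM IR (register-based SSA): operands live in virtual registers.
    //   define i32 @add(i32 %a, i32 %b) {
    //     %sum = add nsw i32 %a, %b
    //     ret i32 %sum
    //   }
    //
    // CIL (stack-based): operands are pushed onto an evaluation stack.
    //   ldarg.0      // push a
    //   ldarg.1      // push b
    //   add          // pop both, push a + b
    //   ret          // return the top of the stack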

Related

How can we distribute compiled source code if it is specific to the hardware it was compiled on?

Suppose we take a compiled language, for example, C++. Now let's take an example framework, say Qt. Qt has its source code publicly available and offers binary downloads so users can use its API. My question is, however: when they compiled their code, it was compiled for their specific hardware, operating system, and so on. I understand that much software requires recompilation for different operating systems (including 32- vs 64-bit) and offers multiple downloads, but why doesn't it go even further and become hardware-specific as well, eventually making the redistribution of compiled executables extremely frustrating to produce?
Code gets compiled to a target base CPU (e.g. 32-bit x86, x86_64, or ARM), but not necessarily a specific processor like the Core i9-10900K. By default, the compiler typically generates the code to run on the widest range of processors. And Intel and AMD guarantee forward compatibility for running that code on newer processors. Compilers often offer switches for optimizing to run on newer processors with new instruction sets, but you rarely do that since not all your customers have that config. Or perhaps you build your code twice (once for older processors, and an optimized build for newer processors).
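As a sketch of that "build for the widest range, take a faster path where available" idea, GCC and Clang on x86 provide builtins for runtime CPU dispatch. The function below is illustrative, not from any particular project.

    #include <cstdio>

    // Baseline version: runs on any x86_64 processor.
    void sum_baseline(const float *a, const float *b, float *out, int n) {
      for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }

    // AVX2 version: the compiler may vectorize this loop with AVX2.
    __attribute__((target("avx2")))
    void sum_avx2(const float *a, const float *b, float *out, int n) {
      for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }

    // Pick the implementation at runtime based on the user's CPU.
    void sum(const float *a, const float *b, float *out, int n) {
      if (__builtin_cpu_supports("avx2"))
        sum_avx2(a, b, out, n);      // newer processors
      else
        sum_baseline(a, b, out, n);  // widest compatibility
    }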
There's also a concept called cross-compiling. That's where the compiler generates code for a completely different processor than the one it runs on. Such is the case when you build your iOS app on a Mac: the compiler itself is an x86_64 program, but it generates ARM instructions to run on the iPhone.
Code gets compiled and linked against a certain set of OS APIs and external runtime libraries (including the C/C++ runtime). If you want your code to run on Windows 7 or Mac OS X Mavericks, you wouldn't statically link to an API that only exists on Windows 10 or macOS Big Sur. The code would compile, but it wouldn't run on the older operating systems. Instead, you'd use a workaround or conditionally load the API if it is available. Microsoft and Apple provide forward compatibility by keeping those same runtime library APIs available on later OS releases.
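On POSIX systems, a common way to do that conditional loading is dlopen/dlsym (GetProcAddress is the Windows analogue). A minimal sketch, where some_new_api is a hypothetical symbol name:

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // Look the symbol up at runtime instead of linking against it,
      // so the binary still loads on systems where it is absent.
      using NewApiFn = int (*)(int);
      void *self = dlopen(nullptr, RTLD_LAZY); // handle to the main program
      auto newApi = reinterpret_cast<NewApiFn>(dlsym(self, "some_new_api"));
      if (newApi)
        std::printf("new API available: %d\n", newApi(42));
      else
        std::printf("falling back to the older code path\n");
    }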
Additionally, Windows supports running 32-bit processes on 64-bit chips and OSes. Macs can even emulate x86_64 on the new ARM-based devices coming out later this year. But I digress.
As for Qt, they actually offer several pre-built configurations for their reference binary downloads, because, at least on Windows, the MSVCRT (the C runtime APIs from Visual Studio) is closely tied to the compiler version of Visual Studio. So they offer various downloads to match the configuration you want to build your code for (32-bit, 64-bit, VS2017, VS2019, etc.). So when you put together a complete application with 3rd-party dependencies, some of these build, linkage, and CPU/OS configs have to be accounted for.

llvm and runtime jit

Context
64-bit Linux / 64-bit OS X. C++ (gcc 5.1, llvm 3.6.1)
Up to now, I always used gcc for my projects.
The problem for the next thing I am creating is the licence. Hence, I decided to give clang/llvm a go.
My needs: runtime self-modifying code (and a very relaxed licence for compiler plugins for static analysis and other things).
I played a lot with libgccjit and it works fine.
As for llvm, I read the Kaleidoscope tutorial and some of the docs, but it is still unclear to me.
Question
I saw that llvm has some JIT capabilities, but I am not sure whether it lets me self-modify (more precisely, extend) the code at runtime, as libgccjit does for the C++ language.
I just need a starter here, llvm is huge and new to me, so anyone expert enough is very welcome to guide me a bit.

Setting up a MIPS test environment

We are creating a multi platform software in C++ for "normal" i386 Linux, but also some obscure MIPS hardware and for this we cross compile our product using the ELDK Mips cross compiler (an older version). The software is copied automatically to the real hardware via a script placed on a USB stick (the hardware detects the insertion of the USB stick, searches for the script, copies, reboots).
The compilation of the product happens on the same machine (Linux i386) for both MIPS and i386. We have a complete set of unit tests, and they are executed automatically upon the completion of the i386 build (results are interpreted via Atlassian Bamboo's JUnit parser, but this is not relevant here)... However, we have a problem verifying the validity of the MIPS tests. There are some minor differences in the code when compiling for MIPS, so it would be important to know that they work too.
And the question: How to set up a MIPS unit test environment that can take the compiled unit tests and run them? (Any solution is welcome, even unorthodox ones.)

What are the differences between C/C++ bare-metal compilation and compilation for a specific OS (Linux)?

Suppose you have a cross-compilation tool-chain that produces binaries for the ARM architecture.
Your tool-chain is like this (running on a X86_64 machine with Linux):
arm-linux-gnueabi-gcc.exe : for cross-compilation for Linux, running on ARM.
arm-gcc.exe : for bare-metal cross-compilation targeting ARM.
... and the plethora of other tools for cross-compilation on ARM.
Points that I'm interested in are:
(E)ABI differences between binaries (if any)
limitations in case of bare-metal (like dynamic memory allocations, usage of static constructors in case of C++, threading models, etc)
binary-level differences between the 2 cases in terms of information specific to each of them (like debug info support, etc);
ABI differences come down to how you invoke the compiler; for example, GCC has -mabi, which can be one of ‘apcs-gnu’, ‘atpcs’, ‘aapcs’, ‘aapcs-linux’ and ‘iwmmxt’.
On bare metal, limitations on various runtime features exist simply because nobody has provided those features, be it zeroing the zero-initialized areas (.bss) or running C++ static constructors. If you can supply them, they will work.
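For instance, a bare-metal startup routine has to do work that a hosted OS loader and C runtime would otherwise do. A rough sketch, assuming GNU-style linker-script symbol names (adjust them to your own linker script) and that the reset vector has already set up a stack:

    // Hypothetical bare-metal startup: the pieces a hosted OS provides.
    extern "C" {
    extern unsigned char __bss_start__[], __bss_end__[];
    typedef void (*ctor_fn)();
    extern ctor_fn __init_array_start[], __init_array_end[];
    int main();

    void _start() {
      // 1. Zero the .bss section (zero-initialized globals).
      for (unsigned char *p = __bss_start__; p != __bss_end__; ++p)
        *p = 0;
      // 2. Run C++ static constructors.
      for (ctor_fn *f = __init_array_start; f != __init_array_end; ++f)
        (*f)();
      // 3. Enter the application; there is nowhere to return to.
      main();
      for (;;) {}
    }
    }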
Binary-level differences are likewise up to how you invoke the compiler.
You can check GCC ARM options online.
I recently started a little project to use a Linux standard C library in a bare-metal environment. I've been describing it on my blog: http://ellcc.org/blog/?page_id=289
Basically what I've done is set up a way to handle Linux system calls so that by implementing simplified versions of certain system calls I can use functions from the standard library. For example, the current state for the ARM implements simplified versions of read(), readv(), write(), writev() and brk(). This allows me to use printf(), fgets(), and malloc() unchanged.
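As a sketch of that pattern (not the actual ELLCC code): a minimal write() that pushes bytes to a memory-mapped UART is enough to get printf()-style output going. The address below is the PL011 data register on QEMU's ARM VersatilePB board and will differ on real hardware; the return type is simplified to keep the example self-contained.

    #include <cstddef>

    // Hypothetical memory-mapped UART data register (QEMU VersatilePB).
    static volatile unsigned char *const UART0_DATA =
        reinterpret_cast<volatile unsigned char *>(0x101f1000);

    // Simplified write(): the C library's printf() ultimately funnels
    // its output through this call.
    extern "C" long write(int fd, const void *buf, std::size_t len) {
      (void)fd; // this sketch routes every descriptor to the UART
      const unsigned char *p = static_cast<const unsigned char *>(buf);
      for (std::size_t i = 0; i < len; ++i)
        *UART0_DATA = p[i];
      return static_cast<long>(len);
    }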
In my case, I use the same compiler for targeting Linux and bare metal. Since it is clang/LLVM based, I can also use the same compiler to target other processors. I'm working on a bare-metal example for MIPS right now.
So I guess the answer is that there doesn't have to be any difference.

Using LLVM as virtual machine - multiplatform and multiarchitecture coding

I'm currently working on a pet programming language (for learning purposes), have gone through a lot of research over the past year, and I think it's time to finally start modelling the concepts of such a language. First of all, I want it to compile to some intermediate form, such as JVM or .NET bytecode, the goal being multi-platform/architecture compatibility. Second, I want it to be fast (I also have many other things in mind, but it's not the purpose of this topic to discuss those).
The best options that came to my mind were:
Compile to JVM bytecode and use OpenJDK as runtime environment,
Compile to .NET bytecode and use Mono as runtime environment,
Compile to LLVM IR and use LLVM as runtime environment.
As you may have imagined, I've chosen LLVM. Why? Because it's blazing fast. I did a little benchmark using the C++ N-Body code and achieved 7s on my machine with lli-JITted IR, in contrast to 27s with clang-compiled native code (I know clang first makes IR, then machine code).
So, here is my question: Is there any redistributable version of the LLVM basic toolset (I just need lli) that I can use? Or must I compile my own? If the latter, can you provide me with any hints on how to do it? If I really must do it, I'm thinking of cross-compiling it from my machine (an Intel Mac) and generating some installers (say, an .msi for Windows, .rpm and .deb for popular Linux distros, and .pkg for Macs). Remember, I only need a minimal subset of LLVM, just enough for it to act like a VM by running "lli" on the emitted bitcode. The real question here is how to use LLVM as a typical virtual machine.
First, I think all 3 options - LLVM IR + LLVM, Java bytecode + OpenJDK, and .NET CIL + Mono - are excellent, and I agree deciding between them is not easy.
If you go for LLVM and you just want to use lli, you can compile LLVM for your target platform and pack the resulting lli executable with your distribution; it should work.
Another way to write a JIT compiler via LLVM is to use an execution engine - see the handy examples in the Kaleidoscope tutorial. That means that you write your own program which will JIT-compile your own language, compile it to whatever platform you want while statically linking it with LLVM, and then distribute it.
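Here's a minimal sketch of that execution-engine approach using ORC's LLJIT. It assumes a reasonably recent LLVM (roughly 15 or later; the ORC API has changed between releases), and it builds a one-function module in memory rather than compiling a real language.

    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/TargetSelect.h"
    #include <cstdio>

    int main() {
      llvm::InitializeNativeTarget();
      llvm::InitializeNativeTargetAsmPrinter();

      auto ctx = std::make_unique<llvm::LLVMContext>();
      auto mod = std::make_unique<llvm::Module>("jit_demo", *ctx);

      // Build IR for: int add1(int x) { return x + 1; }
      auto *i32 = llvm::Type::getInt32Ty(*ctx);
      auto *fnTy = llvm::FunctionType::get(i32, {i32}, false);
      auto *fn = llvm::Function::Create(
          fnTy, llvm::Function::ExternalLinkage, "add1", mod.get());
      llvm::IRBuilder<> b(llvm::BasicBlock::Create(*ctx, "entry", fn));
      b.CreateRet(b.CreateAdd(fn->getArg(0), b.getInt32(1)));

      // JIT-compile the module and call the function natively.
      auto jit = llvm::cantFail(llvm::orc::LLJITBuilder().create());
      llvm::cantFail(jit->addIRModule(
          llvm::orc::ThreadSafeModule(std::move(mod), std::move(ctx))));
      auto addr = llvm::cantFail(jit->lookup("add1"));
      auto *add1 = addr.toPtr<int (*)(int)>();
      std::printf("%d\n", add1(41)); // prints 42
    }

To build it, link against the LLVM libraries; the Kaleidoscope docs suggest something like llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native for the compiler flags.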
In any case, since a JIT compiler requires copying an LLVM binary to the client side, make sure to attach a copyright notice with your distribution (you don't have to open-source your distribution, though).