Hashtables in LLVM using OCaml

I'm working on a toy programming language/compiler using OCaml and its LLVM bindings. I want hashtables/hashmaps as a built-in data structure in my language, but I'm confused about how to go about implementing them.
I know the LLVM C++ API has an ADT directory with a bunch of data structures that would suit my needs, but I don't know how to call them using the OCaml API.
Another option would be to implement them in C and link them in, but I would rather focus on the first idea.
It would be helpful if anyone has useful resources on how to use/implement these data structures in LLVM (either through the OCaml bindings or directly in the IR, not through the C++ API).

I know the LLVM C++ API has an ADT directory with a bunch of data structures that would suit my needs, but I don't know how to call them using the OCaml API.
You can't call them from the OCaml API, but even if you could, they wouldn't solve your problem. They're just data structures, not a way to generate LLVM code. If you could use them from OCaml, you'd merely have a few more data structures available to your compiler, alongside OCaml's built-in lists, maps, sets, hash tables and arrays. You still wouldn't have a way to generate LLVM code implementing these data structures; that's not what the classes do.
Those data structures could just as well be a separate library that has nothing to do with LLVM. They're part of LLVM because they're used by the LLVM project, not because they're directly related to generating LLVM code.
Another option would be to implement them in C and link them in, but I would rather focus on the first idea.
That, or linking against an existing library that implements hashtables (with glue code to make it work with your language's type system and memory model as appropriate), would be your only options.
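A minimal sketch of the "implement it yourself and link it in" option, assuming keys and values are plain 64-bit integers (a real language would need glue for its own value representation and memory management). It is written in C++ but exported with extern "C" linkage, so generated code sees a plain C ABI; the same interface could be written in C. All rt_* names are hypothetical. The compiler would declare these symbols in each module it generates (with the OCaml bindings, Llvm.declare_function does that), emit calls to them wherever the language uses a hashtable, and link the resulting object file against this runtime.

```cpp
// Minimal hashtable runtime for a toy language, exported with C linkage so
// compiler-generated code can call it. All rt_* names are hypothetical.
// Assumptions: 64-bit integer keys/values; allocation failures not handled.
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct rt_entry {
    std::int64_t key;
    std::int64_t value;
    bool used; // false marks an empty slot
};

struct rt_table {
    rt_entry *slots;
    std::size_t capacity; // always a power of two
    std::size_t count;    // number of used slots
};

namespace {

// Multiplicative (Fibonacci) hashing, masked to the table size.
std::size_t rt_hash(std::int64_t key, std::size_t capacity) {
    std::uint64_t h = static_cast<std::uint64_t>(key) * 11400714819323198485ull;
    return static_cast<std::size_t>(h >> 32) & (capacity - 1);
}

// Returns the slot holding key, or the empty slot where it belongs
// (linear probing). Terminates because the table is never allowed to fill up.
rt_entry *rt_find(rt_entry *slots, std::size_t capacity, std::int64_t key) {
    std::size_t i = rt_hash(key, capacity);
    while (slots[i].used && slots[i].key != key)
        i = (i + 1) & (capacity - 1);
    return &slots[i];
}

// Doubles the capacity and reinserts every used entry.
void rt_grow(rt_table *t) {
    std::size_t new_cap = t->capacity * 2;
    auto *fresh = static_cast<rt_entry *>(std::calloc(new_cap, sizeof(rt_entry)));
    for (std::size_t i = 0; i < t->capacity; ++i)
        if (t->slots[i].used)
            *rt_find(fresh, new_cap, t->slots[i].key) = t->slots[i];
    std::free(t->slots);
    t->slots = fresh;
    t->capacity = new_cap;
}

} // namespace

extern "C" {

rt_table *rt_table_new(void) {
    auto *t = static_cast<rt_table *>(std::malloc(sizeof(rt_table)));
    t->capacity = 8;
    t->count = 0;
    t->slots = static_cast<rt_entry *>(std::calloc(t->capacity, sizeof(rt_entry)));
    return t;
}

void rt_table_set(rt_table *t, std::int64_t key, std::int64_t value) {
    if ((t->count + 1) * 4 > t->capacity * 3) // keep load factor <= 3/4
        rt_grow(t);
    rt_entry *e = rt_find(t->slots, t->capacity, key);
    if (!e->used) { // new key: claim the empty slot
        e->used = true;
        e->key = key;
        ++t->count;
    }
    e->value = value;
}

// Returns 1 and stores the value in *out if key is present, else returns 0.
int rt_table_get(const rt_table *t, std::int64_t key, std::int64_t *out) {
    const rt_entry *e = rt_find(t->slots, t->capacity, key);
    if (!e->used)
        return 0;
    *out = e->value;
    return 1;
}

std::size_t rt_table_count(const rt_table *t) { return t->count; }

void rt_table_free(rt_table *t) {
    std::free(t->slots);
    std::free(t);
}

} // extern "C"
```

Open addressing with linear probing keeps all entries in a single flat allocation, and growing before the table passes 3/4 full guarantees that a probe always reaches an empty slot. Strings, boxed values or garbage-collector integration would be the "glue" the answer mentions.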

Related

Is there a C-like syntax scripting language interpreter for C++?

A long time ago I started working on a dynamic graph visualizer, editor and algorithm-testing platform (graphs with nodes and arcs, not the other kind).
For the algorithm-testing platform I need to let the user write a script, or load one from a file, that will interact with the currently loaded graph. The visualizer would do things like light up nodes while the script's algorithm visits them, adding some artificial delay, in order to visualize the algorithm navigating the graph and doing its work.
Scripts would also be used, secondarily, to add third-party features that I could either ship as pre-existing scripts in the program folder or integrate into the program in C++ once they're tested and working.
All my searches for an interpreter to embed in my program pointed me to Lua;
then I started hand-writing my own recursive descent parser for my own C-like scripting language (I planned to use a subset of the C++ grammar so that any code written in my scripting language could be copy-pasted into any C++ code).
It was an interesting, crazy idea which I don't regret at all: I have scopes, functions, loops, gotos, type-safe variables and expressions.
But now that I'm approaching the addition of classes, class methods and inheritance (some default classes would be necessary to interface scripts with the program), I've realized it's going to take A LOT of time and effort. A bit too much for a personal project of an undergraduate student with exams to study for… but I still wish to complete this project.
The self-imposed requirement of the scripts being 100% compatible with C++ was far from necessary; it would have been just a nice little extra, which I can do without.
Now the question is: is there an alternative to Lua with a C-like syntax that supports everything I've already done plus classes and inheritance? (Being able to add custom "classes" that interface scripts with the program is mandatory.)
(I can't assume the user has a full C++ compiler installed, so I can't just compile their "script" at runtime as a DLL to load and call, although I wish I could.)
Just-in-time compilation of C++
Parsing C++ is hard. Heck, parsing C is hard. It's difficult to get it right, and there are a lot of edge cases. Thankfully, there are a few libraries out there which can take code and even compile it for you.
libclang
libclang provides a lot of facilities for parsing C++. It's a good, clean library, and it'll parse anything the clang compiler itself will parse. This article here is a good starting point.
Clang, combined with LLVM's JIT infrastructure, can also compile and run C++ at runtime. See this blog post here for an overview of what it does and how to use it. It's very general, very powerful, and user-written code should be fast.
GCC also provides a library called libgccjit for just-in-time compilation while a program is running. libgccjit is a C library, but the library maintainers also provide a C++ wrapper. It can compile abstract syntax trees and link them at runtime, although it's still in alpha.
cppast
If you don't want to use libclang, there's also a library under development called cppast, a C++ parser that gives you an abstract syntax tree representation of your C++ code. Unfortunately, it won't parse function bodies.
Other tools
If anyone knows any other libraries for compiling or interpreting C++ at runtime, I encourage them to update this post, or comment them so I can update it!
Here is something that lets you embed a C-like scripting language in your application (and a bunch of other cool things):
http://chaiscript.com/
There is lots of documentation:
https://codedocs.xyz/ChaiScript/ChaiScript/

Are there any ways to compile C++ code during runtime?

I have written a complex math library for JavaScript that features the ability to generate functions from strings of human-readable math expressions. Is there a way to achieve an equivalent of runtime-generated functions in C++?
FUZxxl's answer is right, and I recommend looking at the Clang/LLVM facility.
There is a basic (not very helpful) tutorial file here, and a broad tutorial on implementing your own language with LLVM. You can load the generated library into your C++ app.
Unless you have a performance-critical component, you could use ChaiScript (NB: I am in no way affiliated with it or its authors).
You can run the C++ compiler, have it generate a shared library, and load that into your program to run C++ code at runtime. Note that the details depend on your platform, as Windows and POSIX have different mechanisms for loading shared libraries (LoadLibrary/GetProcAddress versus dlopen/dlsym).
The 'compiled language way' is to define your grammar, build a parser and an AST (abstract syntax tree), and interpret or compile that. When you do this, you're essentially writing your own compiler/interpreter, and it's a lot of fun. If you want to get it working easily, you might take a look at Boost.Spirit.
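A minimal sketch of that route, which also covers the original question of turning a string such as "x*x + 2*x + 1" into a callable function. It assumes a tiny illustrative grammar (numbers, the single variable x, + - * /, unary minus and parentheses), parsed by a hand-written recursive descent parser into an AST; compile then wraps the tree in a std::function. Node, Parser, eval and compile are hypothetical names, and this interprets the AST rather than generating machine code.

```cpp
#include <cctype>
#include <cstdlib>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar:  expr   := term (('+' | '-') term)*
//           term   := factor (('*' | '/') factor)*
//           factor := number | 'x' | '(' expr ')' | '-' factor
struct Node {
    char op;          // '+', '-', '*', '/', 'n' (number), 'x' (variable), 'u' (negation)
    double value = 0; // used only when op == 'n'
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string src) : s_(std::move(src)) {}

    std::unique_ptr<Node> parse() {
        auto n = expr();
        skip();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return n;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skip() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consumes c (after optional whitespace) if it is the next character.
    bool eat(char c) {
        skip();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    static std::unique_ptr<Node> make(char op, std::unique_ptr<Node> l,
                                      std::unique_ptr<Node> r = nullptr) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }
    std::unique_ptr<Node> expr() {
        auto n = term();
        for (;;) {
            if (eat('+')) n = make('+', std::move(n), term());
            else if (eat('-')) n = make('-', std::move(n), term());
            else return n;
        }
    }
    std::unique_ptr<Node> term() {
        auto n = factor();
        for (;;) {
            if (eat('*')) n = make('*', std::move(n), factor());
            else if (eat('/')) n = make('/', std::move(n), factor());
            else return n;
        }
    }
    std::unique_ptr<Node> factor() {
        if (eat('(')) {
            auto n = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return n;
        }
        if (eat('-')) return make('u', factor());
        if (eat('x')) return make('x', nullptr);
        skip();
        const char *begin = s_.c_str() + pos_;
        char *end = nullptr;
        double v = std::strtod(begin, &end); // parse a numeric literal
        if (end == begin) throw std::runtime_error("expected number, 'x' or '('");
        pos_ += static_cast<std::size_t>(end - begin);
        auto n = make('n', nullptr);
        n->value = v;
        return n;
    }
};

// Walks the AST, substituting x for the variable.
double eval(const Node &n, double x) {
    switch (n.op) {
        case 'n': return n.value;
        case 'x': return x;
        case 'u': return -eval(*n.lhs, x);
        case '+': return eval(*n.lhs, x) + eval(*n.rhs, x);
        case '-': return eval(*n.lhs, x) - eval(*n.rhs, x);
        case '*': return eval(*n.lhs, x) * eval(*n.rhs, x);
        case '/': return eval(*n.lhs, x) / eval(*n.rhs, x);
    }
    throw std::logic_error("unknown node");
}

// Turns source text into a callable "runtime-generated function".
std::function<double(double)> compile(const std::string &src) {
    std::shared_ptr<const Node> tree = Parser(src).parse();
    return [tree](double x) { return eval(*tree, x); };
}
```

For example, compile("x*x + 2*x + 1") yields a function returning 16 for x = 3. Each grammar rule maps to one parser method, which is what makes recursive descent easy to extend; if speed matters, the same AST could instead be lowered to LLVM IR as the other answers suggest.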

Use clang as a library to parse OpenCL code extended with some C++ elements

I am currently working on a source-to-source compiler that transforms code written in an OpenCL superset to "ordinary" OpenCL. I would really like to use clang as a library to parse and analyze the source code. In particular, I really need all the available type information, and I would like to have an AST to make use of clang's Rewrite capabilities.
Fortunately, the OpenCL superset that needs to be parsed is really a "mixture" between OpenCL and C++, i.e. the code is basically OpenCL extended with some C++ stuff. In detail, there are possibly template annotations before a function definition and there may be structs containing methods (including operator definitions).
I was hoping that I could use clang to parse this language, since the clang parser is capable of parsing all these constructs. However, I am not sure how (if it is possible at all) to tell clang to parse OpenCL and C++ constructs at once. If possible, I really want to avoid touching the clang code base and would prefer to use clang as a library instead. Maybe it is possible to set up an appropriate instance of clang's LangOptions class that tells clang to parse all these constructs?
Any ideas on how to make clang parse this mixture between OpenCL and C++? Any help is appreciated, and thanks in advance!
You're trying to mix two different front ends, involving both parsing and name resolution.
I think you are in for a rough trip. The key problem is that you are trying to glue together things on which no effort was expended to make them gluable. This usually leads to integration hell. You don't see people doing this with Fortran and C++ for the same reasons.
To start with, you'll discover you will have to define the semantics of how the C++ extensions interact with those of OpenCL. If you check out the C++ standard, you'll discover 600 pages of results from committee arguments on how C++ interacts with itself. So unless you can define a radically simple interaction, you'll have a tough time knowing what your mixed OpenCL/C++ program means.
Your second problem will be interleaving the Clang parsing machinery for C++ (AFAIK hand-written code) with the Clang parsing machinery for OpenCL (I don't know anything about it, but assume it follows the C++ style). There's no obviously good reason to believe you can just pick and choose these to interleave easily. It may work out fine; it's just not a bet I'd care to make.
The next place you are likely to have trouble is in building an AST for the joint language. Maybe Clang has defined AST nodes for both C++ and OpenCL in a way that easily composes into a joint C++/OpenCL tree. But since the node types are chosen by hand, and there was no specific reason to design them to work together, it is also not obvious that they will compose nicely.
Your last task, given a "valid" OpenCL/C++ tree, is to transform it to OpenCL. How in fact will you expand a C++ template (or any general C++ code) into OpenCL code?
[Check my bio for another system, DMS, that might be a bit better for this task; it provides uniform infrastructure for multiple languages that would make some of this easier. Somewhat similar to what you are trying to do, we have used DMS to mix C++ with F90 and APL concepts for easy expression of vector operations in a prototype Vector C++, but we did not try to preserve F90 and APL syntax and semantics exactly for all the above reasons].
It isn't my purpose to rain on your parade; progress is made by the unreasonable man. Just be sure you understand how big a task you are taking on.

Writing LLVM source files vs. using APIs

I am creating an LLVM backend for a compiler. I am wondering if there is any downside to having my backend write IR code in files instead of using the APIs. The APIs are complicated (especially if one is using a language other than C++, in my case Haskell) and hard to use. The IR is much easier to understand. I don't need JIT compilation, the output code will be compiled to machine code by the standard command line tools.
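For concreteness, the textual approach the question describes amounts to string generation along these lines. This is a hypothetical helper (emit_sum_function is not part of any library), and the output deliberately uses only basic IR syntax (i32, add, ret); the answers below point out that other parts of the textual format have changed between LLVM versions, which is the maintenance cost of this approach.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: emits textual LLVM IR for a function that returns
// the sum of its `arity` i32 parameters.
std::string emit_sum_function(const std::string &name, int arity) {
    std::ostringstream out;
    out << "define i32 @" << name << "(";
    for (int i = 0; i < arity; ++i)
        out << (i > 0 ? ", " : "") << "i32 %a" << i;
    out << ") {\nentry:\n";
    // Running sum: the first parameter (or the constant 0) seeds the chain,
    // and each add defines a fresh SSA value %s1, %s2, ...
    std::string acc = arity > 0 ? "%a0" : "0";
    for (int i = 1; i < arity; ++i) {
        std::string next = "%s" + std::to_string(i);
        out << "  " << next << " = add i32 " << acc << ", %a" << i << "\n";
        acc = next;
    }
    out << "  ret i32 " << acc << "\n}\n";
    return out.str();
}
```

emit_sum_function("add", 2) produces a function @add whose body is a single add followed by ret; written to a .ll file, it can be handed to llc or clang. Note that every SSA name and type annotation is assembled by hand, which is exactly what the builder APIs would otherwise track for you.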
The IR format changes from version to version. The API changes much less frequently. There have been cases in the past where the IR format changed dramatically, so you'd need to invest a significant amount of time to keep up with such changes.
Using the API is the preferable method. If it's not clear which API calls you will need, you can use the cpp backend as a source of inspiration :)
As Anton said, there's a definite advantage in using the API as opposed to spitting out textual IR. I just want to address the point you raise regarding the complexity of the API and its usage from Haskell.
Note that LLVM has a C API, which (apart from being more stable) is suitable for foreign language interfaces. Python bindings exist for LLVM using this API, as well as Haskell bindings (easily found via Google), and bindings for other languages too.

Need a good C++ reflection API (for runtime type identification (RTTI) and runtime calling)

I need a good C++ reflection API (like a Microsoft API) that enables me to determine types (class, struct, enum, int, float, double, etc.) at runtime, declare them, and call methods on those types at runtime.
Regards,
Usman
If you are trying to get to a plugin-type architecture, the POCO Library at http://pocoproject.org has some pieces that might get you part of the way. It will allow you to load a .dll or .so at runtime and create the classes contained in it. But the calling code will still need a header file which describes an interface (or abstract base class) to be able to get the signatures of the methods.
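A minimal, standard-C++-only sketch of the interface-plus-factory idea from that answer. This is not POCO's API; Plugin, registry, create and Doubler are hypothetical names. The host compiles only against the Plugin interface (the "header file" the answer mentions) and looks implementations up by class name at runtime; in a real plugin setup, Doubler and its registration would live inside the shared library being loaded.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// The interface the host compiles against.
struct Plugin {
    virtual ~Plugin() = default;
    virtual std::string name() const = 0;
    virtual int run(int input) = 0;
};

using Factory = std::function<std::unique_ptr<Plugin>()>;

// Class name -> factory table; a loaded library would add its classes here.
std::map<std::string, Factory> &registry() {
    static std::map<std::string, Factory> table; // built on first use
    return table;
}

// Creates an instance of a class chosen by name at runtime.
std::unique_ptr<Plugin> create(const std::string &cls) {
    auto it = registry().find(cls);
    if (it == registry().end()) throw std::out_of_range("unknown class: " + cls);
    return it->second();
}

// An implementation that would normally be compiled into the .so/.dll.
struct Doubler : Plugin {
    std::string name() const override { return "Doubler"; }
    int run(int input) override { return input * 2; }
};

// Self-registration during static initialization.
const bool doubler_registered = [] {
    registry()["Doubler"] = [] { return std::make_unique<Doubler>(); };
    return true;
}();
```

With this, create("Doubler")->run(21) returns 42 without the caller ever naming the Doubler type. The method signatures are still fixed by the compiled interface, which is the limitation the answer points out: this gives runtime lookup of classes, not general reflection.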
C++ is an incredibly complex language. "Reflective" APIs weren't part of the language design and so basically it isn't there.
If you want general purpose "reflection" and "metaprogramming", you can get that by stepping outside the language and using a program transformation system (PTS). Such a tool for your purpose has to parse C++ (in more than one compilation unit at a time), provide you with access to all the language structures, let you reflect, that is, determine the type (or other properties) of any construct (e.g., variable, expression or other syntax construction) and enable you to apply arbitrary code modifications. Obviously, this won't happen at "runtime" (although I suppose you could shell out to such machinery if you insisted).
Our DMS Software Reengineering Toolkit with its C++ Front End has a proven track record of analyzing and transforming very large sets of C++ code. See the technical papers for some detailed use cases. I don't think the other tools at the Wikipedia site handle C++, although they have the right mindset.
Although it isn't really a PTS (no source-to-source transformations), Clang might work, too. I'm not sure (since I don't use it at all) how it can collect type information and use it to drive transformations of the source code. It's clearly very good at using such information to do LLVM code generation.