How to make JIT compilation dependent on the values of function variables? - llvm

I am using LLVM to implement a simple language.
When a function that has not yet been JIT-compiled is called from another already-compiled function, I want to lazily JIT-compile it on demand. There is a neat tutorial on this here: https://llvm.org/docs/tutorial/BuildingAJIT4.html and generally it works.
Now, I want one additional feature: I want to make AST->IR compilation conditional on a parameter's value the first time the function is called.
That is, let us say we have two functions, f() and g(int x).
In the AST I have two g implementations, one for positive and one for negative x:
g_pos(x) = x + 1;
g_neg(x) = x - 1;
This can be more general, but the idea is that I want to lift the condition x > 0 out of my programme and into the JIT level.
So if now f() = g(7), after JITting, f will always call g_pos(x).
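To make the intent concrete, here is a plain C++ sketch (no LLVM involved, and all names are made up) of the behaviour I am after: a first-call stub that inspects the argument, picks the specialization, and patches the call target so later calls go straight to it.

#include <cstdio>

// Hypothetical stand-ins for "JIT-compile this AST variant now".
int g_pos(int x) { return x + 1; }   // pretend this was emitted on demand
int g_neg(int x) { return x - 1; }   // pretend this was emitted on demand

// The slot that compiled callers jump through, like a lazy-compile stub.
int (*g_impl)(int) = nullptr;

// First-call trampoline: pick (and, in a real JIT, compile) the
// specialization based on the argument value, then patch the slot so
// later calls go directly to the specialized code.
int g_stub(int x) {
    g_impl = (x > 0) ? g_pos : g_neg;
    return g_impl(x);
}

int f() {
    if (!g_impl)
        return g_stub(7);   // first call: specialize on x > 0
    return g_impl(7);       // later calls: straight to g_pos
}

int main() { std::printf("%d\n", f()); }   // prints 8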
Is there any way to get this kind of behaviour?
P.S.:
I would really appreciate as few "you should not do it" or "why would you need that" answers as possible.

Related

Why use int functions over void?

I was looking over some example functions and methods (I'm currently in a C++ class), and I noticed that there were a few functions that, rather than being void, were something like
int myFunction() {
    // ...
    return 0;
}
Where the ellipsis is obviously some other statement. Why are they returning zero? What's the point of returning a specific value every time you run a function?
I understand that main() has to be int (at least according to the standards) because it is related to (or is?) the exit code and thus works with the operating system. However, I can't think of a reason a non-main function would do this.
Is there any particular reason why someone might want to do this, as opposed to simply making a void function?
If that's really what they're doing, returning 0 regardless of what the function does, then it's entirely pointless and they shouldn't be doing it.
In the C world, an int return type is a convention so that you can return your own "error code", but not only is this not idiomatic C++ but if, again, your programmer is always returning 0, then it's entirely silly.
Specifically:
I understand that main() has to be int (at least according to the standards) because it is related to (or is?) the exit code and thus works with the operating system. However, I can't think of a reason a non-main function would do this.
I agree.
There's a common convention of int functions returning 0 for success and some non-zero error code for failure.
An int function that always returns 0 might as well be a void function if viewed in isolation. But depending on the context, there might be good reasons to make it compatible with other functions that return meaningful results. It could mean that the function's return type won't have to be changed if it's later modified so it detects errors -- or it might be necessary for its declaration to be compatible with other int-returning functions, if it's used as a callback or template argument.
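For example, here is a small hypothetical sketch of the callback case (all names invented): the registration function accepts only int-returning handlers, so even a handler that can never fail must still return 0 to match the signature.

#include <vector>

// Hypothetical callback convention: 0 = success, non-zero = error code.
using Handler = int (*)();

std::vector<Handler> handlers;

void register_handler(Handler h) { handlers.push_back(h); }

int save_settings() {
    // ... might genuinely fail and return an error code
    return 0;
}

int log_shutdown() {
    // can never fail, but must still return int to be usable as a Handler
    return 0;
}

int run_all() {
    for (Handler h : handlers)
        if (int err = h())
            return err;   // propagate the first failure
    return 0;             // everything succeeded
}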
I suggest examining other similar functions in the library or program.
It's a convention, particularly among C programmers, to return 0 if the function did not experience any errors and return a nonzero value if there was an error.
This has carried over into C++, and although it's less common and less of a convention due to exception handling and other more object-oriented-friendly ways of handling errors, it does come up often enough.
One more issue that was not touched by other answers: within the ellipsis there may be another return statement:
int myFunction() {
    // ...
    if (error)
        return code;
    // ...
    return 0;
}
in which case myFunction is not always returning 0, but rather only when no error has occurred. Such return statements are often preferred over more structured but more verbose if/else code blocks, and may often be disguised within long, sloppy code.
Most of the time a function like this should return void.
Another possibility is that this function is one of a series of closely related functions that have the same signature. The int return value may signal a status, say returning 0 for success, and a few of these functions always succeed. Changing the signature may break the consistency, or would make the function unusable as a function object since the signature would not match.
Is there any particular reason why someone might want to do this, as opposed to simply making a void function?
Why does your mother cut the ends off the roast before putting it in the oven? Answer: Because that's what her grandmother did. However, her grandmother did that for a simple reason: Her roast pan wasn't big enough to hold a full-sized roast.
I work with a simulation tool that in its earliest incarnations required that all functions callable by the simulation engine return a success status: 0=success, non-zero=failure. Functions that could never fail were coded to always return zero. The simulation engine has been able to accommodate functions that return void for a long, long time. That returning an integer success code was required behavior in some previous millennium hasn't stopped cargo cult programmers from carrying the habit of writing functions that always return zero forward to the current day.
In certain programming languages you find procedures and functions. In C, C++ and similar languages you don't. Rather you only have functions.
In practice, a procedure is a part of a program that performs a certain task. A function on the other hand is like a procedure but the function can return an answer back.
Since C++ has only functions, how would you create a procedure? That's when you would either create a void function or return any value you like to show that the task is complete. It doesn't have to be 0. You can even return a character if you like to.
Take, for example, the cout statement. It just outputs something but does not return anything. This works like a procedure.
Now consider a math function like tan(x). It is meant to use x and return an answer back to the program that called it. In this case, you cannot return just anything. You must return the value of the TAN operation.
So if you need to write your own functions, you must return a value based on what you're doing. If there's nothing to return, you may just write a void function or return a dummy value like 0 or anything else.
In practice though, it's common to find functions returning 0 to indicate that 'all went off well' but this is not necessarily a rule.
Here's an example of a function I would write, which returns a value:
float Area(int radius)
{
    float Answer = 3.14159 * radius * radius;
    return Answer;
}
This takes the radius as a parameter and returns the calculated answer (area). In this case you cannot just say return 0.
I hope this is clear.

llvm function wrapper for timing

I would like to add a function wrapper in order to record the entry and exit times of certain functions. It seems that LLVM would be a good tool to accomplish this. However, I've been having trouble finding a tutorial on how to write function wrappers. Any suggestions?
p.s. my target language is C
Assuming you need to call func_start when entering each function and func_return when returning, the easiest way is to do the following:
for each function F
    insert a call to func_start(F) before the first instruction in the entry block
    for each block B in function F
        get the terminator instruction T
        if T is a return instruction
            insert a call to func_return(F) before T
All in all, including boilerplate code for your FunctionPass, you'll have to write about 40 lines of code for this.
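For reference, here is a rough sketch of what such a FunctionPass might look like with the new pass manager (untested; the hook names func_start and func_return are the ones assumed above, and for brevity they take no arguments here):

#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

namespace {
struct TimingInstrumentPass : PassInfoMixin<TimingInstrumentPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
    if (F.isDeclaration())
      return PreservedAnalyses::all();

    Module *M = F.getParent();
    LLVMContext &Ctx = M->getContext();

    // Declare (or reuse) the external hooks: void func_start(void), void func_return(void).
    FunctionType *HookTy = FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
    FunctionCallee Start = M->getOrInsertFunction("func_start", HookTy);
    FunctionCallee Ret   = M->getOrInsertFunction("func_return", HookTy);

    // Call func_start before the first real instruction of the entry block.
    IRBuilder<> EntryBuilder(&*F.getEntryBlock().getFirstInsertionPt());
    EntryBuilder.CreateCall(Start);

    // Call func_return immediately before every return instruction.
    for (BasicBlock &BB : F)
      if (auto *RI = dyn_cast<ReturnInst>(BB.getTerminator())) {
        IRBuilder<> B(RI);
        B.CreateCall(Ret);
      }

    return PreservedAnalyses::none();
  }
};
} // namespace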
If you really want to go with the wrapper approach you have to do:
for each function F
    clone function F (call it G)
    delete all instructions in F
    insert a call to func_start(F) in F
    insert a call to G in F (forwarding the arguments), put the return value in R
    insert a call to func_return(F) in F
    insert a return instruction returning R in F
The code complexity in this case will be slightly higher and you'll likely incur higher compile-time and run-time overhead.
I like doing this and use several approaches, depending on the circumstance.
The easiest if you are on a Linux platform is to use the wonderful ltrace utility. You provide the C program you are timing as an argument to ltrace. The "-T" option will output the elapsed call time. If you want a summary of call times use the "-c" option. You can control the amount of output by using the "-e" and "--library" options. Other platforms have somewhat similar tools (like dtrace) but they are not quite as easy to use.
Another, slightly hackish approach is to use macros to redefine the function names. This has all the potential pitfalls of macros but can work well in a controlled environment for smallish programs. The C preprocessor will not recursively expand macros so you can just call the actual function from inside your wrapper macro at the point of call. This avoids the difficulty of placing the "stop timing" code before each potential return in the function body.
#define foo(a,b,c) ({ long t0 = now(); int retval = foo(a,b,c); long elapsed = now() - t0; /* record or print 'elapsed' here */ retval; })
Notice the use of the non-standard code block inside an expression (a GCC statement expression). This avoids collisions of the temporary names used for timing and retval. Also, by placing retval as the last expression in the statement list, this code will time function calls that are embedded in assignments or other expression contexts (you need to change the type of "retval" to whatever is appropriate for your function).
You must be very careful NOT to include the #define before prototypes and such.
Use your favorite timer function and its appropriate data type (double, long long, whatever). I like <chrono> in C++11 myself.

Name variable Lua

I have the following code in Lua:
ABC:
test (X)
The test function is implemented in C++. My problem is this: I need to know the name of the variable passed as a parameter (in this case X). In C++ I only have access to the value of this variable, but I must know its name.
Help please
Functions are not passed variables; they are passed values. Variables are just locations that store values.
When you say X somewhere in your Lua code, that means to get the value from the variable X (note: it's actually more complicated than that, but I won't get into that here).
So when you say test(X), you're saying, "Get the value from the variable X and pass that value as the first parameter to the function test."
What it seems like you want to do is change the contents of X, right? You want to have the test function modify X in some way. Well, you can't really do that directly in Lua. Nor should you.
See, in Lua, you can return values from functions. And you can return multiple values. Even from C++ code, you can return multiple values. So whatever it is you wanted to store in X can just be returned:
X = test(X)
This way, the caller of the function decides what to do with the value, not the function itself. If the caller wants to modify the variable, that's fine. If the caller wants to stick it somewhere else, that's also fine. Your function should not care one way or the other.
Also, this allows the user to do things like test(5). Here, there is no variable; you just pass a value directly. That's one reason why functions cannot modify the "variable" that is passed; because it doesn't have to be a variable. Only values are passed, so the user could simply pass a literal value rather than one stored in a variable.
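To make that concrete, here is a minimal sketch of a C++-side test using the standard Lua C API that returns values to the caller instead of trying to modify its variable (the helper name is made up):

#include <lua.hpp>

// A C function exposed to Lua that returns two values; the Lua side then
// decides what to do with them, e.g.  a, b = test(X)
static int l_test(lua_State *L) {
    lua_Integer x = luaL_checkinteger(L, 1);  // the value that was passed in
    lua_pushinteger(L, x + 1);                // first result
    lua_pushinteger(L, x * 2);                // second result
    return 2;                                 // number of results returned
}

void register_test(lua_State *L) {
    lua_register(L, "test", l_test);          // makes test() callable from Lua
}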
In short: you can't do it, and you shouldn't want to.
The correct answer is that Lua doesn't really support this, but there is the debug interface. See this question for the solution you're looking for. If you can't get a call to debug to work directly from C++, then wrap your function call with a Lua function that first extracts the debug results and then calls your C++ function.
If what you're after is a string representation of the argument, then you're kind of stuck in Lua.
I'm thinking something like in C:
assert( x==y );
Which generates a nice message on failure. In C this is done through macros.
Something like this (untested and probably broken).
#define assert(X) do { if (!(X)) { printf("ASSERTION FAILED: %s\n", #X); abort(); } } while (0)
Here #X means the string form of the argument. In the example above that is "x==y". Note that this is subtly different from a variable name - it's just the string used by the preprocessor when expanding the macro.
Unfortunately there's no such corresponding functionality in Lua. For my Lua testing libraries I end up passing the stringified version as part of the expression, so in Lua my code looks something like this:
assert( x==y, "x==y")
There may be ways to make this work as assert("x==y") using some kind of string evaluation and closure mechanism, but it seemed too tricky to be worth doing, to me.
EDIT:
While this doesn't appear to be possible in pure lua, there's a patched version that does seem to support macros: http://lua-users.org/wiki/LuaMacros . They even have an example of a nicer assert.

Transforming Lisp to C++

I am working on a toy language based on Lisp (a very small subset of Scheme) that compiles to C++, and I am trying to figure out how to represent let expressions:
(let ((var 10)
      (test 12))
  (+ 1 1)
  var)
At first I thought I would execute all expressions and then return the last one, but returning will kill my ability to nest let expressions. What would be the way to go for representing let?
Also, any resources on source-to-source transformation are appreciated; I have googled but all I could find is the 90 min scheme compiler.
One way to expand let is to treat it as a lambda:
((lambda (var test) (+ 1 1) var) 10 12)
Then, transform this to a function and a corresponding call in C++:
int lambda_1(int var, int test) {
    1 + 1;
    return var;
}
lambda_1(10, 12);
So in a larger context:
(display (let ((var 10)
               (test 12))
           (+ 1 1)
           var))
becomes
display(lambda_1(10, 12));
There are a lot more details, such as needing to access lexical variables outside the let from within the let. Since C++ doesn't have lexically nested functions (unlike Pascal, for example), this will require additional implementation.
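One possible workaround, if you can target C++11, is to compile each let into a C++ lambda whose captures stand in for the lexical environment (a different strategy from lambda lifting). A minimal hand-translated sketch:

#include <iostream>

int main() {
    // Hand-translation of (let ((x 10)) (let ((y (+ x 2))) (+ x y)))
    int result = [](int x) {
        return [x](int y) {       // the inner let sees x through the capture
            return x + y;
        }(x + 2);
    }(10);
    std::cout << result << "\n";  // prints 22
}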
I'll try to explain a naive approach to compiling nested
lambdas. Since Greg's explanation of expanding let into a lambda
is very good, I won't address let at all, I'll assume that let is
a derived form or macro and is expanded into a lambda form that is
called immediately.
Compiling Lisp or Scheme functions directly into C or C++ functions
will be tricky due to the issues other posters raised. Depending on
the approach, the resulting C or C++ won't be recognizable (or even
very readable).
I wrote a Lisp-to-C compiler after finishing Structure and Interpretation of Computer Programs (it's one of the final exercises, and actually I cheated and just wrote a translator from SICP byte code to C). The subset of C that it emitted didn't use C functions to handle Lisp functions at all. This is because the
register machine language in chapter 5 of SICP is really lower level
than C.
Assuming that you have some form of environments, which bind names to values, you can define the crux of function calling like this: extend the environment which the function was defined in with the formal parameters bound to the arguments, and then evaluate the body of the function in this new environment.
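As a rough illustration (this is not SICP's actual representation, just a sketch of the idea), an environment with an extend operation might look like this in C++:

#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Each frame maps names to values and points at the frame it extends.
// The value type is just int here to keep the sketch small.
struct Env {
    std::map<std::string, int> bindings;
    std::shared_ptr<Env> parent;
};

// extend: bind the formal parameters to the arguments in a new frame
// chained onto the definition environment.
std::shared_ptr<Env> extend(const std::vector<std::string>& formals,
                            const std::vector<int>& args,
                            std::shared_ptr<Env> parent) {
    auto frame = std::make_shared<Env>();
    frame->parent = std::move(parent);
    for (std::size_t i = 0; i < formals.size() && i < args.size(); ++i)
        frame->bindings[formals[i]] = args[i];
    return frame;
}

// lookup walks outward through the enclosing frames.
int lookup(std::shared_ptr<Env> env, const std::string& name) {
    for (auto e = env; e; e = e->parent) {
        auto it = e->bindings.find(name);
        if (it != e->bindings.end())
            return it->second;
    }
    throw std::runtime_error("unbound variable: " + name);
}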
In SICP's compiler, the environment is held in a global variable, and there are other
global variables holding the argument list for a function call, as
well as the procedure object that is being called (the procedure object includes a pointer to the environment in which it was defined), and a label to jump to when a function returns.
Keep in mind that when you are compiling a lambda expression, there
are two syntactic components you know at compile-time: the formal
parameters and the body of the lambda.
When a function is compiled, the emitted code looks something like
this pseudocode:
some-label
    *env* = definition env of *proc*
    *env* = extend [formals] *argl* *env*
    result of compiling [body]
    ...
    jump *continue*
... where *env* and *argl* are the global variables holding the
environment and argument list, and extend is some function (this can
be a proper C++ function) that extends the environment *env* by
pairing up names in [formals] with values in *argl*.
Then, when the compiled code is run, and there is a call to this
lambda somewhere else in your code, the calling convention is to put
the result of evaluating the argument list into the *argl* variable, put the return label in the *continue* variable, and then jump to some-label.
In the case of nested lambdas, the emitted code would look something
like this:
some-label
    *env* = definition env of *proc*
    *env* = extend [formals-outer] *argl* *env*

    another-label
        *env* = definition env of *proc*
        *env* = extend [formals-inner] *argl* *env*
        result of compiling [body-inner]
        ...
        jump *continue*

    rest of result of compiling [body-outer]
    ... somewhere in here there might be a jump to another-label
    jump *continue*
This is a bit difficult to explain, and I'm sure I've done a muddled
job of it. I can't think of a decent example that doesn't involve me basically sloppily describing the whole chapter 5 of SICP. Since I spent the time to write this answer, I'm going to post it, but I'm very sorry if it's hopelessly confusing.
I strongly recommend SICP and Lisp in Small Pieces.
SICP covers metacircular interpretation for beginners, as well as a number of variants on the interpreter, and a byte code compiler which I have managed to obfuscate and mangle above. That's just the last two chapters, the first 3 chapters are just as good. It's a wonderful book. Absolutely read it if you haven't yet.
L.i.S.P includes a number of interpreters written in Scheme, a compiler to byte code and a compiler to C. I'm in the middle of it and can say with confidence it's a deep, rich book well worth the time of anyone interested in the implementation of Lisp. It may be a bit dated by this point, but for a beginner like me, it's still valuable. It's more advanced than SICP though, so beware. It includes a chapter in the middle on denotational semantics which went basically right over my head.
Some other notes:
Darius Bacon's self-hosting Lisp to C compiler
lambda lifting, which is a more advanced technique that I think Marc Feeley uses
If you're looking for tools to help with source-to-source translation, I'd recommend ANTLR. It is most excellent.
However, you'll need to think about how to translate from a loosely-typed language (lisp) to a less-loosely-typed language (c). For example, in your question, what is the type of 10? A short? An int? A double?

In what language is there a "guard" keyword or concept?

I recently tried to understand a C++ program that was written by somebody who I assume had a background in functional programming: For example, he had declared a closure class which he extensively used and which does somewhat resemble what is known as a closure in functional programming. Another class was called a guard, but I haven't quite figured out yet what it is good for. It seems to have some sort of cleanup functionality attached to it.
The only language in which I have seen a concept called guard is Erlang, but that does not remotely look similar to the code I found. In what other languages does such a concept exist that the author of the C++ code may have alluded to?
To me it sounds like he was using RAII.
The class constructor/destructor is used to symmetrically handle some form of resource allocation/release in an exception-safe context (what Java programmers would use finally {} for, as the destructor is guaranteed to be called).
This is a very common C++ idiom and is used extensively in modern C++.
Did the code look like this:
void Plop()
{
    Guard guard(lock);
    // Do lots of stuff
}
Here the guard is locking the lock in the constructor and unlocking the lock in the destructor.
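For illustration, a minimal sketch of what such a Guard class might look like (the actual class in the code you read is unknown, so this is just the common shape of the idiom):

#include <mutex>

class Guard {
public:
    explicit Guard(std::mutex& m) : m_(m) { m_.lock(); }   // acquire in the constructor
    ~Guard() { m_.unlock(); }                               // release in the destructor
    Guard(const Guard&) = delete;                           // guards are not copyable
    Guard& operator=(const Guard&) = delete;
private:
    std::mutex& m_;
};

void Plop(std::mutex& lock) {
    Guard guard(lock);
    // Do lots of stuff; the lock is released on every exit path,
    // including when an exception is thrown.
}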
The term "guard" is used in several functional languages the way it is used in Erlang, but that usage doesn't seem to fit your description. Without seeing the C++ code it's hard to really know what was intended by it.
A guess by your description would be that it implements something like Haskell's bracket, which basically ensures that some resources are released if the wrapped function exits, even if that happened by an exception. In Python one would use finally for this, in C++ you usually have the cleanup code in the destructor of an object on the stack.
In general terms, a guard is simply a construct which needs to evaluate to true for execution along some path to continue. This or something like it exists in all useful Turing-complete programming languages, but is perhaps so basic that is often not named separately as an entity. Here's a simple example in Haskell:
f x
  | x < 0     = -x
  | otherwise = x
This is equivalent to the absolute value function: negate a number if it's negative to produce its positive counterpart; otherwise, return the same value passed in. There are two guards here: x < 0, which is true when x is less than zero, and otherwise, which is always true.
Haskell's Control.Monad module has guard:
guard :: MonadPlus m => Bool -> m ()
guard b is return () if b is True, and mzero if b is False.
For example, to compute Pythagorean triples where each leg is no longer than 25, you could use
triples = do
  a <- [1..25]
  b <- [a..25]
  c <- [b..25]
  guard (p a b c)
  return (a,b,c)
  where
    p a b c = a*a + b*b == c*c
For an explanation of what's going on, see my blog post A programmable semicolon explained.
Guards in computer science typically refer to the Boolean expression that indicates that a looping construct should continue. For example (pardon the pun)
for (int i = 0; i < N; ++i) {
    /* stuff */
}
Here, i < N is the guard.
It's difficult to answer your question more thoroughly without more information.