I'm interested in trying literate programming. However, I often find that requirements are stated in general terms, but then exceptions are given much later.
For example, one section will say something like "Students are not allowed in the hallways while classes are in session."
But then a later section will say something like "Teachers may give a student a hall pass, at which point the student may be in the hall while class is in session."
So I'd like to be able to define allowedInTheHall after the first section so that it doesn't allow students in the hall, and then, after the second section, redefine allowedInTheHall so that it first checks for the presence of a hall pass and, if one is missing, delegates back to the previous definition.
So the only way I can imagine this working would be a language where:
you can redefine a method/function/subroutine in terms of its previous definition
where only the latest version of a function gets called even if the caller was defined before the latest redefinition of the callee (I believe this is called "late binding").
So which languages support these criteria?
PS- my motivation is that I am working with existing requirements (in my case game rules) and I want to embed my code into the existing rules so that the code follows the structure of the rules that people are already familiar with. I assume that this situation would also arise trying to implement a legal contract.
Well to answer the direct question,
you can redefine a method/function/subroutine in terms of its previous definition
...in basically any language, as long as it supports two features:
mutable variables that can hold function values
some kind of closure forming operator, which effectively amounts to the ability to create new function values
So you can't do it in C, because even though it allows variables to store function pointers, there's no operation in C that can compute a new function value; and you can't do it in Haskell because Haskell doesn't let you mutate a variable once it's been defined. But you can do it in e.g. JavaScript:
var f1 = function(x) {
    console.log("first version got " + x);
};

function around(f, before, after) {
    return function() {
        before(); f.apply(null, arguments); after();
    };
}

// Rebind f1 in terms of its previous value.
f1 = around(f1,
    function(){ console.log("added before"); },
    function(){ console.log("added after"); });

f1(12);
or Scheme:
(define (f1 x)
  (display "first version got ") (display x) (newline))

(define (around f before after)
  (lambda x
    (before) (apply f x) (after)))

(set! f1 (around
           f1
           (lambda () (display "added before") (newline))
           (lambda () (display "added after") (newline))))

(f1 12)
...or a whole host of other languages, because those are really rather common features. The operation (which I think is generally called "advice") is basically analogous to the ubiquitous x = x + 1, except the value is a function and the "addition" is the wrapping of extra operations around it to create a new functional value.
The reason this works is that by passing the old function in as a parameter (to around, or just a let or whatever), the new function closes over it through a locally scoped name; if the new function referred to the global name instead, the old value would be lost and the new function would just recurse.
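For what it's worth, the same pattern can be sketched in C++11 as well, since std::function gives you a mutable variable holding a function value and lambdas give you the closure-forming operator. The names here (f1, old) are mine, purely for illustration:
#include <functional>
#include <iostream>

int main() {
    // A mutable variable holding a function value.
    std::function<void(int)> f1 = [](int x) {
        std::cout << "first version got " << x << "\n";
    };

    // Rebind f1 in terms of its previous definition. Capturing the old
    // value by copy is what keeps it reachable; capturing f1 itself by
    // reference would make the new function call itself and recurse.
    auto old = f1;
    f1 = [old](int x) {
        std::cout << "added before\n";
        old(x);
        std::cout << "added after\n";
    };

    f1(12);
}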
Technically you could say this is a form of late binding - the function is being retrieved from a variable rather than being linked in directly - but generally the term is used to refer to much more dynamic behaviour, such as JS field access where the field might not even actually exist. In the above case the compiler can at least be sure the variable f1 will exist, even if it turns out to hold null or something, so lookup is fast.
Other functions that call f1 would work the way you expect, assuming that they reference it by that name. If you did var f3 = f1; before the around call, functions defined as calling f3 wouldn't be affected; similarly for objects that got hold of f1 by having it passed in as a parameter or something. The basic lexical scoping rules apply. If you want such functions to be affected too, you could pull it off using something like PicoLisp... but you're also doing something you probably shouldn't (and that's not any kind of binding any more: that's direct mutation of a function object).
All that aside, I'm not sure this is in the spirit of literate programming at all - or for that matter, a program that describes rules. Are rules supposed to change depending on how far you are through the book or in what order you read the chapters? Literate programs aren't - just as a paragraph of text usually means one thing (you may not understand it, but its meaning is fixed) no matter whether you read it first or last, so should a declaration in a true literate program, right? One doesn't normally read a reference - such as a book of rules - from cover to cover like a novel.
Whereas designed like this, the meaning of the program is highly dependent on being read with the statements in one specific order. It's very much a machine-friendly series-of-instructions... not so much a reference book.
Related
I am new to Clojure, but not to lisp. A few of the design decisions look strange to me - specifically requiring a vector for function parameters and explicitly requesting tail calls using recur.
Translating lists to vectors (and vice versa) is a standard operation for an optimiser. Tail calls can be converted to iteration by rewriting to equivalent Clojure before compiling to byte code. The [] and recur syntax suggest that neither of these optimisations is present in the current implementation.
I would like a pointer to where in the implementation I can find any/all source-to-source transformation passes. I don't speak Java very well so am struggling to navigate the codebase.
If there isn't any optimisation before function-by-function translation to the JVM's byte code, I'd be interested in the design rationale for this. Perhaps to achieve faster compilation?
Thank you.
There is no explicit optimizer package in the compiler code. Any optimizations are done "inline". Some can be enabled or disabled via compiler flags.
Observe that literal vectors for function parameters are a syntactic choice for how functions are represented in source code. Whether they are represented as vectors, lists, or anything else does not affect the runtime, so there is nothing there to optimize.
Regarding automatic recur, Rich Hickey explained his decision here:
When speaking about general TCO, we are not just talking about
recursive self-calls, but also tail calls to other functions. Full TCO
in the latter case is not possible on the JVM at present whilst
preserving Java calling conventions (i.e without interpreting or
inserting a trampoline etc).
While making self tail-calls into jumps would be easy (after all,
that's what recur does), doing so implicitly would create the wrong
expectations for those coming from, e.g. Scheme, which has full TCO.
So, instead we have an explicit recur construct.
Essentially it boils down to the difference between a mere
optimization and a semantic promise. Until I can make it a promise,
I'd rather not have partial TCO.
Some people even prefer 'recur' to the redundant restatement of the
function name. In addition, recur can enforce tail-call position.
specifically requiring a vector for function parameters
Most other Lisps build structures out of syntactic lists. For an associative "map", for example, you build a list of lists. For a "vector", you make a list. For a conditional switch-like expression, you make a list of lists of lists. Lots of lists, lots of parentheses.
Clojure has made it an obvious goal to make the syntax of Lisp more readable and less redundant. Maps, sets, lists, and vectors all have their own syntax delimiters, so they jump out at the eye, while also providing specific functionality that you would otherwise have to request explicitly with a function if they were all lists. In addition to these structural primitives, other forms like cond minimize the parentheses by removing one layer of parentheses for each pair in the expression, rather than wrapping each pair in yet another parenthesised group. This philosophy is widespread throughout the language and its core library, so the code is more readable and elegant.
Function parameters as a vector are just part of this syntax. It's not about whether the language can convert a list to a vector easily, it's about how the language requires the placement of function parameters in a function definition -- and it does so by explicitly requiring a vector. And in fact, you can see this clearly in the source for defn:
https://github.com/clojure/clojure/blob/clojure-1.7.0/src/clj/clojure/core.clj#L296
It's just a requirement for how a function is written, that's all.
We have a course whose project is to implement a micro-Scheme interpreter in C++. In my implementation, I treat 'if', 'define', and 'lambda' as procedures, so it is valid to eval 'if', 'define', or 'lambda', and it is also fine to write expressions like '(apply define (quote (a 1)))', which will bind 'a' to 1.
But I find that in Racket and in MIT Scheme, 'if', 'define', and 'lambda' are not evaluable; for example, evaluating 'if' on its own signals a syntax error.
It seems that they are not procedures, but I cannot figure out what they are or how they are implemented.
Can someone explain these to me?
In the terminology of Lisp, expressions to be evaluated are forms. Compound forms (those which use list syntax) are divided into special forms (headed by special operators like let), macro forms, and function call forms.
The Scheme report doesn't use this terminology. It calls functions "procedures". Scheme's special forms are called "syntax". Macros are "derived expression types", individually introduced as "library syntax". (The motivation for this may be some conscious decision to blend into the CS academic mainstream by scrubbing some unfamiliar Lisp terminology. Algol has procedures and a BNF-defined syntax, Scheme has procedures and a BNF-defined syntax. That ticks off some sort of familiarity checkbox.)
Special forms (or "syntax") are recognized by interpreters and compilers as a set of special cases. The interpreter or compiler may handle these forms via function-like bindings in some internal table keyed on symbols, but it's not the program-visible binding namespace.
Setting up these associations in the regular namespace isn't necessarily wrong, but it could be problematic. If you want both a compiler and interpreter, but let has only one top-level binding, that will be an issue: who gets to install their procedure into that binding: the interpreter or compiler? (One way to resolve that is simple: make the binding values cons pairs: the car can be the interpreter function, the cdr the compiler function. But then these bindings are no longer procedures that you can apply.)
Exposing these bindings to the application is problematic anyway, because the semantics is so different between interpretation and compilation. If your implementation is interpreted, then calling the define binding as a function is possible; it has the effect of performing the definition. But in a compiled implementation, code depending on this won't work; define will be a function that doesn't actually define anything, but rather compiles: it calculates and returns a compiled fragment written in some intermediate representation.
About your implementation, the fact that (apply define (quote (a 1))) works in your implementation raises a bit of a red flag. Either you've made the environment parameter of the function optional, or it doesn't take one. Functions implementing special operators (or "syntax") need an environment parameter, not just the piece of syntax. (At least if we are developing a lexically scoped Scheme or Lisp!)
The fact that (apply define (quote (a 1))) works also suggests that your define function is taking quote and (a 1) as arguments. While that is workable, the usual approach for these kinds of syntax procedures is to take the whole form as one argument (and a lexical environment as another argument). If such a function can be called, the invocation looks something like (apply define (list '(define a 1) (null-environment 5))). The procedure itself will do any necessary destructuring on the syntax, and checking for validity: are there too many or too few parameters and so on.
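To make the "set of special cases" point concrete, here is a deliberately tiny, hypothetical evaluator in C++ (the language of your course project) in which if and define are handled as branches inside eval and never appear in the environment, so they cannot be passed to apply. None of the names or types below come from your code; it is only a sketch:
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Heavily simplified expression type: either an atom (non-empty `atom`)
// or a compound form (`items`). A std::vector of the still-incomplete
// Expr type is guaranteed to work from C++17 onward.
struct Expr {
    std::string atom;
    std::vector<Expr> items;
};

using Env = std::map<std::string, double>;

double eval(const Expr& e, Env& env) {
    if (!e.atom.empty()) {                       // atom: variable or number
        auto it = env.find(e.atom);
        return it != env.end() ? it->second : std::stod(e.atom);
    }
    const std::string& head = e.items.front().atom;
    // Special forms are cases inside the evaluator, keyed on the head
    // symbol. They never live in env, so (apply define ...) is
    // impossible by construction.
    if (head == "if")
        return eval(e.items[1], env) != 0 ? eval(e.items[2], env)
                                          : eval(e.items[3], env);
    if (head == "define") {                      // receives the raw operands plus env
        env[e.items[1].atom] = eval(e.items[2], env);
        return 0;
    }
    // Anything else is an ordinary call: evaluate the operands, then apply.
    double a = eval(e.items[1], env);
    double b = eval(e.items[2], env);
    if (head == "+") return a + b;
    if (head == "*") return a * b;
    throw std::runtime_error("unknown operator: " + head);
}

int main() {
    Env env;
    Expr def  = { "", { {"define"}, {"a"}, {"1"} } };                              // (define a 1)
    Expr expr = { "", { {"if"}, {"a"}, { "", { {"+"}, {"a"}, {"2"} } }, {"0"} } }; // (if a (+ a 2) 0)
    eval(def, env);
    std::cout << eval(expr, env) << "\n";        // prints 3
}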
I stumbled upon this Rust example on Wikipedia and I am wondering if it's possible to convert it to semantically equivalent C++ code?
The program defines a recursive data structure and implements methods upon it. Recursive data structures require a layer of indirection, which is provided by a unique pointer, constructed via the box operator. (These are analogous to the C++ library type std::unique_ptr, though with more static safety guarantees.)
fn main() {
    let list = box Node(1, box Node(2, box Node(3, box Empty)));
    println!("Sum of all values in the list: {:i}.", list.multiply_by(2).sum());
}

// `enum` defines a tagged union that may be one of several different kinds
// of values at runtime. The type here will either contain no value, or a
// value and a pointer to another `IntList`.
enum IntList {
    Node(int, Box<IntList>),
    Empty
}

// An `impl` block allows methods to be defined on a type.
impl IntList {
    fn sum(self) -> int {
        match self {
            Node(value, next) => value + next.sum(),
            Empty => 0
        }
    }

    fn multiply_by(self, n: int) -> Box<IntList> {
        match self {
            Node(value, next) => box Node(value * n, next.multiply_by(n)),
            Empty => box Empty
        }
    }
}
Apparently in the C++ version Rust's enum should be replaced with a union, Rust's Box should be replaced with std::unique_ptr, and Rust's Node tuple should be a std::tuple, but I just can't wrap my head around how to write an equivalent implementation in C++.
I know this is probably not practical (and definitely not the correct way to do things in C++), but I just wanted to see how these languages compare (are C++11 features flexible enough for this kind of tinkering?). I would also like to compare the compiler-generated assembly for semantically equivalent implementations (if that is even possible).
Disclaimer: I'm not a C++11 expert. Consume with a requisite dose of salt.
As others have commented, there are a few ways of interpreting your question. I'm going to go with an overly aggressive interpretation, since it's the only interesting one:
No, it is not possible to translate that Rust code into equivalent C++ code. Can you translate it into a program that provides the same output? They're both Turing complete, so of course you can. Can you translate it so that all semantics in the original are preserved? No.
Most of it can be translated such that it preserves the actual behaviour. Rust-style enums can be replaced by structs with both a tag field and a union, along with writing appropriate operator overloads to ensure that you correctly destroy the members of only the variant that's actually stored. You can (presumably) use unique_ptr in such a way that the memory gets allocated first and then the new value is written directly into the allocation, so there's no copy. I believe you can rewrite fn sum(self) so that it uses an rvalue this (although I've never done this, so I could easily be wrong).
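As a rough sketch of what that hand-rolled tagged union might look like (simplified to just the operations used in this example, with no copying or assignment, and with my own names throughout; the && qualifier on multiply_by is the "rvalue this" mentioned above):
#include <iostream>
#include <memory>
#include <new>
#include <utility>

// A hand-rolled tagged union standing in for Rust's `enum IntList`.
struct IntList {
    struct NodeData {
        int value;
        std::unique_ptr<IntList> next;
    };

    enum class Tag { Node, Empty } tag;
    union { NodeData node; };   // active only when tag == Tag::Node

    IntList() : tag(Tag::Empty) {}
    IntList(int v, std::unique_ptr<IntList> n) : tag(Tag::Node) {
        new (&node) NodeData{v, std::move(n)};   // construct the variant in place
    }
    IntList(const IntList&) = delete;            // no copying in this sketch
    ~IntList() {
        if (tag == Tag::Node) node.~NodeData();  // destroy only the active variant
    }

    int sum() const {
        return tag == Tag::Node ? node.value + node.next->sum() : 0;
    }

    // Meant to consume the list, like Rust's `fn multiply_by(self, ...)`;
    // note that C++ cannot stop a caller from touching *this afterwards.
    std::unique_ptr<IntList> multiply_by(int n) && {
        if (tag == Tag::Empty) return std::unique_ptr<IntList>(new IntList());
        return std::unique_ptr<IntList>(new IntList(
            node.value * n, std::move(*node.next).multiply_by(n)));
    }
};

int main() {
    // box Node(1, box Node(2, box Node(3, box Empty)))
    std::unique_ptr<IntList> list(new IntList(1,
        std::unique_ptr<IntList>(new IntList(2,
            std::unique_ptr<IntList>(new IntList(3,
                std::unique_ptr<IntList>(new IntList())))))));
    std::cout << "Sum of all values in the list: "
              << std::move(*list).multiply_by(2)->sum() << ".\n";  // 12
}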
But the one thing you cannot do in C++, to my knowledge, is replicate linear types. There is no way to statically enforce that a moved value cannot be used again. The best you can do is runtime checks, which must necessarily involve additional overhead. This also plays into why you can't have a non-nullable unique_ptr: you wouldn't ever be able to move it, since you have to leave the moved variable in a usable state.
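To spell out what "runtime checks" means here, this is the kind of thing C++ will happily compile - a moved-from unique_ptr stays accessible and can only be checked at run time - whereas Rust rejects any use of a moved value at compile time:
#include <cassert>
#include <memory>
#include <utility>

int main() {
    std::unique_ptr<int> p(new int(42));
    std::unique_ptr<int> q = std::move(p);  // p is now null, but still perfectly usable
    // The compiler will not stop further uses of p; the best you get is a
    // runtime check like this assertion.
    assert(p == nullptr);
    assert(q != nullptr && *q == 42);
}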
Now, that having been said, I should disclaim the previous statement by noting that currently, the Rust compiler emits some runtime checks for dropped (i.e. moved) values, in the form of drop flags. The plan, last I checked, was to remove these runtime checks in favour of purely static destruction, hopefully before 1.0.
I have designed a parameter class which allows me to write code like this:
//define parameter
typedef basic_config_param<std::string> name;

void test(config_param param) {
    if(param.has<name>()) { //by name
        cout << "Your name is: " << param.get<name>() << endl;
    }
    unsigned long & n = param<ref<unsigned long> >(); //by type
    if(param.get<value<bool> >(true)) { //return true if not found
        ++n;
    }
}

unsigned long num = 0;
test(( name("Special :-)"), ref<unsigned long>(num) )); //easy to add a number parameter
cout << "Number is: " << num; //prints 1
The class performs well: everything is just a reference on the stack. To store all the information, I use an internal buffer of up to 5 arguments before falling back to heap allocation, which keeps each object small, but this can easily be changed.
Why isn't this syntax used more often, overloading operator,() to implement named parameters? Is it because of the potential performance penalty?
One other way is to use the named idiom:
object.name("my name").ref(num); //every object method returns a reference to itself, allow object chaining.
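(A bare-bones sketch of that chaining idiom, with names made up here for illustration rather than my actual class, could look like this:)
#include <iostream>
#include <string>

// Every setter stores its argument and returns *this to allow chaining.
class options {
public:
    options& name(const std::string& n) { name_ = n; has_name_ = true; return *this; }
    options& ref(unsigned long& r)      { ref_ = &r; return *this; }

    void run() const {
        if (has_name_) std::cout << "Your name is: " << name_ << "\n";
        if (ref_) ++*ref_;
    }

private:
    std::string name_;
    bool has_name_ = false;
    unsigned long* ref_ = nullptr;
};

int main() {
    unsigned long num = 0;
    options().name("Special :-)").ref(num).run();
    std::cout << "Number is: " << num << "\n"; //prints 1
}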
But, to me, overloading operator,() looks much more like "modern" C++, as long as you don't forget to use double parentheses. The performance does not suffer much either; even if it is slower than a normal function call, the difference is negligible in most cases.
I am probably not the first one to come up with a solution like this, but why isn't it more common? I had never seen anything like the syntax above (my example) before I wrote a class which accepts it, but to me it looks perfect.
My question is why this syntax is not used more, overloading operator,() to implement named parameters.
Because it is counter-intuitive, non-human-readable, and arguably a bad programming practice. Unless you want to sabotage the codebase, avoid doing that.
test(( name("Special :-)"), ref<unsigned long>(num) ));
Let's say I see this code fragment for the first time. My thought process goes like this:
At first glance it looks like an example of "the most vexing parse" because you use double parentheses. So I assume that test is a variable, and have to wonder if you forgot to write the variable's type. Then it occurs to me that this thing actually compiles. After that I have to wonder if this is an instance of an immediately destroyed class of type test and you use lowercase names for all class types.
Then I discover it is actually a function call. Great.
The code fragment now looks like a function call with two arguments.
Now it becomes obvious to me that this can't be a function call with two arguments, because you used double parentheses.
So, NOW I have to figure out what the heck is going on within the ().
I remember that there is a comma operator (which I haven't seen in real C++ code in the last 5 years) which discards its left-hand operand. SO NOW I have to wonder what the useful side effect of name() is, and what name() is - a function call or a type (because you don't use uppercase/lowercase letters to distinguish between classes and functions (i.e. Test is a class, but test is a function), and you don't have C prefixes).
After looking up name in the source code, I discover that it is a class. And that it overloads the , operator, so it actually doesn't discard the first argument anymore.
See how much time is wasted here? Frankly, writing something like that can get you into trouble, because you use language features to make your code look like something that is different from what your code actually does (you make a function call with one argument look like it has two arguments or that it is a variadic function). Which is a bad programming practice that is roughly equivalent to overloading operator+ to perform subtractions instead of additions.
Now, let's consider a QString example.
QString status = QString("Processing file %1 of %2: %3").arg(i).arg(total).arg(fileName);
Let's say I see it for the first time in my life. That's how my thought process goes:
There is a variable status of type QString.
It is initialized from a temporary object of type QString.
... after QString::arg method is called. (I know it is a method).
I look up .arg in the documentation to see what it does, and discover that it replaces %1-style entries and returns QString&. So the chain of .arg() calls instantly makes sense. Please note that something like QString::arg can be templated, and you'll be able to call it for different argument types without manually specifying the type of argument in <>.
That code fragment now makes sense, so I move on to another fragment.
looks much more like "modern" C++
"New and shiny" sometimes means "buggy and broken" (slackware linux was built on a somewhat similar idea). It is irrelevant if your code looks modern. It should be human-readable, it should do what it is intended to do, and you should waste the minimum possible amount of time in writing it. I.e. you should (personal recommendation) aim to "implement a maximum amount of functionality in a minimum amount of time at a minimum cost (includes maintenance)", but receive the maximum reward for doing it. Also it makes sense to follow KISS principle.
Your "modern" syntax does not reduce development cost, does not reduce development time, and increases maintenance cost (counter-intuitive). As a result, this syntax should be avoided.
There is no necessity. Your dynamic dispatch (behaving differently depending on the logical type of the argument) can be implemented a) much more easily and b) much faster using template specialisation.
And if you actually require a distinction based on information that is only available on runtime, I'd try to move your test function to be a virtual method of the param type and simply use dynamic binding (that's what it's for, and that's what you're kind of reinventing).
The only cases where this approach would be more useful may be multiple-dispatch scenarios, where you want to reduce code and can find some similarity patterns.
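As a sketch of the compile-time dispatch this answer has in mind - overloads or specialisations chosen from the static type of the argument, with all names here hypothetical:
#include <iostream>
#include <string>

// Overloads picked from the static argument type replace the runtime
// has<>/get<> inspection of a parameter bag.
void test(const std::string& name) {
    std::cout << "Your name is: " << name << "\n";
}

void test(unsigned long& n) {
    ++n;
}

// The same idea with an explicitly specialised template, for cases where
// the behaviour is chosen from a type rather than an overloaded argument.
template <typename T> struct handler;

template <> struct handler<std::string> {
    static void run(const std::string& s) { std::cout << "name: " << s << "\n"; }
};

template <> struct handler<unsigned long> {
    static void run(unsigned long& n) { ++n; }
};

int main() {
    unsigned long num = 0;
    test(std::string("Special :-)"));
    test(num);
    handler<unsigned long>::run(num);
    std::cout << "Number is: " << num << "\n"; // prints 2
}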
Note that I'm not talking about ear muffs in symbol names, an issue that is discussed at Conventions, Style, and Usage for Clojure Constants? and How is the `*var-name*` naming-convention used in clojure?. I'm talking strictly about instances where there is some function named foo that then calls a function foo*.
In Clojure it basically means "foo* is like foo, but somehow different, and you probably want foo". In other words, it means that the author of that code couldn't come up with a better name for the second function, so they just slapped a star on it.
Mathematicians and Haskellers can use their apostrophes to indicate similar objects (values or functions). Similar, but not quite the same - objects that relate to each other. For instance, function foo could do a calculation in one manner, and foo' would produce the same result with a different approach. Perhaps it is unimaginative naming, but it has roots in mathematics.
Lisps generally (without any terminal reason) have discarded apostrophes in symbol names, and * kind of resembles an apostrophe. Clojure 1.3 will finally fix that by allowing apostrophes in names!
If I understand your question correctly, I've seen instances where foo* was used to show that the function is equivalent to another in theory, but uses different semantics. Take for instance the lamina library, which defines things like map*, filter*, take* for its core type, channels. Channels are similar enough to seqs that the names of these functions make sense, but they are not compatible enough that they should be "equal" per se.
Another use case I've seen for foo* style is for functions which call out to a helper function with an extra parameter. The fact function, for instance, might delegate to fact* which accepts another parameter, the accumulator, if written recursively. You don't necessarily want to expose in fact that there's an extra argument, because calling (fact 5 100) isn't going to compute for you the factorial of 5--exposing that extra parameter is an error.
I've also seen the same style for macros. The macro foo expands into a function call to foo*.
a normal let binding (let ((...))) creates separate variables in parallel
a let star binding (let* ((...))) creates variables sequentially, so that later ones can be computed from earlier ones, like so:
(let* ((x 10) (y (+ x 5))) y) ; y is computed from x, giving 15
I could be slightly off base but see LET versus LET* in Common Lisp for more detail
EDIT: I'm not sure about how this reflects in Clojure, I've only started reading Programming Clojure so I don't know yet