I need to do a big trick and am keen on hearing your suggestions.
What I need is a macro that takes ordinary Clojure code peppered with a special "await" form. The await forms contain only Clojure code and are supposed to return that code's return value. What I want is that when I run whatever this macro produces, it stops executing when the first "await" form is due for evaluation.
Then, it should dump all the variables defined in its scope so far to the database (I will ignore the problem that not all Clojure types can be serialised to EDN, e.g. functions can't), together with some marker of the place it has stopped in.
Then, if I want to run this code again (possibly on a different machine, another day) - it will read its state from the DB and continue where it stopped.
Therefore I could have, for example:
(defexecutor my-executor
  (let [x 7
        y (await (+ 3 x))]
    (if (await (> y x))
      "yes"
      "no")))
Now, when I do:
(my-executor db-conn "unique-job-id")
the first time I should get a special return value, something like
:deferred
The second time I should get :deferred as well; only the third time should a real return value be returned.
The question I have is not how to write such an executor, but rather how to gather information from within the macro about all the declared variables so that I can store them. Later I also want to re-establish them when I continue execution. The await forms can be nested, of course :)
I had a peek into core.async source code because it is doing a similar thing inside, but what I have found there made me shiver - it seems they employ the Clojure AST analyser to get this info. Is this really so complex? I know of &env variable inside a macro, but do not have any idea how to use it in this situation. Any help would be appreciated.
And one more thing. Please do not ask me why I need this or that there is a different way of solving a problem - I want this specific solution.
> I will ignore the problem that not all Clojure types can be serialised to EDN, e.g. functions can't
If you ignore this, it will be very restrictive for the kinds of Clojure expressions you can handle. Functions are everywhere, e.g. in the implementation of things like doseq and for. Likewise, a lot of interesting programs will depend on some Java object like a file handle or whatever.
> The question I have is not how to write such an executor, but rather how to gather information from within the macro about all the declared variables to be able to store them.
If you manage to write such an executor, I suspect its implementation will need to know about local variables anyway. So you can put off this question until you are done implementing your executor - you will probably find it obsolete, if you can implement your executor.
> I had a peek into core.async source code because it is doing a similar thing inside, but what I have found there made me shiver - it seems they employ the Clojure AST analyser to get this info. Is this really so complex?
Yes, this is very intrusive. You are basically writing a compiler. Thank your lucky stars they wrote the analyzer for you already, so you don't have to analyze expressions yourself.
> I know of &env variable inside a macro, but do not have any idea how to use it in this situation.
This is the easy part. If you like, you can write a simple macro that gives you all the locals in scope. This question has been asked and answered before, e.g. in the question "Clojure get local lets".
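For illustration, here is a minimal sketch of such a macro. The names `locals` and `demo` are made up for this example and are not part of clojure.core; the sketch relies only on `&env` being a map whose keys are the symbols of the locals in scope.

```clojure
;; Sketch only: `locals` is a hypothetical name, not part of clojure.core.
(defmacro locals
  "Expands to a map from keyword names to the current values of every
  local binding visible at the call site."
  []
  ;; &env maps the symbols of the locals in scope to compiler objects.
  ;; Emitting a map like {:x x, :y y} as code lets the symbols be
  ;; evaluated at runtime, so the map ends up holding the values.
  (into {}
        (for [sym (keys &env)]
          [(keyword (name sym)) sym])))

;; Hypothetical usage: capture the locals of a let block.
(defn demo []
  (let [x 7
        y (+ 3 x)]
    (locals)))
```

Calling (demo) should give a map with :x bound to 7 and :y bound to 10. Note that destructuring introduces extra compiler-generated locals, which would show up in the map too.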
> And one more thing. Please do not ask me why I need this or that there is a different way of solving a problem - I want this specific solution.
This is generally an unproductive attitude when asking a question. It's admitting you're posing an XY problem, and still refusing to tell anyone what the Y is.
Quite often, I swap! an atom value using an anonymous function that uses one or more external values in calculating the new value. There are two ways to do this, one with what I understand is a closure and one not, and my question is which is the better / more efficient way to do it?
Here's a simple made-up example -- adding a variable numeric value to an atom -- showing both approaches:
(def my-atom (atom 0))
(defn add-val-with-closure [n]
  (swap! my-atom
         (fn [curr-val]
           ;; we pull 'n' from outside the scope of the function
           ;; asking the compiler to do some magic to make this work
           (+ curr-val n))))
(defn add-val-no-closure [n]
  (swap! my-atom
         (fn [curr-val val-to-add]
           ;; we bring 'n' into the scope of the function as the second function parameter
           ;; so no closure is needed
           (+ curr-val val-to-add))
         n))
This is a made-up example, and of course, you wouldn't actually write this code to solve this specific problem, because:
(swap! my-atom + n)
does the same thing without any need for an additional function.
But in more complicated cases you do need a function, and then the question arises. For me, the two ways of solving the problem are of about equal complexity from a coding perspective. If that's the case, which should I prefer? My working assumption is that the non-closure method is the better one (because it's simpler for the compiler to implement).
There's a third way to solve the problem, which is not to use an anonymous function. If you use a separate named function, then you can't use a closure and the question doesn't arise. But inlining an anonymous function often makes for more readable code, and I'd like to leave that pattern in my toolkit.
Thanks!
edit in response to A. Webb's answer below (this was too long to put into a comment):
My use of the word "efficiency" in the question was misleading. Better words might have been "elegance" or "simplicity."
One of the things that I like about Clojure is that while you can write code to execute any particular algorithm faster in other languages, if you write idiomatic Clojure code it's going to be decently fast, and it's going to be simple, elegant, and maintainable. As the problems you're trying to solve get more complex, the simplicity, elegance and maintainability get more and more important. IMO, Clojure is the most "efficient" tool in this sense for solving a whole range of complex problems.
My question was really -- given that there are two ways that I can solve this problem, what's the more idiomatic and Clojure-esque way of doing it? For me when I ask that question, how 'fast' the two approaches are is one consideration. It's not the most important one, but I still think it's a legitimate consideration if this is a common pattern and the different approaches are a wash from other perspectives. I take A. Webb's answer below to be, "Whoa! Pull back from the weeds! The compiler will handle either approach just fine, and the relative efficiency of each approach is anyway unknowable without getting deeper into the weeds of target platforms and the like. So take your hint from the name of the language and when it makes sense to do so, use closures."
closing edit on April 10, 2014
I'm going to mark A. Webb's answer as accepted, although I'm really accepting A. Webb's answer and omiel's answer -- unfortunately I can't accept them both, and adding my own answer that rolls them up seems just a bit gratuitous.
One of the many things that I love about Clojure is the community of people who work together on it. Learning a computer language doesn't just mean learning code syntax -- more fundamentally it means learning patterns of thinking about and understanding problems. Clojure, and Lisp behind it, has an incredibly powerful set of such patterns. For example, homoiconicity ("code as data") means that you can dynamically generate code at compile time using macros, and destructuring allows you to concisely and readably unpack complex data structures. None of the patterns are unique to Clojure, but Clojure brings them all together in ways that make solving problems a joy. And the only way to learn those patterns is from people who know and use them already. When I first picked Clojure more than a year ago, one of the reasons that I picked it over Scala and other contenders was the reputation of the Clojure community for being helpful and constructive. And I haven't been disappointed -- this exchange around my question, like so many others on StackOverflow and elsewhere, shows how willing the community is to help a newcomer like me -- thank you!
After you figure out the implementation details of the current compiler version for the current version of your current target host, then you'll have to start worrying about the optimizer and the JIT and then the target computer's processors.
You are too deep in the weeds, turn back to the main path.
Closing over free variables when applicable is the natural thing to do and an extremely important idiom. You may assume a language named Clojure has good support for closures.
I prefer the first approach as being simpler (as long as the closure is simple) and somewhat easier to read. I often struggle reading code where an anonymous function is immediately called with parameters; I have to resort to counting parentheses to be sure of what's happening, and I feel that's not a good thing.
I think the only way it could be the wrong thing to do is if the closure closes over a value that shouldn't be captured, like the head of a long lazy sequence.
G'day gurus,
I've written some code that leverages a Java library that makes use of the visitor pattern. What I'd like is to hide all the messy details of the visitor etc. behind a single Clojure function that takes the input parameter(s) and returns a simple data structure containing all of the state derived by the visitor.
The trick is that there are multiple "visitXXX" callbacks on the Java side and there's no easy way to return state back out of them (Java, being Java, assumes any state that gets built up by the various visitors is stored in instance variables).
What I've done (and which seems to work great, fwiw) is define an atom in a let block, and have each of my visitor functions swap! the atom with an updated value when they're called by the Java visitation code. I then return the deref'ed atom out the end of the main "driver" function, after the Java visitor completes.
My question is: is this an appropriate usage of an atom? If not, is there a more idiomatic way to do this?
If anyone's interested, the code in question is here.
Disclaimer: I'm still a Clojure n00b so that code is probably hideous to the more discerning eye. Comments / feedback / critiques welcome!
Thanks in advance!
Your approach using an atom is fine and looks good and clojurish.
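As a rough sketch of that pattern: `walk-tree` below is a hypothetical stand-in for the Java visitation code (it calls the callbacks and discards their return values), and `summarise` is the "driver" function that hides the atom. Neither name comes from the code in question, and `update` assumes Clojure 1.7 or later.

```clojure
;; Hypothetical stand-in for the Java visitor machinery: it invokes the
;; callbacks for side effects only and returns nothing useful.
(defn walk-tree
  [tree on-branch on-leaf]
  (if (sequential? tree)
    (do (on-branch tree)
        (doseq [child tree]
          (walk-tree child on-branch on-leaf)))
    (on-leaf tree))
  nil)

;; Driver function: the atom lives only inside the let, the callbacks
;; swap! their findings into it, and callers see plain immutable data.
(defn summarise
  [tree]
  (let [state (atom {:branches 0 :leaves []})]
    (walk-tree tree
               (fn [_branch] (swap! state update :branches inc))
               (fn [leaf] (swap! state update :leaves conj leaf)))
    @state))
```

For example, (summarise [1 [2 3] 4]) should return {:branches 2, :leaves [1 2 3 4]}. Because the atom never escapes the let, the mutation stays an implementation detail.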
If you are looking for other approaches as well: since you can split your problem into some code that will produce an answer (your visitor) and some other code that will need the answer when it is available, Clojure's promise and deliver functions may be well suited.
Create the promises in the let block, then have the visitor deliver the results to them.
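A minimal sketch of the promise variant, under the assumption that the visitor reports a single final result. `run-visitor` is a hypothetical stand-in for the Java side, and `visit` is a made-up driver name.

```clojure
;; Hypothetical stand-in for Java code that calls back once it is done.
(defn run-visitor
  [input on-done]
  (on-done (* 2 input))
  nil)

;; Driver: create the promise in the let, let the callback deliver to it,
;; then dereference it to hand the value back to the caller.
(defn visit
  [input]
  (let [result (promise)]
    (run-visitor input (fn [value] (deliver result value)))
    ;; deref with a timeout so a visitor that never calls back
    ;; cannot block the caller forever
    (deref result 1000 :timed-out)))
```

Here (visit 21) should return 42. A promise can only be delivered once, so this suits a single result per promise; state that accumulates over many callbacks is still easier to keep in an atom.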
This is a followup to Clojure: pre post functions
Goal
For every Clojure function, I want to have a pre and post function that gets executed:
right before the function is evaluated and
right after the function returns
Now, I want to do this for all functions in my *.clj files.
I would prefer (this is also a learning exercise) to do this at the Clojure Compiler level.
Question:
How do I get started on this? What part of the Clojure Compiler source code should I be reading? What documentation / tutorials on the internals of the Clojure Compiler should I be aware of?
Thanks!
First off, this sounds like a slightly crazy thing to do in general. There are almost certainly better ways to achieve any sensible objective (i.e. this is screaming "XY Problem"). But as long as you say it is just for a learning exercise, that is fine :-)
I can think of a couple of strategies you might want to consider before hacking the compiler:
Create your own defn macro that does the wrapping when functions are created. Obviously you'll need to make sure your own version of defn is used rather than the built-in one. Probably the simplest solution.
Walk your namespaces at runtime (after they are loaded) and redefine all functions to a wrapped version of the same function. Could get a bit messy but will certainly enhance your understanding of namespaces :-)
If you really want to hack the compiler, the easiest place to make this change would probably be defn itself, in core.clj.
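The first strategy might look roughly like the sketch below. `defn-wrapped` and `call-log` are hypothetical names; the macro only handles the simplest single-arity form (no docstring, no metadata), and the "pre" and "post" steps just record an entry in an atom.

```clojure
;; Hypothetical log that the pre and post steps write to.
(def call-log (atom []))

;; Sketch of a defn replacement: runs a pre step, then the body,
;; then a post step, and returns the body's result.
(defmacro defn-wrapped
  [fname args & body]
  `(defn ~fname ~args
     (swap! call-log conj [:pre '~fname])
     (let [result# (do ~@body)]
       (swap! call-log conj [:post '~fname])
       result#)))

;; Usage: define a function exactly as with defn.
(defn-wrapped add [a b]
  (+ a b))
```

After calling (add 1 2), which should return 3, call-log should hold a :pre entry followed by a :post entry for add. A fuller version would have to cope with docstrings, multiple arities, and exceptions thrown by the body.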
A coworker and I are Clojure newbies. We started a project a couple months back, but quickly found that we had a tough time dealing with our code base -- by 500 LOC we basically had no idea where to start with the debugging, when things went wrong (which was often). Instead of pairs, functions were getting lists, or numbers, or what-have-you.
Now we're starting a new but related project and migrating a lot of the old code over. But we're again hitting a wall.
We're wondering, how do we effectively manage a Clojure project, especially as we make changes to existing code?
What we've come up with:
liberal use of unit-tests
liberal use of pre-, post-conditions
informal type declarations in function comments
use defrecord/defstruct/defprotocol to implement a data model, which would really simplify testing
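As an illustration of the second item, pre- and post-conditions can catch the "got a number instead of a pair" kind of mistake at the function boundary. `pair-sum` is a made-up example function.

```clojure
;; Made-up example: a function that expects a pair of numbers.
(defn pair-sum
  "Adds the two elements of a pair."
  [pair]
  {:pre  [(vector? pair) (= 2 (count pair)) (every? number? pair)]
   :post [(number? %)]}
  (+ (first pair) (second pair)))
```

With assertions enabled (the default), (pair-sum [1 2]) returns 3, while (pair-sum 5) throws an AssertionError naming the failed condition instead of failing somewhere deeper in the call chain.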
But post-, pre-conditions seem not to be used very often. Unit-testing + comments will only help so much. And it seems like Clojure programmers don't typically implement formal data models.
Do we just not get Clojure? How do Clojure programmers know that their code is robust and correct?
I think this is actually an evolving area - Clojure hasn't really been around long enough for all of the best practices and associated tools for managing a large code base to be developed yet.
Some suggestions from my experience:
Structure your code in a "bottom up" way - in general, the way you want to structure your code will have the "utility" code at the top of the file (or imported from another namespace) and the "business logic" code that uses these utility functions towards the end of the file. If this seems difficult to do, then it's probably a hint that your code needs some refactoring.
Tests as examples - Test code in Clojure works very well both as a sanity check on your code and as documentation (e.g. "what kind of parameter is this function expecting?"). If you hit a bug, refer to your tests to check your assumptions and write a couple of new tests to flush out what is going wrong.
Keep functions simple and compose them - Kind of an extension of the "single responsibility principle" to functional programming. I consider more than 5-10 lines in a Clojure function as a major code smell (if this seems extreme, just remember that you can probably achieve as much in 5-10 lines of Clojure as you could with 50-100 lines of Java/C#)
Watch out for "imperative habits" - when I first started using Clojure, I wrote a lot of pseudo-imperative code in Clojure. An example would be emulating a for loop with "dotimes" and accumulating some result within an atom. This can be painful - it's not idiomatic, it's confusing and usually there is a much smarter, simpler and less error-prone functional way of doing it. This takes practice, but it is worth it in the long run...
Debug at the REPL - usually when I hit an issue, coding at the REPL is the easiest way to flush it out. Generally this means running some specific parts of the larger algorithm to check assumptions etc.
Refactor common utility functions out - you'll probably find a bunch of common code or structure repeated in many functions. It is well worth pulling this out into a function or macro that you can re-use in other places or projects - that way you can test it much more rigorously and have the benefits in multiple places. Bonus points if you can get it all the way upstream into Clojure itself! If you do this well enough, then your main code base will be extremely succinct and therefore easy to manage, containing nothing but the genuinely domain-specific code.
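To make the "imperative habits" point concrete, here is a small made-up comparison: the same sum of squares written with dotimes and an atom, and then as a plain functional pipeline. Both function names are invented for this sketch.

```clojure
;; Pseudo-imperative style: a loop counter plus an atom as accumulator.
(defn sum-squares-imperative
  [n]
  (let [total (atom 0)]
    (dotimes [i n]
      (swap! total + (* i i)))
    @total))

;; Functional style: describe the values, then reduce over them.
(defn sum-squares
  [n]
  (reduce + (map #(* % %) (range n))))
```

Both return 14 for n = 4 (0 + 1 + 4 + 9), but the second version has no mutable state to reason about and its parts can be tried individually at the REPL.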
simple composable abstractions
"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." - Alan J. Perlis
For me it's all about composing simple functions. Try to break every function down into the smallest units you can and then have another function that composes them to do the work you need. You know you are in good shape if every function can be tested independently. If you go too heavy on the macros then it can make this step harder because macros compose differently.
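A small made-up sketch of what that looks like in practice: three tiny functions, each testable on its own, with the last one doing nothing but composing the other two. All names here are invented for the example.

```clojure
(require '[clojure.string :as str])

;; Smallest unit: split a sentence into words.
(defn words
  [s]
  (str/split (str/trim s) #"\s+"))

;; Smallest unit: decide whether a single word counts as long.
(defn long-word?
  [w]
  (> (count w) 3))

;; Composition: no logic of its own, it only wires the units together.
(defn count-long-words
  [s]
  (count (filter long-word? (words s))))
```

For instance, (count-long-words "the quick brown fox") should return 2. If the definition of a "long" word changes, only long-word? and its test need to change.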
D.R.Y, Seriously, just don't repeat yourself
Starting with well decomposed functions in a bunch of namespaces, every time I need one of the composable parts somewhere else I "hoist" that function up to a library included by both namespaces. This way your commonly used abstractions sort of evolve over the course of the project into "just enough framework". It is very difficult to do this unless you really have discrete composable abstractions.
Sorry to dig up this old question, the answers by mikera and Arthur are excellent, but it's something I've also wondered about as I've been learning Clojure, and thought I'd mention how we organise files.
In a similar vein to ensuring each function has a single job, we group related functions into namespaces to make it easier to navigate the code. So we might have a namespace for functions providing access to a particular database, or providing a collection of HTTP-related utilities. This keeps each file relatively small, and makes tests easier to find. It also makes refactoring much more straightforward. This is hardly anything new, but it's worth bearing in mind.
I've written a handful of basic 2D shooter games, and they work great, as far as they go. To build upon my programming knowledge, I've decided that I would like to extend my game using a simple scripting language to control some objects. The purpose is more to learn the general process of designing and writing a script parser / executor than to actually control random objects.
So, my current line of thought is to make use of a container of lambda expressions (probably a map). As the parser reads each line, it will determine the type of expression. Then, once it has decided the type of instruction and discovered whatever values it has to work with, it will look up that kind of expression in the map and pass the resulting function any values it needs to work.
A more-or-less pseudo code example would be like this:
//We have determined somehow or another that this is an assignment operator
someContainerOfFunctions["assignment"](whatever_variable_we_want);
So, what do you guys think of a design like this?
Not to discourage you, but I think you would get more out of embedding something like Squirrel or Lua into your project and learning to use the API and the language itself. The upside of this is that you'll have good performance without having to think about the implementation.
Implementing scripting languages (even basic ones) from scratch is quite a task, especially when you haven't done one before.
To be honest: I don't think it's a good idea as you described it, but it does have potential.
This burdens you with C++'s 'annoying' static number of arguments, which may or may not be what you want in your language.
Imagine this - you want to represent a function:
VM::allFunctions["functionName"](variable1);
But that function takes two arguments! How do we define a dynamic-args function? With "..." - that means stdarg.h (cstdarg in C++) and va_list. Unfortunately, va_list has disadvantages - you have to supply an extra argument that tells the function how many arguments follow, so we change our fictional function call to:
VM::allFunctions["functionName"](1, variable1);
VM::allFunctions["functionWithtwoArgs"](2, variable1, variable2);
That brings you to a new problem - the number of arguments in a variadic call is fixed when the call is compiled, so there is no way to build an argument list of arbitrary length at runtime! So we will have to combine those arguments into something that can be constructed and passed around at runtime; let's define it (hypothetically) as
typedef std::vector<Variable* > VariableList;
And our call is now:
VM::allFunctions["functionName"](varList);
VM::allFunctions["functionWithtwoArgs"](varList);
Now we get into 'scopes' - You cannot 'execute' a function without a scope - especially in embedded scripting languages where you can have several virtual machines (sandboxing, etc...), so we'll have to have a Scope type, and that changes the hypothetical call to:
currentVM->allFunctions["functionName_twoArgs"].call(varList, currentVM->currentScope);
I could continue on and on, but I think you get the point of my answer - C++ is not a natural fit for dynamic languages, and it is unlikely to change to accommodate them, since doing so would most likely change the ABI as well.
Hopefully this will point you in the right direction.
You might find value in Greg Rosenblatt's series of articles at GameDev.net on creating a scripting engine in C++ ( http://www.gamedev.net/reference/articles/article1633.asp ).
The approach he takes seems to err on the side of minimalism and thus may be either a close fit or a good source of implementation ideas.