Type hints can greatly improve execution time where reflection would otherwise occur many times. My understanding of type hints is that they just allow the compiler to cache a reflection lookup. Can that caching occur dynamically? Or is there some reason this would be bad or impossible?
From Programming Clojure:
These warnings indicate that Clojure has no way to know the type of
c. You can provide a type hint to fix this, using the metadata syntax
^Class:
(defn describe-class [#^Class c]
  {:name (.getName c)
   :final (java.lang.reflect.Modifier/isFinal (.getModifiers c))})
With the type hint in place, the reflection warnings will disappear. The
compiled Clojure code will be exactly the same as compiled Java code.
Further, attempts to call describe-class with something other than a Class
will fail with a ClassCastException.
So to sum up, the reflective lookup isn't just cached; it is eliminated.
Rich was kind enough to enlighten me:
"The real answer for the JDK proper is JSR 292, the invokedynamic instruction, which allows for the proper construction of call site caches with performance much better
than memoizaton."
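For anyone who wants to see the effect at a REPL, here is a minimal sketch (the function names are just illustrative) of how to surface the reflection warning and remove it with a hint, using the current ^Class syntax rather than the older #^Class form quoted above:
(set! *warn-on-reflection* true)
;; Without a hint, the compiler cannot know the type of c,
;; so it emits a reflection warning and falls back to reflection at runtime.
(defn class-name-slow [c]
  (.getName c))
;; With a ^Class hint, the call compiles to a direct method invocation.
(defn class-name-fast [^Class c]
  (.getName c))
(class-name-fast String)  ; => "java.lang.String"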
Related
I'm in the process of learning Clojure, and I'm using 4Clojure
as a resource. I can solve many of the "easy" questions on the site, but for me thinking in a functional programming mindset still doesn't come naturally (I'm coming from Java). As a result, I use a loop/recur iterative pattern in most of my seq-building implementations because that's how I'm used to thinking.
However, when I look at the answers from more experienced Clojure users, they do things in a much more functional style. For example, in a problem about implementing the range function, my answer was the following:
(fn [start limit]
  (loop [x start y limit output '()]
    (if (< x y)
      (recur (inc x) y (conj output x))
      (reverse output))))
While this worked, other users did things like this:
(fn [x y] (take (- y x) (iterate inc x)))
My function is more verbose and I had no idea the "iterate" function even existed. But was my answer worse in an efficiency sense? Is loop/recur somehow worse to use than alternatives? I fear this sort of thing is going to happen a lot to me in the future, as there are still many functions like iterate I don't know about.
The second variant returns a lazy sequence, which may indeed be more efficient, especially if the range is big.
The other thing is that the second solution conveys the idea better. To put it differently, it describes the intent instead of implementation. It takes less time to understand it as compared to your code, where you have to read through the loop body and build a model of control flow in your head.
Regarding the discovery of new functions: yes, you may not know in advance that some function is already defined. It is easier in, say, Haskell, where you can search for a function by its type signature, but with some experience you will learn to recognize functional programming patterns like this one. You will write code like the second variant, and then look for something that works like take and iterate in the standard library.
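As a small illustration of that laziness (using only core functions; my-range is just an illustrative name), the take/iterate version realizes only the elements you actually consume, so a huge upper bound costs nothing up front:
(def my-range (fn [x y] (take (- y x) (iterate inc x))))
;; only the first five elements are ever realized;
;; the remaining 999,999,995 are never computed
(take 5 (my-range 0 1000000000))
;; => (0 1 2 3 4)
The eager loop/recur version, by contrast, would build the entire list before returning.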
Bookmark the Clojure Cheatsheet website, and always have a browser tab open to it.
Study all of the functions, and especially read the examples they link to (the http://clojuredocs.org website).
The site http://clojure-doc.org is also very useful (yes, the two names are almost identical but not quite)
The question should not be about performance (it depends!) but about communication: when using loop/recur, plain recursion, lazy-seq, or sometimes even reduce, you make your code harder to understand, because the reader has to work out how you perform your iteration before getting to what you are computing.
loop/recur is real Clojure, and idiomatic. It's there for a reason, and often there is no better way. But many people find that once they get used to it, it's very convenient to build many functions out of building blocks such as iterate, and Clojure has a very nice collection of them. I started out writing things from scratch using truly recursive algorithms and then loop/recur. Personally, I wouldn't claim that it's better to use the functional building-block functions, but I've come to love using them. It's one of the things that's great about Clojure.
(Yes, many of the building-block functions are lazy, as are e.g. for and map, which are more general-purpose. Laziness can be good, but I'm not religious about it. Sometimes it's more efficient. Sometimes it's not. Sometimes it's beautiful. Sometimes it's a pain in the rear. Sometimes all that.)
Loop and recur are not bad; in fact, if you look at the source code for many of the built-in functions, you will find that is what they do. The provided functions are often abstractions of common patterns that can make your code easier to understand. How you are doing things is typical for many people when they first start, and how you are approaching it seems correct to me: you are not just writing your solution and moving on, you are writing your solution, looking at how others have solved the same problem, and making a comparison. This is the right road to improvement. I highly recommend that when you find an alternative solution which seems more elegant, efficient, or clear, you analyse it and look at the source code of the built-in functions it uses; things will slowly come together.
loop ... recur is an optimisation for recursive tail calls, and should
always be used where it applies.
range is lazy, so your version of it should strive to be so.
loop ... recur can't do this.
All the sequence functions that can sensibly be lazy (iterate,
filter, map, take-while ...) are so. As you know, you can use some of these
to build a lazy range. As #cgrand explains, this is the preferred approach.
If you prefer, you can build a lazy range from scratch:
;; note that this shadows clojure.core/range
(defn range [x y]
  (lazy-seq
    (when (< x y)
      (cons x (range (inc x) y)))))
I wondered the same thing for some days, but honestly, many times I do not see any better alternative to loop/recur.
Some jobs are not purely a reduce or a map. That is the case when you update data based on a buffer that you mutate at every iteration.
loop/recur is very convenient where precise, non-linear work is required. It looks more imperative, but if I remember well, Clojure was designed with pragmatism in mind, and pragmatism means choosing what is most efficient.
That is why in complex programs I mix Clojure and Java code. Sometimes Java is just clearer for low-level or iterative jobs, like extracting a specific value, while I find Clojure functions more useful for big-data processing (without so much detail: global filters, etc.).
Some people say that we must stick with Clojure as much as possible, but I do not see any reason not to use Java. I have not programmed a lot, but Clojure/Java is the best interop I have ever seen; the two approaches are very complementary.
Quite often, I swap! an atom value using an anonymous function that uses one or more external values in calculating the new value. There are two ways to do this, one with what I understand is a closure and one not, and my question is which is the better / more efficient way to do it?
Here's a simple made-up example -- adding a variable numeric value to an atom -- showing both approaches:
(def my-atom (atom 0))
(defn add-val-with-closure [n]
  (swap! my-atom
         (fn [curr-val]
           ;; we pull 'n' from outside the scope of the function,
           ;; asking the compiler to do some magic to make this work
           (+ curr-val n))))
(defn add-val-no-closure [n]
  (swap! my-atom
         (fn [curr-val val-to-add]
           ;; we bring 'n' into the scope of the function as the second function parameter,
           ;; so no closure is needed
           (+ curr-val val-to-add))
         n))
This is a made-up example, and of course, you wouldn't actually write this code to solve this specific problem, because:
(swap! my-atom + n)
does the same thing without any need for an additional function.
But in more complicated cases you do need a function, and then the question arises. For me, the two ways of solving the problem are of about equal complexity from a coding perspective. If that's the case, which should I prefer? My working assumption is that the non-closure method is the better one (because it's simpler for the compiler to implement).
There's a third way to solve the problem, which is not to use an anonymous function. If you use a separate named function, then you can't use a closure and the question doesn't arise. But inlining an anonymous function often makes for more readable code, and I'd like to leave that pattern in my toolkit.
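For reference, here is a quick sketch of that third approach, passing the extra value through swap! to a named function instead of closing over it (add-val is just an illustrative name, and my-atom is the atom defined above):
(defn add-val
  "Pure function of the current value and the value to add."
  [curr-val n]
  (+ curr-val n))
;; swap! passes the atom's current value first, then any extra arguments
(swap! my-atom add-val 5)  ; => 5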
Thanks!
edit in response to A. Webb's answer below (this was too long to put into a comment):
My use of the word "efficiency" in the question was misleading. Better words might have been "elegance" or "simplicity."
One of the things that I like about Clojure is that while you can write code to execute any particular algorithm faster in other languages, if you write idiomatic Clojure code it's going to be decently fast, and it's going to be simple, elegant, and maintainable. As the problems you're trying to solve get more complex, the simplicity, elegance and maintainability get more and more important. IMO, Clojure is the most "efficient" tool in this sense for solving a whole range of complex problems.
My question was really -- given that there are two ways that I can solve this problem, what's the more idiomatic and Clojure-esque way of doing it? For me when I ask that question, how 'fast' the two approaches are is one consideration. It's not the most important one, but I still think it's a legitimate consideration if this is a common pattern and the different approaches are a wash from other perspectives. I take A. Webb's answer below to be, "Whoa! Pull back from the weeds! The compiler will handle either approach just fine, and the relative efficiency of each approach is anyway unknowable without getting deeper into the weeds of target platforms and the like. So take your hint from the name of the language and when it makes sense to do so, use closures."
closing edit on April 10, 2014
I'm going to mark A. Webb's answer as accepted, although I'm really accepting A. Webb's answer and omiel's answer -- unfortunately I can't accept them both, and adding my own answer that rolls them up seems just a bit gratuitous.
One of the many things that I love about Clojure is the community of people who work together on it. Learning a computer language doesn't just mean learning code syntax -- more fundamentally it means learning patterns of thinking about and understanding problems. Clojure, and Lisp behind it, has an incredibly powerful set of such patterns. For example, homoiconicity ("code as data") means that you can dynamically generate code at compile time using macros, or destructuring allows you to concisely and readably unpack complex data structures. None of the patterns are unique to Clojure, but Clojure brings them all together in ways that make solving problems a joy. And the only way to learn those patterns is from people who know and use them already. When I first picked Clojure more than a year ago, one of the reasons that I picked it over Scala and other contenders was the reputation of the Clojure community for being helpful and constructive. And I haven't been disappointed -- this exchange around my question, like so many others on StackOverflow and elsewhere, shows how willing the community is to help a newcomer like me -- thank you!
After you figure out the implementation details of the current compiler version for the current version of your current target host, then you'll have to start worrying about the optimizer and the JIT and then the target computer's processors.
You are too deep in the weeds, turn back to the main path.
Closing over free variables when applicable is the natural thing to do and an extremely important idiom. You may assume a language named Clojure has good support for closures.
I prefer the first approach as being simpler (as long as the closure is simple) and somewhat easier to read. I often struggle reading code where an anonymous function is immediately called with parameters; I have to resort to counting parentheses to be sure of what's happening, and I feel that's not a good thing.
I think the only way it could be the wrong thing to do is if the closure closes over a value that shouldn't be captured, like the head of a long lazy sequence.
Main question: I view the most significant application of tail call optimization (TCO) as a translation of a recursive call into a loop (in cases in which the recursive call has a certain form). More precisely, when translated into a machine language, this would usually be translation into some sort of series of jumps. Some Common Lisp and Scheme compilers that compile to native code (e.g. SBCL) can identify tail-recursive code and perform this translation. JVM-based Lisps such as Clojure and ABCL have trouble doing this. What is it about the JVM as a machine that prevents or makes this difficult? I don't get it. The JVM obviously has no problem with loops. It's the compiler that has to figure out how to do TCO, not the machine to which it compiles.
Related question: Clojure can translate seemingly recursive code into a loop: It acts as if it's performing TCO, if the programmer replaces the tail call to the function with the keyword recur. But if it's possible to get a compiler to identify tail calls--as SBCL and CCL do, for example--then why can't the Clojure compiler figure out that it's supposed to treat a tail call the way it treats recur?
(Sorry--this is undoubtedly a FAQ, and I'm sure that the remarks above show my ignorance, but I was unsuccessful in finding earlier questions.)
Real TCO works for arbitrary calls in tail position, not just self calls, so that code like the following does not cause a stack overflow:
(letfn [(e? [x] (or (zero? x) (o? (dec x))))
        (o? [x] (e? (dec x)))]
  (e? 10))
Clearly you'd need JVM support for this, since programs running on the JVM cannot manipulate the call stack. (Unless you were willing to establish your own calling convention and impose the associated overhead on function calls; Clojure aims to use regular JVM method calls.)
As for eliminating self calls in tail position, that's a simpler problem which can be solved as long as the entire function body gets compiled to a single JVM method. That is a limiting promise to make, however. Besides, recur is fairly well liked for its explicitness.
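As a hedged sketch of that self-call case (count-down is just an illustrative name), recur marks the tail self-call explicitly and is compiled into a loop inside the single JVM method:
(defn count-down [x]
  (if (zero? x)
    :done
    (recur (dec x))))  ; tail self-call compiled to a jump, so no stack growth
(count-down 1000000)   ; => :done, where a naive recursive call would overflow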
There is a reason why the JVM does not support TCO: Why does the JVM still not support tail-call optimization?
However, there is a way around this by using heap memory and some trickery, explained in the paper A First-Order One-Pass CPS Transformation; it is implemented in Clojure by Chris Frisz and Daniel P. Friedman (see clojure-tco).
Now, Rich Hickey could have chosen to perform such an optimization by default; Scala does this in some cases. Instead, he chose to rely on the end user to specify the cases that can be optimized by Clojure, using the trampoline or loop/recur constructs. The decision has been explained here: https://groups.google.com/d/msg/clojure/4bSdsbperNE/tXdcmbiv4g0J
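For the mutually recursive example above, here is a hedged sketch of the trampoline approach (keeping the same e?/o? names): each function returns a thunk instead of making the tail call directly, so the stack stays flat:
(declare o?)
(defn e? [x]
  (or (zero? x) #(o? (dec x))))  ; return a thunk instead of calling o? directly
(defn o? [x]
  #(e? (dec x)))
;; trampoline keeps invoking returned functions until a non-function comes back
(trampoline e? 1000000)  ; => true, with no StackOverflowError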
In the final presentation of ClojureConj 2014, Brian Goetz pointed out there is a security feature in the JVM that prevents stack frame collapsing (as that would be an attack vector for people looking to make a function go somewhere else on return).
https://www.youtube.com/watch?v=2y5Pv4yN0b0&index=21&list=PLZdCLR02grLoc322bYirANEso3mmzvCiI
This is a followup to Clojure: pre post functions
Goal
For every Clojure function, I want to have a pre and post function that gets executed:
right before the function is evaluated and
right after the function returns
Now, I want to do this for all functions in my *.clj files.
I would prefer (this is also a learning exercise) to do this at the Clojure Compiler level.
Question:
How do I get started on this? What part of the Clojure Compiler source code should I be reading? What documentation or tutorials on the internals of the Clojure Compiler should I be aware of?
Thanks!
First off, this sounds like a slightly crazy thing to do in general. There are almost certainly better ways to achieve any sensible objective (i.e. this is screaming "XY Problem"). But as long as you say it is just for a learning exercise, that is fine :-)
I can think of a couple of strategies you might want to consider before hacking the compiler:
Create your own defn macro that does the wrapping when functions are created (see the sketch after this list). Obviously you'll need to make sure your own version of defn is used rather than the built-in one. Probably the simplest solution.
Walk your namespaces at runtime (after they are loaded) and redefine all functions to a wrapped version of the same function. Could get a bit messy but will certainly enhance your understanding of namespaces :-)
If you really want to hack the compiler, the easiest place to make this change would probably be just by hacking defn in core.clj
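To make the first strategy concrete, here is a hedged sketch of a my-defn macro with hypothetical pre-fn/post-fn hooks (it ignores docstrings, metadata, and multiple arities to stay short):
(defn pre-fn [fname args]
  (println "calling" fname "with" args))
(defn post-fn [fname ret]
  (println fname "returned" ret))
(defmacro my-defn
  "Like defn, but wraps the body with pre-fn and post-fn calls."
  [fname params & body]
  `(defn ~fname ~params
     (pre-fn '~fname ~params)
     (let [ret# (do ~@body)]
       (post-fn '~fname ret#)
       ret#)))
;; usage
(my-defn add2 [x y] (+ x y))
(add2 1 2)
;; prints "calling add2 with [1 2]" and "add2 returned 3", then returns 3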
I'm working on some Clojure code that has some circular dependencies between different namespaces and I'm trying to work out the best way of resolving them.
Basic issue is that I get a "No such var: namespace/functionname" error in one of the files
I tried to "declare" the function but then it complains with: "Can't refer to a qualified var that doesn't exist"
I could of course refactor the entire codebase, but that seems impractical to do every time you have a dependency to resolve... and it might get very ugly for certain networks of circular dependencies.
I could separate out a bunch of interfaces / protocols / declarations into a separate file and have everything refer to that... but that seems like it would end up getting messy and spoil the current nice modular structure I have, with related functionality grouped together.
Any thoughts? What is the best way to handle this kind of circular dependency in Clojure?
I remember a number of discussions on namespaces in Clojure -- on the mailing list and elsewhere -- and I have to tell you that the consensus (and, AFAICT, the current orientation of Clojure's design) is that circular dependencies are a design's cry for refactoring. Workarounds might occasionally be possible, but ugly, possibly problematic for performance (if you make things needlessly "dynamic"), not guaranteed to work forever etc.
Now you say that the circular project structure is nice and modular. But, why would you call it that if everything depends on everything...? Also, "every time you have a dependency to resolve" shouldn't be very often if you plan for a tree-like dependency structure ahead of time. And to address your idea of putting some basic protocols and the like in their own namespace, I have to say that many a time I've wished that projects would do precisely that. I find it tremendously helpful to my ability to skim a codebase and get an idea of what kind of abstractions it's working with quickly.
To summarise, my vote goes to refactoring.
I had a similar problem with some GUI code; what I ended up doing is this:
(defn- frame [args]
  ((resolve 'project.gui/frame) args))
This allowed me to resolve the call at runtime. It gets called from a menu item in the frame, so I was 100% sure frame was defined, because it was being called from the frame itself. Keep in mind that resolve may return nil.
I am having this same problem constantly. As much as many developers don't want to admit it, it is a serious design flaw in the language. Circular dependencies are a normal condition of real objects. A body cannot survive without a heart, and the heart can't survive without the body.
Resolving at call time may be possible, but it won't be optimal. Take the case where you have an API, and part of that API is error-reporting functions, but the API creates an object that has its own methods. Those objects will need the error reporting, and you have your circular dependency. Error-checking and reporting functions will be called often, so resolving at the time they are called isn't an option.
The solution in this case, and most cases, is to move code that doesn't have dependencies into separate (util) namespaces where they can be freely shared. I have not yet run into a case where the problem cannot be resolved with this technique. This makes maintaining complete, functional, business objects nearly impossible but it seems to be the only option. Clojure has a long way to go before it is a mature language capable of accurately modeling the real world, until then dividing up code in illogical ways is the only way to eliminate these dependencies.
If A.a() depends on B.a() and B.b() relies on A.b() the only solution is to move B.a() to C.a() and/or A.b() into C.b() even though C technically doesn't exist in the real world.
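As a hedged illustration of that util-namespace split (the example.api, example.obj, and example.util namespaces are hypothetical), the layout might look like this:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; src/example/util.clj -- shared helpers with no dependencies
(ns example.util)
(defn report-error [msg]
  (println "ERROR:" msg))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; src/example/obj.clj -- objects use the shared error reporting
(ns example.obj
  (:require [example.util :as util]))
(defn create []
  (util/report-error "stub object")
  {:kind :stub})
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; src/example/api.clj -- also uses util, but example.obj no longer
;; needs to require example.api, so the cycle is gone
(ns example.api
  (:require [example.util :as util]
            [example.obj :as obj]))
(defn make-obj []
  (util/report-error "creating an object")
  (obj/create))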
Either move everything to one giant source file so that you have no external dependencies, or else refactor. Personally I'd go with refactor, but when you really get down to it, it's all about aesthetics. Some people like KLOCS and spaghetti code, so there's no accounting for taste.
It's good to think carefully about the design. Circular dependencies may be telling us that we're confused about something important.
Here's a trick I've used to work around circular dependencies in one or two cases.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; example/a.cljc
(ns example.a
  (:require [example.b :as b]))
(defn foo []
  (println "foo"))
#?(:clj
   (alter-var-root #'b/foo (constantly foo))  ; <- in Clojure do this
   :cljs
   (set! b/foo foo))                          ; <- in ClojureScript do this
(defn barfoo []
  (b/bar)
  (foo))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; example/b.cljc
(ns example.b)
;; Avoid circular dependency. This gets set by example.a.
(defonce foo nil)
(defn bar []
  (println "bar"))
(defn foobar []
  (foo)
  (bar))
I learned this trick from Dan Holmsand's code in Reagent.
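For what it's worth, here is how loading plays out at a REPL, assuming both files above are on the classpath:
(require 'example.a)   ; loading example.a also loads example.b and fills in b/foo
(example.b/foobar)     ; prints "foo" then "bar"
(example.a/barfoo)     ; prints "bar" then "foo"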