How does one best integrate with Clojure abstractions?

I am implementing an ordered set in clojure, where I retrieve elements based on their rank. This means that I can retrieve the 4th element (according to the set's ordering), the 3rd, or the 7th, all in logarithmic time.
In order to get my new data structure integrated with Clojure's common methods (or "abstractions") such as conj, get, nth, etc., which is the better way to do it:
Actually implement conj, for example, in my datatype's protocol, or
Implement Rich Hickey's clojure.lang.IPersistentSet or some interface like it.
The first seems easier, but also easier to mess up the semantics of the function. The second seems like I am implementing an interface that was never meant to be part of the public API, and the actual methods associated with that interface (protocol) are confusingly different. For example, it seems that in order to implement conj for my set, I must implement a cons method of clojure.lang.IPersistentSet, which has a different name. There seems to be little documentation on how this all works, which poses a large challenge in implementing this ranked set.
Which one should I choose? Should I implement my own functions or the methods of a clojure.lang interface? If I should do the latter, where is some good documentation that can guide me through the process?
EDIT: I want to make it clear that I am trying to make a set from which you can retrieve any element (or "remove" it) in logarithmic time by specifying the element's rank (e.g., "give me the 5th element, Mr. Set"). To my knowledge, no such set yet exists in Clojure.

Firstly, I have just released a library called avl.clj which implements persistent sorted maps and sets with support for the standard Clojure API (they are drop-in replacements for the built-in sorted collections), as well as transients and logarithmic-time rank queries (via clojure.core/nth) [1]. Both Clojure and ClojureScript are supported; performance on the Clojure side is mostly on a par with the built-in variants in my preliminary benchmarking. Follow the link above if you'd like to give it a try. Any experience reports would be greatly appreciated!
As for the actual question: I'm afraid there isn't much in the way of documentation on Clojure's internal interfaces, but still, implementing them is the only way of making one's custom data structures fit in with the built-ins. core.rrb-vector (which I have written and now maintain) takes this approach, as do other Contrib libraries implementing various data structures. This is also what I've done with avl.clj, as well as sorted.clj (which is basically the ClojureScript port of the red-black-tree-based sorted collections backported to Clojure). All of these libraries, as well as Clojure's own gvec.clj file which implements the primitive-storing vectors produced by clojure.core/vector-of, can serve as examples of what's involved. (Though I have to say it's easy to miss a method here and there...)
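To make that concrete, here is a minimal, hedged sketch of the kind of wiring involved on the JVM side. The RankedSet name and the delegation to an ordinary sorted set are placeholders only; a real rank-query structure would back these methods with its own balanced tree and make nth genuinely logarithmic:

(deftype RankedSet [impl]
  clojure.lang.IPersistentSet
  (disjoin  [_ k] (RankedSet. (disj impl k)))      ; backs clojure.core/disj
  (contains [_ k] (contains? impl k))              ; backs clojure.core/contains?
  (get      [_ k] (get impl k))                    ; backs clojure.core/get
  clojure.lang.IPersistentCollection
  (cons  [_ o] (RankedSet. (conj impl o)))         ; conj arrives here as cons
  (count [_]   (count impl))
  (empty [_]   (RankedSet. (empty impl)))
  (equiv [_ o] (= impl o))                         ; simplified equality
  clojure.lang.Seqable
  (seq [_] (seq impl))
  clojure.lang.Indexed                             ; this is what makes nth work
  (nth [_ i]           (nth (seq impl) i))         ; O(n) here; a rank tree makes it O(log n)
  (nth [_ i not-found] (nth (seq impl) i not-found)))

;; (nth (conj (RankedSet. (sorted-set 1 2 3)) 4) 2)  ;=> 3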
The situation is much simpler in ClojureScript, where all the core protocols are defined at the top of core.cljs, so you can just look at the list and pick the ones relevant to your data structure. Hopefully the same will be true on the Clojure side one day.
[1] Removal by rank is (disj my-set (nth my-set 123)) for now. I might provide a direct implementation later on if it turns out to make enough of a difference performance-wise. (I'll definitely write one to check if it does.)
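By way of illustration, the rank operations read roughly like this at the REPL; the require form below assumes the coordinates the library later settled on as the data.avl contrib project, so adjust it to whatever the released artifact actually exposes:

(require '[clojure.data.avl :as avl])

(def s (avl/sorted-set 10 20 30 40 50))

(nth s 3)            ;=> 40, found by rank in logarithmic time
(disj s (nth s 3))   ;=> #{10 20 30 50}, "removal by rank" as in the footnote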

Related

How do lisps that prefer first and rest to car and cdr approach combinations like cdaddr?

One of the great schisms in the Lisp community is whether we should have car and cdr or first and rest. One of the benefits of the traditional car and cdr is that we can combine them to produce pronounceable functions like cdaddr. How do Lisps that do not use car and cdr, such as Clojure, typically form combinations like this with first and rest? Is there any consensus?
Clojure, at any rate, simply has no need for caddaadr and friends, because nobody builds data structures out of just cons cells. The language does have combinations of any two of first and next, named ffirst, fnext, nnext, and nfirst, which were added very early on, I suppose because it was assumed we'd want something like cadr, but I never see them used in real life. Instead, destructuring is used quite often.
On the rare occasions where you need to reach deeply into a structure built of nested sequences, destructuring often still produces readable code, and writing it out longhand is no great burden either. It's also a good hint that maybe you should abstract things a bit more rather than working with so many layers of primitive combinators directly.
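For illustration, here is roughly how the same access reads in each style (the data is made up):

(def data [[1 2] [3 4] [5 6]])

;; cadr-style combinators:
(fnext data)            ;=> [3 4], i.e. (first (next data))

;; destructuring, which is what you usually see instead:
(let [[_ [x _]] data]
  x)                    ;=> 3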

Is there an idiomatic alternative to nil-punning in Clojure?

I'm reading some Clojure code at the moment that uses nil for a bunch of uninitialised numeric values in a record that gets passed around.
Now, lots of Clojure libraries treat this as idiomatic, which means that it is an accepted convention.
But it also leads to NullPointerException, because not all the Clojure core functions can handle a nil as input. (Nor should they).
Other languages have the concept of Maybe or Option to proxy the value in the event that it is null, as a way of mitigating the NullPointerException risk. This is possible in Clojure - but not very common.
You can do some tricks with fnil but it doesn't solve every problem.
Another alternative is simply to set the uninitialised value to a keyword like :empty-value to force the user to handle this scenario explicitly in all the handling code. But this isn't really a big step up from nil, because you don't really discover all the scenarios (in other people's code) until run-time.
My question is: Is there an idiomatic alternative to nil-punning in Clojure?
Not sure if you've read this lispcast post on nil-punning, but I do think it makes a pretty good case for why it's idiomatic and covers various important considerations that I didn't see mentioned in those other SO questions.
Basically, nil is a first-class thing in Clojure. Despite its conventional meaning, it is a proper value, and can be treated as such in many contexts, and in a context-dependent way. This makes it more flexible and powerful than null in the host language.
For example, something like this won't even compile in java:
if(null) {
....
}
Whereas in Clojure, (if nil ...) works just fine. So there are many situations where you can use nil safely. I've yet to see a Java codebase that isn't littered with code like if (foo != null) { ... } everywhere. Perhaps Java 8's Optional will change this.
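A few REPL examples of nil behaving as an ordinary value that core functions accept:

(if nil :then :else)   ;=> :else  (nil is simply falsey)
(first nil)            ;=> nil
(conj nil 1)           ;=> (1)
(count nil)            ;=> 0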
I think where you can run into issues quite easily is in Java interop scenarios where you are dealing with actual nulls. A good Clojure wrapper library can also help shield you from this in many cases, and it's one good reason to prefer one over direct Java interop where possible.
In light of this, you may want to reconsider fighting this current. But since you are asking about alternatives, here's one I think is great: Prismatic's Schema. Schema has a Maybe schema (and many other useful ones as well), and it works quite nicely in many scenarios. The library is quite popular and I have used it with success. FWIW, it is recommended in the recent Clojure Applied book.
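As a rough sketch of what that looks like (the Order schema here is invented for the example; s/maybe and s/validate are part of Schema's public API):

(require '[schema.core :as s])

(s/defschema Order
  {:id       s/Int
   :quantity (s/maybe s/Int)})   ; documents explicitly that this value may be nil

(s/validate Order {:id 1 :quantity nil})   ; passes
(s/validate Order {:id 1 :quantity "3"})   ; throws with a descriptive error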
Is there an idiomatic alternative to nil-punning in Clojure?
No. As leeor explains, nil-punning is idiomatic. But it's not as prevalent as in Common Lisp, where (I'm told) an empty list equates to nil.
Clojure used to work this way, but the CL functions that deal with lists correspond to Clojure functions that deal with sequences in general. And these sequences may be lazy, so there is a premium on unifying lazy sequences with others so that any laziness can be preserved. I think this evolution happened around Clojure 1.2. Rich described it in detail here.
If you want option/maybe types, take a look at the core.typed library. In contrast to Prismatic Schema, this operates at compile time.
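A hedged sketch of the style, assuming core.typed's Str, Int and Option aliases; the annotation is checked by running clojure.core.typed/check-ns over the namespace rather than at runtime:

(require '[clojure.core.typed :as t])

(t/ann parse-port [t/Str -> (t/Option t/Int)])
(defn parse-port
  "Returns the port as an integer, or nil when the string isn't numeric."
  [s]
  (when (re-matches #"\d+" s)
    (Long/parseLong s)))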

Why would I ever choose not to use the clojure 1.5 reducers feature?

I was reading about Clojure reducers introduced in 1.5, here: https://github.com/clojure/clojure/blob/master/changes.md. My understanding is that they're a performance enhancement on the existing map/filter/reduce function(s). So if that's the case, I'm wondering why they are in a new namespace and do not simply replace the existing map/reduce/filter implementations. Stated differently, why would I ever not choose to use the new reducers feature?
EDIT:
In response to the initial two answers, here is a clarification:
I'm going to quote the release notes here:
Reducers provide a set of high performance functions for working with collections. The actual fold/reduce algorithms are specified via the collection being reduced. This allows each collection to define the most efficient way to reduce its contents.
This does not sound to me like the new map/filter/reduce functions are inherently parallel. For example, further down in the release notes it states:
It contains a new function, fold, which is a parallel reduce+combine
So unless the release notes are poorly written, it would appear to me that there is one new function, fold, which is parallel, and the other functions are collection-specific implementations that aim to produce the highest performance possible for the particular collection. Am I simply mis-reading the release notes here?
Foreword: you have a problem and you are going to use parallelism; now problems two have you.
They're a replacement in the sense that they do the work in parallel (versus the plain old sequential map etc.). Not all operations can be parallelized (in many cases the operation has to be at least associative; also think about lazy sequences and iterators). Moreover, not every operation can be parallelized efficiently (there is always some coordination overhead, and sometimes the overhead is greater than the parallelization gain).
They cannot replace the old implementations in some cases, for instance if you have infinite sequences or if you actually require sequential processing of the collection.
A couple of good reasons you might decide not to use reducers:
You need to maintain backwards compatibility with Clojure 1.4. This makes it tricky to use reducers in library code, for example, where you don't know what Clojure version your users will be using.
In some circumstances there are better options: for example if you are dealing with numerical arrays then you will almost certainly be better off using something like core.matrix instead.
I found the following write up by Rich Hickey that while still somewhat confusing, cleared (some) things up for me: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
In particular the summary:
By adopting an alternative view of collections as reducible, rather than seqable things, we can get a complementary set of fundamental operations that tradeoff laziness for parallelism, while retaining the same high-level, functional programming model. Because the two models retain the same shape, we can easily choose whichever is appropriate for the task at hand.
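As a concrete (if contrived) sketch of that tradeoff, compare a seq-based pipeline with its reducers counterpart; r/fold can process a vector in parallel by splitting it into chunks and combining the partial sums with +:

(require '[clojure.core.reducers :as r])

(def v (vec (range 1000000)))

;; Sequential and lazy; works on any seq, even an infinite one:
(reduce + (map inc (filter even? v)))

;; Reducers: no intermediate lazy seqs, and fold may run in parallel:
(r/fold + (r/map inc (r/filter even? v)))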

Clojure Protocols vs Types

Disclaimer
Despite the title, this is a genuine question, not an attempt at Emacs/Vi flamewars.
Context
I've used Haskell for a few months and written a small ~10K LOC interpreter. In the past year, I've switched to Clojure. For quite a while, I struggled with Clojure's lack of types. Then I switched to using defrecords in Clojure, and now I've switched to Clojure's defprotocols.
I really really like defprotocols. In fact, more than types.
I'm now at the point where, in the documentation string of each of my Clojure functions, I just specify:
* the protocols of the inputs
* the protocols of the outputs
Using this, it appears I now have an ad-hoc type system (not compiler checked; but human checked).
Question
I suspect there's something about types that I'm missing. What do types provide over protocols?
Questioning the question...
Your question "What [do] types provide over protocols?" seems awkward to me. Types and protocols are perpendicular; They describe different things. Types/records define structure of data, while Protocols define the structure of some behavior or functionality. And part of why this question seems weird to me is that these things are not mutually exlusive! You can have types implement a protocol, thereby giving them whatever behaviour/functionality that protocol describes. In fact, since your context makes it clear that you have been using protocols, I have to wonder how you've been using them. My guess is that you've been using them with records (or possibly reifying them), but you could just as easily use protocols and (def)types together.
So to me, it seems you've compared apples with oranges here. To help clarify, let me compare apples to apples and oranges to oranges with a couple of different questions:
What problems do protocols solve, and what are the alternatives and their respective advantages/disadvantages?
Protocols let you define functions that operate in different ways on different types. The only other ways to do this are multimethods and simple function logic:
multimethods: have value in being extremely flexible. You can dispatch behavior on type by passing type as the dispatch function, but you can also use any other arbitrary function for dispatching.
internal function logic: You can also (of course) manually check for types in conditionals in your function definitions to decide how to process differently given different types. This is more primitive than multimethod dispatch, and also less extensible. Except in simple cases, multimethods are preferred.
Protocols have the advantage of being much more performant, being based on JVM class/method dispatch, which has been highly optimized. Additionally, protocols were designed to address the expression problem (great read), which makes them really powerful tools for crafting nice, modular, extensible APIs.
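An illustrative sketch (the Area, Circle and Rect names are made up) of the same polymorphic operation expressed first as a protocol and then as a multimethod dispatching on class:

(defprotocol Area
  (area [shape]))

(defrecord Circle [r]
  Area
  (area [_] (* Math/PI r r)))

(defrecord Rect [w h]
  Area
  (area [_] (* w h)))

;; Multimethod equivalent: slower dispatch, but the dispatch function
;; could be anything, not just the class of the argument.
(defmulti area* class)
(defmethod area* Circle [{:keys [r]}]   (* Math/PI r r))
(defmethod area* Rect   [{:keys [w h]}] (* w h))

(area  (->Circle 2.0))   ;=> 12.566...
(area* (->Rect 3 4))     ;=> 12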
What are the advantages/disadvantages of (def)records or reify over (def)types?
On the side of how we specify the structure of data, we have a number of options available:
(def)records: produce a type good for "representing application domain information" (from http://clojure.org/datatypes; worth a read)
(def)types: produce a lighter weight type for creating "artifacts of the implementation/programming domain", such as the standard collection types
reify: construct a one-off object with an anonymous type implementing one or more protocols; good for... one-off things which need to implement a protocol(s)
Practically, records behave like Clojure hash-maps, but have the added benefit of being able to implement protocols and having faster attribute lookup. Conveniently, they remain extensible via assoc, though attributes added in this fashion do not share the compiled lookup performance. This is what makes these constructs convenient for implementing application logic. Using deftype is advantageous for the implementation/programming domain because it doesn't carry the excess baggage, making its use cleaner for these cases.
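A small sketch of the practical difference (Point and Pair are made-up names):

(defrecord Point [x y])

(def p (->Point 1 2))
(:x p)           ;=> 1  (compiled field lookup)
(assoc p :z 3)   ;=> still a Point; :z lives in the extension map, so its lookup is slower

;; deftype is bare by comparison: no map behaviour unless you implement it yourself.
(deftype Pair [a b])
(.-a (Pair. 1 2))   ;=> 1  (plain field access only)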
Protocols create interfaces, and interfaces are, well, the interface to a type. They describe some aspects of a type, though with much less rigor than you would come to expect in a language like Haskell. Compared with protocols, a full static type system gives you:
machine checking
type inference (you don't get some of your protocols generated from docs of others)
parametric polymorphism (parameterised protocols / protocols with generics don't exist)
higher order protocols (what is the protocol for a function that returns a protocol?)
automatic generation of code / boilerplate
inter-operation with automated tools

Graph Library API design

I am writing my own graph library, Graph++, and I have a question about what the interface should return. For example, what should my BFS return? I am unsure whether it should return the set of vertices visited, in order, or whether I should take a callback function that gets invoked on each visit.
What would be the best option so that my library is easily consumable?
A recurring pattern in the STL is to offer iterators. Your traversal algorithms might return a start iterator, and the library user could increment it as desired while comparing against an end() iterator that either it or the graph provides.
The visitor pattern may also be relevant to your interests.
I don’t want to be unhelpful or sound arrogant. This is just a personal opinion and you should take it for what it is worth. You did not say why you are writing this library, so I’ll assume you have a specific problem to solve. If you are doing it just for fun or to learn, please go ahead and disregard the remainder of this reply.
Graphs are extremely generic abstractions. Any data structure more complex than a tree is a graph. Most programs have such data structures. For example, a web site containing linked pages is a graph. So is a representation of a map. However, when you think of these as graphs, you ignore all differences between web sites and street maps and focus on the only thing that it is common.
In the vast majority of cases, the details you are trying to abstract away, the fact that web pages are HTML, links are URLs, streets have speed limits, intersections have traffic lights, and so on, are more important. If you start your implementation with a graph abstraction, by the time you implement these other details on top of it you’ve got yourself into quite a mess. It is much better to start with other, more important abstractions as building blocks and connect those together to form a graph. Sure, you won’t get the shortest path algorithm for free for your street map, for example, but you are likely interested in the fastest route anyway, for which you need speed limits, traffic lights, and other information.
I guess what I’m trying to say is that I see very limited uses for a generic graph library. It is nice that we have the Boost Graph Library, but AFAIK, not many people are using it.
In C++11, prefer a functional approach to an iterative one. In C++03, use iterator strategies.