deftype vs. defrecord - clojure

While defrecord is generally the preferred form in Clojure for defining an "entity", in ClojureScript one finds far more references to deftype, as reflected in various documentation.
What is the difference between both forms? Which should one prefer?

deftype creates a bare-bones object which implements a protocol.
defrecord creates an immutable persistent map which implements a protocol.
Which to use depends on what you want. Do you want a full ClojureScript data structure? Then use a record. Do you just want a bare-bones thing that does nothing but satisfy a protocol? Then use a type.
The two bits of documentation you reference use types because they're trying to illustrate protocols at the most basic level, and types have less "going on" than records, so to speak.
However, most real-world uses of object-like things in Clojure/ClojureScript need to store fields of data along with the object, and for that you should emphatically use a record, for the same reason you should use any of Clojure's immutable collections.
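To make the difference concrete, here is a minimal sketch for JVM Clojure. The names Greeter, TypeUser and RecordUser are made up for illustration and do not come from the documentation mentioned above.

```clojure
(defprotocol Greeter
  (greet [this]))

;; deftype: a bare-bones object with fields and protocol methods only.
(deftype TypeUser [name]
  Greeter
  (greet [_] (str "Hello, " name)))

;; defrecord: the same, plus the full persistent-map behaviour.
(defrecord RecordUser [name]
  Greeter
  (greet [_] (str "Hello, " name)))

(def t (TypeUser. "Ada"))
(def r (->RecordUser "Ada"))

(greet t)                     ; both satisfy the protocol
(greet r)

(:name r)                     ; records support keyword lookup
(.-name t)                    ; a type exposes its fields only through interop
(map? r)                      ; true, a record is a map
(map? t)                      ; false

(= r (->RecordUser "Ada"))    ; true, records get value equality
(= t (TypeUser. "Ada"))       ; false, a type compares by identity unless you implement equality

(assoc r :email "ada@example.com") ; still a RecordUser, with one extra key
```

If you find yourself re-implementing lookup, equality or assoc on a type, that is usually a sign a record would have been the better fit.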

According to DEFTYPE VS DEFRECORD,
you should distinguish programming constructs and domain constructs.
deftype is for programming constructs and defrecord is for domain constructs that need a custom type.
Hope this helps.

Related

How does one best integrate with clojure abstractions?

I am implementing an ordered set in clojure, where I retrieve elements based on their rank. This means that I can retrieve the 4th element (according to the set's ordering), the 3rd, or the 7th, all in logarithmic time.
In order to get my new data structure integrated with clojure's common methods (or "abstractions") such as conj, get, nth, etc., which is the better way to do it:
Actually implement conj, for example, in my datatype's protocol, or
Implement Rich Hickey's clojure.lang.IPersistentSet or some interface like it.
The first seems easier, but also easier to mess up the semantics of the function. The second seems like I am implementing an interface that was never meant to be part of the public API, and the actual methods that are associated with that interface (protocol) are confusingly different. For example, it seems that in order to implement conj with my set, I must implement a cons method of clojure.lang.IPersistentSet, which has a different name. There seems to be little documentation on how this all works, which poses a large challenge in implementing this ranked set.
Which one should I choose? Should I implement my own or the methods of a clojure.lang interface? If I should do the latter, where is some good documentation that can guide me through the process?
EDIT: I want to make it clear that I am trying to make a set from which you can retrieve any element (or "remove" it) in logarithmic time by specifying the element's rank (e.g., "give me the 5th element, mr. set."). To my knowledge, no such set yet exists in clojure.
Firstly, I have just released a library called avl.clj which implements persistent sorted maps and sets with support for the standard Clojure API (they are drop-in replacements for the built-in sorted collections), as well as transients and logarithmic time rank queries (via clojure.core/nth) [1]. Both Clojure and ClojureScript are supported; performance on the Clojure side is mostly on a par with the built-in variants in my preliminary benchmarking. Follow the link above if you'd like to give it a try. Any experience reports would be greatly appreciated!
As for the actual question: I'm afraid there isn't much in the way of documentation on Clojure's internal interfaces, but still, implementing them is the only way of making one's custom data structures fit in with the built-ins. core.rrb-vector (which I have written and now maintain) takes this approach, as do other Contrib libraries implementing various data structures. This is also what I've done with avl.clj, as well as sorted.clj (which is basically the ClojureScript port of the red-black-tree-based sorted collections backported to Clojure). All of these libraries, as well as Clojure's own gvec.clj file which implements the primitive-storing vectors produced by clojure.core/vector-of, can serve as examples of what's involved. (Though I have to say it's easy to miss a method here and there...)
The situation is much simpler in ClojureScript, where all the core protocols are defined at the top of core.cljs, so you can just look at the list and pick the ones relevant to your data structure. Hopefully the same will be true on the Clojure side one day.
[1] Removal by rank is (disj my-set (nth my-set 123)) for now. I might provide a direct implementation later on if it turns out to make enough of a difference performance-wise. (I'll definitely write one to check if it does.)
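For a feel of what implementing the clojure.lang interfaces looks like, here is a deliberately naive sketch for JVM Clojure. RankedSet is a hypothetical name; it keeps a sorted vector and re-sorts on every insert, so it has none of the logarithmic guarantees discussed above, and it is not how avl.clj is implemented. It implements only a subset of the set interfaces (no contains, get, disj, equals or hashCode), just enough to show that conj is backed by the cons method and that count and nth come from Counted and Indexed.

```clojure
;; items is assumed to be a sorted vector without duplicates.
(deftype RankedSet [items]
  clojure.lang.Seqable
  (seq [_] (seq items))

  clojure.lang.Counted
  (count [_] (count items))

  clojure.lang.Indexed                       ; what clojure.core/nth dispatches to
  (nth [_ i] (nth items i))
  (nth [_ i not-found] (nth items i not-found))

  clojure.lang.IPersistentCollection
  (cons [_ x]                                ; clojure.core/conj calls this method
    (RankedSet. (vec (sort (distinct (conj items x))))))
  (empty [_] (RankedSet. []))
  (equiv [_ other]
    (and (instance? RankedSet other)
         (= (seq items) (seq other)))))

(def s (-> (RankedSet. []) (conj 5) (conj 1) (conj 3) (conj 3)))

(count s)   ; 3, the duplicate 3 was dropped
(nth s 1)   ; 3, the element of rank 1
(seq s)     ; (1 3 5)
```

Inside the method bodies, names such as count, nth and seq still refer to the clojure.core functions, which is why the methods can delegate to the wrapped vector.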

Is there anything like ML's datatype declaration and pattern matching in Clojure?

I'm new to both sml (1 month) and clojure (1 week). I learned datatype and pattern matching in sml weeks ago and want to know if there is anything similar in clojure.
There seem to be several pattern matching libraries out there. Do they have the full power of ML's pattern matching?
How about datatype? Do I have to use something like deftype to create my own datatype? If so, how do I do it? deftype looks pretty complex to me.
If people don't use datatype in the Lisp world, then what is the idiomatic way to do pattern matching with datatype in clojure?
In the Clojure world you have a few "a la carte" options for polymorphism that you can use:
You would normally use protocols if you want to define efficiently dispatched functions that work polymorphically with different data types. Different data types can mean Java classes or Clojure types defined with deftype or defrecord.
core.match is a pretty good general purpose pattern matching library
Multimethods provide general purpose polymorphic dispatch, that can dispatch/match on any function of their parameters. Slightly slower than protocols, but very flexible.
As for defining your own data types:
Don't underestimate doing things with pure data (stored in regular maps, lists, vectors). In most cases, this is the easiest and most flexible approach in Clojure.
If you decide that isn't enough, I'd suggest defrecord rather than deftype in most instances: defrecord creates something that behaves like a hashmap, so it is quite flexible and user friendly. deftype is more of a low-level construct for people writing libraries and compilers etc.
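As a rough illustration of the options above, here is how an ML declaration like datatype shape = Circle of real | Rect of real * real might be approximated. This is a sketch, not a full equivalent of ML pattern matching (there is no exhaustiveness checking), and the names area, perimeter, Shape, Circle and Rect are invented for the example.

```clojure
;; Option 1: plain maps plus a multimethod dispatching on a :kind key.
(defmulti area :kind)

(defmethod area :circle [{:keys [r]}]
  (* Math/PI r r))

(defmethod area :rect [{:keys [w h]}]
  (* w h))

(area {:kind :rect :w 2 :h 3})   ; 6

;; Option 2: records plus a protocol, dispatching on the record type.
(defprotocol Shape
  (perimeter [this]))

(defrecord Circle [r]
  Shape
  (perimeter [_] (* 2 Math/PI r)))

(defrecord Rect [w h]
  Shape
  (perimeter [_] (* 2 (+ w h))))

(perimeter (->Rect 2 3))         ; 10
```

The map destructuring in the method arguments covers a good part of what one uses pattern matching for day to day; core.match goes further if you need nested patterns.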

Clojure Protocols vs Types

Disclaimer
Despite the title, this is a genuine question, not an attempt at Emacs/Vi flamewars.
Context
I've used Haskell for a few months, and written a small ~10K LOC interpreter. In the past year, I've switched to Clojure. For quite a while, I struggled with Clojure's lack of types. Then I switched into using defrecords in Clojure, and now, switched to Clojure's defprotocols.
I really really like defprotocols. In fact, more than types.
I'm now at the point where, for my Clojure functions, in the documentation string I just specify:
* the protocols of the inputs
* the protocols of the outputs
Using this, it appears I now have an ad-hoc type system (not compiler checked; but human checked).
Question
I suspect there's something about types that I'm missing. What do types provide over protocols?
Questioning the question...
Your question "What [do] types provide over protocols?" seems awkward to me. Types and protocols are orthogonal; they describe different things. Types/records define the structure of data, while protocols define the structure of some behavior or functionality. And part of why this question seems weird to me is that these things are not mutually exclusive! You can have types implement a protocol, thereby giving them whatever behaviour/functionality that protocol describes. In fact, since your context makes it clear that you have been using protocols, I have to wonder how you've been using them. My guess is that you've been using them with records (or possibly reifying them), but you could just as easily use protocols and (def)types together.
So to me, it seems you've compared apples with oranges here. To help clarify, let me compare apples to apples and oranges to oranges with a couple of different questions:
What problems do protocols solve, and what are the alternatives and their respective advantages/disadvantages?
Protocols let you define functions that operate in different ways on different types. The only other ways to do this are multimethods and simple function logic:
multimethods: have value in being extremely flexible. You can dispatch behavior on type by passing type as the dispatch function, but you can also use any other arbitrary function for dispatching.
internal function logic: You can also (of course) manually check for types in conditionals in your function definitions to decide how to process differently given different types. This is more primitive than multimethod dispatch, and also less extensible. Except in simple cases, multimethods are preferred.
Protocols have the advantage of being much more performant, being based on JVM class/method dispatch, which has been highly optimized. Additionally, protocols were designed to address the expression problem (great read), which makes them really powerful tools for crafting nice, modular, extensible APIs.
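The extensibility point can be sketched in a few lines: a protocol defined today can be extended to types that already exist, without touching their source. Describable and describe are hypothetical names used only for this example.

```clojure
(defprotocol Describable
  (describe [x]))

;; Extend the protocol to existing host types (and to nil) after the fact.
(extend-protocol Describable
  String
  (describe [s] (str "string of length " (count s)))
  Long
  (describe [n] (str "integer " n))
  nil
  (describe [_] "nothing"))

(describe "abc")   ; "string of length 3"
(describe 42)      ; "integer 42"
(describe nil)     ; "nothing"
```

A multimethod dispatching on class would behave the same here; the protocol version trades the arbitrary dispatch function for faster type-based dispatch.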
What are the advantages/disadvantages of (def)records or reify over (def)types?
On the side of how we specify the structure of data, we have a number of options available:
(def)records: produce a type good for "representing application domain information" (from http://clojure.org/datatypes; worth a read)
(def)types: produce a lighter weight type for creating "artifacts of the implementation/programming domain", such as the standard collection types
reify: construct a one-off object with an anonymous type implementing one or more protocols; good for... one-off things which need to implement a protocol(s)
Practically, records behave like Clojure hash-maps, but have the added benefit of being able to implement protocols and have faster attribute lookup. Conveniently, they remain extensible via assoc, though attributes added in this fashion do not share the compiled lookup performance. This is what makes these constructs convenient for implementing application logic. Using deftype is advantageous for aspects of the implementation/programming domain because types don't carry excess baggage, making their use cleaner for these cases.
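A short sketch of the record and reify behaviour just described, for JVM Clojure. Point, Named and display-name are illustrative names, not taken from the linked page.

```clojure
(defrecord Point [x y])

;; assoc with a key that is not a declared field: still a Point.
(def p (assoc (->Point 1 2) :label "start"))
(instance? Point p)   ; true
(:label p)            ; "start"

;; dissoc of a declared field: the result degrades to a plain map.
(def q (dissoc p :x))
(instance? Point q)   ; false
(map? q)              ; true

;; reify: a one-off object of an anonymous type implementing a protocol.
(defprotocol Named
  (display-name [this]))

(def anon
  (reify Named
    (display-name [_] "one-off")))

(display-name anon)   ; "one-off"
```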
Protocols create interfaces, and interfaces are, well, the interface to a type. They describe some aspects of a type, though with much less rigor than you would come to expect in a language like Haskell. What a type system like Haskell's offers beyond that:
machine checking
type inference (you don't get some of your protocols generated from docs of others)
parametric polymorphism (parameterised protocols / protocols with generics don't exist)
higher order protocols (what is the protocol for a function that returns a protocol?)
automatic generation of code / boilerplate
inter-operation with automated tools

Difference between definterface and defprotocol in Clojure

Other than lack of documentation, what is the difference between definterface and defprotocol in Clojure?
According to the Joy of Clojure:
The advantages of using definterface over defprotocol are restricted
entirely to the fact that the former allows primitive types for
arguments and returns. At some point in the future, the same advantage
will likely be extended to the interfaces generated [by protocols], so use
definterface sparingly and prefer protocols unless absolutely
necessary.
My possibly incomplete understanding is that definterface produces an interface .class that Java code can implement in order to create classes suitable to pass to your Clojure functions.
Protocols are, in short, a faster and more focused way of doing dispatch than multimethods. You actually have running code in a protocol that is used by other Clojure code.
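A minimal sketch of the primitive-type point from the quote, for JVM Clojure. IAdder, Adder, Summable and total are hypothetical names. The interface method is declared with primitive long hints, while the protocol function takes and returns ordinary objects but can be extended to existing types.

```clojure
;; definterface: generates a Java interface; primitive hints are allowed.
(definterface IAdder
  (^long add [^long a ^long b]))

(deftype Adder []
  IAdder
  (add [_ a b] (+ a b)))

(.add (Adder.) 1 2)   ; 3, called as a Java method through interop

;; defprotocol: generates ordinary Clojure functions plus dispatch machinery.
(defprotocol Summable
  (total [this]))

(extend-protocol Summable
  clojure.lang.IPersistentVector
  (total [v] (reduce + 0 v)))

(total [1 2 3])       ; 6, called as a normal function
```

Note the difference at the call site: the interface method needs the dot form, whereas total is a first-class function that can be passed to map and friends.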

Methods for side-effects in purely functional programming languages

At the moment I'm aware of the following methods to integrate side-effects into purely functional programming languages:
effect systems
continuations
unique types
monads
Monads are often cited as the most effective and most general way to do this.
Which other methods exist? How do they compare?
Arrows, which are more general than monads.
The very simplest method is to simply pass around the environment between the functions. This is often used to teach Scheme.
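A small sketch of environment passing, written in Clojure rather than Scheme. Each function takes the current environment and returns a pair of result and updated environment; fresh-id and label-pair are names invented for the example.

```clojure
;; Returns [id new-env] instead of mutating a counter.
(defn fresh-id [env]
  [(:next-id env) (update env :next-id inc)])

;; Threads the environment through two calls by rebinding env each time.
(defn label-pair [env a b]
  (let [[id-a env] (fresh-id env)
        [id-b env] (fresh-id env)]
    [{a id-a, b id-b} env]))

(label-pair {:next-id 0} :x :y)
;; => [{:x 0, :y 1} {:next-id 2}]
```

The state monad is essentially this pattern with the threading hidden behind bind.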
To me a more general way is via a monad/comonad pair. This generalizes the common "monad" approach which should correctly be called the "strong monad" approach, since it only works with strong monads.
Moving to a monad/comonad pair allows effects to be modeled that result in some variables no longer being available. An example where this is useful is the effect of migrating a thread to another host in a distributed setting.
An additional method of historical interest is to make the whole program a function mapping a stream/list of input events to a stream/list of output events. See: "How to Declare an Imperative" by Phil Wadler: http://www.cs.bell-labs.com/~wadler/topics/monads.html#monadsdeclare
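The stream approach can be sketched with lazy sequences: the whole program is a pure function from a sequence of inputs to a sequence of outputs, and only a thin outer layer would actually read and print. The function name program is hypothetical, and this toy version just reports a running total.

```clojure
;; Pure: maps a (possibly lazy) sequence of numbers to a sequence of messages.
(defn program [inputs]
  (map (fn [total] (str "running total: " total))
       (rest (reductions + 0 inputs))))

(program [1 2 3])
;; => ("running total: 1" "running total: 3" "running total: 6")
```

Because the result is lazy, outputs can be consumed as inputs arrive, which is what lets this model interactive programs.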