Clojure Protocols vs Types - clojure

Disclaimer
Despite the title, this is a genuine question, not an attempt at Emacs/Vi flamewars.
Context
I've used Haskell for a few months, and written a small ~10K LOC interpreter. In the past year, I've switched to Clojure. For quite a while, I struggled with Clojure's lack of types. Then I switched into using defrecords in Clojure, and now, switched to Clojure's defprotocols.
I really really like defprotocols. In fact, more than types.
I'm now at the point where for my Clojure functions, for it's documentation string, I just specify:
* the protocols of the inputs
* the protocols of the outputs
Using this, it appears I now have an ad-hoc type system (not compiler checked; but human checked).
Question
I suspect there's something about types that I'm missing. What does types provide over protocols?

Questioning the question...
Your question "What [do] types provide over protocols?" seems awkward to me. Types and protocols are perpendicular; They describe different things. Types/records define structure of data, while Protocols define the structure of some behavior or functionality. And part of why this question seems weird to me is that these things are not mutually exlusive! You can have types implement a protocol, thereby giving them whatever behaviour/functionality that protocol describes. In fact, since your context makes it clear that you have been using protocols, I have to wonder how you've been using them. My guess is that you've been using them with records (or possibly reifying them), but you could just as easily use protocols and (def)types together.
So to me, it seems you've compared apples with oranges here. To help clarify, let me compare apples to apples and oranges to oranges with a couple of different questions:
What problems do protocols solve, and what are the alternatives and their respective advantages/disadvantages?
Protocols let you define functions that operate in different ways on different types. The only other ways to do this are multimethods and simple function logic:
multimethods: have value in being extremely flexible. You can dispatch behavior on type by passing type as the dispatch function, but you can also use any other arbitrary function for dispatching.
internal function logic: You can also (of course) manually check for types in conditionals in your function definitions to decide how to process differently given different types. This is more primitive than multimethod dispatch, and also less extensible. Except in simple cases, multimethods are preferred.
Protocols have the advantage of being much more performant, being based on JVM class/method dispatch, which has been highly optimized. Additionally, protocols were designed to address the expression problem (great read), which makes them really powerful tools for crafting nice, modular, extensible APIs.
What are the advantages/disadvantages of (def)records or reify over (def)types?
On the side of how we specify the structure of data, we have a number of options available:
(def)records: produce a type good for "representing application domain information" (from http://clojure.org/datatypes; worth a read)
(def)types: produce a lighter weight type for creating "artifacts of the implementation/programming domain", such as the standard collection types
reify: construct a one-off object with an anonymous type implementing one or more protocols; good for... one-off things which need to implement a protocol(s)
Practically, records behave like clojure hash-maps, but have the added benefit of being able to implement protocols and have faster attribute lookup. Conveniently, the remain extensible via assoc, though attributes added in this fashion do not share the compiled lookup performance. This is what makes these constructs convenient for implementing applciation logic. Using deftype is advantageous for aspects of implementation/programming domain because they don't implement excess bagage, making the the use cleaner for these cases.

Protocols create interfaces and interfaces are a well, the interface to a type. they describe some aspects of a type though with much less rigor than you would come to expect in a language like Haskell.

machine checking
type inference (you don't get some of your protocols generated from docs of others)
parametric polymorphism (parameterised protocols / protocols with generics don't exist)
higher order protocols (what is the protocol for a function that returns a protocol?)
automatic generation of code / boilerplate
inter-operation with automated tools

Related

Thrift list as a mutable type

As taken from the thrift website's documentation, a thrift list is "an ordered list of elements. Translates to an STL vector, Java ArrayList, native arrays in scripting languages, etc." Why are these lists expressed as mutable types? Doesn't this promote slower object types that don't take advantage of native arrays? I don't understand why the default - only - translation of a list in thrift is to a mutable array type.
As taken from the thrift website's documentation, a thrift list is "an ordered list of elements. Translates to an STL vector, Java ArrayList, native arrays in scripting languages, etc." Why are these lists expressed as mutable types?
One has to keep in mind that the type system of Thrift is designed with portability across languages as the first and most important goal in mind. That's also the reason why there are only signed integers.
Furthermore, the IDL types should be considered as an abstract concept, which describes more the intent of a particular type rather than referring to a concrete syntax construction in, let's say, Java, C++ or Python. Which leads us to the second part of your question:
Doesn't this promote slower object types that don't take advantage of native arrays? I don't understand why the default - only - translation of a list in thrift is to a mutable array type.
The mapping from the IDL type down to a concrete implementation for a particular target language lies in the responsibility of the implementation of both Thrift compiler and Thrift library for the language in question. In other words, it is by no means set in concrete and can (at least theoretically) be changed at any time by changing those implementations.
But another goal is to not build too many roadblocks and potholes for the developer. Besides being fast and efficient, it should be easy to use the generated code and the library, it should be easy to do whatever your code wants to do with them. For example, C# classes are generated as partial classes to offer more flexibility. While immutable types may be more efficient in some cases, they may cause more work in other cases. At the end, this is kind of an balancing act.
If you can think of an improvement, the above is by no means intended to stop you from proposing quality patches. If your ideas can help making Thrift better, you are more than welcome and we will gladly review your patches or pull requests.

defmulti vs defprotocol?

It seems like both can be used to define functions that you can implement later, with different data types. AFAIK the major difference is that defmulti works on maps and defprotocol works on records.
What other differences there? What are the benefits of using one over the other?
Short version: defmulti is much more flexible and general, while defprotocol performs better.
Slightly longer version:
defprotocol supports dispatch on type, which is like polymorphism in most mainstream programming languages.
defmulti is a more general mechanism where you can dispatch on other things than just a single type. This flexibility comes with a performance penalty.
More on protocols
More on multimethods
Just an addition to cover the motivation, corvuscorax's answer covers the origional question nicely.
Originally Clojure only had multimethods and very early on a lot of thought went into building a dispatch abstraction that could handle all cases very well and would not force people to structure their abstractions around a limitation of the abstractions offered by the language.
As Clojure matured the desire to create "clojure in clojure" required abstractions that where at least in theory capable of producing any bytecode that could be produced by java and thus the need for protocols, a dispatch abstraction that more closely matched native Java.
Clojure has a strong "embrace your platform" ideal and protocols suit this mindset very well.

Type safety in Clojure

I want to ask what sort of type safety languages constructs are there on Clojure?
I've read 'Practical Clojure' from Luke VanderHart and Stuart Sierra several times now, but i still have the distinct impression that Clojure (like other lisps) don't take compilation-time validation checking very seriously. Type safety is just but one (very popular) strategy for doing compilation-time checking of correct semantics
I'm asking this question because i'm aching to be proven wrong; what sort of design patterns are there available on clojure to validate (at compilation-time, not at run-time) that a function that expects a string doesn't get called with, say, a list of integers?
Also, i've read very smart people like Paul Graham openly advocate about lisp allowing to implement everything from lower-level languages on top of it (most would say that the language themselves are being reimplemented on top of it), so if that assertion would be true, then trivially stuff like type checking should be a piece of cake. So do you feel that there exist type systems (or the ability to implement such type systems) in clojure or other lisps, that give the programmer the ability to offset validation checking from run-time to compile-time, or even better, design-time?
Compilation units in Clojure are very small - a single function. Lispers tend to change small portions of running programs while they develop. Introducing static type checking into this style of development is problematic - for a deeper discussion why I recommend the post Types are Anti-Modular by Gilad Bracha. Thus Clojure's prefers pre/post-conditions which jive better with Lisp's highly REPL-oriented development.
That said, it's certainly desirable and possible to build an a la carte type system for Clojure. This trail has been blazed by Qi/Shen, and Typed Racket. This functionality could be easily provided as a library. I'm hoping to build something like that in the future with core.logic - https://github.com/clojure/core.logic.
Since Clojure is a dynamic language the whole idea is not to check the types (or much of anything) at compile time.
Even when you add type hints to your function they do not get checked at compile-time.
Since Clojure is a Lisp you can do whatever you want at compile-time with macros and macros are powerful enough that you can write your own type systems. Some people have made type systems for lisps Typed Racket and Qi. These Type systems can be just as powerful as any Type system in a "normal" language.
Ok, we now know that it is possible but does Clojure has such a optional type system? The answer is currently no but there is a logic engine (core.logic) that could be used to implement a typesystem but the author has not worked (yet) in that direction.
There is a library that adds an optional type system to Clojure,
http://typedclojure.org/
Rationale
Static typing has well known benefits. For example, statically typed languages catch many common programming errors at the earliest time possible: compile time. Types also serve as an excellent form of (machine checkable) documentation that almost always augment existing hand-written documentation.
Languages without static type checking (dynamically typed) bring other benefits. Without the strict rigidity of mandatory static typing, they can provide more flexible and forgiving idioms that can help in rapid prototyping. Often the benefits of static type checking are desired as the program grows.
This work adds static type checking (and some of its benefits) to Clojure, a dynamically typed language, while still preserving idioms that characterise the language. It allows static and dynamically typed code to be mixed so the programmer can use whichever is more appropriate.

How does Clojure approach Separation of Concerns?

How does Clojure approach Separation of Concerns ? Since code is data, functions can be passed as parameters and used as returns...
And, since there is that principle "Better 1000 functions that work on 1 data structure, than 100 functions on 100 data structures" (or something like that).
I mean, pack everything a map, give it a keyword as key, and that's it ? functions, scalars, collections, everything...
The idea of Separation of Concerns is implemented, in Java, by means of Aspects (aspect oriented programming) and annotations. This is my view of the concept and might be somewhat limited, so don't take it for granted.
What is the right way (idiomatic way) to go about in Clojure, to avoid the WTFs of fellow programmers _
In a functional language, the best way to handle separation of concerns is to convert any programming problem into a set of transformations on a data structure. For instance, if you write a web app, the overall goal is to take a request and transform it into a response, which can be thought of as simply transforming the request data into response data. (In a non-trivial web app, the starting data would probably include not only the request, but also session and database information) Most programming tasks can be thought of in this way.
Each "concern" would be a function in a "pipeline" that helps make the transform possible. In this way, each function is completely decoupled from the other steps.
Note that this means that your data, as it undergoes these transformations, needs to be rich in its structure. Essentially, we want to put all the "intelligence" of our program into the data, not in the code. In a complicated functional program, the data at the different levels may be complex enough that in needs to look like a programming language in its own right- This is where the idea of "domain-specific languages" comes into play.
Clojure has excellent support for manipulating complex heterogenous data structures, which makes this less cumbersome than it may sound (i.e. it's not cumbersome at all if done right)
In addition, Clojure's support for lazy data structures allows these intermediate data structures to actually be (conceptually) infinite in size, which makes this decoupling possible in most scenarios. See the following paper for info on why having infinite data structures is so valuable in this situation: http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf
This "pipeline" approach can handle 90% of your needs for separating concerns. For the remaining 10% you can use Clojure macros, which, at a high level, can be thought of as a very powerful tool for aspect-oriented programming.
That's how I believe you can best decouple concerns in Clojure- Note that "objects" or "aspects" are not really necessary concepts in this approach.

Methods for side-effects in purely functional programming languages

At the moment I'm aware of the following methods to integrate side-effects into purely functional programming languages:
effect systems
continuations
unique types
monads
Monads are often cited to be the most effective and most general way to do this.
Which other methods exist? How do they compare?
Arrows, which are more general than monads.
The very simplest method is to simply pass around the environment between the functions. This is often used to teach scheme.
To me a more general way is via a monad/comonad pair. This generalizes the common "monad" approach which should correctly be called the "strong monad" approach, since it only works with strong monads.
Moving to a monad/comonad pair allows effects to be modeled that result in some variables no longer being available. An example where this is useful is the effect of migrating a thread to another host in a distributed setting.
An additional method of historical interest is to make the whole program a function mapping a stream/list of input events to a stream/list of output events. See: "How to Declare an Imperative" by Phil Wadler: http://www.cs.bell-labs.com/~wadler/topics/monads.html#monadsdeclare