Functional Programming and Type Systems - Clojure

I have been learning about various functional languages for some time now, including Haskell, Scala, and Clojure. Haskell has a very strict and well-defined static type system. Scala is also statically typed. Clojure, on the other hand, is dynamically typed.
So my questions are:
What role does the type system play in a functional language?
Is it necessary for a language to have a type system for it to be functional?
How is the "functional" level of a language related to the kind of the type system of the language?

A language does not need to be typed to be functional - at the heart of functional programming is the lambda calculus, which comes in untyped and typed variants.
The type system plays two roles:
it provides a guarantee at compile time that a class of errors cannot occur at run-time. The class of errors usually includes things like trying to add two strings together, or trying to apply an integer as a function.
it has some efficiency benefits, in that objects at runtime do not need to carry their types around, because the types have already been established at compile-time. This is known as type erasure.
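Since the original question is about Clojure, a dynamically typed language, here is a minimal REPL sketch (my own illustration, not part of the answer) of the class of errors a static type checker rules out before the program ever runs; in Clojure they only surface at run time, and the exact exception messages vary by version:

;; Both expressions compile, but fail when evaluated.
(+ "foo" "bar")
;; => ClassCastException: java.lang.String cannot be cast to java.lang.Number

(1 2)
;; => ClassCastException: java.lang.Long cannot be cast to clojure.lang.IFn

A statically typed language rejects both expressions at compile time, which is exactly the guarantee described above.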
In advanced type systems like Haskell's, the type system can provide more benefits:
overloading: using one identifier to refer to operations on different types
it allows a library to automatically choose an optimised implementation based on what type it is used at (using Type Families)
it allows powerful invariants to be proven at compile time, such as the invariant in a red-black tree (using Generalised Algebraic Datatypes)

What role does the type system play in a functional language?
To Simon Marlow's excellent answer, I would add that a type system, especially one that includes algebraic data types, makes it easier to write programs:
Software designs, which in object-oriented languages are sometimes expressed using UML diagrams, are very clearly expressed using types. This clarity manifests especially when not only values have types, but also modules have types, as in Objective Caml or Standard ML.
When a person is writing code, a couple of simple heuristics make it very, very easy to write pure functions based on the types:
A value of function type can always be created with a lambda.
A value of function type can always be consumed by applying it.
A value of an algebraic data type can be created by applying any of the type's constructors.
A value of an algebraic data type can be consumed by scrutinizing it with a case expression.
Based on these observations, and on the simple rule that unless there's a good reason, a function should consume each of its arguments, it's pretty easy to cut down the space of possible code you could write to a very small number of candidates. For example, there just aren't that many sensible functions of type (using Haskell notation)
forall a . (a -> Bool) -> [a] -> Bool
The art of using types to create code is called type-directed programming. When it works well, you hear functional programmers say things like "once we got the types right, the code practically wrote itself." Since the types are usually much smaller than the code, this is a big win.

Same as in any programming language: it helps you to avoid and find errors in your code. In the case of static typing, a good type system prevents programs with certain kinds of errors from compiling.
No. The untyped lambda calculus is what you could call the prototype of functional programming languages and it is, as the name suggests, entirely untyped.
In a functional language (as well as any other language where a function can be used as a value) the type system needs to know what the type of a function is. Other than that there is nothing special about type systems for functional languages.
In a purely functional language you need to abstract side effects, so you'd want the type system to somehow be able to support that. For example, if you want to have a world type as in Clean, you'd want the type system to support uniqueness types to ensure proper usage.
If you want to have an IO monad as in Haskell, you'd need an IO type (though a Monad typeclass like Haskell's is not required in order to have an IO monad, so you don't need a type system that supports typeclasses).

1: Same as any other, it stops you from doing operations that are either ill-defined, or whose result would be 'nonsensical' to humans. Like float addition on integers.
2: Nope, the oldest programming language in the world, the (untyped) lambda calculus, is both functional and untyped.
3: Hardly, functional just means no side effects, no mutations, referential transparency et cetera.
Just remember that the oldest functional language, the untyped lambda calculus has no type system.

Related

How are C++ concepts different from Haskell typeclasses?

Concepts for C++ from the Concepts TS have been recently merged into GCC trunk. Concepts allow one to constrain generic code by requiring types to satisfy the conditions of the concept ('Comparable' for instance).
Haskell has type classes. I'm not so familiar with Haskell. How are concepts and type classes related?
Concepts (as defined by the Concepts TS) and type classes are related only in the sense that they restrict the sets of types that can be used with a generic function. Beyond that, I can only think of ways in which the two features differ.
I should note that I am not a Haskell expert. Far from it. However, I am an expert on the Concepts TS (I wrote it, and I implemented it for GCC).
Concepts (and constraints) are predicates that determine whether a type is a member of a set. You do not need to explicitly declare whether a type is a model of a concept (an instance of a type class). That's determined by a set of requirements and checked by the compiler. In fact, concepts do not allow you to write "T is a model of C" at all, although this is readily supported using various metaprogramming techniques.
Concepts can be used to constrain non-type arguments, and because of constexpr functions and template metaprogramming, express pretty much any constraint you could ever hope to write (e.g., a hash array whose extent must be a prime number). I don't believe this is true for type classes.
Concepts are not part of the type system. They constrain the use of declarations and, in some cases template argument deduction. Type classes are part of the type system and participate in type checking.
Concepts do not support modular type checking or compilation. Template definitions are not checked against concepts, so you can still get late caught type errors during instantiation, but this does add a certain degree of flexibility for library writers (e.g., adding debugging code to an algorithm won't change the interface). Because type classes are part of the type system, generic algorithms can be checked and compiled modularly.
The Concepts TS supports the specialization of generic algorithms and data structures based on the ordering of constraints. I am not at all an expert in Haskell, so I don't know if there is an equivalent here or not. I can't find one.
The use of concepts will never add runtime costs. The last time I looked, type classes could impose the same runtime overhead as a virtual function call, although I understand that Haskell is very good at optimizing those away.
I think that those are the major differences when comparing feature (Concepts TS) to feature (Haskell type classes).
But there's an underlying philosophical difference between the two languages -- and it isn't functional vs. whatever flavor of C++ you're writing. Haskell wants to be modular: being so has many nice properties. C++ templates refuse to be modular: instantiation-time lookup allows for type-based optimization without runtime overhead. This is why C++ generic libraries offer both broad reuse and unparalleled performance.
You might be interested in the following research paper:
"A comparison of C++ concepts and Haskell type classes", Bernardy et al., WGP 2008. Pdf More details.
Update: as a short summary of the paper: the paper defines a precise mapping between terminology for C++ concepts and terminology for Haskell type classes and uses this mapping to provide a detailed feature comparison between the two.
Their conclusion says:
Out of our 27 criteria, summarised in table 2, 16 are equally supported in both languages, and only one or two are not portable. So, we can safely conclude as we started — C++ concepts and Haskell type classes are very similar.
As noted by T.C. below, it is worth pointing out that the paper is comparing C++0x concepts, not Concepts TS. I am not aware of a good reference describing the differences.

Unit testing a new language

Does anyone know if there's a standardized process for unit testing a new language?
I mean, any new language will have basic flow control like IF, CASE, etc.
How does one normally test the language itself?
Unit testing is one strategy to achieve a goal: verify that a piece of software meets a stated specification. Let's assume you are more interested in the goal, instead of exclusively using unit testing to achieve it.
The question of verifying that a language meets a specification or exhibits specific desirable qualities is profound. The earliest work led to type theory, in which one usually extends the language with new syntax and rules to allow one to talk about well-typed programs: programs that obey these new rules.
Hand-in-hand with these extensions are mathematical proofs demonstrating that any well-typed program will exhibit various desired qualities. For example, perhaps a well-typed program will never attempt to perform integer arithmetic on a string, or try to access an out-of-bounds element of an array.
By requiring programs to be well-typed before allowing them to execute, one can effectively extend these guarantees from well-typed programs to the language itself.
Type systems can be classified by the kinds of rules they include, which in turn determine their expressive power. For example, most typed languages in common use can verify my first case above, but not the second. With the added power comes greater complexity: their type verification algorithms are correspondingly harder to write, reason about, etc.
If you want to learn more, I suggest you read this book, which will take you from the foundations of functional programming up through the common type system families.
You could look up what other languages do for testing. When I was developing a language I was thinking about doing something like Python. They have tests written in Python itself.
You could look up their tests. These are some of them: grammar, types, exceptions, and so on.
Of course, there is a lot of useful stuff there if you are looking for examples, so I recommend that you dig in :).

Thrift list as a mutable type

As taken from the thrift website's documentation, a thrift list is "an ordered list of elements. Translates to an STL vector, Java ArrayList, native arrays in scripting languages, etc." Why are these lists expressed as mutable types? Doesn't this promote slower object types that don't take advantage of native arrays? I don't understand why the default - only - translation of a list in thrift is to a mutable array type.
As taken from the thrift website's documentation, a thrift list is "an ordered list of elements. Translates to an STL vector, Java ArrayList, native arrays in scripting languages, etc." Why are these lists expressed as mutable types?
One has to keep in mind that the type system of Thrift is designed with portability across languages as the first and most important goal in mind. That's also the reason why there are only signed integers.
Furthermore, the IDL types should be considered as an abstract concept, which describes more the intent of a particular type rather than referring to a concrete syntax construction in, let's say, Java, C++ or Python. Which leads us to the second part of your question:
Doesn't this promote slower object types that don't take advantage of native arrays? I don't understand why the default - only - translation of a list in thrift is to a mutable array type.
The mapping from the IDL type down to a concrete implementation for a particular target language lies in the responsibility of the implementation of both Thrift compiler and Thrift library for the language in question. In other words, it is by no means set in concrete and can (at least theoretically) be changed at any time by changing those implementations.
But another goal is to not build too many roadblocks and potholes for the developer. Besides being fast and efficient, the generated code and the library should be easy to use, and it should be easy to do whatever your code wants to do with them. For example, C# classes are generated as partial classes to offer more flexibility. While immutable types may be more efficient in some cases, they may cause more work in other cases. In the end, this is kind of a balancing act.
If you can think of an improvement, the above is by no means intended to stop you from proposing quality patches. If your ideas can help making Thrift better, you are more than welcome and we will gladly review your patches or pull requests.

Clojure Protocols vs Types

Disclaimer
Despite the title, this is a genuine question, not an attempt at Emacs/Vi flamewars.
Context
I've used Haskell for a few months, and written a small ~10K LOC interpreter. In the past year, I've switched to Clojure. For quite a while, I struggled with Clojure's lack of types. Then I switched to using defrecords in Clojure, and have now switched to Clojure's defprotocols.
I really really like defprotocols. In fact, more than types.
I'm now at the point where, for each of my Clojure functions, in its documentation string I just specify:
* the protocols of the inputs
* the protocols of the outputs
Using this, it appears I now have an ad-hoc type system (not compiler checked; but human checked).
Question
I suspect there's something about types that I'm missing. What do types provide over protocols?
Questioning the question...
Your question "What [do] types provide over protocols?" seems awkward to me. Types and protocols are perpendicular; They describe different things. Types/records define structure of data, while Protocols define the structure of some behavior or functionality. And part of why this question seems weird to me is that these things are not mutually exlusive! You can have types implement a protocol, thereby giving them whatever behaviour/functionality that protocol describes. In fact, since your context makes it clear that you have been using protocols, I have to wonder how you've been using them. My guess is that you've been using them with records (or possibly reifying them), but you could just as easily use protocols and (def)types together.
So to me, it seems you've compared apples with oranges here. To help clarify, let me compare apples to apples and oranges to oranges with a couple of different questions:
What problems do protocols solve, and what are the alternatives and their respective advantages/disadvantages?
Protocols let you define functions that operate in different ways on different types. The only other ways to do this are multimethods and simple function logic:
multimethods: have value in being extremely flexible. You can dispatch behavior on type by passing type as the dispatch function, but you can also use any other arbitrary function for dispatching.
internal function logic: You can also (of course) manually check for types in conditionals in your function definitions to decide how to process differently given different types. This is more primitive than multimethod dispatch, and also less extensible. Except in simple cases, multimethods are preferred.
Protocols have the advantage of being much more performant, being based on JVM class/method dispatch, which has been highly optimized. Additionally, protocols were designed to address the expression problem (great read), which makes them really powerful tools for crafting nice, modular, extensible APIs.
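As a hedged sketch of the comparison above (the shape-related names are hypothetical, not from the original answer), here is a protocol and a multimethod providing type-based dispatch for the same kind of data:

;; Protocol: dispatch on the type of the first argument, backed by fast
;; JVM interface dispatch.
(defprotocol Area
  (area [shape] "Returns the area of the shape."))

(defrecord Circle [radius]
  Area
  (area [_] (* Math/PI radius radius)))

(defrecord Rect [w h]
  Area
  (area [_] (* w h)))

;; Multimethod: slower, but the dispatch function can be anything,
;; not just the class of the first argument.
(defmulti perimeter class)

(defmethod perimeter Circle [{:keys [radius]}]
  (* 2 Math/PI radius))

(defmethod perimeter Rect [{:keys [w h]}]
  (* 2 (+ w h)))

(area (->Circle 1.0))    ;; => 3.141592653589793
(perimeter (->Rect 2 3)) ;; => 10

Either mechanism can be extended to new types after the fact, which is the essence of the expression-problem point made above.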
What are the advantages/disadvantages of (def)records or reify over (def)types?
On the side of how we specify the structure of data, we have a number of options available:
(def)records: produce a type good for "representing application domain information" (from http://clojure.org/datatypes; worth a read)
(def)types: produce a lighter weight type for creating "artifacts of the implementation/programming domain", such as the standard collection types
reify: construct a one-off object with an anonymous type implementing one or more protocols; good for... one-off things which need to implement a protocol(s)
Practically, records behave like Clojure hash-maps, but have the added benefit of being able to implement protocols and having faster attribute lookup. Conveniently, they remain extensible via assoc, though attributes added in this fashion do not share the compiled lookup performance. This is what makes these constructs convenient for implementing application logic. Using deftype is advantageous for aspects of the implementation/programming domain because such types don't carry excess baggage, making their use cleaner for these cases.
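A small sketch (with hypothetical names) showing the three constructs side by side under a single protocol:

(defprotocol Describable
  (describe [this]))

;; defrecord: map-like domain data that also implements the protocol.
(defrecord User [name email]
  Describable
  (describe [_] (str "user " name " <" email ">")))

;; deftype: a bare programming-domain artifact; just fields, no map behaviour.
(deftype Pair [a b]
  Describable
  (describe [_] (str "(" a ", " b ")")))

;; reify: a one-off anonymous object implementing the protocol.
(def one-off
  (reify Describable
    (describe [_] "something one-off")))

(describe (->User "Rich" "rich@example.com")) ;; => "user Rich <rich@example.com>"
(:name (->User "Rich" "rich@example.com"))    ;; record fields behave like map keys
(describe (->Pair 1 2))                       ;; => "(1, 2)"
(describe one-off)                            ;; => "something one-off"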
Protocols create interfaces, and interfaces are, well, the interface to a type. They describe some aspects of a type, though with much less rigor than you would come to expect in a language like Haskell. In particular, protocols don't give you:
machine checking
type inference (you don't get some of your protocols generated from docs of others)
parametric polymorphism (parameterised protocols / protocols with generics don't exist)
higher order protocols (what is the protocol for a function that returns a protocol?)
automatic generation of code / boilerplate
inter-operation with automated tools

Type safety in Clojure

I want to ask what sort of type-safety language constructs there are in Clojure.
I've read 'Practical Clojure' by Luke VanderHart and Stuart Sierra several times now, but I still have the distinct impression that Clojure (like other Lisps) doesn't take compile-time validation checking very seriously. Type safety is just one (very popular) strategy for doing compile-time checking of correct semantics.
I'm asking this question because I'm aching to be proven wrong; what sort of design patterns are available in Clojure to validate (at compile time, not at run time) that a function that expects a string doesn't get called with, say, a list of integers?
Also, I've read very smart people like Paul Graham openly advocate that Lisp allows you to implement everything from lower-level languages on top of it (most would say that those languages end up being reimplemented on top of it), so if that assertion is true, then stuff like type checking should be a piece of cake. So do you feel that there exist type systems (or the ability to implement such type systems) in Clojure or other Lisps that give the programmer the ability to shift validation checking from run time to compile time, or, even better, design time?
Compilation units in Clojure are very small - a single function. Lispers tend to change small portions of running programs while they develop. Introducing static type checking into this style of development is problematic - for a deeper discussion of why, I recommend the post Types are Anti-Modular by Gilad Bracha. Thus Clojure prefers pre/post-conditions, which jibe better with Lisp's highly REPL-oriented development.
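As a minimal sketch of that pre/post-condition style (the function name is my own), note that the checks run when the function is called, not at compile time, and can be elided via *assert*:

(defn safe-div
  "Divides x by y, documenting and checking the contract at run time."
  [x y]
  {:pre  [(number? x) (number? y) (not (zero? y))]
   :post [(number? %)]}
  (/ x y))

(safe-div 10 2) ;; => 5
(safe-div 10 0) ;; => AssertionError: Assert failed: (not (zero? y))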
That said, it's certainly desirable and possible to build an a la carte type system for Clojure. This trail has been blazed by Qi/Shen, and Typed Racket. This functionality could be easily provided as a library. I'm hoping to build something like that in the future with core.logic - https://github.com/clojure/core.logic.
Since Clojure is a dynamic language, the whole idea is not to check the types (or much of anything) at compile time.
Even when you add type hints to your function they do not get checked at compile-time.
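For example (a hedged sketch), a hint only removes reflection; it does not make the compiler reject a call with the wrong argument type:

;; The ^String hint lets the compiler emit a direct .length call instead of
;; a reflective one, but it is not a type check.
(defn string-length [^String s]
  (.length s))

(string-length "hello") ;; => 5
(string-length 42)      ;; compiles fine; throws ClassCastException at run time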
Since Clojure is a Lisp, you can do whatever you want at compile time with macros, and macros are powerful enough that you can write your own type system. Some people have made type systems for Lisps, such as Typed Racket and Qi. These type systems can be just as powerful as any type system in a "normal" language.
OK, so we now know that it is possible, but does Clojure have such an optional type system? The answer is currently no, but there is a logic engine (core.logic) that could be used to implement a type system; its author has not (yet) worked in that direction.
There is a library that adds an optional type system to Clojure,
http://typedclojure.org/
Rationale
Static typing has well known benefits. For example, statically typed languages catch many common programming errors at the earliest time possible: compile time. Types also serve as an excellent form of (machine checkable) documentation that almost always augment existing hand-written documentation.
Languages without static type checking (dynamically typed) bring other benefits. Without the strict rigidity of mandatory static typing, they can provide more flexible and forgiving idioms that can help in rapid prototyping. Often the benefits of static type checking are desired as the program grows.
This work adds static type checking (and some of its benefits) to Clojure, a dynamically typed language, while still preserving idioms that characterise the language. It allows static and dynamically typed code to be mixed so the programmer can use whichever is more appropriate.
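As a rough sketch of what Typed Clojure usage looks like (exact namespace aliases and type syntax vary between versions of the library, so treat this as an approximation):

(ns example.core
  (:require [clojure.core.typed :as t]))

;; The annotation is verified when you run the checker, e.g.
;; (t/check-ns 'example.core), not by the ordinary Clojure compiler.
(t/ann greet [String -> String])
(defn greet [name]
  (str "Hello, " name))

;; In checked code, (greet 42) is reported as a static type error;
;; plain Clojure would compile it and fail at run time instead.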