Terminology for example of codata in Clojure - clojure

Imagine the following function to give an infinite lazy sequence of fibonacci in Clojure:
(def fib-seq
(concat
[0 1]
((fn rfib [a b]
(lazy-cons (+ a b) (rfib b (+ a b)))) 0 1)))
user> (take 20 fib-seq)
(0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181)
Assuming
We take the pithy definition of codata as being "Codata are types inhabited by values that may be infinite".
That this Clojure example doesn't use a static type system (from core.typed) and so any description of codata is a 'working definition'
My question is - What part of the function above is the 'codata'. Is it the anonymous function? Is it the lazy sequence?

Codata is the dual of data. You work with data via structural induction which means that data is always finite. You work with codata via coinduction which means codata is potentially infinite (but not always).
In any case, if you can't properly define a finite toString or equality then it'll be codata:
Can we define a toString for an infinite streams? No, we'd need an infinite string.
Can we always define extensional equality for two infinite streams? No, that'd take forever.
We can't do the above for streams because they're infinite. But even potentially infinite causes undecidability (i.e. we can't give a definite yes or no for equality or definitely give a string).
So an infinite stream is codata. I think your second question is more interesting, is the function codata?
Lispers say that code is data because features like S-expressions allow manipulating the program just like data. Clearly we have already have a string representation of Lisp (i.e. source code). We can also take a program and check if it's made up of equal S-expressions (i.e. compare the AST). Data!
But let's stop thinking about the symbols that represent our code and instead start thinking about the meaning of our programs. Take the following two functions:
(fn [a] (+ a a))
(fn [a] (* a 2))
They give the same results for all inputs. We shouldn't care that one uses * and the other uses +. It's not possible to compute if any two arbitrary functions are extensionally equal unless they only work on finite data (equality is then just comparing input-output tables). Numbers are infinite so that still doesn't solve our above example.
Now let's think about converting a function to a string. Let's say we had access to the source representation of functions at runtime.
(defn plus-two [a] (+ a 2))
(defn plus-four [a] (plus-two (plus-two a)))
(show-fn plus-four)
; "(plus-two (plus-two a))"
Now, referential transparency says we can take function calls and replace them with the function bodies, with the variables substituted and the program always gives the same result. Let's do that for plus-two:
(defn plus-four [a] (+ (+ a 2) 2))
(show-fn plus-four)
; "(+ (+ a 2) 2)"
Oh... The result is not the same. We broke referential transparency.
So we also can't define toString or equality for functions. It's because they're codata!
Here are some resources I found helpful for understanding codata better:
Data, Codata, and Their Implications for Equality, and Serialization
Codata and Comonads in Haskell
Data and Codata on A Neighborhood of Infinity
Some slides from University of Nottingham

My personal thought is the return value of the call to lazy-cons is the point at which the type of the thing in question could first be said to be infinate and thus that is the point that I see codata'nes starting.

Related

Newbie Problem Understanding Clojure Lazy Sequences

I've just started learning Clojure and I'm puzzled by how lazy sequences work. In particular, I don't understand why these 2 expressions produce different results in the repl:
;; infinite range works OK
(user=> (take 3 (map #(/(- % 5)) (range)))
(-1/5 -1/4 -1/3)
;; finite range causes error
user=> (take 3 (map #(/(- % 5)) (range 1000)))
Error printing return value (ArithmeticException) at clojure.lang.Numbers/divide (Numbers.java:188).
Divide by zero
I take the sequence of integers (0 1 2 3 ...) and apply a function that subtracts 5 and then takes the reciprocal. Obviously this causes a division-by-zero error if it's applied to 5. But since I'm only taking the first 3 values from a lazy sequence I wasn't expecting to see an exception.
The results are what I expected when I use all the integers, but I get an error if I use the first 1000 integers.
Why are the results different?
Clojure 1.1 introduced "chunked" sequences,
This can provide greater efficiency ... Consumption of chunked-seqs as
normal seqs should be completely transparent. However, note that some
sequence processing will occur up to 32 elements at a time. This could
matter to you if you are relying on full laziness to preclude the
generation of any non-consumed results. [Section 2.3 of "Changes to Clojure in Version 1.1"]
In your example (range) seems to be producing a seq that realizes one element at a time and (range 999) is producing a chunked seq. map will consume a chunked seq a chunk at a time, producing a chunked seq. So when take asks for the first element of the chunked seq, function passed to map is called 32 times on the values 0 through 31.
I believe it is wisest to code in such a way the code will still work for any seq producing function/arity if that function produces a chunked seq with an arbitrarily large chunk.
I do not know if one writes a seq producing function that is not chunked if one can rely in current and future versions of library functions like map and filter to not convert the seq into a chunked seq.
But, why the difference? What are the implementation details such that (range) and (range 999) are different in the sort of seq produced?
Range is implemented in clojure.core.
(range) is defined as (iterate inc' 0).
Ultimately iterate's functionality is provided by the Iterate class in Iterate.java.
(range end) is defined, when end is a long, as (clojure.lang.LongRange/create end)
The LongRange class lives in LongRange.java.
Looking at the two java files it can be seen that the LongRange class implements IChunkedSeq and the Iterator class does not. (Exercise left for the reader.)
Speculation
The implementation of clojure.lang.Iterator does not chunk because iterator can be given a function of arbitrary complexity and the efficiency from chunking can easily be overwhelmed by computing more values than needed.
The implementation of (range) relies on iterator instead of a custom optimized Java class that does chunking because the (range) case is not believed to be common enough to warrant optimization.

Clojure Core function argument positions seem rather confusing. What's the logic behind it?

For me as, a new Clojurian, some core functions seem rather counter-intuitive and confusing when it comes to arguments order/position, here's an example:
> (nthrest (range 10) 5)
=> (5 6 7 8 9)
> (take-last 5 (range 10))
=> (5 6 7 8 9)
Perhaps there is some rule/logic behind it that I don't see yet?
I refuse to believe that the Clojure core team made so many brilliant technical decisions and forgot about consistency in function naming/argument ordering.
Or should I just remember it as it is?
Thanks
Slightly offtopic:
rand&rand-int VS random-sample - another example where function naming seems inconsistent but that's a rather rarely used function so it's not a big deal.
There is an FAQ on Clojure.org for this question: https://clojure.org/guides/faq#arg_order
What are the rules of thumb for arg order in core functions?
Primary collection operands come first. That way one can write → and its ilk, and their position is independent of whether or not they have variable arity parameters. There is a tradition of this in OO languages and Common Lisp (slot-value, aref, elt).
One way to think about sequences is that they are read from the left, and fed from the right:
<- [1 2 3 4]
Most of the sequence functions consume and produce sequences. So one way to visualize that is as a chain:
map <- filter <- [1 2 3 4]
and one way to think about many of the seq functions is that they are parameterized in some way:
(map f) <- (filter pred) <- [1 2 3 4]
So, sequence functions take their source(s) last, and any other parameters before them, and partial allows for direct parameterization as above. There is a tradition of this in functional languages and Lisps.
Note that this is not the same as taking the primary operand last. Some sequence functions have more than one source (concat, interleave). When sequence functions are variadic, it is usually in their sources.
Adapted from comments by Rich Hickey.
Functions that work with seqs usually has the actual seq as last argument.
(map, filter, remote etc.)
Accessing and "changing" individual elements takes a collection as first element: conj, assoc, get, update
That way, you can use the (->>) macro with a collection consistenly,
as well as create transducers consistently.
Only rarely one has to resort to (as->) to change argument order. And if you have to do so, it might be an opportunity to check if your own functions follow that convention.
For some functions (especially functions that are "seq in, seq out"), the args are ordered so that one can use partial as follows:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(dotest
(let [dozen (range 12)
odds-1 (filterv odd? dozen)
filter-odd (partial filterv odd?)
odds-2 (filter-odd dozen) ]
(is= odds-1 odds-2
[1 3 5 7 9 11])))
For other functions, Clojure often follows the ordering of "biggest-first", or "most-important-first" (usually these have the same result). Thus, we see examples like:
(get <map> <key>)
(get <map> <key> <default-val>)
This also shows that any optional values must, by definition, be last (in order to use "rest" args). This is common in most languages (e.g. Java).
For the record, I really dislike using partial functions, since they have user-defined names (at best) or are used inline (more common). Consider this code:
(let [dozen (range 12)
odds (filterv odd? dozen)
evens-1 (mapv (partial + 1) odds)
evens-2 (mapv #(+ 1 %) odds)
add-1 (fn [arg] (+ 1 arg))
evens-3 (mapv add-1 odds)]
(is= evens-1 evens-2 evens-3
[2 4 6 8 10 12]))
Also
I personally find it really annoying trying to parse out code using partial as with evens-1, especially for the case of user-defined functions, or even standard functions that are not as simple as +.
This is especially so if the partial is used with 2 or more args.
For the 1-arg case, the function literal seen for evens-2 is much more readable to me.
If 2 or more args are present, please make a named function (either local, as shown for evens-3), or a regular (defn some-fn ...) global function.

The usage of lazy-sequences in clojure

I am wondering that lazy-seq returns a finite list or infinite list. There is an example,
(defn integers [n]
(cons n (lazy-seq (integers (inc n)))))
when I run like
(first integers 10)
or
(take 5 (integers 10))
the results are 10 and (10 11 12 13 14)
. However, when I run
(integers 10)
the process cannot print anything and cannot continue. Is there anyone who can tell me why and the usage of laza-seq. Thank you so much!
When you say that you are running
(integers 10)
what you're really doing is something like this:
user> (integers 10)
In other words, you're evaluating that form in a REPL (read-eval-print-loop).
The "read" step will convert from the string "(integers 10)" to the list (integers 10). Pretty straightforward.
The "eval" step will look up integers in the surrounding context, see that it is bound to a function, and evaluate that function with the parameter 10:
(cons 10 (lazy-seq (integers (inc 10))))
Since a lazy-seq isn't realized until it needs to be, simply evaluating this form will result in a clojure.lang.Cons object whose first element is 10 and whose rest element is a clojure.lang.LazySeq that hasn't been realized yet.
You can verify this with a simple def (no infinite hang):
user> (def my-integers (integers 10))
;=> #'user/my-integers
In the final "print" step, Clojure basically tries to convert the result of the form it just evaluated to a string, then print that string to the console. For a finite sequence, this is easy. It just keeps taking items from the sequence until there aren't any left, converts each item to a string, separates them by spaces, sticks some parentheses on the ends, and voilà:
user> (take 5 (integers 10))
;=> (10 11 12 13 14)
But as you've defined integers, there won't be a point at which there are no items left (well, at least until you get an integer overflow, but that could be remedied by using inc' instead of just inc). So Clojure is able to read and evaluate your input just fine, but it simply cannot print all the items of an infinite result.
When you try to print an unbounded lazy sequence, it will be completely realized, unless you limit *print-length*.
The lazy-seq macro never constructs a list, finite or infinite. It constructs a clojure.lang.LazySeq object. This is a nominal sequence that wraps a function of no arguments (commonly called a thunk) that evaluates to the actual sequence when called; but it isn't called until it has to be, and that's the purpose of the mechanism: to delay evaluating the actual sequence.
So you can pass endless sequences around as evaluated LazySeq objects, provided you never realise them. Your evaluation at the REPL invokes realisation, an endless process.
It's not returning anything because your integers function creates an infinite loop.
(defn integers [n]
(do (prn n)
(cons n (lazy-seq (integers (inc n))))))
Call it with (integers 10) and you'll see it counting forever.

Pattern matching functions in Clojure?

I have used erlang in the past and it has some really useful things like pattern matching functions or "function guards". Example from erlang docs is:
fact(N) when N>0 ->
N * fact(N-1);
fact(0) ->
1.
But this could be expanded to a much more complex example where the form of parameter and values inside it are matched.
Is there anything similar in clojure?
There is ongoing work towards doing this with unification in the core.match ( https://github.com/clojure/core.match ) library.
Depending on exactly what you want to do, another common way is to use defmulti/defmethod to dispatch on arbitrary functions. See http://clojuredocs.org/clojure_core/clojure.core/defmulti (at the bottom of that page is the factorial example)
I want to introduce defun, it's a macro to define functions with pattern matching just like erlang,it's based on core.match. The above fact function can be wrote into:
(use 'defun)
(defun fact
([0] 1)
([(n :guard #(> % 0))]
(* n (fact (dec n)))))
Another example, an accumulator from zero to positive number n:
(defun accum
([0 ret] ret)
([n ret] (recur (dec n) (+ n ret)))
([n] (recur n 0)))
More information please see https://github.com/killme2008/defun
core.match is a full-featured and extensible pattern matching library for Clojure. With a little macro magic and you can probably get a pretty close approximation to what you're looking for.
Also, if you want to take apart only simple structures like vectors and maps (any thing that is sequence or map, e.g. record, in fact), you could also use destructuring bind. This is the weaker form of pattern matching, but still is very useful. Despite it is described in let section there, it can be used in many contexts, including function definitions.

How do you return from a function early in Clojure?

Common Lisp has return-from; is there any sort of return in Clojure for when you want to return early from a function?
When you need to bail out of a computation early, you need a way to do that, not an argument from purists. Usually you need it when you're reducing a big collection and a certain value indicates that there's no point in further processing the collection. To that end, the ever-practical Clojure provides the reduced function.
A simple example to illustrate is that when multiplying a sequence of numbers, if you encounter a zero, you already know that the final result will be zero, so you don't need to look at the rest of the sequence. Here's how you code that with reduced:
(defn product [nums]
(reduce #(if (zero? %2)
(reduced 0.0)
(* %1 %2))
1.0
nums))
reduced wraps the value you give it in a sentinel data structure so that reduce knows to stop reading from the collection and simply return the reduced value right now. Hey, it's pure-functional, even!
You can see what's going on if you wrap the above if in a do with a (println %1 %2):
user=> (product [21.0 22.0 0.0 23.0 24.0 25.0 26.0])
1.0 21.0
21.0 22.0
462.0 0.0
0.0
user=> (product [21.0 22.0 23.0 24.0 25.0 26.0])
1.0 21.0
21.0 22.0
462.0 23.0
10626.0 24.0
255024.0 25.0
6375600.0 26.0
1.657656E8
There isn't any explicit return statement in clojure. You could hack something together using a catch/throw combination if you want to, but since clojure is much more functional than common lisp, the chances you actually need an early return right in the middle of some nested block is much smaller than in CL. The only 'good' reason I can see for return statements is when you're dealing with mutable objects in a way that's not idiomatic in clojure.
I wouldn't go as far as saying that it's never useful, but I think in Clojure, if your algorithm needs a return statement, it's a major code smell.
Unless you're writing some really funky code, the only reason you'd ever want to return early is if some condition is met. But since functions always return the result of the last form evaluated, if is already this function — just put the value that you want to return in the body of the if and it will return that value if the condition is met.
I'm no expert in Clojure, but it seems it does not have those construct to try to be more functional. Take a look at what Stuart Halloway says here:
Common Lisp also supports a return-from macro to "return" from the
middle of a function. This encourages an imperative style of
programming, which Clojure discourages.
However, you can solve the same problems in a different way. Here is
the return-from example, rewritten in a functional style so that no
return-from is needed:
(defn pair-with-product-greater-than [n]
(take 1 (for [i (range 10) j (range 10) :when (> (* i j) n)] [i j])))
That is, use lazy sequences and returning values based on conditions.
As an alternative you could use cond . And if on some conditions you would need to evaluate multiple expressions use do.
Here is an example:
(defn fact [x]
(cond
(< x 0) (do (println (str x " is negative number"))
(throw (IllegalArgumentException. "x should be 0 or higher")))
(<= x 1) 1
:else (* x (fact (- x 1)))))
The if option already given is probably the best choice, and note since maps are easy, you can always return {:error "blah"} in the error condition, and{result: x} in the valid condition.
There isn't a return statement in Clojure. Even if you choose not to execute some code using a flow construct such as if or when, the function will always return something, in these cases nil. The only way out is to throw an exception, but even then it will either bubble all the way up and kill your thread, or be caught by a function - which will return a value.
In short, no. If this is a real problem for you then you can get around it with
"the maybe monad" Monads have a high intellectual overhead so for many cases clojurians tend to avoid the "if failed return" style of programming.
It helps to break the function up into smaller functions to reduce the friction from this.