foldl vs foldr: which should I prefer? - sml

I remember that when I showed some code that I wrote to my professor he remarked, offhand, that
It rarely matters, but it's worth noting that fold* is a little bit more efficient than fold*' in SML/NJ, so you should prefer it over fold*' when possible.
I forget whether fold* was foldr or foldl. I know that this is one of those micro-optimization things that probably doesn't make a big difference in practice, but I'd like to be in the habit of using the more efficient one when I have the choice.
Which is which? My guess is that this is SML/NJ specific and that MLton will be smart enough to optimize both down to the same machine code, but answers for other compilers are good to know.

foldl is tail-recursive, while foldr is not. You can, however, get foldr's effect in a tail-recursive way by reversing the list (reversal is itself tail-recursive) and then doing a foldl.
This is only going to matter if you are folding over huge lists.
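For reference, here is roughly what the two folds look like when written by hand, plus the reverse-then-foldl trick mentioned above. This is a sketch of the usual textbook definitions (with primed names to avoid clashing with the Basis versions), not SML/NJ's actual library source:
fun foldl' f acc []        = acc
  | foldl' f acc (x :: xs) = foldl' f (f (x, acc)) xs   (* tail call: constant stack *)

fun foldr' f acc []        = acc
  | foldr' f acc (x :: xs) = f (x, foldr' f acc xs)     (* must recurse before applying f *)

(* foldr expressed as a tail-recursive foldl over the reversed list *)
fun foldrViaRev f acc xs = foldl' f acc (List.rev xs)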

Prefer the one that converts the given input into the intended output.
If both produce the same output, as with a sum, and you are dealing with a list, folding from the left will be more efficient, because the fold can begin with the head element right away, while folding from the right must first walk to the end of the list before it can calculate its first intermediate result.
With arrays and similar random access data structures, there's probably not going to be much difference.
A compiler optimization that always chose the better of left and right would require the compiler to determine that left and right were equivalent over all possible inputs. Since foldl and foldr take a function as an argument, this is a bit of a tall order.

I'm going to keep the accepted answer here, but I had the chance to speak to my professor, and his reply was actually the opposite, because I forgot a part of my question. The code in question was building up a list, and he said:
Prefer foldr over foldl when possible, because it saves you a reverse at the end in cases where you're building up a list by appending elements during the fold.
As in, for a trivial example:
- val ls = [1, 2, 3];
val ls = [1,2,3] : int list
- val acc = (fn (x, xs) => x::xs);
val acc = fn : 'a * 'a list -> 'a list
- foldl acc [] ls;
val it = [3,2,1] : int list
- foldr acc [] ls;
val it = [1,2,3] : int list
Saving that O(n) reverse is probably more important than the other differences between foldl and foldr mentioned in the answers to this question.
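As a slightly less trivial illustration of the same point, here is map written both ways; the definitions below are mine, not part of the original discussion:
(* map via foldr: elements come out in order, no reverse needed *)
fun mapFoldr f xs = foldr (fn (x, acc) => f x :: acc) [] xs

(* map via foldl: the accumulated list comes out backwards, so it needs a final rev *)
fun mapFoldl f xs = List.rev (foldl (fn (x, acc) => f x :: acc) [] xs)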

Related

Why can't I zip two lists of different lengths?

In F# if one tries to zip two lists of different lengths one gets an error:
List.zip [1..4] [1..3]
// System.ArgumentException: The lists had different lengths.
However, it is very easy to define an alternative definition of zip that accepts two argument lists of different lengths:
let rec zip' (xs: 'T list) (ys: 'T list) =
    match (xs, ys) with
    | ([], _) -> []
    | (_, []) -> []
    | ((z :: zs), (w :: ws)) -> (z, w) :: zip' zs ws

zip' [1..4] [1..3]
// val it : (int * int) list = [(1, 1); (2, 2); (3, 3)]
Is there a good reason not to use this alternative definition? Why wasn't it adopted in the first place?
This is, indeed, a bit confusing because there is a mismatch between List.zip (which does not allow this) and Seq.zip (which truncates the longer list).
I think that a zip which works only on lists of equal length is a reasonable default behaviour: if it automatically truncated data, there would be a realistic chance of accidentally losing useful data when using zip, which might cause subtle bugs.
The fact that Seq.zip truncates the longer list is only sensible because sequences are lazy and so, by design, when I define a sequence I expect that the consumer might not read it all.
In summary, I think the behaviour difference is based on "what is the most sensible thing to do for a given data structure", but I do think having two names for the operations would make a lot more sense than calling both zip (alas, that's pretty much impossible to change now).
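If you do want the truncating behaviour on lists under a separate name, one simple sketch is to go through Seq.zip, which already truncates (zipTruncate is a made-up name, not part of FSharp.Core):
// Hypothetical helper: a truncating zip for lists, built on Seq.zip.
let zipTruncate (xs: 'T list) (ys: 'U list) : ('T * 'U) list =
    Seq.zip xs ys |> List.ofSeq

zipTruncate [1..4] ["a"; "b"; "c"]
// [(1, "a"); (2, "b"); (3, "c")]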

F#: recursive function to find bigger value in non empty list

I need a recursive function in F# that gives me the biggest value of a non empty list.
example:
biggest [2;4;5;3;9;3]
should return 9
Update 1
I'm learning recursive functions and this is an exercise from the book, with no answer in it. I thought it was OK to ask here, but it seems it was not a good idea. OK, I didn't include any code, so it looked like the homework of a lazy guy. Anyway, this is my best try:
let rec highest l =
    match l with
    | [] -> 0
    | x::y::xs -> if x > y then highest x::xs
                  else highest y::xs
But this doesn't work. I cannot use built-in F# functions; this is for learning purposes, of course. Sorry if I made you lose some time, and thanks for your help.
Before the answer: this question is weird and Stack Overflow is probably not the best place for it.
If it's for production code, use List.max. (Puns aside, recursion isn't its own reward...)
If it's for homework, try to understand recursion instead of delegating your exercises to random people on the internet.
If it's a puzzle/code golf, this is the wrong site and it could be clearer what the requirements are.
Anyway, this can be answered as posted, with the following requirements:
The solution is tail-recursive, not just recursive. Obviously I don't want to write a function to replace List.max just to needlessly grow the stack.
The function biggest that is called in the question's code is directly the recursive one and gets no additional arguments. If I take the question literally, this seems to be a requirement, so I'm not allowed to use an accumulator.
List.max is implemented with a mutating loop and therefore doesn't qualify (link goes to F# source code). So this needs a custom implementation:
let rec biggest = function
    | h1 :: h2 :: t -> biggest ((max h1 h2) :: t)
    | [result] -> result
    | [] -> failwith "list empty"
It's a pretty weird solution, but it does what's asked for and works for long lists.
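For contrast, here is a quick sketch of the accumulator-based shape that this answer deliberately avoids; maxAcc and biggestAcc are made-up names, and the wrapper exists only because the recursive helper takes an extra argument:
// Hypothetical accumulator-based version, for comparison only.
let biggestAcc list =
    let rec maxAcc acc = function
        | [] -> acc
        | h :: t -> maxAcc (max acc h) t
    match list with
    | [] -> failwith "list empty"
    | h :: t -> maxAcc h t

biggestAcc [2; 4; 5; 3; 9; 3]   // 9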
After some days of thinking about it, and with some help at school, I came up with this alternative to Vandroiy's solution:
let rec max_value l =
    match l with
    | [] -> []
    | [x] -> [x]
    | x::y::xs -> if x < y then max_value (y::xs)
                  else max_value (x::xs)
// note: this returns the maximum wrapped in a one-element list, e.g. [9] rather than 9
Thanks a lot.

Pack consecutive duplicates of list elements into sublists in Ocaml

I found this problem on the website 99 Problems in OCaml. After some thinking, I solved it by breaking the problem into a few smaller subproblems. Here is my code:
let rec frequency x l =
  match l with
  | [] -> 0
  | h :: t -> if x = [h] then 1 + frequency x t
              else frequency x t
;;
let rec expand x n =
  match n with
  | 0 -> []
  | 1 -> x
  | _ -> (expand x (n - 1)) @ x
;;
let rec deduct a b =
  match b with
  | [] -> []
  | h :: t -> if a = [h] then deduct a t
              else [h] @ deduct a t
;;
let rec pack l =
  match l with
  | [] -> []
  | h :: t -> [expand [h] (frequency [h] l)] @ pack (deduct [h] t)
;;
It is rather clear that this implementation is overkill: I have to count the frequency of every element in the list, expand it, and remove the identical elements from the list, then repeat the procedure. The complexity is about O(N*(N+N+N)) = O(N^2), so it would not cope with large lists, even though it achieves the required purpose. I tried to read the official solution on the website, which says:
# let pack list =
    let rec aux current acc = function
      | [] -> [] (* Can only be reached if original list is empty *)
      | [x] -> (x :: current) :: acc
      | a :: (b :: _ as t) ->
          if a = b then aux (a :: current) acc t
          else aux [] ((a :: current) :: acc) t in
    List.rev (aux [] [] list);;
val pack : 'a list -> 'a list list = <fun>
The code should be better, as it is more concise and does the same thing. But I am confused by the inner "aux current acc". It seems that the author created a new function inside pack and, after some elaborate procedure, obtained the desired result with List.rev, which reverses the list. What I do not understand is:
1) What is the point of writing it this way, which makes the code hard to read at first sight?
2) What is the benefit of using an accumulator and an auxiliary function, inside another function, that takes 3 inputs? Did the author implicitly use tail recursion or something?
3) Is there any way to modify the program so that it packs all duplicates, as my program does?
These are questions mostly of opinion rather than fact.
1) Your code is far harder to understand, in my opinion.
2a) It's very common to use auxiliary functions in OCaml and other functional languages. You should think of it more like nested curly braces in a C-like language rather than as something strange.
2b) Yes, the code is using tail recursion, which yours doesn't. You might try giving your code a list of (say) 200,000 distinct elements. Then try the same with the official solution. You might try determining the longest list of distinct values your code can handle, then try timing the two different implementations for that length.
2c) In order to write a tail-recursive function, it's sometimes necessary to reverse the result at the end. This just adds a linear cost, which is often not enough to notice.
3) I suspect your code doesn't solve the problem as given. If you're only supposed to compress adjacent elements, your code doesn't do this. If you wanted to do what your code does with the official solution, you could sort the list beforehand (see the sketch after this answer). Or you could use a map or hashtable to keep counts.
Generally speaking, the official solution is far better than yours in many ways. Again, you're asking for an opinion and this is mine.
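A minimal sketch of the sort-first idea from point 3, reusing the official pack defined above (pack_all is a made-up name):
(* Group *all* duplicates, not just adjacent ones, by sorting first. *)
let pack_all l = pack (List.sort compare l) ;;

pack_all [1; 4; 1; 2; 4; 1] ;;
(* - : int list list = [[1; 1; 1]; [2]; [4; 4]] *)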
Update
The official solution uses an auxiliary function named aux that takes three parameters: the currently accumulated sublist (some number of repetitions of the same value), the currently accumulated result (in reverse order), and the remaining input to be processed.
The invariant is that all the values in the first parameter (named current) are the same as the head value of the unprocessed list. Initially this is true because current is empty.
The function looks at the first two elements of the unprocessed list. If they're the same, it adds the first of them to the beginning of current and continues with the tail of the list (all but the first). If they're different, it wants to start accumulating a different value in current. It does this by adding current (with the one extra value added to the front) to the accumulated result, then continuing to process the tail with an empty value for current. Note that both of these maintain the invariant.
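To make the invariant concrete, here is a hand trace of aux (as defined in the official solution above) on a small input:
(* aux [] [] [1; 1; 2; 2; 2]
   -> aux [1] [] [1; 2; 2; 2]       the first two elements are equal: grow current
   -> aux [] [[1; 1]] [2; 2; 2]     1 <> 2: flush current into acc, reset current
   -> aux [2] [[1; 1]] [2; 2]
   -> aux [2; 2] [[1; 1]] [2]
   -> [[2; 2; 2]; [1; 1]]           the single-element case closes the last group
   List.rev then yields [[1; 1]; [2; 2; 2]]. *)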

Haskell List complexity

Sorry if this question has already been asked; I didn't find it. And sorry for my poor English.
I'm learning Haskell and trying to use lists.
I wrote a function which transforms a list following a specific pattern. I can't check right now whether it works, but I think it does.
This function is not tail-recursive, so I think it will be horrible to compute with a big list:
transform :: [Int] -> [Int]
transform list = case list of
    (1:0:1:[])  -> [1,1,1,1]
    (1:1:[])    -> [1,0,1]
    [1]         -> [1,1]
    (1:0:1:0:s) -> 1:1:1:1: (transform s)
    (1:1:0:s)   -> 1:0:1: (transform s)
    (1:0:s)     -> 1:1: (transform s)
    (0:s)       -> 0: (transform s)
So I thought about another function, which would be "better":
transform = reverse . aux []
  where
    aux buff (1:0:[1])   = (1:1:1:1:buff)
    aux buff (1:[1])     = (1:0:1:buff)
    aux buff [1]         = (1:1:buff)
    aux buff (1:0:1:0:s) = aux (1:1:1:1:buff) s
    aux buff (1:1:0:s)   = aux (1:0:1:buff) s
    aux buff (1:0:s)     = aux (1:1:buff) s
    aux buff (0:s)       = aux (0:buff) s
The problem is that I don't know how this compiles, or whether the second function is wrong. Can you explain to me how lists work? Is it better to use (++) or to reverse the list at the end?
Thank you in advance for your answers
The first function is perfectly fine and in my opinion preferable to the second one.
The reason is laziness. If you have a transform someList expression in your code, the resulting list will not be evaluated unless you demand it. In particular the list will be evaluated only as far as it is needed; print $ take 10 $ transform xs will do less work than print $ take 20 $ transform xs.
In a strict language transform would indeed encumber the stack, since it would have to evaluate the whole list (in a non-tail recursive way) before returning anything of use. In Haskell transform (0:xs) evaluates to 0 : transform xs, a usable partial result. We can inspect the head of this result without touching the tail. There is no danger of stack overflow either: at any time there is at most a single unevaluated thunk (like transform xs in the previous example) in the tail of the list. If you demand more elements, the thunk will be just pushed further back, and the stack frame of the previous thunk can be garbage collected.
If we always fully evaluate the list, then the performance of the two functions should be similar; if anything, the lazy version could be somewhat faster because it avoids the reversing (or the extra ++'s). So, by switching to the second function we lose laziness and gain no extra performance.
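A quick way to see the difference, assuming both definitions above are in scope (the second one renamed transformStrict here purely to avoid the name clash):
-- The lazy version yields a prefix without ever reaching the end of the input:
take 8 (transform (cycle [1,0,1,0]))        -- [1,1,1,1,1,1,1,1]
-- The reverse-based version must consume the whole list before returning anything,
-- so on infinite input it never produces a result:
take 8 (transformStrict (cycle [1,0,1,0]))  -- does not terminate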
Your first version looks much better to me [1]. It's fine that it's not tail-recursive: you don't want it to be tail-recursive, you want it to be lazy. The second version can't produce even a single element without processing the entire input list, because in order to reverse the result of aux, the entire result of aux must be known. However,
take 10 . transform $ cycle [1,0,0,1,1,1]
would work fine with your first definition of transform, because you only consume as much of the list as you need in order to make a decision.
[1] But note that (1:0:1:[]) is just [1,0,1].

Can you recognize an infinite list in a Haskell program? [duplicate]

Possible Duplicate:
How to tell if a list is infinite?
In Haskell, you can define an infinite list, for example [1..]. Is there a built-in function in Haskell to recognize whether a list has finite length? I don't imagine it is possible to write a user-supplied function to do this, but the internal representation of lists by Haskell may be able to support it. If not in standard Haskell, is there an extension providing such a feature?
No, this is not possible. It would be impossible to write such a function, because you can have lists whose finiteness might be unknown: consider a recursive loop generating a list of all the twin primes it can find. Or, to follow up on what Daniel Pratt mentioned in the comments, you could have a list of all the steps a universal Turing machine takes during its execution, ending the list when the machine halts. Then, you could simply check whether such a list is infinite, and solve the Halting problem!
The only question an implementation could answer is whether a list is cyclic: if one of its tail pointers points back to a previous cell of the list. However, this is implementation-specific (Haskell doesn't specify anything about how implementations must represent values), impure (different ways of writing the same list would give different answers), and even dependent on things like whether the list you pass in to such a function has been evaluated yet. Even then, it still wouldn't be able to distinguish finite lists from infinite lists in the general case!
(I mention this because, in many languages (such as members of the Lisp family), cyclic lists are the only kind of infinite lists; there's no way to express something like "a list of all integers". So, in those languages, you can check whether a list is finite or not.)
There isn't any way to test for finiteness of lists other than iterating over the list to search for the final [], in any implementation I'm aware of. And in general, it is impossible to tell whether a list is finite or infinite without actually going to look for the end (which of course means that whenever you do get an answer, that answer is "finite").
You could write a wrapper type around lists which keeps track of infiniteness, and limit yourself to "decidable" operations only (somewhat similar to NonEmpty, which avoids empty lists):
import Control.Applicative

data List a = List (Maybe Int) [a]

infiniteList (List Nothing _) = True
infiniteList _ = False

emptyList = List (Just 0) []
singletonList x = List (Just 1) [x]
cycleList xs = List Nothing (cycle xs)
numbersFromList n = List Nothing [n..]
appendList (List sx xs) (List sy ys) = List ((+) <$> sx <*> sy) (xs ++ ys)
tailList (List s xs) = List (fmap pred s) (tail xs)
...
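A possible use of this sketch, with the results the definitions above would give:
infiniteList (appendList (singletonList 1) (cycleList [2,3]))   -- True
infiniteList (appendList emptyList (singletonList 5))           -- False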
As ehird wrote, your only hope is in finding out whether a list is cyclic. A way of doing so is to use an extension to Haskell called "observable sharing". See for instance: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.4053
When talking about the "internal representation of lists": from the standpoint of a Haskell implementation, there are no infinite lists. The "list" you ask about is actually a description of a computational process, not a data object. No data object is infinite inside a computer; such a thing simply does not exist.
As others have told you, the internal list data might be cyclical, and an implementation would usually be able to detect this, since it has a concept of pointer equality. But Haskell itself has no such concept.
Here's a Common Lisp function to detect the cyclicity of a list. cdr advances along a list by one notch, and cddr by two. eq is a pointer-equality predicate.
(defun is-cyclical (p)
  ;; Floyd-style chase: q moves two cells per step, p moves one;
  ;; they can only become eq if the list loops back on itself.
  (labels ((chase (p q)
             (if (not (null q))
                 (if (eq p q)
                     t
                     (chase (cdr p) (cddr q))))))
    (chase p (cdr p))))
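A quick check at the REPL; the cyclic list is built by hand with setf:
(is-cyclical (list 1 2 3))          ; => NIL
(let ((l (list 1 2 3)))
  (setf (cdr (last l)) l)           ; tie the tail back to the head
  (is-cyclical l))                  ; => T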