List.fold_left implementation vs List.fold_right implementation - OCaml

I am new to OCaml, and I have seen from other posts that fold_left in List is tail recursive and works better on larger lists, whereas fold_right is not tail recursive.
My question is: why does fold_left only work better on larger lists? How is it implemented such that it doesn't also work better on smaller lists?

Being tail-recursive allows the function to avoid a lot of memory allocation, and the savings are directly proportional to the length of the list.
On a small list, there will be a gain, but it's not likely to be noticeable until you start using big lists.
As a rule of thumb, you should use fold_left unless you are working on a small list and the fold_right version corresponds more to what you're trying to write.

The fold_left function is indeed tail-recursive. However, it works fine on both small and large lists; there is no gain in using fold_right instead of fold_left on small lists. The fold_left function is always faster than fold_right, and the claims you have heard are not about fold_left vs fold_right, but rather about a tail-recursive version of fold_right vs a non-tail-recursive version of fold_right. But first, let me highlight the difference between right and left folds.
The left fold takes a list of elements
a b c d ... z
and a function f, and produces a value
(f (... (f (f (f a b) c) d) ...) z)
This is easier to understand if we imagine that f is some operator, e.g., addition, and use the infix notation a + b instead of the prefix notation (add a b). The left fold then reduces the sequence to a sum as follows:
((((a + b) + c) + d) + ... + z)
So we can see that the left fold associates parentheses to the left. This is its only difference from the right fold, which associates parentheses to the right, so if we take the same sequence and apply the same function to it using the right fold, we get the following computation:
(a + (b + ... (x + (y + z))))
In the case of addition, the result will be the same for both left and right folds. However, the right fold implementation will be less efficient. The reason is that with the left fold we can compute the result as soon as we have two elements, e.g., a + b, whereas with the right fold we need to compute the sum of the other n-1 elements first and only then add the first element, e.g., a + (b + ... + (y + z)). Therefore, the right fold has to store intermediate results somewhere. The easy way is to use the call stack, e.g., a :: rest -> a + fold_right (+) rest 0, where the value a is put on the stack, then the fold_right (+) rest 0 computation runs, and when it is ready we can finally add a and the sum of all the other elements. Eventually, it pushes all the values a, b, ..., x, until we finally reach y and z, which we can sum, and then the stack of calls unwinds.
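To make the stack usage concrete, here is a minimal OCaml sketch of a right fold written this way (a simplified definition for illustration, not the stdlib source):

(* the pending "f x ..." applications wait on the call stack
   until the end of the list is reached *)
let rec fold_right f l init =
  match l with
  | [] -> init
  | x :: rest -> f x (fold_right f rest init)

let six = fold_right (+) [1; 2; 3] 0   (* 1 + (2 + (3 + 0)) *)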
The problem with the stack is that it is usually bounded, unlike the heap, which may grow without bound. This is not specific to mathematics or language design; it is how modern operating systems run programs: they give them a fixed-size stack and an unbounded heap. Once a program runs out of stack space, the operating system terminates it, without any possibility of recovery. This is very bad and should be avoided if possible.
Therefore, people proposed a safer implementation of fold_right, as a left fold of a reversed list. Obviously, this tradeoff results in a slower implementation, since we essentially have to create a reversed copy of the input list and only then traverse it with fold_left. As a result, we traverse the list twice and produce garbage, which further reduces the performance of the code. So we have a tradeoff between the fast but unsafe implementation provided by the standard library, and a safe but slower implementation provided by some other libraries.
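A sketch of that safer variant, built from the stdlib List.rev and List.fold_left (fold_right_tail is just an illustrative name):

(* tail-recursive right fold: reverse first, then fold from the left;
   this costs an extra traversal and a temporary reversed list *)
let fold_right_tail f l init =
  List.fold_left (fun acc x -> f x acc) init (List.rev l)

let six' = fold_right_tail (+) [1; 2; 3] 0   (* same result, no deep recursion *)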
To summarize, fold_left is always faster than fold_right and is always tail-recursive. The standard OCaml implementation of fold_right is not tail-recursive, which makes it faster than the tail-recursive fold_right implementations provided by some other libraries. However, this comes with a price: you should not apply fold_right to large lists. In general, it means that in OCaml you should prefer fold_left as your primary tool for processing lists.

Related

Why does the shuffle' function require an Int parameter?

In System.Random.Shuffle,
shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]
The hackage page mentions this Int argument as
..., its length,...
However, it seems that a simple wrapper function like
shuffle'' x = shuffle' x (length x)
should've sufficed.
shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure is that it only holds the remaining items, in an immutable form; lazy updates work for it. But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation.
System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be that they didn't want it to touch the list argument at all; it's only passed on to shuffle.
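For reference, a small usage sketch (the concrete list and the choice of newStdGen are just illustrative), showing that the length is supplied alongside the list rather than measured from it:

import System.Random (newStdGen)
import System.Random.Shuffle (shuffle')

main :: IO ()
main = do
  gen <- newStdGen
  -- the Int argument is the list's length; shuffle' never inspects the list itself
  putStrLn (shuffle' "abcde" 5 gen)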
The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. Probably the reasons have to do with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.
Data.Random.Extras seems to have a more straightforward Seq-based shuffle function.
It might be the case that the length of the given list is already known and doesn't need to be calculated again, so it can be considered an optimisation.
Besides, in general, the resulting list doesn't need to have the same size as the original one, so this argument could be used to set that length.
This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):
-- examples
t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.
t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements
However, it's not the same for the 'random-shuffle' package implementation:
> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths
I think it's worth following up with the package maintainers in order to understand the contract of this function.

Why is `++` for Haskell List implemented recursively and costs O(n) time?

As I understand it, a list in Haskell is similar to a linked list in the C language.
So for expressions below:
a = [1,2,3]
b = [4,5,6]
a ++ b
Haskell implements this in a recursive way, like this:
(++) []     ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
The time complexity of this is O(n).
However, I was wondering why I can't implement ++ more efficiently.
The most efficient way may be like this:
make a copy (fork) of a, let's call it a'; there may be some trick to do this in O(1) time
make the last element of a' point to the first element of b; this can be done easily in O(1) time
Does anyone have ideas about this? Thanks!
That's pretty much what the recursive solution does. It's the copying of a which takes O(n), where n is the length of a (the length of b doesn't affect the complexity).
There is really no "trick" to copy a list of n elements in O(1) time.
You see, the copy (fork) part is the problem - the recursive solution does exactly this (and you really have to do it, because you have to adjust all the pointers for the elements in the a list).
Let's say a = [a1,a2,a3] and b is some list.
You have to make a new copy of a3 (let's call it a3') because it should now no longer point to an empty list but to the start of b.
Then you have to make a copy of the second to last element a2 too because it must point to a3' and finally - for the same reason - you have to create a new copy of a1 too (pointing to a2').
This is exactly what the recursive definition does - there's no problem with the algorithm; it's a problem with the data structure (it's just not good at concatenation).
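A minimal Haskell sketch that makes this cell-by-cell copying explicit (appendCopy is an illustrative name; it behaves like ++):

-- every element of the first list gets a freshly allocated (:) cell,
-- while the second list is shared unchanged
appendCopy :: [a] -> [a] -> [a]
appendCopy []     ys = ys
appendCopy (x:xs) ys = x : appendCopy xs ys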
If you don't allow mutability and want the structure of a list, there's really nothing else you can do.
You have this in other languages too if they provide immutable data - for example, in .NET strings are immutable, so there is almost the same problem with string concatenation there as here (if you concatenate lots of strings your program will perform poorly). There are workarounds (StringBuilder) that deal better with the memory footprint - but of course those are no longer immutable data structures.
There is no way to do that concatenation in constant time, simply because the immutability of the data structure doesn't allow it.
You might think that you could do something similar to the "cons" operator (:) that adds an additional element x0 to the front of a list oldList=[x1,x2,x3] (resulting in newList=(x0:oldList)) without having to run through the whole list. But that's just because you don't touch the existing list oldList, but simply reference it.
x0 : ( x1 : ( x2 : ( x3 : [] ) ) )
^      ^
newList oldList
But in your case (a ++ b) we are talking about updating a reference deep within the data structure. You want to replace the [] in 1:(2:(3:[])) (the explicit form of [1,2,3]) with the new tail b. Just count the parentheses and you'll see that we have to go deep inside to get to the []. That's always expensive, because we have to duplicate the whole outer part in order to make sure that a stays unmodified. In the resulting list, where would the old a point to in order to have the unmodified list?
1 : ( 2 : ( 3 : b ) )
^               ^
a++b            b
That's impossible in the same data structure. So we need a second one:
1 : ( 2 : ( 3 : [] ) )
^
a
And that means duplicating those : nodes, which necessarily costs the mentioned linear time in the first list. The "copy (fork)" that you mentioned is therefore, contrary to what you said, not O(1).
make a copy(fork) of a, let's call it a', there may be some tricks to do this in O(1) time
When you talk about a "trick" to fork something in constant time, you probably think about not actually making a full copy, but creating a reference to the original a, with the changes stored as "annotations" (like the hint: "modification to tail: use b instead of []").
But that's what Haskell, thanks to its laziness, does anyway! It doesn't immediately execute the O(n) algorithm, but just "remembers" that you wanted a concatenated list, until you actually access its elements. That doesn't save you from paying the cost in the end, though. Even though the reference was cheap at the beginning (O(1), just like you wanted), when you do access the actual list elements, every instance of the ++ operator adds a little overhead (the cost of "interpreting the annotation" you added to your reference) to the access of every element in the first part of the concatenation, so you effectively pay the O(n) cost in the end.
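A tiny illustration of that laziness (assuming GHC's usual lazy evaluation): building the concatenation is cheap, and the second list is not touched until it is actually needed, so even an undefined second argument is harmless as long as we stay within the first part:
λ take 3 ([1,2,3] ++ undefined)
[1,2,3]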

What container really mimics std::vector in Haskell?

The problem
I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one. This means that the size of the container, at the end, will always be n.
Each element, i, of the container depends on at least 2 and up to 4 previous results.
The container has to provide:
constant time insertions at either beginning or end (one of the two, not necessarily both)
constant time indexing in the middle
or alternatively (given an O(n) initialization):
constant time single element edits
constant time indexing in the middle
What is std::vector and why is it relevant
For those of you who don't know C++, std::vector is a dynamically sized array. It is a perfect fit for this problem because it is able to:
reserve space at construction
offer constant time indexing in the middle
offer constant time insertion at the end (with a reserved space)
Therefore this problem is solvable in O(n) complexity, in C++.
Why Data.Vector is not std::vector
Data.Vector, together with Data.Array, provides functionality similar to std::vector, but not quite the same. Both, of course, offer constant time indexing in the middle, but they offer neither constant time modification ((//) for example is at least O(n)) nor constant time insertion at either the beginning or the end.
Conclusion
What container really mimics std::vector in Haskell? Alternatively, what is my best shot?
From reddit comes the suggestion to use Data.Vector.constructN:
O(n) Construct a vector with n elements by repeatedly applying the generator function to the already constructed part of the vector.
constructN 3 f = let a = f <> ; b = f <a> ; c = f <a,b> in f <a,b,c>
For example:
λ import qualified Data.Vector as V
λ V.constructN 10 V.length
fromList [0,1,2,3,4,5,6,7,8,9]
λ V.constructN 10 $ (1+) . V.sum
fromList [1,2,4,8,16,32,64,128,256,512]
λ V.constructN 10 $ \v -> let n = V.length v in if n <= 1 then 1 else (v V.! (n - 1)) + (v V.! (n - 2))
fromList [1,1,2,3,5,8,13,21,34,55]
This certainly seems to qualify to solve the problem as you've described it above.
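For a recurrence closer to the question (each entry depending on up to four earlier ones), a sketch along the same lines might look like this (step and table are made-up names):

import qualified Data.Vector as V

-- each new entry reads up to four already-constructed entries
step :: V.Vector Int -> Int
step v
  | n < 4     = 1
  | otherwise = v V.! (n - 1) + v V.! (n - 2) + v V.! (n - 3) + v V.! (n - 4)
  where
    n = V.length v

table :: Int -> V.Vector Int
table n = V.constructN n step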
The first data structures that come to my mind are either Maps from Data.Map or Sequences from Data.Sequence.
Update
Data.Sequence
Sequences are persistent data structures that support most operations efficiently, while allowing only finite sequences. Their implementation is based on finger trees, if you are interested. But which qualities does it have?
O(1) calculation of the length
O(1) insert at front/back with the operators <| and |> respectively.
O(n) creation from a list with fromList
O(log(min(n1,n2))) concatenation for sequences of length n1 and n2.
O(log(min(i,n-i))) indexing for an element at position i in a sequence of length n.
Furthermore this structure supports a lot of the known and handy functions you'd expect from a list-like structure: replicate, zip, null, scans, sort, take, drop, splitAt and many more. Due to these similarities you have to either import it qualified or hide the Prelude functions that have the same names.
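As a rough sketch of the question's pattern with a Seq (partials is an illustrative name, each entry here depending on the two previous ones and n assumed to be at least 2), using O(1) snoc and logarithmic indexing:

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>), index)

-- grow the sequence of partial results left to right,
-- reading the two previous entries by index at each step
partials :: Int -> Seq Int
partials n = go (Seq.fromList [1, 1]) 2
  where
    go s i
      | i >= n    = s
      | otherwise = go (s |> (index s (i - 1) + index s (i - 2))) (i + 1)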
Data.Map
Maps are the standard workhorse for realizing a correspondence between "things"; what you might call a hashmap or associative array in other programming languages is called a Map in Haskell. Unlike in, say, Python, Maps are pure - an update gives you back a new Map and does not modify the original instance.
Maps come in two flavors - strict and lazy.
Quoting from the Documentation
Strict
API of this module is strict in both the keys and the values.
Lazy
API of this module is strict in the keys, but lazy in the values.
So you need to choose what fits best for your application. You can try both versions and benchmark with criterion.
Instead of listing the features of Data.Map, I want to move on to
Data.IntMap.Strict
which can leverage the fact that the keys are integers to squeeze out better performance.
Quoting from the documentation we first note:
Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).
So what are the characteristics of IntMaps?
O(min(n,W)) for (unsafe) indexing (!), unsafe in the sense that you will get an error if the key/index does not exist. This is the same behavior as Data.Sequence.
O(n) calculation of size
O(min(n,W)) for safe indexing lookup, which returns a Nothing if the key is not found and Just a otherwise.
O(min(n,W)) for insert, delete, adjust and update
So you see that this structure is less efficient than Sequences, but it provides a bit more safety, and a big benefit if you don't actually need all entries - such as when representing a sparse graph whose nodes are integers.
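The same kind of table sketched with an IntMap, keyed by the element's index (tableIM is an illustrative name); (!) is safe here only because earlier keys are always inserted before they are read:

import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- build the table entry by entry, reading the two previous entries by key
tableIM :: Int -> IM.IntMap Int
tableIM n = foldl' step IM.empty [0 .. n - 1]
  where
    step m i
      | i < 2     = IM.insert i 1 m
      | otherwise = IM.insert i (m IM.! (i - 1) + m IM.! (i - 2)) m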
For completeness I'd like to mention a package called persistent-vector, which implements Clojure-style vectors, but it seems to be abandoned, as the last upload is from 2012.
Conclusion
So for your use case I'd strongly recommend Data.Sequence or Data.Vector; unfortunately I don't have any experience with the latter, so you'll need to try it for yourself. From what I know, it provides a powerful thing called stream fusion, which optimizes multiple functions into one tight "loop" instead of running a loop for each function. A tutorial for Vector can be found here.
When looking for functional containers with particular asymptotic run times, I always pull out Edison.
Note that there's a result that in a strict language with immutable data structures, there's always a logarithmic slowdown when implementing a mutable data structure on top of them. It's an open problem whether the limited mutation hidden behind laziness can avoid that slowdown. There's also the issue of persistent vs. transient...
Okasaki is still a good read for background, but finger trees or something more complex like an RRB-tree should be available "off-the-shelf" and solve your problem.
I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one.
Each element, i, of the container depends on at least 2 and up to 4 previous results.
Let's consider a very small program that calculates Fibonacci numbers.
fib 1 = 1
fib 2 = 1
fib n = fib (n-1) + fib (n-2)
This is great for small n, but horrible for n > 10. At this point, you stumble across this gem:
fib n = fibs !! n where fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
You may be tempted to exclaim that this is dark magic (infinite, self-referential list building and zipping? wth!) but it is really a great example of tying the knot, and of using laziness to ensure that values are calculated as needed.
Similarly, we can use an array to tie the knot too.
import Data.Array

fib :: Int -> Int
fib n = arr ! n
  where
    arr :: Array Int Int
    arr = listArray (1, n) (map fib' [1 .. n])
    fib' 1 = 1
    fib' 2 = 1
    fib' i = arr ! (i - 1) + arr ! (i - 2)
Each element of the array is a thunk that uses other elements of the array to calculate its value. In this way, we can build a single array, never having to perform concatenation, and pull values out of the array at will, only paying for the calculation up to that point.
The beauty of this method is that you can look not only behind you, but in front of you as well.

Calculate ratio of an element in a list efficiently

The following code works with small lists, however it takes forever with long lists, I suppose it's my double use of length that is the problem.
ratioOfPrimes :: [Int] -> Double
ratioOfPrimes xs = fromIntegral (length (filter isPrime xs))/ fromIntegral(length xs)
How do I calculate the ratio of an element in longer lists?
The double use of length isn't the main problem here. The multiple traversals in your implementation only contribute a constant factor: with the two lengths and the filter you get an average complexity of O(3n), and due to stream fusion it's even O(2n), as already mentioned by Impredicative. But since constant factors don't have a dramatic effect on performance, it's conventional to simply ignore them, so, conventionally speaking, your implementation still has a complexity of O(n), where n is the length of the input list.
The real problem is that all of the above would only hold if isPrime had a complexity of O(1), but it doesn't. That function traverses a list of all primes, so it has a complexity of O(m) itself. The dramatic performance decrease is therefore caused by your algorithm having a final complexity of O(n*m), because on each iteration over the input list it has to traverse the list of all primes to an unknown depth.
To optimize, I suggest first sorting the input list (which takes O(n*log n)) and integrating a custom lookup on the list of all primes, which drops the already-visited numbers on each iteration. This way you achieve a single traversal of the list of all primes, which theoretically gives you a complexity of O(n*log n + n + m) - and that, again, can conventionally be thought of as simply O(n*log n), by highlighting the cost center.
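A rough sketch of that idea (countPrimesIn and the trial-division primes list below are only illustrative; any ascending list of all primes would do):

import Data.List (sort)

-- an inefficient but ascending list of all primes, good enough for a sketch
primes :: [Int]
primes = sieve [2 ..]
  where sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- sort the input once, then sweep the primes list a single time
countPrimesIn :: [Int] -> Int
countPrimesIn xs = go (sort xs) primes
  where
    go [] _  = 0
    go _  [] = 0                          -- unreachable: primes is infinite
    go (x : rest) ps@(p : ps')
      | x < p     = go rest ps            -- smaller than every remaining prime: not prime
      | x == p    = 1 + go rest ps        -- count it; keeping p also counts duplicates
      | otherwise = go (x : rest) ps'     -- advance the primes list

The ratio would then be fromIntegral (countPrimesIn xs) / fromIntegral (length xs).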
So, there are a few things going on there. Let's look at some of the operations involved:
length
filter
isPrime
length
As you say, using length twice isn't going to help, since that's O(n) for lists. You do that twice. Then there's filter, which is also going to do a whole pass of the list in O(n). What we'd like to do is do all this in a single pass of the list.
Functions in the Data.List.Stream module implement a technique called Stream Fusion, which would for example rewrite your (length (filter isPrime xs)) call into a single loop. However, you'd still have the second call to length. You could rewrite this whole thing into a single fold (or use of the State or ST monads) with a pair of accumulators and do this in a single pass:
import Data.List (foldl')

ratioOfPrimes :: [Int] -> Double
ratioOfPrimes xs =
  let (nPrimes, nTotal) = foldl' (\(p, t) i -> if isPrime i then (p + 1, t + 1) else (p, t + 1)) (0, 0) xs
  in nPrimes / nTotal
However, in this case you could also move away from using a list and use the vector library. The vector library implements the same stream fusion techniques for removing intermediate lists, but also has some other nifty features:
length is O(1)
The Data.Vector.Unboxed module lets you store unboxable types (which primitive types such as Int certainly are) without the overhead of the boxed representation. So this list of ints would be stored as a low-level Int array.
Using the vector package should let you write the idiomatic definition you have above and get performance as good as (or better than) the hand-written single-pass translation.
import qualified Data.Vector.Unboxed as U
ratioOfPrimes :: U.Vector Int -> Double
ratioOfPrimes xs = (fromIntegral $ U.length . U.filter isPrime $ xs) / (fromIntegral $ U.length xs)
Of course, the thing that hasn't been mentioned is the isPrime function, and whether the real problem is that it's slow for large n. An unperformant prime checker could easily blow concerns over list indexing out of the water.

About lists:suffix/2 in Erlang

The source code:
suffix(Suffix, List) ->
    Delta = length(List) - length(Suffix),
    Delta >= 0 andalso nthtail(Delta, List) =:= Suffix.
How about rewriting it like the following:
suffix(Suffix, List) ->
    prefix(reverse(Suffix), reverse(List)).
If Delta >= 0, the first one will do four traversals and the second one three. Is that correct?
The first one (from stdlib lists.erl) will traverse both lists twice each, yes. On the other hand, on the second traversal all the list cells will probably be in L2 cache, and it doesn't have to allocate any data. Your suggestion works too, but it has to build two reversed temporary lists on the heap, which has a cost in allocating and initializing the data structures and causes garbage collection to happen more often on average.
If you think about the same problem in C (or any similar language): testing whether one singly linked list is a suffix of another singly linked list, it becomes more obvious why it's hard to do efficiently, in particular if you want to avoid allocating memory, and you aren't allowed to use tricks like reversing pointers.
I don't think it is correct. As far as I know, length is a built-in function which does not need to traverse the list to get the result (that is the reason why it is allowed in guard tests), and andalso is a kind of short-circuit: if the first term is false, it does not evaluate the second term and directly returns false.