Pack consecutive duplicates of list elements into sublists in OCaml

I found this problem on the website 99 Problems in OCaml. After some thinking I solved it by breaking the problem into a few smaller subproblems. Here is my code:
let rec frequency x l =
  match l with
  | [] -> 0
  | h :: t -> if x = [h] then 1 + (frequency x t)
              else frequency x t
;;
let rec expand x n =
  match n with
  | 0 -> []
  | 1 -> x
  | _ -> (expand x (n - 1)) @ x
;;
let rec deduct a b =
  match b with
  | [] -> []
  | h :: t -> if a = [h] then (deduct a t)
              else [h] @ (deduct a t)
;;
let rec pack l =
  match l with
  | [] -> []
  | h :: t -> [(expand [h] (frequency [h] l))] @ (pack (deduct [h] t))
;;
It is rather clear that this implementation is overkill, as I have to count the frequency of every element in the list, expand this and remove the identical elements from the list, then repeat the procedure. The algorithm complexity is about O(N*(N+N+N))=O(N^2) and would not work with large lists, even though it achieved the required purpose. I tried to read the official solution on the website, which says:
# let pack list =
    let rec aux current acc = function
      | [] -> []    (* Can only be reached if original list is empty *)
      | [x] -> (x :: current) :: acc
      | a :: (b :: _ as t) ->
         if a = b then aux (a :: current) acc t
         else aux [] ((a :: current) :: acc) t in
    List.rev (aux [] [] list);;
val pack : 'a list -> 'a list list = <fun>
The code should be better, as it is more concise and does the same thing. But I am confused by the use of "aux current acc" inside it. It seems to me that the author has created a new function inside the "pack" function and, after some elaborate procedure, gets the desired result using List.rev, which reverses the list. What I do not understand is:
1) What is the point of using this, which makes the code very hard to read at first sight?
2) What is the benefit of using an accumulator and an auxiliary function, taking 3 inputs, inside another function? Did the author implicitly use tail recursion or something?
3) Is there any way to modify the program so that it can pack all duplicates like my program?

These are questions mostly of opinion rather than fact.
1) Your code is far harder to understand, in my opinion.
2a) It's very common to use auxiliary functions in OCaml and other functional languages. You should think of it more like nested curly braces in a C-like language rather than as something strange.
2b) Yes, the code is using tail recursion, which yours doesn't. You might try giving your code a list of (say) 200,000 distinct elements. Then try the same with the official solution. You might try determining the longest list of distinct values your code can handle, then try timing the two different implementations for that length.
2c) In order to write a tail-recursive function, it's sometimes necessary to reverse the result at the end. This just adds a linear cost, which is often not enough to notice.
3) I suspect your code doesn't solve the problem as given. If you're only supposed to compress adjacent elements, your code doesn't do this. If you wanted to do what your code does with the official solution you could sort the list beforehand. Or you could use a map or hashtable to keep counts.
Generally speaking, the official solution is far better than yours in many ways. Again, you're asking for an opinion and this is mine.
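The sort-first idea from point 3 can be sketched as follows. This is only an illustration: pack_all is a hypothetical name, and sorting changes the order in which the groups appear (sorted order rather than order of first occurrence), which may or may not match what the asker's version is meant to do.

```ocaml
(* The official solution, repeated so that this sketch is self-contained. *)
let pack list =
  let rec aux current acc = function
    | [] -> []
    | [x] -> (x :: current) :: acc
    | a :: (b :: _ as t) ->
      if a = b then aux (a :: current) acc t
      else aux [] ((a :: current) :: acc) t
  in
  List.rev (aux [] [] list)

(* pack_all is a hypothetical name. Sorting first makes all equal values
   adjacent, so pack then groups every duplicate, not only neighbours. *)
let pack_all l = pack (List.sort compare l)

let () =
  (* Only adjacent duplicates are grouped by pack itself. *)
  assert (pack [1; 2; 1; 1] = [[1]; [2]; [1; 1]]);
  (* After sorting, all three 1s end up in one group. *)
  assert (pack_all [1; 2; 1; 1] = [[1; 1; 1]; [2]])
```

Sorting costs O(N log N), so the whole thing stays well below the quadratic cost of counting and removing each value in turn.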
Update
The official solution uses an auxiliary function named aux that takes three parameters: the currently accumulated sublist (some number of repetitions of the same value), the currently accumulated result (in reverse order), and the remaining input to be processed.
The invariant is that all the values in the first parameter (named current) are the same as the head value of the unprocessed list. Initially this is true because current is empty.
The function looks at the first two elements of the unprocessed list. If they're the same, it adds the first of them to the beginning of current and continues with the tail of the list (all but the first). If they're different, it wants to start accumulating a different value in current. It does this by adding current (with the one extra value added to the front) to the accumulated result, then continuing to process the tail with an empty value for current. Note that both of these maintain the invariant.
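To make the description above concrete, here is the same function again with the sequence of aux calls for one small input written out as a comment. The trace was worked out by hand from the definition; the 200,000-element check at the end is just one way to try the experiment suggested in 2b, and assumes an OCaml version that provides List.init (4.06 or later).

```ocaml
let pack list =
  (* current: the run being collected; acc: finished runs, newest first *)
  let rec aux current acc = function
    | [] -> []
    | [x] -> (x :: current) :: acc
    | a :: (b :: _ as t) ->
      if a = b then aux (a :: current) acc t
      else aux [] ((a :: current) :: acc) t
  in
  List.rev (aux [] [] list)

(* Calls made for pack ["a"; "a"; "b"; "c"; "c"]:
     aux []    []                   ["a"; "a"; "b"; "c"; "c"]
     aux ["a"] []                   ["a"; "b"; "c"; "c"]
     aux []    [["a"; "a"]]         ["b"; "c"; "c"]
     aux []    [["b"]; ["a"; "a"]]  ["c"; "c"]
     aux ["c"] [["b"]; ["a"; "a"]]  ["c"]
   The last call returns [["c"; "c"]; ["b"]; ["a"; "a"]], which List.rev
   puts back into input order. *)
let () =
  assert (pack ["a"; "a"; "b"; "c"; "c"] = [["a"; "a"]; ["b"]; ["c"; "c"]])

(* Every recursive call to aux is in tail position, so a long list of
   distinct values does not grow the stack. *)
let () =
  let big = List.init 200_000 (fun i -> i) in
  assert (List.length (pack big) = 200_000)
```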

Related

Ocaml function parsing list of lists

I am trying to build a function in Ocaml which parses a list of lists, eg from [[0;1];[3;4;8]] to [0;1;3;4;8]. I tried to do something like:
#let rec parse listoflists=
match listoflists with
[[]]->[]
|[h::t]->h::parse [t];;
but it doesn't work... I also need an explanation, because I don't understand how the lists of lists actually work...
I don't have to use the Ocaml library functions.
If you can understand a list, then I claim you already know about lists of lists. This is the beauty of recursion.
The only real difficulty (as I see it) is keeping track of which list you're talking about. Your code needs to work with the list of lists itself, which consists of one list (call it h) followed by some other lists. It also needs to work with the list h, which consists of some element (call it hh) followed by some other elements.
It seems to me there are three interesting cases: (a) the list of lists is empty; (b) the first element h of the list of lists is empty; (c) neither the list of lists nor h is empty.
You are not handling all three of these cases. That's one way to see that your code probably wouldn't work.
Here is a match that matches the three cases, which might help a little:
match listoflists with
| [] -> ... (* List of lists is empty *)
| [] :: t -> ... (* First list h is empty *)
| (hh :: ht) :: t -> ... (* Neither is empty *)
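One possible way to fill in the three branches is sketched below. It keeps the asker's function name parse; the right-hand sides are one choice among several, not necessarily what the answerer had in mind.

```ocaml
let rec parse listoflists =
  match listoflists with
  | [] -> []                                  (* no lists left: nothing to emit *)
  | [] :: t -> parse t                        (* first list exhausted: move on *)
  | (hh :: ht) :: t -> hh :: parse (ht :: t)  (* emit hh, keep the rest of h *)

let () =
  assert (parse [[0; 1]; [3; 4; 8]] = [0; 1; 3; 4; 8])
```

In the third case the recursive call receives ht :: t, that is, the list of lists with one element removed from its first list, so every call makes progress towards the first case.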

How does this nested fold_left work? and what is ~f: and ~init:?

I have this OCaml code snippet, which is taken from here. I know it fills a data structure for a demand (traffic matrix) with the specified value, and when the two hosts are the same it just fills in the value 0. In Python or any imperative language, we would use two nested for loops to do the task. I assume this is why we have two fold_left calls in this code, each equivalent to one for loop (I might be mistaken!). My question is: how does this code work? And what are ~f: and ~init:? Are these labels? If they are labels, why does the compiler complain when I remove them or change them, even when I put the arguments in the right order?
I have finished one book and have watched a lot of YouTube videos but still find it difficult to understand most OCaml code.
let create_3cycle_input () =
  let topo = Net.Parse.from_dotfile "./data/topologies/3cycle.dot" in
  let hosts = get_hosts topo in
  let demands =
    List.fold_left
      hosts
      ~init:SrcDstMap.empty
      ~f:(fun acc u ->
        List.fold_left
          hosts
          ~init:acc
          ~f:(fun acc v ->
            let r = if u = v then 0.0 else 53. in
            SrcDstMap.set acc ~key:(u,v) ~data:r)) in
  (hosts,topo,demands);;
Please read my other SO answer, which explains how fold_left works. Once you understand how a single fold works, we can move forward to the nested case (as well as to the labels).
When you have a collection of collections, i.e., when each element of a collection is itself a collection, and you want to iterate over each element of those inner collections, then you need to nest your folds. A good example is a matrix, which can be seen as a collection of vectors, where the vectors are themselves collections.
The iteration algorithm is simple,
state := init
for each inner-collection in outer-collection do
  for each element in inner-collection do
    state := user-function(state, element)
  done
done
Or, the same in OCaml (using the Core version of the fold)
let fold_list_of_lists outer ~init ~f =
  List.fold outer ~init ~f:(fun state inner ->
    List.fold inner ~init:state ~f:(fun state elt ->
      f state elt))
This function will have type 'a list list -> init:'b -> f:('b -> 'a -> 'b) -> 'b
and will be applicable to any list of lists.
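The function above relies on Core's List.fold. A rough equivalent that needs only the standard library can be sketched with ListLabels.fold_left, which takes the same ~f and ~init labels. The example input and the use of ( + ) are assumptions made for illustration.

```ocaml
(* Same shape as the Core version, but built on the standard library's
   ListLabels.fold_left : f:('acc -> 'a -> 'acc) -> init:'acc -> 'a list -> 'acc *)
let fold_list_of_lists outer ~init ~f =
  ListLabels.fold_left outer ~init ~f:(fun state inner ->
    ListLabels.fold_left inner ~init:state ~f:(fun state elt ->
      f state elt))

let () =
  (* Sum every element of every inner list: the state is threaded through
     the outer fold and each inner fold in turn. *)
  let total = fold_list_of_lists [[1; 2]; [3]; []] ~init:0 ~f:( + ) in
  assert (total = 6)
```

The inner fold starts from the state produced so far (~init:state), which is what makes the two folds behave like two nested loops sharing one accumulator.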
Concerning the labels and their removal: labels are keyword arguments, and they let you pass arguments to a function in an arbitrary order, which is very useful when a function has many arguments. Omitting labels is sometimes possible, but this can be disabled with a compiler option. The Core library (which is used by the project that you have referenced) disables omitting labels, probably for good reason.
In general, labels can be omitted if the application is total, i.e., when the returned value is not itself a function. Since fold_left returns a type variable, the result could always be a function, so we always need to use labels with Core's List.fold (and List.fold_left) function.
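To connect this back to the snippet in the question, here is a self-contained sketch of the same nested fold using only the standard library. PairMap is a hypothetical stand-in for the project's SrcDstMap, the host names are made up, and ListLabels.fold_left replaces Core's List.fold_left.

```ocaml
(* PairMap is a hypothetical stand-in for SrcDstMap: a map keyed by
   (source, destination) pairs of host names. *)
module PairMap = Map.Make (struct
  type t = string * string
  let compare = compare
end)

(* The outer fold plays the role of "for u in hosts", the inner fold the
   role of "for v in hosts"; acc is the map built so far. *)
let demands hosts =
  ListLabels.fold_left hosts ~init:PairMap.empty ~f:(fun acc u ->
    ListLabels.fold_left hosts ~init:acc ~f:(fun acc v ->
      let r = if u = v then 0.0 else 53. in
      PairMap.add (u, v) r acc))

let () =
  let d = demands ["h1"; "h2"; "h3"] in
  assert (PairMap.cardinal d = 9);             (* 3 x 3 host pairs *)
  assert (PairMap.find ("h1", "h1") d = 0.0);  (* same host: zero demand *)
  assert (PairMap.find ("h1", "h2") d = 53.)
```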

foldl vs foldr: which should I prefer?

I remember that when I showed some code that I wrote to my professor he remarked, offhand, that
It rarely matters, but it's worth noting that fold* is a little bit more efficient than fold*' in SML/NJ, so you should prefer it over fold* when possible.
I forget whether fold* was foldr or foldl. I know that this is one of those micro-optimization things that probably doesn't make a big difference in practice, but I'd like to be in the habit of using the more efficient one when I have the choice.
Which is which? My guess is that this is SML/NJ specific and that MLton will be smart enough to optimize both down to the same machine code, but answers for other compilers are good to know.
foldl is tail-recursive, while foldr is not. Although you can do foldr in a tail-recursive way by reversing the list (which is tail recursive), and then doing foldl.
This is only going to matter if you are folding over huge lists.
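The "reverse, then fold from the left" trick can be sketched in OCaml, the language used elsewhere on this page, where List.fold_left and List.fold_right play the roles of foldl and foldr. The name fold_right_tr is hypothetical.

```ocaml
(* fold_right_tr is a hypothetical name. List.rev and List.fold_left are
   both tail-recursive, so this version uses constant stack space. *)
let fold_right_tr f l init =
  List.fold_left (fun acc x -> f x acc) init (List.rev l)

let () =
  (* Same answers as the standard List.fold_right. *)
  assert (fold_right_tr (fun x acc -> x :: acc) [1; 2; 3] [] = [1; 2; 3]);
  assert (fold_right_tr ( - ) [10; 3; 2] 0 = List.fold_right ( - ) [10; 3; 2] 0)
```

The price is one extra pass over the list and one temporary reversed copy.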
Prefer the one that converts the given input into the intended output.
If both produce the same output such as with a sum, and if dealing with a list, folding from the left will be more efficient because the fold can begin with head element, while folding from the right will first require walking the list to find the last element before calculating the first intermediate result.
With arrays and similar random access data structures, there's probably not going to be much difference.
A compiler optimization that always chose the better of left and right would require the compiler to determine that left and right were equivalent over all possible inputs. Since foldl and foldr take functions as arguments, this is a bit of a tall order.
I'm going to keep the accepted answer here, but I had the chance to speak to my professor, and his reply was actually the opposite, because I forgot a part of my question. The code in question was building up a list, and he said:
Prefer foldr over foldl when possible, because it saves you a reverse at the end in cases where you're building up a list by appending elements during the fold.
As in, for a trivial example:
- val ls = [1, 2, 3];
val ls = [1,2,3] : int list
- val acc = (fn (x, xs) => x::xs);
val acc = fn : 'a * 'a list -> 'a list
- foldl acc [] ls;
val it = [3,2,1] : int list
- foldr acc [] ls;
val it = [1,2,3] : int list
The O(n) cost saved by avoiding a reverse is probably more important than the other differences between foldl and foldr mentioned in answers to this question.
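For comparison, the same experiment can be written in OCaml, the language used elsewhere on this page. Note that the argument order differs from SML: List.fold_left passes the accumulator first, and List.fold_right takes the list before the initial value.

```ocaml
let ls = [1; 2; 3]

(* fold_left conses each element onto the accumulator as it walks the
   list, so the result comes out reversed. *)
let reversed = List.fold_left (fun acc x -> x :: acc) [] ls

(* fold_right conses starting from the right end, so order is preserved. *)
let same_order = List.fold_right (fun x acc -> x :: acc) ls []

(* With fold_left, an explicit List.rev is needed to restore the order. *)
let restored = List.rev reversed

let () =
  assert (reversed = [3; 2; 1]);
  assert (same_order = [1; 2; 3]);
  assert (restored = ls)
```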

Haskell List complexity

Sorry if this question has already been asked; I didn't find it. And sorry for my poor English.
I'm learning Haskell and trying to use lists.
I wrote a function which transforms a list following a specific pattern. I can't check whether it works right now, but I think it does.
This function is not tail-recursive, so I think it will be horribly expensive to run it on a big list:
transform :: [Int] -> [Int]
transform list = case list of
  (1:0:1:[]) -> [1,1,1,1]
  (1:1:[]) -> [1,0,1]
  [1] -> [1,1]
  (1:0:1:0:s) -> 1:1:1:1: (transform s)
  (1:1:0:s) -> 1:0:1: (transform s)
  (1:0:s) -> 1:1: (transform s)
  (0:s) -> 0: (transform s)
So I thought about another function, which would be "better":
transform = reverse . aux []
  where
    aux buff (1:0:[1]) = (1:1:1:1:buff)
    aux buff (1:[1]) = (1:0:1:buff)
    aux buff [1] = (1:1:buff)
    aux buff (1:0:1:0:s) = aux (1:1:1:1:buff) s
    aux buff (1:1:0:s) = aux (1:0:1:buff) s
    aux buff (1:0:s) = aux (1:1:buff) s
    aux buff (0:s) = aux (0:buff) s
The problem is that I don't know how it compiles, or whether I'm wrong about the second function. Can you explain to me how lists work? Is it better to use (++) or to reverse the list at the end?
Thank you in advance for your answers
The first function is perfectly fine and in my opinion preferable to the second one.
The reason is laziness. If you have a transform someList expression in your code, the resulting list will not be evaluated unless you demand it. In particular the list will be evaluated only as far as it is needed; print $ take 10 $ transform xs will do less work than print $ take 20 $ transform xs.
In a strict language transform would indeed encumber the stack, since it would have to evaluate the whole list (in a non-tail recursive way) before returning anything of use. In Haskell transform (0:xs) evaluates to 0 : transform xs, a usable partial result. We can inspect the head of this result without touching the tail. There is no danger of stack overflow either: at any time there is at most a single unevaluated thunk (like transform xs in the previous example) in the tail of the list. If you demand more elements, the thunk will be just pushed further back, and the stack frame of the previous thunk can be garbage collected.
If we always fully evaluate the list then the performance of the two functions should be similar, or even then the lazy version could be somewhat faster because of the lack of reversing or the extra ++-s. So, by switching to the second function we lose laziness and gain no extra performance.
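The effect of laziness described above can be imitated in OCaml, the language used elsewhere on this page, although OCaml lists are strict. The sketch below uses the standard Seq module as a stand-in for Haskell's lazy lists. The rule in double_ones (emit every 1 twice) is a simplified, hypothetical example and not the asker's transform; ones_zeros and take are likewise helper names invented here.

```ocaml
(* An infinite sequence 1, 0, 1, 0, ... built lazily. *)
let rec ones_zeros () = Seq.Cons (1, fun () -> Seq.Cons (0, ones_zeros))

(* Not tail-recursive in shape, like the first transform: each element is
   produced before the rest of the input is looked at. *)
let rec double_ones (s : int Seq.t) : int Seq.t = fun () ->
  match s () with
  | Seq.Nil -> Seq.Nil
  | Seq.Cons (1, rest) -> Seq.Cons (1, fun () -> Seq.Cons (1, double_ones rest))
  | Seq.Cons (x, rest) -> Seq.Cons (x, double_ones rest)

(* Force only the first n elements of a sequence. *)
let rec take n (s : 'a Seq.t) : 'a list =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest

let () =
  (* Works on an infinite input because only six elements are demanded. *)
  assert (take 6 (double_ones ones_zeros) = [1; 1; 0; 1; 1; 0])
```

An accumulate-then-reverse version could not return anything for this input, because it would have to reach the end of the sequence before reversing.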
Your first version looks much better to me [1]. It's fine that it's not tail-recursive: you don't want it to be tail-recursive, you want it to be lazy. The second version can't produce even a single element without processing the entire input list, because in order to reverse the result of aux, the entirety of aux must be known. However,
take 10 . transform $ cycle [1,0,0,1,1,1]
would work fine with your first definition of transform, because you only consume as much of the list as you need in order to make a decision.
[1] But note that (1:0:1:[]) is just [1,0,1].

Compare int with List SML

I am just starting to learn SML and having issues with my code. I want to compare an int with List of ints and return a list of numbers less than my int
fun less(e, L): L =
if L = [] then []
else (hd[L] < e :: less tl(hd))
I want to return a list of all numbers less than e by comparing it to the list L. What am I doing wrong?
hd[L] < e :: less tl(hd)
First of all less tl(hd) is the same as just less tl hd or (less tl) hd. Writing f g(x) does not mean "apply g to x and f to the result". For that you'd need to write f (g x). In general putting parentheses around atomic expressions won't change anything and by leaving out the space before the opening parentheses, you make the syntax look like something it's not, so you should avoid that.
Then tl hd simply doesn't make much sense as hd is a function and tl expects a list. You probably meant to apply tl to L, not to hd.
Then hd [L] takes the first element of the list [L]. [L] is a list with a single element: L. So writing hd [L] is the same thing as just writing L. You probably meant to take the head of L. For that you'd just write hd L without the brackets.
Now the problem is that you're trying to prepend the result of (hd L) < e, which is a boolean, to the list. This will work (as in compile and run without error), but it will result in a list of booleans (which will contain a true for any element less than e and a false for any other element). That's not what you said you wanted. To get what you want, you should have an if which prepends the head to the list if it is less than e and doesn't prepend anything when it's not.
Fixing these problems should make your code work as intended.
PS: Usually it's preferred to use pattern matching to find out whether a list is empty and to split a non-empty list into its head and tail. The functions hd and tl are best avoided in most circumstances and so is = []. However if you haven't covered pattern matching yet, this is fine for now.
PPS: You can do what you want much more easily and without recursion by using the filter function. But again: if you haven't covered that yet, your way is fine for now.
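The two suggestions in the postscripts (pattern matching and filter) look much the same in OCaml, the language used elsewhere on this page, so here is a sketch of both there. It is not a drop-in SML solution; less_filter is a hypothetical name, and the arguments are curried rather than passed as a tuple.

```ocaml
(* Pattern-matching version: the head is kept only when it is below e. *)
let rec less e l =
  match l with
  | [] -> []
  | h :: t -> if h < e then h :: less e t else less e t

(* Same result with the library's List.filter and no explicit recursion. *)
let less_filter e l = List.filter (fun x -> x < e) l

let () =
  assert (less 3 [1; 5; 2; 3; 0] = [1; 2; 0]);
  assert (less_filter 3 [1; 5; 2; 3; 0] = [1; 2; 0])
```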
Am I wrong, or is it that you didn't compare the elements? You only catered for the empty list.
Something like:
if hd L < e then hd L :: less (e, tl L) else less (e, tl L)