TAKE function in SML

In the TAKE function which is given by
fun TAKE (xs,0) = []
| TAKE (NIL, n) = raise Subscript
| TAKE (CONS (x,xf),n) = x :: TAKE(xf(), n-1);
What are xs, x, and xf?
And could you also explain how the TAKE function works?

Your take function seems to operate over a data structure of some type like
datatype 'a stream = NIL | CONS of 'a * (unit -> 'a stream)
Your take function iterates over the stream data structure and takes n elements out of it, and returns a list containing those elements.
The identifier xs is the function parameter that holds the stream, and n is the parameter holding the number of elements you want to retrieve (i.e. take). The identifiers x and xf are pattern variables bound to the components of the CONS cell: x is the head (of type 'a) and xf is the tail (of type unit -> 'a stream), a function that produces the rest of the stream when it is called with ().
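To make the bindings concrete, here is a minimal sketch of the same idea translated to OCaml (not the original SML). The helper from and the use of Invalid_argument in place of SML's Subscript are assumptions made for illustration only.

```ocaml
(* A stream is either empty or an element plus a suspended tail. *)
type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)

(* take (xs, n): collect the first n elements of the stream xs into a list.
   Invalid_argument stands in for SML's Subscript (an assumption). *)
let rec take (xs, n) =
  match xs, n with
  | _, 0 -> []
  | Nil, _ -> raise (Invalid_argument "take")
  | Cons (x, xf), n -> x :: take (xf (), n - 1)

(* Hypothetical helper: the infinite stream k, k+1, k+2, ... *)
let rec from k = Cons (k, fun () -> from (k + 1))

(* Only three tails are ever forced, even though the stream is infinite. *)
let first_three = take (from 10, 3)
```

Here x is bound to 10, then 11, then 12, while xf is the thunk that yields the rest of the stream each time it is applied to ().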
It is my impression (based on your question) that you need a deeper understanding of SML and functional programming in general to make sense of this answer, though. Most likely you won't achieve that by asking questions here. I recommend getting a good reference book, like the ones suggested in the information section of the SML tag here on SO.
You may also want to read the section 3.5 Streams from the great book Structure and Interpretation of Computer Programs. The code in the book is in Scheme. It might take a while to get it all (if you are unfamiliar with any lisp-related language), but it is worth the effort.

Related

How to run Ocaml simulations

If I want to run a simulation of the abstract machine on the code below, how do I know what would be in the workspace, stack, and heap?
let rec map (f: 'a -> 'b) (y: 'a list): 'b list =
begin match y with
| [] -> []
| h :: t -> (f h) :: (map f t)
end in
let x = map (fun t -> t + 1) [0; 1; 2] in
0 :: x
You're using terminology that we (people reading StackOverflow) don't share. In particular, we don't know what abstract machine you're talking about.
Speaking very generally about computer systems, the workspace usually contains the current definitions. At the beginning it might contain predefined functions and so on. Your code uses some predefined type names (like 'a list) and constructors (like ::). I don't know if they need to be in your workspace specifically.
Very possibly your definitions of map and x would need to be loaded into the workspace as the first step.
The stack contains a record of the functions that have been called but haven't returned. In general it will be empty when you start an evaluation.
The heap is a general term for the set of values that exist at the moment but might disappear later (when no longer needed). Unless you count your values named map and x, the heap could also be empty at the beginning.
Sorry I can't be more specific. It sounds like you're taking a class, and you might want to consult some of the class resources (including the professor or TA :-)

How does this nested fold_left work? and what is ~f: and ~init:?

I have this code snippet in OCaml which is taken from here. I know it fills a data structure for a demand (traffic matrix) with the specified value, and when the two hosts are the same it just fills in the value 0. In Python or any imperative language, we would use two nested for loops to do the task. I assume this is the reason we have two (fold_left) calls in this code, each one equivalent to one for loop (I might be mistaken!). My question is: how does this code work? And what are ~f: and ~init:? Are these labels? If they are labels, why does the compiler complain when I remove them or change them, even when I put the arguments in the right order?
I have finished one book and have watched a lot of YouTube videos but still find it difficult to understand most OCaml code.
let create_3cycle_input () =
  let topo = Net.Parse.from_dotfile "./data/topologies/3cycle.dot" in
  let hosts = get_hosts topo in
  let demands =
    List.fold_left
      hosts
      ~init:SrcDstMap.empty
      ~f:(fun acc u ->
        List.fold_left
          hosts
          ~init:acc
          ~f:(fun acc v ->
            let r = if u = v then 0.0 else 53. in
            SrcDstMap.set acc ~key:(u,v) ~data:r)) in
  (hosts,topo,demands);;
Please read my other SO answer that explains how fold_left works. Once you understand how a single fold works, we can move on to the nested case (as well as to the labels).
When you have a collection of collections, i.e., when each element of a collection is itself a collection, and you want to iterate over each element of those inner collections, then you need to nest your folds. A good example is a matrix, which can be seen as a collection of vectors, where the vectors are themselves collections.
The iteration algorithm is simple,
state := init
for each inner-collection in outer-collection do
  for each element in inner-collection do
    state := user-function(state, element)
  done
done
Or, the same in OCaml (using the Core version of the fold)
let fold_list_of_lists outer ~init ~f =
  List.fold outer ~init ~f:(fun state inner ->
    List.fold inner ~init:state ~f:(fun state elt ->
      f state elt))
This function will have type 'a list list -> init:'b -> f:('b -> 'a -> 'b) -> 'b
and will be applicable to any list of lists.
Concerning the labels and their removal: labels are keyword arguments that let you pass arguments to a function in any order, which is very useful when a function takes many arguments. Omitting labels is sometimes possible, but this can be disallowed with a compiler option. The Core library (which the project you referenced uses) disallows omitting labels, probably for good reason.
In general, labels can be omitted only if the application is total, i.e., when the returned value is not itself a function. Since fold_left returns a value whose type is a type variable, it could always be a function, so the application is never known to be total and we always need to use labels with Core's List.fold (and List.fold_left) function.
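As a rough illustration of both points (nesting and labels), here is a sketch that uses only the OCaml standard library rather than Core. ListLabels.fold_left is the standard library's labeled fold; PairMap, the host names, and the use of PairMap.add in place of SrcDstMap.set are assumptions made for the example, not the project's actual definitions.

```ocaml
(* Keys are (source, destination) pairs of host names (hypothetical stand-in
   for the project's SrcDstMap). *)
module PairMap = Map.Make (struct
  type t = string * string
  let compare = compare
end)

(* Hypothetical host list; the real code reads hosts from a topology file. *)
let hosts = [ "h1"; "h2"; "h3" ]

(* The outer fold walks over sources u, the inner fold over destinations v.
   The accumulator acc is threaded through both, like a variable updated
   inside two nested for loops. *)
let demands =
  ListLabels.fold_left
    ~init:PairMap.empty
    ~f:(fun acc u ->
      ListLabels.fold_left
        ~init:acc
        ~f:(fun acc v ->
          let r = if u = v then 0.0 else 53. in
          PairMap.add (u, v) r acc)
        hosts)
    hosts
```

Note that ~init is passed before ~f here even though ListLabels.fold_left declares ~f first; because the arguments carry labels, their order does not matter.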

Pack consecutive duplicates of list elements into sublists in Ocaml

I found this problem in the website 99 problems in ocaml. After some thinking I solved it by breaking the problem into a few smaller subproblems. Here is my code:
let rec frequency x l =
  match l with
  | [] -> 0
  | h :: t -> if x = [h] then 1 + (frequency x t)
              else frequency x t
;;
let rec expand x n =
  match n with
  | 0 -> []
  | 1 -> x
  | _ -> (expand x (n-1)) @ x
;;
let rec deduct a b =
  match b with
  | [] -> []
  | h :: t -> if a = [h] then (deduct a t)
              else [h] @ (deduct a t)
;;
let rec pack l =
  match l with
  | [] -> []
  | h :: t -> [(expand [h] (frequency [h] l))] @ (pack (deduct [h] t))
;;
It is rather clear that this implementation is overkill, as I have to count the frequency of every element in the list, expand this and remove the identical elements from the list, then repeat the procedure. The algorithm complexity is about O(N*(N+N+N))=O(N^2) and would not work with large lists, even though it achieved the required purpose. I tried to read the official solution on the website, which says:
# let pack list =
let rec aux current acc = function
| [] -> [] (* Can only be reached if original list is empty *)
| [x] -> (x :: current) :: acc
| a :: (b :: _ as t) ->
if a = b then aux (a :: current) acc t
else aux [] ((a :: current) :: acc) t in
List.rev (aux [] [] list);;
val pack : 'a list -> 'a list list = <fun>
The code should be better as it is more concise and does the same thing. But I am confused by the use of "aux current acc" inside it. It seems to me that the author has created a new function inside the "pack" function and, after some elaborate procedure, gets the desired result using List.rev, which reverses the list. What I do not understand is:
1) What is the point of using this, which makes the code very hard to read at first sight?
2) What is the benefit of using an accumulator and an auxiliary function, nested inside another function, that takes 3 inputs? Did the author implicitly use tail recursion or something?
3) Is there any way to modify the program so that it packs all duplicates like my program does?
These are questions mostly of opinion rather than fact.
1) Your code is far harder to understand, in my opinion.
2a) It's very common to use auxiliary functions in OCaml and other functional languages. You should think of it more like nested curly braces in a C-like language rather than as something strange.
2b) Yes, the code is using tail recursion, which yours doesn't. You might try giving your code a list of (say) 200,000 distinct elements. Then try the same with the official solution. You might try determining the longest list of distinct values your code can handle, then try timing the two different implementations for that length.
2c) In order to write a tail-recursive function, it's sometimes necessary to reverse the result at the end. This just adds a linear cost, which is often not enough to notice.
3) I suspect your code doesn't solve the problem as given. If you're only supposed to compress adjacent elements, your code doesn't do this. If you wanted to do what your code does with the official solution you could sort the list beforehand. Or you could use a map or hashtable to keep counts.
Generally speaking, the official solution is far better than yours in many ways. Again, you're asking for an opinion and this is mine.
Update
The official solution uses an auxiliary function named aux that takes three parameters: the currently accumulated sublist (some number of repetitions of the same value), the currently accumulated result (in reverse order), and the remaining input to be processed.
The invariant is that all the values in the first parameter (named current) are the same as the head value of the unprocessed list. Initially this is true because current is empty.
The function looks at the first two elements of the unprocessed list. If they're the same, it adds the first of them to the beginning of current and continues with the tail of the list (all but the first). If they're different, it wants to start accumulating a different value in current. It does this by adding current (with the one extra value added to the front) to the accumulated result, then continuing to process the tail with an empty value for current. Note that both of these maintain the invariant.
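To see the behaviour described above on a concrete input, here is the official solution restated with indentation, followed by two calls. The sample list and the names adjacent and all_dups are made up for illustration; the second call sketches the "sort the list beforehand" suggestion from point 3.

```ocaml
(* The official solution from the question, unchanged apart from layout. *)
let pack list =
  let rec aux current acc = function
    | [] -> [] (* Can only be reached if original list is empty *)
    | [ x ] -> (x :: current) :: acc
    | a :: (b :: _ as t) ->
        if a = b then aux (a :: current) acc t
        else aux [] ((a :: current) :: acc) t
  in
  List.rev (aux [] [] list)

(* Only adjacent duplicates are grouped: the trailing "a" gets its own group. *)
let adjacent = pack [ "a"; "a"; "b"; "c"; "c"; "a" ]

(* Sorting first makes equal elements adjacent, so every duplicate is packed. *)
let all_dups = pack (List.sort compare [ "a"; "a"; "b"; "c"; "c"; "a" ])
```

With this input, adjacent is [["a"; "a"]; ["b"]; ["c"; "c"]; ["a"]] and all_dups is [["a"; "a"; "a"]; ["b"]; ["c"; "c"]].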

foldl vs foldr: which should I prefer?

I remember that when I showed some code that I wrote to my professor he remarked, offhand, that
It rarely matters, but it's worth noting that fold* is a little bit more efficient than fold*' in SML/NJ, so you should prefer it over fold* when possible.
I forget whether fold* was foldr or foldl. I know that this is one of those micro-optimization things that probably doesn't make a big difference in practice, but I'd like to be in the habit of using the more efficient one when I have the choice.
Which is which? My guess is that this is SML/NJ specific and that MLton will be smart enough to optimize both down to the same machine code, but answers for other compilers are good to know.
foldl is tail-recursive, while foldr is not. You can, however, do foldr in a tail-recursive way by first reversing the list (reversal is itself tail-recursive) and then doing a foldl.
This is only going to matter if you are folding over huge lists.
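The reverse-then-fold-left trick can be sketched as follows. This is written in OCaml rather than SML, and the name fold_right_tr is made up for the example; List.fold_left and List.rev are the standard library functions.

```ocaml
(* A tail-recursive right fold: reverse the list, then fold from the left,
   flipping the argument order of f so it still sees (element, accumulator). *)
let fold_right_tr f xs init =
  List.fold_left (fun acc x -> f x acc) init (List.rev xs)

(* Consing during a right fold preserves the original order... *)
let copied = fold_right_tr (fun x acc -> x :: acc) [ 1; 2; 3 ] []

(* ...while consing during a left fold reverses it. *)
let reversed = List.fold_left (fun acc x -> x :: acc) [] [ 1; 2; 3 ]
```

Here copied is [1; 2; 3] and reversed is [3; 2; 1]. The extra List.rev costs one more linear pass over the list.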
Prefer the one that converts the given input into the intended output.
If both produce the same output, such as with a sum, and you are dealing with a list, folding from the left will be more efficient because the fold can begin with the head element, while folding from the right first requires walking the list to find the last element before calculating the first intermediate result.
With arrays and similar random access data structures, there's probably not going to be much difference.
A compiler optimization that always chose the better of left and right would require the compiler to determine that left and right were equivalent over all possible inputs. Since foldl and foldr take functions as arguments, this is a bit of a tall order.
I'm going to keep the accepted answer here, but I had the chance to speak to my professor, and his reply was actually the opposite, because I forgot a part of my question. The code in question was building up a list, and he said:
Prefer foldr over foldl when possible, because it saves you a reverse at the end in cases where you're building up a list by appending elements during the fold.
As in, for a trivial example:
- val ls = [1, 2, 3];
val ls = [1,2,3] : int list
- val acc = (fn (x, xs) => x::xs);
val acc = fn : 'a * 'a list -> 'a list
- foldl acc [] ls;
val it = [3,2,1] : int list
- foldr acc [] ls;
val it = [1,2,3] : int list
Saving an O(n) reverse is probably more important than the other differences between foldl and foldr mentioned in answers to this question.

Haskell List complexity

Sorry if this question has already been asked; I didn't find it. And sorry for my poor English.
I'm learning Haskell and trying to use lists.
I wrote a function which transforms a list following a specific pattern. I can't check whether it works right now, but I think it does.
This function is not tail-recursive, so I think it will be horrible to evaluate it on a big list:
transform :: [Int] -> [Int]
transform list = case list of
  (1:0:1:[]) -> [1,1,1,1]
  (1:1:[]) -> [1,0,1]
  [1] -> [1,1]
  (1:0:1:0:s) -> 1:1:1:1: (transform s)
  (1:1:0:s) -> 1:0:1: (transform s)
  (1:0:s) -> 1:1: (transform s)
  (0:s) -> 0: (transform s)
So I thought about another function, which would be "better":
transform = reverse . aux []
  where
    aux buff (1:0:[1]) = (1:1:1:1:buff)
    aux buff (1:[1]) = (1:0:1:buff)
    aux buff [1] = (1:1:buff)
    aux buff (1:0:1:0:s) = aux (1:1:1:1:buff) s
    aux buff (1:1:0:s) = aux (1:0:1:buff) s
    aux buff (1:0:s) = aux (1:1:buff) s
    aux buff (0:s) = aux (0:buff) s
The problem is that I don't know how it compiles, or whether I'm wrong about the second function. Can you explain to me how lists work? Is it better to use (++) or to reverse the list at the end?
Thank you in advance for your answers
The first function is perfectly fine and in my opinion preferable to the second one.
The reason is laziness. If you have a transform someList expression in your code, the resulting list will not be evaluated unless you demand it. In particular the list will be evaluated only as far as it is needed; print $ take 10 $ transform xs will do less work than print $ take 20 $ transform xs.
In a strict language transform would indeed encumber the stack, since it would have to evaluate the whole list (in a non-tail recursive way) before returning anything of use. In Haskell transform (0:xs) evaluates to 0 : transform xs, a usable partial result. We can inspect the head of this result without touching the tail. There is no danger of stack overflow either: at any time there is at most a single unevaluated thunk (like transform xs in the previous example) in the tail of the list. If you demand more elements, the thunk will be just pushed further back, and the stack frame of the previous thunk can be garbage collected.
If we always fully evaluate the list, then the performance of the two functions should be similar; the lazy version could even be somewhat faster because it avoids the reversal and any extra ++-s. So, by switching to the second function we lose laziness and gain no extra performance.
Your first version looks much better to me [1]. It's fine that it's not tail-recursive: you don't want it to be tail-recursive, you want it to be lazy. The second version can't produce even a single element without processing the entire input list, because in order to reverse the result of aux, the entirety of aux must be known. However,
take 10 . transform $ cycle [1,0,0,1,1,1]
would work fine with your first definition of transform, because you only consume as much of the list as you need in order to make a decision.
[1] But note that (1:0:1:[]) is just [1,0,1].