Split a list in two and preserve order - ocaml

How do you efficiently split a list in 2, preserving the order of the elements?
Here's an example of input and expected output
[] should produce ([],[])
[1;] can produce ([1;], []) or ([], [1;])
[1;2;3;4;] should produce ([1; 2;], [3; 4;])
[1;2;3;4;5;] can produce ([1;2;3;], [4;5;]) or ([1;2;], [3;4;5;])
I tried a few things but I'm unsure which is the most efficient... Maybe there is a solution out there that I'm missing completely (calls to C code don't count).
My first attempt was to use List's partition function with a ref to 1/2 the length of the list. This works but you walk through the whole list when you only need to cover half.
let split_list2 l =
let len = ref ((List.length l) / 2) in
List.partition (fun _ -> if !len = 0 then false else (len := !len - 1; true)) l
My next attempt was to use an accumulator and then reverse it. This only walks through half the list but I call reverse to correct the order of the accumulator.
let split_list4 l =
let len = List.length l in
let rec split_list4_aux ln acc lst =
if ln < 1
then
(List.rev acc, lst)
else
match lst with
| [] -> failwith "Invalid split"
| hd::tl ->
split_list4_aux (ln - 1) (hd::acc) tl in
split_list4_aux (len / 2) [] l
My final attempt used function closures for the accumulator and it works but I have no idea how efficient closures are.
let split_list3 l =
let len = List.length l in
let rec split_list3_aux ln func lst =
if ln < 1
then
(func [], lst)
else
match lst with
| hd::tl -> split_list3_aux (ln - 1) (fun t -> func (hd::t)) tl
| _ -> failwith "Invalid split" in
split_list3_aux (len / 2) (fun t -> t) l
So is there a standard way to split a list in OCaml (preserving element order) that's most efficient?

You need to traverse the whole list for all of your solutions. The List.length function traverses the whole list. But it's true that your later solutions re-use the tail of the original list rather than constructing a new list.
It is difficult to say how fast any given bit of code is going to be just by inspection. Generally it's good enough to think in asymptotic O(f(n)) terms, then work on slow functions in detail through timing tests (on realistic data).
All of your answers look to be O(n), which is the best you can do since you clearly need to know the length of the list to get the answer.
Your split_list2 and split_list3 solutions look pretty complicated to me, so I would expect (intuitively) them to be slower. A closure is a fairly complicated data structure containing a function and the environment of accessible variables. So it's probably not all that fast to construct one.
Your split_list4 solution is what I would code up myself.
If you really care about timings you should time your solutions on some long lists. Keep in mind that you might get different timings on different systems.
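As a rough illustration of that last suggestion, here is a minimal timing sketch. The names split_rev and time are made up for this example (split_rev follows the same idea as split_list4), and Sys.time reports processor time, so treat the numbers as indicative only.

```ocaml
(* split_rev follows the split_list4 idea: count, accumulate half, reverse. *)
let split_rev l =
  let rec aux n acc rest =
    match rest with
    | hd :: tl when n > 0 -> aux (n - 1) (hd :: acc) tl
    | _ -> (List.rev acc, rest)
  in
  aux (List.length l / 2) [] l

(* time f x applies f to x `runs` times and returns the CPU seconds used.
   The helper and its default of 10 runs are assumptions for this sketch. *)
let time ?(runs = 10) f x =
  let start = Sys.time () in
  for _i = 1 to runs do
    ignore (f x)
  done;
  Sys.time () -. start

let () =
  let big = List.init 1_000_000 (fun i -> i) in
  Printf.printf "split_rev: %.3fs\n" (time split_rev big)
```

Swapping another candidate in for split_rev gives a like-for-like comparison on the same input.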

Couldn't give up on this question. I had to find a way to walk through the list only once and still create a split with the order preserved.
How about this?
let split lst =
let cnt = ref 0 in
let acc = ref ([], []) in
let rec split_aux c l =
match l with
| [] -> cnt := (c / 2)
| hd::tl ->
(
split_aux (c + 1) tl;
let (f, s) = (!acc) in
if c < (!cnt)
then
acc := ((hd::f), s)
else
acc := (f, hd::s)
)
in
split_aux 0 lst; !acc
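For comparison, another way to avoid a separate length pass is the slow/fast cursor trick: advance one cursor by one element and another by two, and stop when the fast one runs out. This is a sketch of a different technique from the code above, not a rewrite of it, and split_half is a hypothetical name.

```ocaml
(* split_half walks the list with two cursors. `slow` moves one element per
   step and `fast` moves two, so `slow` sits at the midpoint when `fast`
   is exhausted. The second half is the original tail, shared, not copied. *)
let split_half lst =
  let rec go acc slow fast =
    match slow, fast with
    | x :: slow', _ :: _ :: fast' -> go (x :: acc) slow' fast'
    | _ -> (List.rev acc, slow)
  in
  go [] lst lst
```

The helper is tail-recursive, and for odd lengths the extra element ends up in the second half.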

Related

SML- how to look at a string and put letters a-z into a list (only once)

I have seen some similar questions, but nothing that really helped me. Basically the title says it all. Using SML I want to take a string that I have, and make a list containing each letter found in the string. Any help would be greatly appreciated.
One possibility is to use the basic logic of quicksort to sort the letters while removing duplicates at the same time. Something like:
fun distinctChars []:char list = []
| distinctChars (c::cs) =
let val smaller = List.filter (fn x => x < c) cs
val bigger = List.filter (fn x => x > c) cs
in distinctChars smaller @ [c] @ distinctChars bigger
end
If the < and > in the definitions of smaller and bigger were to be replaced by <= and >= then it would simply be an implementation of quicksort (although not the most efficient one since it makes two passes over cs when a suitably defined auxiliary function could split into smaller and bigger in just one pass). The strict inequalities have the effect of throwing away duplicates.
To get what you want from here, do something like explode the string into a list of chars, remove non-alphabetical characters from the resulting list, while simultaneously converting to lower case, then invoke the above function -- ideally first refined so that it uses a custom split function rather than List.filter twice.
On Edit: @ is an expensive operator and probably results in the naïve SML quicksort not being all that quick. You can use the above idea of a modified sort, but one that modifies mergesort instead of quicksort:
fun split ls =
let fun split' [] (xs,ys) = (xs,ys)
| split' (a::[]) (xs, ys) = (a::xs,ys)
| split' (a::b::cs) (xs, ys) = split' cs (a::xs, b::ys)
in split' ls ([],[])
end
fun mergeDistinct ([], ys) = ys:char list
| mergeDistinct (xs, []) = xs
| mergeDistinct (x::xs, y::ys) =
if x < y then x::mergeDistinct(xs,y::ys)
else if x > y then y::mergeDistinct(x::xs,ys)
else mergeDistinct(x::xs, ys)
fun distinctChars [] = []
| distinctChars [c] = [c]
| distinctChars chars =
let val (xs,ys) = split chars
in mergeDistinct (distinctChars xs, distinctChars ys)
end
You can get a list of all the letters in a few different ways:
val letters = [#"a",#"b",#"c",#"d",#"e",#"f",#"g",#"h",#"i",#"j",#"k",#"l",#"m",#"n",#"o",#"p",#"q",#"r",#"s",#"t",#"u",#"v",#"w",#"x",#"y",#"z"]
val letters = explode "abcdefghijklmnopqrstuvwxyz"
val letters = List.tabulate (26, fn i => chr (i + ord #"a"))
Update: Looking at your question and John's answer, I might have misunderstood your intention. An efficient way to iterate over a string and gather some result (e.g. a set of characters) could be to write a "foldr for strings":
fun string_foldr f acc0 s =
let val len = size s
fun loop i acc = if i < len then loop (i+1) (f (String.sub (s, i), acc)) else acc
in loop 0 acc0 end
Given an implementation of sets with at least setEmpty and setInsert, one could then write:
val setLetters = string_foldr (fn (c, ls) => setInsert ls c) setEmpty "some sentence"
The simplest solution I can think of:
To get the distinct elements of a list:
1. Take the head.
2. Remove that value from the tail and get the distinct elements of the result.
3. Put 1 and 2 together.
In code:
(* Return the distinct elements of a list *)
fun distinct [] = []
| distinct (x::xs) = x :: distinct (List.filter (fn c => x <> c) xs);
(* All the distinct letters, in lower case. *)
fun letters s = distinct (List.map Char.toLower (List.filter Char.isAlpha (explode s)));
(* Variation: "point-free" style *)
val letters' = distinct o (List.map Char.toLower) o (List.filter Char.isAlpha) o explode;
This is probably not the most efficient solution, but it's uncomplicated.

OCaml. Return first n elements of a list

I am new to OCaml and functional programming as a whole. I am working on a part of an assignment where I must simply return the first n elements of a list. I am not allowed to use List.length.
I feel that what I have written is probably overly complicated for what I'm trying to accomplish. What my code attempts to do is concatenate the front of the list to the end until n is decremented to 1. At that point the head moves a further n-1 spots to the tail of the list, and the tail is returned. Again, I realize that there is probably a much simpler way to do this, but I am stumped and probably showing my inability to grasp functional programming.
let rec take n l =
let stopNum = 0 - (n - 1) in
let rec subList n lst =
match lst with
| hd::tl -> if n = stopNum then (tl)
else if (0 - n) = 0 then (subList (n - 1 ) tl )
else subList (n - 1) (tl @ [hd])
| [] -> [] ;;
My compiler tells me that I have a syntax error on the last line. I get the same result regardless of whether "| [] -> []" is the last line or the one above it. The syntax error does not exist when I take out the nested subList let. Clearly there is something about nested lets that I am just not understanding.
Thanks.
let rec firstk k xs = match xs with
| [] -> failwith "firstk"
| x::xs -> if k=1 then [x] else x::firstk (k-1) xs;;
You might have been looking for this one.
What you have to do here, is to iterate on your initial list l and then add elements of this list in an accumulator until n is 0.
let take n l =
let rec sub_list n accu l =
match l with
| [] -> accu (* here the list is now empty, return the partial result *)
| hd :: tl ->
if n = 0 then accu (* if you reach your limit, return your result *)
else (* make the call to the recursive sub_list function:
- decrement n,
- add hd to the accumulator,
- call with the rest of the list (tl)*)
in
sub_list n [] l
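For reference, one way the missing else branch might be filled in is sketched below. This is an assumption about what the answer intends. Note that the accumulator collects elements in reverse, so this version reverses it before returning.

```ocaml
let take n l =
  let rec sub_list n accu l =
    match l with
    | [] -> List.rev accu              (* list exhausted: return what we have *)
    | hd :: tl ->
      if n <= 0 then List.rev accu     (* limit reached: return the result *)
      else sub_list (n - 1) (hd :: accu) tl
  in
  sub_list n [] l
```

With this sketch, asking for more elements than the list holds simply returns the whole list.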
Since you're just starting with FP, I suggest you look for the simplest and most elegant solution. What you're looking for is a way to solve the problem for n by building it up from a solution for a smaller problem.
So the key question is: how could you produce the first n elements of your list if you already had a function that could produce the first (n - 1) elements of a list?
Then you need to solve the "base" cases, the cases that are so simple that the answer is obvious. For this problem I'd say there are two base cases: when n is 0, the answer is obvious; when the list is empty, the answer is obvious.
If you work this through you get a fairly elegant definition.
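A sketch of where that reasoning can lead, assuming the two base cases described above (first_n is a hypothetical name, not from the question):

```ocaml
(* first_n n l: the first n elements of l, or all of l if it is shorter. *)
let rec first_n n l =
  if n <= 0 then []                        (* base case: nothing left to take *)
  else
    match l with
    | [] -> []                             (* base case: list is empty *)
    | x :: rest -> x :: first_n (n - 1) rest
```

The last line is the "smaller problem" step: the first n elements are the head followed by the first (n - 1) elements of the tail.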

Ocaml list of ints to list of int lists (Opposite of flattening)

With a list of integers such as:
[1;2;3;4;5;6;7;8;9]
How can I create a list of list of ints from the above, with all new lists the same specified length?
For example, I need to go from:
[1;2;3;4;5;6;7;8;9] to [[1;2;3];[4;5;6];[7;8;9]]
with the number to split being 3?
Thanks for your time.
So what you actually want is a function of type
val split : int list -> int -> int list list
that takes a list of integers and a sub-list-size. How about one that is even more general?
val split : 'a list -> int -> 'a list list
Here comes the implementation:
let split xs size =
  let (_, r, rs) =
    (* fold over the list, keeping track of how many elements are still
       missing in the current list (csize), the current list (ys) and
       the result list (zss) *)
    List.fold_left (fun (csize, ys, zss) elt ->
      (* if target size is 0, add the current list to the target list and
         start a new empty current list of target-size size *)
      if csize = 0 then (size - 1, [elt], zss @ [ys])
      (* otherwise decrement the target size and append the current element
         elt to the current list ys *)
      else (csize - 1, ys @ [elt], zss))
      (* start the accumulator with target-size=size, an empty current list and
         an empty target-list *)
      (size, [], []) xs
  in
  (* add the "left-overs" to the back of the target-list *)
  rs @ [r]
Please let me know if you get extra points for this! ;)
The code you give is a way to remove a given number of elements from the front of a list. One way to proceed might be to leave this function as it is (maybe clean it up a little) and use an outer function to process the whole list. For this to work easily, your function might also want to return the remainder of the list (so the outer function can easily tell what still needs to be segmented).
It seems, though, that you want to solve the problem with a single function. If so, the main thing I see that's missing is an accumulator for the pieces you've already snipped off. And you also can't quit when you reach your count, you have to remember the piece you just snipped off, and then process the rest of the list the same way.
If I were solving this myself, I'd try to generalize the problem so that the recursive call could help out in all cases. Something that might work is to allow the first piece to be shorter than the rest. That way you can write it as a single function, with no accumulators (just recursive calls).
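One possible reading of that generalization is sketched below. The names chunks_from and chunks are hypothetical, and the sketch assumes n is at least 1. The extra parameter k says how many elements the first piece may still take, and every later piece takes n.

```ocaml
(* chunks_from k n l: the first piece holds at most k elements, all later
   pieces hold n (the last one may be shorter). No accumulator is used;
   the head is simply pushed onto the first piece of the recursive result. *)
let rec chunks_from k n l =
  match l with
  | [] -> []
  | x :: rest ->
    if k <= 1 then [x] :: chunks_from n n rest
    else
      match chunks_from (k - 1) n rest with
      | [] -> [[x]]
      | first :: others -> (x :: first) :: others

let chunks n l = chunks_from n n l
```

This version is not tail-recursive, so very long lists may need the accumulator style instead.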
I would probably do it this way:
let split lst n =
let rec parti n acc xs =
match xs with
| [] -> (List.rev acc, [])
| _::_ when n = 0 -> (List.rev acc, xs)
| x::xs -> parti (pred n) (x::acc) xs
in let rec concat acc = function
| [] -> List.rev acc
| xs -> let (part, rest) = parti n [] xs in concat (part::acc) rest
in concat [] lst
Note that we are being lenient if n doesn't divide List.length lst evenly.
Example:
split [1;2;3;4;5] 2 gives [[1;2];[3;4];[5]]
Final note: the code is very verbose because the OCaml standard lib is very bare bones :/ With a different lib I'm sure this could be made much more concise.
let rec split n xs =
let rec take k xs ys = match k, xs with
| 0, _ -> List.rev ys :: split n xs
| _, [] -> if ys = [] then [] else [List.rev ys]
| _, x::xs' -> take (k - 1) xs' (x::ys)
in take n xs []

How to partition a list with a given group size?

I'm looking for the best way to partition a list (or seq) so that groups have a given size.
for ex. let's say I want to group with size 2 (this could be any other number though):
let xs = [(a,b,c); (a,b,d); (y,z,y); (w,y,z); (n,y,z)]
let grouped = partitionBySize 2 xs
// => [[(a,b,c);(a,b,d)]; [(y,z,y);(w,y,z)]; [(n,y,z)]]
The obvious way to implement partitionBySize would be by adding the position to every tuple in the input list so that it becomes
[(0,a,b,c); (1,a,b,d); (2,y,z,y); (3,w,y,z); (4,n,y,z)]
and then use Seq.groupBy with
xs |> Seq.ofList |> Seq.groupBy (function | (i,_,_,_) -> i - (i % n))
However this solution doesn't look very elegant to me.
Is there a better way to implement this function (maybe with a built-in function)?
This seems to be a repeating pattern that's not captured by any function in the F# core library. When solving similar problems earlier, I defined a function Seq.groupWhen (see F# snippets) that turns a sequence into groups. A new group is started when the predicate holds.
You could solve the problem using Seq.groupWhen similarly to Seq.groupBy (by starting a new group at even index). Unlike with Seq.groupBy, this is efficient, because Seq.groupWhen iterates over the input sequence just once:
[3;3;2;4;1;2;8]
|> Seq.mapi (fun i v -> i, v) // Add indices to the values (as first tuple element)
|> Seq.groupWhen (fun (i, v) -> i%2 = 0) // Start new group after every 2nd element
|> Seq.map (Seq.map snd) // Remove indices from the values
Implementing the function directly using recursion is probably easier - the solution from John does exactly what you need - but if you wanted to see a more general approach then Seq.groupWhen may be interesting.
List.chunkBySize (hat tip: Scott Wlaschin) is now available and does exactly what you're talking about. It appears to be new with F# 4.0.
let grouped = [1..10] |> List.chunkBySize 3
// val grouped : int list list =
// [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]; [10]]
Seq.chunkBySize and Array.chunkBySize are also now available.
Here's a tail-recursive function that traverses the list once.
let chunksOf n items =
    let rec loop i acc items =
        seq {
            match i, items, acc with
            //exit if chunk size is zero or input list is empty
            | _, [], [] | 0, _, [] -> ()
            //counter=0 so yield group and continue looping
            | 0, _, _::_ -> yield List.rev acc; yield! loop n [] items
            //decrement counter, add head to group, and loop through tail
            | _, h::t, _ -> yield! loop (i-1) (h::acc) t
            //reached the end of input list, yield accumulated elements
            //handles items.Length % n <> 0
            | _, [], _ -> yield List.rev acc
        }
    loop n [] items
Usage
[1; 2; 3; 4; 5]
|> chunksOf 2
|> Seq.toList //[[1; 2]; [3; 4]; [5]]
I like the elegance of Tomas' approach, but I benchmarked both our functions using an input list of 10 million elements. This one clocked in at 9 secs vs 22 for his. Of course, as he admitted, the most efficient method would probably involve arrays/loops.
What about a recursive approach? - only requires a single pass
let rec partitionBySize length inp dummy =
    match inp with
    | h::t ->
        if dummy |> List.length < length then
            partitionBySize length t (h::dummy)
        else dummy::(partitionBySize length t (h::[]))
    | [] -> dummy::[]
Then invoke it with partitionBySize 2 xs []
let partitionBySize size xs =
    let sq = ref (seq xs)
    seq {
        while (Seq.length !sq >= size) do
            yield Seq.take size !sq
            sq := Seq.skip size !sq
        if not (Seq.isEmpty !sq) then yield !sq
    }
    // result to list, if you want
    |> Seq.map (Seq.toList)
    |> Seq.toList
UPDATE
let partitionBySize size (sq:seq<_>) =
    seq {
        let e = sq.GetEnumerator()
        let empty = ref true;
        while !empty do
            yield seq { for i = 1 to size do
                            empty := e.MoveNext()
                            if !empty then yield e.Current
                      }
    }
array slice version:
let partitionBySize size xs =
    let xa = Array.ofList xs
    let len = xa.Length
    [
        for i in 0..size..(len-1) do
            yield ( if i + size >= len then xa.[i..] else xa.[i..(i+size-1)] ) |> Array.toList
    ]
Well, I was late for the party. The code below is a tail-recursive version using higher-order functions on List:
let partitionBySize size xs =
    let i = size - (List.length xs - 1) % size
    let xss, _, _ =
        List.foldBack( fun x (acc, ls, j) ->
            if j = size then ((x::ls)::acc, [], 1)
            else (acc, x::ls, j+1)
        ) xs ([], [], i)
    xss
I did the same benchmark as Daniel did. This function is efficient: it is 2x faster than his approach on my machine. I also compared it with an array/loop version; the two are comparable in terms of performance.
Moreover, unlike John's answer, this version preserves order of elements in inner lists.

Combine Lists with Same Heads in a 2D List (OCaml)

I'm working with a list of lists in OCaml, and I'm trying to write a function that combines all of the lists that share the same head. This is what I have so far, and I make use of the List.hd built-in function, but not surprisingly, I'm getting the failure "hd" error:
let rec combineSameHead list nlist = match list with
| [] -> []@nlist
| h::t -> if List.hd h = List.hd (List.hd t)
then combineSameHead t nlist@uniq(h@(List.hd t))
else combineSameHead t nlist@h;;
So for example, if I have this list:
[[Sentence; Quiet]; [Sentence; Grunt]; [Sentence; Shout]]
I want to combine it into:
[[Sentence; Quiet; Grunt; Shout]]
The function uniq I wrote just removes all duplicates within a list. Please let me know how I would go about completing this. Thanks in advance!
For one thing, I generally avoid functions like List.hd, as pattern matching is usually clearer and less error-prone. In this case, your if can be replaced with guarded patterns (a when clause after the pattern). I think what is happening to cause your error is that your code fails when t is []; guarded patterns help avoid this by making the cases more explicit. So, you can do (x::xs)::(y::ys)::t when x = y as a clause in your match expression to check that the heads of the first two elements of the list are the same. It's not uncommon in OCaml to have several successive patterns which are identical except for guards.
Further things: you don't need []@nlist - it's the same as just writing nlist.
Also, it looks like your nlist@h and similar expressions are trying to concatenate lists before passing them to the recursive call; in OCaml, however, function application binds more tightly than any operator, so it actually appends h to the result of the recursive call.
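A small illustration of that precedence rule, using a made-up helper double_all and the list append operator @:

```ocaml
let double_all l = List.map (fun x -> 2 * x) l

(* Application binds tighter than @, so this is (double_all [1; 2]) @ [3]. *)
let a = double_all [1; 2] @ [3]       (* [2; 4; 3] *)

(* Parentheses are needed to append first and then apply the function. *)
let b = double_all ([1; 2] @ [3])     (* [2; 4; 6] *)
```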
I don't, off-hand, have a correct version of the function. But I would start by writing it with guarded patterns, and then see how far that gets you in working it out.
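As a starting point, a sketch with guarded patterns might look like the following. It only merges lists whose equal heads are adjacent and leaves out the uniq step; merge_adjacent is a hypothetical name.

```ocaml
(* Merge neighbouring lists that start with the same element. *)
let rec merge_adjacent = function
  | (x :: xs) :: (y :: ys) :: t when x = y ->
    (* same head: fold the second list's tail into the first and retry *)
    merge_adjacent ((x :: (xs @ ys)) :: t)
  | l :: t -> l :: merge_adjacent t
  | [] -> []
```

Because the empty and one-element cases are separate patterns, nothing here can raise the hd failure from the question.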
Your intended operation has a simple recursive description: recursively process the tail of your list, then perform an "insert" operation with the head which looks for a list that begins with the same head and, if found, inserts all elements but the head, and otherwise appends it at the end. You can then reverse the result to get your intended list of list.
In OCaml, this algorithm would look like this:
let process list =
  let rec insert (head,tail) = function
    (* no list with this head yet: start a new one at the end *)
    | [] -> [head :: tail]
    | h :: t ->
      match h with
      (* same head: put the new elements in front of the existing ones *)
      | hh :: tt when hh = head -> (hh :: (tail @ tt)) :: t
      | _ -> h :: insert (head,tail) t
  in
  let rec aux = function
    | [] -> []
    | [] :: t -> aux t
    | (head :: tail) :: t -> insert (head,tail) (aux t)
  in
  List.rev (aux list)
Consider using a Map or a hash table to keep track of the heads and the elements found for each head. The nlist auxiliary list isn't very helpful if lists with the same heads aren't adjacent, as in this example:
# combineSameHead [["A"; "a0"; "a1"]; ["B"; "b0"]; ["A"; "a2"]]
- : list (list string) = [["A"; "a0"; "a1"; "a2"]; ["B"; "b0"]]
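A minimal sketch of the hash table variant, assuming the standard Hashtbl module with its default structural hashing. The name combine_same_head is hypothetical, and duplicates within a group are not removed here.

```ocaml
let combine_same_head lists =
  (* head -> elements seen so far for that head, stored in reverse *)
  let tails = Hashtbl.create 16 in
  (* heads in reverse order of first appearance *)
  let order = ref [] in
  List.iter
    (function
      | [] -> ()                                   (* skip empty lists *)
      | hd :: tl ->
        (match Hashtbl.find_opt tails hd with
         | Some acc -> Hashtbl.replace tails hd (List.rev_append tl acc)
         | None ->
           Hashtbl.add tails hd (List.rev tl);
           order := hd :: !order))
    lists;
  List.rev_map (fun hd -> hd :: List.rev (Hashtbl.find tails hd)) !order
```

Keeping the separate order list is a design choice so that groups come out in the order their heads first appear, which a hash table alone does not guarantee.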
I probably would have done something along the lines of what antonakos suggested. It would totally avoid the O(n) cost of searching in a list. You may also find that using a StringSet.t StringMap.t makes further processing easier. Of course, readability is paramount, and I still find that this holds up under that criterion.
module OrderedString =
struct
type t = string
let compare = String.compare
end
module StringMap = Map.Make (OrderedString)
module StringSet = Set.Make (OrderedString)
let merge_same_heads lsts =
let add_single map = function
| hd::tl when StringMap.mem hd map ->
let set = StringMap.find hd map in
let set = List.fold_right StringSet.add tl set in
StringMap.add hd set map
| hd::tl ->
let set = List.fold_right StringSet.add tl StringSet.empty in
StringMap.add hd set map
| [] ->
map
in
let map = List.fold_left add_single StringMap.empty lsts in
StringMap.fold (fun k v acc-> (k::(StringSet.elements v))::acc) map []
You can do a lot just using the standard library:
(* compares the head of a list to a supplied value. Used to partition a list of lists *)
let partPred x = function
  | h::_ -> h = x
  | _ -> false
let rec combineHeads = function
  | [] -> []
  | []::t -> combineHeads t (* skip empty lists *)
  | (hh::_ as h)::t ->
    (* split into lists with the same head as the first, and lists with different heads *)
    let r, l = List.partition (partPred hh) t in
    (* combine all the lists with the same head, then recurse on the remaining lists *)
    (List.fold_left (fun x y -> x @ (List.tl y)) h r)::(combineHeads l)
combineHeads [[1;2;3];[1;4;5;];[2;3;4];[1];[1;5;7];[2;5];[3;4;6]];;
- : int list list = [[1; 2; 3; 4; 5; 5; 7]; [2; 3; 4; 5]; [3; 4; 6]]
This won't be fast (partition, fold_left and concat are all O(n)) however.