Problem with a list of the form [(key, [...]); ...] - OCaml

I'm new to OCaml and trying to learn it, and I've run into a problem. In a function that merges two of these lists, I can't find a way to check whether there is already an element with a given key and, if so, how to join the values that come after it. I'd appreciate any guidance.
For example, if I have:
l1: [(k, [e]); (ka, [])]
l2: [(k, [f; g])]
How can I end up with:
fl: [(k, [e; f; g]); (ka, [])]
Basically, how can I pick out the key k in both lists and combine their elements?

There are functions in the standard OCaml library for dealing with lists of pairs where the first element of each pair is a key. You will find them described here: https://ocaml.org/releases/4.12/api/List.html under Association lists.
I will repeat what @ivg says: this is not how you want to solve your problem if you have more than just a few pairs to work with.

First of all, using lists as mappings is a bad idea. It is much better to use dedicated data structures such as maps and hash tables.
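Here is a minimal sketch of the map-based approach. It assumes string keys and uses the standard library's `Map.Make` functor. `StringMap`, `of_assoc` and `merge_maps` are names chosen for this example. `StringMap.union` (OCaml 4.03+) calls the supplied function only for keys present in both maps, which is where the value lists get appended.

```ocaml
(* Hypothetical example: a map from string keys to lists of values. *)
module StringMap = Map.Make (String)

(* Build a map from an association list; a repeated key gets its lists appended.
   StringMap.update needs OCaml 4.06 or later. *)
let of_assoc pairs =
  List.fold_left
    (fun m (k, v) ->
      StringMap.update k
        (function None -> Some v | Some old -> Some (old @ v))
        m)
    StringMap.empty pairs

(* Merge two maps: keys present in both get their value lists concatenated. *)
let merge_maps m1 m2 = StringMap.union (fun _key a b -> Some (a @ b)) m1 m2

let () =
  let l1 = of_assoc [ ("k", [ "e" ]); ("ka", []) ] in
  let l2 = of_assoc [ ("k", [ "f"; "g" ]) ] in
  (* bindings come back sorted by key *)
  StringMap.bindings (merge_maps l1 l2)
  |> List.iter (fun (k, vs) ->
         Printf.printf "%s: [%s]\n" k (String.concat "; " vs))
```

Lookups and merges then cost O(log n) per key instead of a linear scan of the list.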
Answering your question directly, you can concatenate two lists using the (@) operator, e.g.,
# [1;2;3] @ [4;5;6];;
- : int list = [1; 2; 3; 4; 5; 6]
If you don't want repeated elements when you merge them then, and I feel like I'm repeating myself, it is bad to use lists as sets; it is better to use dedicated data structures such as sets and hash sets. But if you want to continue, you can merge two lists without repetitions by checking whether an element is already in the list before prepending it. This is easy to implement but slow to run, in the sense that merging two lists this way takes quadratic time.
If you still want to stick with a list of pairs, you will find List.assoc useful, as it finds a value by key. The overall algorithm, given two lists xs and ys, is to fold over the elements of ys using xs as the initial accumulator acc. For each (ky, y) in ys: if ky is already in acc, find the value x associated with ky, remove it (List.remove_assoc), merge x and y, and prepend the merged pair to acc; otherwise (if ky is not in acc) just prepend (ky, y) to acc. Note that this algorithm doesn't preserve order, so if order matters you need something more complex. Also, if your keys are sorted, you can make it a little more efficient and easier to implement.
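The fold just described can be sketched as follows. "Merge" is taken to mean plain list concatenation (x @ y), and `merge_assoc` is a name chosen for the example. Keys are compared with structural equality, as `List.assoc` and `List.remove_assoc` do.

```ocaml
(* Fold over ys, using xs as the initial accumulator. *)
let merge_assoc xs ys =
  List.fold_left
    (fun acc (ky, y) ->
      if List.mem_assoc ky acc then
        (* key already present: take its value out, merge, and put it back in front *)
        let x = List.assoc ky acc in
        (ky, x @ y) :: List.remove_assoc ky acc
      else
        (* new key: just prepend the pair *)
        (ky, y) :: acc)
    xs ys
```

With the lists from the question, `merge_assoc [("k", ["e"]); ("ka", [])] [("k", ["f"; "g"])]` gives `[("k", ["e"; "f"; "g"]); ("ka", [])]`. Each step scans the accumulator, so the whole merge is quadratic, as warned above.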

I guess you're doing this to practice with lists.
What I would do is store the keys already found in an accumulator (List.concat_map needs OCaml 4.10 or later):
let mergePairs yourList =
  let rec aux accKeys = function
    | [] -> []
    | (k, v) :: xs ->
      if List.mem k accKeys then aux accKeys xs (* we suppress already existing keys *)
      else
        (* get all the values of the other pairs with key = k in xs *)
        let others = List.concat_map snd (List.filter (fun (k', _) -> k' = k) xs) in
        (k, v @ others) :: aux (k :: accKeys) xs
  in
  aux [] yourList;;

Related

Sort a list of tuples by their second element without higher order functions or recursion

I have a list of (String, Int) pairs and am struggling to figure out how to sort the list by the snd field (Int). I am not allowed to use higher order functions or recursion which makes it more difficult.
For example, I have
[("aaaaa", 5),("bghdfe", 6),("dddr",4)]
and would like to sort it into
[("dddr",4),("aaaaa", 5),("bghdfe", 6)].
Edit:
I understand that the sort may not be possible without higher order functions. What I really need is to find the element with the minimum length (the snd field). Is there a way to find the minimum number and then take the fst field of the list element at that index? If that approach works better, I'm unsure how to find the index of that minimum number, however.
The task seems to be impossible, since you can't write a sort without recursion in Haskell. This means you must use sort, which is usually something like sortBy compare, and thus you have a higher order function anyway.
But if you are allowed to use sort, you can do it by first swapping the components of every tuple, sorting the resulting list, and swapping the tuples in the result back again. This should be possible with a few nested list comprehensions, so technically no higher order functions are needed.
After you have given more details, I'd do
homework list = snd (minimum [ (s,f) | (f,s) <- list ])
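For comparison, here is the same tuple-swapping trick transliterated to OCaml. It uses `List.map` and `List.fold_left`, so it does not respect the question's no-higher-order-functions rule; it only shows why swapping works. Polymorphic comparison orders tuples lexicographically, so after the swap the smallest pair is the one with the smallest number. Wrapping the result in an option is a choice made for this sketch, to handle the empty list.

```ocaml
(* Swap each (name, number) to (number, name), take the minimum, return the name. *)
let homework pairs =
  match List.map (fun (f, s) -> (s, f)) pairs with
  | [] -> None
  | first :: rest -> Some (snd (List.fold_left min first rest))
```

Ties on the number are broken by comparing the names, because the name is the second component after the swap.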
Without higher order functions or recursion all you have left, technically, is list comprehensions. Thus we define
-- sortBy (comparing snd) >>> take 1 >>> listToMaybe >>> fmap fst
-- ~= minimumBy (comparing snd) >>> fst
foo :: Ord b => [(a,b)] -> Maybe a
foo xs = case [ a | (a,b) <- xs
                  , null [ () | (_c,d) <- xs, d < b ] ]
         of (a:_) -> Just a
            []    -> Nothing
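The comprehension above relies on one observation: an element (a, b) is a minimum by its second component exactly when no element (_, d) has d < b. Here is an OCaml sketch of the same characterization. It is quadratic like the original, needs no sorting, and uses `List.find_opt` (4.05+) and `Option.map` (4.08+).

```ocaml
(* Return the first component of the first pair whose second component
   has no strictly smaller rival in the list. *)
let foo xs =
  List.find_opt (fun (_, b) -> not (List.exists (fun (_, d) -> d < b) xs)) xs
  |> Option.map fst
```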

How does the recursion meet the base case? (Haskell)

I am trying to understand this piece of code, which returns all possible combinations of the [a] passed to it:
-- Infinite list of all combinations for a given value domain
allCombinations :: [a] -> [[a]]
allCombinations [] = [[]]
allCombinations values = [] : concatMap (\w -> map (:w) values)
                                        (allCombinations values)
I tried this sample input:
ghci> take 7 (allCombinations [True,False])
[[],[True],[False],[True,True],[False,True],[True,False],[False,False]]
What I don't understand is how the recursion eventually stops and returns [[]]. The allCombinations function certainly doesn't have any pointer that moves through the list on each call so that, when it meets the base case [], it returns [[]]. As far as I can tell, it will call allCombinations infinitely and never stop on its own. Or am I missing something?
On the other hand, take only returns the first 7 elements of the final list after all the computation has been carried out by going back after completing the recursive calls. So how does the recursion actually meet the base case here?
Secondly, what is the purpose of concatMap here? Couldn't we just use map to apply the function to the list and arrange the list inside the function? What is concatMap actually doing here? By definition, concatMap first maps the function and then concatenates the lists, whereas as far as I can see we are already doing that inside the function.
Any valuable input would be appreciated.
Short answer: it will never meet the base case.
However, it does not need to. The base case is most often needed to stop a recursion, however here you want to return an infinite list, so no need to stop it.
On the other hand, this function would break if you tried to take more than 1 element of allCombinations []; have a look at @robin's answer to understand better why. That is the only reason you see a base case here.
The way the main function works is that it starts with an empty list and then prepends each element of the argument list to it; (:w) does that recursively. However, this lambda alone would return an ever more deeply nested list, i.e. [], [[True],[False]], [[[True,True],[True,False]]...] and so on. concatMap removes the outer list at each step, and since it is called recursively, only one list of lists is returned at the end. This can be a complicated concept to grasp, so look for other examples of concatMap and try to understand how they work and why map alone wouldn't be enough.
This obviously only works because of Haskell's lazy evaluation. Similarly, you know that foldr needs a base case; however, when your function is only supposed to take infinite lists, you can pass undefined as the base case to make it clear that finite lists should not be used. For example, foldr f undefined could be used instead of foldr f [].
@Lorenzo has already explained the key point: the recursion in fact never ends, and therefore this generates an infinite list, from which you can still take any finite number of elements because of Haskell's laziness. But I think it will be helpful to give a bit more detail about this particular function and how it works.
Firstly, the [] : at the start of the definition tells you that the first element will always be []. That of course is the one and only way to make a 0-element list from elements of values. The rest of the list is concatMap (\w -> map (:w) values) (allCombinations values).
concatMap f is as you observe simply the composition concat . (map f): it applies the given function to every element of the list, and concatenates the results together. Here the function (\w -> map (:w) values) takes a list, and produces the list of lists given by prepending each element of values to that list. For example, if values == [1,2], then:
(\w -> map (:w) values) [1,2] == [[1,1,2], [2,1,2]]
if we map that function over a list of lists, such as
[[], [1], [2]]
then we get (still with values as [1,2]):
[[[1], [2]], [[1,1], [2,1]], [[1,2], [2,2]]]
That is of course a list of lists of lists - but then the concat part of concatMap comes to our rescue, flattening the outermost layer, and resulting in a list of lists as follows:
[[1], [2], [1,1], [2,1], [1,2], [2,2]]
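Here is the single expansion step just walked through, transliterated to OCaml. `List.concat_map` (OCaml 4.10+) plays the role of `concatMap`, and `step` and `step_unflattened` are names chosen for this illustration. The second function leaves out the flattening to show the extra layer that `concat` removes.

```ocaml
(* For every combination w, prepend each element of values, then flatten. *)
let step values combs =
  List.concat_map (fun w -> List.map (fun v -> v :: w) values) combs

(* The same mapping without flattening: a list of lists of lists. *)
let step_unflattened values combs =
  List.map (fun w -> List.map (fun v -> v :: w) values) combs
```

`step [1; 2] [[]; [1]; [2]]` gives `[[1]; [2]; [1; 1]; [2; 1]; [1; 2]; [2; 2]]`, the same list as the Haskell result above.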
One thing that I hope you might have noticed about this is that the list of lists I started with was not arbitrary. [[], [1], [2]] is the list of all combinations of size 0 or 1 from the starting list [1,2]. This is in fact the first three elements of allCombinations [1,2].
Recall that all we know "for sure" when looking at the definition is that the first element of this list will be []. The rest of the list is concatMap (\w -> map (:w) [1,2]) (allCombinations [1,2]). The next step is to expand the recursive part of this as [] : concatMap (\w -> map (:w) [1,2]) (allCombinations [1,2]). The outer concatMap can then see that the head of the list it's mapping over is [], producing a list starting [1], [2] and continuing with the results of prepending 1 and then 2 to the other elements, whatever they are. But we've just seen that the next 2 elements are in fact [1] and [2]. We end up with
allCombinations [1,2] == [] : [1] : [2] : concatMap (\w -> map (:w) [1,2]) (tail (allCombinations [1,2]))
(tail isn't strictly called in the evaluation process, it's done by pattern-matching instead - I'm trying to explain more by words than explicit plodding through equalities).
And looking at that we know the tail is [1] : [2] : concatMap .... The key point is that, at each stage of the process, we know for sure what the first few elements of the list are - and they happen to be all 0-element lists with values taken from values, followed by all 1-element lists with these values, then all 2-element lists, and so on. Once you've got started, the process must continue, because the function passed to concatMap ensures that we just get the lists obtained from taking every list generated so far, and appending each element of values to the front of them.
If you're still confused by this, look up how to compute the Fibonacci numbers in Haskell. The classic way to get an infinite list of all Fibonacci numbers is:
fib = 1 : 1 : zipWith (+) fib (tail fib)
This is a bit easier to understand than the allCombinations example, but relies on essentially the same thing: defining a list purely in terms of itself, and using lazy evaluation to progressively generate as much of the list as you want, according to a simple rule.
It is not a base case but a special case, and this is not recursion but corecursion,(*) which never stops.
Maybe the following re-formulation will be easier to follow:
allCombs :: [t] -> [[t]]
-- [1,2] -> [[]] ++ [1:[],2:[]] ++ [1:[1],2:[1],1:[2],2:[2]] ++ ...
allCombs vals = concat . iterate (cons vals) $ [[]]
  where
    cons :: [t] -> [[t]] -> [[t]]
    cons vals combs = concat [ [v : comb | v <- vals]
                             | comb <- combs ]
-- iterate                   :: (a -> a ) -> a -> [a]
-- cons vals                 :: [[t]] -> [[t]]
-- iterate (cons vals)       :: [[t]] -> [[[t]]]
-- concat                    :: [[ a ]] -> [ a ]
-- concat . iterate (cons vals) :: [[t]]
Looks different, does the same thing. Not just produces the same results, but actually is doing the same thing to produce them.(*) The concat is the same concat, you just need to tilt your head a little to see it.
This also shows why the concat is needed here. Each step = cons vals is producing a new batch of combinations, with length increasing by 1 on each step application, and concat glues them all together into one list of results.
The length of each batch is the previous batch length multiplied by n where n is the length of vals. This also shows the need to special case the vals == [] case i.e. the n == 0 case: 0*x == 0 and so the length of each new batch is 0 and so an attempt to get one more value from the results would never produce a result, i.e. enter an infinite loop. The function is said to become non-productive, at that point.
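The batch-based formulation can be sketched in OCaml with the standard `Seq` module. `Seq.unfold` (4.11+) stands in for `iterate`, and `Seq.flat_map` (4.07+) stands in for `concat`. The names `cons`, `all_combs` and the `take` helper are chosen for this sketch. The empty-`vals` case is special-cased, as the paragraph above explains; without that case every batch after the first would be empty and asking for a second element would never return.

```ocaml
(* One batch -> the next batch: every combination extended by one element. *)
let cons vals combs =
  List.concat_map (fun comb -> List.map (fun v -> v :: comb) vals) combs

let all_combs vals : 'a list Seq.t =
  match vals with
  | [] -> Seq.return [] (* n = 0: only the empty combination exists *)
  | _ ->
    (* iterate (cons vals) [[]], then glue the batches together *)
    Seq.unfold (fun batch -> Some (batch, cons vals batch)) [ [] ]
    |> Seq.flat_map List.to_seq

(* Hypothetical helper: the first n elements of a sequence, as a list. *)
let rec take n (s : 'a Seq.t) =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest
```

`take 7 (all_combs [true; false])` yields the same seven combinations as the GHCi session in the question, in the same order.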
Incidentally, cons is almost the same as
  == concat [ [v : comb | comb <- combs]
            | v <- vals ]
  == liftA2 (:) vals combs

liftA2 :: Applicative f => (a -> b -> r) -> f a -> f b -> f r
So if the internal order of each step results is unimportant to you (but see an important caveat at the post bottom) this can just be coded as
import Control.Applicative (liftA2)  -- needed where liftA2 is not exported by the Prelude
allCombsA :: [t] -> [[t]]
-- [1,2] -> [[]] ++ [1:[],2:[]] ++ [1:[1],1:[2],2:[1],2:[2]] ++ ...
allCombsA [] = [[]]
allCombsA vals = concat . iterate (liftA2 (:) vals) $ [[]]
(*) well actually, this refers to a bit modified version of it,
allCombsRes vals = res
  where res = [] : concatMap (\w -> map (: w) vals)
                             res
-- or:
allCombsRes vals = fix $ ([] :) . concatMap (\w -> map (: w) vals)
-- where
--   fix g = x where x = g x    -- in Data.Function
Or in pseudocode:
Produce a sequence of values `res` by
FIRST producing `[]`, AND THEN
from each produced value `w` in `res`,
produce a batch of new values `[v : w | v <- vals]`
and splice them into the output sequence
(by using `concat`)
So the res list is produced corecursively, starting from its starting point, [], producing next elements of it based on previous one(s) -- either in batches, as in iterate-based version, or one-by-one as here, taking the input via a back pointer into the results previously produced (taking its output as its input, as a saying goes -- which is a bit deceptive of course, as we take it at a slower pace than we're producing it, or otherwise the process would stop being productive, as was already mentioned above).
But sometimes it can be advantageous to produce the input via recursive calls, creating at run time a sequence of functions, each passing its output up the chain to its caller. Still, the dataflow is upwards, unlike regular recursion, which first goes downward towards the base case.
The advantage just mentioned has to do with memory retention. The corecursive allCombsRes effectively keeps a back-pointer into the sequence that it is itself producing, so the sequence cannot be garbage-collected on the fly.
But the chain of the stream-producers implicitly created by your original version at run time means each of them can be garbage-collected on the fly as n = length vals new elements are produced from each downstream element, so the overall process becomes equivalent to just k = ceiling $ logBase n i nested loops each with O(1) space state, to produce the ith element of the sequence.
This is much much better than the O(n) memory requirement of the corecursive/value-recursive allCombsRes which in effect keeps a back pointer into its output at the i/n position. And in practice a logarithmic space requirement is most likely to be seen as a more or less O(1) space requirement.
This advantage only happens with the order of generation as in your version, i.e. as in cons vals, not liftA2 (:) vals which has to go back to the start of its input sequence combs (for each new v in vals) which thus must be preserved, so we can safely say that the formulation in your question is rather ingenious.
And if we're after a pointfree re-formulation -- as pointfree can at times be illuminating -- it is
allCombsY values = _Y $ ([] :) . concatMap (\w -> map (: w) values)
  where
    _Y g = g (_Y g)   -- no-sharing fixpoint combinator
So the code is much easier understood in a fix-using formulation, and then we just switch fix with the semantically equivalent _Y, for efficiency, getting the (equivalent of the) original code from the question.
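The difference between the sharing (`fix`, `allCombsRes`) and non-sharing (`_Y`) formulations can be sketched in OCaml with a hand-rolled memoizing lazy stream. The `stream` type and every helper below are hypothetical names chosen for this illustration. The comments on garbage collection restate the analysis above, which the author notes has not been measured; the code does not test it.

```ocaml
type 'a stream = Nil | Cons of 'a * 'a stream Lazy.t

(* Lazily splice the finite list l in front of the stream tail. *)
let rec append_list l tail =
  match l with
  | [] -> tail
  | y :: ys -> lazy (Cons (y, append_list ys tail))

(* Lazy concatMap whose function returns a finite list. *)
let rec concat_map_list f s =
  lazy
    (match Lazy.force s with
     | Nil -> Nil
     | Cons (x, rest) -> Lazy.force (append_list (f x) (concat_map_list f rest)))

let extend vals w = List.map (fun v -> v :: w) vals

(* Sharing version, like allCombsRes / fix: the stream reads back from
   itself, so the part between the read and write positions stays reachable. *)
let all_combs_res vals =
  let rec res = lazy (Cons ([], concat_map_list (extend vals) res)) in
  res

(* Non-sharing version, like _Y: every level is a fresh recursive call. *)
let rec all_combs_y vals =
  lazy (Cons ([], concat_map_list (extend vals) (all_combs_y vals)))

(* Hypothetical helper: the first n elements as a list. *)
let rec take n s =
  if n <= 0 then []
  else
    match Lazy.force s with
    | Nil -> []
    | Cons (x, rest) -> x :: take (n - 1) rest
```

Both versions produce the same elements, for example `take 7 (all_combs_res [1; 2])` and `take 7 (all_combs_y [1; 2])` agree. With an empty `vals`, forcing the second element of the sharing version should raise `Lazy.Undefined` instead of looping. That exception is how OCaml reports the non-productivity described earlier.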
The above claims about space requirements behavior are easily tested. I haven't done so, yet.
See also:
Why does GHC make fix so confounding?
Sharing vs. non-sharing fixed-point combinator

Combining takeWhile and skipWhile

In F#, I find when I want to use takeWhile, I usually also want to use skipWhile, that is, take the list prefix that satisfies a predicate, and also remember the rest of the list for subsequent processing. I don't think there is a standard library function that does both, but I can write one easily enough.
My question is, what should this combination be called? It's obvious enough that there should be a standard name for it; what is it? Best I've thought of so far is split, which seems consistent with splitAt.
span is another name I've seen for this function, for example in Haskell: span :: (a -> Bool) -> [a] -> ([a], [a]).
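OCaml's `List` module has no `span`, so here is a hand-written sketch. It returns the longest prefix satisfying the predicate (takeWhile) together with the remainder (skipWhile), in a single pass. It is not tail-recursive, so a very long matching prefix could exhaust the stack.

```ocaml
(* span p xs = (takeWhile p xs, skipWhile p xs), computed in one traversal. *)
let rec span p = function
  | x :: xs when p x ->
    let taken, rest = span p xs in
    (x :: taken, rest)
  | xs -> ([], xs)
```

For instance, `span (fun x -> x < 3) [1; 2; 3; 1]` returns `([1; 2], [3; 1])`.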
This part of your question stood out to me (emphasis mine):
take the list prefix that satisfies a predicate, and also remember the rest of the list for subsequent processing
I am guessing that you want to recurse with the rest of the list and then apply this splitting function again. This is what I have wanted to do a few times before. Initially, I wrote the function that I think you are describing but after giving it more thought I realised that there might be a more general way to think about it and avoid the recursion completely, which usually makes code simpler. This is the function I came up with.
module List =
    let groupAdjacentBy f xs =
        let mutable prevKey, i = None, 0
        xs
        |> List.groupBy (fun x ->
            let key = f x
            if prevKey <> Some key then
                i <- i + 1
                prevKey <- Some key
            (i, key))
        |> List.map (fun ((_, k), v) -> (k, v))
let even x = x % 2 = 0
List.groupAdjacentBy even [1; 3; 2; 5; 4; 6]
// [(false, [1; 3]); (true, [2]); (false, [5]); (true, [4; 6])]
I found this one easier to name and more useful. Maybe it works for your current problem. If you don't need the group keys then you can get rid of them by adding |> List.map snd.
As much as I usually avoid mutation, using it here allowed me to use List.groupBy and avoid writing more code.
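For comparison, here is a mutation-free sketch of the same grouping in OCaml. It is not the answer's F# code, and `group_adjacent_by` is a name chosen for this example. It walks the list once, starts a new group whenever the key changes, and reverses at the end because groups are built by prepending.

```ocaml
let group_adjacent_by f xs =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: rest ->
      let k = f x in
      (match acc with
       (* same key as the current group: add x to it *)
       | (k', group) :: acc' when k' = k -> go ((k, x :: group) :: acc') rest
       (* key changed (or first element): open a new group *)
       | _ -> go ((k, [ x ]) :: acc) rest)
  in
  (* each group was built in reverse; restore element order *)
  List.map (fun (k, group) -> (k, List.rev group)) (go [] xs)

let even x = x mod 2 = 0
```

`group_adjacent_by even [1; 3; 2; 5; 4; 6]` gives `[(false, [1; 3]); (true, [2]); (false, [5]); (true, [4; 6])]`, matching the F# output above.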
.slice could capture the intent of a contiguous range:
List.slice skipPredicate takePredicate

Appending lists in SML

I'm trying to add an int list list with another int list list using the append function, but I can't get it to work the way I want.
Say that I want to append [[1,2,3,4,5]] with [6,7] so that I get [[1,2,3,4,5,6,7]].
Here's my attempt: [1,2,3,4,5]::[]@[6,7]::[], but it just gives me the list I want to append as a separate element instead of the two lists combined into one, like this: [[1,2,3,4,5],[6,7]].
How can I re-write the operation to make it return [[1,2,3,4,5,6,7]]?
Your question is too unspecific. You are dealing with nested lists. Do you want to append the second list to every inner list of the nested list, or only the first one? Your example doesn't tell.
For the former:
fun appendAll xss ys = List.map (fn xs => xs @ ys) xss
For the latter:
fun appendHd [] ys = raise Empty
  | appendHd (xs::xss) ys = (xs @ ys)::xss
However, both of these should rarely be needed, and I somehow feel that you are trying to solve the wrong problem if you end up there.

Enumerating all pairs constructible from two lazy lists in OCaml

I am attempting to enumerate the set of all pairs made of elements from two lazy lists (first element from the first list, second element from the second list) in OCaml using the usual diagonalization idea. The idea is, in strict evaluation terms, something like
enum [0;1;2;...] [0;1;2;...] = [(0,0);(0,1);(1,0);(0,2);(1,1);(2,0);...]
My question is: how do you define this lazily?
I'll explain what I've thought so far, maybe it will be helpful for anyone trying to answer this. But if you know the answer already, you don't need to read any further. I may be going the wrong route.
I have defined lazy lists as
type 'a node_t =
  | Nil
  | Cons of 'a * 'a t
and 'a t = ('a node_t) Lazy.t
Then I defined the function 'seq'
let seq m =
  let rec seq_ n m max acc =
    if n = max + 1
    then acc
    else seq_ (n + 1) (m - 1) max (lazy (Cons ((n, m), acc)))
  in seq_ 0 m m (lazy Nil)
which gives me a lazy list of pairs (x,y) such that x+y=m. This is what the diagonal idea is about. We start by enumerating all the pairs which sum 0, then all those which sum 1, then those which sum 2, etc.
Then I defined the function 'enum_pair'
let enum_pair () =
  let rec enum_pair_ n = lazy (Cons (seq n, enum_pair_ (n + 1)))
  in enum_pair_ 0
which generates the infinite lazy list made up of: the lazy list of pairs which sum 0, concatenated with the lazy lists of pairs which sum 1, etc.
By now, it seems to me that I'm almost there. The problem now is: how do I get the actual pairs one by one?
It seems to me that I'd have to use some form of list concatenation (the lazy equivalent of @). But that is not efficient because, in my representation of lazy lists, concatenating two lists has complexity O(n^2), where n is the size of the first list. Should I go for a different representation of lazy lists? Or is there another way (not using 'seq' and 'enum_pair' above) that doesn't require list concatenation?
Any help would be really appreciated.
Thanks a lot,
Surikator.
In Haskell you can write:
concatMap (\l -> zip l (reverse l)) $ inits [0..]
First we generate all initial segments of [0..]:
> take 5 $ inits [0..]
[[],[0],[0,1],[0,1,2],[0,1,2,3]]
Taking one of the segments and zipping it with its reverse gives us one diagonal:
> (\l -> zip l (reverse l)) [0..4]
[(0,4),(1,3),(2,2),(3,1),(4,0)]
So mapping the zip will give all diagonals:
> take 10 $ concatMap (\l -> zip l (reverse l)) $ zipWith take [1..] (repeat [0..])
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
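The same diagonal idea can be transliterated to OCaml's standard `Seq` module for the naturals. Diagonal d is `zip [0..d] (reverse [0..d])`, i.e. the pairs (i, d - i). `diagonal`, `naturals`, `all_pairs` and `take` are names chosen for this sketch, and `Seq.unfold` needs OCaml 4.11 or later.

```ocaml
(* Diagonal d: all pairs (i, d - i) with i from 0 to d. *)
let diagonal d = List.init (d + 1) (fun i -> (i, d - i))

let naturals : int Seq.t = Seq.unfold (fun n -> Some (n, n + 1)) 0

(* Concatenate the (finite) diagonals lazily, one after another. *)
let all_pairs : (int * int) Seq.t =
  Seq.flat_map (fun d -> List.to_seq (diagonal d)) naturals

(* Hypothetical helper: the first n elements as a list. *)
let rec take n (s : 'a Seq.t) =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest
```

`take 10 all_pairs` gives the same ten pairs as the GHCi session above. Each diagonal is finite, so `Seq.flat_map` only ever has to finish one diagonal before moving on to the next.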
In the meantime I've managed to get somewhere, but although it solves the problem, the solution is not very elegant. After the functions defined in my initial question, I can define the additional function 'enum_pair_cat' as
let rec enum_pair_cat ls =
  lazy (
    match Lazy.force ls with
    | Nil -> Nil
    | Cons (h, t) ->
      match Lazy.force h with
      | Nil -> Lazy.force (enum_pair_cat t)
      | Cons (h2, t2) -> Cons (h2, enum_pair_cat (lazy (Cons (t2, t))))
  )
This new function achieves the desired behavior. By doing
enum_pair_cat (enum_pair ())
we get a lazy list which has the pairs enumerated as described. So, this solves the problem.
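To check the order this actually produces, the definitions from the question can be run as-is, together with a small `take` helper (a hypothetical name, not part of the question's code). Because `seq` builds each diagonal by prepending, pairs within a diagonal come out with the first component decreasing. That is the reverse of the order written in the question, but it is still a complete diagonal enumeration.

```ocaml
(* Definitions from the question. *)
type 'a node_t =
  | Nil
  | Cons of 'a * 'a t
and 'a t = 'a node_t Lazy.t

let seq m =
  let rec seq_ n m max acc =
    if n = max + 1 then acc
    else seq_ (n + 1) (m - 1) max (lazy (Cons ((n, m), acc)))
  in
  seq_ 0 m m (lazy Nil)

let enum_pair () =
  let rec enum_pair_ n = lazy (Cons (seq n, enum_pair_ (n + 1))) in
  enum_pair_ 0

let rec enum_pair_cat ls =
  lazy
    (match Lazy.force ls with
     | Nil -> Nil
     | Cons (h, t) -> (
       match Lazy.force h with
       | Nil -> Lazy.force (enum_pair_cat t)
       | Cons (h2, t2) -> Cons (h2, enum_pair_cat (lazy (Cons (t2, t))))))

(* Hypothetical helper: the first n elements of a lazy list, as a strict list. *)
let rec take n l =
  if n <= 0 then []
  else
    match Lazy.force l with
    | Nil -> []
    | Cons (x, rest) -> x :: take (n - 1) rest
```

`take 6 (enum_pair_cat (enum_pair ()))` evaluates to `[(0, 0); (1, 0); (0, 1); (2, 0); (1, 1); (0, 2)]`.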
However, I am not entirely satisfied with this because this solution doesn't scale up to higher enumerations (say, of three lazy lists). If you have any ideas on how to solve the general problem of enumerating all n-tuples taken from n lazy lists, let me know!
Thanks,
Surikator.