Flatten a list of tuples in Scala?

I would have thought that a list of tuples could easily be flattened:
scala> val p = "abcde".toList
p: List[Char] = List(a, b, c, d, e)
scala> val q = "pqrst".toList
q: List[Char] = List(p, q, r, s, t)
scala> val pq = p zip q
pq: List[(Char, Char)] = List((a,p), (b,q), (c,r), (d,s), (e,t))
scala> pq.flatten
But instead, this happens:
<console>:15: error: No implicit view available from (Char, Char) => scala.collection.GenTraversableOnce[B].
pq.flatten
^
I can get the job done with:
scala> (for (x <- pq) yield List(x._1, x._2)).flatten
res1: List[Char] = List(a, p, b, q, c, r, d, s, e, t)
But I'm not understanding the error message. And my alternative solution seems convoluted and inefficient.
What does that error message mean and why can't I simply flatten a List of tuples?

If the implicit conversion can't be found, you can supply it explicitly.
pq.flatten {case (a,b) => List(a,b)}
If this is done multiple times throughout the code, you can save some boilerplate by making the conversion implicit.
scala> import scala.language.implicitConversions
import scala.language.implicitConversions
scala> implicit def flatTup[T](t:(T,T)): List[T]= t match {case (a,b)=>List(a,b)}
flatTup: [T](t: (T, T))List[T]
scala> pq.flatten
res179: List[Char] = List(a, p, b, q, c, r, d, s, e, t)
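For completeness, and not as part of jwvh's answer: if you'd rather avoid the implicit view entirely, the OP's for/yield followed by flatten collapses into a single flatMap.
pq.flatMap { case (a, b) => List(a, b) }
// List[Char] = List(a, p, b, q, c, r, d, s, e, t)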

jwvh's answer covers the "coding" solution to your problem perfectly well, so I am not going to go into any more detail about that. The only thing I wanted to add was clarifying why the solution that both you and jwvh found is needed.
As stated in the Scala standard library documentation, Tuple2 (which the (a, b) pair syntax translates to) is:
A tuple of 2 elements; the canonical representation of a Product2.
And following up on that:
Product2 is a cartesian product of 2 components.
...which means that Tuple2[T1,T2] represents:
The set of all possible pairs of elements whose components are members of two sets (all elements in T1 and T2 respectively).
A List[T], on the other hand, represents an ordered collection of T elements.
What all this means practically is that there is no absolute way to translate any possible Tuple2[T1,T2] to a List[T], simply because T1 and T2 could be different. For example, take the following tuple:
val tuple = ("hi", 5)
How could such a tuple be flattened? Should the 5 be made a String? Or maybe just flatten to a List[Any]? While both of these solutions could be used, they work around the type system, so they are not encoded in the Tuple API by design.
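For what it's worth, here is a small sketch (mine, not part of this answer) of the List[Any] route: every TupleN is a Product, so its fields can be iterated over, at the price of losing the element types.
val tuple = ("hi", 5)
val flattened: List[Any] = tuple.productIterator.toList
// flattened: List[Any] = List(hi, 5)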
All this comes down to the fact that there is no default implicit view for this case and you have to supply one yourself, as both jwvh and you already figured out.

We needed to do this recently. Allow me to explain the use case briefly before noting our solution.
Use case
Given a pool of items (which I'll call type T), we want to do an evaluation of each one against all others in the pool. The result of these comparisons is a Set of failed evaluations, which we represent as a tuple of the left item and the right item in said evaluation: (T, T).
Once these evaluations are complete, it becomes useful for us to flatten the Set[(T, T)] into another Set[T] that highlights all the items that have failed any comparisons.
Solution
Our solution for this was a fold:
val flattenedSet =
  set.foldLeft(Set[T]()) {
    case (acc, (x, y)) => acc + x + y
  }
This starts with an empty set (the initial parameter to foldLeft) as the accumulator.
Then, for each element in the consumed Set[(T, T)] (named set) here, the fold function is passed:
the last value of the accumulator (acc), and
the (T, T) tuple for that element, which the case deconstructs into x and y.
Our fold function then returns acc + x + y, which returns a set containing all the elements in the accumulator in addition to x and y. That result is passed to the next iteration as the accumulator—thus, it accumulates all the values inside each of the tuples.
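As a concrete illustration (my example, not the original poster's), here is the same fold applied to a small Set of Char pairs standing in for Set[(T, T)]:
// Hypothetical failed-evaluation pairs; Char stands in for T.
val failures: Set[(Char, Char)] = Set(('a', 'p'), ('b', 'q'), ('a', 'q'))

// Deconstruct each pair straight into the accumulator, with no intermediate Lists.
val failedItems: Set[Char] =
  failures.foldLeft(Set.empty[Char]) { case (acc, (x, y)) => acc + x + y }
// failedItems: Set[Char] = Set(a, p, b, q)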
Why not Lists?
I appreciated this solution in particular since it avoided creating intermediate Lists while doing the flattening—instead, it directly deconstructs each tuple while building the new Set[T].
We could also have changed our evaluation code to return List[T]s containing the left and right items in each failed evaluation—then flatten would Just Work™. But we thought the tuple more accurately represented what we were going for with the evaluation—specifically one item against another, rather than an open-ended type which could conceivably represent any number of items.

Related

SML - Using map to return combined results

I have these functions:
fun IsDivisible(t, t2) = if t mod t2 > 0 then true else false;
fun IsDivisibleFilter(ts, t) = List.filter(fn x => IsDivisible(x, t)) ts;
fun IsDivisibleMap(ts, ts2) = map(fn x => IsDivisibleFilter(ts, x)) ts2;
IsDivisibleMap - Takes two lists of ints, ts, and ts2, and returns a list containing those elements of ts that are indivisible by any elements in ts2.
E.g. IsDivisibleMap([10,11,12,13,14],[3,5,7]) should return [11,13].
The way I have it now, it returns a list of lists, where each inner list is the result for one number in ts2.
E.g. IsDivisibleMap([10,11,12,13,14],[3,5,7]) is returning [[10,11,13,14],[11,12,13,14],[10,11,12,13]]
How can I return the result that I am looking for while still using map and filter wherever possible?
There are various problems here with terminology; I'd like to begin by addressing those.
First, the name IsDivisibleMap is not a good name for two reasons:
Based on the description of the function, it is a filter, not a map. That is, given an input list, it removes elements from that list which do not satisfy a predicate.
The elements produced by this function are indivisible by all elements of the second input.
Based on these considerations, I'd like to call the function IsIndivisibleFilter instead. I will also change the name IsDivisible to IsIndivisible.
Second, in your description of the function, you say that it should return a list containing those elements of ts that are indivisible by any elements in ts2. However, I think what you meant is: "return a list containing those elements of ts that are indivisible by all elements in ts2"
Now, back to the main problem. For each element of ts, we need to check that it is indivisible by all of the elements of ts2. There is a nice function called List.all which checks if all elements of a list satisfy some predicate. So to check a particular element t of ts, we could do:
fun IsIndivisibleByAll (t, ts2) =
  List.all (fn t2 => IsIndivisible (t, t2)) ts2
Now we can implement the original function by filtering according to this predicate:
fun IsIndivisibleFilter (ts, ts2) =
  List.filter (fn t => IsIndivisibleByAll (t, ts2)) ts
Finally, I'd like to mention that you could clean up this implementation quite a bit with proper currying. Here's how I would implement it:
fun IsIndivisible t t2 = (t mod t2 > 0)
fun IsIndivisibleByAll ts2 t = List.all (IsIndivisible t) ts2
fun IsIndivisibleFilter (ts, ts2) = List.filter (IsIndivisibleByAll ts2) ts
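For readers following along from the Scala questions on this page, roughly the same shape can be sketched in Scala; this is only a comparison aid, not part of the SML answer, and the lower-case names are mine.
// Curried definitions so each step partially applies cleanly, like the SML version.
def isIndivisible(t: Int)(t2: Int): Boolean = t % t2 > 0
def isIndivisibleByAll(ts2: List[Int])(t: Int): Boolean = ts2.forall(isIndivisible(t))
def isIndivisibleFilter(ts: List[Int], ts2: List[Int]): List[Int] = ts.filter(isIndivisibleByAll(ts2))

// isIndivisibleFilter(List(10, 11, 12, 13, 14), List(3, 5, 7)) == List(11, 13)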

Understanding Prolog's empty lists

I am reading Bratko's Prolog: Programming for Artificial Intelligence. The easiest way for me to understand lists is visualising them as binary trees, which goes well. However, I am confused about the empty list []. It seems to me that it has two meanings.
When part of a list or enumeration, it is seen as an actual (empty) list element (because somewhere in the tree it is part of some Head), e.g. [a, []]
When it is the only item inside a Tail, it isn’t an element; it literally is nothing, e.g. [a|[]]
My issue is that I do not see the logic behind 2. Why is it required for lists to have this possible ‘nothingness’ as a final tail? Simply because the trees have to be binary? Or is there another reason? (In other words, why is [] counted as an element in 1. but it isn't when it is in a Tail in 2?) Also, are there cases where the final (rightmost, deepest) node of a tree is not ‘nothing’?
In other words, why is [] counted as an element in 1. but it isn't when it is in a Tail in 2?
Those are two different things. Lists in Prolog are (degenerate) binary trees, but also very much like a singly linked list in a language that has pointers, say C.
In C, you would have a struct with two members: the value, and a pointer to the next list element. Importantly, when the pointer to next points to a sentinel, this is the end of the list.
In Prolog, you have a functor with arity 2: ./2 that holds the value in the first argument, and the rest of the list in the second:
.(a, Rest)
The sentinel for a list in Prolog is the special []. This is not a list, it is the empty list! Traditionally, it is an atom, or a functor with arity 0, if you wish.
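As an aside for readers used to typed functional languages, the same cons-cell-plus-sentinel shape can be sketched in Scala (my illustration, with made-up names; Prolog itself is untyped):
// Empty plays the role of Prolog's [] sentinel; Cons plays the role of the ./2 functor.
sealed trait MyList[+A]
case object Empty extends MyList[Nothing]
final case class Cons[A](head: A, tail: MyList[A]) extends MyList[A]

val ab: MyList[Char] = Cons('a', Cons('b', Empty))          // like [a,b] = .(a, .(b, []))
val aAndEmpty: MyList[Any] = Cons('a', Cons(Empty, Empty))  // like [a, []] = .(a, .([], []))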
In your question:
[a, []] is actually .(a, .([], []))
[a|[]] is actually .(a, [])
which is why:
?- length([a,[]], N).
N = 2.
This is now a list with two elements, the first element is a, the second element is the empty list [].
?- [a|[]] = [a].
true.
This is a list with a single element, a. The [] at the tail just closes the list.
Question: what kind of list is .([], [])?
Also, are there cases where the final (rightmost, deepest) node of a tree is not ‘nothing’?
Yes, you can leave a free variable there; then, you have a "hole" at the end of the list that you can fill later. Like this:
?- A = [a, a|Tail], % partial list with two 'a's and the Tail
B = [b,b], % proper list
Tail = B. % the tail of A is now B
A = [a, a, b, b], % we appended A and B without traversing A
Tail = B, B = [b, b].
You can also make circular lists, for example, a list with infinitely many x in it would be:
?- Xs = [x|Xs].
Xs = [x|Xs].
Is this useful? I don't know for sure. You could for example get a list that repeats a, b, c with a length of 7 like this:
?- ABCs = [a,b,c|ABCs], % a list that repeats "a, b, c" forever
length(L, 7), % a proper list of length 7
append(L, _, ABCs). % L is the first 7 elements of ABCs
ABCs = [a, b, c|ABCs],
L = [a, b, c, a, b, c, a].
In R at least many functions "recycle" shorter vectors, so this might be a valid use case.
See this answer for a discussion on difference lists, which is what A and Rest from the last example are usually called.
See this answer for implementation of a queue using difference lists.
Your confusion comes from the fact that lists are printed (and read) according to a special human-friendly format. Thus:
[a, b, c, d]
... is syntactic sugar for .(a, .(b, .(c, .(d, [])))).
The ./2 functor holds two values: the item stored in the list and a sublist. When [] appears in the item (data) argument, it is printed like any other element.
In other words, this:
[[], []]
... is syntactic sugar for .([], .([], [])).
The last [] is not printed because in that context it does not need to be. It is only used to mark the end of the current list. The other occurrences of [] are lists stored in the main list.
I understand that, but I don't quite get why there is such a need for that final empty list.
The final empty list is a convention. It could be written empty or nil (like Lisp), but in Prolog this is denoted by the [] atom.
Note that in Prolog, you can leave the sublist part uninstantiated, like in:
[a | T]
which is the same as:
.(a, T)
Those are known as difference lists.
Your understanding of 1. and 2. is correct -- where by "nothing" you mean nothing element-wise. Yes, an empty list has nothing (i.e. no elements) inside it.
The logic behind having a special sentinel value SENTINEL = [] to mark the end of a cons-cells chain, as in [1,2,3] = [1,2|[3]] = [1,2,3|SENTINEL] = .(1,.(2,.(3,SENTINEL))), as opposed to some ad-hoc encoding, like .(1,.(2,3)) = [1,2|3], is type consistency. We want the first field of a cons cell (or, in Prolog, the first argument of a . functored term) to always be treated as "a list's element", and the second -- as "a list". That's why the [] in [1, []] counts as a list's element (as it appears as the 1st argument of a .-functored compound term), while the [] in [1 | []] does not (as it appears as the 2nd argument of such a term).
Yes, the trees have to be binary -- i.e. the functor . as used to encode lists is binary -- and so what should we put in the final node's tail field to signal that it is, in fact, the final node of the chain? It must be something consistent and easily testable. And it must also represent the empty list, []. So it's only logical to use the representation of an empty list to represent the empty tail of a list.
And yes, having a non-[] final "tail" is allowed, as in [1,2|3], which is a perfectly valid Prolog term -- it just isn't a representation of the list {1 2 3} as understood by the rest of Prolog's built-ins.

How to turn a list of pairs into a list of ints, where each resulting int is the sum of a pair

I'm trying to define a function with the following behavior:
[(1,2), (6,5), (9,10)] -> [3, 11, 19]
Here is what I have now:
fun sum_pairs (l : (int * int) list) =
  if null l
  then []
  else (#1 hd(l)) + (#2 hd(l))::sum_pairs(tl(l))
According to the type checker I have some type mismatch, but I can't figure out where exactly I'm wrong.
This code runs in PolyML 5.2:
fun sum_pairs (l : (int * int) list) =
  if null l
  then []
  else ((#1 (hd l)) + (#2 (hd l))) :: sum_pairs(tl l)
  (* ------------^-------------^ *)
The difference from yours is subtle, but significant: (#1 hd(l)) is different from (#1 (hd l)); the former doesn't do what you think - it attempts to extract the first tuple field of hd, which is a function!
While we're at it, why don't we attempt to rewrite the function to make it a bit more idiomatic? For starters, we can eliminate the if expression and the clunky tuple extraction by matching on the argument in the function head, like so:
fun sum_pairs [] = []
  | sum_pairs ((a, b)::rest) = (a + b)::sum_pairs(rest)
We've split the function into two clauses, the first one matching the empty list (the recursive base case), and the second one matching a nonempty list. As you can see, this significantly simplified the function and, in my opinion, made it considerably easier to read.
As it turns out, applying a function to the elements of a list to generate a new list is an incredibly common pattern. The basis library provides a builtin function called map to aid us in this task:
fun sum_pairs l = map (fn (a, b) => a + b) l
Here I'm using an anonymous function to add the pairs together. But we can do even better! By exploiting currying we can simply define the function as:
val sum_pairs = map (fn (a, b) => a + b)
The function map is curried so that applying it to a function returns a new function that accepts a list - in this case, a list of integer pairs.
But wait a minute! It looks like this anonymous function is just applying the addition operator to its arguments! Indeed it is. Let's get rid of that too:
val sum_pairs = map op+
Here, op+ denotes a builtin function that applies the addition operator, much like our function literal (above) did.
Edit: Answers to the follow-up questions:
What about argument types? It looks like you've completely eliminated the argument list in the function definition (header). Is that true, or have I missed something?
Usually the compiler is able to infer the types from context. For instance, given the following function:
fun add (a, b) = a + b
The compiler can easily infer the type int * int -> int, as the arguments are involved in an addition (if you want real, you have to say so).
Could you explain what is happening here: sum_pairs ((a, b)::rest) = (a + b)::sum_pairs(rest)? Sorry if this is a dumb question, but I just want to fully understand it. In particular, what does = mean in this context, and what is the order of evaluation of this expression?
Here we're defining a function in two clauses. The first clause, sum_pairs [] = [], matches an empty list and returns an empty list. The second one, sum_pairs ((a, b)::rest) = ..., matches a list beginning with a pair. When you're new to functional programming, this might look like magic. But to illustrate what's going on, we could rewrite the clausal definition using case, as follows:
fun sum_pairs l =
  case l of
    [] => []
  | ((a, b)::rest) => (a + b)::sum_pairs(rest)
The clauses will be tried in order, until one matches. If no clause matches, a Match exception is raised. For example, if you omitted the first clause, the function would always fail because l will eventually be the empty list (either it's empty from the beginning, or we've recursed all the way to the end).
As for the equals sign, it means the same thing as in any other function definition. It separates the arguments of the function from the function body. As for evaluation order, the most important observation is that sum_pairs(rest) must happen before the cons (::), so the function is not tail recursive.
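To illustrate that last remark about tail recursion, here is a sketch of mine (in Scala rather than SML, to match the rest of this page): an accumulator-passing variant that is tail recursive and reverses the accumulator once at the end.
import scala.annotation.tailrec

def sumPairs(l: List[(Int, Int)]): List[Int] = {
  // The recursive call is the last thing 'go' does, so it compiles to a loop.
  @tailrec
  def go(rest: List[(Int, Int)], acc: List[Int]): List[Int] = rest match {
    case Nil            => acc.reverse
    case (a, b) :: tail => go(tail, (a + b) :: acc)
  }
  go(l, Nil)
}

// sumPairs(List((1, 2), (6, 5), (9, 10))) == List(3, 11, 19)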

Adding elements to an immutable list in Scala

In Scala, the way you add elements to an immutable list is as follows:
val l = 1 :: 2 :: Nil
l: List[Int] = List(1, 2)
What this means is that you first create a Nil (empty) list, and to that you add 2 and then 1; i.e. these operations are right-associative. So, effectively, it can be re-written in a clearer way, like so:
val l = (1 :: (2 :: Nil))
l: List[Int] = List(1, 2)
The question is, if List is supposed to preserve the order of insertion, and if 2 is added first to an empty list and then 1 is added, then why is the answer not l: List[Int] = List(2, 1) ??
This is because elements are prepended: first 2 then 1.
From the definition of the cons method:
def ::[B >: A](x: B): List[B] =
  new scala.collection.immutable.::(x, this)
Here you can see that each time a new instance of the case class scala.collection.immutable.:: is created:
case class ::[B](val head: B, var tail: List[B]) extends List[B]
You just use your new element as the head of the new list and your whole previous list as its tail.
Also, the prepend operation on an immutable List takes constant time O(1), while append is linear O(n) (from the Scala docs).
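As a small illustration of why that complexity difference matters (my sketch, not part of the original answer): the usual idiom is to build by prepending and reverse once at the end, rather than appending element by element.
val source = List(1, 2, 3, 4)

// Each :: is O(1); the single reverse at the end is O(n).
val built = source.foldLeft(List.empty[Int])((acc, x) => x :: acc).reverse
// built: List[Int] = List(1, 2, 3, 4)

// Appending with :+ also preserves insertion order, but each :+ is O(n).
val appended = source.foldLeft(List.empty[Int])((acc, x) => acc :+ x)
// appended: List[Int] = List(1, 2, 3, 4)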
It's just convention. Lists are basically stacks. It's most efficient to access or modify the most-recently added items. You could just as well consider the head of the list to be the final item ordinally, in which case, your suggested notation would be appropriate.
I would speculate that the reason for the convention is that we don't typically put much care into how a list was constructed, but we do often want to consider the first item accessed to be the initial item in the ordering, and so the notation reflects that.

Difficulty thinking of properties for FsCheck

I've managed to get xUnit working on my little sample assembly. Now I want to see if I can grok FsCheck too. My problem is that I'm stumped when it comes to defining test properties for my functions.
Maybe I've just not got a good sample set of functions, but what would be good test properties for these functions, for example?
//transforms [1;2;3;4] into [(1,2);(3,4)]
pairs : 'a list -> ('a * 'a) list
//splits list into list of lists when predicate returns
// true for adjacent elements
splitOn : ('a -> 'a -> bool) -> 'a list -> 'a list list
//returns true if snd is bigger
sndBigger : ('a * 'a) -> bool (requires comparison)
There are already plenty of specific answers, so I'll try to give some general answers which might give you some ideas.
Inductive properties for recursive functions. For simple functions, this probably amounts to re-implementing the recursion. However, keep it simple: while the actual implementation more often than not evolves (e.g. it becomes tail-recursive, you add memoization, ...), keep the property straightforward. The ==> property combinator usually comes in handy here. Your pairs function might make a good example.
Properties that hold over several functions in a module or type. This is usually the case when checking abstract data types. For example: adding an element to an array means that the array contains that element. This checks the consistency of Array.add and Array.contains.
Round trips: this is good for conversions (e.g. parsing, serialization) - generate an arbitrary representation, serialize it, deserialize it, check that it equals the original.
You may be able to do this with splitOn and concat; a short Scala sketch of the round-trip idea follows after this list.
General properties as sanity checks. Look for generally known properties that may hold - things like commutativity, associativity, idempotence (applying something twice does not change the result), reflexivity, etc. The idea here is more to exercise the function a bit - see if it does anything really weird.
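To make the round-trip point concrete, here is a plain Scala sketch (not F#/FsCheck, and not from the original answers); the pairs function below is a hypothetical stand-in for the function under test.
// Hypothetical stand-in for pairs: groups an even-length list into consecutive pairs.
def pairs[A](xs: List[A]): List[(A, A)] =
  xs.grouped(2).collect { case List(a, b) => (a, b) }.toList

// Round trip: pairing and then un-pairing should give back the original list
// (only required for even-length inputs here).
def pairsRoundTrip[A](xs: List[A]): Boolean =
  xs.length % 2 != 0 || pairs(xs).flatMap { case (a, b) => List(a, b) } == xs

// pairsRoundTrip(List(1, 2, 3, 4)) == true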
As a general piece of advice, try not to make too big a deal out of it. For sndBigger, a good property would be:
let ``should return true if and only if snd is bigger`` (a:int) (b:int) =
  sndBigger (a, b) = (b > a)
And that is probably exactly the implementation. Don't worry about it - sometimes a simple, old fashioned unit test is just what you need. No guilt necessary! :)
Maybe this link (by the Pex team) also gives some ideas.
I'll start with sndBigger - it is a very simple function, but you can write some properties that should hold about it. For example, what happens when you reverse the values in the tuple:
// Reversing the values of the tuple negates the result
let swap (a, b) = (b, a)

let prop_sndBiggerSwap x =
  sndBigger x = not (sndBigger (swap x))

// If the two elements of the tuple are the same, it should give 'false'
let prop_sndBiggerEq a =
  sndBigger (a, a) = false
EDIT: This rule prop_sndBiggerSwap doesn't always hold (see comment by kvb). However the following should be correct:
// Reversing the values of the tuple negates the result (when the values differ)
let prop_sndBiggerSwap a b =
  if a <> b then
    let x = (a, b)
    sndBigger x = not (sndBigger (swap x))
  else true  // vacuously true when the values are equal
Regarding the pairs function, kvb already posted some good ideas. In addition, you could check that turning the transformed list back into a list of elements returns the original list (you'll need to handle the case when the input list is odd - depending on what the pairs function should do in this case):
let prop_pairsEq (x:_ list) =
  if (x.Length % 2 = 0) then
    x |> pairs |> List.collect (fun (a, b) -> [a; b]) = x
  else true
For splitOn, we can test a similar thing - if you concatenate all the returned lists, it should give the original list (this doesn't verify the splitting behavior, but it is a good thing to start with - it at least guarantees that no elements will be lost).
let prop_splitOnEq f x =
  x |> splitOn f |> List.concat = x
I'm not sure if FsCheck can handle this though (!) because the property takes a function as an argument (so it would need to generate "random functions"). If this doesn't work, you'll need to provide a couple of more specific properties with some handwritten function f. Next, implementing the check that f returns true for all adjacent pairs in the split lists (as kvb suggests) isn't actually that difficult:
let prop_splitOnAdjacentTrue f x =
  x |> splitOn f
    |> List.forall (fun l ->
        l |> Seq.pairwise
          |> Seq.forall (fun (a, b) -> f a b))
Probably the last thing that you could check is that f returns false when you give it the last element from one list and the first element from the next list. The following isn't fully complete, but it shows the way to go:
let prop_splitOnOtherFalse f x =
  x |> splitOn f
    |> Seq.pairwise
    // lastElement and firstElement are assumed helpers; see the note below
    |> Seq.forall (fun (a, b) -> not (f (lastElement a) (firstElement b)))
The last sample also shows that you should check whether the splitOn function can return an empty list as part of the returned list of results (because in that case, you couldn't find first/last element).
For some code (e.g. sndBigger), the implementation is so simple that any property will be at least as complex as the original code, so testing via FsCheck may not make sense. However, for the other two functions here are some things that you could check:
pairs
What's expected when the original length is not divisible by two? You could check for throwing an exception if that's the correct behavior.
List.map fst (pairs x) = evenEntries x and List.map snd (pairs x) = oddEntries x for simple functions evenEntries and oddEntries which you can write.
splitOn
If I understand your description of how the function is supposed to work, then you could check conditions like "For every list in the result of splitOn f l, no two consecutive entries satisfy f" and "Taking lists (l1,l2) from splitOn f l pairwise, f (last l1) (first l2) holds". Unfortunately, the logic here will probably be comparable in complexity to the implementation itself.