Why can't I zip two lists of different lengths? - list

In F# if one tries to zip two lists of different lengths one gets an error:
List.zip [1..4] [1..3]
// System.ArgumentException: The lists had different lengths.
However, it is very easy to define an alternative definition of zip that accepts two argument lists of different lengths:
let rec zip' (xs: 'T list) (ys: 'T list) =
match (xs, ys) with
| ([], _) -> []
| (_, []) -> []
| ((z::zs), (w::ws)) -> (z, w) :: zip' zs ws
zip' [1..4] [1..3]
// val it : (int * int) list = [(1, 1); (2, 2); (3, 3)]
Is there a good reason not to use this alternative definition? Why wasn't it adopted in the first place?

This is, indeed, a bit confusing because there is a mismatch between List.zip (which does not allow this) and Seq.zip (which truncates the longer list).
I think that zip which works only on lists of equal length is a reasonable default behaviour - if it automatically truncated data, there is a realistic chance that you would accidentally lose some useful data when using zip which might cause subtle bugs.
The fact that Seq.zip truncates the longer list is only sensible because sequences are lazy and so, by design, when I define a sequence I expect that the consumer might not read it all.
In summary, I think the behaviour difference is based on "what is the most sensible thing to do for a given data structure", but I do think having two names for the operations would make a lot more sense than calling both zip (alas, that's pretty much impossible to change now).

Related

Haskell working with Maybe lists

I'm pretty new to haskell and don't get how to work with Maybe [a]. Normally I'm coding OOP (VB.NET) and in my free time I want to learn haskell (don't ask why ;) ).
So, what do I want to do? I want to read two files with numerical IDs and find out only the IDs that match in both files. Reading the files is not a big thing, it works wonderfully easy. Now, I get two lists of Maybe[Ids] (for a simple example, just think the IDs are Int). So the function I need looks like this
playWithMaybe :: (Maybe [a]) -> (Maybe [Int]) -> [Int]
Now I want to access list members as I used to like this
playWithMaybe (x:xs) (y:ys) = undefined
But unfortunately it's not allowed, GHC says for both lists
Couldn't match expected type ‘Maybe [Int]’ with actual type ‘[t0]’
So I played around a little bit but didn't find a way to access list members. Can someone help me out? A little bit of explanation would be a great thing!
To approach your problem from a different direction, I would argue you don't want a function that processes two Maybe [a]. Bear with me:
Fundamentally, the operation you want to do operates on two lists to give you a new list, eg,
yourFunction :: [a] -> [a] -> [a]
yourFunction a b = ...
That's fine, and you can and should write yourFunction as such. The fact that the data you have is Maybe [a] captures some additional, auxiliary information: the operation that created your input lists may have failed. The next step is to chain together yourFunction with the auxiliary information. This is exactly the purpose of do notation, to mix pure operations (like yourFunction) with contexts (the fact that the creation of one of your input lists may have failed):
playWithMaybe :: Maybe [a] -> Maybe [a] -> Maybe [a]
playWithMaybe maybeA maybeB =
do a <- maybeA -- if A is something, get it; otherwise, pass through Nothing
b <- maybeB -- if B is something, get it; otherwise, pass through Nothing
Just (yourFunction a b) -- both inputs succeeded! do the operation, and return the result
But then it turns out there are other kinds of contexts you might want to work with (a simple one, instead of Maybe that just captures "something bad happened", we can use Either to capture "something bad happened, and here is a description of what happened). Looking back at playWithMaybe, the "Maybe-ness" only shows up in one place, the Just in the last line. It turns out Haskell offers a generic function pure to wrap a pure value, like what we get from yourFunction, in a minimal context:
playWithMaybe' :: Maybe [a] -> Maybe [a] -> Maybe [a]
playWithMaybe' maybeA maybeB =
do a <- maybeA
b <- maybeB
pure (yourFunction a b)
But then Haskell also has a generic type to abstract the idea of a context, the Monad. This lets us make our function even more generic:
playWithMonad :: Monad m => m [a] -> m [a] -> m [a]
playWithMonad mA mB =
do a <- mA
b <- mB
pure (yourFunction a b)
Now we have something very generic, and it turns out it is so generic, it's already in the standard library! (This is getting quite subtle, so don't worry if it doesn't all make sense yet.)
import Control.Applicative
play :: Monad m => m [a] -> m [a] -> m [a]
play mA mB = liftA2 yourFunction mA mB
or even
import Control.Applicative
play' :: Monad m => m [a] -> m [a] -> m [a]
play' = liftA2 yourFunction
Why did I switch from Monad to Applicative suddenly? Applicative is similar to Monad, but even more generic, so given the choice, it is generally better to use Applicative if you can (similar to my choice to use pure instead of return earlier). For a more complete explanation, I strongly recommend Learn You a Haskell (http://learnyouahaskell.com/chapters), in particular chapters 11 and 12. Note- definitely read chapter 11 first! Monads only makes sense after you have a grasp on Functor and Applicative.
In general:
yourFunction Nothing Nothing = ...
yourFunction (Just xs) Nothing =
case xs of
[] -> ...
x':xs' -> ...
-- or separately:
yourFunction (Just []) Nothing = ...
yourFunction (Just (x:xs)) Nothing = ...
et cetera. Which cases need to be treated separately depends on the specific function. More likely you would combine functions working on Maybe with functions working on [].
If you want to "Just return an list with no elements" for Nothing, then you can write
maybeToList1 :: Maybe [a] -> [a]
maybeToList1 Nothing = []
maybeToList1 (Just xs) = xs
A better way to write the same function is maybeToList1 = maybe [] id (docs for maybe) or maybeToList1 = fromMaybe [], but since you are just starting you may want to come back to this one later.
Like others have said, [Int] and Maybe [Int] are not the same thing. Maybe [Int] includes the extra information that the list may or may not be there. You said that you read the Ids from files. Perhaps, the Maybe signifies whether the file existed or not, while the empty list signifies that the file did exist but contained no Ids.
If you want to work with the list, you first need to define what to do if there is no list. You can use this function to extract the list:
fromMaybe :: a -> Maybe a -> a
Maybe you want to treat having no list the same as having an empty list:
fromMaybe [] :: Maybe [a] -> [a]
Maybe you want to crash the entire program:
fromMaybe (error "Missing list") :: Maybe a -> a
There is also the more general maybe function, which you can use if the default value is not of the same type as what's contained in the Maybe:
maybe :: b -> (a -> b) -> Maybe a -> b
Both these functions are defined in the module Data.Maybe.
You can also just work with the lists as if they existed and worry about their existence later by using Applicative. You said you wanted to find the Ids common to both lists. You can do it like this:
maybeCommonIds :: Maybe [Int] -> Maybe [Int] -> Maybe [Int]
maybeCommonIds xs ys = intersect <$> xs <*> ys
intersect is defined in Data.List. Using maybeCommonIds will result in Maybe [Int]. The list contained inside will hold the common ids, but if either of the two lists did not exist, there is not list of common ids. In your case, this might be the same as having no common ids. In that case, you might want to pass the result to fromMaybe [], to get back the list.

Pack consecutive duplicates of list elements into sublists in Ocaml

I found this problem in the website 99 problems in ocaml. After some thinking I solved it by breaking the problem into a few smaller subproblems. Here is my code:
let rec frequency x l=
match l with
|[]-> 0
|h::t-> if x=[h] then 1+(frequency x t)
else frequency x t
;;
let rec expand x n=
match n with
|0->[]
|1-> x
|_-> (expand x (n-1)) # x
;;
let rec deduct a b=
match b with
|[]-> []
|h::t -> if a=[h] then (deduct a t)
else [h]# (deduct a t)
;;
let rec pack l=
match l with
|[]-> []
|h::t -> [(expand [h] (frequency [h] l))]# (pack (deduct [h] t))
;;
It is rather clear that this implementation is overkill, as I have to count the frequency of every element in the list, expand this and remove the identical elements from the list, then repeat the procedure. The algorithm complexity is about O(N*(N+N+N))=O(N^2) and would not work with large lists, even though it achieved the required purpose. I tried to read the official solution on the website, which says:
# let pack list =
let rec aux current acc = function
| [] -> [] (* Can only be reached if original list is empty *)
| [x] -> (x :: current) :: acc
| a :: (b :: _ as t) ->
if a = b then aux (a :: current) acc t
else aux [] ((a :: current) :: acc) t in
List.rev (aux [] [] list);;
val pack : 'a list -> 'a list list = <fun>
the code should be better as it is more concise and does the same thing. But I am confused with the use of "aux current acc" in the inside. It seems to me that the author has created a new function inside of the "pack" function and after some elaborate procedure was able to get the desired result using List.rev which reverses the list. What I do not understand is:
1) What is the point of using this, which makes the code very hard to read on first sight?
2) What is the benefit of using an accumulator and an auxiliary function inside of another function which takes 3 inputs? Did the author implicitly used tail recursion or something?
3) Is there anyway to modify the program so that it can pack all duplicates like my program?
These are questions mostly of opinion rather than fact.
1) Your code is far harder to understand, in my opinion.
2a) It's very common to use auxiliary functions in OCaml and other functional languages. You should think of it more like nested curly braces in a C-like language rather than as something strange.
2b) Yes, the code is using tail recursion, which yours doesn't. You might try giving your code a list of (say) 200,000 distinct elements. Then try the same with the official solution. You might try determining the longest list of distinct values your code can handle, then try timing the two different implementations for that length.
2c) In order to write a tail-recursive function, it's sometimes necessary to reverse the result at the end. This just adds a linear cost, which is often not enough to notice.
3) I suspect your code doesn't solve the problem as given. If you're only supposed to compress adjacent elements, your code doesn't do this. If you wanted to do what your code does with the official solution you could sort the list beforehand. Or you could use a map or hashtable to keep counts.
Generally speaking, the official solution is far better than yours in many ways. Again, you're asking for an opinion and this is mine.
Update
The official solution uses an auxiliary function named aux that takes three parameters: the currently accumulated sublist (some number of repetitions of the same value), the currently accumulated result (in reverse order), and the remaining input to be processed.
The invariant is that all the values in the first parameter (named current) are the same as the head value of the unprocessed list. Initially this is true because current is empty.
The function looks at the first two elements of the unprocessed list. If they're the same, it adds the first of them to the beginning of current and continues with the tail of the list (all but the first). If they're different, it wants to start accumulating a different value in current. It does this by adding current (with the one extra value added to the front) to the accumulated result, then continuing to process the tail with an empty value for current. Note that both of these maintain the invariant.

foldl vs foldr: which should I prefer?

I remember that when I showed some code that I wrote to my professor he remarked, offhand, that
It rarely matters, but it's worth noting that fold* is a little bit more efficient than fold*' in SML/NJ, so you should prefer it over fold* when possible.
I forget whether fold* was foldr or foldl. I know that this is one of those micro-optimization things that probably doesn't make a big difference in practice, but I'd like to be in the habit of using the more efficient one when I have the choice.
Which is which? My guess is that this is SML/NJ specific and that MLton will be smart enough to optimize both down to the same machine code, but answers for other compilers are good to know.
foldl is tail-recursive, while foldr is not. Although you can do foldr in a tail-recursive way by reversing the list (which is tail recursive), and then doing foldl.
This is only going to matter if you are folding over huge lists.
Prefer the one that converts the given input into the intended output.
If both produce the same output such as with a sum, and if dealing with a list, folding from the left will be more efficient because the fold can begin with head element, while folding from the right will first require walking the list to find the last element before calculating the first intermediate result.
With arrays and similar random access data structures, there's probably not going to be much difference.
A compiler optimization that always chose the better of left and right would require the compiler to determine that left and right were equivalent over all possible inputs. Since foldl and foldr take a functions as arguments, this is a bit of a tall order.
I'm going to keep the accepted answer here, but I had the chance to speak to my professor, and his reply was actually the opposite, because I forgot a part of my question. The code in question was building up a list, and he said:
Prefer foldr over foldl when possible, because it saves you a reverse at the end in cases where you're building up a list by appending elements during the fold.
As in, for a trivial example:
- val ls = [1, 2, 3];
val ls = [1,2,3] : int list
- val acc = (fn (x, xs) => x::xs);
val acc = fn : 'a * 'a list -> 'a list
- foldl acc [] ls;
val it = [3,2,1] : int list
- foldr acc [] ls;
val it = [1,2,3] : int list
The O(n) save of a reverse is probably more important than the other differences between foldl and foldr mentioned in answers to this question.

ocaml sum of two lists with different length

I tried to add two lists with different lengths using this:
let sumList(a,b) = match a,b with
|[],_ -> []
|(x::xs,y::ys)-> (x + y)::diffList(xs,ys)
It returns Unbound value sumList. Is it possible to do this as in Haskell: zipWith(+) a b.
Possibly the actual error is "Unbound value diffList", since you don't define diffList in your code.
If this is a transcription error, then the next problem is that you need to declare sumList as a recursive function: let rec sumList (a, b) = ....
Your pattern match is not exhaustive. It fails if the first list is longer.
The Haskell zipWith is friendlier than the OCaml List.map2, which requires the lists to be the same length. I don't think there's anything so friendly in the OCaml standard library.

Why is Haskell giving "ambiguous type variable" error?

A past paper problem asked me; to define a function p :: [a] -> [a] that swaps every two items in a list. Your function should swap the first with the second item, the third
with the fourth, and so on. define one by list comprehension another by recursion.
Well this is what I came up with:
import Test.QuickCheck
p :: [a] -> [a]
p [] = []
p xs = concat [ [y,x] | ((x,y),i) <- zip (zip xs (tail xs)) [1..], odd i]
q :: [a] -> [a]
q [] = []
q (x:y:zs) | even (length zs) = [y,x] ++ q zs
| otherwise = error "The list provided is not of even length"
prop_2 xs = even (length xs) ==> p xs == q xs
check2 = quickCheck prop_2
The functions work fine, but I wanted to check if the two are identical, so I put the quickCheck below; but this gives me an error for some reason saying
"ambiguous type variable [a0] arising from the use of prop_2"
I just don't understand what's wrong here, I looks perfectly sensible to me...
what exactly is Haskell complaining??
Let's start by commenting out check2 and asking GHCi what the type of prop_2 is:
*Main> :t prop_2
prop_2 :: Eq a => [a] -> Property
Ok, so the way you've written prop_2, it works for lists of any element type that is in the equality class.
Now, you want to pass prop_2 to the quickCheck function. Let's look at the type of quickCheck next:
*Main> :t quickCheck
quickCheck :: Testable prop => prop -> IO ()
This function actually has an immensely general type. It works on anything that's in the Testable class. So how does this Testable class work? There are a couple of base instances here, for example:
instance Testable Bool
instance Testable Property -- this is a simplification, but it's more or less true
These instances are defined in the QuickCheck library and tell you that you can quick-check
constant Booleans as well as elements of type Property.
Now, testing properties that do not depend on any inputs isn't particularly interesting. The interesting instance is this one:
instance (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)
What this says is that if you know how to generate random values of a particular type (Arbitrary a) and how to show values of that type (Show a), then you can also test functions from that type a to an already testable type prop.
Why? Because that's how QuickCheck operates. In such a situation, QuickCheck will consult the Arbitrary instance in order to come up with random test cases of type a, apply the function to each of them, and check if the outcome is positive. If any of the tests fails, it will print a message informing you about the test failure, and it will print the test case (that's why there's a Show requirement, too).
Now, in our situation this means that we should be able to quick-check prop_2: it is a function that results in a Property. The important thing is that the function argument (of type [a] as long as Eq a holds) is a member of class Arbitrary and a member of class Show.
Here we arrive at the source of the error. The information available is not sufficient to make that conclusion. As I said in the beginning, prop_2 works for lists of any element type that admits equality. There's no built-in rule that says that all these type are in Arbitrary and Show. But even if there was, what kind of lists should QuickCheck generate? Should it generate lists of Booleans, lists of unit type, lists of functions, lists of characters, lists of integers? So many options, and the choice of element type may well affect whether you find a bug or not. (Consider that GHC would pick the unit type () with just one element. Then your property would hold for any two functions p and q that preserve the input lists' length, regardless of whether they have your desired swapping property or not.)
This is why you need to provide extra type information to GHC so that it can resolve which element type to use for the list. Doing so is simple enough. You can either annotate prop_2 itself:
prop_2 :: [Integer] -> Property
Or, if you don't want that (because you might want to run the tests on different kinds of lists without reimplementing prop_2), you can add a type annotation when calling quickCheck:
check2 = quickCheck (prop_2 :: [Integer] -> Property)
Now the code compiles, and we can run check2:
Main*> check2
+++ OK, passed 100 tests.