Haskell - How to get index of elem in list?

I want to get the index of an element in a list, but as a plain Int rather than a Maybe Int.
>elemIndex 'f' "BarFoof"
>Just 6
But I need just 6.

You can use fromMaybe :: a -> Maybe a -> a to unwrap the value from a Just, and furthermore provide a default value in case it is a Nothing.
So you can implement a function:
import Data.Maybe(fromMaybe)
elemIndex' :: Eq a => a -> [a] -> Int
elemIndex' x = fromMaybe (-1) . elemIndex x
Here it will thus return -1 in case the element cannot be found. For example:
Prelude Data.Maybe Data.List> elemIndex' 'f' "BarFoof"
6
Prelude Data.Maybe Data.List> elemIndex' 'q' "BarFoof"
-1
That being said, in Haskell a Maybe is often used to denote a computation that might "fail", so if you post-process the result, you will take the Nothing case (element not found in the list) into account as well.
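For instance, a minimal sketch of that kind of post-processing with the maybe eliminator (describeIndex is just an illustrative name):
import Data.List (elemIndex)

-- Handle both outcomes explicitly instead of collapsing Nothing to a sentinel value.
describeIndex :: Char -> String -> String
describeIndex x xs = maybe "not found" (\i -> "found at index " ++ show i) (elemIndex x xs)
With this, describeIndex 'f' "BarFoof" gives "found at index 6", while a missing element never turns into a magic number.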

As Willem Van Onsem says, you can use fromMaybe. However, there are other options which may be better suited to your particular use case. (It's not clear from your code what your use case is, though, so I'm just going to list some options.)
If you know for certain that the value you're looking for is going to be somewhere in the list, usually because you've just pulled it out of the same list using some sort of fold, you can use fromJust. fromJust is similar to fromMaybe, except that it doesn't take a fallback value as an additional first argument and instead throws an error when it is given Nothing (here: when the value isn't in the list). People will argue about whether or not you should use functions that can throw an error (these functions are called "partial", because they only handle part of their input domain), since they can really cause headaches down the line when assumptions turn out to be wrong, but in carefully controlled circumstances they can be useful.
fromJust (Just 10)
> 10
fromJust Nothing
> *** Exception: Maybe.fromJust: Nothing
Depending on the semantics of your program, it may also be more idiomatic to change your function signature to return a Maybe whatever instead of a whatever. In this case, you should look into using things like fmap, >>=, and <$> to pass your value around without ever needing to unwrap it. I can't tell you if this is the right approach for your program in particular without seeing more of your code, but for most use cases, fromMaybe is not the right approach.
fmap (+ 5) (Just 10)
> Just 15
fmap (+ 5) Nothing
> Nothing
(+ 5) <$> Just 10
> Just 15
(+ 5) <$> Nothing
> Nothing
Just 10 >>= (\x -> if even x then Just $ x `div` 2 else Nothing)
> Just 5
Just 5 >>= (\x -> if even x then Just $ x `div` 2 else Nothing)
> Nothing
Nothing >>= (\x -> if even x then Just $ x `div` 2 else Nothing)
> Nothing
Using a value like -1 to indicate failure means that the rest of your program needs to know that -1 indicates a failure state. These sorts of failure values usually need to be handled separately from normal return values, which is exactly the use case that the Maybe type is designed to make more elegant. The quick test for whether you should be unwrapping a Maybe is to look for anywhere else in your program where you pattern match (or use guards, if, case, etc.) to catch this -1 and trigger specialized behavior. If you do, then you shouldn't be unwrapping the Maybe here.
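As a rough sketch of that test (the function names are made up for illustration), compare the sentinel style with matching on the Maybe directly:
import Data.List (elemIndex)
import Data.Maybe (fromMaybe)

-- Sentinel style: every caller has to remember that -1 means "not found".
reportSentinel :: Char -> String -> String
reportSentinel x xs =
  let i = fromMaybe (-1) (elemIndex x xs)
  in if i == -1 then "missing" else "at " ++ show i

-- Maybe style: the "not found" case is a separate constructor the compiler can check.
reportMaybe :: Char -> String -> String
reportMaybe x xs = case elemIndex x xs of
  Nothing -> "missing"
  Just i  -> "at " ++ show i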

Related

Iterate produces StackOverflow errors

So I just started out with Frege, and with Haskell as well. I have experience with functional languages, since I have been using Clojure for a couple of years now.
The first thing I wanted to try out is my usual approach at the Fibonacci numbers.
next_fib (a, b) = (b, a + b)
fibs = map fst $ iterate next_fib (0, 1)
fib x = head $ drop x fibs
This is how it turned out in Frege. It works, but for very high arguments to fib, e.g. (fib 4000), it throws StackOverflow errors. This surprised me, because the same functions in Clojure would work just fine. Is this a Frege bug, or am I getting the whole lazy evaluation thing wrong?
You probably don't "get the whole lazy evaluation thing wrong", but you're bitten twice by too lazy evaluation in this case.
And although GHC essentially works exactly the same as Frege in this regard, the outcome is different and seemingly unfavorable for Frege.
But the reason that Haskell can get away with really big thunks (see below), while Frege aborts early with a stack overflow, is the way the runtime systems manage heap and stack. The Haskell RTS is flexible and can devote huge portions of the available memory to the stack if the need arises. Frege's runtime system, on the other hand, is the JVM, which usually starts out with a tiny stack, just enough to accommodate a call depth of a few hundred. As you have observed, giving the JVM enough stack space makes the thing work, exactly like it would in GHC.
Because of the ever scarce stack space in the JVM, we have developed some techniques in Frege to avoid unwanted and unneeded laziness. Two of them will be explained below. In the end, in Frege you are forced to control bad effects of laziness early, while the GHC developer can happily code away without having to take notice.
To understand the following, we need to introduce the concept "thunk". A thunk is first and foremost some yet to be evaluated expression. For example, since tuples are lazy, an expression like
(b, a+b)
is compiled to an application of the tuple constructor (,) to b and {a+b}, where the notation { e }, for the sake of this discussion, means some implementation-dependent representation of a thunk that promises to compute the expression e when evaluated. In addition, a thunk memoizes its result upon evaluation, so whenever the same thunk is evaluated again, it just returns the precomputed result. (This is only possible in a pure functional language, of course.)
For example, in Frege, to represent thunks there is a class Delayed<X> that implements Callable<X> and arranges for memoization of the result.
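On the GHC side you can watch the same memoization happen with the ghci command :sprint, which prints a binding without forcing it (a small demo; the exact output can vary between GHC versions):
ghci> let x = 1 + 2 :: Int
ghci> :sprint x
x = _
ghci> x
3
ghci> :sprint x
x = 3
The underscore is the still-unevaluated thunk; after the first evaluation the result is stored and simply reused.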
We shall now investigate what the result of
next_fib (next_fib (0, 1))
is. The inner application results in:
(1, {0+1})
and then the outer one computes from that:
({0+1}, {1+{0+1}})
We see here that thunks can get nested in other thunks, and this is the problem here, since every application of next_fib will result in a tuple that will have as its elements thunks that have thunks of the previous iteration nested inside them.
Now consider what is happening when the thunk for the 4000th fib-number gets evaluated, which happens, for instance, when you print it. It will have to perform an addition, but the numbers to add are actually both thunks, which must be evaluated before the addition can take place. In this way, each nested thunk means an invocation of that thunk's evaluation method, unless the thunk is already evaluated. Hence, to print the 4000th number, we need a stack depth of at least 4000 in the case when no other thunk of this series was evaluated before.
So the first measure was to replace the lazy tuple constructor with the strict one:
(b; a+b)
It doesn't build thunks but computes the arguments right away. This is not available in Haskell; to do the same there, you need to say something like:
let c = a+b in b `seq` c `seq` (b,c)
But this was not the end of the story. It turned out that the computation fib 4000 still overflowed the stack.
The reason is the implementation of iterate that goes like this:
iterate f x = x : iterate f (f x)
This builds an infinite list
[ x, f x, f (f x), f (f (f x)), ...]
Needless to say, all the terms except the first one are thunks!
This is normally not a problem when the list elements are evaluated in sequential order, because when, for example, the 3rd term
{f {f x}}
gets evaluated, the inner thunk is already evaluated and returns the result right away. In general, we need only enough stack depth to reach the first previously evaluated term. Here is a demo straight from the Frege online REPL at try.frege-lang.org:
frege> next (a,b) = (b; a+b) :: (Integer, Integer)
function next :: (Integer,Integer) -> (Integer,Integer)
frege> fibs = map fst $ iterate next (0,1)
function fibs :: [Integer]
frege> fib = (fibs!!)
function fib :: Int -> Integer
frege> map (length . show . fib) [0,500 ..]
[1,105,209,314,418,523,627,732,836,941,1045,1150,...]
frege> fib 4000
39909473435004422792081248094960912600792...
Here, with the map, we force evaluation of every 500th number (only as far as the REPL demands output; it prints just an initial portion of an infinite list), and compute the length of the decimal representation of each number (just so as not to display the large resulting numbers). This, in turn, forces evaluation of the 500 preceding numbers, but this is ok, as there is enough stack space for that. Once that is done, we can even compute fib 4000! Because now, all the thunks up to 6000 are already evaluated.
But we can do even better with a slightly better version of iterate, which uses the head strict constructor (!:):
a !: as = a `seq` (a:as)
This evaluates the head of the list right away, which is appropriate in our case.
With the two changes, we get a program whose stack demand does not depend on the argument of fib anymore. Here is the proof:
frege> iterate' f x = x !: iterate' f (f x)
function iterate' :: (a->a) -> a -> [a]
frege> fibs2 = map fst $ iterate' next (0,1)
function fibs2 :: [Integer]
frege> (length . show . (fibs2 !!)) 4000
836
frege> (length . show . (fibs2 !!)) 8000
1672
frege> (length . show . (fibs2 !!)) 16000
3344
frege> (length . show . (fibs2 !!)) 32000
6688
frege> (length . show . (fibs2 !!)) 64000
13375
frege> (length . show . (fibs2 !!)) 128000
java.lang.OutOfMemoryError: Java heap space
Well, we'd need more heap space now to keep more than 100,000 huge numbers. But notice that there was no stack problem anymore to compute 32,000 new numbers in the last step.
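For comparison, a rough plain-Haskell sketch of the same two measures (a strict pair step plus a head-strict iterate) might look like this; the names are made up for illustration:
nextFib :: (Integer, Integer) -> (Integer, Integer)
nextFib (a, b) = let c = a + b in b `seq` c `seq` (b, c)  -- force both components, as in the seq expression above

iterateStrict :: (a -> a) -> a -> [a]
iterateStrict f x = x `seq` (x : iterateStrict f (f x))   -- evaluate each head eagerly

fibsStrict :: [Integer]
fibsStrict = map fst (iterateStrict nextFib (0, 1))
In GHC the fully lazy version already runs because the RTS can grow the stack, but a variant like this avoids building the long chains of nested thunks in the first place.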
We could get rid of the heap space problem with a simple tail recursive definition that doesn't need to keep all those numbers around:
fib :: Int -> Integer
fib n = go n 0 1
  where
    go :: Int -> Integer -> Integer -> Integer
    go 0 !a !b = a
    go n !a !b = go (n-1) b (a+b)
I guess this would be even faster than traversing the list.
Unlike(?) in Clojure, direct list access is O(n), and long lists consume lots of space. Therefore, if you need to cache something and have an upper limit, you'd better use arrays. Here are two ways to construct an array of 10000 fibs:
frege> zzz = arrayFromList $ take 10000 $ map fst $ iterate (\(a,b) -> (b; a+b)) (0n,1)
function zzz :: JArray Integer
frege> elemAt zzz 4000
39909473435004422792081248094960912600792570982820257 ...
This works because the intermediate list never needs to exist as a whole. And once the array is created, access is O(1).
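A rough Haskell counterpart of this caching idea, using Data.Array (fibArray and fibAt are illustrative names; the size 10000 is taken over from the example):
import Data.Array (Array, listArray, (!))

fibArray :: Array Int Integer
fibArray = listArray (0, 9999)
             (take 10000 (map fst (iterate (\(a, b) -> (b, a + b)) (0, 1))))

fibAt :: Int -> Integer
fibAt = (fibArray !)   -- analogous to elemAt zzz above; O(1) once the array exists
As in the Frege version, the intermediate list never needs to exist as a whole; in GHC the lazy pairs are acceptable here, but the strict step from the earlier sketch could be substituted as well.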
And there is also a special function for creating caches like that:
yyy = arrayCache f 10000
  where
    f 0 a = 0n
    f 1 a = 1n
    f n a = elemAt a (n-1) + elemAt a (n-2)

fib = elemAt yyy
This avoids even the intermediate list, all the tuples, and so on.
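In Haskell the same effect can be had with an ordinary boxed array that refers to itself, since array elements are lazy (again only a sketch; the bound 9999 is illustrative):
import Data.Array (Array, listArray, (!))

fibCache :: Array Int Integer
fibCache = listArray (0, 9999) [ f n | n <- [0 .. 9999] ]
  where
    f 0 = 0
    f 1 = 1
    f n = fibCache ! (n - 1) + fibCache ! (n - 2)

fib :: Int -> Integer
fib = (fibCache !)
Each entry is computed at most once, on first access, because the unevaluated entries are exactly the memoizing thunks described earlier.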
This way, you can keep your good habit of preferring combinators over explicit recursion. Please give it a try.

Pack consecutive duplicates of list elements into sublists in Ocaml

I found this problem in the website 99 problems in ocaml. After some thinking I solved it by breaking the problem into a few smaller subproblems. Here is my code:
let rec frequency x l =
  match l with
  | [] -> 0
  | h :: t -> if x = [h] then 1 + frequency x t
              else frequency x t
;;

let rec expand x n =
  match n with
  | 0 -> []
  | 1 -> x
  | _ -> (expand x (n - 1)) @ x
;;

let rec deduct a b =
  match b with
  | [] -> []
  | h :: t -> if a = [h] then deduct a t
              else [h] @ (deduct a t)
;;

let rec pack l =
  match l with
  | [] -> []
  | h :: t -> [expand [h] (frequency [h] l)] @ (pack (deduct [h] t))
;;
It is rather clear that this implementation is overkill, as I have to count the frequency of every element in the list, expand this and remove the identical elements from the list, then repeat the procedure. The algorithm complexity is about O(N*(N+N+N))=O(N^2) and would not work with large lists, even though it achieved the required purpose. I tried to read the official solution on the website, which says:
# let pack list =
    let rec aux current acc = function
      | [] -> [] (* Can only be reached if original list is empty *)
      | [x] -> (x :: current) :: acc
      | a :: (b :: _ as t) ->
          if a = b then aux (a :: current) acc t
          else aux [] ((a :: current) :: acc) t in
    List.rev (aux [] [] list);;
val pack : 'a list -> 'a list list = <fun>
the code should be better as it is more concise and does the same thing. But I am confused by the use of "aux current acc" inside it. It seems to me that the author has created a new function inside the "pack" function and, after some elaborate procedure, obtains the desired result using List.rev, which reverses the list. What I do not understand is:
1) What is the point of doing this, which makes the code very hard to read at first sight?
2) What is the benefit of using an accumulator and an auxiliary function, inside another function, which takes 3 inputs? Did the author implicitly use tail recursion or something?
3) Is there any way to modify the program so that it packs all duplicates, like my program does?
These are questions mostly of opinion rather than fact.
1) Your code is far harder to understand, in my opinion.
2a) It's very common to use auxiliary functions in OCaml and other functional languages. You should think of it more like nested curly braces in a C-like language rather than as something strange.
2b) Yes, the code uses tail recursion, and yours doesn't. You might try giving your code a list of (say) 200,000 distinct elements. Then try the same with the official solution. You might try determining the longest list of distinct values your code can handle, then try timing the two different implementations for that length.
2c) In order to write a tail-recursive function, it's sometimes necessary to reverse the result at the end. This just adds a linear cost, which is often not enough to notice.
3) I suspect your code doesn't solve the problem as given. If you're only supposed to compress adjacent elements, your code doesn't do this. If you wanted to do what your code does with the official solution you could sort the list beforehand. Or you could use a map or hashtable to keep counts.
Generally speaking, the official solution is far better than yours in many ways. Again, you're asking for an opinion and this is mine.
Update
The official solution uses an auxiliary function named aux that takes three parameters: the currently accumulated sublist (some number of repetitions of the same value), the currently accumulated result (in reverse order), and the remaining input to be processed.
The invariant is that all the values in the first parameter (named current) are the same as the head value of the unprocessed list. Initially this is true because current is empty.
The function looks at the first two elements of the unprocessed list. If they're the same, it adds the first of them to the beginning of current and continues with the tail of the list (all but the first). If they're different, it wants to start accumulating a different value in current. It does this by adding current (with the one extra value added to the front) to the accumulated result, then continuing to process the tail with an empty value for current. Note that both of these maintain the invariant.

Haskell List complexity

Sorry if this question has already been asked; I didn't find it. And sorry for my poor English.
I'm learning Haskell and am trying to use lists.
I wrote a function which transforms a list following a specific pattern; I can't check if it works right now, but I think it does.
This function is not tail recursive, so I think it will be horrible to compute with a big list:
transform :: [Int] -> [Int]
transform list = case list of
  (1:0:1:[])  -> [1,1,1,1]
  (1:1:[])    -> [1,0,1]
  [1]         -> [1,1]
  (1:0:1:0:s) -> 1:1:1:1: (transform s)
  (1:1:0:s)   -> 1:0:1: (transform s)
  (1:0:s)     -> 1:1: (transform s)
  (0:s)       -> 0: (transform s)
So I thought about another function, which would be "better":
transform = reverse . aux []
  where
    aux buff (1:0:[1])   = (1:1:1:1:buff)
    aux buff (1:[1])     = (1:0:1:buff)
    aux buff [1]         = (1:1:buff)
    aux buff (1:0:1:0:s) = aux (1:1:1:1:buff) s
    aux buff (1:1:0:s)   = aux (1:0:1:buff) s
    aux buff (1:0:s)     = aux (1:1:buff) s
    aux buff (0:s)       = aux (0:buff) s
The problem is that I don't know how it compiles and whether I'm wrong with the second function. Can you explain to me how lists work? Is it better to use (++) or to reverse the list at the end?
Thank you in advance for your answers
The first function is perfectly fine and in my opinion preferable to the second one.
The reason is laziness. If you have a transform someList expression in your code, the resulting list will not be evaluated unless you demand it. In particular the list will be evaluated only as far as it is needed; print $ take 10 $ transform xs will do less work than print $ take 20 $ transform xs.
In a strict language transform would indeed encumber the stack, since it would have to evaluate the whole list (in a non-tail recursive way) before returning anything of use. In Haskell transform (0:xs) evaluates to 0 : transform xs, a usable partial result. We can inspect the head of this result without touching the tail. There is no danger of stack overflow either: at any time there is at most a single unevaluated thunk (like transform xs in the previous example) in the tail of the list. If you demand more elements, the thunk will be just pushed further back, and the stack frame of the previous thunk can be garbage collected.
If we always fully evaluate the list, then the performance of the two functions should be similar, or the lazy version could even be somewhat faster because it avoids the reversing or the extra (++)-s. So, by switching to the second function we lose laziness and gain no extra performance.
Your first version looks much better to me.¹ It's fine that it's not tail-recursive: you don't want it to be tail-recursive, you want it to be lazy. The second version can't produce even a single element without processing the entire input list, because in order to reverse the result of aux, the entire result of aux must be known. However,
take 10 . transform $ cycle [1,0,0,1,1,1]
would work fine with your first definition of transform, because you only consume as much of the list as you need in order to make a decision.
¹ But note that (1:0:1:[]) is just [1,0,1].

Why is Haskell giving "ambiguous type variable" error?

A past paper problem asked me to define a function p :: [a] -> [a] that swaps every two items in a list: the function should swap the first item with the second, the third with the fourth, and so on, with one version defined by list comprehension and another by recursion.
Well this is what I came up with:
import Test.QuickCheck

p :: [a] -> [a]
p [] = []
p xs = concat [ [y,x] | ((x,y),i) <- zip (zip xs (tail xs)) [1..], odd i]

q :: [a] -> [a]
q [] = []
q (x:y:zs) | even (length zs) = [y,x] ++ q zs
           | otherwise        = error "The list provided is not of even length"

prop_2 xs = even (length xs) ==> p xs == q xs

check2 = quickCheck prop_2
The functions work fine, but I wanted to check whether the two are identical, so I added the quickCheck call below; but this gives me an error for some reason, saying
"ambiguous type variable [a0] arising from the use of prop_2"
I just don't understand what's wrong here; it looks perfectly sensible to me...
What exactly is Haskell complaining about?
Let's start by commenting out check2 and asking GHCi what the type of prop_2 is:
*Main> :t prop_2
prop_2 :: Eq a => [a] -> Property
Ok, so the way you've written prop_2, it works for lists of any element type that is in the equality class.
Now, you want to pass prop_2 to the quickCheck function. Let's look at the type of quickCheck next:
*Main> :t quickCheck
quickCheck :: Testable prop => prop -> IO ()
This function actually has an immensely general type. It works on anything that's in the Testable class. So how does this Testable class work? There are a couple of base instances here, for example:
instance Testable Bool
instance Testable Property -- this is a simplification, but it's more or less true
These instances are defined in the QuickCheck library and tell you that you can quick-check
constant Booleans as well as elements of type Property.
Now, testing properties that do not depend on any inputs isn't particularly interesting. The interesting instance is this one:
instance (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)
What this says is that if you know how to generate random values of a particular type (Arbitrary a) and how to show values of that type (Show a), then you can also test functions from that type a to an already testable type prop.
Why? Because that's how QuickCheck operates. In such a situation, QuickCheck will consult the Arbitrary instance in order to come up with random test cases of type a, apply the function to each of them, and check if the outcome is positive. If any of the tests fails, it will print a message informing you about the test failure, and it will print the test case (that's why there's a Show requirement, too).
Now, in our situation this means that we should be able to quick-check prop_2: it is a function that results in a Property. The important thing is that the function argument (of type [a], for some a with Eq a) must be a member of class Arbitrary and a member of class Show.
Here we arrive at the source of the error. The information available is not sufficient to make that conclusion. As I said in the beginning, prop_2 works for lists of any element type that admits equality. There's no built-in rule that says that all these types are in Arbitrary and Show. But even if there were, what kind of lists should QuickCheck generate? Should it generate lists of Booleans, lists of unit type, lists of functions, lists of characters, lists of integers? So many options, and the choice of element type may well affect whether you find a bug or not. (Consider that GHC would pick the unit type () with just one element. Then your property would hold for any two functions p and q that preserve the input lists' length, regardless of whether they have your desired swapping property or not.)
This is why you need to provide extra type information to GHC so that it can resolve which element type to use for the list. Doing so is simple enough. You can either annotate prop_2 itself:
prop_2 :: [Integer] -> Property
Or, if you don't want that (because you might want to run the tests on different kinds of lists without reimplementing prop_2), you can add a type annotation when calling quickCheck:
check2 = quickCheck (prop_2 :: [Integer] -> Property)
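For example, given the imports and definitions above, checking the same property at a different element type is then just a different annotation (check2' is an illustrative name):
check2' = quickCheck (prop_2 :: [Char] -> Property)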
Now the code compiles, and we can run check2:
*Main> check2
+++ OK, passed 100 tests.

`Ord a =>` or `Num a =>`

I have the following functions:
which (x:xs) = worker x xs

worker x [] = x
worker x (y:ys)
  | x > y     = worker y ys
  | otherwise = worker x ys
and am wondering how I should define the type signatures of the functions which and worker above.
For example, which of the following would be best as a type signature for worker?
worker :: Num a => a -> [a] -> a,
or
worker :: Ord a => a -> [a] -> a?
I'm just really confused and don't get which of these I should choose. I'd appreciate your thoughts. Thanks.
If you define the function without an explicit type signature, Haskell will infer the most general one. If you’re unsure, this is the easiest way to figure out how your definition will be read; you can then copy it into your source code. A common mistake is incorrectly typing a function and then getting a confusing type error somewhere else.
Anyway, you can get info on the Num class by typing :i Num into ghci, or by reading the documentation. The Num class gives you +, *, -, negate, abs, signum and fromInteger (and, in older versions of the base library, it also required Eq and Show). Notice that < and > aren't there! Requiring Num values and attempting to compare them will in fact produce a type error: not every kind of number can be compared.
So it should be Ord a => ..., as Num a => ... would produce a type error if you tried it.
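A minimal illustration of that failure (the function name is made up):
tooBig :: Num a => a -> a -> Bool
tooBig x y = x > y   -- rejected by GHC: it cannot deduce (Ord a) from the context (Num a)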
If you think about what your functions do, you'll see that which xs returns the minimum value in xs. What can have a minimum value? A list of something Orderable!
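A quick sanity check of that reading, assuming the definitions above are loaded in ghci:
ghci> which [3, 1, 2 :: Int]
1
ghci> minimum [3, 1, 2 :: Int]
1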
Ask ghci and see what it says. I just copy-pasted your code as-is into a file and loaded it into ghci. Then I used :t, a special ghci command that determines the type of something.
ghci> :t which
which :: (Ord t) => [t] -> t
ghci> :t worker
worker :: (Ord a) => a -> [a] -> a
Haskell's type inference is pretty smart in most cases; learn to trust it. Other answers sufficiently cover why Ord should be used in this case; I just wanted to make sure ghci was clearly mentioned as a technique for determining the type of something.
I would always go with the Ord type constraint. It is the most general, so it can be reused more often.
There is no advantage to using Num over Ord.
Int may have a small advantage as it is not polymorphic and would not require a dictionary lookup. I would still use Ord and use the SPECIALIZE pragma if I needed to for performance.
Edit: Altered my answer after comments.
It depends on what you want to be able to compare. If you want to be able to compare Double, Float, Int, Integer and Char, then use Ord. If you only want to be able to compare Int, then just use Int.
If you have another problem like this, just look at the instances of the type class to tell which types you want to be able to use in the function.
Ord documentation