List understanding and recursive if statements in Haskell

I'm currently practicing Haskell in a number of ways,
using lists to create a ton of fun things.
Now I think I'm having problems with understanding if statements.
What I want to do is make a focuslist that shifts the focus to the leftmost item.
Focuslists are already quite tricky, essentially being two independent lists:
the whole list is split in two at the focus, and the back part is stored reversed.
For instance, if you want to make a focuslist of [0,1,2,3,4,5] with the focus on 3, the focuslist will be [3,4,5][2,1,0].
I've already made three specific functions.
One that makes the focuslist datatype:
data FocusList a = FocusList { forward :: [a], backward :: [a]}
which you can construct from a list by using
fromList :: [a] -> FocusList a
fromList list = FocusList list []
One that changes it back to a list from a focuslist:
toList :: FocusList a -> [a]
toList (FocusList fw bw) = reverse bw ++ fw
and one that shifts the focus to the left once: starting from [3,4,5][2,1,0], the underlying list is still [0,1,2,3,4,5], but the focuslist now looks like [2,3,4,5][1,0]:
goLeft :: FocusList a -> FocusList a
goLeft (FocusList fw (f:bw)) = FocusList (f:fw) bw
Now, to the main point: I want to shift the focus all the way to the left. I want to use goLeft until the length of the list is 1. I was thinking of using a recursive if statement that keeps calling goLeft until the length of the first list equals one.
So I thought of a simple if statement, which (for now) doesn't work at all:
leftMost :: FocusList a -> FocusList a
leftMost (FocusList fw (f:bw)) = if (length (FocusList fw) == 1)
    then FocusList (f:fw) bw
    return leftMost
    else FocusList fw (f:bw)
I was thinking of it the pythonic way, which doesn't seem to work. How do I make it logically recursive?

Don't use length: it costs O(N), since it has to scan the whole list. If you use pattern matching instead, in this case you only pay an O(1) cost.
A simple approach is to use two equations, which are tried one after the other.
leftMost :: FocusList a -> FocusList a
-- if there's only one backward item, leave the list as it is
leftMost (FocusList fw [f]) = FocusList fw [f]
-- otherwise, goLeft, then recurse
leftMost fl = leftMost (goLeft fl)
The pattern [f] only matches lists with a single item. It's equivalent to (f:[]).
Note that the code above will crash if the backward part is empty. If that's an issue, you'll need to handle that by adding more equations.
Alternatively, one can shorten the code using an as-pattern:
leftMost :: FocusList a -> FocusList a
leftMost fl@(FocusList fw [f]) = fl -- fl is the whole input
leftMost fl = leftMost (goLeft fl)
We can also inline goLeft, if we wish:
leftMost :: FocusList a -> FocusList a
leftMost fl@(FocusList fw [f]) = fl
leftMost (FocusList fw (f:bw)) = leftMost (FocusList (f:fw) bw)
Or even handle the empty-backward-list issue mentioned above:
leftMost :: FocusList a -> FocusList a
leftMost fl@(FocusList fw [f]) = fl
leftMost (FocusList fw (f:bw)) = leftMost (FocusList (f:fw) bw)
leftMost (FocusList (f:fw) []) = FocusList fw [f]
leftMost (FocusList [] []) = error "leftMost: empty list"
The last case, referring to an empty list, is hard to handle in a sensible way. You can choose to crash with an error message (as done above), return the empty list (is that the intended result?), or report the error to the caller by returning a Maybe (FocusList a) instead.
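As a rough sketch of the last option, the variant below reports the empty case with Nothing instead of crashing. The name leftMostMaybe and the derived Show and Eq instances are assumptions added for illustration; the remaining equations follow the ones shown above.

```haskell
data FocusList a = FocusList { forward :: [a], backward :: [a] }
  deriving (Show, Eq) -- instances added only so results can be printed and compared

-- Hypothetical variant of leftMost: Nothing signals an empty focus list.
leftMostMaybe :: FocusList a -> Maybe (FocusList a)
leftMostMaybe (FocusList [] [])     = Nothing
-- exactly one backward item: already in the stopping position
leftMostMaybe fl@(FocusList _ [_])  = Just fl
-- two or more backward items: move one to the front and recurse
leftMostMaybe (FocusList fw (f:bw)) = leftMostMaybe (FocusList (f:fw) bw)
-- empty backward part: move the first forward item to the back
leftMostMaybe (FocusList (f:fw) []) = Just (FocusList fw [f])
```

The caller then has to pattern match on the result, which makes the empty case visible in the types rather than at run time.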

How to use foldr to add variables to each other in a list?

When given a list [x0, x1, x2, . . . , xn−1], the function
should return the list [y0, y1, y2, . . . , yn−1] where y0 = x0, y1 = x0 + x1, ...
So if you had [1,2,3] as input, you would get [1,3,6] as output
I don't completely understand foldr, so I'd appreciate some help figuring out how to change that last line to get the right answer.
scan :: [Integer] -> [Integer]
scan [] = []
scan [x] = [x]
scan (x:xs) = x : foldr (/y -> y (+) x) 0 (scan xs)
My initial solution (that works) uses the map function.
scan :: [Integer] -> [Integer]
scan [] = []
scan [x] = [x]
scan (x:xs) = x : map (+x) (scan xs)
EDIT, I added this first section to better address your two implementations.
First, addressing your issue with your implementation using foldr, here are a few remarks:
Lambdas start with a backslash in Haskell, not a slash. That's because backslashes kind of look like the lambda greek letter (λ).
Functions named using only special characters, like +, are infix by default. If you use parens around them, it turns them into prefix functions:
$> (+) 1 5
$> 6
The function passed to foldr takes two arguments, whereas you're only supplying one in your lambda. If you really want to ignore the second one, you can use a _ instead of binding it to a variable (\x _ -> x).
I think you're going down a rabbit hole with this implementation. See the discussion below for my take on the right way to tackle this issue.
Note: It is possible to implement map using foldr, which is one way you could use foldr in your working (second) implementation.
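A minimal sketch of that idea, assuming the made-up names mapViaFoldr and scanViaFoldr: map is written as a right fold, and the running sums can then be expressed as a single foldr. Like the map-based version in the question, this takes quadratic time in the length of the list.

```haskell
-- map expressed as a right fold
mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr f = foldr (\x acc -> f x : acc) []

-- Running sums as one foldr: acc holds the running sums of the rest of the
-- list, so adding x to each of them and putting x in front extends the result.
scanViaFoldr :: Num a => [a] -> [a]
scanViaFoldr = foldr (\x acc -> x : mapViaFoldr (+ x) acc) []
```

For the input [1,2,3] this produces [1,3,6], the same result as the working map-based scan.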
Implementing this with foldr is not optimal, since it folds, as the name implies, from the right:
foldr1 (+) [1..5]
--is equivalent to:
(1+(2+(3+(4+5))))
As you can see, the summing operation is done starting from the tail of the list, which is not what you're looking for. To make this work, you would have to "cheat", and reverse your list twice, once before folding it and once after:
scan = tail . reverse . foldr step [0] . reverse where
    step e acc@(a:_) = (e + a) : acc
You can make this better using a left fold, which folds from the left:
foldl1 (+) [1..5]
--is equivalent to:
((((1+2)+3)+4)+5)
This, however, still isn't ideal, because to keep the order of elements in your accumulator the same, you would have to use the ++ function, which amounts to quadratic time complexity in such a function. A compromise is to use the : function, but then you still have to reverse your accumulator list after the fold, which is only linear complexity:
scan' :: [Integer] -> [Integer]
scan' = tail . reverse . foldl step [0] where
    step acc@(a:_) e = (e + a) : acc
This still isn't very good, since the reverse adds an extra computation. The ideal solution would therefore be to use scanl1, which, as a bonus, doesn't require you to give a starting value ([0] in the examples above):
scan'' :: [Integer] -> [Integer]
scan'' = scanl1 (+)
scanl1 is implemented in terms of scanl, which is defined roughly like this:
scanl f init list = init : (case list of
    [] -> []
    x:xs -> scanl f (f init x) xs)
You can therefore simply do:
$> scanl1 (+) [1..3]
$> [1,3,6]
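For reference, the relationship between the two functions can be sketched as follows: scanl1 uses the first element of the list as the starting value and hands the rest to scanl. The name myScanl1 is chosen here only to avoid clashing with the Prelude function; the actual definition in base may differ in details.

```haskell
-- A sketch of scanl1 in terms of scanl (name is hypothetical).
myScanl1 :: (a -> a -> a) -> [a] -> [a]
myScanl1 _ []     = []              -- nothing to accumulate
myScanl1 f (x:xs) = scanl f x xs    -- the head is the initial accumulator
```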
As a final note, your scan function is unnecessarily specialized to Integer, as it only requires a Num constraint:
scan :: Num a => [a] -> [a]
This might even lead to an increase in performance, but that's where my abilities end, so I won't go any further :)

Check if a list of lists has two or more identical elements

I need to write a function which checks if a list has two or more same elements and returns true or false.
For example [3,3,6,1] should return true, but [3,8] should return false.
Here is my code:
identical :: [Int] -> Bool
identical x = (\n-> filter (>= 2) n )( group x )
I know this is bad, and it does not work.
I wanted to group the list into a list of lists, and if the length of any of those lists is >= 2, then it should return true, otherwise false.
Use any to get a Bool result.
any ( . . . ) ( group x )
Don’t forget to sort the list, group works on consecutive elements.
any ( . . . ) ( group ( sort x ) )
You can use (not . null . tail) for a predicate, as one of the options.
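Putting these hints together gives something like the sketch below. The signature is generalised from [Int] to any Ord type, and the helper name hasAtLeastTwo is made up for this example; drop 1 is used in place of tail only to avoid a partial function, since group never produces empty groups anyway.

```haskell
import Data.List (group, sort)

-- True when some value occurs at least twice in the list.
identical :: Ord a => [a] -> Bool
identical = any hasAtLeastTwo . group . sort
  where
    -- same idea as (not . null . tail), written with the total drop 1
    hasAtLeastTwo = not . null . drop 1
```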
Just yesterday I posted a similar algorithm here. A possible way to go about it is,
generate the sequence of cumulative sets of elements
{}, {x0}, {x0,x1}, {x0,x1,x2} ...
pair the original sequence of elements with the cumulative sets
x0, x1 , x2 , x3 ...
{}, {x0}, {x0,x1}, {x0,x1,x2} ...
check repeated insertions, i.e.
xi such that xi ∈ {x0..xi-1}
This can be implemented for instance, via the functions below.
First we use scanl to iteratively add the elements of the list to a set, producing the cumulative sequence of these iterations.
sets :: [Int] -> [Set Int]
sets = scanl (\s x -> insert x s) empty
Then we zip the original list with this sequence, so each xi is paired with {x0...xi-1}.
elsets :: [Int] -> [(Int, Set Int)]
elsets xs = zip xs (sets xs)
Finally we use find to search for an element that is "about to be inserted" in a set which already contains it. The function find returns the pair element / set, and we pattern match to keep only the element, and return it.
result :: [Int] -> Maybe Int
result xs = do (x,_) <- find (\(y,s) -> y `elem` s) (elsets xs)
               return x
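For convenience, the three pieces can be combined into one self-contained function. This is a sketch under a few assumptions: the name firstRepeat is made up, Data.Set is imported qualified, and Set.member is used for the membership test because it is logarithmic, whereas elem on a Set goes through Foldable and is linear.

```haskell
import Data.List (find)
import qualified Data.Set as Set

-- Hypothetical helper combining sets, elsets and result into one function.
firstRepeat :: Ord a => [a] -> Maybe a
firstRepeat xs = fst <$> find alreadySeen (zip xs seenSets)
  where
    -- the set at position i holds the elements strictly before position i
    seenSets = scanl (flip Set.insert) Set.empty xs
    alreadySeen (y, seen) = y `Set.member` seen
```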
Another way, using Data.Map as below, is no more efficient than the group . sort solution (it is still O(n log n)), but it is able to work with infinite lists.
import Data.Map.Lazy as Map (empty, lookup, insert)
identical :: [Int] -> Bool
identical = loop Map.empty
  where loop _ [] = False
        loop m (x:xs) = if Map.lookup x m == Nothing
                        then loop (insert x 0 m) xs
                        else True
OK, basically this is one of the rare cases where you really need sort for efficiency. In fact the Data.List.Unique module (from the Unique package) has a repeated function just for this job, and if you check the source you can see that the sort and group strategy is chosen. I guess this is not the most efficient algorithm. I will come to how we can make sort even more efficient, but for the time being let's enjoy a little, since this is a nice question.
So we have the tails :: [a] -> [[a]] function in the Data.List module. Accordingly:
*Main> tails [3,3,6,1]
[[3,3,6,1],[3,6,1],[6,1],[1],[]]
As you may quickly notice, we can zipWith the tail of the tails list, which is [[3,6,1],[6,1],[1],[]], with the original list, applying a function that checks whether all items are different. This function could be a list comprehension or simply the all :: Foldable t => (a -> Bool) -> t a -> Bool function. The thing is, I would like to short circuit zipWith, so that once I meet the first dupe it stops doing wasteful work checking the rest. For this purpose I can use the monadic version of zipWith, namely zipWithM :: Applicative m => (a -> b -> m c) -> [a] -> [b] -> m [c], which lives in the Control.Monad module. The reason is that, from its type signature, we understand that it will stop calculating any further when it encounters a Nothing or a Left in the middle, if my monad happens to be Maybe or Either.
Oh..! In Haskell I also love to use the bool :: a -> a -> Bool -> a function instead of if and then. bool is the ternary operation of Haskell which goes like
bool "work time" "coffee break" isCoffeeTime
The negative choice is on the left and the positive one is on the right, where isCoffeeTime :: Bool is a value that is True if it is coffee time. Very composable as well.. so cool..!
So since we now have all the background knowledge we may proceed with the code
import Control.Monad (zipWithM)
import Data.List (tails)
import Data.Bool (bool)
anyDupe :: Eq a => [a] -> Either a [a]
anyDupe xs = zipWithM f xs ts
  where ts = tail $ tails xs
        f  = \x t -> bool (Left x) (Right x) $ all (x /=) t
*Main> anyDupe [1,2,3,4,5]
Right [1,2,3,4,5] -- no dupes so we get the `Right` with the original list
*Main> anyDupe [3,3,6,1]
Left 3 -- here we have the first duplicate since zipWithM short circuits.
*Main> anyDupe $ 10^7:[1..10^7]
Left 10000000 -- wow zipWithM worked and returned reasonably fast.
But again.. as i said, this is still a naive approach because theoretically we are doing n(n+1)/2 operations. Yes zipWithM cuts redundancy down greatly if the first met dupe is close to the head but still this algorithm is O(n^2).
I believe it would be best to use the heavenly sort algorithm of Haskell (which is not merge sort as we know it by the way) in this particular case.
Now the algorithm award goes to -> drum roll here -> sort and fold -> applause. Sorry no grouping.
So now, once again, we will use a monadic trick to get short-circuiting. We will use foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b. When used with the Either monad, this also allows us to return a more meaningful result. OK, let's do it. Any Left n means n is the first dupe and no more calculations are done, while any Right _ means there are no dupes.
import Control.Monad (foldM)
import Data.List (sort)
import Data.Bool (bool)
anyDupe' :: (Eq a, Ord a, Enum a) => [a] -> Either a a
anyDupe' xs = foldM f i $ sort xs
  where i = succ $ head xs -- prevent the initial value from being equal to the value at the head
        f = \b a -> bool (Left a) (Right a) (a /= b)
*Main> anyDupe' [1,2,3,4,5]
Right 5
*Main> anyDupe' [3,3,6,1]
Left 3
*Main> anyDupe' $ 1:[10^7,(10^7-1)..1]
Left 1
(2.97 secs, 1,040,110,448 bytes)
*Main> anyDupe $ 1:[10^7,(10^7-1)..1]
Left 1
(2.94 secs, 1,440,112,888 bytes)
*Main> anyDupe' $ [1..10^7]++[10^7]
Left 10000000
(5.71 secs, 3,600,116,808 bytes) -- winner by far
*Main> anyDupe $ [1..10^7]++[10^7] -- don't try at home, it's waste of energy
In real world scenarios anyDupe' should always be the winner.

Recursively defining a list of monadic random numbers: most idiomatic Haskell and analogous to pure code

I am trying to recursively make a list of random numbers that uses the previous value to get the next (so recursion is required instead of map or fold, and also I prefer to make it explicit unless map/foldr makes it ridiculously simple in comparison).
Using a pure PRNG this is very straightforward and idiomatic, in my opinion (puregaussian uses System.Random to generate a normal variate and has type puregaussian :: System.Random.RandomGen t => t -> Double -> Double -> (Double, t)).
purecurse :: System.Random.RandomGen t => t -> Double -> [Double] -> [Double]
purecurse gen current [] = []
purecurse gen current (x:xs) = let (rand, gen2) = puregaussian gen 0 1
                                   next = current + rand
                               in current:purecurse gen2 next xs
Unfortunately, pure PRNGs don't seem to be as well developed in Haskell as the monadic ones, so I want to do the same thing using a library like random-fu or mwc-probability, and the solutions I found to work are either unidiomatic, not as concise, or both.
Here's a solution using do notation that works, and why I'm not satisfied with it:
import Control.Monad.Primitive
import System.Random.MWC.Probability
recurse :: PrimMonad m => Gen (PrimState m) -> [Double] -> [Double] -> m [Double]
recurse gen history@(current:_) [] = return history
recurse gen history@(current:_) (x:xs) = do
    rand <- (sample (normal 0 1) gen)
    let next = current + rand
    recurse gen (next:history) xs
First of all I would rather use >>= than do notation, but I couldn't find a way of binding the rand variable that has type m Double and then lifting it to get m [Double] at the end case. There doesn't seem to be a lot of documentation (that I could find) or examples on how to do something like that.
I thought maybe it would be necessary to nest the (>>=) operators, but that could make the function extremely complicated or unreadable. If that is the tradeoff, maybe do notation is just cleaner, but I didn't manage to make even that work and would like to know how to.
Second, the function requires the entire list to be passed on at each call, and gives the list back in reverse (and just switching next and history breaks it).
So. I would like to be able to pass the initial state and a list to recurse over that returns a monadic list of values.
The main question I would like help with is: is there a Haskell idiomatic way of writing such a recursion of monadic values resulting in a monadic list that is similar to the structure of a pure function?
You can do it in two steps. Have your recursive function return a list of "monadic actions", then compose / sequence those actions.
Let's consider a simpler but analogous function to yours, for ease of presentation. Instead of randomness, let's consider input. The list you recurse over is there for size only (its content is ignored), so let's just use an integer.
rc :: Int -> [Double] -> IO [Double]
rc 0 h = return h
rc n h@(cr:_) = do rand <- readLn :: IO Double
                   let nx = cr + rand
                   rc (n-1) (nx:h)
Here is a similar alternative that works the way you want:
rc' :: Int -> Double -> IO [Double]
rc' 0 cr = return []
rc' n cr = do rand <- readLn :: IO Double
              let nx = cr + rand
              xs <- rc' (n-1) nx
              return (nx : xs)
And here without do notation
rc'' :: Int -> Double -> IO [Double]
rc'' 0 cr = return []
rc'' n cr = (readLn :: IO Double) >>= (\rand ->
    let nx = cr + rand
    in (rc'' (n-1) nx) >>= (\xs ->
        return (nx : xs)))
In any case, another thing you can do is abstract away pieces of code, rather than have a monolithic presentation.
In each step you require the current value to generate a new one. So a step is a function of type Double -> IO Double. And this is a pretty neat type, fundamental in the world of monads. You can bind values to a step via x >>= step or compose two steps with step1 >=> step2. So let's go with it.
step :: Double -> IO Double
step cr = do rand <- readLn :: IO Double
             return (cr + rand)
It's very easy to understand. You 'generate' a number, add the current one and return the result. And you want to do n such steps, so make a list of steps.
steps :: Int -> [Double -> IO Double]
steps n = replicate n step
Now you can choose how to combine them. For instance, it would be very natural to fold a list of steps with >=> (exported by Control.Monad). You would get this:
runSteps :: Int -> Double -> IO Double
runSteps n = foldr (>=>) return (steps n)
It's close to what you want but only returns the final result, rather than accumulate the generated values at each step. Below is a (restricted) type of (>=>) and the type of the operator (*=>) we want.
(>=>) :: Monad m => (a -> m a) -> (a -> m a) -> a -> m a
(*=>) :: Monad m => (a -> m a) -> (a -> m [a]) -> a -> m [a]
The definition is,
(*=>) :: Monad m => (a -> m a) -> (a -> m [a]) -> a -> m [a]
(*=>) ac uc c = do x <- ac c
                   xs <- uc x
                   return (x:xs)
I actually think this encapsulates the bit you didn't particularly like. Now we abstracted it away to this isolated piece of code. Even away from the recursive calls. And finally we just fold to execute the steps.
execSteps :: Int -> Double -> IO [Double]
execSteps n = foldr (*=>) (\x -> return []) (steps n)
This function differs from the original one in the initial input being a Double rather than a [Double]. But this is the type that makes sense. You'd just be passing a single wrapped double in the original function. And it accumulates the elements in the 'right' order as you requested.
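Since nothing in (*=>) depends on IO, the same fold can be tried out in any monad. The sketch below is an illustration only: execStepsWith and demo are made-up names, the step is passed in as an argument instead of reading input, and the Identity monad stands in for IO so that the result is deterministic.

```haskell
import Data.Functor.Identity (Identity (..))

-- Run the first step, run the remaining steps from its result,
-- and collect every intermediate value in order.
(*=>) :: Monad m => (a -> m a) -> (a -> m [a]) -> a -> m [a]
(*=>) ac uc c = do
  x  <- ac c
  xs <- uc x
  return (x : xs)

-- Hypothetical generalisation of execSteps: the step is a parameter.
execStepsWith :: Monad m => (a -> m a) -> Int -> a -> m [a]
execStepsWith step n = foldr (*=>) (\_ -> return []) (replicate n step)

-- Deterministic stand-in for the input step: each step adds 1.
demo :: [Int]
demo = runIdentity (execStepsWith (Identity . (+ 1)) 3 0)
```

Here demo evaluates to [1,2,3]: every step sees the value produced by the previous one, and the values come out in the order they were generated.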
is there a Haskell idiomatic way of writing such a recursion of
monadic values resulting in a monadic list that is similar to the
structure of a pure function
Usually, when you need to apply a pure function to monadic values, Applicative operators such as <$> and <*> may be helpful.
In particular, a list is often built by applying the (:) operator recursively, like
f [] = []
f (x:xs) = x : f xs
in prefix way:
(:) x (f xs)
However, (:) is a pure function and does not accept monadic values by default. The good news is that every data type that is an instance of Monad is also an instance of Applicative. With the help of the Applicative operators mentioned above, a pure function can be applied to monadic values without any change. For example,
(:) <$> (pure x) <*> (pure . f) xs
will return a monadic list instead of a pure list.
Returning to your question: personally, I think your solution is already almost an idiomatic way to do this (since it is simple and readable), except that it always prepends the next random value to the head of history.
As you said, the list comes back in reverse and, worse, when the history list already contains old random values, it is inconvenient to find out which ones are newly added.
To solve it, it can be modified slightly as:
recurse :: PrimMonad m => Gen (PrimState m) -> [Double] -> [Double] -> m [Double]
recurse gen history [] = return history
recurse gen history (x:xs) = do rand <- (sample (normal 0 1) gen)
                                let next = (last history) + rand
                                recurse gen (history ++ [next]) xs
This makes sense if the last element of history is the newest random value.
However, the difference between (:) and (++) is that (:) is O(1), whereas (++) is O(N), where N is the length of the history list (and last history is also O(N) instead of O(1)).
To achieve an efficient solution, a helper function can be introduced, say newHistory, that constructs only the list of new random values. The next value has to be bound once (here with do notation) before it is used both as a list element and as the starting point of the recursion; if it were passed along as an action of type m Double, the sampling action would be re-run at every use and the elements would no longer build on each other:
newHistory :: PrimMonad m => Gen (PrimState m) -> Double -> [Double] -> m [Double]
newHistory _ _ [] = return []
newHistory gen current (x:xs) = do
    next <- (+) current <$> sample (normal 0 1) gen
    (:) next <$> newHistory gen next xs
With the help of the Functor and Applicative operators, the syntax still looks like the pure function, except that functions are applied in prefix form.
The new values are then appended to the original history list:
(++) history <$> newHistory gen (last history) xs
And the resulting version of the recurse function looks like:
recurse2 :: PrimMonad m => Gen (PrimState m) -> [Double] -> [Double] -> m [Double]
recurse2 gen history xs =
    (++) history <$> newHistory gen (last history) xs
  where
    newHistory :: PrimMonad m => Gen (PrimState m) -> Double -> [Double] -> m [Double]
    newHistory _ _ [] = return []
    newHistory gen current (x:xs) = do
        next <- (+) current <$> sample (normal 0 1) gen
        (:) next <$> newHistory gen next xs
In situations like this, I usually jump straight to using a streaming library with a suitably list-like interface, like streaming. They allow a more natural translation from pure code to monadic, and have the added benefit that you aren't required to construct/consume all of the results at once, but instead incrementally, just as with pure lists.
I'm not sure what purecurse is doing, but it could be written as
import Streaming
import qualified Streaming.Prelude as S
recurse :: PrimMonad m
        => Gen (PrimState m)
        -> Double
        -> [Double]
        -> Stream (Of Double) m ()
recurse gen current [] =
    return ()
recurse gen current (x:xs) =
    S.yield current *> -- (*>) and (>>) work like concatenation for pure lists
    lift (sample (normal 0 1) gen) >>= \rand ->
    recurse gen (current + rand) xs
Or, more naturally using do-notation, as:
recurse :: PrimMonad m
        => Gen (PrimState m)
        -> Double
        -> [Double]
        -> Stream (Of Double) m ()
recurse gen current [] =
    return ()
recurse gen current (x:xs) =
    do S.yield current -- (*>) and (>>) work like concatenation for pure lists
       rand <- lift $ sample (normal 0 1) gen
       recurse gen (current + rand) xs
Now you can use functions like S.take to generate/extract only parts of the result. If you want to get the whole list, you can use S.toList_.
Your issue seems to lie with do-notation and monads. You're assuming there's much more magic going on than there actually is: learning how the desugaring works will help you out here.
Anyway, let's try and convert the non-monadic version into the monadic one step-by-step. First, the type signature:
recurse :: PrimMonad m => Gen (PrimState m) -> Double -> [Double] -> m [Double]
I'm not sure why you had [Double] as the second parameter in your version: we want to change as little as possible from the original. The first clause, then:
purecurse gen current [] = []
-- Goes to:
recurse gen current [] = return []
Again, we're changing as little as possible: no effects were happening in this clause in your pure code, so no effects should be happening here, either. You got the next two lines right:
purecurse gen current (x:xs) = let (rand, gen2) = puregaussian gen 0 1
                                   next = current + rand
-- Goes to:
recurse gen current (x:xs) = do rand <- (sample (normal 0 1) gen)
                                let next = current + rand
But the last one tripped you up. Ideally, we would write:
in current:purecurse gen2 next xs
-- Goes to:
current:recurse gen next xs
But it doesn't work! What's more, you get a confusing error:
• Couldn't match type ‘Double’ with ‘[Double]’
Expected type: m [Double]
Actual type: [Double]
This is probably what led you down the wrong path. The issue has nothing to do with the lists: it's to do with the m (the encapsulating monad). When you write current : xs, xs has to be a list: in this example, it's actually an m [Double], or a list wrapped in the monad. There are two ways to solve the problem (which are equivalent). We could unwrap the list, using do notation again:
rest <- recurse gen next xs
return (current : rest)
Or we could lift the function current : to work inside the monad:
fmap (current:) (recurse gen next xs)
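Assembled into complete functions, the two endings look roughly like the sketch below. To keep it runnable without mwc-probability, the sampling action is passed in as a parameter: draw is a hypothetical stand-in for sample (normal 0 1) gen, and the names recurseWith and recurseWith' are made up for this example.

```haskell
-- Ending 1: unwrap the recursive result with do notation, then cons.
-- draw is a hypothetical stand-in for an action such as sample (normal 0 1) gen.
recurseWith :: Monad m => m Double -> Double -> [b] -> m [Double]
recurseWith _    _       []     = return []
recurseWith draw current (_:xs) = do
  rand <- draw
  rest <- recurseWith draw (current + rand) xs
  return (current : rest)

-- Ending 2: lift (current :) into the monad with fmap.
recurseWith' :: Monad m => m Double -> Double -> [b] -> m [Double]
recurseWith' _    _       []     = return []
recurseWith' draw current (_:xs) =
  draw >>= \rand -> fmap (current :) (recurseWith' draw (current + rand) xs)
```

Both versions return the values in generation order and only ever carry a single current value, mirroring the shape of the pure function.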

To leave in the list items whose values coincide with the numbers of their positions in the list

I need to change a list, for example:
[1,2,4,6,5,10]
To this one
[1,2,5] (the list of elements that are in the correct position).
1st element value is 1 - ok,
2nd element value is 2 - ok,
3rd element value is 4 but expected 3 (due to the index) - remove,
and so on. How can I solve the error shown below?
My code:
module Count where
import Control.Monad.State
nthel n xs = last xsxs
  where xsxs = take n xs
deleteNth i items = take i items ++ drop (1 + i) items
repeatNTimes 0 _ = return ()
repeatNTimes n xs =
  do
    if (n == nthel n xs) then return()
    else deleteNth (n-1) xs
    repeatNTimes (n-1) xs
list = [1,2,3,4,5]
main = repeatNTimes (length list) list
I have the following error:
* Couldn't match type `Int' with `()'
Expected type: [()]
Actual type: [Int]
* In the expression: deleteNth (n - 2) xs
In a stmt of a 'do' block:
if (n == nthel n xs) then return () else deleteNth (n - 2) xs
In the expression:
do { if (n == nthel n xs) then return () else deleteNth (n - 2) xs;
repeatNTimes (n - 1) xs }
A really nice way to work with this is to stitch functions together. First one might need to get to know the functions in the Data.List module, which you can find with hoogle: http://hoogle.haskell.org
Data.List Module functions
I'll give you a little bit of a boost here. The functions I would pick out are:
the zip function, https://hackage.haskell.org/package/base-4.9.1.0/docs/Data-List.html#v:zip, whose type is [a] -> [b] -> [(a, b)]
the filter function, https://hackage.haskell.org/package/base-4.9.1.0/docs/Prelude.html#v:filter, whose type is (a -> Bool) -> [a] -> [a]
the map function, whose type is (a -> b) -> [a] -> [b]
the fst function, whose type is (a, b) -> a
Function Composition
These functions can be stitched together using the function composition operator: (.) :: (b -> c) -> (a -> b) -> a -> c it takes two functions that share a common input/output point (in the type signature they are the second and first parameters, respectively; a -> b and b -> c) and it will then join them into one single function.
Stacking it up - required knowledge
In order to do what you want to do, you really need to know about simple types, parameterised types, ranges (including lazy infinite ranges would help), functions and possibly recursion as well some higher order functions and how Haskell functions are curried, and to understand function composition. It wouldn't hurt to add a basic understanding of what typeclasses do and are into the mix.
I helped author a tutorial which can really help with understanding how this stuff works from a usage point of view by following a series of interesting examples. It's not too long, and you might find it much easier to approach your problem once you have understood some of the more foundational stuff: http://happylearnhaskelltutorial.com — note that it's not tuned to teaching you how to construct stuff, that'll be coming in a later volume, but it should give you enough understanding to be able to at least guess at an answer, or understand the one below.
The Answer - spoilers
If you want to work this out yourself, you should stop here and come back later on when you're feeling more confident. However, I'm going to put one possible answer just below, so don't look if you don't want to know yet!
positionals :: (Enum a, Eq a, Num a) => [a] -> [a]
positionals = map fst . filter (\(x, y) -> x == y) . zip [1..]
Keep in mind this is only one way of doing this. There are simpler, more explanatory ways to do it, and while it might seem inefficient, GHC's list fusion can often compile a pipeline like this into something that does a single pass across your data.
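One of those more explanatory variants, sketched here with a list comprehension and specialised to Int for simplicity (the name positionals' is made up for this example): pair every element with its 1-based position and keep the elements that equal their position.

```haskell
-- Keep the elements whose value equals their 1-based position.
positionals' :: [Int] -> [Int]
positionals' xs = [x | (i, x) <- zip [1 ..] xs, i == x]
```

For the list from the question, positionals' [1,2,4,6,5,10] gives [1,2,5].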

Converting a hierarchical data structure to a flat one in Haskell

I'm extracting some data from a text document organized like this:
- "day 1"
    - "Person 1"
        - "Bill 1"
    - "Person 2"
        - "Bill 2"
I can read this into a list of tuples that looks like this:
[(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Where the first item of each tuple indicates the heading level, and the second item the information associated with each heading.
My question is, how can I get a list of items that looks like this:
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
I.e. one list per deepest nested item, containing all the information from the headings above it.
The closest I've gotten is this:
f [] = []
f (x:xs) = row:f rest where
    leaves = takeWhile (\i -> fst i > fst x) xs
    rest = dropWhile (\i -> fst i > fst x) xs
    row = concat $ map (\i -> (snd x):[snd i]) leaves
Which gives me this:
[[["day 1"],["Intro 1"],["day 1"],["Bill 1"],["day 1"],["Intro 2"],["day 1"],["Bill 2"]]]
I'd like the solution to work for any number of levels.
P.s. I'm new to Haskell. I have a sense that I could/should use a tree to store the data, but I can't wrap my head around it. I also could not think of a better title.
Trees
You were right that you should probably use a tree to store the data. I'll copy how Data.Tree does it:
data Tree a = Node a (Forest a) deriving (Show)
type Forest a = [Tree a]
Building the Tree
Now we want to take your weakly typed list of tuples and convert it to a (slightly) stronger Tree of Strings. Any time you need to convert a weakly typed value and validate it before converting to a stronger type, you use a Parser:
type YourData = [(Int, [String])]
type Parser a = YourData -> Maybe (a, YourData)
The YourData type synonym represents the weak type that you are parsing. The a type variable is the value you are retrieving from the parse. Our Parser type returns a Maybe because the Parser might fail. To see why, the following input does not correspond to a valid Tree, since it is missing level 1 of the tree:
[(0, ["val1"]), (2, ["val2"])]
If the Parser does succeed, it also returns the unconsumed input so that subsequent parsing stages can use it.
Now, curiously enough, the above Parser type exactly matches a well known monad transformer stack:
StateT s Maybe a
You can see this if you expand out the underlying implementation of StateT:
StateT s Maybe a ~ s -> Maybe (a, s)
This means we can just define:
import Control.Monad.Trans.State.Strict
type Parser a = StateT [(Int, [String])] Maybe a
If we do this, we get a Monad, Applicative and Alternative instance for our Parser type for free. This makes it very easy to define parsers!
First, we must define a primitive parser that consumes a single node of the tree:
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
    []                  -> Nothing
    (level', strs):rest -> case strs of
        [str] ->
            if (level' == level)
            then Just (str, rest)
            else Nothing
        _     -> Nothing
This is the only non-trivial piece of code we have to write, which, because it is total, handles all the following corner cases:
The list is empty
Your node has multiple values in it
The number in the tuple doesn't match the expected depth
The next part is where things get really elegant. We can then define two mutually recursive parsers, one for parsing a Tree, and the other for parsing a Forest:
import Control.Applicative
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
The first parser uses Applicative style, since StateT gave us an Applicative instance for free. However, I could also have used StateT's Monad instance instead, to give code that's more readable for an imperative programmer:
parseTree :: Int -> Parser (Tree String)
parseTree level = do
    str <- parseElement level
    forest <- parseForest (level + 1)
    return $ Node str forest
But what about the many function? What's that doing? Let's look at its type:
many :: (Alternative f) => f a -> f [a]
It takes anything that returns a value and implements Alternative, and calls it repeatedly, collecting the results into a list, until it fails. When we defined our Parser type in terms of StateT, we got an Alternative instance for free, so we can use the many function to convert something that parses a single Tree (i.e. parseTree), into something that parses a Forest (i.e. parseForest).
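If it helps to see the idea spelled out, here is a hand-rolled sketch of the same behaviour. manyP and match are hypothetical names for illustration; the real many is a method of the Alternative class, and I'm only assuming its default behaves along these lines for our parser.

```haskell
import Control.Applicative (Alternative, many, (<|>))
import Control.Monad.Trans.State.Strict (StateT (..))

-- Hypothetical re-implementation of `many`, for illustration only:
-- try the parser once and recurse, or give up and return the empty list.
manyP :: Alternative f => f a -> f [a]
manyP p = ((:) <$> p <*> manyP p) <|> pure []

-- Hypothetical toy parser: consume the next Int if it equals the target.
match :: Int -> StateT [Int] Maybe Int
match n = StateT $ \xs -> case xs of
    (x : rest) | x == n -> Just (x, rest)
    _ -> Nothing
```

Note that neither version can fail: when the inner parser fails straight away, the result is simply an empty list and no input is consumed.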
To use our Parser, we just rename an existing StateT function to make its purpose clear:
runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT
Then we just run it!
>>> runParser (parseForest 0) [(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Just [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
That's just magic! Let's see what happens if we give it an invalid input:
>>> runParser (parseForest 0) [(0, ["val1"]), (2, ["val2"])]
Just [Node "val1" []]
It succeeds on a portion of the input! We can actually specify that it must consume the entire input by defining a parser that matches the end of the input:
eof :: Parser ()
eof = StateT $ \list -> case list of
    [] -> Just ((), [])
    _ -> Nothing
Now let's try it:
>>> runParser (parseForest 0 >> eof) [(0, ["val1"]), (2, ["val2"])]
Nothing
Perfect!
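One thing to be aware of: parseForest 0 >> eof returns () on success, so the parsed forest is thrown away. If you want both the check and the result, a small wrapper using <* should work. parseAll is a hypothetical name of mine; the other definitions are repeated from above so the snippet stands alone.

```haskell
import Control.Applicative (many)
import Control.Monad.Trans.State.Strict (StateT (..), evalStateT)
import Data.Tree (Forest, Tree (..))

type Parser a = StateT [(Int, [String])] Maybe a

parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
    [] -> Nothing
    (level', strs):rest -> case strs of
        [str] ->
            if (level' == level)
            then Just (str, rest)
            else Nothing
        _ -> Nothing

parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)

parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)

eof :: Parser ()
eof = StateT $ \list -> case list of
    [] -> Just ((), [])
    _ -> Nothing

-- Hypothetical helper: run a parser and require that no input is left.
-- `<*` runs both parsers in order but keeps the result of the left one.
parseAll :: Parser a -> [(Int, [String])] -> Maybe a
parseAll p = evalStateT (p <* eof)
```

With this, parseAll (parseForest 0) still rejects the input with the missing level, but hands back the forest for well-formed input.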
Flattening the Tree
To answer your second question, we again solve the problem using mutually recursive functions:
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
    [] -> [[a]]
    _ -> map (a:) (flattenForest forest)
Let's try it!
>>> flattenForest [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
Now, technically I didn't have to use mutually recursive functions. I could have done a single recursive function. I was just following the definition of the Tree type from Data.Tree.
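For comparison, a single recursive function could look roughly like this. The name paths is hypothetical; it does the job of flattenTree, and a forest is handled with a plain concatMap at the call site.

```haskell
import Data.Tree (Tree (..))

-- Hypothetical single-function variant: every root-to-leaf path of a tree.
paths :: Tree a -> [[a]]
paths (Node a [])       = [[a]]                              -- a leaf is a path by itself
paths (Node a children) = map (a :) (concatMap paths children) -- prefix the paths below
```

So concatMap paths plays the role of flattenForest.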
Conclusion
So in theory I could have shortened the code even further by skipping the intermediate Tree type and just parsing the flattened result directly, but I figured you might want to use the Tree-based representation for other purposes.
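One possible way to do that, as a rough sketch rather than something I have tested extensively: have the parser return the root-to-leaf paths directly. parsePaths is a hypothetical name; parseElement is the same primitive as before.

```haskell
import Control.Applicative (many)
import Control.Monad.Trans.State.Strict (StateT (..), evalStateT)

type Parser a = StateT [(Int, [String])] Maybe a

parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
    [] -> Nothing
    (level', strs):rest -> case strs of
        [str] ->
            if (level' == level)
            then Just (str, rest)
            else Nothing
        _ -> Nothing

-- Hypothetical: parse one node at the given level and return the
-- flattened paths that start at it, without building a Tree first.
parsePaths :: Int -> Parser [[String]]
parsePaths level = do
    str <- parseElement level
    below <- concat <$> many (parsePaths (level + 1))
    return $ if null below then [[str]] else map (str :) below
```

The top-level call would then be evalStateT (concat <$> many (parsePaths 0)).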
The key take home points from this are:
Learn Haskell abstractions to simplify your code
Always write total functions
Learn to use recursion effectively
If you do these, you will write robust and elegant code that exactly matches the problem.
Appendix
Here is the final code that incorporates everything I've said:
import Control.Applicative
import Control.Monad.Trans.State.Strict
import Data.Tree
type YourData = [(Int, [String])]
type Parser a = StateT [(Int, [String])] Maybe a
runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
    [] -> Nothing
    (level', strs):rest -> case strs of
        [str] ->
            if (level' == level)
            then Just (str, rest)
            else Nothing
        _ -> Nothing
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
eof :: Parser ()
eof = StateT $ \list -> case list of
    [] -> Just ((), [])
    _ -> Nothing
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
    [] -> [[a]]
    _ -> map (a:) (flattenForest forest)
I seem to have solved it.
group :: [(Integer, [String])] -> [[String]]
group ((n, str):ls) = let
        (children, rest) = span (\(m, _) -> m > n) ls
        subgroups = map (str ++) $ group children
    in if null children then [str] ++ group rest
       else subgroups ++ group rest
group [] = []
I didn't test it much though.
The idea is to notice the recursive pattern. This function takes the first element (N, S) of the list and then gathers all following entries at deeper levels (level numbers greater than N) into a list 'children', stopping at the next entry at level N or shallower. If there are no children, the element is a leaf and S forms an output row on its own. If there are some, S is prepended to each of the rows they produce.
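To see the split in action, here is the function again together with a sample input borrowed from the other answer. sample and firstSplit are hypothetical names added for the demonstration.

```haskell
group :: [(Integer, [String])] -> [[String]]
group ((n, str):ls) = let
        (children, rest) = span (\(m, _) -> m > n) ls
        subgroups = map (str ++) $ group children
    in if null children then [str] ++ group rest
       else subgroups ++ group rest
group [] = []

sample :: [(Integer, [String])]
sample =
    [ (0, ["day 1"]), (1, ["Person 1"]), (2, ["Bill 1"])
    , (1, ["Person 2"]), (2, ["Bill 2"]) ]

-- The split performed by the first call (n = 0): no later entry has a
-- level <= 0, so all four remaining entries are children and rest is empty.
firstSplit :: ([(Integer, [String])], [(Integer, [String])])
firstSplit = span (\(m, _) -> m > 0) (drop 1 sample)
```

group sample then yields one row per leaf, each starting with "day 1".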
As for why your algorithm doesn't work, the problem is mostly in row. Notice that you are not descending recursively.
Trees can be used too.
data Tree a = Node a [Tree a] deriving Show
listToTree :: [(Integer, [String])] -> [Tree [String]]
listToTree ((n, str):ls) = let
        (children, rest) = span (\(m, _) -> m > n) ls
        subtrees = listToTree children
    in Node str subtrees : listToTree rest
listToTree [] = []
treeToList :: [Tree [String]] -> [[String]]
treeToList (Node s ns:ts) = children ++ treeToList ts where
    children = if null ns then [s] else map (s++) (treeToList ns)
treeToList [] = []
The algorithm is essentially the same. The first half goes to the first function, the second half to the second.
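Put differently, composing the two halves should give back the one-pass behaviour. A small sketch, where groupViaTree is a hypothetical name and the two functions are repeated so that it compiles on its own:

```haskell
data Tree a = Node a [Tree a] deriving Show

listToTree :: [(Integer, [String])] -> [Tree [String]]
listToTree ((n, str):ls) = let
        (children, rest) = span (\(m, _) -> m > n) ls
        subtrees = listToTree children
    in Node str subtrees : listToTree rest
listToTree [] = []

treeToList :: [Tree [String]] -> [[String]]
treeToList (Node s ns:ts) = children ++ treeToList ts where
    children = if null ns then [s] else map (s++) (treeToList ns)
treeToList [] = []

-- Hypothetical name: build the trees first, then flatten them again.
groupViaTree :: [(Integer, [String])] -> [[String]]
groupViaTree = treeToList . listToTree
```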