Haskell: Scan Through a List and Apply A Different Function for Each Element - list

I need to scan through a document and accumulate the output of different functions for each string in the file. The function run on any given line of the file depends on what is in that line.
I could do this very inefficiently by making a complete pass through the file for every list I wanted to collect. Example pseudo-code:
at :: B.ByteString -> Maybe Atom
at line
| line == ATOM record = do stuff to return Just Atom
| otherwise = Nothing
ot :: B.ByteString -> Maybe Sheet
ot line
| line == SHEET record = do other stuff to return Just Sheet
| otherwise = Nothing
Then, I would map each of these functions over the entire list of lines in the file to get a complete list of Atoms and Sheets:
mapper :: [B.ByteString] -> IO ()
mapper lines = do
let atoms = mapMaybe at lines
let sheets = mapMaybe to lines
-- Do stuff with my atoms and sheets
However, this is inefficient because I am maping through the entire list of strings for every list I am trying to create. Instead, I want to map through the list of line strings only once, identify each line as I am moving through it, and then apply the appropriate function and store these values in different lists.
My C mentality wants to do this (pseudo code):
mapper' :: [B.ByteString] -> IO ()
mapper' lines = do
let atoms = []
let sheets = []
for line in lines:
| line == ATOM record = (atoms = atoms ++ at line)
| line == SHEET record = (sheets = sheets ++ ot line)
-- Now 'atoms' is a complete list of all the ATOM records
-- and 'sheets' is a complete list of all the SHEET records
What is the Haskell way of doing this? I simply can't get my functional-programming mindset to come up with a solution.

First of all, I think that the answers others have supplied will work at least 95% of the time. It's always good practice to code for the problem at hand by using appropriate data types (or tuples in some cases). However, sometimes you really don't know in advance what you're looking for in the list, and in these cases trying to enumerate all possibilities is difficult/time-consuming/error-prone. Or, you're writing multiple variants of the same sort of thing (manually inlining multiple folds into one) and you'd like to capture the abstraction.
Fortunately, there are a few techniques that can help.
The framework solution
(somewhat self-evangelizing)
First, the various "iteratee/enumerator" packages often provide functions to deal with this sort of problem. I'm most familiar with iteratee, which would let you do the following:
import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Maybe
-- first, you'll need some way to process the Atoms/Sheets/etc. you're getting
-- if you want to just return them as a list, you can use the built-in
-- stream2list function
-- next, create stream transformers
-- given at :: B.ByteString -> Maybe Atom
-- create a stream transformer from ByteString lines to Atoms
atIter :: Enumeratee [B.ByteString] [Atom] m a
atIter = I.mapChunks (catMaybes . map at)
otIter :: Enumeratee [B.ByteString] [Sheet] m a
otIter = I.mapChunks (catMaybes . map ot)
-- finally, combine multiple processors into one
-- if you have more than one processor, you can use zip3, zip4, etc.
procFile :: Iteratee [B.ByteString] m ([Atom],[Sheet])
procFile = I.zip (atIter =$ stream2list) (otIter =$ stream2list)
-- and run it on some data
runner :: FilePath -> IO ([Atom],[Sheet])
runner filename = do
resultIter <- enumFile defaultBufSize filename $= enumLinesBS $ procFile
run resultIter
One benefit this gives you is extra composability. You can create transformers as you like, and just combine them with zip. You can even run the consumers in parallel if you like (although only if you're working in the IO monad, and probably not worth it unless the consumers do a lot of work) by changing to this:
import Data.Iteratee.Parallel
parProcFile = I.zip (parI $ atIter =$ stream2list) (parI $ otIter =$ stream2list)
The result of doing so isn't the same as a single for-loop - this will still perform multiple traversals of the data. However, the traversal pattern has changed. This will load a certain amount of data at once (defaultBufSize bytes) and traverse that chunk multiple times, storing partial results as necessary. After a chunk has been entirely consumed, the next chunk is loaded and the old one can be garbage collected.
Hopefully this will demonstrate the difference:
Data.List.zip:
x1 x2 x3 .. x_n
x1 x2 x3 .. x_n
Data.Iteratee.zip:
x1 x2 x3 x4 x_n-1 x_n
x1 x2 x3 x4 x_n-1 x_n
If you're doing enough work that parallelism makes sense this isn't a problem at all. Due to memory locality, the performance is much better than multiple traversals over the entire input as Data.List.zip would make.
The beautiful solution
If a single-traversal solution really does make the most sense, you might be interested in Max Rabkin's Beautiful Folding post, and Conal Elliott's followup work (this too). The essential idea is that you can create data structures to represent folds and zips, and combining these lets you create a new, combined fold/zip function that only needs one traversal. It's maybe a little advanced for a Haskell beginner, but since you're thinking about the problem you may find it interesting or useful. Max's post is probably the best starting point.

I show a solution for two types of line, but it is easily extended to five types of line by using a five-tuple instead of a two-tuple.
import Data.Monoid
eachLine :: B.ByteString -> ([Atom], [Sheet])
eachLine bs | isAnAtom bs = ([ {- calculate an Atom -} ], [])
| isASheet bs = ([], [ {- calculate a Sheet -} ])
| otherwise = error "eachLine"
allLines :: [B.ByteString] -> ([Atom], [Sheet])
allLines bss = mconcat (map eachLine bss)
The magic is done by mconcat from Data.Monoid (included with GHC).
(On a point of style: personally I would define a Line type, a parseLine :: B.ByteString -> Line function and write eachLine bs = case parseLine bs of .... But this is peripheral to your question.)

It is a good idea to introduce a new ADT, e.g. "Summary" instead of tuples.
Then, since you want to accumulate the values of Summary you came make it an istance of Data.Monoid. Then you classify each of your lines with the help of classifier functions (e.g. isAtom, isSheet, etc.) and concatenate them together using Monoid's mconcat function (as suggested by #dave4420).
Here is the code (it uses String instead of ByteString, but it is quite easy to change):
module Classifier where
import Data.List
import Data.Monoid
data Summary = Summary
{ atoms :: [String]
, sheets :: [String]
, digits :: [String]
} deriving (Show)
instance Monoid Summary where
mempty = Summary [] [] []
Summary as1 ss1 ds1 `mappend` Summary as2 ss2 ds2 =
Summary (as1 `mappend` as2)
(ss1 `mappend` ss2)
(ds1 `mappend` ds2)
classify :: [String] -> Summary
classify = mconcat . map classifyLine
classifyLine :: String -> Summary
classifyLine line
| isAtom line = Summary [line] [] [] -- or "mempty { atoms = [line] }"
| isSheet line = Summary [] [line] []
| isDigit line = Summary [] [] [line]
| otherwise = mempty -- or "error" if you need this
isAtom, isSheet, isDigit :: String -> Bool
isAtom = isPrefixOf "atom"
isSheet = isPrefixOf "sheet"
isDigit = isPrefixOf "digits"
input :: [String]
input = ["atom1", "sheet1", "sheet2", "digits1"]
test :: Summary
test = classify input

If you have only 2 alternatives, using Either might be a good idea. In that case combine your functions, map the list, and use lefts and rights to get the results:
import Data.Either
-- first sample function, returning String
f1 x = show $ x `div` 2
-- second sample function, returning Int
f2 x = 3*x+1
-- combined function returning Either String Int
hotpo x = if even x then Left (f1 x) else Right (f2 x)
xs = map hotpo [1..10]
-- [Right 4,Left "1",Right 10,Left "2",Right 16,Left "3",Right 22,Left "4",Right 28,Left "5"]
lefts xs
-- ["1","2","3","4","5"]
rights xs
-- [4,10,16,22,28]

Related

Implement a 'List.map' function that works on a circular list

So, I found out that Ocaml supports the creation of circular lists using let rec.
utop # let rec ones = 1::ones;;
val ones : int list = [1; <cycle>]
That is pretty neat, and it even prints out in utop without blowing up.
But when I try to use List.map on this kind of data it does blow up:
utop # let twos = List.map ((+) 1) ones;;
Stack overflow during evaluation (looping recursion?).
Raised by primitive operation at Stdlib__List.map in file "list.ml", line 92, characters 32-39
Called from Stdlib__List.map in file "list.ml", line 92, characters 32-39
...
That is somewhat disapointing, though not totally unexpected.
Now the question, would it be possible to implement a 'better' map function that can handle this properly. I.e. you would do something like:
let twos = betterMap ((+) 1) ones;;
And instead of blowing up it would be able to detect the cycle properly and produce:
val twos : int list = [2; <cycle>]
Since the list of ones, though looping back on itself, is effectively a finite structure, it feels like this should be possible. But how?
It is only possible to create cyclic lists when the cycle is statistically known. It is thus impossible to create a map function that works on any cyclic lists without knowing in advance the topology of cycles in the list. For instance, this function works for lists that are 1-cycle:
let map_1_cycle f = function
| [] -> []
| a :: l ->
let rec answer = f a :: answer in
answer
The generic solution is to use sequences since as a form of lazy list, they have a much better support for infinite sequences of elements:
let ones = Seq.repeat 1
let twos = Seq.map ((+) 1) ones

Grouping a list into lists of n elements in Haskell

Is there an operation on lists in library that makes groups of n elements? For example: n=3
groupInto 3 [1,2,3,4,5,6,7,8,9] = [[1,2,3],[4,5,6],[7,8,9]]
If not, how do I do it?
A quick search on Hoogle showed that there is no such function. On the other hand, it was replied that there is one in the split package, called chunksOf.
However, you can do it on your own
group :: Int -> [a] -> [[a]]
group _ [] = []
group n l
| n > 0 = (take n l) : (group n (drop n l))
| otherwise = error "Negative or zero n"
Of course, some parentheses can be removed, I left there here for understanding what the code does:
The base case is simple: whenever the list is empty, simply return the empty list.
The recursive case tests first if n is positive. If n is 0 or lower we would enter an infinite loop and we don't want that. Then we split the list into two parts using take and drop: take gives back the first n elements while drop returns the other ones. Then, we add the first n elements to the list obtained by applying our function to the other elements in the original list.
This function, among other similar ones, can be found in the popular split package.
> import Data.List.Split
> chunksOf 3 [1,2,3,4,5,6,7,8,9]
[[1,2,3],[4,5,6],[7,8,9]]
You can write one yourself, as Mihai pointed out. But I would use the splitAt function since it doesn't require two passes on the input list like the take-drop combination does:
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs =
let (ys, zs) = splitAt n xs
in ys : chunks n zs
This is a common pattern - generating a list from a seed value (which in this case is your input list) by repeated iteration. This pattern is captured in the unfoldr function. We can use it with a slightly modified version of splitAt (thanks Will Ness for the more concise version):
chunks n = takeWhile (not . null) . unfoldr (Just . splitAt n)
That is, using unfoldr we generate chunks of n elements while at the same time we shorten the input list by n elements, and we generate these chunks until we get the empty list -- at this point the initial input is completely consumed.
Of course, as the others have pointed out, you should use the already existing function from the split module. But it's always good to accustom yourself with the list processing functions in the standard Haskell libraries.
This is ofte called "chunk" and is one of the most frequently mentioned list operations that is not in base. The package split provides such an operation though, copy and pasting the haddock documentation:
> chunksOf 3 ['a'..'z']
["abc","def","ghi","jkl","mno","pqr","stu","vwx","yz"]
Additionally, against my wishes, hoogle only searches a small set of libraries (those provided with GHC or perhaps HP), but you can explicitly add packages to the search using +PKG_NAME - hoogle with Int -> [a] -> [[a]] +split gets what you want. Some people use Hayoo for this reason.

Yesod: Is it possible to to iterate a haskell list in Julius?

I have a list of coordinates that I need to put on map. Is it possible in julius to iterate the list ? Right now I am creating an hidden table in hamlet and accessing that table in julius which does not seem to be an ideal solution.
Could some one point to a better solution ? Thanks.
edit: Passing a JSON string for the list (which can be read by julius) seems to solve my problem.
As far as I know, you can't directly iterate over a list in julius. However, you can use the Monoid instance for the Javascript type to accomplish a similar effect. For example:
import Text.Julius
import Data.Monoid
rows :: [Int] -> t -> Javascript
rows xs = mconcat $ map row xs
where
row x = [julius|v[#{show x}] = #{show x};
|]
Then you can use rows xs wherever you'd normally put a julius block. For example, in ghci:
> renderJavascript $ rows [1..5] ()
"v[1] = 1;\nv[2] = 2;\nv[3] = 3;\nv[4] = 4;\nv[5] = 5;\n"

Haskell - Convert x number of tuples into a list [duplicate]

I have a question about tuples and lists in Haskell. I know how to add input into a tuple a specific number of times. Now I want to add tuples into a list an unknown number of times; it's up to the user to decide how many tuples they want to add.
How do I add tuples into a list x number of times when I don't know X beforehand?
There's a lot of things you could possibly mean. For example, if you want a few copies of a single value, you can use replicate, defined in the Prelude:
replicate :: Int -> a -> [a]
replicate 0 x = []
replicate n | n < 0 = undefined
| otherwise = x : replicate (n-1) x
In ghci:
Prelude> replicate 4 ("Haskell", 2)
[("Haskell",2),("Haskell",2),("Haskell",2),("Haskell",2)]
Alternately, perhaps you actually want to do some IO to determine the list. Then a simple loop will do:
getListFromUser = do
putStrLn "keep going?"
s <- getLine
case s of
'y':_ -> do
putStrLn "enter a value"
v <- readLn
vs <- getListFromUser
return (v:vs)
_ -> return []
In ghci:
*Main> getListFromUser :: IO [(String, Int)]
keep going?
y
enter a value
("Haskell",2)
keep going?
y
enter a value
("Prolog",4)
keep going?
n
[("Haskell",2),("Prolog",4)]
Of course, this is a particularly crappy user interface -- I'm sure you can come up with a dozen ways to improve it! But the pattern, at least, should shine through: you can use values like [] and functions like : to construct lists. There are many, many other higher-level functions for constructing and manipulating lists, as well.
P.S. There's nothing particularly special about lists of tuples (as compared to lists of other things); the above functions display that by never mentioning them. =)
Sorry, you can't1. There are fundamental differences between tuples and lists:
A tuple always have a finite amount of elements, that is known at compile time. Tuples with different amounts of elements are actually different types.
List an have as many elements as they want. The amount of elements in a list doesn't need to be known at compile time.
A tuple can have elements of arbitrary types. Since the way you can use tuples always ensures that there is no type mismatch, this is safe.
On the other hand, all elements of a list have to have the same type. Haskell is a statically-typed language; that basically means that all types are known at compile time.
Because of these reasons, you can't. If it's not known, how many elements will fit into the tuple, you can't give it a type.
I guess that the input you get from your user is actually a string like "(1,2,3)". Try to make this directly a list, whithout making it a tuple before. You can use pattern matching for this, but here is a slightly sneaky approach. I just remove the opening and closing paranthesis from the string and replace them with brackets -- and voila it becomes a list.
tuplishToList :: String -> [Int]
tuplishToList str = read ('[' : tail (init str) ++ "]")
Edit
Sorry, I did not see your latest comment. What you try to do is not that difficult. I use these simple functions for my task:
words str splits str into a list of words that where separated by whitespace before. The output is a list of Strings. Caution: This only works if the string inside your tuple contains no whitespace. Implementing a better solution is left as an excercise to the reader.
map f lst applies f to each element of lst
read is a magic function that makes a a data type from a String. It only works if you know before, what the output is supposed to be. If you really want to understand how that works, consider implementing read for your specific usecase.
And here you go:
tuplish2List :: String -> [(String,Int)]
tuplish2List str = map read (words str)
1 As some others may point out, it may be possible using templates and other hacks, but I don't consider that a real solution.
When doing functional programming, it is often better to think about composition of operations instead of individual steps. So instead of thinking about it like adding tuples one at a time to a list, we can approach it by first dividing the input into a list of strings, and then converting each string into a tuple.
Assuming the tuples are written each on one line, we can split the input using lines, and then use read to parse each tuple. To make it work on the entire list, we use map.
main = do input <- getContents
let tuples = map read (lines input) :: [(String, Integer)]
print tuples
Let's try it.
$ runghc Tuples.hs
("Hello", 2)
("Haskell", 4)
Here, I press Ctrl+D to send EOF to the program, (or Ctrl+Z on Windows) and it prints the result.
[("Hello",2),("Haskell",4)]
If you want something more interactive, you will probably have to do your own recursion. See Daniel Wagner's answer for an example of that.
One simple solution to this would be to use a list comprehension, as so (done in GHCi):
Prelude> let fstMap tuplist = [fst x | x <- tuplist]
Prelude> fstMap [("String1",1),("String2",2),("String3",3)]
["String1","String2","String3"]
Prelude> :t fstMap
fstMap :: [(t, b)] -> [t]
This will work for an arbitrary number of tuples - as many as the user wants to use.
To use this in your code, you would just write:
fstMap :: Eq a => [(a,b)] -> [a]
fstMap tuplist = [fst x | x <- tuplist]
The example I gave is just one possible solution. As the name implies, of course, you can just write:
fstMap' :: Eq a => [(a,b)] -> [a]
fstMap' = map fst
This is an even simpler solution.
I'm guessing that, since this is for a class, and you've been studying Haskell for < 1 week, you don't actually need to do any input/output. That's a bit more advanced than you probably are, yet. So:
As others have said, map fst will take a list of tuples, of arbitrary length, and return the first elements. You say you know how to do that. Fine.
But how do the tuples get into the list in the first place? Well, if you have a list of tuples and want to add another, (:) does the trick. Like so:
oldList = [("first", 1), ("second", 2)]
newList = ("third", 2) : oldList
You can do that as many times as you like. And if you don't have a list of tuples yet, your list is [].
Does that do everything that you need? If not, what specifically is it missing?
Edit: With the corrected type:
Eq a => [(a, b)]
That's not the type of a function. It's the type of a list of tuples. Just have the user type yourFunctionName followed by [ ("String1", val1), ("String2", val2), ... ("LastString", lastVal)] at the prompt.

How do I add x tuples into a list x number of times?

I have a question about tuples and lists in Haskell. I know how to add input into a tuple a specific number of times. Now I want to add tuples into a list an unknown number of times; it's up to the user to decide how many tuples they want to add.
How do I add tuples into a list x number of times when I don't know X beforehand?
There's a lot of things you could possibly mean. For example, if you want a few copies of a single value, you can use replicate, defined in the Prelude:
replicate :: Int -> a -> [a]
replicate 0 x = []
replicate n | n < 0 = undefined
| otherwise = x : replicate (n-1) x
In ghci:
Prelude> replicate 4 ("Haskell", 2)
[("Haskell",2),("Haskell",2),("Haskell",2),("Haskell",2)]
Alternately, perhaps you actually want to do some IO to determine the list. Then a simple loop will do:
getListFromUser = do
putStrLn "keep going?"
s <- getLine
case s of
'y':_ -> do
putStrLn "enter a value"
v <- readLn
vs <- getListFromUser
return (v:vs)
_ -> return []
In ghci:
*Main> getListFromUser :: IO [(String, Int)]
keep going?
y
enter a value
("Haskell",2)
keep going?
y
enter a value
("Prolog",4)
keep going?
n
[("Haskell",2),("Prolog",4)]
Of course, this is a particularly crappy user interface -- I'm sure you can come up with a dozen ways to improve it! But the pattern, at least, should shine through: you can use values like [] and functions like : to construct lists. There are many, many other higher-level functions for constructing and manipulating lists, as well.
P.S. There's nothing particularly special about lists of tuples (as compared to lists of other things); the above functions display that by never mentioning them. =)
Sorry, you can't1. There are fundamental differences between tuples and lists:
A tuple always have a finite amount of elements, that is known at compile time. Tuples with different amounts of elements are actually different types.
List an have as many elements as they want. The amount of elements in a list doesn't need to be known at compile time.
A tuple can have elements of arbitrary types. Since the way you can use tuples always ensures that there is no type mismatch, this is safe.
On the other hand, all elements of a list have to have the same type. Haskell is a statically-typed language; that basically means that all types are known at compile time.
Because of these reasons, you can't. If it's not known, how many elements will fit into the tuple, you can't give it a type.
I guess that the input you get from your user is actually a string like "(1,2,3)". Try to make this directly a list, whithout making it a tuple before. You can use pattern matching for this, but here is a slightly sneaky approach. I just remove the opening and closing paranthesis from the string and replace them with brackets -- and voila it becomes a list.
tuplishToList :: String -> [Int]
tuplishToList str = read ('[' : tail (init str) ++ "]")
Edit
Sorry, I did not see your latest comment. What you try to do is not that difficult. I use these simple functions for my task:
words str splits str into a list of words that where separated by whitespace before. The output is a list of Strings. Caution: This only works if the string inside your tuple contains no whitespace. Implementing a better solution is left as an excercise to the reader.
map f lst applies f to each element of lst
read is a magic function that makes a a data type from a String. It only works if you know before, what the output is supposed to be. If you really want to understand how that works, consider implementing read for your specific usecase.
And here you go:
tuplish2List :: String -> [(String,Int)]
tuplish2List str = map read (words str)
1 As some others may point out, it may be possible using templates and other hacks, but I don't consider that a real solution.
When doing functional programming, it is often better to think about composition of operations instead of individual steps. So instead of thinking about it like adding tuples one at a time to a list, we can approach it by first dividing the input into a list of strings, and then converting each string into a tuple.
Assuming the tuples are written each on one line, we can split the input using lines, and then use read to parse each tuple. To make it work on the entire list, we use map.
main = do input <- getContents
let tuples = map read (lines input) :: [(String, Integer)]
print tuples
Let's try it.
$ runghc Tuples.hs
("Hello", 2)
("Haskell", 4)
Here, I press Ctrl+D to send EOF to the program, (or Ctrl+Z on Windows) and it prints the result.
[("Hello",2),("Haskell",4)]
If you want something more interactive, you will probably have to do your own recursion. See Daniel Wagner's answer for an example of that.
One simple solution to this would be to use a list comprehension, as so (done in GHCi):
Prelude> let fstMap tuplist = [fst x | x <- tuplist]
Prelude> fstMap [("String1",1),("String2",2),("String3",3)]
["String1","String2","String3"]
Prelude> :t fstMap
fstMap :: [(t, b)] -> [t]
This will work for an arbitrary number of tuples - as many as the user wants to use.
To use this in your code, you would just write:
fstMap :: Eq a => [(a,b)] -> [a]
fstMap tuplist = [fst x | x <- tuplist]
The example I gave is just one possible solution. As the name implies, of course, you can just write:
fstMap' :: Eq a => [(a,b)] -> [a]
fstMap' = map fst
This is an even simpler solution.
I'm guessing that, since this is for a class, and you've been studying Haskell for < 1 week, you don't actually need to do any input/output. That's a bit more advanced than you probably are, yet. So:
As others have said, map fst will take a list of tuples, of arbitrary length, and return the first elements. You say you know how to do that. Fine.
But how do the tuples get into the list in the first place? Well, if you have a list of tuples and want to add another, (:) does the trick. Like so:
oldList = [("first", 1), ("second", 2)]
newList = ("third", 2) : oldList
You can do that as many times as you like. And if you don't have a list of tuples yet, your list is [].
Does that do everything that you need? If not, what specifically is it missing?
Edit: With the corrected type:
Eq a => [(a, b)]
That's not the type of a function. It's the type of a list of tuples. Just have the user type yourFunctionName followed by [ ("String1", val1), ("String2", val2), ... ("LastString", lastVal)] at the prompt.