For an assignment I'm trying to combine 4 lists of scraped data into 1.
All 4 of them are ordered correctly and shown below.
["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]
My desired output would look like this:
[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]
I've tried some of the solutions found on the internet, but I don't understand some of them, and most of them are about 2 lists or give errors when I try to implement them.
For more reference, my code looks like this:
import Text.HTML.Scalpel
urlString :: String
urlString = "https://www.example.com"
-- Main function in which we call the other functions
main :: IO ()
main = do
    resultTitle <- scrapeURL urlString scrapeHANTitle
    resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
    resultDate <- scrapeURL urlString scrapeHANDate
    resultAuthor <- scrapeURL urlString scrapeHANAuthor
    print resultTitle
    print resultSubtitle
    print resultDate
    print resultAuthor
scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle
scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle
scrapeHANDate :: Scraper String [String]
scrapeHANDate =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate
scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor
-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
    text $ "a" @: [hasClass "card-news__body__title"]
-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
    text $ "span" @: [hasClass "card-news__body__eyebrow"]
-- gets the date on which the news item was posted
scrapeDate :: Scraper String String
scrapeDate = do
    text $ "div" @: [hasClass "card-news__footer__body__date"]
-- gets the author of the news item
scrapeAuthor :: Scraper String String
scrapeAuthor = do
    text $ "div" @: [hasClass "card-news__footer__body__author"]
I also tried the following below but it gave me a bunch of type errors.
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 ->s1 ++ s2 ++ s3 ++ s4
You can make use of the Semigroup/Monoid instance of Maybe (a Nothing is skipped and the lists inside the Just values are concatenated) and work with:
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
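To see what this does in practice, here is a small sketch with made-up sample values (the one-letter strings are placeholders, not scraped data). Note that it produces one flat list, so it does not group the values per news item.

```haskell
-- Sketch: how (<>) behaves on Maybe-wrapped lists.
mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4

main :: IO ()
main = do
  -- All four present: the lists are concatenated inside one Just.
  print (mergeLists (Just ["a"]) (Just ["b"]) (Just ["c"]) (Just ["d"]))
  -- A Nothing is simply skipped.
  print (mergeLists (Just ["a"]) Nothing (Just ["c"]) Nothing)
  -- Only when everything is Nothing is the result Nothing.
  print (mergeLists Nothing Nothing Nothing Nothing)
```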
Here you are however scraping the same page, so you can combine the data from the scraper with:
myScraper :: Scraper String [String]
myScraper = do
    da <- scrapeHANTitle
    db <- scrapeHANSubtitle
    dc <- scrapeHANDate
    dd <- scrapeHANAuthor
    -- parentheses are needed: "return da ++ db" would parse as "(return da) ++ db"
    return (da ++ db ++ dc ++ dd)
and then run this with:
main :: IO ()
main = do
    result <- scrapeURL urlString myScraper
    print result
or shorter:
main :: IO()
main = scrapeURL urlString myScraper >>= print
You can combine four lists using zip4 from Data.List.
import Data.List
list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]
result = zip4 list1 list2 list3 list4
result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]
The two results differ slightly: result is a list of tuples, while result2 is a list of lists, as requested. A list of tuples is probably the better choice, because:
A list can contain any number of values, but they must all be of the same type (Haskell lists are homogeneous).
A tuple can mix types, which gives more flexibility.
Tuples of different sizes are different types, so a list of 4-tuples cannot accidentally contain a group of three or five values.
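Since the mergeLists attempt in the question works with Maybe [String] values, zip4 can also be lifted over Maybe. The following is only a sketch: combineRows is a hypothetical helper name, and the sample data is a shortened copy of the lists above. Keep in mind that zip4 stops at the shortest list.

```haskell
import Data.List (zip4)

-- Hypothetical helper: combine four optional lists row by row.
-- The result is Nothing as soon as any of the four inputs is Nothing.
combineRows :: Maybe [a] -> Maybe [b] -> Maybe [c] -> Maybe [d] -> Maybe [(a, b, c, d)]
combineRows ma mb mc md = zip4 <$> ma <*> mb <*> mc <*> md

main :: IO ()
main = do
  let titles    = Just ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"]
      subtitles = Just ["Carsten en Kirsten", "Kennisclip"]
      dates     = Just ["10 maart 2022", "09 maart 2022"]
      authors   = Just ["Bot", "CB"]
  -- One 4-tuple per news item.
  print (combineRows titles subtitles dates authors)
  -- A failed scrape (Nothing) makes the whole result Nothing.
  print (combineRows titles (Nothing :: Maybe [String]) dates authors)
```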
Starting from a list of Text, I'd like to convert this into a Record where each field corresponds to a value in the list.
import Data.Text
data MyRecord = MyRecord {
    rOne :: Text,
    rTwo :: Text,
    rThree :: Int
    }
txt :: [Text]
txt = ["foo", "bar", "12"]
I'm not sure how to (cleanly) proceed from the list of Text up to the record. It is assumed that this list will always be of the same size (here, 3) and there will always be an Int-like value in third position.
The minimal naive working version I've got to is so desperately hacky that I can't resign myself to committing this into the codebase:
-- assuming RecordWildCards and ScopedTypeVariables
readText :: [Text] -> MyRecord
readText l =
    let rOne = l !! 0
        rTwo = l !! 1
        -- read needs a String, so unpack the Text first
        rThree :: Int = read . unpack $ l !! 2
    in MyRecord {..}
What would be a more Haskell-friendly way of doing this?
Note: what I'm actually after is parsing a comma-separated file into a list of MyRecord, without a csv library. The rows will not contain commas inside a value, so splitting on commas can be considered safe here (e.g. values = T.splitOn "," <$> lines).
If you are absolutely sure the list is three elements long, you could use
readText :: [Text] -> MyRecord
readText [rOne, rTwo, rThreeText] = MyRecord {..}
    where rThree = read (unpack rThreeText)  -- read needs a String, so unpack the Text first
Still, you might wish to make the pattern matching exhaustive, just in case:
readText :: [Text] -> MyRecord
readText [rOne, rTwo, rThreeText] = MyRecord {..}
    where rThree = read (unpack rThreeText)  -- read needs a String, so unpack the Text first
readText _ = error "readText: list length /= 3"
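If crashing on a malformed row is not acceptable, a variant that returns Maybe is one option. This is a minimal sketch, not part of the answer above: parseRow is a hypothetical name, it swaps read for readMaybe from Text.Read, and it assumes the text package and OverloadedStrings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Read (readMaybe)

data MyRecord = MyRecord
  { rOne   :: Text
  , rTwo   :: Text
  , rThree :: Int
  } deriving (Show, Eq)

-- Hypothetical total version of readText: Nothing for a wrong
-- number of fields or a third field that is not an Int.
parseRow :: [Text] -> Maybe MyRecord
parseRow [a, b, c] = MyRecord a b <$> readMaybe (T.unpack c)
parseRow _         = Nothing

main :: IO ()
main =
  -- Split a small in-memory "file" into lines, then each line on commas.
  mapM_ (print . parseRow . T.splitOn ",")
        (T.lines "foo,bar,12\nbaz,qux,oops\nonly,two")
```

The first row parses, while the second (non-numeric third field) and third (too few fields) give Nothing instead of a runtime error.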
I'm new to Haskell and I'm trying to print the elements of a list on the same line. For example:
[1,2,3,4] = 1234
If the elements are Strings I can print them with mapM_ putStr ["1","2","3","\n"]
but they aren't. Does anyone know how to write a function that prints this?
I also tried dignum xs = [ mapM_ putStr x | x <- xs ], but it doesn't work.
You can use show :: Show a => a -> String to convert an element (here an integer), to its textual representation as a String.
Furthermore we can use concat :: [[a]] -> [a] to convert a list of lists of elements to a list of elements (by concatenating these lists together). In the context of a String, we can thus use concat :: [String] -> String to join the numbers together.
So we can then use:
printConcat :: Show a => [a] -> IO ()
printConcat = putStrLn . concat . map show
This then generates:
Prelude> printConcat [1,2,3,4]
1234
Note that the printConcat function is not limited to numbers (integers); it accepts values of any type that is an instance of the Show class.
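The pure part can also be separated from the printing, which makes it easier to test. A small sketch (digitsToString is a hypothetical name; concatMap show is equivalent to concat . map show):

```haskell
-- Pure helper: render every element with show and glue the results together.
digitsToString :: Show a => [a] -> String
digitsToString = concatMap show

main :: IO ()
main = do
  putStrLn (digitsToString [1, 2, 3, 4 :: Int])  -- prints 1234
  putStrLn (digitsToString [True, False])        -- prints TrueFalse
```

One caveat: show on a String adds quotes, so for lists of Strings plain concat is the better fit.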
I have a list of tuples in the form:
[(String, Int)]
How can I print this to display like:
String : Int
String : Int
String : Int
...
I am very new to Haskell so please make it as clear as possible. Thank you!
Update: Here's how the code of my program now looks:
main = do
    putStrLn "********* Haskell word frequency counter *********"
    putStrLn ""
    conts <- readFile "text.txt"
    let lowConts = map toLower conts
    let counted = countAllWords (lowConts)
    let sorted = sortTuples (counted)
    let reversed = reverse sorted
    putStrLn "Word : Count"
    mapM_ (printTuple) reversed
-- Counts all the words.
countAllWords :: String -> [(String, Int)]
countAllWords fileContents = wordsCount (toWords (noPunc fileContents))
-- Splits words and removes linking words.
toWords :: String -> [String]
toWords s = filter (\w -> w `notElem` ["and","the","for"]) (words s)
-- Remove punctuation from text String.
noPunc :: String -> String
noPunc xs = [ x | x <- xs, not (x `elem` ",.?!-:;\"\'") ]
-- Counts, how often each string in the given list appears.
wordsCount :: [String] -> [(String, Int)]
wordsCount xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
-- Sort list in order of occurrences.
sortTuples :: [(String, Int)] -> [(String, Int)]
sortTuples sort = sortBy (comparing snd) sort
printTuple :: Show a => [(String, a)] -> IO ()
printTuple xs = forM_ xs (putStrLn . formatOne)
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
It returns this error to me:
fileToText.hs:18:28:
Couldn't match type ‘(String, Int)’ with ‘[(String, a0)]’
Expected type: [[(String, a0)]]
Actual type: [(String, Int)]
In the second argument of ‘mapM_’, namely ‘reversed’
In a stmt of a 'do' block: mapM_ (printTuple) reversed
Thanks for any help!
Let's start by formatting one item:
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
now you can use this function (for example) with forM_ from Control.Monad to print it to the screen like this (forM_ because we want to be in the IO-Monad - because we are going to use putStrLn):
Prelude> let test = [("X1",4), ("X2",5)]
Prelude> import Control.Monad (forM_)
Prelude Control.Monad> forM_ test (putStrLn . formatOne)
X1 : 4
X2 : 5
in a file you would use it like this:
import Control.Monad (forM_)
printTuples :: Show a => [(String, a)] -> IO ()
printTuples xs = forM_ xs (putStrLn . formatOne)
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
Compiling the file
Overall, here is a version of your code that will at least compile (I cannot test it without the text file ;) ):
import Control.Monad (forM_)
import Data.Char (toLower)
import Data.List (sort, sortBy, group)
import Data.Ord (comparing)
main :: IO ()
main = do
    putStrLn "********* Haskell word frequency counter *********"
    putStrLn ""
    conts <- readFile "text.txt"
    let lowConts = map toLower conts
    let counted = countAllWords lowConts
    let sorted = sortTuples counted
    let reversed = reverse sorted
    putStrLn "Word : Count"
    printTuples reversed
-- Counts all the words.
countAllWords :: String -> [(String, Int)]
countAllWords = wordsCount . toWords . noPunc
-- Splits words and removes linking words.
toWords :: String -> [String]
toWords = filter (\w -> w `notElem` ["and","the","for"]) . words
-- Remove punctuation from text String.
noPunc :: String -> String
noPunc xs = [ x | x <- xs, x `notElem` ",.?!-:;\"\'" ]
-- Counts, how often each string in the given list appears.
wordsCount :: [String] -> [(String, Int)]
wordsCount = map (\xs -> (head xs, length xs)) . group . sort
-- Sort list in order of occurrences.
sortTuples :: [(String, Int)] -> [(String, Int)]
sortTuples = sortBy $ comparing snd
-- print one tuple per line separated by " : "
printTuples :: Show a => [(String, a)] -> IO ()
printTuples = mapM_ (putStrLn . formatTuple)
    where formatTuple (s,i) = s ++ " : " ++ show i
I also removed the compiler warnings and ran HLint on it (but skipped the Control.Arrow suggestion; I don't think head &&& length is a more readable option here).
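As a possible variation on the sort-then-reverse step, the descending order can be requested directly with Down from Data.Ord. This is a sketch with made-up counts; sortByCountDesc is a hypothetical name. Unlike reversing an ascending sort, entries with equal counts keep their original relative order here.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Sort by the count (second component), highest first.
sortByCountDesc :: [(String, Int)] -> [(String, Int)]
sortByCountDesc = sortOn (Down . snd)

formatOne :: Show a => (String, a) -> String
formatOne (s, i) = s ++ " : " ++ show i

main :: IO ()
main = mapM_ (putStrLn . formatOne)
             (sortByCountDesc [("cat", 2), ("dog", 5), ("owl", 1)])
```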
I'm trying to write a program that allows the user to build up a list of strings by entering them in one at a time, and displays the list after every step.
Here is my code so far:
buildList :: [String] -> IO ()
buildList arr = do
    putStr "Enter a line:"
    str <- getLine
    if str == "" then
        return ()
    else do
        let newarr = arr : str
        putStrLn ("List is now: " ++ newarr)
        buildList newarr
listBuilder :: IO ()
listBuilder = do
    buildList []
listBuilder is starting the list by passing in the empty list, and I'm trying to use recursion so that the code keeps running until the user enters the empty string.
It's not working; any ideas are welcome.
Here is the desired interaction:
Enter a line: hello
List is now ["hello"]
Enter a line: world
List is now ["hello","world"]
Enter a line:
Error:
Couldn't match type `Char' with `[String]'
Expected type: [[String]]
Actual type: String
In the second argument of `(:)', namely `str'
In the expression: arr : str
In an equation for `newarr': newarr = arr : str
EDIT:
This fixed it, thanks to the clues and use of show
buildList :: [String] -> IO ()
buildList arr = do
    putStr "Enter a line:"
    str <- getLine
    if str == "" then
        return ()
    else do
        let newarr = arr++[str]
        putStrLn ("List is now: " ++ show newarr)
        buildList newarr
listBuilder :: IO ()
listBuilder = do
    buildList []
You can get this working by
(a) putting the new string at the end of the list with arr++[str] instead of arr:str since : can only be used like singleThing:list,
(b) splitting the run-round into a separate function, and
(c) passing the result on with return so you can use it elsewhere in your program
buildList arr = do
    putStrLn "Enter a line:"
    str <- getLine
    if str == "" then
        return arr
    else do
        tell (arr++[str])
tell arr = do
    putStrLn ("List is now: " ++ show arr) -- show arr to make it a String
    buildList arr
giving
Enter a line:
Hello
List is now: ["Hello"]
Enter a line:
world
List is now: ["Hello","world"]
Enter a line:
done
You can solve this problem more declaratively using the pipes and foldl libraries:
import Control.Foldl (purely, list)
import Pipes
import qualified Pipes.Prelude as P
main = runEffect $ P.stdinLn >-> purely P.scan list >-> P.print
You can read this as a pipeline:
P.stdinLn is a source of lines input by the user
P.scan behaves like Data.List.scanl, except for pipelines instead of lists. purely P.scan list says to continuously output the values seen so far.
P.print prints these output lists to the console
Here's an example of this pipeline in action:
$ ./example
[]
Test<Enter>
["Test"]
ABC<Enter>
["Test","ABC"]
42<Enter>
["Test","ABC","42"]
<Ctrl-D>
$
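The scanl analogy can be checked with plain lists, without pipes. A minimal sketch (snapshots is a hypothetical name) that produces the same sequence of lists as the session above, given the same three lines:

```haskell
-- Each step appends the new line to everything seen so far;
-- scanl also emits the initial empty list, just like the pipeline does.
snapshots :: [String] -> [[String]]
snapshots = scanl (\seen line -> seen ++ [line]) []

main :: IO ()
main = mapM_ print (snapshots ["Test", "ABC", "42"])
```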
You can also easily switch to other ways of folding the lines just by changing the argument to purely P.scan. For example, if you replace list with Control.Foldl.vector, it will output vectors of lines instead of lists.
To learn more, you can read the documentation for the pipes and foldl libraries.
The problem is that the : data constructor can only be used to prepend an element to the beginning of a list. When you write let newarr = arr : str, you are trying to use it to put an element at the end of the list. Instead, you can either construct your list backwards, like this: let newarr = str : arr, or use the ++ operator to append to the end of the list, like this: let newarr = arr ++ [str]. (Give the result a new name such as newarr: in Haskell, let arr = arr ++ [str] is a recursive definition, not an update.)
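The difference between the two operators is easy to see on a small example. This is just an illustrative sketch; addNewestFirst is a hypothetical helper for the "build it backwards" approach, which avoids walking the whole list on every step and only needs a reverse when displaying.

```haskell
-- Hypothetical helper: keep the newest entry at the front.
addNewestFirst :: String -> [String] -> [String]
addNewestFirst str arr = str : arr

main :: IO ()
main = do
  let arr = ["hello"]
  print ("world" : arr)                         -- element goes to the front
  print (arr ++ ["world"])                      -- element goes to the end
  print (reverse (addNewestFirst "world" arr))  -- built backwards, reversed for display
```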
I'm creating a program that picks one of two libraries (audio.lhs or video.lhs) and returns a PDF with a list ordered and filtered by a given category:
mymain = do {putStrLn "What do you wanna search, Video or Audio?";
tipo <- getLine;
if tipo == "Audio"
then do {
a <- readFile "audio.lhs" ;
let text = splitEvery 7 (splitRegex (mkRegex "\t") a)
list = map musicFile text
select = filter ((>1000) .size) list
orderList = sortBy (comparing title)
dir = Dir orderList
hs = "import Dir\nimport TeX\nimport System.Cmd"
++ "\ntoTeX= do { writeFile \"out.tex\" $ prettyprint dat ;"
++ "system \"pdflatex out\"}"
++ "\ndat="
++ show dir
in do { writeFile "dat.hs" hs ;
putStrLn "\nOk.\nNow load \'dat.hs\' and run \'toTeX\'\n"
}}...
Everything is running, but now I need the functions
select = filter ((>1000) .size) list
and
orderList = sortBy (comparing title)
to work with values chosen by the user of the program (inputs) instead of values hard-coded by me. Whether to keep files that are >2000 or <500 should be the user's choice, and the same goes for the category to order by: size, title, or any other field.
My data structure is
data File = File {
    filename :: String ,
    size :: Int ,
    filetype :: String ,
    copyright :: String ,
    title :: String ,
    artist :: String ,
    year :: String } deriving Show
and
musicFile :: [String] -> File
musicFile [name, size, tipo, copy, title, artist, year] = File name (read size) tipo copy title artist year
Any help would be gladly appreciated.
Thanks in advance.
The simplest mechanism available in Haskell for parsing strings is the Read typeclass. Instances of this class have enough functionality to implement
read :: (Read a) => String -> a
readLn :: (Read a) => IO a
either of which should be enough to get you started on your way to reading an Int (which is an instance of Read) from input.
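As a rough sketch of how that could be wired up, the example below turns two user-supplied strings into a filter and an ordering. Everything here is an assumption made for illustration: the File record is a trimmed-down stand-in for the one in the question, the input format ("> 2000", "title") is invented, the function names are hypothetical, and readMaybe from Text.Read is used instead of read so that bad input gives Nothing rather than a crash. In the real program the two strings would come from getLine.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)
import Text.Read (readMaybe)

-- Trimmed-down stand-in for the File record in the question.
data File = File { title :: String, size :: Int } deriving (Show, Eq)

-- Hypothetical input format: a comparison symbol, a space, a number.
parseSizeFilter :: String -> Maybe (Int -> Bool)
parseSizeFilter input = case words input of
  [">", n] -> (\k -> (> k)) <$> readMaybe n
  ["<", n] -> (\k -> (< k)) <$> readMaybe n
  _        -> Nothing

-- Map a field name typed by the user to a sorting function.
parseOrdering :: String -> Maybe ([File] -> [File])
parseOrdering "title" = Just (sortBy (comparing title))
parseOrdering "size"  = Just (sortBy (comparing size))
parseOrdering _       = Nothing

-- Filter by size, then order; Nothing if either input is not understood.
selectFiles :: String -> String -> [File] -> Maybe [File]
selectFiles filterInput orderInput files = do
  keep  <- parseSizeFilter filterInput
  order <- parseOrdering orderInput
  pure (order (filter (keep . size) files))

main :: IO ()
main = do
  let files = [File "b" 3000, File "a" 2500, File "c" 400]
  print (selectFiles "> 2000" "title" files)
  print (selectFiles "big" "title" files)
```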