Reading from a file and creating data in Haskell - list

I have been trying to read a file and create data in Haskell. I have the following data type:
data Participant = Participant {name :: String, age :: Int,
  country :: String}
The text file I have is in this format:
Jack 21 England
Natalie 20 France
Sophie 24 France
Each word corresponds to name, age and country respectively. I want to read the file and create a list of participants, but IO String seems to be a pain in the neck. Do you have a suitable solution for this problem?

IO isolation is not a defect but a feature of Haskell, and one of its clear successes.
If you want to serialize data to a file and load it back into your program, you don't have to do that by hand. Use any serialization library; there are many of them. You can also use the great Aeson library, which can convert your data to JSON and load it back.
If you want to do it by hand for any reason, you must first define your own file format and make it unambiguous. For example, in your example, what happens if a name contains a space?
Then you should define a function that can parse one line of your file format and produce a Participant: readParticipant :: String -> Maybe Participant. Notice the Maybe: if the string is ill-formatted, your function can't pull a participant out of a hat, so it will produce Nothing.
Then you can have a function that parses a list of participants (notice the plural): readParticipants :: String -> [Participant]. No Maybe here, because the list itself allows for failure.
Now you can have a tiny IO function that will read the file content, and run readParticipants on it.
readParticipantsIO :: IO [Participant]
readParticipantsIO = readParticipants <$> readFile "participants.data"
-- alternative definition, exact same behaviour in this case
readParticipantsIO :: IO [Participant]
readParticipantsIO = do
  content <- readFile "participants.data"
  return $ readParticipants content
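A minimal sketch of the two pure functions described above, assuming one participant per line, whitespace-separated fields, and names without spaces. Lines that fail to parse are simply dropped, which is one possible reading of "the list itself allows for failure"; the exact format and error policy are assumptions.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

data Participant = Participant
  { name    :: String
  , age     :: Int
  , country :: String
  } deriving (Show, Eq)

-- Parse one line such as "Jack 21 England".
-- Returns Nothing if the line does not have exactly three fields
-- or if the age field is not a valid Int.
readParticipant :: String -> Maybe Participant
readParticipant line =
  case words line of
    [n, a, c] -> do
      a' <- readMaybe a
      pure (Participant n a' c)
    _ -> Nothing

-- Parse a whole file; lines that fail to parse are skipped.
readParticipants :: String -> [Participant]
readParticipants = mapMaybe readParticipant . lines
```

With these in place, readParticipantsIO above works unchanged.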

The lines and words functions can be used to split the string into a list of strings, and then you can package these strings into a list of Participant.
import Text.Read (readMaybe)
data Participant = Participant { name :: String, age :: Int, country :: String }
parseParticipants :: String -> [Participant]
parseParticipants fileContents = do
  [name', age', country'] <- words <$> lines fileContents
  Just age'' <- return (readMaybe age')
  return (Participant { name = name', age = age'', country = country' })
main :: IO ()
main = do
  participants <- parseParticipants <$> readFile "yourfile.txt"
  -- do things with the participants
  return ()
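The do block above runs in the list monad: if a line's words do not match the three-element pattern, or the age does not parse as an Int, the pattern match fails, and in the list monad that yields no result for that line instead of a crash. The same parser can be sketched as a list comprehension, where failed pattern matches are likewise filtered out. This is an equivalent alternative offered for comparison, not a required change.

```haskell
import Text.Read (readMaybe)

data Participant = Participant { name :: String, age :: Int, country :: String }
  deriving (Show, Eq)

-- Equivalent to the list-monad version: lines whose words do not match
-- [n, a, c], or whose age does not parse as an Int, are skipped.
parseParticipants :: String -> [Participant]
parseParticipants fileContents =
  [ Participant { name = n, age = a', country = c }
  | [n, a, c] <- map words (lines fileContents)
  , Just a'   <- [readMaybe a]
  ]
```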


Iterate over a column's values in a streaming DataFrame and assign each value to a common list using Scala and Spark

I have the following streaming DataFrame:
+------------------------------------+
|sentence                            |
+------------------------------------+
|Representative is a scientist       |
|Norman did a good job in the exam   |
|you want to go on shopping?         |
+------------------------------------+
I have a list as follows:
val myList
As the final output, I need myList to contain the three sentences above from the streaming DataFrame.
Expected output:
myList = [Representative is a scientist, Norman did a good job in the exam, you want to go on shopping? ]
I tried the following, which gives a streaming error:
val myList = sentenceDataframe.select("sentence").rdd.map(r => r(0)).collect.toList
Error thrown by the above method:
org.apache.spark.sql.AnalysisException: Queries with streaming sources
must be executed with writeStream.start()
Please note that the above method works with a normal DataFrame but not with a streaming DataFrame.
Is there a way to iterate through each row of the streaming DataFrame and add each row value to the common list using Scala and Spark?
That sounds like a very weird use case, as the stream could theoretically never end. Are you sure you are not just looking for regular Spark DataFrames?
If that is not the case, what you can do is use accumulators and Spark's streaming foreachBatch sink. I used a simple socket connection to demonstrate this. You can start a simple socket server under e.g. Ubuntu with nc -lp 3030 and just paste messages there to send them to the stream; the resulting DataFrame will have a schema of [value: String].
import org.apache.spark.sql.DataFrame
val acc = spark.sparkContext.collectionAccumulator[String]
val stream = spark.readStream.format("socket").option("host", "localhost").option("port", "3030").load()
val query = stream.writeStream.foreachBatch((df: DataFrame, l: Long) => {
  df.collect.foreach(v => acc.add(v(0).asInstanceOf[String]))
}).start()
...
// For some reason you are stopping the stream here
query.stop()
// acc.value is a java.util.List[String]
val myList = acc.value
Now, one question you might have is why we are using accumulators and not just an ArrayBuffer. An ArrayBuffer would work locally, but on a cluster the code in foreachBatch might be executed on a totally different node. That means it would not have any effect, and that's also the reason accumulators exist in the first place (see https://spark.apache.org/docs/latest/rdd-programming-guide.html#accumulators).

Haskell - Saving string input into a list

I'm a Haskell beginner and I am writing a small file for a project that should take input of interaction data for pairs of people and save it to a list to be output at the end. I have done my best to implement this, but the program seems to hit the "stop" case no matter what is input. Any help or advice would be appreciated.
import Data.List
import Text.Read
main :: IO ()
main = do
  putStrLn "This program is a means to record interactions between individuals during the COVID-19 pandemic."
  putStrLn "Please enter your interactions in this format: 'x interacted with y'"
  inputs <- getUserInputs
  putStr "input: "
  putStrLn ("list sequence " ++ show (inputs))
parseInput :: String -> Maybe String
parseInput input = if input == "stop" then Nothing else (readMaybe input) :: Maybe String
getUserInputs :: IO [String]
getUserInputs = do
  input <- getLine
  case parseInput input of
    Nothing -> return []
    Just aString -> do
      moreinputs <- getUserInputs
      return (aString : moreinputs)
Show and Read are intended to produce and consume the representation of a value as a Haskell expression. That’s why when you call show on a String, it produces a quoted string:
> show "beans"
"\"beans\""
Therefore Read expects the string to be quoted as well, so readMaybe is always returning Nothing in your code because you’re not supplying quotes:
> readMaybe "beans" :: Maybe String
Nothing
> readMaybe "\"beans\"" :: Maybe String
Just "beans"
Therefore the fix is simple: remove the call to readMaybe and just return the string directly:
parseInput1 :: String -> Maybe String
parseInput1 input = if input == "stop"
  then Nothing
  else Just input
Which, as a matter of style preference, you could also write with guards, pattern matching, or the Maybe monad instead of if:
parseInput2 input
  | input == "stop" = Nothing
  | otherwise = Just input
parseInput3 "stop" = Nothing
parseInput3 input = Just input
import Control.Monad (guard)
parseInput4 input = do
  -- ‘guard’ returns ‘Nothing’,
  -- short-circuiting the ‘do’ block,
  -- if its condition is ‘False’.
  guard (input /= "stop")
  pure input
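If you would rather keep the stopping logic pure, the same behaviour can be sketched by mapping the fixed parseInput over all input lines and stopping at the first Nothing. The names collectInputs and getUserInputsLazy are hypothetical, and reading stdin lazily with getContents is an assumption about how you want input handled.

```haskell
parseInput :: String -> Maybe String
parseInput input
  | input == "stop" = Nothing
  | otherwise       = Just input

-- Keep parsed lines up to, but not including, the first Nothing.
collectInputs :: [String] -> [String]
collectInputs = go . map parseInput
  where
    go (Just s : rest) = s : go rest
    go _               = []  -- first "stop" (or end of input) ends the list

-- Hypothetical alternative to getUserInputs using lazy IO:
-- lines after "stop" are never processed.
getUserInputsLazy :: IO [String]
getUserInputsLazy = collectInputs . lines <$> getContents
```

One caveat of this design: getContents semi-closes stdin, so the program cannot read more input with getLine afterwards, whereas the explicit recursive getUserInputs has no such restriction.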
Read and Show are fine for simple programs, particularly when you’re learning Haskell, but in larger applications it’s helpful to use them mostly for debug input and output and reading input you’ve already validated. Parsing and pretty-printing libraries are preferable for more involved parsing and producing human-readable output, respectively; megaparsec and prettyprinter are good default choices in that area.

How to add an element to a tuple in a list in haskell?

I have the following types and database:
type Title = String
type Singer = [String]
type Year = Int
type Fan = String
type Fans = [Fan]
type Song = (Title, Singer, Year, Fans)
type Database = [Song]
songDatabase :: Database
songDatabase = [("Wrapped up", ["Olly Murs"], 2014, ["Garry", "Dave", "Zoe", "Kevin", "Emma"]),
("Someone Like you", ["Adele"], 2011, ["Bill", "Jo", "Garry", "Kevin", "Olga", "Liz"]),
("Drunk in Love", ["Beyonce", "Jay Z"], 2014, ["tom", "Lucy"])]
I want to add a fan to the last tuple in the list. Do I do this using addToAl, or are there other methods I can use?
Do I have to search for the data, delete it, and then add the data I want? Or is there a way to just add, for example, "John" to the fans in the "Someone Like you" tuple?
You can't modify an existing tuple or list, because all values in Haskell are immutable. Instead, you can construct a new list that contains the values you want. The best way to accomplish this is to first write a function that can add a fan to a single song:
addFan :: Fan -> Song -> Song
addFan fan (title, singer, year, fans) = ???
Then you can write a function that updates a particular song in the database:
addFanInDB :: Fan -> Title -> Database -> Database
addFanInDB fan songTitle [] = []
addFanInDB fan songTitle (song:db) = ???
Since this looks very much like homework, I'm not going to give you a full solution since that defeats the purpose of the assignment. You'll need to fill in the ??? yourself.

Haskell: Scan Through a List and Apply A Different Function for Each Element

I need to scan through a document and accumulate the output of different functions for each string in the file. The function run on any given line of the file depends on what is in that line.
I could do this very inefficiently by making a complete pass through the file for every list I wanted to collect. Example pseudo-code:
at :: B.ByteString -> Maybe Atom
at line
  | line == ATOM record = do stuff to return Just Atom
  | otherwise = Nothing
ot :: B.ByteString -> Maybe Sheet
ot line
  | line == SHEET record = do other stuff to return Just Sheet
  | otherwise = Nothing
Then, I would map each of these functions over the entire list of lines in the file to get a complete list of Atoms and Sheets:
mapper :: [B.ByteString] -> IO ()
mapper lines = do
  let atoms = mapMaybe at lines
  let sheets = mapMaybe ot lines
  -- Do stuff with my atoms and sheets
However, this is inefficient because I am mapping through the entire list of strings for every list I am trying to create. Instead, I want to map through the list of line strings only once, identify each line as I move through it, and then apply the appropriate function and store these values in different lists.
My C mentality wants to do this (pseudo code):
mapper' :: [B.ByteString] -> IO ()
mapper' lines = do
  let atoms = []
  let sheets = []
  for line in lines:
    | line == ATOM record = (atoms = atoms ++ at line)
    | line == SHEET record = (sheets = sheets ++ ot line)
  -- Now 'atoms' is a complete list of all the ATOM records
  -- and 'sheets' is a complete list of all the SHEET records
What is the Haskell way of doing this? I simply can't get my functional-programming mindset to come up with a solution.
First of all, I think that the answers others have supplied will work at least 95% of the time. It's always good practice to code for the problem at hand by using appropriate data types (or tuples in some cases). However, sometimes you really don't know in advance what you're looking for in the list, and in these cases trying to enumerate all possibilities is difficult/time-consuming/error-prone. Or, you're writing multiple variants of the same sort of thing (manually inlining multiple folds into one) and you'd like to capture the abstraction.
Fortunately, there are a few techniques that can help.
The framework solution
(somewhat self-evangelizing)
First, the various "iteratee/enumerator" packages often provide functions to deal with this sort of problem. I'm most familiar with iteratee, which would let you do the following:
import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Maybe
-- first, you'll need some way to process the Atoms/Sheets/etc. you're getting
-- if you want to just return them as a list, you can use the built-in
-- stream2list function
-- next, create stream transformers
-- given at :: B.ByteString -> Maybe Atom
-- create a stream transformer from ByteString lines to Atoms
atIter :: Enumeratee [B.ByteString] [Atom] m a
atIter = I.mapChunks (catMaybes . map at)
otIter :: Enumeratee [B.ByteString] [Sheet] m a
otIter = I.mapChunks (catMaybes . map ot)
-- finally, combine multiple processors into one
-- if you have more than one processor, you can use zip3, zip4, etc.
procFile :: Iteratee [B.ByteString] m ([Atom],[Sheet])
procFile = I.zip (atIter =$ stream2list) (otIter =$ stream2list)
-- and run it on some data
runner :: FilePath -> IO ([Atom],[Sheet])
runner filename = do
  resultIter <- enumFile defaultBufSize filename $= enumLinesBS $ procFile
  run resultIter
One benefit this gives you is extra composability. You can create transformers as you like, and just combine them with zip. You can even run the consumers in parallel if you like (although only if you're working in the IO monad, and probably not worth it unless the consumers do a lot of work) by changing to this:
import Data.Iteratee.Parallel
parProcFile = I.zip (parI $ atIter =$ stream2list) (parI $ otIter =$ stream2list)
The result of doing so isn't the same as a single for-loop - this will still perform multiple traversals of the data. However, the traversal pattern has changed. This will load a certain amount of data at once (defaultBufSize bytes) and traverse that chunk multiple times, storing partial results as necessary. After a chunk has been entirely consumed, the next chunk is loaded and the old one can be garbage collected.
Hopefully this will demonstrate the difference:
Data.List.zip:
  x1 x2 x3 .. x_n
                     x1 x2 x3 .. x_n
Data.Iteratee.zip:
  x1 x2      x3 x4      x_n-1 x_n
       x1 x2      x3 x4            x_n-1 x_n
If you're doing enough work that parallelism makes sense, this isn't a problem at all. Due to memory locality, the performance is much better than that of the multiple traversals over the entire input that Data.List.zip would make.
The beautiful solution
If a single-traversal solution really does make the most sense, you might be interested in Max Rabkin's Beautiful Folding post and Conal Elliott's follow-up work. The essential idea is that you can create data structures to represent folds and zips, and combining these lets you create a new, combined fold/zip function that needs only one traversal. It may be a little advanced for a Haskell beginner, but since you're thinking about the problem you may find it interesting or useful. Max's post is probably the best starting point.
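The core idea can be sketched in a few lines. This is a minimal illustration in the spirit of that post, not code taken from it: Fold, runFold, both and collectWith are names chosen here for the example. The foldl package on Hackage (module Control.Foldl) provides a fuller, production-ready version of the same idea.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A left fold packaged as data: step function, initial state, final extraction.
-- The state type s is hidden, so folds with different states can be combined.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Run a fold over a list in a single pass.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) = done . foldl' step begin

-- Combine two folds into one whose state is a pair,
-- so the input is traversed only once.
both :: Fold a b -> Fold a c -> Fold a (b, c)
both (Fold stepL beginL doneL) (Fold stepR beginR doneR) =
  let step (l, r) x = (stepL l x, stepR r x)
      done (l, r)   = (doneL l, doneR r)
  in Fold step (beginL, beginR) done

-- Collect the Just results of a classifier, preserving input order.
-- The state is a difference list, so appending is cheap.
collectWith :: (a -> Maybe b) -> Fold a [b]
collectWith f = Fold step id ($ [])
  where
    step acc x = case f x of
      Just y  -> acc . (y :)
      Nothing -> acc
```

With the question's classifiers this would be runFold (both (collectWith at) (collectWith ot)) lines, which walks the lines once and returns ([Atom], [Sheet]).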
I show a solution for two types of line, but it is easily extended to five types of line by using a five-tuple instead of a two-tuple.
import Data.Monoid
eachLine :: B.ByteString -> ([Atom], [Sheet])
eachLine bs | isAnAtom bs = ([ {- calculate an Atom -} ], [])
            | isASheet bs = ([], [ {- calculate a Sheet -} ])
            | otherwise = error "eachLine"
allLines :: [B.ByteString] -> ([Atom], [Sheet])
allLines bss = mconcat (map eachLine bss)
The magic is done by mconcat from Data.Monoid (included with GHC).
(On a point of style: personally I would define a Line type, a parseLine :: B.ByteString -> Line function and write eachLine bs = case parseLine bs of .... But this is peripheral to your question.)
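A runnable version of the mconcat approach, assuming for illustration that lines are Strings, that ATOM and SHEET records are recognised by their leading keyword, and that Atom and Sheet are hypothetical wrapper types (the question's real types will differ). Unrecognised lines are ignored here rather than passed to error.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical stand-ins for the question's Atom and Sheet types.
newtype Atom  = Atom String  deriving (Show, Eq)
newtype Sheet = Sheet String deriving (Show, Eq)

-- Each line contributes to at most one of the two lists.
eachLine :: String -> ([Atom], [Sheet])
eachLine s
  | "ATOM"  `isPrefixOf` s = ([Atom s], [])
  | "SHEET" `isPrefixOf` s = ([], [Sheet s])
  | otherwise              = ([], [])  -- ignore other records

-- mconcat uses the Monoid instance for pairs,
-- which appends the two components separately.
allLines :: [String] -> ([Atom], [Sheet])
allLines = mconcat . map eachLine
```

mconcat . map eachLine can equally be written foldMap eachLine.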
It is a good idea to introduce a new ADT, e.g. "Summary" instead of tuples.
Then, since you want to accumulate the values of Summary, you can make it an instance of Monoid. Then you classify each of your lines with the help of classifier functions (e.g. isAtom, isSheet, etc.) and concatenate the results using Monoid's mconcat function (as suggested by @dave4420).
Here is the code (it uses String instead of ByteString, but it is quite easy to change):
module Classifier where
import Data.List
import Data.Monoid
data Summary = Summary
  { atoms :: [String]
  , sheets :: [String]
  , digits :: [String]
  } deriving (Show)
-- Semigroup has been a superclass of Monoid since GHC 8.4 (base 4.11),
-- so the combining operation lives in the Semigroup instance.
instance Semigroup Summary where
  Summary as1 ss1 ds1 <> Summary as2 ss2 ds2 =
    Summary (as1 <> as2) (ss1 <> ss2) (ds1 <> ds2)
instance Monoid Summary where
  mempty = Summary [] [] []
classify :: [String] -> Summary
classify = mconcat . map classifyLine
classifyLine :: String -> Summary
classifyLine line
  | isAtom line = Summary [line] [] [] -- or "mempty { atoms = [line] }"
  | isSheet line = Summary [] [line] []
  | isDigit line = Summary [] [] [line]
  | otherwise = mempty -- or "error" if you need this
isAtom, isSheet, isDigit :: String -> Bool
isAtom = isPrefixOf "atom"
isSheet = isPrefixOf "sheet"
isDigit = isPrefixOf "digits"
input :: [String]
input = ["atom1", "sheet1", "sheet2", "digits1"]
test :: Summary
test = classify input
If you have only 2 alternatives, using Either might be a good idea. In that case combine your functions, map the list, and use lefts and rights to get the results:
import Data.Either
-- first sample function, returning String
f1 x = show $ x `div` 2
-- second sample function, returning Int
f2 x = 3*x+1
-- combined function returning Either String Int
hotpo x = if even x then Left (f1 x) else Right (f2 x)
xs = map hotpo [1..10]
-- [Right 4,Left "1",Right 10,Left "2",Right 16,Left "3",Right 22,Left "4",Right 28,Left "5"]
lefts xs
-- ["1","2","3","4","5"]
rights xs
-- [4,10,16,22,28]
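As a small variation on the lefts/rights approach, Data.Either also provides partitionEithers, which returns both lists in a single call. This sketch reuses the sample functions above with explicit Int types added for clarity; the name results is chosen here for illustration.

```haskell
import Data.Either (partitionEithers)

-- first sample function, returning String
f1 :: Int -> String
f1 x = show (x `div` 2)

-- second sample function, returning Int
f2 :: Int -> Int
f2 x = 3 * x + 1

-- combined function returning Either String Int
hotpo :: Int -> Either String Int
hotpo x = if even x then Left (f1 x) else Right (f2 x)

-- partitionEithers splits the Lefts and the Rights in one call.
results :: ([String], [Int])
results = partitionEithers (map hotpo [1 .. 10])
-- (["1","2","3","4","5"],[4,10,16,22,28])
```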

Lookup tables in OCaml

I would like to create a lookup table in OCaml. The table will have 7000+ entries that, upon lookup (by int), return a string. What is an appropriate data structure to use for this task? Should the table be externalized from the base code and if so, how does one go about "including" the lookup table to be accessible from his/her program?
Thanks.
If the strings are addressed using consecutive integers you could use an array.
Otherwise you can use a hash table (non-functional) or a Map (functional). To get started with the Map try:
module Int =
  struct
    type t = int
    let compare = compare
  end ;;
module IntMap = Map.Make(Int) ;;
If the table is too large to store in memory, you could store it in an external database and use bindings to dbm, bdb, sqlite,...
let table : (int,string) Hashtbl.t = Hashtbl.create 8192
To store the table in a separate file (e.g. as an array), simply create a file strings.ml with the content:
let tbl = [|
  "String 0";
  "String 1";
  "String 2";
  ...7000 more...
|]
Compile this with:
ocamlc -c strings.ml
As explained in the manual, this defines a module Strings that other OCaml modules can reference. For example, you can start a toplevel:
ocaml strings.cmo
And lookup a string by accessing a particular position in the array:
Strings.tbl.(1234) ;;