Haskell :: Parsing Line of Text into a List of List - list

Hi guys,
I'm parsing a text file that looks like this:
114.474998474121 15.7440004348755 25.806999206543 -873 172 182 188
114.46199798584 15.7419996261597 25.8799991607666 -1396 180 192 205
And I would like it to be read like this:
[[114.475,15.744,25.807,-873.0,172.0,182.0,188.0],
[114.462,15.742,25.88,-1396.0,180.0,192.0,205.0]]
Currently my code for this text parsing doesn't give that. Here is my code:
main = do
    text <- readFile "mytext.txt"
    let pcVal = map read (words text) :: [Float]
    print pcVal
    return ()
This code parsed all the text as a single list like this:
[114.475,15.744,25.807,-873.0,172.0,182.0,188.0,
114.462,15.742,25.88,-1396.0,180.0,192.0,205.0]
I couldn't find out how to read each line of the text file as its own list, with the second line as another list, and so on until the end of the file. I'd appreciate it if somebody with experience in this could help. Thanks.

You can use the lines function; for example map words $ lines text.
This would also be a good time for a helper function, i.e.
let parse :: String -> [Float]
    parse line = map read $ words line
    pcVal = map parse $ lines text

Thanks Dan for suggesting lines as the solution. Here is the code that works fine with my question:
main = do
    text <- readFile "mytext.txt"
    let readparse txtLines = map read txtLines :: [Float]
        parse txtLines = map words (lines txtLines)
        pcVal = map readparse (parse text)
    print pcVal
    return ()
Hope it helps others too. Thanks and cheers!

Related

How to differentiate between space and tab while performing file operations in ocaml

I have a dumped(.rem) file with 3 entries per line, separated by tabs - "\t" as shown below.
Hello World Ocaml
I like Ocaml
To read from this file, the type is passed in a cast(attrbs) along with the file like this:
type attrbs = list (string * string * string);
let chi = (open_in file : attrbs) in
let v = input_value chi in close_in chi
Now, I get a list in "v", which I use further. In fact, it also works if the entries are separated by space.
This works fine if all the 3 entries in a row do not contain any spaces within themselves. I would like to use another file which has the first entry as a string with spaces, second entry as a string without spaces, and third entry as any string as shown below:
This is with spaces Thisiswithoutspaces Thisissomestring
Another one with spaces Anotheronewithoutspaces AnotherString
If I use the code mentioned, since it does not differentiate between space and tab, it takes only the first three words - "This", "is", and "with". I want it to include the spaces and consider "This is with spaces" as an entire string.
I tried searching the web, but couldn't find any solution for it.
Update:
The issue was with the way I read them. If I use specific formats like "%s %s %s", they will work only if we add the # character like "%s#\t%s#\t%s". It is given under the title: "Scanning indications in format strings" in https://caml.inria.fr/pub/docs/manual-ocaml/libref/Scanf.html. The issue is solved.
Glad you managed to do this yourself.
However, I wouldn't recommend using Scanf for that. You can do this:
match String.split_on_char '\t' (input_line chi) with
| [a;b;c] -> ...
| exception End_of_file -> ...
| l_wrong_size -> ...
This way, you are not only sure to not rely on the quirky behavior of Scanf, but you can also easily specify what to do on malformed input.

Replace the whole string if it contains specific letters/character

I have a text file (myFile.txt) that contains multiple lines, for example:
The hotdog
The goal
The goat
What I want to do is the following:
If any word/string in the file contains the characters 'go' then, replace it with a brand new word/string ("boat"), so the output would look like this:
The hotdog
The boat
The boat
How can I accomplish this in Python 2.7?
It sounds like you want something like this:
with open('myFile.txt', 'r+') as word_bank:
    new_lines = []
    for line in word_bank:
        new_line = []
        # Split the line into words and swap out any word containing 'go'
        for word in line.strip().split():
            if 'go' in word:
                new_line.append('boat')
            else:
                new_line.append(word)
        new_lines.append('%s\n' % ' '.join(new_line))
    # Erase the old contents, rewind, and write the rebuilt lines
    word_bank.truncate(0)
    word_bank.seek(0)
    word_bank.writelines(new_lines)
Open the file for reading and writing, iterate over it, split each line into its component words, and replace any word that contains 'go'. Collect the results in a separate list, because you do not want to modify something while you are iterating over it. You will have a bad time. Once the list is built, truncate the file (erase it), seek back to the start, and write what you came up with. Notice that I stick an explicit '\n' on the end of each line, because writelines will not do that for you.
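The word replacement can also be kept separate from the file handling, which makes it easier to check on its own. This is only a minimal sketch: replace_words is a hypothetical helper name that does not appear in the answer above, and it assumes words are separated by whitespace (runs of whitespace collapse to a single space in the output). It should run on both Python 2.7 and Python 3.

```python
def replace_words(lines, needle, replacement):
    """Return new lines where every word containing `needle` becomes `replacement`.

    `replace_words` is a hypothetical helper name used for illustration only.
    """
    result = []
    for line in lines:
        # split() with no arguments splits on any run of whitespace
        words = [replacement if needle in word else word for word in line.split()]
        # writelines() does not add line endings, so add one here
        result.append(' '.join(words) + '\n')
    return result


sample = ['The hotdog\n', 'The goal\n', 'The goat\n']
print(''.join(replace_words(sample, 'go', 'boat')))
```

The returned list could then be passed to writelines after truncating the file, as in the answer.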

Change values in Python file (tab-delimited list)

I have read a *.INP file into Python. Here is the code I used:
import csv
r = csv.reader(open('T_JAC.INP')) # Here your csv file
lines = [l for l in r]
print lines[23]
print lines[26]
The first print statement produces ['7E21\t\texthere (text) text alphabets text alphanumeric'].
The second print statement produces ['4E15\t\texthere (text) text alphabets text alphanumeric'].
I need to change the numbers 7E21 and 4E15 to the values from a list fil_replace = [9E21, 6E15], i.e. I need to replace 7E21 with 9E21 and 4E15 with 6E15.
Is there a way to replace these numbers?
Something with str.replace should work (as long as you read r in as a string), albeit not the most efficient solution:
# str.replace returns a new string, so the result has to be assigned back
r = r.replace('7E21', '9E21')
file = open('YAC.IN', 'w')
file.write(r)
file.close()
If you're looking for a way to just replace the values 'in place' in the file, unfortunately that's not possible. The entire file has to be read in, modified, then re-written.
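If only the leading value on each line should change, a blanket str.replace may be broader than needed, since it would also touch the same text elsewhere in the file. As a rough sketch under assumptions: replace_first_field is a hypothetical helper name, the sample lines are made up to resemble the question's data, and the value is assumed to be the first tab-delimited field on its line.

```python
def replace_first_field(lines, replacements):
    """Swap the first tab-delimited field of each line if it is a key in `replacements`.

    `replace_first_field` is a hypothetical helper name used for illustration only.
    """
    updated = []
    for line in lines:
        # partition splits at the first tab only and leaves the rest untouched
        head, sep, rest = line.partition('\t')
        updated.append(replacements.get(head, head) + sep + rest)
    return updated


# Made-up sample lines modeled on the question's data
sample = ['7E21\t\ttext here\n', '4E15\t\ttext here\n', 'unchanged line\n']
mapping = {'7E21': '9E21', '4E15': '6E15'}
print(replace_first_field(sample, mapping))
```

The result would still need to be written back to a file in full, as described above.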

Reading specific characters from a file in Python

Suppose I want to read file in this format:
2
300 234 2 3
23444
If I use readline() it iterates over the entire line. What I want is for it to read only the numbers nothing else. How should I do this??
You can use the re module.
import re
# yourFile is an already opened file object
numbers = re.findall('[0-9]+', yourFile.readline())
It will return all the numbers on that line as a list of strings.
Use readline() to get the entire line as a string, then split the string using split(), which will return a list of strings (in your case, numbers) in the line.
Example:
line = yourFile.readline()
numList = line.split()
Now numList contains the numbers that were on that line.
Source: https://docs.python.org/2/library/stdtypes.html#str.split
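Both approaches return strings, so a conversion step is needed if actual integers are wanted. A small sketch comparing the two, assuming the input holds only non-negative integers as in the example; the function names numbers_by_split and numbers_by_regex are hypothetical and chosen for illustration.

```python
import re


def numbers_by_split(line):
    """Split on whitespace and convert each token; raises ValueError on non-numeric tokens."""
    return [int(token) for token in line.split()]


def numbers_by_regex(line):
    """Pick out runs of digits and ignore every other character."""
    return [int(match) for match in re.findall(r'[0-9]+', line)]


# Sample text in the format shown in the question
text = '2\n300 234 2 3\n23444\n'
for line in text.splitlines():
    print(numbers_by_split(line))
```

The split version is stricter, since it fails loudly on unexpected tokens, while the regex version silently skips anything that is not a digit.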

Haskell - syntax in do blocks (using IO)

The compiler says
The last statement in a 'do' construct must be an expression:
rmax <- getInteger
when attempting to load a file containing the following snippets of code:
getInteger :: IO Integer
getInteger = readLn
main :: IO ()
main = do
    putStrLn "specify upper limit of results"
    rmax <- getInteger
    if rmax `notElem` mot
        then do putStrLn "run again and enter a multiple of 10"
        else do print pAllSorted
What does it (the compiler message) mean, and why does it occur here? (whereas it doesn't in:)
main = do
    line <- getLine
    if null line
        then return ()
        else do
            putStrLn $ reverseWords line
            main
reverseWords :: String -> String
reverseWords = unwords . map reverse . words
(above example taken from http://learnyouahaskell.com/input-and-output)
Your indentation is probably messed up because of mixed tabs and spaces. In fact, there appears to be a stray tab in the code snippet in your question, which I'm assuming you pasted directly from your source file.
Most likely, GHC is interpreting the tabs differently from how your editor displays them, so it thinks the do block ends after the line in question.
As a rule of thumb, it's best to use only spaces in Haskell. The language defines very specific rules for interpreting tabs that most code editors don't agree with, but spaces are unambiguous and consistent.