Splitting a string at every 2 newline characters in haskell [duplicate] - regex

My input looks like
abc

a
b
c

abc
abc
abc
abc
I need a function that would split it into something like
[ "abc"
, "a\nb\nc"
, "abc\nabc\nabc\nabc"]
I've tried using regexes, but I can't import Text.Regex itself, and the module Text.Regex.Base does not export splitStr.

It's generally a bad idea to use regex in such cases, since it is less readable than the pure, concise code that can be used here.
For example, using foldr, the only case where we should start a new string in the list of strings is when the last seen element and the current element are both newlines:
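split :: FilePath -> IO [String]
split path = do
  text <- readFile path
  -- init drops the last character (the file's trailing newline); it fails on an empty file.
  return $ foldr build [[]] (init text)
  where
    -- Start a new string only when the current character and the
    -- last one seen are both newlines.
    build c (s:ls)
      | s == [] || head s /= '\n' || c /= '\n' = (c:s) : ls
      | otherwise = [] : tail s : ls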
This code produces the aforementioned result when it is given a file with the aforementioned content.
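For comparison, here is a minimal pure sketch of the same idea built on lines instead of foldr. It is not the answer's method: the name splitOnBlank is made up for illustration, and it assumes paragraphs are separated by exactly one blank line (several blank lines in a row would yield empty strings in the result).

```haskell
import Data.List (intercalate)

-- | Split text into paragraphs separated by a blank line.
-- splitOnBlank is a hypothetical name, not a library function.
splitOnBlank :: String -> [String]
splitOnBlank = map (intercalate "\n") . go . lines
  where
    go [] = []
    go ls = case break null ls of
      -- No blank line left: the remaining lines form the last paragraph.
      (para, [])       -> [para]
      -- Drop the blank separator line and continue with the rest.
      (para, _ : rest) -> para : go rest
```

Because it works on a String rather than a FilePath, it can be tried in GHCi without touching the file system, for example on "abc\n\na\nb\nc\n\nabc\nabc\nabc\nabc\n".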

Related

Combining string modification and concatenation in haskell

My problem is to take a string in Haskell and replace certain characters with other characters. I have written a helper function for this, but there is one special case: if the character is '!', it should become "!!!111oneone". I figure that to do this I would need to concatenate the current string with "!!!111oneone". The trouble is that my helper works on Chars, whereas this case needs a String. How would you combine the two, i.e. a helper that modifies characters where necessary together with the expansion of '!'?
Here is what I have so far:
convert :: String -> String
convert [] = []
convert (x:xs) =
| x == '!' = !helper
| otherwise = converthelper x
Assuming your helper is something like
helper :: Char -> String
helper '!' = "!!!111oneone"
helper c = [c]
then you can use concatMap to map helper over each character in your string, and then concatenate the results into a single string.
convert :: String -> String
convert = concatMap helper
-- convert msg = concatMap helper msg
The trick is that your helper promotes every character to a list of characters; most characters just become the corresponding one-character string, but ! becomes something more.
(Note that concatMap forms the basis of the Monad instance for lists. You could also write convert msg = msg >>= helper.)
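Put together, the two spellings can be sketched as below. The 'e' -> "3" rule is a made-up substitution added only to show an ordinary character replacement next to the '!' case, and convertBind is a hypothetical name for the >>= variant.

```haskell
-- Promote every character to a String.
helper :: Char -> String
helper '!' = "!!!111oneone"
helper 'e' = "3"   -- hypothetical extra rule, for illustration only
helper c   = [c]   -- all other characters become a one-character string

-- Map helper over the input and concatenate the pieces.
convert :: String -> String
convert = concatMap helper

-- The same function written with the list monad's bind.
convertBind :: String -> String
convertBind msg = msg >>= helper
```

Both functions should agree on every input, since >>= for lists is concatMap with its arguments flipped.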

Regex statement to replace all underscore with space [duplicate]

I can't use methods like Replace so I need a Regex statement that will replace underscores and add a space instead.
I thought that /([^_])/ would at least return the string without the underscore but it only returns certain strings with the first character.
The sample string x is:
val x = "this_string_contains_Underscore_characters."
Split the string on underscores and join the pieces with spaces:
x.split("_").mkString(" ")
Or use replaceAll:
x.replaceAll("_", " ")
In Scala REPL:
scala> val x = "this_string_contains_Underscore_characters."
x: String = this_string_contains_Underscore_characters.
scala> x.split("_").mkString(" ")
res28: String = this string contains Underscore characters.
scala> x.replaceAll("_", " ")
res50: String = this string contains Underscore characters.

Finding permutations using regular expressions

I need to create a regular expression (for a program in Haskell) that will match strings consisting of "X" and ".", where there are exactly four "X" and exactly one ".". It must not match strings with any other combination of X and dot.
I have thought about something like
[X\.]{5}
But it also matches "XXXXX" or ".....", so it isn't what I need.
That's called permutation parsing, and while "pure" regular expressions can't parse permutations it's possible if your regex engine supports lookahead. (See this answer for an example.)
However I find the regex in the linked answer difficult to understand. It's cleaner in my opinion to use a library designed for permutation parsing, such as megaparsec.
You use the Text.Megaparsec.Perm module by building a PermParser in a quasi-Applicative style using the <||> operator, then converting it into a regular MonadParsec action using makePermParser.
So here's a parser which recognises any combination of four Xs and one .:
import Control.Applicative
import Data.Ord
import Data.List
import Text.Megaparsec
import Text.Megaparsec.Perm
fourXoneDot :: Parsec Dec String String
fourXoneDot = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where
    mkFive a b c d e = [a, b, c, d, e]
    x = char 'X'
    dot = char '.'
I'm applying the mkFive function, which just stuffs its arguments into a five-element list, to four instances of the x parser and one dot, combined with <||>.
ghci> parse fourXoneDot "" "XXXX."
Right "XXXX."
ghci> parse fourXoneDot "" "XX.XX"
Right "XXXX."
ghci> parse fourXoneDot "" "XX.X"
Left {- ... -}
This parser always returns "XXXX." because that's the order I combined the parsers in: I'm mapping mkFive over the five parsers and it doesn't reorder its arguments. If you want the permutation parser to return its input string exactly, the trick is to track the current position within the component parsers, and then sort the output.
fourXoneDotSorted :: Parsec Dec String String
fourXoneDotSorted = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where
    mkFive a b c d e = map snd $ sortBy (comparing fst) [a, b, c, d, e]
    x = withPos (char 'X')
    dot = withPos (char '.')
    withPos = liftA2 (,) getPosition
ghci> parse fourXoneDotSorted "" "XX.XX"
Right "XX.XX"
As the megaparsec docs note, the implementation of the Text.Megaparsec.Perm module is based on Parsing Permutation Phrases; the idea is described in detail in the paper and the accompanying slides.
The other answers look quite complicated to me, given that there are only five strings in this language. Here's a perfectly fine and very readable regex for this:
\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.
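If no regex engine is at hand, the same five-string language can also be checked with a plain predicate. This is only a sketch, not something the answers above propose: isFourXOneDot is a made-up name, and it assumes the whole string must match rather than a substring.

```haskell
import Data.List (sort)

-- | True exactly for the five strings made of four 'X' and one '.'.
-- '.' sorts before 'X', so every valid string sorts to ".XXXX";
-- the comparison also fixes the length at five.
isFourXOneDot :: String -> Bool
isFourXOneDot s = sort s == ".XXXX"
```

Sorting throws away the position of the dot, which is fine here because every position is allowed.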
Are you attached to regex, or did you just end up at regex because this was a question you didn't want to try answering with applicative parsers?
Here's the simplest possible attoparsec implementation I can think of:
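-- Assumes ByteString input; use Data.Attoparsec.Text for Text input.
import Data.Attoparsec.ByteString.Char8 (Parser, count, inClass, satisfy)
import Data.List (sort)

parseDotXs :: Parser ()
parseDotXs = do
  -- Read exactly five characters, each either '.' or 'X'.
  dotXs <- count 5 (satisfy (inClass ".X"))
  -- '.' sorts before 'X', so span separates the dots from the Xs.
  let (dots, xS) = span (== '.') . sort $ dotXs
  if length dots == 1 && length xS == 4
    then return ()
    else fail "Mismatch between dots and Xs"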
You may need to adjust slightly depending on your input type.
There are tons of fancy ways to do stuff in applicative parsing land, but there is no rule saying you can't just do things the rock-stupid simple way.
Try the following regex:
(?<=^| )(?=[^. ]*\.)(?=(?:[^X ]*X){4}).{5}(?=$| )
Demo here
If you have one word per string, you can simplify the regex to this one:
^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$
Demo here

haskell read a file and convert it map of list

The input file is a text file:
000011S\n
0001110\n
001G111\n
0001000\n
The desired result is:
[["0","0","0","0","1","1","S"], ["0","0","0","1","1","1","0"], [...]]
I read the text file with
file <- openFile nameFile ReadMode
and the final output, for example
[["a","1","0","b"],["d","o","t","2"]]
should be a list of lists of characters.
I tried:
convert x = map (map read . words) $ lines x
but it returns [[String]].
How can I get the output I want, [[Char]]?
Is there an equivalent of words that splits into characters?
One solution
convert :: String -> [[String]]
convert = map (map return) . lines
should do the trick.
Remark
The return here is a neat way to write \c -> [c]: it wraps a Char into a singleton list, since lists are a monad.
How it works
Let me try to explain this:
lines splits the input into lines, giving a [String] in which each element is one line
the outer map (...) . lines then applies the function in (...) to each of these lines
the function inside, map return, maps over each character of a line (remember: a String is just a list of Char) and so applies return to each of these characters
return here just takes a character and puts it into a singleton list: 'a' -> ['a'] = "a", which is exactly what you wanted
Your example
Prelude> convert "000011S\n0001110\n001G111\n0001000\n"
[["0","0","0","0","1","1","S"]
,["0","0","0","1","1","1","0"]
,["0","0","1","G","1","1","1"]
,["0","0","0","1","0","0","0"]]
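The same conversion can be sketched without relying on the list monad by using the section (: []), which may read more directly as "wrap in a list". The name flatten is an assumption for illustration; it goes back from the nested form to one String per line.

```haskell
-- | Split the input into lines, then wrap each character
-- of every line in its own one-character String.
convert :: String -> [[String]]
convert = map (map (: [])) . lines

-- | Hypothetical inverse step: join the one-character strings
-- of each row back into a single String per line.
flatten :: [[String]] -> [String]
flatten = map concat
```

For instance, convert "a10b\ndot2\n" gives the nested form from the question, and flatten turns it back into one string per line.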
Concerning your comment
If you expect convert :: String -> [[Char]] (which is just String -> [String]), then all you need is convert = lines!
[[Char]] == [String]
Prelude> map (map head) [["a","1","0","b"],["d","o","t","2"]]
["a10b","dot2"]
This will fail for empty Strings, though.
Alternatively, use map concat [[...]], which also handles empty strings.

Haskell - Concat a list of strings

I'm trying to create a list of strings using some recursion.
Basically, I want to take a part of a string up to a certain point, create a list from that, and then process the rest of the string through recursion.
type DocName = FilePath
type Line = (Int,String)
type Document = [Line]
splitLines :: String -> Document
splitLines [] = []
splitLines str | length str == 0 = []
               | otherwise = zip [0..(length listStr)] listStr
  where
    listStr = [getLine] ++ splitLines getRest
    getLine = (takeWhile (/='\n') str)
    getRest = (dropWhile (=='\n') (dropWhile (/='\n') str))
That's what I have. But it just concats the strings back together, since they are lists of characters themselves. I want to create a list of strings instead:
["test","123"] if the input was "test\n123\n"
Thanks
If you try to compile your code, you'll get an error message telling you that in the line
listStr = [getLine] ++ splitLines getRest
splitLines getRest has type Document, but it should have type [String]. This is easy enough to understand, since [getLine] is a list of strings (well a list of one string) and so it can only be concatenated with another list of strings, not a list of int-string-tuples.
So to fix this we can use map to replace each int-string-tuple in the Document with only the string to get a list of strings, i.e.:
listStr = [getLine] ++ map snd (splitLines getRest)
After changing the line to the above your code will compile and run just fine.
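For reference, the whole function with that one-line fix applied might look like the sketch below. The helpers are renamed to firstLine and rest (my choice, to avoid shadowing Prelude's getLine), and the redundant length guard is dropped because the empty case is already handled by the first equation.

```haskell
type Line = (Int, String)
type Document = [Line]

splitLines :: String -> Document
splitLines [] = []
splitLines str = zip [0 ..] listStr
  where
    -- Strip the indices from the recursive result so that both
    -- sides of the cons are plain strings.
    listStr   = firstLine : map snd (splitLines rest)
    -- Everything up to the first newline.
    firstLine = takeWhile (/= '\n') str
    -- Skip the first line and any newlines that follow it.
    rest      = dropWhile (== '\n') (dropWhile (/= '\n') str)
```

The indices computed in the recursive call are thrown away and recomputed at each level, which is wasteful but harmless for small inputs.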
"But it just concats the strings back together since they are list of characters themselves."
I'm not sure why you think that.
The reason your code did not compile was because of the type of splitLines as I explained above. Once you fix that error, the code behaves exactly as you want it to, returning a list of integer-string-tuples. At no point are strings concatenated.
Well, if you wrote this just to practice recursion, then it is fine once you fix the error mentioned by sepp2k. But in real code, I would prefer:
splitLines str = zip [0..] (lines str)
Or even
splitLines = zip [0..] . lines
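As a self-contained sketch with the question's type synonyms, the point-free version can be tried directly. One difference worth knowing: lines keeps blank lines as empty strings, whereas the recursive version in the question skips runs of newlines. The second function, splitNonEmptyLines, is a hypothetical variant that roughly mirrors that skipping behaviour.

```haskell
type Line = (Int, String)
type Document = [Line]

-- | Number every line of the input, starting from 0.
splitLines :: String -> Document
splitLines = zip [0 ..] . lines

-- | Hypothetical variant: drop blank lines before numbering.
splitNonEmptyLines :: String -> Document
splitNonEmptyLines = zip [0 ..] . filter (not . null) . lines
```

Because zip stops at the shorter list, the infinite index list [0 ..] needs no explicit upper bound.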