Replacement / substitution with Haskell regex libraries

Is there a high-level API for doing search-and-replace with regexes in Haskell? In particular, I'm looking at the Text.Regex.TDFA or Text.Regex.Posix packages. I'd really like something of type:
f :: Regex -> (ResultInfo -> m String) -> String -> m String
so, for example, to replace "dog" with "cat" you could write
runIdentity . f "dog" (return . const "cat") -- :: String -> String
or do more advanced things with the monad, like counting occurrences, etc.
Haskell documentation for this is pretty lacking. Some low-level API notes are here.

How about the Text.Regex.subRegex in package regex-compat?
Prelude> import Text.Regex (mkRegex, subRegex)
Prelude> :t mkRegex
mkRegex :: String -> Regex
Prelude> :t subRegex
subRegex :: Regex -> String -> String -> String
Prelude> subRegex (mkRegex "foo") "foobar" "123"
"123bar"

I don't know of any existing function that provides this functionality, but I think that I'd end up using something like the AllMatches [] (MatchOffset, MatchLength) instance of RegexContext to simulate it:
import Control.Monad (foldM)
import Data.List (foldl')
import Text.Regex.Base
replaceAll :: RegexLike r String => r -> (String -> String) -> String -> String
replaceAll re f s = start end
  where
    (_, end, start) = foldl' go (0, s, id) matches
    -- The annotation pins the container to a list (foldl' is Foldable-polymorphic in current base).
    matches = getAllMatches (match re s) :: [(MatchOffset, MatchLength)]
    -- ind: offset consumed so far; read: input still to process; write: output built so far.
    go (ind, read, write) (off, len) =
      let (skip, start) = splitAt (off - ind) read
          (matched, remaining) = splitAt len start
      in (off + len, remaining, write . (skip ++) . (f matched ++))
replaceAllM :: (Monad m, RegexLike r String) => r -> (String -> m String) -> String -> m String
replaceAllM re f s = do
    let go (ind, read, write) (off, len) = do
          let (skip, start) = splitAt (off - ind) read
              (matched, remaining) = splitAt len start
          replacement <- f matched
          return (off + len, remaining, write . (skip ++) . (replacement ++))
    (_, end, start) <- foldM go (0, s, return) matches
    start end
  where
    matches = getAllMatches (match re s) :: [(MatchOffset, MatchLength)]

Based on @rampion's answer, but with the typo fixed (splitAt len start rather than splitAt len matched) so it doesn't just <<loop>>:
replaceAll :: Regex -> (String -> String) -> String -> String
replaceAll re f s = start end
  where
    (_, end, start) = foldl' go (0, s, id) (getAllMatches (match re s) :: [(MatchOffset, MatchLength)])
    go (ind, read, write) (off, len) =
      let (skip, start) = splitAt (off - ind) read
          (matched, remaining) = splitAt len start
      in (off + len, remaining, write . (skip ++) . (f matched ++))
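The same splicing loop also covers the monadic variant the question asks for. The following is only a minimal, library-free sketch: it assumes the (offset, length) pairs have already been obtained (for instance from getAllMatches, as above) and that they are sorted and non-overlapping. The names spliceM and replaceCounting are made up for this illustration and are not part of any regex package.

```haskell
import Control.Monad (foldM)
import Data.Monoid (Sum (..))

-- Hypothetical helper: rebuild the input, replacing each (offset, length)
-- span with the result of a monadic action applied to the matched text.
-- Assumes the spans are sorted by offset and do not overlap.
spliceM :: Monad m => (String -> m String) -> [(Int, Int)] -> String -> m String
spliceM f spans s = do
    (_, rest, build) <- foldM go (0, s, id) spans
    return (build rest)
  where
    -- ind: input position consumed so far; unread: input still to process;
    -- build: difference list holding the output produced so far.
    go (ind, unread, build) (off, len) = do
      let (skip, start)        = splitAt (off - ind) unread
          (matched, remaining) = splitAt len start
      replacement <- f matched
      return (off + len, remaining, build . (skip ++) . (replacement ++))

-- Example: replace every span with "cat" while counting the replacements,
-- using the writer-like Monad instance of pairs from base.
replaceCounting :: [(Int, Int)] -> String -> (Sum Int, String)
replaceCounting = spliceM (\_ -> (Sum 1, "cat"))
```

With the spans [(0,3),(8,3)] and the input "dog and dog", replaceCounting should give (Sum 2, "cat and cat"). Any other monad (State, IO, Maybe) can be plugged in the same way.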

You can use replaceAll from the Data.Text.ICU.Replace module.
Prelude> :set -XOverloadedStrings
Prelude> import Data.Text.ICU.Replace
Prelude Data.Text.ICU.Replace> replaceAll "cat" "dog" "Bailey is a cat, and Max is a cat too."
"Bailey is a dog, and Max is a dog too."

Maybe this approach fits your needs.
import Data.Array (elems)
import Text.Regex.TDFA ((=~), MatchArray)
replaceAll :: String -> String -> String -> String
replaceAll regex new_str str =
    let parts = concat $ map elems $ (str =~ regex :: [MatchArray])
    in foldl (replace' new_str) str (reverse parts)
  where
    -- Replace the span given by (offset, length) with the new text.
    replace' :: [a] -> [a] -> (Int, Int) -> [a]
    replace' new list (shift, l) =
      let (pre, post) = splitAt shift list
      in pre ++ new ++ (drop l post)

For doing “search-and-replace” with “more advanced things with the monad, like counting occurrences, etc,” I recommend Replace.Megaparsec.streamEditT.
See the package README for specific examples of how to count occurrences.

Related

Haskell Text Parser Combinators to parse a Range Greedily like Regex range notation

In regex you can acquire a range of a parse by doing something like \d{1,5}, which parses a digit 1 to 5 times greedily. Or you do \d{1,5}? to make it lazy.
How would you do this in Haskell's Text.ParserCombinators.ReadP?
My attempt gave this:
rangeParse :: Read a => ReadP a -> [Int] -> ReadP [a]
rangeParse parse ranges = foldr1 (<++) $ fmap (\n -> count n $ parse) ranges
Calling it as rangeParse (satisfy isDigit) [5,4..1] performs a greedy parse of 1 to 5 digits, while swapping the sequence to [1..5] gives a lazy parse.
Is there a better or more idiomatic way to do this with parser combinators?
Update: the below is wrong. For example, rangeGreedy 2 4 a <* string "aab", the equivalent of the regexp a{2,4}aab, doesn't match. The questioner's solution gets this right. I won't delete the answer just yet in case it keeps someone else from making the same mistake.
=========
This isn't a complete answer, just a possible way to write the greedy
version. I haven't found a nice way to do the lazy version.
Define a left-biased version of option that returns Maybes:
greedyOption :: ReadP a -> ReadP (Maybe a)
greedyOption p = (Just <$> p) <++ pure Nothing
Then we can do up to n of something with a replicateM of them:
upToGreedy :: Int -> ReadP a -> ReadP [a]
upToGreedy n p = catMaybes <$> replicateM n (greedyOption p)
To allow a minimum count, do the mandatory part separately and append
it:
rangeGreedy :: Int -> Int -> ReadP a -> ReadP [a]
rangeGreedy lo hi p = (++) <$> count lo p <*> upToGreedy (hi - lo) p
The rest of my test code in case it's useful for anyone:
module Main where
import Control.Monad (replicateM)
import Data.Maybe (catMaybes)
import Text.ParserCombinators.ReadP
main :: IO ()
main = mapM_ go ["aaaaa", "aaaab", "aaabb", "aabbb", "abbbb", "bbbbb"]
  where
    go = print . map fst . readP_to_S test
test :: ReadP [String]
test = ((++) <$> rangeGreedy 2 4 a <*> many aOrB) <* eof
  where
    a = char 'a' *> pure "ay"
    aOrB = (char 'a' +++ char 'b') *> pure "ayorbee"
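One way to let whatever follows the range reject a repetition count is to keep every count alive with ReadP's symmetric choice instead of committing to one. This is only a sketch and the names rangeAny and example are made up for it. It gives up the greedy/lazy preference: every count that leads to a successful parse is reported, and the caller picks among the results.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical helper: try every repetition count between lo and hi.
-- 'choice' combines the alternatives with the symmetric (+++), so no count
-- is committed to before the rest of the parser has run.
rangeAny :: Int -> Int -> ReadP a -> ReadP [a]
rangeAny lo hi p = choice [count n p | n <- [lo .. hi]]

-- The equivalent of the regexp a{2,4}aab, anchored at the end of the input.
example :: ReadP String
example = rangeAny 2 4 (char 'a') <* string "aab" <* eof
```

For instance, readP_to_S example "aaaab" should yield [("aa","")], because only the count of two leaves "aab" for the rest of the parser.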

Haskell split string on last occurrence

Is there any way I can split a String in Haskell on the last occurrence of a given character into two lists?
For example I want to split list "a b c d e" on space into ("a b c d", "e").
Thank you for answers.
I'm not sure why the solutions suggested are so complicated. At most two traversals are needed:
splitLast :: Eq a => a -> [a] -> Either [a] ([a],[a])
splitLast c' = foldr go (Left [])
  where
    go c (Right (f,b)) = Right (c:f,b)
    go c (Left s) | c' == c = Right ([],s)
                  | otherwise = Left (c:s)
Note this is total and clearly signifies its failure. When a split is not possible (because the character specified wasn't in the string) it returns a Left with the original list. Otherwise, it returns a Right with the two components.
ghci> splitLast ' ' "hello beautiful world"
Right ("hello beautiful","world")
ghci> splitLast ' ' "nospaceshere!"
Left "nospaceshere!"
It's not beautiful, but it works:
import Data.List
f :: Char -> String -> (String, String)
f char str = let n = findIndex (==char) (reverse str) in
  case n of
    Nothing -> (str, [])
    Just n -> splitAt (length str - n - 1) str
I mean f 'e' "a b c d e" = ("a b c d ", "e"), but I myself wouldn't crop that trailing space.
I would go with more pattern matching.
import Data.List
splitLast = contract . words
  where contract [] = ("", "")
        contract [x] = (x, "")
        contract [x,y] = (x, y)
        contract (x:y:rest) = contract $ intercalate " " [x,y] : rest
For long lists, we just join the first two strings with a space and try the shorter list again. Once the length is reduced to 2, we just return the pair of strings.
(x, "") seemed like a reasonable choice for strings with no whitespace, but I suppose you could return ("", x) instead.
It's not clear that ("", "") is the best choice for empty strings, but it seems like a reasonable alternative to raising an error or changing the return type to something like Maybe (String, String).
I can propose the following solution:
splitLast list elem = (reverse $ snd reversedSplit, reverse $ fst reversedSplit)
  where
    reversedSplit = span (/= elem) $ reverse list
Probably not the fastest one (two needless reverses), but I like its simplicity.
If you would rather keep the separator out of the first part (it then starts the second part instead), you can go for:
import qualified Data.List as List
splitLast list elem = splitAt (last $ List.elemIndices elem list) list
however, this version assumes that there will be at least one element matching the pattern. If you don't like this assumption, the code gets slightly longer (but no double-reversals here):
import qualified Data.List as List
splitLast list elem = splitAt index list where
  index = if null indices then 0 else last indices
  indices = List.elemIndices elem list
Of course, choice of splitting at the beginning is arbitrary and probably splitting at the end would be more intuitive for you - then you can simply replace 0 with length list
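If the separator should be dropped from both parts and the "no match" case made explicit, a variant along the following lines might do. It is only a sketch: the name splitOnLast and the choice of Maybe as the return type are assumptions made for this example.

```haskell
import Data.List (elemIndices)

-- Hypothetical variant: split at the last occurrence of the separator and
-- drop the separator itself; Nothing signals that it does not occur at all.
splitOnLast :: Eq a => a -> [a] -> Maybe ([a], [a])
splitOnLast sep xs = case elemIndices sep xs of
  []      -> Nothing
  indices ->
    let (before, after) = splitAt (last indices) xs
    in Just (before, drop 1 after)  -- 'after' starts with the separator
```

With this, splitOnLast ' ' "a b c d e" should give Just ("a b c d","e"), and a string without a space gives Nothing.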
My idea is to split at every occurrence and then separate the initial parts from the last part.
Pointed:
import Control.Arrow -- (&&&)
import Data.List -- intercalate
import Data.List.Split -- splitOn
breakOnLast :: Eq a => [a] -> [a] -> ([a], [a]) -- the separator is itself a list, as required by splitOn and intercalate
breakOnLast x = (intercalate x . init &&& last) . splitOn x
Point-free:
liftA2 (.) ((&&& last) . (. init) . intercalate) splitOn
(.) <$> ((&&&) <$> ((.) <$> pure init <*> intercalate) <*> pure last) <*> splitOn

string to list of lists of rhyming words

My goal is a function which takes a sentence and returns a list of lists with the words rhyming (rhyming = last 3 chars are equal).
Example: "Six sick hicks nick six slick bricks with picks and sticks." ->
[[Six,six],[sick,nick,slick],[hicks,bricks,picks,sticks],[with]]
This is my code so far (bsort is bubblesort):
rhymeWords:: String -> [[String]]
rhymeWords "" = []
rhymeWords xs = bsort (words (reverse xs))
I do not know how to translate it into code but I would like to take the first three chars of the first string and put them into a list. Then take the next String and test if it is equal to the first. If true put the second string into the first list otherwise create a second list. Then move on to the third string, each time testing with previous lists.
Can anyone please help me?
The following code groups rhymes as requested, although it converts all characters to lower case.
import Data.List (sort)
import Data.Char (toLower)
rhymeWords :: String -> [[String]]
rhymeWords "" = []
rhymeWords xs = [map reverse g | g <- groupRhymes (sortRhymes xs) []]
  where sortRhymes xs = sort $ map reverse (words [toLower x | x <- xs])
-- Walk the sorted, reversed words and append each to the last group if its first three characters match.
groupRhymes :: [String] -> [[String]] -> [[String]]
groupRhymes [] acc = acc
groupRhymes (x:xs) acc = case acc of
    [] -> groupRhymes xs [[x]]
    _  -> if take 3 x == take 3 (head (last acc))
            then groupRhymes xs ((init acc) ++ [(last acc) ++ [x]])
            else groupRhymes xs (acc ++ [[x]])
Example result:
rhymeWords "Six sick hicks nick six slick bricks with picks and sticks"
[["and"],["with"],["slick","nick","sick"],["hicks","picks","bricks","sticks"],["six","six"]]
Note that the example input doesn't have a period at the end of the sentence, because the last word would include it and break the sorting. You'll need to fiddle a bit with presented code if you need to pass sentences with a period.
When you have to group items together, you can use Data.List's grouping higher order functions. With groupBy you can easily solve your problem just by writing your grouping function. In your case, you want to group words that rhyme together. You just have to write the function rhyming:
rhyming :: String -> String -> Bool
rhyming word1 word2 = last3 (lower word1) == last3 (lower word2)
  where
    -- to get the last three characters in order you would apply reverse to the result, but that's unnecessary here
    last3 = take 3 . reverse
    lower = map toLower
So your rhymeWords function can be written like so:
import Data.List (groupBy, sort)
import Data.Char (toLower)
rhyming :: String -> String -> Bool
rhyming word1 word2 = last3 (lowercase word1) == last3 (lowercase word2)
  where
    last3 = take 3 . reverse
    lowercase = map toLower
rhymeWords :: String -> [[String]]
rhymeWords = groupBy rhyming . map reverse . sort . map reverse . words
The map reverse . sort . map reverse step is needed because groupBy only groups elements that are next to one another: sorting the reversed words places words with the same ending side by side.
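As a further sketch of the same sort-then-group idea, the rhyme key can be made explicit and punctuation stripped first, so a trailing period does not end up in the last word. The names rhymeKey and rhymeGroups are made up for this example, and dropping every non-letter character is an assumption about what the input looks like.

```haskell
import Data.Char (isAlpha, isSpace, toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Rhyme key: the last three characters, lower-cased (kept in reverse order,
-- which is fine because keys are only compared with each other).
rhymeKey :: String -> String
rhymeKey = take 3 . reverse . map toLower

-- Drop punctuation, split into words, then sort and group by the key.
-- sortOn is stable, so words keep their input order within a group.
rhymeGroups :: String -> [[String]]
rhymeGroups =
    groupBy ((==) `on` rhymeKey) . sortOn rhymeKey . words . filter keep
  where
    keep c = isAlpha c || isSpace c
```

On the example sentence from the question (including the final period) this should give [["and"],["with"],["sick","nick","slick"],["hicks","bricks","picks","sticks"],["Six","six"]]. The original capitalisation is preserved.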

Haskell - Printing a list of tuples

I have a list of tuples in the form:
[(String, Int)]
How can I print this to display like:
String : Int
String : Int
String : Int
...
I am very new to Haskell so please make it as clear as possible. Thank you!
Update: Here's how the code of my program now looks:
main = do
  putStrLn "********* Haskell word frequency counter *********"
  putStrLn ""
  conts <- readFile "text.txt"
  let lowConts = map toLower conts
  let counted = countAllWords (lowConts)
  let sorted = sortTuples (counted)
  let reversed = reverse sorted
  putStrLn "Word : Count"
  mapM_ (printTuple) reversed
-- Counts all the words.
countAllWords :: String -> [(String, Int)]
countAllWords fileContents = wordsCount (toWords (noPunc fileContents))
-- Splits words and removes linking words.
toWords :: String -> [String]
toWords s = filter (\w -> w `notElem` ["and","the","for"]) (words s)
-- Remove punctuation from text String.
noPunc :: String -> String
noPunc xs = [ x | x <- xs, not (x `elem` ",.?!-:;\"\'") ]
-- Counts, how often each string in the given list appears.
wordsCount :: [String] -> [(String, Int)]
wordsCount xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
-- Sort list in order of occurrences.
sortTuples :: [(String, Int)] -> [(String, Int)]
sortTuples sort = sortBy (comparing snd) sort
printTuple :: Show a => [(String, a)] -> IO ()
printTuple xs = forM_ xs (putStrLn . formatOne)
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
It returns this error to me:
fileToText.hs:18:28:
Couldn't match type ‘(String, Int)’ with ‘[(String, a0)]’
Expected type: [[(String, a0)]]
Actual type: [(String, Int)]
In the second argument of ‘mapM_’, namely ‘reversed’
In a stmt of a 'do' block: mapM_ (printTuple) reversed
Thanks for any help!
Let's start by formatting one item:
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
Now you can use this function, for example with forM_ from Control.Monad, to print each item to the screen (forM_ because we are in the IO monad, since we are going to use putStrLn):
Prelude> let test = [("X1",4), ("X2",5)]
Prelude> import Control.Monad (forM_)
Prelude Control.Monad> forM_ test (putStrLn . formatOne)
X1 : 4
X2 : 5
In a file you would use it like this:
import Control.Monad (forM_)
printTuples :: Show a => [(String, a)] -> IO ()
printTuples xs = forM_ xs (putStrLn . formatOne)
formatOne :: Show a => (String, a) -> String
formatOne (s,i) = s ++ " : " ++ show i
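If you want to check the formatting without doing any IO, the same idea can be split into a pure function that builds the whole text. This is just a sketch, and formatTuples is a name made up for this example.

```haskell
-- Format one pair as "name : value".
formatOne :: Show a => (String, a) -> String
formatOne (s, i) = s ++ " : " ++ show i

-- Hypothetical pure counterpart of printTuples: one formatted line per pair,
-- each terminated by a newline.
formatTuples :: Show a => [(String, a)] -> String
formatTuples = unlines . map formatOne
```

Printing is then a single putStr (formatTuples xs), and the String result can be inspected directly in GHCi.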
Putting it together, here is a version of your whole file that will at least compile (I cannot test it without the text file ;) ):
import Control.Monad (forM_)
import Data.Char (toLower)
import Data.List (sort, sortBy, group)
import Data.Ord (comparing)
main :: IO ()
main = do
  putStrLn "********* Haskell word frequency counter *********"
  putStrLn ""
  conts <- readFile "text.txt"
  let lowConts = map toLower conts
  let counted = countAllWords lowConts
  let sorted = sortTuples counted
  let reversed = reverse sorted
  putStrLn "Word : Count"
  printTuples reversed
-- Counts all the words.
countAllWords :: String -> [(String, Int)]
countAllWords = wordsCount . toWords . noPunc
-- Splits words and removes linking words.
toWords :: String -> [String]
toWords = filter (\w -> w `notElem` ["and","the","for"]) . words
-- Remove punctuation from text String.
noPunc :: String -> String
noPunc xs = [ x | x <- xs, x `notElem` ",.?!-:;\"\'" ]
-- Counts, how often each string in the given list appears.
wordsCount :: [String] -> [(String, Int)]
wordsCount = map (\xs -> (head xs, length xs)) . group . sort
-- Sort list in order of occurrences.
sortTuples :: [(String, Int)] -> [(String, Int)]
sortTuples = sortBy $ comparing snd
-- print one tuple per line separated by " : "
printTuples :: Show a => [(String, a)] -> IO ()
printTuples = mapM_ (putStrLn . formatTuple)
  where formatTuple (s,i) = s ++ " : " ++ show i
I also removed the compiler warnings and HLINTed it (but skipped the Control.Arrow stuff - I don't think head &&& length is more readable option here)

Haskell list character frequencies

I am having trouble with an assignment question!
Write the function
freq2 :: String -> [(Int,[Char])]
Like freq, the function freq2 counts frequency of occurrence of alphabetic characters.
Given the string:
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness
I need to end up with:
[(1,"qv"), (2,"gm"), (3,"cfpwy"), (4,"b"), (5,"u"), (6,"do"),(8,"s"), (9,"ln"), (10,"i"), (12,"r"), (13,"h"), (16,"a"),(22,"t"), (28,"e")]
So far I can get to:
[('q',1),('v',1),('g',2),('m',2),('c',3),('f',3),('p',3),('w',3),('y',3),('b',4),('u',5),('d',6),('o',6),('s',8),('l',9),('n',9),('i',10),('r',12),('h',13),('a',16),('t',22),('e',28)]
Using:
freq2 :: String -> [(Char,Int)]
freq2 input = result2
  where
    lower_case_list = L.map C.toLower input
    filtered_list = L.filter C.isAlpha lower_case_list
    result = L.map (\a -> (L.head a, L.length a)) $ L.group $ sort filtered_list
    result2 = sortBy (compare `on` snd) result
Is there an easy way to get to the last stage or to do the whole thing, possibly using library functions? Or can you please provide some direction on how to finish off this question?
Thanks
Something like this appended to your solution should work:
result3 = map (\xs@((_,x):_) -> (x, map fst xs)) $ L.groupBy ((==) `on` snd) result2
My preference would be to use a Map for these types of problems though:
import qualified Data.Map as Map
import qualified Data.Char as C
import qualified Data.Tuple as T
string = filter C.isAlpha $ map C.toLower "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness"
swapMapWith f = Map.fromListWith f . map T.swap . Map.toList
freq2 :: String -> [(Int, String)]
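-- fromListWith calls f new old, so flip (++) keeps each group's characters in ascending order.
freq2 = Map.toList . swapMapWith (flip (++)) . foldl (\agg c -> Map.insertWith (+) [c] 1 agg) Map.empty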
Method 1:
First, import the needed modules:
import Data.Char
import Data.List
1. Filter out the characters we are not interested in and convert the rest to lower case:
toLowerAlpha :: String -> String
toLowerAlpha = map toLower . filter isAlpha
2. Sort first, then group; the length of each group is then the frequency of the character in that group:
elemFreq :: (Ord a) => [a] -> [(Int, a)]
elemFreq = map (\l -> (length l, head l)) . group . sort
3. Sort and group as in step 2, but by frequency this time, then combine all the characters that have the same frequency:
groupByFreq :: (Integral a, Ord b) => [(a, b)] -> [[(a, b)]]
groupByFreq = groupBy (onFreq (==)) . sortBy (onFreq compare)
  where onFreq op (f1,_) (f2,_) = op f1 f2
collectByFreq :: (Integral a) => [[(a, b)]] -> [(a, [b])]
collectByFreq = map (\ls -> (fst . head $ ls, map snd ls))
4. Composing the above functions gives the required function:
freq2 :: String -> [(Int, String)]
freq2 = collectByFreq . groupByFreq . elemFreq . toLowerAlpha
Method 2:
First, import the needed modules:
import qualified Data.Char as Char
import qualified Data.Map as Map
1. Filter out the characters we are not interested in and convert the rest to lower case:
toLowerAlpha :: String -> String
toLowerAlpha = map Char.toLower . filter Char.isAlpha
2. Create a map whose keys are the characters and whose values are the corresponding frequencies:
toFreqMap :: (Ord a, Num b) => [a] -> Map.Map a b
toFreqMap = foldr (\c -> Map.insertWith (+) c 1) Map.empty
3. Convert the map created in step 2 into another map that uses the frequency as key and the characters with that frequency as value:
toFreqCol :: (Ord a, Ord b) => Map.Map a b -> Map.Map b [a]
toFreqCol = Map.foldrWithKey (\k a m -> Map.insertWith (++) a [k] m) Map.empty
4. Composing the above functions gives the required function:
freq2 :: String -> [(Int, String)]
freq2 = Map.toAscList . toFreqCol . toFreqMap . toLowerAlpha