Split a [Char] by each character in Haskell

I'm new to Haskell and I would like some help with how to split a sentence into separate characters.
Example: "Test sentence" == ["T","e","s","t"," ","s","e","n","t","e","n","c","e"]
I checked everywhere but can't find a solution that doesn't require importing separate modules and stuff.
Many thanks in advance!

In Haskell a String is a type synonym for [Char].
If you really want to turn a [Char] into a list of one-character Strings (you probably don't):
charStrs :: String -> [String]
charStrs = fmap pure
charStrs "hello world" -- ["h","e","l","l","o"," ","w","o","r","l","d"]
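fmap pure works here because, for lists, pure wraps a value in a one-element list. As a rough sketch, here are two equivalent spellings that use only the Prelude (the name charStrs' is made up for this illustration):

```haskell
-- | Wrap each character in its own one-element String.
charStrs :: String -> [String]
charStrs = map (: [])

-- | The same function written as a list comprehension.
-- (charStrs' is a hypothetical name used only in this sketch.)
charStrs' :: String -> [String]
charStrs' s = [[c] | c <- s]
```

Both need no imports, which seems to be what the question was after.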

Related

How can I use this regex in Haskell?

I'm trying to make a simple Haskell program that will take any line that looks like someFilenameHere0035.xml and returns 0035. My sample input file, input.txt, would look like this:
someFilenameHere0035.xml
anotherFilenameHere4465.xml
And running: cat input.txt | runhaskell getID.hs should return:
0035
4465
I'm having so much difficulty figuring this out. Here's what I have so far:
import Text.Regex.PCRE
getID :: String -> [String]
getID str = str =~ "([0-9]+)\\.xml" :: [String]
main :: IO ()
main = interact $ unlines . getID
But I get an error message I don't understand at all:
• No instance for (RegexContext Regex String [String])
arising from a use of ‘=~’
• In the expression: str =~ "([0-9]+)\\.xml" :: [String]
In an equation for ‘getID’:
getID str = str =~ "([0-9]+)\\.xml" :: [String] (haskell-stack-ghc)
I feel like I'm really close, but I don't know where to go from here. What am I doing wrong?
First off you only want the number part so we can get rid of the \\.xml.
The regex-pcre library defines an instance for RegexContext Regex String String but not RegexContext Regex String [String] hence the error.
So if we change the type signature to String -> String then that error is taken care of.
unlines expects [String] so to test what we had at this point I wrote a quick function that wraps its argument in a list (there's probably a nicer way to do that but that's not the point of the question):
toList :: a -> [a]
toList a = [a]
Running your command with main = interact $ unlines . toList . getID output 0035, so we're almost there.
getID is passed the whole file contents as a single String, with the file names conveniently separated by the \n character. So we can use splitOn "\n" from the Data.List.Split module (in the split package) to get our list of .xml file names.
Then we simply need to map getID over that list (toList is no longer needed).
This gives us:
import Text.Regex.PCRE
import Data.List.Split
getID :: String -> String
getID str = str =~ "([0-9]+)"
main :: IO ()
main = interact $ unlines . map getID . splitOn "\n"
This gives me the desired output when I run your command.
Hopefully this helps :)
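If pulling in a regex library feels heavy for this, the same result can probably be had with base alone. The sketch below swaps the regex for takeWhile/dropWhile and swaps splitOn "\n" for the Prelude's lines, which does not produce a trailing empty entry when the input ends with a newline. The names firstDigits and extractIds are made up for this illustration. Like the "([0-9]+)" pattern, it returns the first run of digits on each line.

```haskell
import Data.Char (isDigit)

-- | First run of digits in a line, or "" if the line has none.
-- (Hypothetical helper; stands in for the regex "([0-9]+)".)
firstDigits :: String -> String
firstDigits = takeWhile isDigit . dropWhile (not . isDigit)

-- | Apply 'firstDigits' to every line of the input.
extractIds :: String -> String
extractIds = unlines . map firstDigits . lines
```

It would be wired up the same way as above, with main = interact extractIds.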

Remove all emojis from a string in Haskell

I made a Mastodon / Twitter <--> IRC bot a while back. It's been working great, but someone complained that when people use emojis on mastodon (which seems to happen a lot in some usernames ..) it breaks his terminal.
I was wondering if there is a way to remove those from the ByteStrings before sending them to IRC (or at least provide an option to do so), googling a bit I found this : removing emojis from a string in Python
Looks like \U0001F600-\U0001F64F should be the emoji range if I understand it correctly, but I've never been big with regex. Any easy-ish way to translate that to Haskell ? I've tried reading up a bit on regex but I only get "lexical error in string/character literal at character 'U'" when I try, I assume that syntax must be a python thing.
Thanks
In Haskell character and string literals, a Unicode character is written as a single backslash followed by its code point number: prefixed with x for hexadecimal, with o for octal, or with no prefix for decimal [0]:
putStrLn "\x1f600" -- 😀
Here, \x1f600 is the hexadecimal escape for U+1F600, the first character in the emoji range from the question.
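To make the three escape forms concrete, here is a small sketch that writes the same code point in each of them and builds the range from the question (all names are made up for this illustration):

```haskell
-- The same code point, U+1F600, written with each escape form.
grinningHex, grinningDec, grinningOct :: Char
grinningHex = '\x1F600'   -- hexadecimal
grinningDec = '\128512'   -- decimal
grinningOct = '\o373000'  -- octal

-- Python's "\U0001F600-\U0001F64F" has no direct Haskell spelling;
-- the same range can be written with \x escapes instead.
emoticonRange :: String
emoticonRange = ['\x1F600' .. '\x1F64F']
```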
You can now remove the emojis using RegExp or you could simply do:
emojis = concat [['\x1f600'..'\x1F64F'],
                 ['\x1f300'..'\x1f5ff'],
                 ['\x1f680'..'\x1f6ff'],
                 ['\x1f1e0'..'\x1f1ff']]
someString = "hello 🙋"
removeEmojis = filter (`notElem` emojis)
putStrLn . removeEmojis $ someString -- "hello "
[0] Haskell 2010 Language Report, Lexical Structure, section "Character and String Literals"
Not an emoji or Unicode expert, but this seems to work:
isEmoji :: Char -> Bool
isEmoji c = let uc = fromEnum c
            in uc >= 0x1F600 && uc <= 0x1F64F
str = "😁wew😁"
As Daniel Wagner points out, this can be made even better:
isEmoji :: Char -> Bool
isEmoji c = c >= '\x1F600' && c <= '\x1F64F'
Demo in ghci:
λ> str
"\128513wew\128513"
λ> filter isEmoji str
"\128513\128513"
λ> filter (not . isEmoji) str
"wew"
Explanation: the fromEnum function converts a character to its Unicode code point as an Int. isEmoji simply checks whether that code point falls inside the emoji range; the second version compares Char values directly, which works because Char is ordered by code point.
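The two answers can be combined: keep the range-comparison predicate, but check all four ranges listed in the first answer instead of searching one long list with notElem. This is only a sketch; the names emojiRanges, inEmojiRanges and stripEmojis are made up, and these four ranges are an assumption carried over from the first answer, not a complete list of emoji.

```haskell
-- | Inclusive code point ranges, copied from the first answer's list.
-- (Assumption: these do not cover every emoji in Unicode.)
emojiRanges :: [(Char, Char)]
emojiRanges =
  [ ('\x1F600', '\x1F64F')
  , ('\x1F300', '\x1F5FF')
  , ('\x1F680', '\x1F6FF')
  , ('\x1F1E0', '\x1F1FF')
  ]

-- | True if the character falls inside any of the ranges above.
inEmojiRanges :: Char -> Bool
inEmojiRanges c = any (\(lo, hi) -> c >= lo && c <= hi) emojiRanges

-- | Drop every character that falls inside one of the ranges.
stripEmojis :: String -> String
stripEmojis = filter (not . inEmojiRanges)
```

This works on decoded characters, so a ByteString would need to be decoded first; Data.Text.filter accepts the same predicate if the bot uses Text.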

how to combine map and replicate on an array of integers

For a school exercise I need to generate a series of symbols from a given array of numbers. Given [3,3,2,1], the output should be "+===+===+==+=+".
My approach would be to use map and replicate "=" on the array then intercalate "+" and finally concat the array to a single string.
My solution is something like this (while standing knee deep in errors)
printLine arr = map (replicate "=") arr >>> intercalate '*' >>> concat
what is the correct syntax? or shouldn't i use map at all?
You are on the right track, you just mixed up the functions a bit:
replicate takes a number n and repeats its second argument n times into a list (so you just got the argument order wrong; you could use flip or an auxiliary function like I did below)
you have to watch out whether you want Char or String ('=' vs "=" for example); read the type definitions (try :t intercalate or Hoogle) carefully and remember: String ~ [Char]!
intercalate actually does the concatenation, so you don't need concat at all
Here is an almost working version:
import Data.List (intercalate)
eqSigns :: Int -> String
eqSigns n = replicate n '='
mixIn :: [Int] -> String
mixIn = intercalate "+" . map eqSigns
try it and see if you get the missing parts in there ;)
here is the version with flip instead:
mixIn :: [Int] -> String
mixIn = intercalate "+" . map (flip replicate '=')
PS: are you coming from some ML/F# background?
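For reference, one possible way to fill in the missing parts (the leading and trailing "+") is sketched below. It reuses the name printLine from the question; the argument name widths is made up.

```haskell
import Data.List (intercalate)

-- | Render each number as a run of '=' and frame every run with '+'.
printLine :: [Int] -> String
printLine widths =
  "+" ++ intercalate "+" (map (flip replicate '=') widths) ++ "+"
```

With the input from the question, printLine [3,3,2,1] gives "+===+===+==+=+".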

haskell regex substitution

Despite the ridiculously large number of regex matching engines for Haskell, the only one I can find that will substitute is Text.Regex, which, while decent, is missing a few things I like from pcre. Are there any pcre-based packages which will do substitution, or am I stuck with this?
I don't think "just roll your own" is a reasonable answer to people trying to get actual work done, in an area where every other modern language has a trivial way to do this. Including Scheme. So here's some actual resources; my code is from a project where I was trying to replace "qql foo bar baz qq" with text based on calling a function on the stuff inside the qq "brackets", because reasons.
Best option: pcre-heavy:
let newBody = gsub [re|\s(qq[a-z]+)\s(.*?)\sqq\s|] (unWikiReplacer2 titles) body in do
[snip]
unWikiReplacer2 :: [String] -> String -> [String] -> String
unWikiReplacer2 titles match subList = case length subList > 0 of
    True -> " --" ++ subList!!1 ++ "-- "
    False -> match
Note that pcre-heavy directly supports function-based replacement, with any string type. So nice.
Another option: pcre-light with a small function that works but isn't exactly performant:
let newBody = replaceAllPCRE "\\s(qq[a-z]+)\\s(.*?)\\sqq\\s" (unWikiReplacer titles) body in do
[snip]
unWikiReplacer :: [String] -> (PCRE.MatchResult String) -> String
unWikiReplacer titles mr = case length subList > 0 of
    True -> " --" ++ subList!!1 ++ "-- "
    False -> PCRE.mrMatch mr
  where
    subList = PCRE.mrSubList mr
-- A very simple, very dumb "replace all instances of this regex
-- with the results of this function" function. Relies on the
-- MatchResult return type.
--
-- https://github.com/erantapaa/haskell-regexp-examples/blob/master/RegexExamples.hs
-- was very helpful to me in constructing this
--
-- I also used
-- https://github.com/jaspervdj/hakyll/blob/ea7d97498275a23fbda06e168904ee261f29594e/src/Hakyll/Core/Util/String.hs
replaceAllPCRE :: String -- ^ Pattern
               -> ((PCRE.MatchResult String) -> String) -- ^ Replacement (called on capture)
               -> String -- ^ Source string
               -> String -- ^ Result
replaceAllPCRE pattern f source =
    if (source PCRE.=~ pattern) == True then
      replaceAllPCRE pattern f newStr
    else
      source
  where
    mr = (source PCRE.=~ pattern)
    newStr = (PCRE.mrBefore mr) ++ (f mr) ++ (PCRE.mrAfter mr)
Someone else's fix: http://0xfe.blogspot.com/2010/09/regex-substitution-in-haskell.html
Another one, this time embedded in a major library: https://github.com/jaspervdj/hakyll/blob/master/src/Hakyll/Core/Util/String.hs
Another package for this purpose: https://hackage.haskell.org/package/pcre-utils
Update 2020
I totally agree with @rlpowell that
I don't think "just roll your own" is a reasonable answer to people trying to get actual work done, in an area where every other modern language has a trivial way to do this.
At the time of this writing, there is also Regex.Applicative.replace for regex substitution, though it's not Perl-compatible.
For pattern-matching and substitution with parsers instead of regex, there is Replace.Megaparsec.streamEdit
The regular expression API in regex-base is generic in the container of characters to match. Doing some kind of splicing generically to implement substitution would be very hard to make efficient. I did not want to provide a crappy generic routine.
Writing a small function to do the substitution exactly how you want is just a better idea, and it can be written to match your container.
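To give a feel for what such a small, container-specific function might look like, here is a deliberately limited sketch for String. It matches a literal pattern rather than a regex, so it is not a replacement for the PCRE-based options above; the name replaceAllLiteral is made up. The replacement is a function of the matched text, as in the earlier examples.

```haskell
import Data.List (stripPrefix)

-- | Replace every non-overlapping occurrence of a literal pattern,
-- passing each match to a replacement function.
-- (Assumption: literal matching only; an empty pattern leaves the
-- input unchanged.)
replaceAllLiteral :: String -> (String -> String) -> String -> String
replaceAllLiteral pat f = go
  where
    go [] = []
    go s@(c : cs)
      | null pat = s
      | Just rest <- stripPrefix pat s = f pat ++ go rest
      | otherwise = c : go cs
```

Because it continues scanning after the replaced text instead of rescanning it, it terminates even when the replacement itself contains the pattern.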

Regular expressions versus lexical analyzers in Haskell

I'm getting started with Haskell and I'm trying to use the Alex tool to create regular expressions, and I'm a little bit lost; my first problem is the compile step. How do I compile a file with Alex? Then, I think I have to import the modules that Alex generates into my code, but I'm not sure. If someone can help me, I would be very grateful!
You can specify regular expression functions in Alex.
Here for example, a regex in Alex to match floating point numbers:
$space = [\ \t\xa0]
$digit = 0-9
$octit = 0-7
$hexit = [$digit A-F a-f]
@sign = [\-\+]
@decimal = $digit+
@octal = $octit+
@hexadecimal = $hexit+
@exponent = [eE] [\-\+]? @decimal
@number = @decimal
        | @decimal \. @decimal @exponent?
        | @decimal @exponent
        | 0[oO] @octal
        | 0[xX] @hexadecimal
lex :-
@sign? @number { strtod }
When we match the floating point number, we dispatch to a parsing function to operate on that captured string, which we can then wrap and expose to the user as a parsing function:
readDouble :: ByteString -> Maybe (Double, ByteString)
readDouble str = case alexScan (AlexInput '\n' str) 0 of
    AlexEOF -> Nothing
    AlexError _ -> Nothing
    AlexToken (AlexInput _ rest) n _ ->
        case strtod (B.unsafeTake n str) of d -> d `seq` Just $! (d , rest)
A nice consequence of using Alex for this regex matching is that the performance is good, as the regex engine is compiled statically. It can also be exposed as a regular Haskell library built with cabal. For the full implementation, see bytestring-lexing.
The general advice on when to use a lexer instead of a regex matcher would be that, if you have a grammar for the lexemes you're trying to match, as I did for floating point, use Alex. If you don't, and the structure is more ad hoc, use a regex engine.
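For comparison, here is roughly what a hand-written counterpart of the simplest case, an optional sign followed by one or more decimal digits, could look like. It is only a sketch: it works on String rather than ByteString, is not generated by Alex, and the name readSignedDecimal is made up. It returns the parsed value together with the unconsumed rest of the input, the same shape as readDouble above.

```haskell
import Data.Char (isDigit)

-- | Hand-written counterpart of an "optional sign, then digits" rule.
-- Returns the value and the remaining input, or Nothing on no match.
readSignedDecimal :: String -> Maybe (Integer, String)
readSignedDecimal input =
  case span isDigit body of
    ([], _)        -> Nothing
    (digits, rest) -> Just (sign (read digits), rest)
  where
    -- Split off an optional leading sign.
    (sign, body) = case input of
      '-' : cs -> (negate, cs)
      '+' : cs -> (id, cs)
      _        -> (id, input)
```

This is manageable for one alternative, but covering the fraction, exponent, octal and hexadecimal forms by hand gets tedious quickly, which is where a lexer generator starts to pay off.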
Why do you want to use alex to create regular expressions?
If all you want is to do some regex matching etc., you should look at the regex-base package.
If it is plain regex you want, the API is specified in Text.Regex.Base. Then there are the implementations Text.Regex.Posix, Text.Regex.PCRE and several others. The Haddock documentation is a bit slim; however, the basics are described in Real World Haskell, chapter 8. Some more in-depth material is described in this SO question.