There is a regular expression matching quoted substrings: "/\"(?:[^\"\\]|\\.)*\"/" (originally /"(?:[^"\\]|\\.)*"/, see Here). Tested on regex101, it works.
With TDFA, it fails with a syntax error:
*** Exception: Explict error in module Text.Regex.TDFA.String : Text.Regex.TDFA.String died:
parseRegex for Text.Regex.TDFA.String failed:"/"(?:[^"\]|\.)*"/" (line 1, column 4):
unexpected "?"
expecting empty () or anchor ^ or $ or an atom
Is there a way to correct it?
Test string: Is big "problem", no?
Expected result: "problem"
UPD:
This is the full context:
removeQuotedSubstrings :: String -> [String]
removeQuotedSubstrings str =
  let quoteds = concat (str =~ ("/\"(?:[^\"\\]|\\.)*\"/" :: String) :: [[String]])
  in quoteds
No improvement, just an acceptable solution, albeit lacking in elegance:
import qualified Data.Text as T
import Text.Regex.TDFA
-- | Removes all double quoted substrings, if any, from a string.
--
-- Examples:
--
-- >>> removeQuotedSubstrings "alfa"
-- "alfa"
-- >>> removeQuotedSubstrings "ngoro\"dup\"lai \"ming\""
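-- "ngoro lai  "
removeQuotedSubstrings :: String -> String
removeQuotedSubstrings str =
  let -- Keep only whole matches; submatches never start with a quote.
      -- take 1 replaces head so that an empty submatch cannot crash.
      -- "\\\\." is the regex \\. (an escaped character); "\\." would only
      -- match a literal dot in POSIX syntax.
      quoteds = filter (("\"" ==) . take 1)
                  $ concat (str =~ ("\"(\\\\.|[^\"\\\\])*\"" :: String) :: [[String]])
  in T.unpack $ foldr (\quoted acc -> T.replace (T.pack quoted) " " acc)
                      (T.pack str) quoteds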
Yes, the final purpose has always been to remove the quoted substrings.
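For reference, here is a minimal sketch of collecting the whole matches directly instead of filtering submatches. It assumes regex-tdfa's POSIX extended syntax, which has no (?:...) non-capturing groups and treats / as an ordinary character rather than a delimiter. The names quotedPattern and quotedSubstrings are made up for this illustration.

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- POSIX-style pattern: an opening quote, then any run of escaped characters
-- or characters other than quote and backslash, then a closing quote.
-- In the Haskell literal, "\\\\." is the regex \\. (backslash, then any char).
quotedPattern :: String
quotedPattern = "\"(\\\\.|[^\"\\\\])*\""

-- | All double-quoted substrings, returned as whole matches only.
quotedSubstrings :: String -> [String]
quotedSubstrings str = getAllTextMatches (str =~ quotedPattern)
```

With the test string from the question, quotedSubstrings "Is big \"problem\", no?" should return the single match "\"problem\"".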
I'm trying to use regex-pcre, but regex-base contains so many overloads for RegexContext that I don't know which one I should use for the task at hand.
I want to match a string against (foo)-(bar)|(quux)-(quux)(q*u*u*x*) regular expression the following way:
myMatch :: String -> Maybe (String, String, Maybe String)
Sample output:
myMatch "dfjdjk" should be Nothing as there is no match
myMatch "foo-bar" should be Just ("foo", "bar", Nothing) as there's no third capture group in the first alternative
myMatch "quux-quuxqu" should be Just ("quux", "quux", Just "qu")
myMatch "quux-quux" should be Just ("quux", "quux", Just "") as the third capture group is present but empty
It's not an assignment. I'm just baffled that the examples at https://github.com/erantapaa/haskell-regexp-examples/blob/master/RegexExamples.hs don't contain code paths for situations where there are no matches or no capture groups.
A way of achieving it is using getAllTextSubmatches:
import Text.Regex.PCRE
myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [String] of
  [] -> Nothing
  [_, g1, g2, "", "", ""] -> Just (g1, g2, Nothing)
  [_, "", "", g3, g4, g5] -> Just (g3, g4, Just g5)
When getAllTextSubmatches has [String] as return type, it returns an empty list if there is no match, or a list with all capturing groups (where index 0 is the whole match) of the first match.
Alternatively, if a matched group may be empty and you cannot pattern match on the empty string, you can use [(String, (MatchOffset, MatchLength))] as return type of getAllTextSubmatches and pattern match MatchOffset with -1 to identify unmatched groups:
myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [(String, (MatchOffset, MatchLength))] of
  [] -> Nothing
  [_, (g1, _), (g2, _), (_, (-1, _)), (_, (-1, _)), (_, (-1, _))] -> Just (g1, g2, Nothing)
  [_, (_, (-1, _)), (_, (-1, _)), (g3, _), (g4, _), (g5, _)] -> Just (g3, g4, Just g5)
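The -1 offset check can also be factored into a small helper that turns every capture group into a Maybe. This is only a sketch, assuming regex-pcre as in the answer above; captureGroups and toMaybe are names invented here, not part of the library.

```haskell
import Text.Regex.PCRE

-- | Capture groups of the first match. Nothing marks a group that did not
-- participate in the match (offset -1); Just "" marks a group that matched
-- the empty string. The outer Nothing means the regex did not match at all.
captureGroups :: String -> String -> Maybe [Maybe String]
captureGroups regex str =
  case getAllTextSubmatches (str =~ regex) :: [(String, (MatchOffset, MatchLength))] of
    []         -> Nothing
    (_ : subs) -> Just (map toMaybe subs)  -- drop index 0, the whole match
  where
    toMaybe (text, (offset, _))
      | offset == -1 = Nothing
      | otherwise    = Just text
```

myMatch could then pattern match on the resulting list of Maybe values rather than on raw offsets, which keeps an empty third group distinct from a missing one.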
Now, if that looks too verbose:
{-# LANGUAGE PatternSynonyms #-}
pattern NoMatch = ("", (-1, 0))
myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [(String, (MatchOffset, MatchLength))] of
  [] -> Nothing
  [_, (g1, _), (g2, _), NoMatch, NoMatch, NoMatch] -> Just (g1, g2, Nothing)
  [_, NoMatch, NoMatch, (g3, _), (g4, _), (g5, _)] -> Just (g3, g4, Just g5)
To distinguish when there is no match, use =~~ so that it will place the result in a Maybe monad. It will use fail to return Nothing if there are no matches.
myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = do
  let regex = "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)"
  groups <- getAllTextSubmatches <$> str =~~ regex :: Maybe [String]
  case groups of
    [_, g1, g2, "", "", ""] -> Just (g1, g2, Nothing)
    [_, "", "", g3, g4, g5] -> Just (g3, g4, Just g5)
Use regex-applicative
myMatch = match re
re = foobar <|> quuces where
  foobar = (,,) <$> "foo" <* "-" <*> "bar" <*> pure Nothing
  quuces = (,,)
    <$> "quux" <* "-"
    <*> "quux"
    <*> (fmap (Just . mconcat) . sequenceA)
      [many $ sym 'q', many $ sym 'u', many $ sym 'u', many $ sym 'x']
or, with ApplicativeDo,
re = foobar <|> quuces where
  foobar = do
    foo <- "foo"
    _ <- "-"
    bar <- "bar"
    pure (foo, bar, Nothing)
  quuces = do
    quux1 <- "quux"
    _ <- "-"
    quux2 <- "quux"
    quux3 <- fmap snd . withMatched $
      traverse (many . sym) ("quux" :: [Char])
      -- [many $ sym 'q', many $ sym 'u', many $ sym 'u', many $ sym 'x']
    pure (quux1, quux2, Just quux3)
Background
Let's say I have several Regex values:
import Text.Regex
openTag = mkRegex "<([A-Z][A-Z0-9]*)\\b[^>]*>"
closeTag = mkRegex "</\\1>"
any = mkRegex "(.*?)"
Problem
openTag ++ any ++ closeTag <-- Just for illustration purpose
How can I merge them? To be specific, I am looking for a Regex -> Regex -> Regex function. Alternatively, converting a Regex back to a String would also work.
openTag ++ "hello" ++ closeTag <-- Just for illustration purpose
Thus, I can create my own Regex -> String -> Regex function ultimately.
Workaround
Manipulate the string literals.
import Text.Regex
openTag = "<([A-Z][A-Z0-9]*)\\b[^>]*>"
closeTag = "</\\1>"
any = "(.*?)"
tagWithAny = mkRegex $ openTag ++ any ++ closeTag
tagWith :: String -> Regex
tagWith s = mkRegex $ openTag ++ s ++ closeTag
The Regex type in Text.Regex is essentially a C pointer:
data Regex = Regex (ForeignPtr CRegex) CompOption ExecOption
AFAIK there is no way to recover the string representation of a POSIX regex after it has been compiled. See the regcomp(3) man page.
If you'd like to operate on regular expressions algebraically, wrap them in your own type to postpone compilation, or use, for example, regex-applicative.
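The "wrap them in your own type" idea could look roughly like the following. This is a sketch, not an established API: Pattern, raw, source and compile are hypothetical names. The pattern strings are copied from the question without checking whether the POSIX backend behind mkRegex accepts them.

```haskell
import Text.Regex (Regex, mkRegex)

-- | Hypothetical wrapper: a pattern kept as source text until compiled.
newtype Pattern = Pattern String
  deriving (Eq, Show)

-- Combining two patterns is plain concatenation of their source text.
instance Semigroup Pattern where
  Pattern a <> Pattern b = Pattern (a ++ b)

instance Monoid Pattern where
  mempty = Pattern ""

-- | Embed raw regex source unchanged (nothing is escaped).
raw :: String -> Pattern
raw = Pattern

-- | Recover the source text, which a compiled Regex cannot give back.
source :: Pattern -> String
source (Pattern s) = s

-- | Compile only at the very end, once the pattern is fully assembled.
compile :: Pattern -> Regex
compile = mkRegex . source

openTag, closeTag, anyText :: Pattern
openTag  = raw "<([A-Z][A-Z0-9]*)\\b[^>]*>"
closeTag = raw "</\\1>"
anyText  = raw "(.*?)"

tagWith :: Pattern -> Regex
tagWith body = compile (openTag <> body <> closeTag)
```

Because the combination is plain string concatenation, a body containing a top-level alternation would need its own parentheses, and any added group shifts the numbering that the \1 backreference relies on.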
Let's say I have a list of integers [blah;blah;blah;...] and I don't know the size of the list. I want to pattern match on it and print everything except the first element. Is there any way to do this without using an if/else and without getting a syntax error?
All I'm trying to do is parse a file that looks like a/path/to/blah/blah/../file.c
and print only path/to/blah/blah.
For example, can it be done like this?
let out x = Printf.printf " %s \n" x
let _ = try
  while true do
    let line = input_line stdin in
    ...
    let rec f (xpath: string list) : ( string list ) =
      begin match Str.split (Str.regexp "/") xpath with
      | _::rest -> out (String.concat "/" _::xpath);
      | _ -> ()
      end
But if I do this, I get a syntax error on the String.concat line.
String.concat "/" _::xpath doesn't mean anything because _ is a pattern, not a value. _ can be used on the left-hand side of a pattern match but not on the right-hand side.
What you want to do is String.concat "/" rest.
Even if _::xpath were correct, String.concat "/" _::xpath would be interpreted as (String.concat "/" _)::xpath whereas you want it to be interpreted as String.concat "/" (_::xpath).
How would I do regex matching in Erlang?
All I know is this:
f("AAPL" ++ Inputstring) -> true.
The lines that I need to match
"AAPL,07-May-2010 15:58,21.34,21.36,21.34,21.35,525064\n"
In Perl regex: ^AAPL,* (or something similar)
In Erlang?
Use the re module, e.g.:
...
String = "AAPL,07-May-2010 15:58,21.34,21.36,21.34,21.35,525064\n",
RegExp = "^AAPL,*",
case re:run(String, RegExp) of
    {match, Captured} -> ... ;
    nomatch -> ...
end,
...