How would I do regex matching in Erlang?
All I know is this:
f("AAPL" ++ Inputstring) -> true.
The lines that I need to match look like this:
"AAPL,07-May-2010 15:58,21.34,21.36,21.34,21.35,525064\n"
In Perl regex: ^AAPL,* (or something similar)
In Erlang?
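Use the re module. Note that ,* in the Perl pattern means "zero or more commas", so ^AAPL, is enough to require the comma. For example:
%% Returns true when String starts with "AAPL,", false otherwise.
f(String) ->
    RegExp = "^AAPL,",
    case re:run(String, RegExp) of
        {match, _Captured} -> true;
        nomatch -> false
    end.
With this definition, f("AAPL,07-May-2010 15:58,21.34,21.36,21.34,21.35,525064\n") returns true.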
I have written the following code in Scala:
val regex_str = "([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
which returns "other", but if I take off the leading underscore:
val regex_str = "([a-z]+)(\\d+)".r
"abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
I get "found". How can I find ([a-z]+)(\\d+) anywhere in the string instead of only at the beginning? I am used to other regex languages where you use ^ to anchor the pattern to the beginning of the string, and without it the pattern can match anywhere.
When a Scala Regex is used as an extractor in pattern matching, it is "anchored" by default, i.e. it must match the target string from beginning to end.
You'll get the expected match with this:
val regex_str = "([a-z]+)(\\d+)".r.unanchored
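A small self-contained sketch to see the difference (the object and method names are illustrative, not from the answer): the same pattern is tried as a plain, anchored extractor and as an .unanchored one.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  // Plain extractor: the whole input must match ([a-z]+)(\d+).
  val anchored: Regex = "([a-z]+)(\\d+)".r
  // Unanchored extractor: the pattern may match anywhere inside the input.
  val unanchored: Regex = "([a-z]+)(\\d+)".r.unanchored

  // Runs the given regex as an extractor and reports the two groups.
  def classify(r: Regex, s: String): String = s match {
    case r(a, n) => s"found $a $n"
    case _       => "other"
  }

  def main(args: Array[String]): Unit = {
    println(classify(anchored, "_abc123"))   // other
    println(classify(unanchored, "_abc123")) // found abc 123
  }
}
```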
Maybe you need something like this:
val regex_str = "[^>]([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => println(s"found $a $n")
case _ => println("other")
}
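This consumes the first character of your string ([^>] matches exactly one character other than >), so it only works when there is exactly one character before the letters.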
Hope this helps!
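To see what skipping the first character means in practice, here is a sketch (hypothetical object and method names) that runs the pattern above on a few inputs. Because the extractor is still anchored, [^>] swallows exactly one character, whatever it is.

```scala
import scala.util.matching.Regex

object OneCharPrefixDemo {
  // [^>] consumes exactly one character that is not '>'; the extractor stays anchored.
  val oneCharPrefix: Regex = "[^>]([a-z]+)(\\d+)".r

  def describe(s: String): String = s match {
    case oneCharPrefix(a, n) => s"found $a $n"
    case _                   => "other"
  }

  def main(args: Array[String]): Unit = {
    println(describe("_abc123"))  // found abc 123
    println(describe("__abc123")) // other: two leading characters
    println(describe("abc123"))   // found bc 123: 'a' is eaten by [^>]
  }
}
```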
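By default, the Regex's unapplySeq (the method an extractor pattern calls) must match the whole input, as if the pattern were wrapped in ^ and $.
There are two ways to find the captures inside a longer input:
use .* before and after the captures: val regex_str = ".*([a-z]+)(\\d+).*".r (the leading .* is greedy, so on "_abc123" it leaves only "c" for the first group; use .*? to keep the whole run of letters)
do the same with .unanchored: val regex_str = "([a-z]+)(\\d+)".r.unanchored
Otherwise, Scala treats regular expression anchors the same way as other languages do (methods such as findFirstIn and findAllIn search anywhere in the input); the anchored extractor is an exception made for semantic reasons.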
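The difference between the two options above can be checked with a short sketch (names are illustrative). The greedy leading .* gives back only as much input as it must, so the letter group ends up with a single character, while the unanchored version finds the whole run.

```scala
import scala.util.matching.Regex

object PrefixDemo {
  // Option 1: wrap the captures in .* (greedy prefix).
  val greedyPrefix: Regex = ".*([a-z]+)(\\d+).*".r
  // Option 2: the plain pattern, made unanchored.
  val unanchored: Regex = "([a-z]+)(\\d+)".r.unanchored

  // Returns the two captured groups, if the extractor matches.
  def groups(r: Regex, s: String): Option[(String, String)] = s match {
    case r(a, n) => Some((a, n))
    case _       => None
  }

  def main(args: Array[String]): Unit = {
    println(groups(greedyPrefix, "_abc123")) // Some((c,123))
    println(groups(unanchored, "_abc123"))   // Some((abc,123))
  }
}
```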
The regex extractor in Scala pattern matching attempts to match the entire string. If you want to skip some junk characters at the beginning and at the end, prepend .*? (a dot with a reluctant quantifier) and append .* to the regex:
val regex_str = ".*?([a-z]+)(\\d+).*".r
val result = "_!+<>__abc123_%$" match {
case regex_str(a, n) => s"found a = '$a', n = '$n'"
case _ => "no match"
}
println(result)
This outputs:
found a = 'abc', n = '123'
Alternatively, don't use the extractor in a pattern match at all; use "...".r.findAllIn to find all matches.
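A sketch of the findAllIn route (names and the input string are made up for illustration). It also uses findAllMatchIn, the sibling method that returns Regex.Match values, so the capture groups can be read with group(n).

```scala
import scala.util.matching.Regex

object FindAllDemo {
  val pattern: Regex = "([a-z]+)(\\d+)".r

  // findAllIn yields the matched substrings themselves.
  def allMatches(s: String): List[String] = pattern.findAllIn(s).toList

  // findAllMatchIn yields Regex.Match values, which expose the capture groups.
  def allPairs(s: String): List[(String, String)] =
    pattern.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit = {
    val input = "_abc123_x9 zz"
    println(allMatches(input)) // List(abc123, x9)
    println(allPairs(input))   // List((abc,123), (x,9))
  }
}
```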
I have this verbose code that does short-circuit Regex extraction/matching in Scala. It tries to match a string with the first Regex and, if that doesn't match, tries the second Regex.
val regex1 : scala.util.matching.Regex = "^a(b)".r
val regex2 : scala.util.matching.Regex = "^c(d)".r
val s = ?
val extractedGroup1 : Option[String] = s match { case regex1(v) => Some(v) case _ => None }
val extractedGroup2 : Option[String] = s match { case regex2(v) => Some(v) case _ => None}
val extractedValue = extractedGroup1.orElse(extractedGroup2)
Here are the results:
s == "ab" then extractedValue == "b"
s == "cd" then extractedValue == "c"
s == "gg" then extractedValue == None.
My question is: how can we combine the two regexes into a single regex with the alternation operator (|) and still use Scala extractors? I tried this, but it always hits the None case:
val regex : scala.util.matching.Regex = "^a(b)$ | ^c(d)$".r
val extractedValue = s match { case regex(v) => Some(v) case _ => None }
Don't put spaces inside the regex for readability. Although that feels Scala-esque, the spaces are matched literally (unless you enable comments mode with (?x)), so your pattern expects a space after the end of the string or before the start of the string, which can never happen. Try ^(?:a(b)|c(d))$, which does the same thing without repeating ^ and $.
Your own ^a(b)$|^c(d)$ can work too (once you remove the spaces).
Also, do you really get c out of cd? Judging by your regex, you should be getting d, if we're talking about capture groups.
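Also, note that you're extracting capture groups. If you combine the regexes, an extracted d will be $2, while b would be $1. The combined pattern therefore has two groups, so the extractor needs two variables (case regex(b, d) => ...); a one-variable case regex(v) never matches it. The group that did not take part in the match is bound to null.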
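Putting the answer together, a sketch (illustrative names) of the combined pattern used with a two-variable extractor, where the null from the non-participating group is turned into an Option:

```scala
import scala.util.matching.Regex

object CombinedDemo {
  // One pattern, two alternatives, two capture groups: group 1 for "ab", group 2 for "cd".
  val combined: Regex = "^(?:a(b)|c(d))$".r

  // The extractor yields one value per capture group; the group that did not
  // take part in the match comes back as null, so both are wrapped in Option.
  def extract(s: String): Option[String] = s match {
    case combined(b, d) => Option(b).orElse(Option(d))
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    println(extract("ab")) // Some(b)
    println(extract("cd")) // Some(d)
    println(extract("gg")) // None
  }
}
```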
Background
Let's say I have several regexes:
import Text.Regex
openTag = mkRegex "<([A-Z][A-Z0-9]*)\\b[^>]*>"
closeTag = mkRegex "</\\1>"
any = mkRegex "(.*?)"
Problem
openTag ++ any ++ closeTag <-- just for illustration purposes
How can I merge them? Specifically, I'm looking for a Regex -> Regex -> Regex function. Alternatively, a way to convert a Regex back to a String would also work, so that I could write:
openTag ++ "hello" ++ closeTag <-- just for illustration purposes
That way, I could ultimately write my own Regex -> String -> Regex function.
Workaround
Manipulate the string literals.
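import Prelude hiding (any)  -- avoid an ambiguous clash with Prelude.any
import Text.Regex
-- Keep the pieces as pattern strings and only compile the final result.
openTag, closeTag, any :: String
openTag  = "<([A-Z][A-Z0-9]*)\\b[^>]*>"
closeTag = "</\\1>"
any      = "(.*?)"
tagWithAny :: Regex
tagWithAny = mkRegex $ openTag ++ any ++ closeTag
tagWith :: String -> Regex
tagWith s = mkRegex $ openTag ++ s ++ closeTag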
The Regex type in Text.Regex is essentially a wrapper around a C pointer:
data Regex = Regex (ForeignPtr CRegex) CompOption ExecOption
AFAIK there is no way to recover the string representation of a POSIX regex after it has been compiled (see the regcomp(3) man page).
If you'd like to operate on regular expressions algebraically, wrap them in your own type to postpone compilation, or use a library such as regex-applicative.
How do I find an exact match using a regular expression in OCaml? For example, I have code like this:
let contains s1 s2 =
let re = Str.regexp_string s2
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
where s2 is "_X_1" and s1 takes values such as "A_1_X_1", "A_1_X_2", and so on, as they are fed to the function contains. The aim is to match only when s1 is "A_1_X_1", but the current code also finds a match when s1 is "A_1_X_10", "A_1_X_11", "A_1_X_100", etc.
I tried "[_x_1]" and "[_X_1]$" as s2 instead of "_X_1", but that does not seem to work. Can somebody suggest what is wrong?
You can use the $ metacharacter to match the end of the line (which, assuming the string doesn't contain multiple lines, is the end of the string). But you can't pass it through Str.regexp_string, which just escapes the metacharacters. Instead, first quote the literal substring with Str.quote, then append the $, and build the regexp from that:
let endswith s1 s2 =
let re = Str.regexp (Str.quote s2 ^ "$")
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
Str.match_end is what you need:
let ends_with patt str =
let open Str in
let re = regexp_string patt in
try
let len = String.length str in
ignore (search_backward re str len);
match_end () = len (* the last occurrence must end exactly at the end of str *)
with Not_found -> false
With this definition, the function works as you require:
# ends_with "_X_1" "A_1_X_10";;
- : bool = false
# ends_with "_X_1" "A_1_X_1";;
- : bool = true
# ends_with "_X_1" "_X_1";;
- : bool = true
# ends_with "_X_1" "";;
- : bool = false
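An unanchored regex can match anywhere in the input (that is what Str.search_forward looks for), so the behaviour you see is normal.
You need to anchor your regex. To require that s1 ends with _X_1, use _X_1$ (built with Str.regexp, not Str.regexp_string, which would escape the $); ^_X_1$ would only match when s1 is exactly _X_1.
Also, [_x_1] will not help: [...] is a character class, so it asks the regex engine to match a single character that is x, 1, or _.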
I am looking for a Haskell function that returns the capturing groups of all matches of a given regex.
I have been looking at Text.Regex, but couldn't find anything there.
Now I am using this workaround which seems to work:
import Text.Regex
findNext :: String -> Maybe (String, String, String, [String] ) -> [ [String] ]
findNext pattern Nothing = []
findNext pattern (Just (_, _, rest, matches) ) =
case matches of
[] -> (findNext pattern res)
_ -> [matches] ++ (findNext pattern res)
where res = matchRegexAll (mkRegex pattern) rest
findAll :: String -> String -> [ [String] ]
findAll pattern str = findNext pattern (Just ("", "", str, [] ) )
Result:
findAll "x(.)x(.)" "aaaxAxaaaxBxaaaxCx"
[["A","a"],["B","a"]]
Question:
Did I miss something in Text.Regex?
Is there a Haskell regex library that implements a findAll function?
You can use the =~ operator from Text.Regex.Posix:
Prelude> :mod + Text.Regex.Posix
Prelude Text.Regex.Posix> "aaaxAxaaaxBxaaaxCx" =~ "x(.)x(.)" :: [[String]]
[["xAxa","A","a"],["xBxa","B","a"]]
Note the explicit [[String]] type. Try replacing it with Bool, Int, or String and see what happens. The types you can use in this context are the instances of RegexContext (documented in Text.Regex.Base.Context). Also see this tutorial.