JavaScript regex to match type annotations

I'm trying to match type annotations from a string of parameters:
foo: string, bar:number, baz: Array<string>
My initial pattern was working fine for primitives:
:\s*\w+
but it doesn't capture arrays, so I tried an alternation, which doesn't work:
:\s*\w+|:\s*\w+<\w+>
The end result should be:
foo, bar, baz

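Your alternation does not work because the engine tries the alternatives from left to right and stops at the first one that matches: :\s*\w+ already succeeds at ": Array", so the second branch is never used and <string> is left behind. Instead, you can make the part with the angle brackets optional and replace the matches with an empty string, leaving the desired result: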
:\s*\w+(?:<\w+>)?
let s = "foo: string, bar:number, baz: Array<string>";
console.log(s.replace(/:\s*\w+(?:<\w+>)?/g, ''));
Or match the names directly using a capturing group:
(\w+):\s*\w
let s = "foo: string, bar:number, baz: Array<string>";
let matches = Array.from(s.matchAll(/(\w+):\s*\w/g), m => m[1]);
console.log(matches.join(", "));
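A small JavaScript comparison of the three patterns, which also shows that putting the longer branch first is another possible fix. This is a sketch; none of these patterns handle nested generics such as Array<Array<string>>:

```javascript
const s = "foo: string, bar:number, baz: Array<string>";

// Shorter branch first: ":\s*\w+" already succeeds at ": Array", so "<string>" is left behind.
const shortFirst = s.replace(/:\s*\w+|:\s*\w+<\w+>/g, "");

// Longer branch first: the generic form is tried before the plain one.
const longFirst = s.replace(/:\s*\w+<\w+>|:\s*\w+/g, "");

// Optional non-capturing group, as in the answer.
const optionalGroup = s.replace(/:\s*\w+(?:<\w+>)?/g, "");

console.log(shortFirst);    // foo, bar, baz<string>
console.log(longFirst);     // foo, bar, baz
console.log(optionalGroup); // foo, bar, baz
```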

Related

Regex array of named group matches

I would like to get an array of all captured group matches in chronological order (the order in which they appear in the input string).
For example, with the following regex:
(?P<fooGroup>foo)|(?P<barGroup>bar)
and the following input:
foo bar foo
I would like to get something that resembles the following output:
[("fooGroup", (0,3)), ("barGroup", (4,7)), ("fooGroup", (8,11))]
Is this possible to do without manually sorting all matches?
I don't know what you mean by "without manually sorting all matches," but this Rust code produces the output you want for this particular style of pattern:
use regex::Regex;

fn main() {
    let pattern = r"(?P<fooGroup>foo)|(?P<barGroup>bar)";
    let haystack = "foo bar foo";
    let mut matches: Vec<(String, (usize, usize))> = vec![];
    let re = Regex::new(pattern).unwrap();
    // We skip the first capture group, which always corresponds
    // to the entire pattern and is unnamed. Otherwise, we assume
    // every capturing group has a name and corresponds to a single
    // alternation in the regex.
    let group_names: Vec<&str> =
        re.capture_names().skip(1).map(|x| x.unwrap()).collect();
    for caps in re.captures_iter(haystack) {
        for name in &group_names {
            if let Some(m) = caps.name(name) {
                matches.push((name.to_string(), (m.start(), m.end())));
            }
        }
    }
    println!("{:?}", matches);
}
The only real trick here is to make sure group_names is correct. It's correct for any pattern of the form (?P<name1>re1)|(?P<name2>re2)|...|(?P<nameN>reN) where each reI contains no other capturing groups; an unnamed capturing group would make the x.unwrap() call panic.
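For comparison, a JavaScript sketch of the same idea (JavaScript writes named groups as (?<name>...) rather than (?P<name>...)). It relies on the d flag (ES2022), which exposes each group's start and end offsets through indices.groups, and on matchAll yielding matches left to right, so no sorting is needed. Like the Rust version, it assumes each alternative is exactly one named group; namedMatchesInOrder is a hypothetical helper name:

```javascript
// Collects [groupName, [start, end]] pairs in the order the matches occur.
function namedMatchesInOrder(re, haystack) {
  const result = [];
  for (const m of haystack.matchAll(re)) {
    for (const [name, value] of Object.entries(m.groups)) {
      // Only the group of the alternative that matched is defined.
      if (value !== undefined) {
        const [start, end] = m.indices.groups[name];
        result.push([name, [start, end]]);
      }
    }
  }
  return result;
}

// "g" is required by matchAll, "d" turns on match indices.
const re = /(?<fooGroup>foo)|(?<barGroup>bar)/dg;
console.log(JSON.stringify(namedMatchesInOrder(re, "foo bar foo")));
// [["fooGroup",[0,3]],["barGroup",[4,7]],["fooGroup",[8,11]]]
```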

Extract date from string using Regex.named_captures

I would like to take a string like "My String 2022-01-07" and extract the date part into a named capture.
I've tried the following regex, but it only works when there's an exact match:
# Does not work
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
# Works
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "2021-01-01")
%{"date" => "2021-01-01"}
I've also tried this without luck:
iex> Regex.named_captures(~r/([a-zA-Z0-9 ]+?)(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
Is there a way to use named captures to extract the date part of a string when you don't care about the characters surrounding the date?
I think I'm looking for a regex that will work like this:
iex> Regex.named_captures(REGEX???, "My String 2021-01-01 Other Parts")
%{"date" => "2021-01-01"}
You want:
Regex.named_captures(~r/(?<date>\$?\d{4}-\d{2}-\d{2})/, "My String 2021-01-01")
Your regex, (?<date>\$?(\d{4}-\d{2}-\d{2})?), represents a named capturing group with date as its name and \$?(\d{4}-\d{2}-\d{2})? as its pattern. The \$?(\d{4}-\d{2}-\d{2})? pattern matches:
\$? - an optional $ char
(\d{4}-\d{2}-\d{2})? - an optional sequence of four digits, -, two digits, -, two digits.
Since the pattern is not anchored (it does not have to match the whole string) and both consecutive pattern parts are optional and can therefore match an empty string, the ~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/ regex matches the first empty location (an empty string) at the start of the "My String 2021-01-01" string.
Rule of thumb: if you do not want to match an empty string, make sure your pattern contains at least one obligatory part that must match at least one character.
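The same effect can be reproduced in JavaScript, whose regex syntax is close enough for this pattern. This is an illustration of the pitfall, not Elixir code:

```javascript
// Both parts optional: the regex can succeed with an empty match at index 0.
const optional = /(?<date>\$?(\d{4}-\d{2}-\d{2})?)/;
// Date part obligatory: the engine has to scan forward to the first real date.
const required = /(?<date>\$?\d{4}-\d{2}-\d{2})/;

const input = "My String 2021-01-01 Other Parts";
console.log(JSON.stringify(input.match(optional).groups.date)); // ""
console.log(input.match(required).groups.date);                 // 2021-01-01
```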
Extract Date only:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateRegex = RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4})");
  Iterable<RegExpMatch> matches = dateRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output:
1/19/2023
Extract Date and time:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateTimeRegex = RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)");
  Iterable<RegExpMatch> matches = dateTimeRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output:
1/19/2023 9:29:11 AM
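A JavaScript sketch combining both variants with named groups, assuming the same M/D/YYYY h:mm:ss AM/PM format. The time part sits in an optional non-capturing group, so a date without a time still matches; extractDates is a hypothetical helper name:

```javascript
// Returns {date, time} for every date found; time is undefined when no time follows.
function extractDates(input) {
  const re = /(?<date>\d{1,2}\/\d{1,2}\/\d{4})(?: (?<time>\d{1,2}:\d{2}:\d{2} [AP]M))?/g;
  return Array.from(input.matchAll(re), m => ({ date: m.groups.date, time: m.groups.time }));
}

console.log(JSON.stringify(extractDates("Your String 1/19/2023 9:29:11 AM")));
// [{"date":"1/19/2023","time":"9:29:11 AM"}]
```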

Scala regex : capture between group

With the regex below I need "test" as output, but it gives the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input: "try {outer.test}"
Expected output: test
Current output: {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: I added .unanchored so the pattern can also match inside a longer string; if your regex must match the whole string, remove it.
Or, you may change the pattern so that the first part is no longer a consuming pattern (this is possible because it matches a string of fixed length):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
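Both approaches carry over to JavaScript almost unchanged (lookbehind assertions need an ES2018-compliant engine). A minimal sketch for comparison:

```javascript
const s = "try {outer.test}";

// Capturing group: the wanted text is in group 1.
const withGroup = s.match(/\{outer\.([^{}]*)\}/);
console.log(withGroup ? withGroup[1] : ""); // test

// Lookbehind: {outer. is checked but not consumed, so the whole match is the wanted text.
const withLookbehind = s.match(/(?<=\{outer\.)[^{}]*/);
console.log(withLookbehind ? withLookbehind[0] : ""); // test
```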

Pattern matching extract String Scala

I want to extract the part of a String that matches one of the two regex patterns I defined:
//should match R0010, R0100,R0300 etc
val rPat="[R]{1}[0-9]{4}".r
// should match P.25.01.21 , P.27.03.25 etc
val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r
When I now define my method to extract the elements as:
val matcher = (s: String) => s match {
  case pPat(el) => println(el) // print the P.25.01.25
  case rPat(el) => println(el) // print R0100
  case _ => println("no match")
}
And test it, e.g., with:
val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt)//prints "no match" but should print P.25.01.21
val rSt= "R0010 test test 3,870"
matcher(rSt) //prints also "no match" but should print R0010
//check if regex is wrong
val pHead="P.25.01.21"
pHead.matches(pPat.toString)//returns true
val rHead="R0010"
rHead.matches(rPat.toString)//return true
I'm not sure whether the regular expressions are wrong, since the matches method works on the elements. So what is wrong with this approach?
When you use pattern matching with strings, you need to bear in mind that:
The .r pattern you pass needs to match the whole string, otherwise no match is returned (the solution is to make the pattern .r.unanchored)
Once you make it unanchored, watch out for unwanted matches: R[0-9]{4} will match R1234 in CSR123456 (the solution depends on your real requirements; usually word boundaries \b are enough, or negative lookarounds can be used)
Inside a match block, the regex extractor requires a capturing group if you want to get a value back (you bound it as el in pPat(el) and rPat(el)).
So, I suggest the following solution:
val rPat="""\b(R\d{4})\b""".r.unanchored
val pPat="""\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored
val matcher = (s: String) => s match {
  case pPat(el) => println(el) // print the P.25.01.25
  case rPat(el) => println(el) // print R0100
  case _ => println("no match")
}
Then,
val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt) // => P.25.01.21
val pSt2_bad=" CP.2334565.01124.212 - Hello whats going on?"
matcher(pSt2_bad) // => no match
val rSt= "R0010 test test 3,870"
matcher(rSt) // => R0010
val rSt2_bad = "CSR00105 test test 3,870"
matcher(rSt2_bad) // => no match
Some notes on the patterns:
\b - a leading word boundary
(R\d{4}) - a capturing group matching R followed by exactly 4 digits
\b - a trailing word boundary
Due to the triple quotes used to define the string literal, there is no need to escape the backslashes.
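A JavaScript sketch of the same matcher, for comparison. String.prototype.match is unanchored by default, so only the word boundaries and the capturing group carry over; this version returns the value instead of printing it:

```javascript
// Word boundaries keep R0010 from matching inside CSR00105.
const rPat = /\b(R\d{4})\b/;
const pPat = /\b(P\.\d{2}\.\d{2}\.\d{2})\b/;

// Tries pPat first, then rPat, mirroring the order of the Scala cases.
function matcher(s) {
  const m = s.match(pPat) || s.match(rPat);
  return m ? m[1] : "no match";
}

console.log(matcher(" P.25.01.21 - Hello whats going on?"));           // P.25.01.21
console.log(matcher(" CP.2334565.01124.212 - Hello whats going on?")); // no match
console.log(matcher("R0010 test test 3,870"));                         // R0010
console.log(matcher("CSR00105 test test 3,870"));                      // no match
```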
Introduce capturing groups in your patterns; the surrounding .* lets the pattern match the whole string, as a match block requires:
val rPat=".*([R]{1}[0-9]{4}).*".r
val pPat=".*([P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}).*".r
...
scala> matcher(pSt)
P.25.01.21
scala> matcher(rSt)
R0010
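One thing to keep in mind with this approach: the leading .* is greedy, so when a line contains several candidates the group captures the last one. A JavaScript sketch, with ^ and $ mimicking Scala's whole-string matching; the two-ID sample line is made up for illustration:

```javascript
// Greedy prefix: .* backtracks only as far as needed, so the last candidate wins.
const greedy = /^.*(R\d{4}).*$/;
// Lazy prefix: .*? expands only as far as needed, so the first candidate wins.
const lazy = /^.*?(R\d{4}).*$/;

const line = "R0010 test R0300 test";
console.log(line.match(greedy)[1]); // R0300
console.log(line.match(lazy)[1]);   // R0010
```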
If the code is written as follows, it produces the desired outcome, because findAllIn searches for matches anywhere in the string instead of requiring the whole string to match. The API documentation I followed is http://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html
//should match R0010, R0100,R0300 etc
val rPat="[R]{1}[0-9]{4}".r
// should match P.25.01.21 , P.27.03.25 etc
val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r
def main(args: Array[String]): Unit = {
  val pSt = " P.25.01.21 - Hello whats going on?"
  val pPatMatches = pPat.findAllIn(pSt)
  pPatMatches.foreach(println)
  val rSt = "R0010 test test 3,870"
  val rPatMatches = rPat.findAllIn(rSt)
  rPatMatches.foreach(println)
}
Please let me know if that works for you.
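The closest JavaScript counterpart to findAllIn is matchAll with the g flag, which also returns every non-overlapping match found anywhere in the string. A sketch; findAll is a hypothetical helper name. Like the Scala patterns in this answer, these have no word boundaries, so R1234 is also found inside CSR123456:

```javascript
const rPat = /R\d{4}/g;
const pPat = /P\.\d{2}\.\d{2}\.\d{2}/g;

// Collects the full text of every match, scanning the whole string.
const findAll = (re, s) => Array.from(s.matchAll(re), m => m[0]);

console.log(findAll(pPat, " P.25.01.21 - Hello whats going on?").join(", ")); // P.25.01.21
console.log(findAll(rPat, "R0010 test test 3,870").join(", "));               // R0010
```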

How to manage partial matching with regex?

I have a regex like this:
val myregex = "This is a (.*) text for (.*) and other thing like .*".r
If I run:
> val myregex(a,b) = "This is a test text for something and other thing like blah blah"
a: String = test
b: String = something
That works, but it fails if b is missing:
> val myregex(a,b) = "This is a test text for and other thing like blah blah"
scala.MatchError: This is a test text for and other thing like blah blah (of class java.lang.String)
... 33 elided
Is there a way to keep, for example, the value a and replace b with a fallback value (and vice versa)? Or is the only solution to split the regex into two distinct regexes?
Your original regex requires two consecutive spaces between for and and when b is empty.
You may change your regex to actually match the string with an optional pattern by wrapping the space and the subsequent (.*) pattern in a non-capturing group and applying the ? quantifier to it to make it optional:
val myregex = "This is a (.*) text for(?: (.*))? and other thing like .*".r
val x = "This is a test text for and other thing like blah blah"
x match {
  case myregex(a, b) => print(s"${a} -- ${b}")
  case _ => print("none")
}
// => test -- null
Here, there is a match, but b is just null since the second capturing group did not participate in the match (and did not get initialized).
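The same optional group works in JavaScript, where a group that did not participate is undefined rather than null, so ?? can supply the fallback. A sketch; extract and the "fallback" default are hypothetical, and ^...$ mirrors Scala's requirement that the extractor pattern match the whole string:

```javascript
const myregex = /^This is a (.*) text for(?: (.*))? and other thing like .*$/;

// Returns [a, b], or null if the string does not match at all.
function extract(s, fallback = "fallback") {
  const m = s.match(myregex);
  if (m === null) return null;
  // A non-participating group is undefined here, so ?? swaps in the fallback.
  return [m[1] ?? fallback, m[2] ?? fallback];
}

console.log(JSON.stringify(extract("This is a test text for something and other thing like blah blah")));
// ["test","something"]
console.log(JSON.stringify(extract("This is a test text for and other thing like blah blah")));
// ["test","fallback"]
```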
Or the only solution is splitting the regex in two distincts regexs?
This is the only solution. Your best bet is probably to use pattern matching:
// r1 and r2 stand for the two separate regexes (their definitions are not shown)
("This is a test text for something", "and other thing like blah blah") match {
  case (r1(a), r2(b)) => (a, b)
  case (r1(a), _) => (a, "fallback")
}
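The answer does not define r1 and r2. One hypothetical way to split the pattern is to give each value its own regex and apply both to the whole string, substituting a fallback for whichever one fails. A JavaScript sketch; the regexes and the extractSeparately name are assumptions, and (\S+) treats each value as a single word, unlike the original (.*):

```javascript
// Hypothetical split of the original pattern: one regex per value.
const r1 = /This is a (\S+) text for/;
const r2 = /text for (\S+) and other thing like/;

// Each regex is tried independently, so a missing a or a missing b gets its own fallback.
function extractSeparately(s, fallback = "fallback") {
  const a = s.match(r1);
  const b = s.match(r2);
  return [a ? a[1] : fallback, b ? b[1] : fallback];
}

console.log(JSON.stringify(extractSeparately("This is a test text for something and other thing like blah blah")));
// ["test","something"]
console.log(JSON.stringify(extractSeparately("This is a test text for and other thing like blah blah")));
// ["test","fallback"]
```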