Extract date from string using Regex.named_captures - regex

I would like to take a string like "My String 2022-01-07" and extract the date part into a named capture.
I've tried the following regex, but it only works when there's an exact match:
# Does not work
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
# Works
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "2021-01-01")
%{"date" => "2021-01-01"}
I've also tried this without luck:
iex> Regex.named_captures(~r/([a-zA-Z0-9 ]+?)(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
Is there a way to use named captures to extract the date part of a string when you don't care about the characters surrounding the date?
I think I'm looking for a regex that will work like this:
iex> Regex.named_captures(REGEX???, "My String 2021-01-01 Other Parts")
%{"date" => "2021-01-01"}

You want
Regex.named_captures(~r/(?<date>\$?\d{4}-\d{2}-\d{2})/, "My String 2021-01-01")
Your regex - (?<date>\$?(\d{4}-\d{2}-\d{2})?) - is a named capturing group whose name is date and whose pattern is \$?(\d{4}-\d{2}-\d{2})?. That pattern matches:
\$? - an optional $ char
(\d{4}-\d{2}-\d{2})? - an optional sequence of four digits, -, two digits, -, two digits.
Since the pattern is not anchored (it does not have to match the whole string) and both consecutive parts are optional and can therefore match an empty string, the ~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/ regex matches the first empty location (an empty string) at the start of the "My String 2021-01-01" string.
Rule of thumb: if you do not want to match an empty string, make sure your pattern contains at least one obligatory part that must match at least one char.
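The same effect can be reproduced outside Elixir. Below is a minimal sketch in Python, assuming the standard `re` module (where named groups are written `(?P<name>...)`); the helper name `extract_date` is made up for illustration. It only shows that the all-optional pattern succeeds with an empty match at position 0, while the pattern with mandatory digits has to scan forward to the date.

```python
import re

text = "My String 2021-01-01 Other Parts"

# Every part of this pattern is optional, so it can match the empty string.
optional = re.compile(r"(?P<date>\$?(\d{4}-\d{2}-\d{2})?)")
# Here the digits are mandatory, so an empty match is impossible.
required = re.compile(r"(?P<date>\$?\d{4}-\d{2}-\d{2})")


def extract_date(pattern, s):
    """Return the named 'date' group of the first match, or None (hypothetical helper)."""
    m = pattern.search(s)
    return m.group("date") if m else None


print(repr(extract_date(optional, text)))  # ''
print(repr(extract_date(required, text)))  # '2021-01-01'
```

Note that the surrounding text ("My String", "Other Parts") needs no pattern of its own, because an unanchored search skips over it.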

Extract Date only:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4})");
  Iterable<RegExpMatch> matches = dateRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output:
1/19/2023
Extract Date and time:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateTimeRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)");
  Iterable<RegExpMatch> matches = dateTimeRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output: 1/19/2023 9:29:11 AM

Related

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;

fn main() {
    let re = Regex::new("[a-zA-Z]{5}").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;

fn main() {
    let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because it matches any 5 consecutive alphabetic chars anywhere in the string; in your case it can match, for example, the "shoul" in "shouldn't" and the "retur" in "return".
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.
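The difference between "matches somewhere" and "matches the whole string" can be sketched in Python as well. This is only an illustration with the standard `re` module, not the Rust crate from the question: `search` behaves like the unanchored `is_match`, and `fullmatch` stands in for the `^...$` version. The two function names are invented for this example.

```python
import re

FIVE_LETTERS = re.compile(r"[a-zA-Z]{5}")


def contains_five_letters(s):
    """True if a run of five ASCII letters occurs anywhere in s."""
    return FIVE_LETTERS.search(s) is not None


def is_exactly_five_letters(s):
    """True only if the whole string consists of exactly five ASCII letters."""
    return FIVE_LETTERS.fullmatch(s) is not None


s = "this-shouldn't-return-true#"
print(contains_five_letters(s))           # True
print(FIVE_LETTERS.search(s).group())     # shoul
print(is_exactly_five_letters(s))         # False
print(is_exactly_five_letters("hello"))   # True
```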

Dart Regex: Only allow dot and numbers

I need to format the price string in dart.
String can be: ₹ 2,19,990.00
String can be: $1,114.99
String can be: $14.99
What I tried:
void main() {
  String str = "₹ 2,19,990.00";
  RegExp regexp = RegExp("(\\d+[,.]?[\\d]*)");
  RegExpMatch? match = regexp.firstMatch(str);
  str = match!.group(1)!;
  print(str);
}
Actual output: 2,19
Actual output: 1,114
Actual output: 14.99
Expected output: 219990.00
Expected output: 1114.99
Expected output: 14.99 (This one is correct because there is no comma)
The simplest solution would be to replace all non-digit/non-dot characters with nothing.
The most efficient way to do that is:
final re = RegExp(r"[^\d.]+");
String sanitizeCurrency(String input) => input.replaceAll(re, "");
You can't do it by matching because a match is always contiguous in the source string, and you want to omit the embedded ,s.
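To make the "a match is always contiguous" point concrete, here is a small Python sketch (standard `re` module, assumed here only for illustration; `sanitize_currency` mirrors the Dart function name above). A single match stops at the second comma, while deleting every character that is not a digit or a dot gives the expected result.

```python
import re

NON_PRICE_CHARS = re.compile(r"[^\d.]+")


def sanitize_currency(s):
    """Delete every run of characters that are neither digits nor dots."""
    return NON_PRICE_CHARS.sub("", s)


# A single match is contiguous, so it cannot skip the embedded commas.
print(re.search(r"(\d+[,.]?\d*)", "₹ 2,19,990.00").group(1))  # 2,19

for price in ["₹ 2,19,990.00", "$1,114.99", "$14.99"]:
    print(sanitize_currency(price))
# 219990.00
# 1114.99
# 14.99
```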
You can use this regex for search:
^\D+|(?<=\d),(?=\d)
And replace with an empty string i.e. "".
RegEx Details:
^: Start
\D+: Match 1+ non-digit characters
|: OR
(?<=\d),(?=\d): Match a comma if it is surrounded by digits on both sides
Code: Using replaceAll method:
str = str.replaceAll(RegExp(r'^\D+|(?<=\d),(?=\d)'), '');
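As a rough sketch of how this search-and-replace behaves, the same regex can be tried in Python (standard `re` module; the function name `strip_symbol_and_commas` is made up for this example). Unlike deleting everything that is not a digit or dot, this pattern only touches the leading non-digits and the commas between digits, so any trailing text is kept. The "$14.99 USD" input is an assumed extra case, not one from the question.

```python
import re

LEADING_OR_INNER_COMMA = re.compile(r"^\D+|(?<=\d),(?=\d)")


def strip_symbol_and_commas(s):
    """Remove leading non-digits and any comma that sits between two digits."""
    return LEADING_OR_INNER_COMMA.sub("", s)


print(strip_symbol_and_commas("₹ 2,19,990.00"))  # 219990.00
print(strip_symbol_and_commas("$1,114.99"))      # 1114.99
print(strip_symbol_and_commas("$14.99 USD"))     # 14.99 USD
```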

How to return/print matches on a string in RegEx in Flutter/Dart? [duplicate]

I want to return every match of a pattern in Flutter using a regex. I tested the expression on the same string in a regex tester, where it matched everything from 'text:' up to the '}' character, but the Flutter application does not print the matched text.
The code I am using:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"text:(.+?(?=}))");
print("allMatches : "+exp.allMatches(myString).toString());
The print statement outputs I/flutter ( 5287): allMatches : (Instance of '_RegExpMatch', Instance of '_RegExpMatch')
instead of text: PM
Instead of using a non greedy match with a lookahead, I would suggest using a negated character class matching any char except } in capture group 1, and match the } after the group to prevent some backtracking.
\b(text:[^}]+)}
You can loop the result from allMatches and print group 1:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"\b(text:[^}]+)}");
for (var m in exp.allMatches(myString)) {
  print(m[1]);
}
Output
text: PM
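The same idea (iterate over the match objects and read group 1 from each, rather than printing the match objects themselves) looks like this in Python. This is only a sketch with the standard `re` module; the helper `text_fields` and the second `text: AM` entry in the sample string are assumptions added to show more than one match.

```python
import re

# Negated character class: everything up to, but not including, the closing brace.
TEXT_FIELD = re.compile(r"\b(text:[^}]+)}")


def text_fields(s):
    """Return group 1 of every match (hypothetical helper)."""
    return [m.group(1) for m in TEXT_FIELD.finditer(s)]


sample = "{boundingBox: 150,39,48,25, text: PM}, {boundingBox: 10,20,30,40, text: AM},"
print(text_fields(sample))  # ['text: PM', 'text: AM']
```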
You need to use the map method to retrieve the strings from the matches:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"text:(.+?(?=}))");
final matches = exp.allMatches(myString).map((m) => m.group(0)).toString();
print("allMatches : $matches");

Scala regex : capture between group

In the regex below I need "test" as the output, but it returns the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
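For comparison, the anchored/unanchored distinction can be sketched in Python with the standard `re` module, where `search` plays the role of the `.unanchored` match and `fullmatch` requires the whole string to match. The helper name `outer_name` is invented for this example.

```python
import re

OUTER = re.compile(r"\{outer\.([^{}]*)\}")


def outer_name(s):
    """Return the text captured after '{outer.', or '' when nothing matches."""
    m = OUTER.search(s)  # partial match anywhere in the string
    return m.group(1) if m else ""


print(outer_name("try {outer.test}"))            # test
print(OUTER.fullmatch("try {outer.test}"))       # None
print(OUTER.fullmatch("{outer.test}").group(1))  # test
```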
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so a lookbehind is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
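A minimal Python sketch of the lookbehind variant, assuming the standard `re` module (which, like the note above about fixed length suggests, only accepts fixed-width lookbehinds). The whole match is already the wanted text, so no capture group is needed; `inner_name` is a hypothetical helper name.

```python
import re

# The lookbehind checks for '{outer.' without making it part of the match.
INNER = re.compile(r"(?<=\{outer\.)[^{}]*")


def inner_name(s):
    """Return the whole match (group 0), or '' when nothing matches."""
    m = INNER.search(s)
    return m.group(0) if m else ""


print(inner_name("try {outer.test}"))  # test
```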

Match a literal string but allow certain characters to be missing

In the end I decided to strip out the invalid characters from the "haystack", as this is not possible with standard regex.
I have to capture the following: "Capture... Test: Something". The literal string I have to match with is "Capture... Test Something".
The issue is that the match fails because the : is missing. The : could be any one of a few characters (*, /, ?, :, ", <, >, |) that have previously been stripped out of the literal string "Capture... Test Something".
How would I allow the capture of a literal string but allow the few characters listed above not to match?
Note: The only thing I can use to match with is "Capture... Test Something", and in the end I need to return a match of "Capture... Test: Something".
I'm unable to modify "Capture... Test Something"
I'm trying to use http://kodi.wiki/view/Scrapers to match for a title
If you have an input string to match against, you can construct a regular expression out of it, by first escaping the string, and then putting optional quantifiers after the characters you want to make optional:
var search = "Capture... Test: Something";
var input = "Capture... Test Something";
search = Regex.Escape(search);
search = Regex.Replace(search, @"[*/?:""<>|]", "$0?");
var match = Regex.Match(input, search);
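The escape-then-make-optional idea can be sketched in Python too. This assumes the standard `re` module, with `re.escape` standing in for `Regex.Escape`; the function name `tolerant_pattern` is made up for illustration. Each of the listed characters gets a `?` appended after escaping, so the resulting pattern matches the text with or without them.

```python
import re

OPTIONAL_CHARS = re.compile(r'[*/?:"<>|]')


def tolerant_pattern(literal):
    """Build a regex matching `literal` even if the listed characters are missing."""
    escaped = re.escape(literal)  # neutralise regex metacharacters first
    # Append '?' to each listed character; an escaped one such as \* becomes \*?
    return re.compile(OPTIONAL_CHARS.sub(lambda m: m.group(0) + "?", escaped))


pattern = tolerant_pattern("Capture... Test: Something")
print(pattern.search("Capture... Test Something").group())   # Capture... Test Something
print(pattern.search("Capture... Test: Something").group())  # Capture... Test: Something
```

The order matters: escaping first means the `?` added afterwards is a real quantifier, not a literal question mark.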
Another approach would be to strip all the optional characters from both strings and then check these:
var search = "Capture... Test: Something";
var input = "Capture... Test Something";
search = Regex.Replace(search, @"[*/?:""<>|]", string.Empty);
input = Regex.Replace(input, @"[*/?:""<>|]", string.Empty);
var index = input.IndexOf(search);
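A rough Python equivalent of this strip-both-sides approach, assuming the standard `re` module; the helper names `strip_invalid` and `find_ignoring_invalid` are hypothetical. One caveat the sketch makes visible: the returned index refers to the stripped haystack, not to the original string.

```python
import re

INVALID_CHARS = re.compile(r'[*/?:"<>|]')


def strip_invalid(s):
    """Remove the characters that may have been stripped from the literal."""
    return INVALID_CHARS.sub("", s)


def find_ignoring_invalid(haystack, needle):
    """Index of needle in the stripped haystack, or -1 if it is not found."""
    return strip_invalid(haystack).find(strip_invalid(needle))


print(find_ignoring_invalid("Capture... Test: Something", "Capture... Test Something"))         # 0
print(find_ignoring_invalid("Intro: Capture... Test: Something", "Capture... Test Something"))  # 6
```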