Match a literal string but allow certain characters to be missing - regex

In the end, I decided to strip the invalid characters out of the "haystack", because this is not possible with standard regex.
I have to capture "Capture... Test: Something", but the literal string I have to match with is "Capture... Test Something".
The match fails because the : is missing. The : could be any one of a few characters (*, /, ?, :, ", <, >, |) that were previously stripped out of the literal string "Capture... Test Something".
How can I match a literal string while allowing the characters listed above to be missing?
Note: the only thing I can use to match with is "Capture... Test Something", and in the end I need to return a match of "Capture... Test: Something".
I'm unable to modify "Capture... Test Something".
I'm trying to use Kodi scrapers (http://kodi.wiki/view/Scrapers) to match a title.

If you have an input string to match against, you can construct a regular expression from it by first escaping the string and then putting an optional quantifier (?) after each character you want to make optional:
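using System.Text.RegularExpressions;

var search = "Capture... Test: Something";
var input = "Capture... Test Something";
search = Regex.Escape(search);                          // treat the text literally
search = Regex.Replace(search, @"[*/?:""<>|]", "$0?");  // make each listed character optional
var match = Regex.Match(input, search);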
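For readers working in JavaScript, the same escape-then-make-optional idea can be sketched as follows. JavaScript has no built-in counterpart to Regex.Escape, so escapeRegExp below is a hypothetical helper using the commonly cited set of metacharacters; findWithOptionalChars is likewise an illustrative name, not a library function.

```javascript
// Hypothetical helper: escape every regex metacharacter so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern from `search` in which each of * / ? : " < > | becomes optional,
// then look for that pattern anywhere in `input`.
function findWithOptionalChars(search, input) {
  const escaped = escapeRegExp(search);
  // After escaping, * ? | appear as \* \? \| ; appending "?" right after the
  // character makes the (escaped or plain) character optional.
  const pattern = escaped.replace(/[*\/?:"<>|]/g, "$&?");
  const match = input.match(new RegExp(pattern));
  return match ? match[0] : null;
}

console.log(findWithOptionalChars("Capture... Test: Something", "Capture... Test Something"));
// Capture... Test Something
```

As in the C# version, the pattern is built from the string that contains the extra characters and is then matched against the stripped one.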
Another approach would be to strip all the optional characters from both strings and then look for one inside the other:
using System.Text.RegularExpressions;

var search = "Capture... Test: Something";
var input = "Capture... Test Something";
// Remove the optional characters from both strings.
search = Regex.Replace(search, @"[*/?:""<>|]", string.Empty);
input = Regex.Replace(input, @"[*/?:""<>|]", string.Empty);
// -1 means no match.
var index = input.IndexOf(search);
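A JavaScript sketch of the stripping approach, under the same assumption about which characters are optional. Note that the returned index refers to positions in the stripped input, not in the original string; indexIgnoringChars is a hypothetical name.

```javascript
// The characters treated as optional by the answer above.
const OPTIONAL_CHARS = /[*\/?:"<>|]/g;

// Remove the optional characters from both strings, then search with indexOf.
// Returns an index into the *stripped* input, or -1 when there is no match.
function indexIgnoringChars(search, input) {
  const strippedSearch = search.replace(OPTIONAL_CHARS, "");
  const strippedInput = input.replace(OPTIONAL_CHARS, "");
  return strippedInput.indexOf(strippedSearch);
}

console.log(indexIgnoringChars("Capture... Test: Something", "Capture... Test Something")); // 0
```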

Related

Extract date from string using Regex.named_captures

I would like to take a string like "My String 2022-01-07" and extract the date part into a named capture.
I've tried the following regex, but it only works when there's an exact match:
# Does not work
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
# Works
iex> Regex.named_captures(~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "2021-01-01")
%{"date" => "2021-01-01"}
I've also tried this without luck:
iex> Regex.named_captures(~r/([a-zA-Z0-9 ]+?)(?<date>\$?(\d{4}-\d{2}-\d{2})?)/, "My String 2021-01-01")
%{"date" => ""}
Is there a way to use named captures to extract the date part of a string when you don't care about the characters surrounding the date?
I think I'm looking for a regex that will work like this:
iex> Regex.named_captures(REGEX???, "My String 2021-01-01 Other Parts")
%{"date" => "2021-01-01"}
You want
Regex.named_captures(~r/(?<date>\$?\d{4}-\d{2}-\d{2})/, "My String 2021-01-01")
Your regex, (?<date>\$?(\d{4}-\d{2}-\d{2})?), is a named capturing group with the name date and the pattern \$?(\d{4}-\d{2}-\d{2})?. That pattern matches:
\$? - an optional $ char
(\d{4}-\d{2}-\d{2})? - an optional sequence of four digits, -, two digits, -, two digits.
Since the pattern is not anchored (it does not have to match the whole string) and both consecutive parts are optional, and can therefore match an empty string, the ~r/(?<date>\$?(\d{4}-\d{2}-\d{2})?)/ regex matches the first empty location (an empty string) at the start of the "My String 2021-01-01" string.
Rule of thumb: if you do not want to match an empty string, make sure your pattern contains at least one obligatory part that must match at least one character.
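To see the rule of thumb in action, here is a JavaScript translation of both patterns (JavaScript has supported named groups since ES2018, read via match.groups). It illustrates the regex behaviour only and is not Elixir code; the variable names are made up for the example.

```javascript
const text = "My String 2021-01-01 Other Parts";

// Every part of this pattern is optional, so the engine is satisfied with an
// empty match at index 0 and never reaches the date.
const allOptional = /(?<date>\$?(\d{4}-\d{2}-\d{2})?)/;

// Here the date digits are obligatory, so the engine must advance to the date.
const dateRequired = /(?<date>\$?\d{4}-\d{2}-\d{2})/;

const emptyMatch = text.match(allOptional);
const dateMatch = text.match(dateRequired);

console.log(JSON.stringify(emptyMatch.groups.date), emptyMatch.index); // "" 0
console.log(dateMatch.groups.date, dateMatch.index); // 2021-01-01 10
```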
Extract Date only:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4})");
  Iterable<RegExpMatch> matches = dateRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output:
1/19/2023
Extract Date and time:
void main() {
  String inputString = "Your String 1/19/2023 9:29:11 AM";
  RegExp dateTimeRegex = new RegExp(r"(\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)");
  Iterable<RegExpMatch> matches = dateTimeRegex.allMatches(inputString);
  for (RegExpMatch m in matches) {
    print(m.group(0));
  }
}
This will output: 1/19/2023 9:29:11 AM

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions:
If the pattern myvalue=123& is found, extract myvalue=123.
If myvalue is present at the end of the line without "&", extract myvalue=123.
For example:
For the string abcdmyvalue=123&xyz, it should return myvalue=123.
For the string abcdmyvalue=123, it should return myvalue=123.
The first scenario works for me with the following regex: myvalue*=(.*?(?=[&,""]))
How can I modify this regex to cover the second scenario as well? I am using https://regex101.com/ to test it.
Thanks in advance!
Some notes about the pattern that you tried:
If you only want a match, you can omit the capture group.
e* matches the character e zero or more times.
The part .*?(?=[&,""]) matches as few characters as possible until it can assert either &, a comma or " to the right, so the positive lookahead requires one of those characters to follow the value. At the end of the string there is no such character, so the match fails.
You could shorten the pattern to a match only, using a negated character class that matches zero or more of any character except whitespace or &:
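The lookahead behaviour described above can be checked in JavaScript. The sketch compares the lookahead from the question (with the duplicated " removed from the class and without the e* prefix) against a variant that also accepts the end of the string via |$; that variant is an assumption added for illustration, not part of the original answer.

```javascript
// Returns the whole match, or null when the pattern does not match.
function firstMatch(re, s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

// The lookahead needs one of "&", "," or '"' after the value, so it cannot
// succeed when the value runs to the end of the string.
const lookaheadOnly = /myvalue=(.*?(?=[&,"]))/;

// Assumed variant: also accept the end of the string ($) as a terminator.
const lookaheadOrEnd = /myvalue=(.*?(?=[&,"]|$))/;

for (const s of ["abcdmyvalue=123&xyz", "abcdmyvalue=123"]) {
  console.log(s, firstMatch(lookaheadOnly, s), firstMatch(lookaheadOrEnd, s));
}
// abcdmyvalue=123&xyz myvalue=123 myvalue=123
// abcdmyvalue=123 null myvalue=123
```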
myvalue=[^&\s]*
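A quick JavaScript check of the shortened pattern on both example strings; extractMyValue is a hypothetical helper name.

```javascript
// "myvalue=" followed by any run of characters that are neither "&" nor whitespace.
const MY_VALUE = /myvalue=[^&\s]*/;

function extractMyValue(s) {
  const m = s.match(MY_VALUE);
  return m ? m[0] : null; // null when "myvalue=" does not occur
}

console.log(extractMyValue("abcdmyvalue=123&xyz")); // myvalue=123
console.log(extractMyValue("abcdmyvalue=123")); // myvalue=123
```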
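function regex(data) {
  // Capture the value between "=" and the first "&" that follows it.
  var test = data.match(/=([^&]*)&/);
  if (test === null) {
    // No "&": take everything after the "=".
    return data.split('=')[1];
  } else {
    return test[1];
  }
}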
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & in the string, match returns null and the else branch runs, which simply splits the string on = and returns the part after it. If an & is present, the regex extracts the value between = and &. Note that this returns only the value (123), not myvalue=123.
If you prefer a single regex, you can use an alternation:
var test = data.match(/=([^&]*)&|=(.*)/);
// Group 1 is set when "&" follows the value; otherwise group 2 is.
const result = test[1] !== undefined ? test[1] : test[2];
console.log(result);

Dart Regex: Only allow dot and numbers

I need to format the price string in Dart.
String can be: ₹ 2,19,990.00
String can be: $1,114.99
String can be: $14.99
What I tried:
void main() {
  String str = "₹ 2,19,990.00";
  RegExp regexp = RegExp("(\\d+[,.]?[\\d]*)");
  RegExpMatch? match = regexp.firstMatch(str);
  str = match!.group(1)!;
  print(str);
}
Actual output: 2,19
Actual output: 1,114
Actual output: 14.99
Expected output: 219990.00
Expected output: 1114.99
Expected output: 14.99 (This one is correct because there is no comma)
The simplest solution would be to replace all non-digit/non-dot characters with nothing.
The most efficient way to do that is:
final re = RegExp(r"[^\d.]+");
String sanitizeCurrency(String input) => input.replaceAll(re, "");
You can't do it by matching, because a match is always contiguous in the source string and you want to omit the embedded commas.
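The same replace-everything-else approach in JavaScript, as a sketch. The g flag is needed because String.prototype.replace otherwise replaces only the first match. Like the Dart version, it also drops characters such as a leading minus sign, which may or may not matter for your data; sanitizeCurrency mirrors the Dart name but is not a library function.

```javascript
// Every run of characters that is neither a digit nor a dot.
const NON_NUMERIC = /[^\d.]+/g;

function sanitizeCurrency(input) {
  return input.replace(NON_NUMERIC, "");
}

for (const price of ["₹ 2,19,990.00", "$1,114.99", "$14.99"]) {
  console.log(sanitizeCurrency(price));
}
// 219990.00
// 1114.99
// 14.99
```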
You can use this regex for search:
^\D+|(?<=\d),(?=\d)
And replace with an empty string i.e. "".
RegEx Details:
^: Start
\D+: Match 1+ non-digit characters
|: OR
(?<=\d),(?=\d): Match a comma if it is surrounded by digits on both sides
Code: Using replaceAll method:
str = str.replaceAll(RegExp(r'^\D+|(?<=\d),(?=\d)'), '');
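In JavaScript, which supports lookbehind since ES2018, the same search-and-replace can be written as below. Unlike Dart's replaceAll, String.prototype.replace needs the g flag to replace every comma; normalizePrice is a hypothetical name.

```javascript
// Leading non-digit prefix (currency symbol, spaces) OR a comma between two digits.
const PRICE_NOISE = /^\D+|(?<=\d),(?=\d)/g;

function normalizePrice(str) {
  return str.replace(PRICE_NOISE, "");
}

console.log(normalizePrice("₹ 2,19,990.00")); // 219990.00
console.log(normalizePrice("$1,114.99")); // 1114.99
```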

Regular expression with a specific character and without another

I'm trying to implement escape-character functionality in a macro generator I'm writing in Dart. For example, I would like the program to find all occurrences of '&param' in my string and replace them with 'John', unless the '&' character is preceded by the escape character '\'. Example: "My name is &param and my parameter is called \&param." -> "My name is John and my parameter is called &param." What would be the regular expression to catch all substrings that consist of '&' followed by my parameter's name, without a preceding '\'?
It's possible to match that, even accounting for escaped backslashes, with:
var re = RegExp(r"(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+");
This uses negative lookbehind to find a & followed by word-characters, and not preceded by an odd number of backslashes.
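A JavaScript sketch of this lookbehind (JavaScript, like Dart, allows variable-length lookbehind). String.raw keeps the backslashes in the sample text exactly as typed; the sample string is made up for the example.

```javascript
// "&" plus word characters, unless preceded by an odd number of backslashes.
const UNESCAPED_PARAM = /(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+/g;

// Unescaped, escaped, after an escaped backslash, escaped again.
const sample = String.raw`&a \&b \\&c \\\&d`;
const found = [...sample.matchAll(UNESCAPED_PARAM)].map((m) => m[0]);

console.log(found.join(" ")); // &a &c
```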
More likely, you want to also recognize double-backslashes and convert them to single-backslashes. That's actually easier if you try to find all matches, because then you know all preceding double-backslashes are part of an earlier match:
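var re = RegExp(r"\\[\\&]|&\w+");
This, when used as re.allMatches, will find every escape sequence (\\ or \&) and every &word that is not part of one. Matches are found left to right, so an & that follows an escaping backslash has already been consumed by that escape, and no lookbehind is needed. (A lookbehind such as (?<!\\) inspects the raw text, so it would also reject the & in \\&word, whose preceding backslash is itself escaped.)
var _re = RegExp(r"\\([\\&])|&(\w+)");
String template(String input, Map<String, String> values) {
  return input.replaceAllMapped(_re, (m) {
    // Escape sequence (\\ or \&): drop the backslash, keep the escaped character.
    var escaped = m[1];
    if (escaped != null) return escaped;
    var replacement = values[m[2]!];
    if (replacement != null) return replacement;
    // Do nothing for undefined words.
    return m[0]!;
  });
}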
(You might also want to allow something like &{foo} if parameters can occur next to other characters, like &{amount}USD.)
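As a JavaScript sketch of the scan-all-matches idea: escapes are matched as their own tokens (a backslash followed by \ or &), so any & that the left-to-right scan reaches is unescaped and no lookbehind is needed. The regex, the function name template and the sample values are illustrative assumptions.

```javascript
// One token per match: an escape (backslash + "\" or "&") or an "&name" parameter.
const TOKEN = /\\([\\&])|&(\w+)/g;

function template(input, values) {
  return input.replace(TOKEN, (match, escaped, name) => {
    // Escape sequence: drop the backslash and keep the escaped character.
    if (escaped !== undefined) return escaped;
    // Known parameter: substitute it; unknown names are left as written.
    return Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match;
  });
}

const source = String.raw`My name is &param and my parameter is called \&param.`;
console.log(template(source, { param: "John" }));
// My name is John and my parameter is called &param.
```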
To keep the character before &param when it is not a backslash, you need a so-called capturing group: a subexpression of a regular expression inside parentheses. To use the captured text in the replacement in Dart, use the replaceAllMapped method. The case where the template starts with &param is handled by matching the beginning of the string (^) instead.
Try this:
void main() {
  final template = 'My name is &param and my parameter is called \\&param.';
  // Group 1 keeps the character before &param (empty at the start of the string).
  final populatedTemplate = template.replaceAllMapped(RegExp(r'(^|[^\\])&param\b'), (match) {
    return '${match.group(1)}John';
  });
  // Unescape the remaining \&param to a literal &param.
  final result = populatedTemplate.replaceAll(RegExp(r'\\&param\b'), '&param');
  print(result);
}
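The same two-step idea in JavaScript, as a sketch: group 1 carries the preceding character, and a callback is used instead of a "$1John" replacement string so that a $ in the value is not interpreted as a group reference. fillParam is a hypothetical name.

```javascript
function fillParam(text, value) {
  // Group 1 is the start of the string (empty) or a non-backslash character,
  // which is written back in front of the value.
  const filled = text.replace(/(^|[^\\])&param\b/g, (match, before) => before + value);
  // Turn the remaining escaped form \&param into a literal &param.
  return filled.replace(/\\&param\b/g, "&param");
}

const input = String.raw`My name is &param and my parameter is called \&param.`;
console.log(fillParam(input, "John"));
// My name is John and my parameter is called &param.
```

One limitation of the capturing-group version: because the preceding character is consumed, two occurrences with nothing between them (&param&param) overlap and only the first is replaced. A lookbehind such as (?<!\\) avoids consuming that character.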

Regex to capture first substring with the following properties

I'm looking for a regex to capture the first substring with the following properties:
The substring contains no lowercase letters or symbols
The substring is immediately preceded by "..."
The substring is immediately followed by "...\n"
For example, I'd like to capture "FOO BAR" in the following
"...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters."
Use this:
// This will target only capital letters, numbers and spaces,
// and it will capture the first occurrence only
/\.\.\.[A-Z0-9 ]+\.\.\.\n/
Here is an example usage:
var string = "...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters.";
var regex = new RegExp(/\.\.\.[A-Z0-9 ]+\.\.\.\n/);
var res = regex.exec(string);
var result = res[0].substring(3, res[0].length - 4); // strip the leading ... and the trailing ...\n
console.log(result);
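A variant of the same regex with a capturing group around the letters, so no substring offsets are needed; it also returns null instead of throwing when nothing matches. captureFirst is a hypothetical name.

```javascript
// Same pattern, with the capital letters, digits and spaces in group 1.
const SHOUTED = /\.\.\.([A-Z0-9 ]+)\.\.\.\n/;

function captureFirst(text) {
  const m = SHOUTED.exec(text);
  return m ? m[1] : null; // null when nothing qualifies
}

const sample = "...this is TEXT...\n that...\nI DON'T CARE ABOUT...\nbut I do care about...FOO BAR...\nNothing else matters.";
console.log(captureFirst(sample)); // FOO BAR
```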