Insert characters (hyphens) between matches in RegEx

I am using Regular Expressions to find very simple patterns.
However, I want to insert a hyphen between the matched parts.
I'm very familiar with writing RegEx Match patterns, but struggling with how to use RegEx replace to insert characters.
My RegEx is:
(\d{1,2})([A-Z]{1,3})(_)?(\d{3,4})
which matches:
03EM0109
03EM0112
03EM0151
3V204
02SDV_0900
I would like to use RegEx Replace to insert hyphens between the matched parts, giving:
03-EM-0109
03-EM-0112
03-EM-0151
3-V-204
02-SDV-0900
I tried changing the RegEx and adding numbered capture groups around empty patterns between the parts, but when used in a replace function this returns only hyphens. Presumably because an empty capture group doesn't actually capture anything?
Using:
(\d{1,2})()([A-Z]{1,3})()(_)?()(\d{3,4})
And replacing with $2-$4-$5-
Returns three hyphens: - - -
Could someone please help?

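If you use the RegExp (\d{1,2})([A-Z]{1,3})_?(\d{3,4}) and replace with $1-$2-$3, it seems to produce the desired results. I removed the capture group around the underscore, so the underscore is still matched but is not copied to the output. Your attempt returned only hyphens because $2 and $4 refer to empty groups, which always capture an empty string, and $5 is only the optional underscore; the actual text is in groups 1, 3 and 7.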
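A minimal sketch of the same fix using Python's re module (the question doesn't name a language, so Python is an assumption here). Python writes group references in a replacement template as \1 rather than $1, and re.fullmatch is used so each code must match in its entirety. The last line reproduces the question's attempt to show why it yields only hyphens.

```python
import re

# The answer's pattern: the underscore is matched but not captured, so it is dropped.
CODE_RE = re.compile(r"(\d{1,2})([A-Z]{1,3})_?(\d{3,4})")

# The question's attempt: groups 2, 4 and 6 are empty groups that always capture "".
ATTEMPT_RE = re.compile(r"(\d{1,2})()([A-Z]{1,3})()(_)?()(\d{3,4})")


def hyphenate(code: str) -> str:
    """Insert hyphens between the digit, letter and digit parts of one code."""
    m = CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"unexpected code: {code!r}")
    # Reference the groups that actually hold text: 1, 2 and 3.
    return m.expand(r"\1-\2-\3")


codes = ["03EM0109", "03EM0112", "03EM0151", "3V204", "02SDV_0900"]
for code in codes:
    print(hyphenate(code))

# Replacing with the empty groups keeps only the literal hyphens
# (group 5 did not participate here, so Python substitutes "").
print(ATTEMPT_RE.sub(r"\2-\4-\5-", "03EM0109"))  # ---
```

To process a whole block of text at once, CODE_RE.sub(r"\1-\2-\3", text) applies the same replacement to every occurrence.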

Related

regex to negate from matched group

I am trying to use a regex to match everything except the "id":digits part.
I have come up with "(\b(id":)(\d+)\b)" to find the "id":digits pattern, but I need to negate it and haven't been able to work out how.
[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]
Any help is appreciated.
The simplest option is to capture the rest of the string into groups and use them in the substitution, as below.
Demo: https://regex101.com/r/cRVA5C/2/
Pattern: ^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$
Breakdown:
([\s\S]*?): lazily match any characters (including newlines) before and after the "id":digits pair, capturing them into groups \1 and \2.
\s*"id":\d+,?\s*: match "id": followed by one or more digits, optionally preceded by whitespace and optionally followed by a comma and whitespace.
In the substitution, use \1\2 to get the desired output.
Note: Regex may not be the ideal tool for parsing JSON.
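A minimal Python sketch of both routes (Python is an assumption; the question names no language). It assumes one object per line, as in the sample, and compiles the answer's pattern with re.MULTILINE so that ^ and $ match at each line boundary and every line loses its "id" pair. The second half follows the note above and uses the standard json module instead.

```python
import json
import re

DATA = """[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]"""

# The answer's pattern. With re.MULTILINE, ^ and $ match at line boundaries,
# so each line is matched separately and rebuilt from groups 1 and 2.
ID_RE = re.compile(r'^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$', re.MULTILINE)

without_id = ID_RE.sub(r"\1\2", DATA)
print(without_id)

# Parsing the JSON instead does not depend on the line layout.
records = json.loads(DATA)
for record in records:
    record.pop("id", None)  # drop the key if present
print(json.dumps(records, separators=(",", ":")))
```

The regex version relies on the text layout; the json version works regardless of whitespace but re-serializes the data, so the output loses the original line breaks.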

Regex: ignore characters that follow

I'd like to know how I can ignore the characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they don't work because they leave those characters available to other matches, whereas I want them simply... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then "extract" only sample text, and likewise for testing purposes. That is, I want the group to match ""sample text"", but the full match to be sample text. I partially achieved that with the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
This excludes the opening "" (or ") from the full match but still consumes it when matching the group. How can I exclude the closing "" (or ") in the same way?
Note: a positive lookahead does not work: it does not consume the characters, so they remain available to the following matches; it only keeps them out of the current match.
Thanks a lot.
I hope I understood your question correctly. You want to match the whole string including the quotes, but replace or extract only the text without the quotes, right?
You can typically use the regex replace functionality with a capture group to extract just part of the match.
This is the regex expression:
""?(.*?)""?
And this the replace expression:
$1
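A minimal sketch of the answer's approach in Python (language assumed). Python's re module has no \K, so this relies on the capture group only; the replacement template is written \1 instead of $1. re.findall returns group 1 of each match when the pattern has exactly one group, which gives the extraction case.

```python
import re

TEXT = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# The answer's pattern: an opening " or "", the shortest text (group 1),
# then a closing " or "".
QUOTED_RE = re.compile(r'""?(.*?)""?')

# Replacing each whole match with group 1 strips the quotes but keeps the text.
stripped = QUOTED_RE.sub(r"\1", TEXT)
print(stripped)

# To extract only the inner text, read group 1 of each match instead.
inner = QUOTED_RE.findall(TEXT)
print(inner)  # ['sample text', 'testing purposes']
```

Because each match consumes its closing quotes, they cannot be reused as the opening quotes of the next match, which is the behaviour the question was after.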

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match the first occurrence of window.location.replace("http://stackoverflow.com") in an HTML string.
Specifically, I want to capture the URL of the first window.location.replace entry in the whole HTML string.
To capture the URL, I formulated these two rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve this, I think I need a lookbehind (for the first rule) and a lookahead (for the second rule).
I end up with this Regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure it is legal to combine both rules the way I did.
Can you please help me with translating my rules to Regex? Other ways of doing this (without lookahead(behind)) also appreciated.
The pattern you wrote is not the one you need, because it matches something very different from what you expect: the text window.location.redirect("=") in text window.location.redirect("=") something. The ?= inside the lookbehind is not a lookahead; it is parsed as an optional \" followed by a literal =. Also, it will only work in PCRE/Python if you remove the ? before \" (lookbehinds must be fixed-width in PCRE and Python). The ? is accepted in .NET regex, which supports variable-width lookbehind.
If it is JavaScript, note that lookbehind was only added in ES2018, so older engines do not support it at all.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
Without the /g modifier, only the first occurrence is matched. The value you need is in Group 1.
([^"]*) captures zero or more characters other than a double quote (the URLs you need should not contain one). If your URLs may contain a ", use the second approach: (.*?) matches zero or more characters other than a newline, as few as possible, up to the first ").
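A minimal Python sketch of both ideas (Python and the sample HTML are assumptions for illustration). It uses the redirect spelling from the answer's pattern; swap in replace if that is what your HTML contains. re.search stops at the first match, which plays the role of omitting /g. The second pattern shows that a lookbehind and a lookahead can be combined, as long as the lookbehind is fixed-width, which is why the question's optional quote is not kept inside it.

```python
import re

# Hypothetical HTML with two redirects; only the first URL should be returned.
HTML = (
    '<script>window.location.redirect("http://stackoverflow.com");</script>'
    '<script>window.location.redirect("http://example.com");</script>'
)

# Capturing-group approach from the answer: the URL is in group 1.
m = re.search(r'window\.location\.redirect\("([^"]*)"\)', HTML)
first_url = m.group(1) if m else None
print(first_url)

# Lookbehind + lookahead: the whole match is the URL. Python requires the
# lookbehind to have a fixed width, so the opening quote is mandatory here.
m2 = re.search(r'(?<=window\.location\.redirect\(")[^"]*(?="\))', HTML)
first_url_lookaround = m2.group(0) if m2 else None
print(first_url_lookaround)
```

Both print http://stackoverflow.com; the capturing-group version is the more portable of the two.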

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture the text between the last two slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You can use lookaround assertions:
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the regex below and then take the string you want from group 1.
\/([^\/]*)\/[^\/]*$
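A minimal Python sketch of both patterns (Python is an assumption; the question shows slash-delimited literals). Python patterns are plain strings, so / needs no escaping. The function names are hypothetical helpers for illustration.

```python
import re
from typing import Optional

LOOKAROUND_RE = re.compile(r"(?<=/)[^/]*(?=/[^/]*$)")
GROUP_RE = re.compile(r"/([^/]*)/[^/]*$")


def between_last_slashes_lookaround(path: str) -> Optional[str]:
    """The whole match is the text between the last two slashes."""
    m = LOOKAROUND_RE.search(path)
    return m.group(0) if m else None


def between_last_slashes_group(path: str) -> Optional[str]:
    """The slashes are consumed; the wanted text is in group 1."""
    m = GROUP_RE.search(path)
    return m.group(1) if m else None


print(between_last_slashes_lookaround("/1/2/test/"))  # test
print(between_last_slashes_group("/1/2/test/"))  # test
```

Both also work without a trailing slash ("a/b/c/d" gives c), because the part after the last slash is matched by [^/]*$.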
The easy way
Match:
(1) every character that is not a "/". Get what was matched here by putting it in a capturing group, i.e. inside parentheses.
(2) followed by "/" and then the end of the string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
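Both variants can be checked with a short Python sketch (Python is an assumption; any engine with lookahead behaves the same way for these patterns).

```python
import re

PATH = "/1/2/test/"

# "The easy way": capture the run of non-slash characters before the final "/".
easy = re.search(r"([^/]*)/$", PATH).group(1)

# Lookahead version: the final "/" is checked but not consumed, so the whole
# match is the wanted text and no group is needed.
no_group = re.search(r"[^/]*(?=/$)", PATH).group(0)

print(easy, no_group)  # test test
```

Both patterns need the trailing slash; a path like "/1/2/test" would not match either of them.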
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, older JavaScript engines do not support lookbehind. However, the pattern above uses only a lookahead and should work as intended. In PHP, when / is the pattern delimiter, every literal / must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
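A minimal Python sketch of both patterns from this answer (language assumed). re.findall plays the role of the /g flag and returns group 1 of each match.

```python
import re

PATH = "/1/2/test/"

# Every segment: each match consumes its leading "/" and only peeks at the
# next one, so that "/" is still available to start the following match.
segments = re.findall(r"/([^/]*?)(?=/)", PATH)
print(segments)  # ['1', '2', 'test']

# Only the last segment: anchor the closing "/" to the end of the string.
last = re.search(r"/([^/]*?)/$", PATH).group(1)
print(last)  # test
```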

Regular Expression extract substring (.)*

I am trying to match with the following regex.
\d{11}(.*)
That is, any 11 digits followed by a string. I want to extract the trailing string, whatever it is.
I used RE2::FullMatch, but it gives me the first part (the 11 digits). How do I get the substring matched by (.*)?
std::string subStr;
RE2::FullMatch("<sip:+19073381121#216.67.108.201:5060;user=phone>;npi=ISDN", "(<sip:\\+(\\d{11}))(.*)", &subStr);
I am trying to extract everything starting from the # in the string above. Basically I want what (.*) matches, but the call above returns <sip:+19073381121.
I am not very familiar with regex, but I looked at different APIs for extracting substrings and found this one useful.
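Remove the extra capturing groups from your regular expression, so that (.*) is the only one:
<sip:\+\d{11}(.*)
To get the substring matched by (.*), use $1 in a replacement string; that is the first capturing group you specified with the parentheses. With RE2::FullMatch, passing &subStr as the only output argument stores that same group in subStr.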
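For comparison, a minimal sketch in Python rather than C++/RE2 (a swapped-in language, not the asker's API). re.fullmatch, like RE2::FullMatch, requires the whole string to match; with the extra groups removed, group 1 is the trailing text.

```python
import re

SIP = "<sip:+19073381121#216.67.108.201:5060;user=phone>;npi=ISDN"

# \+ matches the literal plus sign; (.*) is now the only capturing group.
m = re.fullmatch(r"<sip:\+\d{11}(.*)", SIP)
sub_str = m.group(1) if m else None
print(sub_str)  # #216.67.108.201:5060;user=phone>;npi=ISDN
```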