Insert a character when capturing group - regex

I want to select a group out of a given string and insert a character in position 5 of that group.
Input String: xxx123456789yyy
Expression: ^x{3}(?<serialno>\d{5}\d{4})y{3}$
Output (serialno): 123456789
Now I want the serialno group to contain an 'A' between the 5th and 6th digits, so that I get '12345A6789' instead of '123456789'. The character is always an 'A', and I want to do this in one regular expression.
Is it possible to do this with match or do I have to call match and replace?

You can't alter a string with a match, so you'll need to use preg_replace:
$output = preg_replace('/^x{3}(\d{5})(\d{4})y{3}$/', '$1A$2', $input);
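For comparison, the same two-group replacement can be sketched in Python. The function name insert_a is made up for this illustration, and \g<1> / \g<2> are simply Python's explicit spelling of the $1 / $2 references used above.

```python
import re

# Two capture groups split the nine digits so that "A" can go between them.
PATTERN = re.compile(r"^x{3}(\d{5})(\d{4})y{3}$")

def insert_a(text: str) -> str:
    """Return the serial number with 'A' inserted after the fifth digit.

    Strings that do not match the pattern are returned unchanged.
    """
    # \g<1> and \g<2> refer to the first and second capture group.
    return PATTERN.sub(r"\g<1>A\g<2>", text)

print(insert_a("xxx123456789yyy"))
```

Because the pattern is anchored and matches the whole string, the surrounding x and y characters are dropped and only the rebuilt serial number remains.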

Related

How to get text that is before and after of a matched group in a regex expression

I have the following regex that matches any number in the string and returns it in the group.
^.*[^0-9]([0-9]+).*$  $1
Is there a way I can also get the text before and after the matched group? My end goal is to reconstruct the string by replacing only the value of the matched group.
For example, given the string /this_text_appears_before/73914774/this_text_appears_after, I want to do something like $before_text[replaced_text]$after_text to generate a final result of /this_text_appears_before/[replaced_text]/this_text_appears_after
You only need a single capture group, which should capture the first part instead of the digits:
^(.*?[^0-9])[0-9]+
In the replacement use group 1 followed by your replacement text \1[replaced_text]
Example
import re

pattern = r"^(.*?[^0-9])[0-9]+"
s = "/this_text_appears_before/73914774/this_text_appears_after"
result = re.sub(pattern, r"\1[replaced_text]", s)
if result:
    print(result)
Output
/this_text_appears_before/[replaced_text]/this_text_appears_after
Other options for the example data can be matching the /
^(.*?/)[0-9]+
Or if you want to match the first 2 occurrences of the /
^(/[^/]+/)[0-9]+
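As a quick check, the three patterns above can be run side by side against the example string. This is only a sketch; the variable names patterns and results are chosen for the illustration.

```python
import re

s = "/this_text_appears_before/73914774/this_text_appears_after"

# Each pattern keeps the text before the digits in group 1.
patterns = [
    r"^(.*?[^0-9])[0-9]+",  # first run of digits that follows a non-digit
    r"^(.*?/)[0-9]+",       # first run of digits directly after a "/"
    r"^(/[^/]+/)[0-9]+",    # digits directly after the leading "/segment/"
]

# Put group 1 back and replace only the digits.
results = [re.sub(p, r"\1[replaced_text]", s) for p in patterns]

for r in results:
    print(r)
```

For this particular input all three patterns give the same result; they differ only in how strictly they describe the text in front of the number.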

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g.
IN: OK THT PHP This is it 06222021
OUT: This is it
IN: NO MTM PYT Get this content 111111
OUT: Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space character
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Example code
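Imports System.Text.RegularExpressions

' Replace the whole line with the content of capture group 1
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)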
Output
This is it
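The same single-group replacement can be tried against both sample rows from the question. The following is a Python sketch rather than VB.Net, and the helper name keep_middle is hypothetical.

```python
import re

# Same pattern as above: three short words, the text to keep, 6-8 digits.
PATTERN = re.compile(r"^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$")

def keep_middle(line: str) -> str:
    """Replace the whole line with the content of capture group 1."""
    return PATTERN.sub(r"\1", line)

for row in ("OK THT PHP This is it 06222021",
            "NO MTM PYT Get this content 111111"):
    print(keep_middle(row))
```

Lines that do not match the pattern are returned unchanged, so a caller may want to test for a match first if that case matters.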
My approach does not use groups and does not use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value; // input is the string to search
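The variable-length lookbehind used here works in .NET, but not every engine accepts it. Python's built-in re module, for instance, only allows fixed-width lookbehind, so a port has to consume the prefix and capture the middle part instead. The sketch below shows that fallback; the names lookbehind_ok and extract are made up for the example.

```python
import re

# Python's re rejects lookbehind patterns whose width can vary.
try:
    re.compile(r"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Fallback: match the three word-space pairs normally, capture the middle
# part, and keep the lookahead for the trailing number.
PATTERN = re.compile(r"^(?:\w+\s){3}(.*?)(?=\s\d+\s?$)")

def extract(line: str):
    """Return the text between the three leading words and the final number."""
    m = PATTERN.match(line)
    return m.group(1) if m else None

print(lookbehind_ok)
print(extract("OK THT PHP This is it 06222021"))
```

The result is the same as with the lookbehind version, at the cost of reading group 1 instead of the whole match.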

Select part of line in regular expression

I have this string:
#1#http://test.ir:8080/srvSC.svc#1#
#2#http://test.ir:8081/srvSC.svc#2#
#3#http://test.ir:8082/srvSC.svc#3#
#4#http://test.ir:8083/srvSC.svc#4#
#5#http://test.ir:8084/srvSC.svc#5#
#6#http://test.ir:8085/srvSC.svc#6#
I want to select all of the #1#, #2#, ... markers, so I wrote this expression: ^(^\#.\#) but it only selects the one on the first line. How can I select both the first #.# and the last #.# of each line?
You can use
^(#\d+#)(.+)\1$
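That will capture the leading #n# marker in the first group, match any characters in the second group, and then require the same text that the first group matched again via the backreference \1. The string between the two markers will be in the second captured group. Enable multiline mode so that ^ and $ match at the start and end of every line rather than only at the start and end of the whole input.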
https://regex101.com/r/7Er0Ch/5
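A short Python sketch of the backreference pattern on the first three sample lines follows. re.MULTILINE is what makes ^ and $ match on every line; the names text and pairs are chosen for the illustration.

```python
import re

text = """#1#http://test.ir:8080/srvSC.svc#1#
#2#http://test.ir:8081/srvSC.svc#2#
#3#http://test.ir:8082/srvSC.svc#3#"""

# Group 1 is the leading marker, \1 demands the same marker at the end.
pattern = re.compile(r"^(#\d+#)(.+)\1$", re.MULTILINE)

# Collect (marker, content) for every line that matches.
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

for marker, content in pairs:
    print(marker, content)
```

A line whose closing marker differs from its opening marker (for example #1#...#2#) would not match at all, which is the point of using \1 instead of repeating #\d+#.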

Regex: set capture to fixed string

I want to match the string a but I want my capture to be b.
To capture a named as id I can easily do:
(?<id>a)
but I want id to be b when the original string was just a, i.e. I want the capture to be characters that aren't in the original string.
For example, in PHP it would look something like:
preg_match('/your_magic/', 'a', $matches);
print $matches['id'] == 'b'; // true
There is no way to get anything into a capturing group that isn't in the input string. Capturing groups are (at least in Perl) partially represented as start/end positions of the original input string.
If the value you want the capturing group to get is in the input string you can do that using lookarounds. The desired string has to be after the match if your regex flavor has a limited lookbehind (like PHP).
For example:
preg_match('/a(?=.*(?<id>b))/', 'a foo b', $matches);
print "matched '$matches[0]', id is '$matches[id]'";
Output:
matched 'a', id is 'b'
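The lookahead trick carries over to other engines as well. Below is a minimal Python sketch of the same idea; note that Python spells named groups (?P<id>...), and the variable names are chosen for the illustration.

```python
import re

# The match itself is just "a"; the named group is filled from text that
# appears later in the input, inside a lookahead that consumes nothing.
m = re.search(r"a(?=.*(?P<id>b))", "a foo b")

matched = m.group(0)
captured = m.group("id")
print(f"matched '{matched}', id is '{captured}'")
```

If the input contains no b after the a, the lookahead fails and there is no match at all, which again shows that the group can only hold text that is present in the input.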

Select first text between two expressions

I want to return the first "abcd" part of the text below.
00abcd126456\x 00abcd126456\x
I want to select all text between the first " 00" and the first (6 digits + "\x"). Every string starts with " 00".
I've been experimenting with:
^ 00(.*)\d{6}\\x
but it obviously selects the whole string.
Please help.
Use a non-greedy quantifier:
^ 00(.*?)\d{6}\\x
*? will match as few characters as possible while still allowing the match to succeed, whereas * will match as many characters as possible.
If you don't want to fiddle around with the capturing group you can also use lookaround:
(?<=^ 00).*?(?=\d{6}\\x)
Quick PowerShell test:
PS> ' 00abcd126456\x 00abcd126456\x' -match '(?<=^ 00).*?(?=\d{6}\\x)'; $Matches
True
Name Value
---- -----
0 abcd
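The difference between the greedy and the non-greedy quantifier can also be seen by running both capture-group patterns on the sample string. This is a Python sketch, with a raw string so that \x stays a literal backslash followed by x; the variable names are made up for the example.

```python
import re

s = r" 00abcd126456\x 00abcd126456\x"

# Greedy: .* grabs as much as it can and only gives back enough for the
# LAST "six digits + \x" to match.
greedy = re.search(r"^ 00(.*)\d{6}\\x", s).group(1)

# Non-greedy: .*? stops at the FIRST place where "six digits + \x" fits.
lazy = re.search(r"^ 00(.*?)\d{6}\\x", s).group(1)

print(repr(greedy))
print(repr(lazy))
```

Only the non-greedy version yields abcd; the greedy one swallows the first terminator and the start of the second record.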