Single RegEx to catch multiple options and replace with their corresponding replacements - regex

The problem goes like this:
value match: 218\d{3}(\d{4})#domain.com replace with 10\1 to get 10 followed by last 4 digits
for example 2181234567 would become 104567
value match: 332\d{3}(\d{4})#domain.com replace with 11\1 to get 11 followed by last 4 digits
for example 3321234567 would become 114567
value match: 420\d{3}(\d{4})#domain.com replace with 12\1 to get 12 followed by last 4 digits
for example 4201234567 would become 124567
...and so on
Is there a better way to catch different values and replace with their corresponding replacements in a single RegEx than creating multiple expressions?
Something like (218|332|420)\d{3}(\d{4})#domain.com with a replacement along the lines of 10\2|11\2|12\2, so that only the replacement corresponding to the matched prefix is applied.
Edit: Didn't specify the use case: It's for my PBX, that just uses RegEx to match patterns and then replace it with the values I want it to go out with. No code. Just straight up RegEx in the GUI.
Also for personal use, if I can get it to work with Notepad++

Ctrl+H
Find what: (?:(218)|(332)|(420))\d{3}(\d{4})(?=#domain\.com)
Replace with: (?{1}10$4)(?{2}11$4)(?{3}12$4)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
(218) # group 1, 218
| # OR
(332) # group 2, 332
| # OR
(420) # group 3, 420
) # end group
\d{3} # 3 digits
(\d{4}) # group 4, 4 digits
(?=#domain\.com) # positive lookahead, make sure we have "#domain.com" after
# that allows to keep "#domain.com"
# if you want to remove it from the result, just put "#domain\.com"
# without lookahead.
Replacement:
(?{1} # if group 1 exists
10 # insert "10"
$4 # insert content of group 4
) # endif
(?{2}11$4) # same as above
(?{3}12$4) # same as above
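For readers who want to try the same idea outside Notepad++, the conditional replacement above can be approximated in JavaScript with a replacement callback that checks which of the alternative groups took part in the match. This is only a sketch: the function name conditionalReplace is made up for illustration, and it does not help on a PBX that accepts nothing but a pattern and a replacement string.

```javascript
// Sketch: emulate (?{1}10$4)(?{2}11$4)(?{3}12$4) with a replacement callback.
// conditionalReplace is a hypothetical name, not part of any library.
const pattern = /(?:(218)|(332)|(420))\d{3}(\d{4})(?=#domain\.com)/g;

function conditionalReplace(text) {
  return text.replace(pattern, (match, g1, g2, g3, last4) => {
    // A group that did not take part in the match is undefined.
    if (g1 !== undefined) return "10" + last4;
    if (g2 !== undefined) return "11" + last4;
    if (g3 !== undefined) return "12" + last4;
    return match; // unreachable with this pattern, kept as a safe default
  });
}

console.log(conditionalReplace("2181234567#domain.com")); // 104567#domain.com
```

Because "#domain.com" sits in a lookahead, it is not consumed and therefore stays in the output, exactly as in the Notepad++ version.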

I don't think you can use a single regular expression to conditionally replace text as per your example. You either need to chain multiple search & replace, or use a function that does a lookup based on the first captured group (first three digits).
You did not specify the language used, and regular expression features vary between languages. Here is a JavaScript snippet that uses the function-with-lookup approach:
var str1 = '2181234567#domain.com';
var str2 = '3321234567#domain.com';
var str3 = '4201234567#domain.com';
var strMap = {
'218': '10',
'332': '11',
'420': '12'
// add more as needed
};
function fixName(str) {
var re = /(\d{3})\d{3}(\d{4})(?=#domain\.com)/;
return str.replace(re, function(m, p1, p2) {
// keep the original text when the prefix has no entry in strMap
return strMap.hasOwnProperty(p1) ? strMap[p1] + p2 : m;
});
}
var result1 = fixName(str1);
var result2 = fixName(str2);
var result3 = fixName(str3);
console.log('str1: ' + str1 + ', result1: ' + result1);
console.log('str2: ' + str2 + ', result2: ' + result2);
console.log('str3: ' + str3 + ', result3: ' + result3);
Output:
str1: 2181234567#domain.com, result1: 104567#domain.com
str2: 3321234567#domain.com, result2: 114567#domain.com
str3: 4201234567#domain.com, result3: 124567#domain.com

@Toto has a nice answer; here is another method for when the (?{1}...) operator is not available (but thanks, Toto, I did not know about this Notepad++ feature).
More details on my answer here: https://stackoverflow.com/a/63676336/1287856
Append to the end of the doc:
,218=>10,332=>11,420=>12
Search for:
(218|332|420)\d{3}(\d{4})(?=#domain\.com)(?=[\s\S]*,\1=>([^,]*))
Replace with:
\3\2
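The lookup-table trick can be checked outside an editor as well. The following is a minimal JavaScript sketch, assuming the table is appended to the text, the replacement is applied, and the table is stripped again afterwards; the helper name replaceWithLookup is hypothetical.

```javascript
// Sketch of the "append a lookup table" technique.
// replaceWithLookup is a made-up helper name used only for illustration.
function replaceWithLookup(text) {
  const table = ",218=>10,332=>11,420=>12";
  // Group 1: prefix, group 2: last four digits,
  // group 3: value found in the table for the prefix captured in group 1.
  const re = /(218|332|420)\d{3}(\d{4})(?=#domain\.com)(?=[\s\S]*,\1=>([^,]*))/g;
  const replaced = (text + table).replace(re, "$3$2");
  // Remove the table that was appended above.
  return replaced.slice(0, replaced.length - table.length);
}

console.log(replaceWithLookup("2181234567#domain.com\n4201234567#domain.com"));
```

The second lookahead scans forward to the table entry whose key equals the captured prefix, so adding another mapping only requires extending the alternation and the table.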

Related

Regular expression to search for digits after a decimal place

I'm trying to write a regular expression that can match a decimal (and the digits after) of a dollar value. For example, I want to match $1.00, $1,100.89 (includes values in the thousands with commas). It cannot match any digits that are not preceded by a $ character. These values are also not the only pieces of text in this file.
So far, I've tried a few things that haven't quite gotten me there:
\.+[\d]+ (highlights the decimal and every digit after the decimal point, but not what we want because it includes non-dollar values like 1.00)
\$+[\d+\.]+ highlights the whole value of the dollar except the 1,250
(\$\d+\.+\d+)|\$\d+\,+\d+\.+\d+ highlights the whole value of anything with a dollar sign
Anyone have an idea?
I looked at your problem and I believe I have a solution.
You could use the regex below to search for the last two decimals.
^\$[\d,]+\.((?:\d){2})
Use:
^\$[\d,]+\.(\d\d)$
Explanation:
^ # beginning of string
\$ # $ sign
[\d,]+ # 1 or more digit or comma
\. # a dot
(\d\d) # group 1, 2 digits
$ # end of string
var test = [
'$100.00',
'$1,100.89',
'$123',
'123.45',
];
console.log(test.map(function (a) {
var m = a.match(/^\$[\d,]+\.(\d\d)$/);
if (m)
return a + ' : ' + m[1];
else
return a + ' : no match';
}));
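The question mentions that the dollar values are surrounded by other text, in which case a pattern anchored with ^ and $ would not find them. One possible adaptation, shown here only as a sketch, is to drop the anchors and collect every match with matchAll; the function name findCents and the sample sentence are invented for illustration.

```javascript
// Sketch: find the two digits after the decimal point of every dollar value
// embedded in a longer text. findCents is a hypothetical helper name.
function findCents(text) {
  const re = /\$[\d,]+\.(\d{2})/g; // no anchors, global flag required by matchAll
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(findCents("Total $1,100.89, tax 1.00, fee $1.00, deposit $123"));
// [ '89', '00' ]
```

The value 1.00 is skipped because it has no leading $, and $123 is skipped because it has no decimal part.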
You could use a non-capturing group (?:) to isolate only the group you want. I've come up with this regex and it seems to do what you are looking for:
^(?:\$[,\d]+)(?:\.([\d]{2}))
const regex = /^(?:\$[,\d]+)(?:\.([\d]{2}))/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => regex.exec(item)[1]);
console.log(result);
EDIT :
Here is an example of how to replace only the digits after the decimal point.
I'm using the same concept as before, only this time I'm not keeping those digits. I use $1 to put the captured group back into the new string.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1.50'));
console.log(result);
Notice here that the $1 in the replace function refers to the first capturing group of the regex. This way, we can get it back and "insert" it into our final string.
Here I've chosen .50 as the replacement, but you could use whatever you like.
P.S. I know this might be confusing because we are talking about dollars, so here is an example where we replace the decimal part with a word.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1 this is a word'));
console.log(result);

Regex - capture multiple groups and combine them multiple times in one string

I need to combine some text using regex, but I'm having a bit of trouble capturing and substituting my string. For example, I need to capture the digits at the start and append them to every section enclosed between ||.
I have:
||10||a||ab||abc||
I want:
||10||a10||ab10||abc10||
So I need '10' in capture group 1 and 'a|ab|abc' in capture group 2
I've tried something like this, but it doesn't work for me (captures only one [a-z] group)
(?=.*\|\|(\d+)\|\|)(?=.*\b([a-z]+\b))
I would achieve this without a complex regular expression. For example, you could do this:
input = "||10||a||ab||abc||"
parts = input.scan(/\w+/) # => ["10", "a", "ab", "abc"]
parts[1..-1].each { |part| part << parts[0] } # => ["a10", "ab10", "abc10"]
"||#{parts.join('||')}||"
str = "||10||a||ab||abc||"
first = nil
str.gsub(/(?<=\|\|)[^\|]+/) { |s| first.nil? ? (first = s) : s + first }
#=> "||10||a10||ab10||abc10||"
The regular expression reads, "match one or more characters other than a pipe, immediately following two pipes" ((?<=\|\|) being a positive lookbehind).
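The same lookbehind idea carries over to other engines that support it. As a rough JavaScript sketch of the gsub approach above (the function name appendFirst is made up, and lookbehind support in the runtime is assumed):

```javascript
// Sketch: remember the first section, then append it to every later section.
// appendFirst is a hypothetical helper name.
function appendFirst(str) {
  let first = null;
  // (?<=\|\|) requires two pipes right before the match; [^|]+ is the section itself.
  return str.replace(/(?<=\|\|)[^|]+/g, (section) => {
    if (first === null) {
      first = section; // the leading digits, returned unchanged
      return section;
    }
    return section + first;
  });
}

console.log(appendFirst("||10||a||ab||abc||")); // ||10||a10||ab10||abc10||
```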

Regex: Capturing repeating group of groups (Perl)

In Perl, I am trying to capture the words as tokens from the following example strings (there will always be at least one word):
"red" ==> $1 = 'red';
"red|white" ==> $1 = 'red'; $2 = 'white';
"red|white|blue" ==> $1 = 'red'; $2 = 'white'; $3 = 'blue';
etc.
The pattern I see here is: WORD, followed by n sets of "|WORD" [n >= 0]
So from that, I have:
/(\w+)((?:\|)(\w+)*)/
Which, to my understanding will always match the first WORD, and if a |WORD pair exists, capture that as many times as needed.
This doesn't work though, and I've tried several versions like:
/^(\w+)(\|(\w+))*$/
... what am I missing?
Your first regex is actually wrong — the * is in the wrong place — but I'll focus on your second regex, which is correct:
/^(\w+)(\|(\w+))*$/
The problem is that this regex has three capture groups: (\w+), (\|(\w+)), and (\w+). So it will populate, at most, three match variables: $1, $2, and $3. Each match variable corresponds to exactly one capture group, and a group that repeats keeps only the text of its last repetition. That is not what you want.
What you should do instead is use split:
my @words = split /\|/, "red|white|blue";
# now $words[0] is 'red', $words[1] is 'white', $words[2] is 'blue'
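The behaviour described in this answer is not specific to Perl. A short JavaScript sketch, using the same pattern and sample string, shows that the repeated groups hold only their last repetition while split returns every word:

```javascript
// Sketch: a repeated capture group keeps only its final repetition.
const input = "red|white|blue";
const m = input.match(/^(\w+)(\|(\w+))*$/);
console.log(m[1], m[2], m[3]); // red |blue blue  ("white" is lost)

// Splitting on the delimiter returns all tokens.
const words = input.split("|");
console.log(words); // [ 'red', 'white', 'blue' ]
```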

Optional replacement in a Regular Expression

I am creating a regular expression in VBA, which uses the JS flavor of RegEx. Here is the issue I have run into:
Current RegEx:
(^6)(?:a|ab)?
I have a 6 followed by either nothing, an 'a' or 'ab'.
In the case of a 6 followed by nothing I want to return just the 6 using $1
In the case of a 6 followed by an 'a' or 'ab' I want to return 6B
So I need that 'B' to be optional, contingent on there being an 'a' or 'ab'.
Something to the effect of : $1B?
That of course does not work. I only want the B if the 'a' or 'ab' is present, otherwise just the $1.
Is this possible to do in a single regex pattern? I could just have 2 separate patterns, one looking for only a 6 and the other for 6'a'or'ab'... but my actual regex patterns are much more complicated and I might need several patterns to cover some of them...
Thanks for looking.
I don't think your question is clearly defined--for example, I don't know why you need a replace--but from what I can infer, something like the following may work for you:
target = "6ab"
result = ""
With New RegExp
.Pattern = "^(6)(?:a(b?))?"
Set matches = .Execute(target)
If matches.Count > 0 Then ' Execute always returns a collection, so test its Count
Set mat = matches(0)
result = mat.SubMatches(0)
If mat.SubMatches.Count > 1 Then
result = result & UCase(mat.SubMatches(1))
End If
End If
Debug.Print result
End With
You basically inspect the capture groups to determine whether or not there was a hit on the b capture. Whereas you used a|ab, I think an optional b (b?) is more to the point. It's probably stylistic more than anything.
As I mentioned in my comment, there is no way to tell a regex engine to choose between literal alternatives in the replacement string. Thus, all you can do is to access Submatches to check for values that you get there, and return appropriate values.
Note that your regex should have two capturing groups, or at least one capturing group around the text whose exact value you do not know in advance (here, the (ab?)).
Here is my idea in code:
Function RxCondReplace(ByVal str As String) As String
RxCondReplace = ""
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.Pattern = "^6(ab?)?"
Set objMatches = objRegExp.Execute(str)
If objMatches.Count = 0 Then Exit Function ' no match: return ""
Set objMatch = objMatches.Item(0) ' Only 1 match as .Global=False
' The pattern has a single capturing group, SubMatches.Item(0)
If objMatch.SubMatches.Item(0) = "a" Or _
objMatch.SubMatches.Item(0) = "ab" Then
RxCondReplace = "6B"
Else ' the group did not take part in the match, so only "6" matched
RxCondReplace = "6"
End If
End Function
' Calling the function above
Sub CallConditionalReplace()
Debug.Print RxCondReplace("6") ' => 6
Debug.Print RxCondReplace("6a") ' => 6B
Debug.Print RxCondReplace("6ab") ' => 6B
End Sub
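For comparison, in an environment where the replacement can be a function, the same decision fits into a single call. The sketch below is JavaScript rather than VBA and is not something the VBScript RegExp object offers; the function name condReplace is hypothetical.

```javascript
// Sketch: choose the replacement based on whether the optional group matched.
// condReplace is a made-up name for illustration.
function condReplace(str) {
  return str.replace(/^6(ab?)?/, (match, suffix) =>
    // suffix is undefined when neither "a" nor "ab" followed the 6
    suffix === undefined ? "6" : "6B"
  );
}

console.log(condReplace("6"));   // 6
console.log(condReplace("6a"));  // 6B
console.log(condReplace("6ab")); // 6B
```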

Convert string with preg_replace in PHP

I have this string
$string = "some words and then #1.7 1.7 1_7 and 1-7";
and I would like #1.7, 1.7, 1_7 and 1-7 each to be replaced by S1E07.
Of course, "1.7" is just an example; it could be "3.15", for instance.
I managed to create the regular expression that would match the above 4 variants
/\#\d{1,2}\.\d{1,2}|\d{1,2}_\d{1,2}|\d{1,2}-\d{1,2}|\d{1,2}\.\d{1,2}/
but I cannot figure out how to use preg_replace (or something similar?) to actually replace the matches so they end up like S1E07
You need to use preg_replace_callback if you want to pad the number with a leading 0 when it is less than 10.
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$string = preg_replace_callback('/#?(\d+)[._-](\d+)/', function($matches) {
return 'S'.$matches[1].'E'.($matches[2] < 10 ? '0'.$matches[2] : $matches[2]);
}, $string);
You could use this simple string replace:
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/', 'S${1}E${2}', $string);
But it would not yield zero-padded numbers for the episode number:
// some words and then S1E7 S1E7 S1E7 and S1E7
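You would have to use the evaluation modifier (note that /e was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions use preg_replace_callback instead):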
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/e', '"S".str_pad($1, 2, "0", STR_PAD_LEFT)."E".str_pad($2, 2, "0", STR_PAD_LEFT)', $string);
...and use str_pad to add the zeroes.
// some words and then S01E07 S01E07 S01E07 and S01E07
If you don't want the season number to be padded you can just take out the first str_pad call.
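The padding logic discussed here can also be expressed with a replacement callback and a string-padding function. The following JavaScript sketch mirrors the pattern from this answer and pads only the episode number, as the question asks; the function name toEpisodeCode is invented for illustration.

```javascript
// Sketch: convert "#1.7", "1.7", "1_7" and "1-7" to "S1E07".
// toEpisodeCode is a hypothetical helper name.
function toEpisodeCode(text) {
  return text.replace(
    /#?\b(\d{1,2})[-._](\d{1,2})\b/g,
    // padStart adds a leading zero only when the episode has a single digit
    (match, season, episode) => `S${season}E${episode.padStart(2, "0")}`
  );
}

console.log(toEpisodeCode("some words and then #1.7 1.7 1_7 and 1-7"));
// some words and then S1E07 S1E07 S1E07 and S1E07
```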
I believe this will do what you want it to...
/\#?([0-9]+)[._-]([0-9]+)/
In other words...
\#? - can start with the #
([0-9]+) - capture at least one digit
[._-] - look for one ., _ or -
([0-9]+) - capture at least one digit
And then you can use this to replace...
S$1E$2
Which will put out S then the first captured group, then E then the second captured group
You need to put parentheses around the parts you want to reuse, which captures them. Then you can access those values in the replacement string with $1 for the first group, $2 for the second one, and so on (use the ${1} form when the reference is immediately followed by a digit).
The problem here is that you would end up with $1 - $8, so I would rewrite the expression into something like this:
/#?(\d{1,2})[._-](\d{1,2})/
and replace with
S${1}E${2}
I tested it on writecodeonline.com:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$result = preg_replace('/#?(\d{1,2})[._-](\d{1,2})/', 'S${1}E${2}', $string);