RegEx targeted replace with Named Captures

Given
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
I can use
$line -match '^\{(?<type>[a-z]+)(-\[(?<target>(C|F|CF))\])?(\[(?<tab>\d+)\])?\}_(?<string>.*)'
And $matches['tab'] will correctly have a value of 3. However, if I then want to increment that value without also affecting the [3] in the string section, things get more complicated. I can use $tabIndex = $line.indexOf("[$tab]") to get the index of the first occurrence, and I can also use $newLine = ([regex]"\[$tab\]").Replace($line, '[4]', 1) to replace only the first occurrence. But I wonder, is there a way to get at this more directly? It's not strictly necessary, as I will only ever want to replace things within the initial {}_, which has a very consistent form, so replacing the first instance works. I'm just wondering if I am missing out on a more elegant solution, which might also be needed in a different situation.

I would change the regex a bit, because mixing Named captures with Numbered captures is not recommended, so it becomes this:
'^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'
You could then use it like below to replace the tab value:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$newTabValue = 12345
$line -replace '^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)', "{`${type}-[`${target}][$newTabValue]}_`${string}"
The result of this will be:
{initError-[cf][12345]}_Invalid nodes(s): [3]
Regex details:
^ Assert position at the beginning of the string
\{ Match the character “{” literally
(?<type> Match the regular expression below and capture its match into backreference with name “type”
[a-z] Match a single character in the range between “a” and “z”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
- Match the character “-” literally
\[ Match the character “[” literally
(?<target> Match the regular expression below and capture its match into backreference with name “target”
[CF] Match a single character present in the list “CF”
{1,2} Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
\[ Match the character “[” literally
(?<tab> Match the regular expression below and capture its match into backreference with name “tab”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
\} Match the character “}” literally
_ Match the character “_” literally
(?<string> Match the regular expression below and capture its match into backreference with name “string”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
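The thread is about PowerShell, where a plain substitution string cannot do arithmetic on a captured value. As a rough illustration of the same idea with a replacement callback, here is a minimal JavaScript sketch. The function name incrementTab is hypothetical, and the i flag is an assumption that mimics PowerShell's case-insensitive matching. Because the pattern is anchored at the start, only the [3] inside the braces is touched.

```javascript
// Same pattern as above, written as a JavaScript literal with named groups.
const pattern = /^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)/i;

// Hypothetical helper: rebuilds the prefix and increments only the "tab" capture.
function incrementTab(line) {
  return line.replace(pattern, (...args) => {
    // When the pattern has named groups, the last callback argument is the groups object.
    const { type, target, tab, string } = args[args.length - 1];
    const targetPart = target === undefined ? '' : `-[${target}]`;
    const tabPart = tab === undefined ? '' : `[${Number(tab) + 1}]`;
    return `{${type}${targetPart}${tabPart}}_${string}`;
  });
}

console.log(incrementTab('{initError-[cf][3]}_Invalid nodes(s): [3]'));
// {initError-[cf][4]}_Invalid nodes(s): [3]
```

Unlike the fixed substitution string, the callback also leaves out the optional -[target] and [tab] parts when they were not present in the input.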

An alternative way of increasing the first number in the brackets is using the -Split operator to access the number you want to change:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$NewLine = $line -split "(\d+)"
$NewLine[1] = [int]$newLine[1] + 1
-join $NewLine
Output:
{initError-[cf][4]}_Invalid nodes(s): [3]
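The split approach carries over to other regex engines that keep captured delimiters in the split result. A small JavaScript sketch of the same technique, using the sample line from the question (variable names are illustrative only):

```javascript
const line = '{initError-[cf][3]}_Invalid nodes(s): [3]';

// Splitting on a capturing group keeps the matched numbers in the result array,
// so index 1 holds the first number found in the line.
const parts = line.split(/(\d+)/);
parts[1] = String(Number(parts[1]) + 1);

const newLine = parts.join('');
console.log(newLine); // {initError-[cf][4]}_Invalid nodes(s): [3]
```

This relies on the first number in the line being the one to change, so it is less targeted than matching the whole {...}_ prefix.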

Related

Powershell adding CR at the end of regex match group

I'm getting a CR between the regex match and the ','. What's going on?
$r_date ='ExposeDateTime=([\w /:]{18,23})'
$v2 = (Select-String -InputObject $_ -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) + ',';
Example of output:
9/25/2018 8:45:19 AM[CR],
Original String:
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
Try this:
$original = @"
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
"@
$r_date ='ExposeDateTime=([\d\s/:]+(?:(?:A|P)M)?)'
$v2 = (Select-String -InputObject $original -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) -join ','
Regex details:
ExposeDateTime= Match the characters “ExposeDateTime=” literally
( Match the regular expression below and capture its match into backreference number 1
[\d\s/:] Match a single character present in the list below
A single digit 0..9
A whitespace character (spaces, tabs, line breaks, etc.)
One of the characters “/:”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
A Match the character “A” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
P Match the character “P” literally
)
M Match the character “M” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
If your input is a multiline string stored in $Original, then this rather simpler regex seems to do the job. It uses a named capture group and the multiline regex flag to capture the string after ExposeDateTime= and before the next line ending.
$Original -match '(?m)ExposeDateTime=(?<Date>.+)$'
$Matches.Date
output ...
9/25/2018 8:45:19 AM
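One plausible source of a stray CR is input with Windows (CRLF) line endings, where a capture that runs to the line feed also picks up the carriage return. The thread does not confirm that this is the cause, so the following JavaScript sketch only illustrates the effect under that assumption, along with one way to keep the CR out of the capture.

```javascript
// Assumed input: the sample text from the question with CRLF line endings.
const original = 'ExposeDateTime=9/25/2018 8:45:19 AM\r\nError=Dap\r\nPostKvp=106\r\n';

// Capturing "everything up to the line feed" also captures the carriage return.
const withCr = original.match(/ExposeDateTime=([^\n]+)/)[1];

// Excluding both CR and LF from the character class keeps the capture clean.
const withoutCr = original.match(/ExposeDateTime=([^\r\n]+)/)[1];

console.log(JSON.stringify(withCr));    // "9/25/2018 8:45:19 AM\r"
console.log(JSON.stringify(withoutCr)); // "9/25/2018 8:45:19 AM"
```

Trimming the captured value after the match would be another option with the same result.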

regex in Perl to replace content containing double equal signs

I need a regex in Perl to turn this:
(== doc_url html/arbitrary_file_name.html ==)
into this:
(/doc_assets/legacy/html/arbitrary_file_name.html)
I've tried all kinds of things. My current attempt looks like this:
$content =~ s!\=\= doc_url ([\w\W]+?)\=\=!/doc_assets/legacy/$1!gis;
(In this particular attempt, I'm just letting the enclosing parentheses remain, since that doesn't change from the input to the output.)
Anyway, nothing is working for me. I assume it's the == throwing things off. Any help will be greatly appreciated.
I guess you need something like:
s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg
i.e.:
#!/usr/bin/perl
$subject = "(== doc_url html/arbitrary_file_name.html ==)";
$subject =~ s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg;
print $subject;
#(/doc_assets/legacy/html/arbitrary_file_name.html)
Regex Explanation:
.*?doc_url (.*?/.*?) .*
Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ don’t match at line breaks; Numbered capture
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character string “doc_url ” literally (case sensitive) «doc_url »
Match the regex below and capture its match into backreference number 1 «(.*?/.*?)»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “/” literally «/»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “ ” literally « »
Match any single character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
(/doc_assets/legacy/$1)
Insert the character string “(/doc_assets/legacy/” literally «(/doc_assets/legacy/»
Insert the text that was last matched by capturing group number 1 «$1»
Insert the character “)” literally «)»
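Note that with the s flag, the leading .*? and trailing .* in the pattern above also consume any text around the tag, which matters when $content holds more than the tag itself. A tighter alternative matches only the tag. The sketch below shows that idea in JavaScript rather than Perl. It assumes the path contains no whitespace, and the sample sentence and second file name are made up for illustration.

```javascript
// Hypothetical content with two tags and surrounding text.
const content = 'See (== doc_url html/arbitrary_file_name.html ==) and (== doc_url html/other.html ==).';

// Match the whole "(== doc_url PATH ==)" tag and capture only the path.
const rewritten = content.replace(/\(== doc_url (\S+) ==\)/g, '(/doc_assets/legacy/$1)');

console.log(rewritten);
// See (/doc_assets/legacy/html/arbitrary_file_name.html) and (/doc_assets/legacy/html/other.html).
```

The literal == needs no escaping here, so it is unlikely to be what was throwing the original attempt off.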

Javascript transformation

Is there any simple way to transform:
"<A[hello|home]>"
to:
"hello|home"
Thanks!
Apart from the clever advice in the comments to simply remove certain characters, if you are unable to remove these characters because they are present elsewhere in the text and do want to match that format, here is a way to do it with regex:
Search: <\w+\[([^|]*\|[^\]]*)\]>
Replace: \1 or $1 depending on editor or regex engine.
Explanation
<\w+\[([^|]*\|[^\]]*)\]>
Match the character “<” literally <
Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “[” literally \[
Match the regex below and capture its match into backreference number 1 ([^|]*\|[^\]]*)
Match any character that is NOT a “|” [^|]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “|” literally \|
Match any character that is NOT a “]” [^\]]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “]” literally \]
Match the character “>” literally >
\1
Insert the backslash character \
Insert the character “1” literally 1
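In JavaScript itself, the replacement reference is written $1. A minimal sketch applying the search pattern above with String.prototype.replace:

```javascript
const input = '<A[hello|home]>';

// Capture group 1 holds the text between the square brackets.
const output = input.replace(/<\w+\[([^|]*\|[^\]]*)\]>/, '$1');

console.log(output); // hello|home
```

Adding the g flag would apply the same replacement to every tag of this form in a longer text.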

RegEx that matches a string of numbers in a particular format?

I need a regular expression that will tell if a string is in the following format. The groups of numbers must be comma delimited. Can contain a range of numbers separated by a -
300, 200-400, 1, 250-300
The groups can be in any order.
This is what I have so far, but it's not matching the entire string. It's only matching the groups of numbers.
([0-9]{1,3}-?){1,2},?
Try this one:
^(?:\d{1,3}(?:-\d{1,3})?)(?:,\s*\d{1,3}(?:-\d{1,3})?|$)+
Since you didn't specify the number ranges, I leave this to you. In any case you should not do math with regex :)
Explanation:
"
^ # Assert position at the beginning of the string
(?: # Match the regular expression below
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
, # Match the character “,” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
- # Match the character “-” literally
\d # Match a single digit 0..9
{1,3} # Between one and 3 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$
This should work:
/^([0-9]{1,3}(-[0-9]{1,3})?)(,\s?([0-9]{1,3}(-[0-9]{1,3})?))*$/
You need some repetition:
(?:([0-9]{1,3}-?){1,2},?)+
To ensure that the numbers are correct, i.e. that you don't match numbers like 010, you might want to change the regex slightly. I also changed the range part of the regex, so that you don't match things like 100-200- but only 100 or 100-200, and added support for whitespaces after the comma (optional):
(?:(([1-9]{1}[0-9]{0,2})(-[1-9]{1}[0-9]{0,2})?){1,2},?\s*)+
Also, depending on what you want to capture, you might want to change the capturing brackets () to non capturing ones (?:)
UPDATE
A revised version based on the latest comments:
^\s*(?:(([1-9][0-9]{0,2})(-[1-9][0-9]{0,2})?)(?:,\s*|$))+$
([0-9-]+),\s([0-9-]+),\s([0-9-]+),\s([0-9-]+)
Try this regular expression
^(([0-9]{1,3}-?){1,2},?\s*)+$
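To compare the suggestions, it can help to run them against a few inputs. The JavaScript sketch below is only an illustration: isNumberList is a hypothetical helper wrapping a fully anchored pattern of the same shape as several answers above, and firstAnswer is the first pattern from this thread, which has no final $ outside its group and can therefore accept trailing text.

```javascript
// One item is a 1-3 digit number, optionally followed by "-" and another number.
// Items are separated by a comma and optional whitespace; the whole string must match.
const listPattern = /^\d{1,3}(?:-\d{1,3})?(?:,\s*\d{1,3}(?:-\d{1,3})?)*$/;

// Hypothetical helper name, used here for illustration only.
function isNumberList(text) {
  return listPattern.test(text);
}

// First pattern suggested in the thread, for comparison.
const firstAnswer = /^(?:\d{1,3}(?:-\d{1,3})?)(?:,\s*\d{1,3}(?:-\d{1,3})?|$)+/;

console.log(isNumberList('300, 200-400, 1, 250-300')); // true
console.log(isNumberList('100-200-'));                 // false
console.log(isNumberList('300,200xyz'));               // false
console.log(firstAnswer.test('300,200xyz'));           // true (trailing "xyz" is ignored)
```

Checking that the numbers fall within a particular range is easier to do in code after the format check than inside the regex.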

How do I display this regex result in javascript?

Regular expressions aren't exactly my strong suit. I got a regex for validating international phone numbers here. The validation bit works for me but I don't understand how I can take the regex result and use it to format the number. My question is how do I figure out, from the regex, what the groupings are that I can use to display?
var IntlRegexObj = /^((\+)?[1-9]{1,2})?([-\s\.])?((\(\d{1,4}\))|\d{1,4})(([-\s\.])?[0-9]{1,12}){1,2}$/;
if (IntlRegexObj.test(businessPhoneValue))
{
var formattedPhoneNumber = businessPhoneValue.replace(IntlRegexObj, "($1)");
// display formatted result
}
After simplifying that mess of a regex:
if (subject.match(/^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$/)) {
// Successful match
}
There are now only 3 capturing groups.
First one $1 is easy, the country code with an optional +.
Then you have the local area code, basically 1-4 numbers with / without parentheses optionally prefixed by [-\s.]. That's $2
Finally you have the actual phone number, which can be from 1 to 24 digits, including an optional space, dot or minus sign [-\s.]. That's $3.
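One detail worth checking when reading these groups in JavaScript: a capturing group with a quantifier such as {1,2} keeps only its last repetition. The sketch below shows this with a made-up sample number, then wraps the repetition in one outer group so the full remainder is captured. The name parsePhone and the returned field names are hypothetical.

```javascript
// The simplified regex from above, unchanged.
const simplified = /^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})([\-\s.]?\d{1,12}){1,2}$/;

const m = '+44 (20) 7946 0958'.match(simplified);
console.log(m[1]); // +44
console.log(m[2]); // (20)
console.log(JSON.stringify(m[3])); // " 0958" (only the last repetition survives)

// Variant: the repeated part is non-capturing and wrapped in one capturing group.
const widened = /^((?:\+)?[1-9]{1,2})?[\-\s.]?((?:\(\d{1,4}\))|\d{1,4})((?:[\-\s.]?\d{1,12}){1,2})$/;

// Hypothetical helper that returns the three parts, or null if there is no match.
function parsePhone(value) {
  const parts = value.match(widened);
  if (parts === null) return null;
  return { country: parts[1] ?? '', area: parts[2], number: parts[3].trim() };
}

console.log(parsePhone('+44 (20) 7946 0958'));
// { country: '+44', area: '(20)', number: '7946 0958' }
```

Because the pattern is so permissive, the split is not always meaningful: for the plain digit string 2025550143, the first digit ends up in the country-code group.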
More detailed explanation:
"
^ # Assert position at the beginning of the string
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
\+ # Match the character “+” literally
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[1-9] # Match a single character in the range between “1” and “9”
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?: # Match the regular expression below
\( # Match the character “(” literally
\d # Match a single digit 0..9
{1,4} # Between one and 4 times, as many times as possible, giving back as needed (greedy)
\) # Match the character “)” literally
)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\d # Match a single digit 0..9
{1,4} # Between one and 4 times, as many times as possible, giving back as needed (greedy)
)
( # Match the regular expression below and capture its match into backreference number 3
[-\s.] # Match a single character present in the list below
# The character “-”
# A whitespace character (spaces, tabs, line breaks, etc.)
# The character “.”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
{1,12} # Between one and 12 times, as many times as possible, giving back as needed (greedy)
){1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"
This regex is woefully inadequate. When I go to your link, even a couple of the examples listed as non-matches will match with this regex. The regex is purely an overlap of possibilities, by the look of the groupings that happen to be capture groupings. And any sense of parsing out real parts of the number is sadly destroyed with this regex.
Expanded, it looks like this:
^
(
(\+)?
[1-9]{1,2}
)?
([-\s\.])?
(
(
\(\d{1,4}\)
)
|
\d{1,4}
)
(
([-\s\.])?
[0-9]{1,12}
){1,2}
$
I even tried to formulate a proper capture grouping for its parts, and sadly it shows the problems.
^
(?: \+ )?
( [1-9]{1,2} |) # Capt Group 1, international code (or not)
(?| # Branch Reset
\( (\d{1,4}) \) # Capture Group 2, area code
| (\d{1,4})
)
(?:[-\s.])?
( # Capt Group 3, the rest ########-########
[0-9]{1,12}
[-\s.]?
[0-9]{1,12}?
)
$
There might be something better out there, but this is just a validation wonder that, for the most part, doesn't really work correctly even for that.
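The branch reset group (?| ... ) used above is Perl/PCRE syntax and is not available in JavaScript regular expressions, and JavaScript has no free-spacing mode either. A rough JavaScript adaptation, offered only as a sketch, gives each alternative its own group and picks whichever one matched. The optional separator after the country code and before the remainder, the names phonePattern and splitPhone, and the sample numbers are all assumptions added for illustration.

```javascript
// Groups: 1 = country code, 2 = area code in parentheses, 3 = bare area code, 4 = the rest.
const phonePattern = /^\+?([1-9]{1,2})?[-\s.]?(?:\((\d{1,4})\)|(\d{1,4}))[-\s.]?(\d{1,12}(?:[-\s.]?\d{1,12})?)$/;

// Hypothetical helper: returns the parts, or null when the input does not match.
function splitPhone(value) {
  const m = phonePattern.exec(value);
  if (m === null) return null;
  // Only one of groups 2 and 3 participates in a match, so take whichever is defined.
  return { country: m[1] ?? '', area: m[2] ?? m[3], rest: m[4] };
}

console.log(splitPhone('+44 (20) 7946 0958'));
// { country: '44', area: '20', rest: '7946 0958' }
console.log(splitPhone('1-800-5551234'));
// { country: '1', area: '800', rest: '5551234' }
```

This inherits the same ambiguity as the original: without separators, the pattern cannot know where one part ends and the next begins.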
Regular expressions are not used to format anything. They just tell you if the string you are validating abides by the regular expression's rules. Example would be in a form where a user is entering a phone number. If the string they enter into the form doesn't match the regular expression then the form's validation which uses the regular expression to check the string will say something like, "Phone number is not in correct format."