.NET regex with quote and space

I'm trying to create a regex to match this:
/tags/ud617/?sort=active&page=2" >2
So basically, "[number]" is the only dynamic part:
/tags/ud617/?sort=active&page=[number]" >[number]
The closest I've been able to get (in PowerShell) is:
[regex]::matches('/tags/ud617/?sort=active&page=2" >2', '/tags/ud617/\?sort=active&page=[0-9]+')
But this doesn't provide me with a full match of the dynamic string.
Ultimately, I'll be creating a capture group:
/tags/ud617/?sort=active&page=([number])

Seems easy enough:
$regex = '/tags/ud617/\?sort=active&page=(\d+)"\s>\d+'
'/tags/ud617/?sort=active&page=2" >2' -match $regex > $null
$matches[1]
2

[regex]::matches('/tags/ud617/?sort=active&page=3000 >2','/tags/ud617/\?sort=active&page=(\d+) >(\d+)')
Outputs:
Groups : {/tags/ud617/?sort=active&page=3000 >2, 3000, 2}
Success : True
Captures : {/tags/ud617/?sort=active&page=3000 >2}
Index : 0
Length : 37
Value : /tags/ud617/?sort=active&page=3000 >2
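This captures both the page value (3000) and the number after the greater-than sign (2). Note that this sample input omits the double quote from the question; to match it, add " before the space in the pattern.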
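If the number after the > must always equal the page number, as it does in the question's sample, a backreference can enforce that. The following is a minimal sketch under that assumption; the group name page and the variable names are illustrative and not taken from the answers:

```powershell
# Sample input from the question; single quotes keep the embedded " literal.
$inputText = '/tags/ud617/?sort=active&page=2" >2'

# (?<page>\d+) captures the page number;
# \k<page> requires the same digits to appear again after the >.
$pattern = '/tags/ud617/\?sort=active&page=(?<page>\d+)"\s>\k<page>'

# -match fills the automatic $Matches table on success.
$page = if ($inputText -match $pattern) { $Matches['page'] } else { $null }
$page
```

If the two numbers can differ, replace \k<page> with a second capture group such as (\d+).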

Related

Powershell Regex match statement

Trying to get nxxxxx number as the output from below input,
uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us
uniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us
Tried,
[Regex]::Matches($item, "uid=([^%]+)\,")
but this gives,
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 14
Length : 43
Value : uid=N018315,ou=people,ou=Networks,o=test,
Success : True
Name : 1
Captures : {1}
Index : 18
Length : 38
Value : N018315,ou=people,ou=Networks,o=test
Some help with improving the match statement appreciated ..
You can use
[Regex]::Matches($s, "(?<=uid=)[^,]+").Value
To save the values in a variable (avoid the name $matches, which PowerShell uses as an automatic variable for -match):
$uids = [Regex]::Matches($s, "(?<=uid=)[^,]+").Value
Output:
n039833
N019560
Details:
(?<=uid=) - a positive lookbehind that requires uid= text to appear immediately to the left of the current location
[^,]+ - one or more chars other than a comma.
You can use a capture group that stops at the first comma; if you also don't want to match %, add it to the negated character class.
$s = "uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us`nuniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us"
[regex]::Matches($s,'uid=([^,]+)') | Foreach-Object {$_.Groups[1].Value}
Output
n039833
N019560
Note that your current pattern requires a trailing comma to be present. If that is not always the case, you can omit it from the pattern. If you only want to exclude matching a comma, the pattern will be:
uid=([^,]+)
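To illustrate the point about the trailing comma, here is a small sketch that also extracts a uid when nothing follows it. The second input line is a made-up entry for illustration, and the group name uid and the \s exclusion (which stops at trailing whitespace) are assumptions, not part of the answers:

```powershell
$lines = @(
    'uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us'
    'uniqueMember: uid=N019560'   # hypothetical entry with no comma after the uid
)

# [^,\s]+ stops at the first comma or whitespace, so no trailing comma is required.
# -match is case-insensitive by default; use -cmatch for a case-sensitive test.
$uids = foreach ($line in $lines) {
    if ($line -match 'uid=(?<uid>[^,\s]+)') { $Matches['uid'] }
}
$uids
```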

Regex to capture a variable number of items?

I am trying to use a regex to capture values from SPACE-delimited items. Yes, I know that I could use [string]::Split() or -split. The goal is to use a regex in order to fit it into another, larger regex.
There are a variable number of items in the string. In this example there are four (4). The resulting $Matches variable has the full string for all Value members. I also tried the regex '^((.*)\s*)+', but that resulted in '' for all except the first value.
How can I write a regex to capture a variable number of items?
PS C:\src\t> $s = 'now is the time'
PS C:\src\t> $m = [regex]::Matches($s, '^((.*)\s*)')
PS C:\src\t> $m
Groups : {0, 1, 2}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 15
Value : now is the time
ValueSpan :
PS C:\src\t> $m.Groups.Value
now is the time
now is the time
now is the time
PS C:\src\t> $PSVersionTable.PSVersion.ToString()
7.2.2
You can use [regex]::Match() to find the first matching substring, then call NextMatch() to advance through the input string until no further matches can be made.
I've taken the liberty of simplifying the expression to \S+ (consecutive non-whitespace characters):
$string = 'now is the time'
$regex = [regex]'\S+'
$match = $regex.Match($string)
while($match.Success){
    Write-Host "Match at index [$($match.Index)]: '$($match.Value)'"
    # advance to the next match, if any
    $match = $match.NextMatch()
}
Which will print:
Match at index [0]: 'now'
Match at index [4]: 'is'
Match at index [7]: 'the'
Match at index [11]: 'time'
Mathias' answer shows an iterative approach to retrieving all matches, which may or may not be needed.
Building on your own attempt to use [regex]::Matches(), the solution is as simple as:
$s = 'now is the time'
[regex]::Matches($s, '\S+').Value # -> @('now', 'is', 'the', 'time')
As noted, \S+ matches any non-empty run (+) of non-whitespace characters (\S).
Thanks to member-access enumeration, accessing the .Value property on the method call's result, which is a collection of System.Text.RegularExpressions.Match instances, returns each instance's .Value property; with two or more instances, the result is an array of values.
I guess the following will work for you
[^\s]+
[^\s] means "not a whitespace character" (equivalent to \S)
+ means 1 or more characters
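Since the stated goal was to embed this in a larger regex, it may help that .NET keeps every repetition of a group in its Captures collection, so a single overall match can still yield a variable number of items. A minimal sketch, assuming whitespace-delimited items; the group name word is illustrative:

```powershell
$s = 'now is the time'

# One overall match; the named group "word" is repeated once per item.
# (?:\s+|$) requires each item to be followed by whitespace or the end of the string.
$m = [regex]::Match($s, '^(?:(?<word>\S+)(?:\s+|$))+$')

# Groups['word'].Value holds only the last repetition;
# the Captures collection keeps every repetition, in order.
$words = $m.Groups['word'].Captures.Value
$words
```

The same repeated group can be placed inside a longer pattern, and its Captures read from the one resulting Match object.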

Using Regex for complicated naming convention

I'm writing a PowerShell script that validates a computer naming convention, following documentation my client provided. My first approach was to break each computer name into parts and process each part, but I ran into issues near the end. I was thinking it would be easier to use a single regex to ensure the name is valid. Here is what I am trying to do:
Naming rules:
Required: 1st is an alpha character; must be: A, B, or C
Required: 2nd is an alpha character; must be: L, K, or H
Required: The next 4 characters must be alpha and match either: DTLB, SOCK, or NUUB
Required: The next 6 characters are digits, but the first part must be one of: 1, 8, 13, or 83; the rest doesn't matter as long as they are digits
Optional: The next two characters can be alpha or alphanumeric but must be either: PE, Y1, or AC
Here are some tests with a regex that I understand, which is basic but works. It doesn't validate the actual characters or positions; it just checks for 12 word characters, optionally followed by two more:
$regex = '^\w{12}(?:\w{2})$|^\w{12}$'
'AKDTLB123456' -match $regex; $Matches
True
'ALSOCK834561PE' -match $regex; $Matches
True
What I am trying to do is split these up into named parts and determine if the value matches the right area like (this is an example):
$regex = '(<type>^\w{1}[ABC])(?<form>\w{1,1}[LKH])(<locale>\w{4,4}[DTLB|SOCK|NUUB])(<identifier>\d{1,1}[1|8]|<identifier>\d{1,2}[13|83])(<unique>\d+)(<role>\w{2,2}[PE|AC]$|<role>\w{1}[Y]\d{1}$)'
My goal is to get it to output like this:
Example 1
'AKDTLB893456' -match $regex; $Matches
True
Name Value
---- -----
type A
form K
locale DTLB
identifier 8
Unique 93456
0 AKDTLB893456
Example 2:
'ALSOCK123456PE' -match $regex; $Matches
True
Name Value
---- -----
type A
form L
locale SOCK
identifier 1
Unique 23456
0 ALSOCK123456PE
Example 3
'ALSUCK123456PE' -match $regex; $Matches
False
Name Value
---- -----
type A
form L
locale <--not matched
identifier 1
Unique 23456
0 ALSuCK123456PE
The best I can do is:
$regex = '(^\w{1})(\w{1,1})(\w{4,4})(\d{2}[13|83]|\d{1}[1|2|3])(\d{4,5})(\w{2,2}$|$)'
'ALSOCK124561PE' -match $regex; $Matches
True
Name Value
---- -----
6 PE
5 4561
4 12
3 SOCK
2 L
1 A
0 ALSOCK124561PE
However this doesn't check for if the name is just 112345 not 13
I have been all over the internet and have been using an online regex tool, but I am unable to come up with a solution. It may not be possible to do both. I find that when the match is false, nothing is output, not even the parts that did match. Is there a way to get the "why it doesn't match" as an output?
Any ideas?
You may use
^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$
Note: If the pattern must be matched in a case sensitive way, replace -match with -cmatch.
See the .NET regex demo (do not test at regex101.com!)
Details
^ - start of a string
(?<type>[ABC]) - Group "type": A, B or C
(?<form>[LKH]) - Group "form": L, K or H
(?<locale>DTLB|SOCK|NUUB) - Group "locale": DTLB, SOCK or NUUB
(?<identifier>[18](3)?) - Group "identifier": 1 or 8 and then an optional 3 digit captured into Group 1
(?<unique>\d{4}(?(1)|\d)) - Group "unique": four digits and if Group 1 did not match, one more digit is required to match then
(?<role>PE|Y1|AC)? - an optional Group "role": PE, Y1 or AC
$ - end of string.
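As a quick check of the breakdown above, the pattern can be run against a few names and the named groups inspected. This is a sketch only: the second sample name is made up to exercise the two-digit identifier, and using [regex]::Match (case-sensitive by default) instead of -match is my choice, not the answer's:

```powershell
$regex = [regex]'^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$'

$results = foreach ($name in 'AKDTLB893456', 'ALSOCK133456PE', 'ALSUCK123456PE') {
    $m = $regex.Match($name)
    # Groups that did not take part in the match report an empty Value.
    [pscustomobject]@{
        Name       = $name
        Valid      = $m.Success
        Identifier = $m.Groups['identifier'].Value
        Unique     = $m.Groups['unique'].Value
        Role       = $m.Groups['role'].Value
    }
}
$results | Format-Table
```

The first name should come back with identifier 8 and five unique digits, the second with identifier 13 and four unique digits, and the third should be reported as invalid because SUCK is not an allowed locale.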
The second part of your question asked if you could determine where the match failed. You can do something similar to below to test each part of the regex in sections starting from the left:
$strings = 'AKDTLB893456','ALSOCK123456PE','ALSuCK123456PE','ALSOCK123456R2'
$regexes = [ordered]@{}
$regexes.Add('type','^(?<type>[ABC])')
$regexes.Add('form','(?<form>[LKH])')
$regexes.Add('locale','(?<locale>DTLB|SOCK|NUUB)')
$regexes.Add('identifier','(?<identifier>[18])')
$regexes.Add('unique','(?<unique>\d{5})')
$regexes.Add('optional','(?<optional>PE|Y1|AC)?$')
foreach ($string in $strings) {
    $test = [text.stringbuilder]''
    $regexes.GetEnumerator() | Foreach-Object {
        $null = $test.Append($_.Value)
        if (!($string -match $test.ToString())) {
            "$String failed at $($_.Key)"
            Continue
        }
    }
    $matches
}
Explanation:
$regexes is an ordered hash table where we can append some key-value pairs. Order is good here because we want to test matching from left to right in your string.
With a [Text.StringBuilder] object, we can append strings with the Append() method. The idea is to append the next regex section you want to test. If $string continues to match through each value of the hash table, $matches will be output. Otherwise, the key of the section being tested when the match fails is output.
Note that this will not perform as well as a single -match operation. You could test the entire match first and only perform sectional testing when a $false is returned. That will increase performance.
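The "full match first, sectional testing only on failure" idea could look roughly like this. It is a sketch, not the answerer's code: the function name Test-NamePart and its return values are hypothetical, and it reuses the simplified section patterns from the script above with case-sensitive matching:

```powershell
# Hypothetical helper: returns 'OK', or the name of the first section that fails.
function Test-NamePart {
    param([string]$Name)

    $sections = [ordered]@{
        type       = '^(?<type>[ABC])'
        form       = '(?<form>[LKH])'
        locale     = '(?<locale>DTLB|SOCK|NUUB)'
        identifier = '(?<identifier>[18])'
        unique     = '(?<unique>\d{5})'
        role       = '(?<role>PE|Y1|AC)?$'
    }

    # Fast path: a single match against the complete pattern.
    if ($Name -cmatch ($sections.Values -join '')) { return 'OK' }

    # Slow path: grow the pattern section by section to find the first failure.
    $partial = ''
    foreach ($key in $sections.Keys) {
        $partial += $sections[$key]
        if ($Name -cnotmatch $partial) { return $key }
    }
}

'AKDTLB893456', 'ALSuCK123456PE', 'ALSOCK123456R2' | ForEach-Object {
    '{0}: {1}' -f $_, (Test-NamePart -Name $_)
}
```

Valid names cost one match operation; only invalid names pay for the incremental tests.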

Change 3rd octet of IP in string format using PowerShell

Think I've found the worst way to do this:
$ip = "192.168.13.1"
$a,$b,$c,$d = $ip.Split(".")
[int]$c = $c
$c = $c+1
[string]$c = $c
$newIP = $a+"."+$b+"."+$c+"."+$d
$newIP
But what is the best way? It has to be a string when completed. I'm not bothered about validating that it's a legit IP.
Using your example for how you want to modify the third octet, I'd do it pretty much the same way, but I'd compress some of the steps together:
$IP = "192.168.13.1"
$octets = $IP.Split(".") # or $octets = $IP -split "\."
$octets[2] = [string]([int]$octets[2] + 1) # or other manipulation of the third octet
$newIP = $octets -join "."
$newIP
You can simply use PowerShell's -replace operator with a look-ahead pattern. See the script below:
Set-StrictMode -Version "2.0"
$ErrorActionPreference="Stop"
cls
$ip1 = "192.168.13.123"
$tests=@("192.168.13.123" , "192.168.13.1" , "192.168.13.12")
foreach($test in $tests)
{
    $patternRegex="\d{1,3}(?=\.\d{1,3}$)"
    $newOctet="420"
    $ipNew=$test -replace $patternRegex,$newOctet
    $msg="OLD ip={0} NEW ip={1}" -f $test,$ipNew
    Write-Host $msg
}
This will produce the following:
OLD ip=192.168.13.123 NEW ip=192.168.420.123
OLD ip=192.168.13.1 NEW ip=192.168.420.1
OLD ip=192.168.13.12 NEW ip=192.168.420.12
How to use the -replace operator?
https://powershell.org/2013/08/regular-expressions-are-a-replaces-best-friend/
Understanding the pattern that I have used
The (?=...) in \d{1,3}(?=\.\d{1,3}$) is a look-ahead (not a look-behind): it checks what follows the current position without making it part of the match.
Here, (?=\.\d{1,3}$) requires a literal dot and 1-3 digits, followed by the end of the string.
The leading \d{1,3} is an instruction to specifically match 1-3 digits.
All combined in plain English: "Give me the 1-3 digits that are immediately followed by a period and the final 1-3 digits at the end of the string."
Look-ahead regex reference:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
If you have PowerShell Core (v6.1 or higher), you can combine -replace with a script block-based replacement:
PS> '192.168.13.1' -replace '(?<=^(\d+\.){2})\d+', { 1 + $_.Value }
192.168.14.1
Positive look-behind assertion (?<=^(\d+\.){2}) matches everything up to, but not including, the 3rd octet - without considering it part of the overall match to replace.
(?<=...) is the look-behind assertion, \d+ matches one or more (+) digits (\d), \. a literal ., and {2} matches the preceding subexpression ((...)) 2 times.
\d+ then matches just the 3rd octet; since nothing more is matched, the remainder of the string (. and the 4th octet) is left in place.
Inside the replacement script block ({ ... }), $_ refers to the results of the match, in the form of a [System.Text.RegularExpressions.Match] instance; its .Value is the matched string, i.e. the 3rd octet, to which 1 can be added.
Data type note: by using 1, an implicit [int], as the LHS, the RHS (the .Value string) is implicitly coerced to [int] (you may choose to use an explicit cast).
On output, whatever the script block returns is automatically coerced back to a string.
If you must remain compatible with Windows PowerShell, consider Jeff Zeitlin's helpful answer.
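A regex-based variant that should also run on Windows PowerShell is to call [regex]::Replace() directly and pass a script block, which PowerShell converts to a MatchEvaluator delegate. This is a minimal sketch and not part of the original answer; the variable names are illustrative:

```powershell
$ip = '192.168.13.1'

# The script block is converted to a [System.Text.RegularExpressions.MatchEvaluator];
# $m is the Match object for the 3rd octet, and the block returns its replacement.
$newIP = [regex]::Replace($ip, '(?<=^(\d+\.){2})\d+', {
    param($m)
    [string]([int]$m.Value + 1)
})
$newIP
```

The pattern is the same look-behind as above; only the mechanism for computing the replacement differs.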
To complete your method, but more concisely:
$a,$b,$c,$d = "192.168.13.1".Split(".")
$IP="$a.$b.$([int]$c+1).$d"
function Replace-3rdOctet {
    Param(
        [string]$GivenIP,
        [string]$New3rdOctet
    )
    # Escape the dots so they match literal periods rather than any character
    $GivenIP -match '^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$' | Out-Null
    $Output = "$($matches[1]).$($matches[2]).$New3rdOctet.$($matches[4])"
    Return $Output
}
Copy to a ps1 file and dot source it from command line, then type
Replace-3rdOctet -GivenIP '100.201.190.150' -New3rdOctet '42'
Output: 100.201.42.150
From there you could add error handling for invalid input.
here's a slightly different method. [grin] i managed to not notice the answer by JeffZeitlin until after i finished this.
[edit - thanks to JeffZeitlin for reminding me that the OP wants the final result as a string. oops! [*blush*]]
what it does ...
splits the string on the dots
puts that into an [int] array & coerces the items into that type
increments the item in the targeted slot
joins the items back into a string with a dot for the delimiter
converts that to an IP address type
adds a line to convert the IP address to a string
here's the code ...
$OriginalIPv4 = '1.1.1.1'
$TargetOctet = 3
$OctetList = [int[]]$OriginalIPv4.Split('.')
$OctetList[$TargetOctet - 1]++
$NewIPv4 = [ipaddress]($OctetList -join '.')
$NewIPv4
'=' * 30
$NewIPv4.IPAddressToString
output ...
Address : 16908545
AddressFamily : InterNetwork
ScopeId :
IsIPv6Multicast : False
IsIPv6LinkLocal : False
IsIPv6SiteLocal : False
IsIPv6Teredo : False
IsIPv4MappedToIPv6 : False
IPAddressToString : 1.1.2.1
==============================
1.1.2.1

Convert string with preg_replace in PHP

I have this string
$string = "some words and then #1.7 1.7 1_7 and 1-7";
and I would like #1.7, 1.7, 1_7 and 1-7 each to be replaced by S1E07.
Of course, "1.7" is just an example; it could be "3.15", for example.
I managed to create the regular expression that would match the above 4 variants
/\#\d{1,2}\.\d{1,2}|\d{1,2}_\d{1,2}|\d{1,2}-\d{1,2}|\d{1,2}\.\d{1,2}/
but I cannot figure out how to use preg_replace (or something similar?) to actually replace the matches so they end up like S1E07
You need to use preg_replace_callback if you want to zero-pad numbers less than 10.
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$string = preg_replace_callback('/#?(\d+)[._-](\d+)/', function($matches) {
return 'S'.$matches[1].'E'.($matches[2] < 10 ? '0'.$matches[2] : $matches[2]);
}, $string);
You could use this simple string replace:
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/', 'S${1}E${2}', $string);
But it would not yield zero-padded numbers for the episode number:
// some words and then S1E7 S1E7 S1E7 and S1E7
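On PHP versions before 7.0, you could use the evaluation modifier (/e was deprecated in PHP 5.5 and removed in PHP 7.0, so current versions need preg_replace_callback for this instead):
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/e', '"S".str_pad($1, 2, "0", STR_PAD_LEFT)."E".str_pad($2, 2, "0", STR_PAD_LEFT)', $string);
...and use str_pad to add the zeroes.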
// some words and then S01E07 S01E07 S01E07 and S01E07
If you don't want the season number to be padded you can just take out the first str_pad call.
I believe this will do what you want it to...
/\#?([0-9]+)[._-]([0-9]+)/
In other words...
\#? - can start with the #
([0-9]+) - capture at least one digit
[._-] - look for one ., _ or -
([0-9]+) - capture at least one digit
And then you can use this to replace...
S$1E$2
Which will put out S then the first captured group, then E then the second captured group
You need to put parentheses around the parts you want to reuse, i.e. capture them. Then you can access those values in the replacement string with $1 (or ${1}, which is needed when the reference is immediately followed by a digit) for the first group, $2 for the second one, and so on.
The problem here is that you would end up with $1 - $8, so I would rewrite the expression into something like this:
/#?(\d{1,2})[._-](\d{1,2})/
and replace with
S${1}E${2}
I tested it on writecodeonline.com:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$result = preg_replace('/#?(\d{1,2})[._-](\d{1,2})/', 'S${1}E${2}', $string);