How to increase a number in a string that also contains letters? - regex

I want to perform arithmetic operations on numbers in a string.
For example: SGJKR67 should become SGJKR68.
Another one: NYSC34 should become NYSC35.
Only numbers are changed, in this example both are increased by one.

Using regex and Capturing Groups can solve your problem:
$reg = [regex]::new("([A-Z]+)(\d+)")
$m = $reg.Match("SGJKR67")
$digits = [int] $m.Groups[2].Value # will be 67
$digits = $digits + 1 # or apply anything you want
$result = "$($m.Groups[1].Value)$digits" # will be SGJKR68
You will have 3 groups for your matches:
The whole "word".
The letters.
The digits.
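The capture-group steps above can be wrapped in a small function. This is only a sketch: the function name Add-ToEmbeddedNumber and its parameters are made up for illustration, and the pattern is anchored and widened to lower-case letters as an assumption about the input.

```powershell
# Hypothetical helper (not from the original answer): adds $Amount to the
# digits that follow a run of letters, using the two capture groups.
function Add-ToEmbeddedNumber {
    param(
        [string] $Text,
        [int] $Amount = 1
    )
    $m = [regex]::Match($Text, '^([A-Za-z]+)(\d+)$')
    if (-not $m.Success) { return $Text }   # leave non-matching input unchanged

    # Groups[1] holds the letters, Groups[2] the digits (as a string)
    $number = [int] $m.Groups[2].Value + $Amount
    return $m.Groups[1].Value + $number
}

Add-ToEmbeddedNumber -Text 'SGJKR67'             # SGJKR68
Add-ToEmbeddedNumber -Text 'NYSC34' -Amount 10   # NYSC44
```

Note that converting to [int] drops leading zeros, so an input such as AB007 would come back as AB8 with this sketch.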

In PowerShell Core (v6.1+), you can use the -replace operator:
with a regex (regular expression) for matching the embedded numbers
\d+ is a sequence of one or more (+) digits (\d)
and a script block ({ ... }) as the replacement operand, which allows you to dynamically determine replacement strings on a per-match basis:
Inside the script block, which is called for every match, automatic variable $_ contains a [System.Text.RegularExpressions.Match] instance with information about the match at hand; in the simplest case, $_.Value returns the matched text.
PS> 'SGJKR67', 'NYSC34' -replace '\d+', { 1 + [int] $_.Value }
SGJKR68
NYSC35
In Windows PowerShell, where script-block replacement operands aren't supported, you must use the .NET [regex] type's static .Replace() method directly:
PS> 'SGJKR67', 'NYSC34' | ForEach-Object {
[regex]::Replace($_, '\d+', { param($m) 1 + [int] $m.Value })
}
SGJKR68
NYSC35
Note: Unlike -replace, [regex]::Replace() doesn't support passing an array of input strings, hence the use of a ForEach-Object call; inside its script block ({ ... }), $_ refers to the input string at hand.
The approach is fundamentally the same, except that the match at hand (the [System.Text.RegularExpressions.Match] instance) is passed as an argument to the script block, which parameter declaration param($m) captures in variable $m.

You have to separate the numbers from the string, calculate the new number and return everything as string.
[System.String]$NumberInString = 'SGJKR68'
[System.String]$String = $NumberInString.Substring(0, 5)
[System.Int32]$Int = $NumberInString.Substring(5, 2)
$Int += 1
[System.String]$NewNumberInString = ($String + $Int)
$NewNumberInString

Related

Powershell Regex question. Escape parenthesis

Been beating my head around this one all day and I'm getting close but not quite getting there. I have a small subset of my much larger script for just the regex part. Here is the script so far:
$CCI_ID = @(
"003417 AR-2.1"
"003425 AR-2.9"
"003392 AP-1.12"
"009012 APP-1(21).1"
)
[regex]::matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2}[-][\d][\(?\){0,1}[.][\d]{1,})') |
ForEach-Object {
if($_.Groups[1].Value.length -gt 0){
write-host $('CCI-' + $_.Groups[1].Value.trim())}
else{$_.Groups[2].Value.trim()}
}
CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
PP-1(21
CCI-1
The output is correct for all but the last one. It should be:
CCI-009012
APP-1(21).1
Thanks for any advice.
Instead of describing and quantifying the (optional) opening and closing parenthesis separately, group them together and then make the whole group optional:
(?:\(\d+\))?
The whole pattern thus ends up looking like:
[regex]::Matches($CCI_ID, '(\d{1,})|([a-zA-Z]{2,3}[-][\d](?:\(\d+\))?[.][\d]{1,})')
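To see what making the whole parenthesized group optional buys, the second alternative of the pattern can be tried on its own against a few samples. This is a sketch: the ^ and $ anchors are added only for the test, and the last two samples are invented to show inputs with a half-open or empty parenthesis.

```powershell
# Fragment under test: letters, hyphen, digit, optional "(digits)", dot, digits.
# The ^ and $ anchors are added here only to test each sample as a whole.
$pattern = '^[a-zA-Z]{2,3}-\d(?:\(\d+\))?\.\d+$'

$samples = 'AR-2.1', 'APP-1(21).1', 'APP-1(21.1', 'APP-1().1'
$results = foreach ($s in $samples) {
    # -match returns $true or $false for a single input string
    [pscustomobject]@{ Input = $s; IsMatch = ($s -match $pattern) }
}
$results
```

The first two samples match; the last two do not, because the group either appears completely, as (digits), or not at all.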
In your pattern you are using an alternation | but looking at the example data you can match 1 or more whitespaces after it instead.
If there is a match for the pattern, the group 1 value already contains 1 or more digits so you don't have to check for the Value.length
The pattern with the optional digits between parentheses:
\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b
See a regex101 demo.
$CCI_ID = @(
"003417 AR-2.1"
"003425 AR-2.9"
"003392 AP-1.12"
"009012 APP-1(21).1"
)
[regex]::matches($CCI_ID, '\b(\d+)\s+([a-zA-Z]{2,}-\d(?:\(\d+\))?\.\d+)\b') |
ForEach-Object {
write-host $( 'CCI-' + $_.Groups[1].Value.trim() )
write-host $_.Groups[2].Value.trim()
}
Output
CCI-003417
AR-2.1
CCI-003425
AR-2.9
CCI-003392
AP-1.12
CCI-009012
APP-1(21).1
As you are experiencing here, regular expressions can become very complex and unreadable.
Therefore it is often a good idea to view your problem from two different angles:
Try matching the part(s) you want, or
Try matching the part(s) you don't want
In your case it is probably easier to match the part that you don't want (the delimiter, i.e. the space) and split your string on that, which is apparently what you want to achieve:
$CCI_ID | Foreach-Object {
$Split = $_ -Split '\s+', 2
'CCI-' + $Split[0]
$Split[1]
}
$_ -Split '\s+', 2 splits the string on one or more whitespace characters (you might also consider a literal space: -Split ' '). The , 2 prevents the string from being split into more than 2 parts, meaning that the second part will not be split further even if it contains spaces.
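A quick way to see the effect of the limit is to split the same line with and without it. The sample line below is invented for illustration; it is not part of the question's data.

```powershell
# Hypothetical input whose second part itself contains spaces
$line = '009012 APP-1(21).1 with trailing note'

$limited   = $line -split '\s+', 2   # at most 2 parts
$unlimited = $line -split '\s+'      # one part per whitespace-separated token

$limited.Count     # 2
$limited[1]        # APP-1(21).1 with trailing note
$unlimited.Count   # 5
```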

how to get the value in a string using regex

I am confused on how to get the number in the string below. I crafted a regex
$regex = '(?s)^.+\bTotal Members in Group: ([\d.]+).*$'
I need only the number (2 in this case) from a long string in which one line reads Total Members in Group: 2.
My $regex returns the entire line, but what I really need is the number.
The number is random
cls
$string = 'Total Members in Group: 2'
$membersCount = $string -replace "\D*"
$membersCount
One more way:
cls
$string = 'Total Members in Group: 2'
$membersCount = [Regex]::Match($string, "(?<=Group:\s*)\d+").Value
$membersCount
Fors1k's helpful answer shows elegant solutions that bypass the need for a capture group ((...)) - they are the best solutions for the specific example in the question.
To answer the general question as to how to extract substrings
if and when capture groups are needed, the PowerShell-idiomatic way is to:
Either: Use -match, the regular-expression matching operator with a single input string: if the -match operation returns $true, the automatic $Matches variable reflects what the regex captured, with property (key) 0 containing the full match, 1 the first capture group's match, and so on.
$string = 'Total Members in Group: 2'
if ($string -match '(?s)^.*\bTotal Members in Group: (\d+).*$') {
# Output the first capture group's match
$Matches.1
}
Note:
-match only ever looks for one match in the input.
Direct use of the underlying .NET APIs is required to look for all matches, via [regex]::Matches() - see this answer for an example.
While -match only populates $Matches with a single input string (with an array, it acts as a filter and returns the sub-array of matching input strings), you can use a switch statement with -Regex to apply -match behavior to an array of input strings; here's a simplified example (outputs '1', '2', '3'):
switch -Regex ('A1', 'A2', 'A3') {
'A(.)' { $Matches.1 }
}
Or: Use -replace, the regular-expression-based string replacement operator, to match the entire input string and replace it with a reference to what the capture group(s) of interest captured; e.g, $1 refers to the first capture group's value.
$string = 'Total Members in Group: 2'
$string -replace '(?s)^.*\bTotal Members in Group: (\d+).*$', '$1'
Note:
-replace, unlike -match, looks for all matches in the input
-replace also supports an array of input strings, in which case each array element is processed separately (-match does too, but in that case it does not populate $Matches; as stated, switch can remedy that).
A caveat re -replace is that if the regex does not match, the input string is returned as-is, so a failed match cannot be told apart from a successful one by the result alone.
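Because of that caveat, a -match based helper is one way to distinguish "no count found" from a real value. The sketch below assumes a hypothetical function name, Get-MemberCount, and an invented multi-line sample string.

```powershell
# Hypothetical helper: returns the count as [int], or $null if the line is absent
function Get-MemberCount {
    param([string] $Text)
    if ($Text -match '\bTotal Members in Group: (\d+)') {
        return [int] $Matches[1]   # first capture group
    }
    return $null   # no count found
}

$report = "Group: Admins`nTotal Members in Group: 2`nOwner: someone"
Get-MemberCount -Text $report             # 2
$null -eq (Get-MemberCount -Text 'n/a')   # True

# The caveat in action: no match, so -replace hands back the input unchanged
'n/a' -replace '(?s)^.*\bTotal Members in Group: (\d+).*$', '$1'   # n/a
```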

Regular expression to match exactly and only n times

If I have the lines:
'aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf'
'asfdcacttaskdfjcacttklasdjf'
'cksjdfcacttlkasdjf'
I want to match them by the number of times a repeating subunit (cactt) occurs. In other words, if I ask for n repeats, I want matches that contain n and ONLY n instances of the pattern.
My initial attempt was implemented in perl and looks like this:
sub MATCHER {
print "matches with $_ CACTT's\n";
my $pattern = "^(.*?CACTT.+?){$_}(?!.*?CACTT).*\$";
my @grep_matches = grep(/$pattern/, @matching);
print "$_\n" for @grep_matches;
my @copy = @grep_matches;
my $squashed = @copy;
print "number of rows total: $squashed\n";
}
for (2...6) {
MATCHER($_);
}
Notes:
@matching contains the strings from 1, 2, and 3 in an array.
the for loop is set from integers 2-6 because I have a separate regex that works to forbid duplicate occurrences of the pattern.
This loop ALMOST works except that for n=2, matches containing 3 occurrences of the "cactt" pattern are returned. In fact, for any string containing n+1 matches (where n>=2), lines with n+1 occurrences are also returned by the match. I thought the negative lookahead could prevent this behavior in Perl. If anyone could give me thoughts, I would be appreciative.
Also, I have thought of getting a count per line and separating them by count; I dislike the approach because it requires two steps when one should accomplish what I want.
I would be okay with a:
foreach (@matches) { $_ =~ /$pattern/; push(@selected_by_n, $1);}
The regex seems like it should be similar, but for whatever reason in practice the results differ dramatically.
Thanks in advance!
Your code is sort of strange. This regex
my $pattern = "^(.*?CACTT.+?){$_}(?!.*?CACTT).*\$";
..tries to match first beginning of string ^, then a minimal match of any character .*?, followed by your sequence CACTT, followed by a minimal match (but slightly different from .*?) .+?. And you want to match these $_ times. You assume $_ will be correct when calling the sub (this is bad). Then you have a look-ahead assertion that wants to make sure that there is no minimal match of any char .*? followed by your sequence, followed by any char of any length followed by end of line $.
First off, this is always redundant: ^.*. Beginning of line anchor followed by any character any number of times. This actually makes the anchor useless. Same goes for .*$. Why? Because any match that will occur, will occur anyway at the first possible time. And .*$ matches exactly the same thing that the empty string does: Anything.
For example: the regex /^.*?foo.*?$/ matches exactly the same thing as /foo/. (Excluding cases of multiline matching with strings that contain newlines).
In your case, if you want to count the occurrences of a string inside a string, you can just match them like this:
my $count = () = $str =~ /CACTT/gi;
This code:
my @copy = @grep_matches;
my $squashed = @copy;
Is completely redundant. You can just do my $squashed = @grep_matches. It makes little to no sense to first copy the array.
This code:
MATCHER($_);
Does the same as this: MATCHER("foo") or MATCHER(3.1415926536). You are not using the subroutine argument, you are ignoring it, and relying on the fact that $_ is global and visible inside the sub. What you want to do is
sub MATCHER {
my $number = shift; # shift argument from @_
Now you have encapsulated the code and all is well.
What you want to do in your case, I assume, is to count the occurrences of the substring inside your strings, then report them. I would do something like this
use strict;
use warnings;
use Data::Dumper;
my %data;
while (<DATA>) {
chomp;
my $count = () = /cactt/gi; # count number of matches
push #{ $data{$count} }, $_; # store count and original
}
print Dumper \%data;
__DATA__
aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf
asfdcacttaskdfjcacttklasdjf
cksjdfcacttlkasdjf
This will print
$VAR1 = {
'2' => [
'asfdcacttaskdfjcacttklasdjf'
],
'3' => [
'aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf'
],
'1' => [
'cksjdfcacttlkasdjf'
]
};
This is just to demonstrate how to create the data structure. You can now access the strings by their number of matches. For example:
for (@{ $data{3} }) { # print strings with 3 matches
print;
}
Would you just do something like this:
use warnings;
use strict;
my $n=2;
my $match_line_cnt=0;
my $line_cnt=0;
while (<DATA>) {
my $m_cnt = () = /cactt/g;
if ($m_cnt>=$n){
print;
$match_line_cnt++;
}
$line_cnt++;
}
print "total lines: $line_cnt\n";
print "matched lines: $match_line_cnt\n";
print "squashed: ",$line_cnt-$match_line_cnt;
__DATA__
aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf
asfdcacttaskdfjcacttklasdjf
cksjdfcacttlkasdjf
prints:
aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf
asfdcacttaskdfjcacttklasdjf
total lines: 3
matched lines: 2
squashed: 1
I think you're unintentionally asking two separate questions.
If you want to directly capture the number of times a pattern matches in a string, this one-liner is all you need.
$string = 'aslkdfjcacttlaksdjcacttlaksjdfcacttlskjdf';
$pattern = qr/cactt/;
print $count = () = $string =~ m/$pattern/g;
-> 3
That last line is as if you had written $count = @junk = $string =~ m/$pattern/g; but without needing an intermediate array variable. () = is the null list assignment and it throws away whatever is assigned to it just like scalar undef = throws away its right hand side. But, the null list assignment still returns the number of things thrown away when its left hand side is in scalar context. It returns an empty list in list context.
If you want to match strings that only contain some number of pattern matches, then you want to stop matching once too many are found. If the string is large (like a document) then you would waste a lot of time counting past n.
Try this.
sub matcher {
my ($string, $pattern, $n) = @_;
my $c = 0;
while ($string =~ m/$pattern/g) {
$c++;
return if $c > $n;
}
return $c == $n ? 1 : ();
}
Now there is one more option but if you call it over and over again it gets inefficient. You can build a custom regex that matches only n times on the fly. If you only build this once however, it's just fine and speedy. I think this is what you originally had in mind.
$regex = qr/^(?:(?:(?!$pattern).)*$pattern){$n}(?:(?!$pattern).)*$/;
I'll leave the rest of that one to you. Check for n > 1 etc. The key is understanding how to use lookahead. You have to match all the NOT THINGS before you try to match THING.
https://perldoc.perl.org/perlre

Change 3rd octet of IP in string format using PowerShell

Think I've found the worst way to do this:
$ip = "192.168.13.1"
$a,$b,$c,$d = $ip.Split(".")
[int]$c = $c
$c = $c+1
[string]$c = $c
$newIP = $a+"."+$b+"."+$c+"."+$d
$newIP
But what is the best way? Has to be string when completed. Not bothered about validating its a legit IP.
Using your example for how you want to modify the third octet, I'd do it pretty much the same way, but I'd compress some of the steps together:
$IP = "192.168.13.1"
$octets = $IP.Split(".") # or $octets = $IP -split "\."
$octets[2] = [string]([int]$octets[2] + 1) # or other manipulation of the third octet
$newIP = $octets -join "."
$newIP
You can simply use the -replace operator of PowerShell and a look ahead pattern. Look at this script below
Set-StrictMode -Version "2.0"
$ErrorActionPreference="Stop"
cls
$ip1 = "192.168.13.123"
$tests=@("192.168.13.123" , "192.168.13.1" , "192.168.13.12")
foreach($test in $tests)
{
$patternRegex="\d{1,3}(?=\.\d{1,3}$)"
$newOctet="420"
$ipNew=$test -replace $patternRegex,$newOctet
$msg="OLD ip={0} NEW ip={1}" -f $test,$ipNew
Write-Host $msg
}
This will produce the following:
OLD ip=192.168.13.123 NEW ip=192.168.420.123
OLD ip=192.168.13.1 NEW ip=192.168.420.1
OLD ip=192.168.13.12 NEW ip=192.168.420.12
How to use the -replace operator?
https://powershell.org/2013/08/regular-expressions-are-a-replaces-best-friend/
Understanding the pattern that I have used
The (?=...) in \d{1,3}(?=\.\d{1,3}$) is a look-ahead assertion (not a look-behind).
The (?=\.\d{1,3}$) part requires that a literal DOT and 1-3 digits follow, up to the end of the string.
The leading \d{1,3} is an instruction to specifically match 1-3 digits
All combined in plain English: "Give me 1-3 digits that are followed by a period and 1-3 digits located at the right side boundary of the string"
Look ahead regex
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
If you have PowerShell Core (v6.1 or higher), you can combine -replace with a script block-based replacement:
PS> '192.168.13.1' -replace '(?<=^(\d+\.){2})\d+', { 1 + $_.Value }
192.168.14.1
Positive look-behind assertion (?<=^(\d+\.){2}) matches everything up to, but not including, the 3rd octet - without considering it part of the overall match to replace.
(?<=...) is the look-behind assertion, \d+ matches one or more (+) digits (\d), \. a literal ., and {2} matches the preceding subexpression ((...)) 2 times.
\d+ then matches just the 3rd octet; since nothing more is matched, the remainder of the string (. and the 4th octet) is left in place.
Inside the replacement script block ({ ... }), $_ refers to the results of the match, in the form of a [System.Text.RegularExpressions.Match] instance; its .Value is the matched string, i.e. the 3rd octet, to which 1 can be added.
Data type note: by using 1, an implicit [int], as the LHS, the RHS (the .Value string) is implicitly coerced to [int] (you may choose to use an explicit cast).
On output, whatever the script block returns is automatically coerced back to a string.
If you must remain compatible with Windows PowerShell, consider Jeff Zeitlin's helpful answer.
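As a sketch of how the same look-behind idea could be generalized to any octet without relying on a script-block operand (so it should also run in Windows PowerShell), the match can be located with [regex]::Match() and the string reassembled around it. The function name Step-IPv4Octet and its parameters are hypothetical, and no range validation is done (255 + 1 yields 256).

```powershell
# Hypothetical helper: adds $Amount to the octet at 1-based position $Octet
function Step-IPv4Octet {
    param(
        [string] $Address,
        [int] $Octet = 3,
        [int] $Amount = 1
    )
    # Look-behind: exactly ($Octet - 1) "digits + dot" groups since the start
    $pattern = '(?<=^(?:\d+\.){' + ($Octet - 1) + '})\d+'
    $m = [regex]::Match($Address, $pattern)
    if (-not $m.Success) { return $Address }   # nothing to change

    # Rebuild the string around the matched octet
    $before = $Address.Substring(0, $m.Index)
    $after  = $Address.Substring($m.Index + $m.Length)
    return $before + ([int] $m.Value + $Amount) + $after
}

Step-IPv4Octet -Address '192.168.13.1'                       # 192.168.14.1
Step-IPv4Octet -Address '192.168.13.1' -Octet 4 -Amount 5    # 192.168.13.6
```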
To complete your method, but more concisely:
$a,$b,$c,$d = "192.168.13.1".Split(".")
$IP="$a.$b.$([int]$c+1).$d"
function Replace-3rdOctet {
Param(
[string]$GivenIP,
[string]$New3rdOctet
)
$GivenIP -match '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})' | Out-Null
$Output = "$($matches[1]).$($matches[2]).$New3rdOctet.$($matches[4])"
Return $Output
}
Copy to a ps1 file and dot source it from command line, then type
Replace-3rdOctet -GivenIP '100.201.190.150' -New3rdOctet '42'
Output: 100.201.42.150
From there you could add extra error handling etc for random input etc.
here's a slightly different method. [grin] i managed to not notice the answer by JeffZeitlin until after i finished this.
[edit - thanks to JeffZeitlin for reminding me that the OP wants the final result as a string. oops! [*blush*]]
what it does ...
splits the string on the dots
puts that into an [int] array & coerces the items into that type
increments the item in the targeted slot
joins the items back into a string with a dot for the delimiter
converts that to an IP address type
adds a line to convert the IP address to a string
here's the code ...
$OriginalIPv4 = '1.1.1.1'
$TargetOctet = 3
$OctetList = [int[]]$OriginalIPv4.Split('.')
$OctetList[$TargetOctet - 1]++
$NewIPv4 = [ipaddress]($OctetList -join '.')
$NewIPv4
'=' * 30
$NewIPv4.IPAddressToString
output ...
Address : 16908545
AddressFamily : InterNetwork
ScopeId :
IsIPv6Multicast : False
IsIPv6LinkLocal : False
IsIPv6SiteLocal : False
IsIPv6Teredo : False
IsIPv4MappedToIPv6 : False
IPAddressToString : 1.1.2.1
==============================
1.1.2.1

PowerShell Regex - word with wildcards and commas

Trying to do a replace on what I understand to be a simple operation but hitting a wall.
I can replace a word with a comma on the end:
$firstval = 'ssonp,RDPNP,LanmanWorkstation,webclient,MfeEpePcNP,PRNetworkProvider'
($firstval) -replace 'webclient+,',''
ssonp,RDPNP,LanmanWorkstation,MfeEpePcNP,PRNetworkProvider
But haven't been able to work out how to add a wildcard in the word, or how I'd have multiple words with wildcards proceeded by a comma, e.g.:
w* client+,* fee*, etc
(spaces added to stop being interpreted as formatting within the question)
Played with a few permutations and attempted to use examples from other questions without any luck.
The -replace operator takes a regular expression as its first parameter. You seem to be confusing wildcards and regular expressions. Your pattern w*client+,*fee*,, though a valid regular expression, seems to be intended to use wildcards.
The regular expression equivalent of the * wildcard is .*, where . means "any character" and * means "0 or more occurrences". Thus, the regular expression equivalent of w*client, would be w.*client,, and, similarly the regular expression equivalent of *fee*, would be .*fee.*,. Since the string to be searched has comma-separated values, however, we don't want our patterns to include "any character" (.*) but rather "any character but comma" ([^,]*). Therefore, the patterns to use become w[^,]*client, and [^,]*fee[^,]*,, respectively.
To search for both words in a string, separate the two patterns with |. The following builds such a pattern and tests it against strings with a match in various locations:
# Match w*client or *fee*
$wordPattern = 'w[^,]*client|[^,]*fee[^,]*';
# Match $wordPattern and at most one comma before or after
$wordWithAdjacentCommaPattern = '({0}),?|,({0})$' -f $wordPattern;
"`$wordWithAdjacentCommaPattern: $wordWithAdjacentCommaPattern";
# Replace single value
'webclient', `
# Replace first value
'webclient,middle,last', `
# Replace middle value
'first,webclient,last', `
# Replace last value
'first,middle,webclient' `
| ForEach-Object -Process { '"{0}" => "{1}"' -f $_, ($_ -replace $wordWithAdjacentCommaPattern); };
This outputs the following:
$wordWithAdjacentCommaPattern: (w[^,]*client|[^,]*fee[^,]*),?|,(w[^,]*client|[^,]*fee[^,]*)$
"webclient" => ""
"webclient,middle,last" => "middle,last"
"first,webclient,last" => "first,last"
"first,middle,webclient" => "first,middle"
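The wildcard-to-regex translation described above can also be done mechanically. The following is a minimal sketch, assuming a hypothetical helper name ConvertTo-CsvValuePattern: it escapes the wildcard with [regex]::Escape() and then swaps the escaped * and ? for "any character but comma" equivalents.

```powershell
# Hypothetical helper: turns a wildcard such as 'w*client' into a regex
# fragment that cannot cross a comma boundary.
function ConvertTo-CsvValuePattern {
    param([string] $Wildcard)
    # [regex]::Escape() turns * into \* and ? into \?; replace those literally
    [regex]::Escape($Wildcard).Replace('\*', '[^,]*').Replace('\?', '[^,]')
}

ConvertTo-CsvValuePattern -Wildcard 'w*client'   # w[^,]*client
ConvertTo-CsvValuePattern -Wildcard '*fee*'      # [^,]*fee[^,]*
```

The two results are the same fragments that were written by hand for $wordPattern.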
A non-regex alternative you might consider would be to split your input string into individual values, filter out values that match certain wildcards, and reassemble what's left into comma-separated values:
(
'ssonp,RDPNP,LanmanWorkstation,webclient,MfeEpePcNP,PRNetworkProvider' -split ',', -1, 'SimpleMatch' `
| Where-Object { $_ -notlike 'w*client' -and $_ -notlike '*fee*'; } `
) -join ',';
By the way, you used the regular expression webclient+, to match and remove the text webclient, from your string (looks like the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkProvider\Order\ProviderOrder registry value). Just a note that, with the +, that will search for the literal text webclien followed by 1 or more occurrences of t followed by the literal text ,. Thus, that will match webclientt,, webclienttt,, webclientttttttttt,, etc. as well as webclient,. If you are only interested in matching webclient, then you can just use the pattern webclient, (no +).
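The difference the + makes can be checked directly. In this sketch the ^ and $ anchors are added only so that each sample is tested as a whole, and the sample strings are made up.

```powershell
$inputs = 'webclient,', 'webclientttt,', 'webclien,'
$table = foreach ($s in $inputs) {
    [pscustomobject]@{
        Input       = $s
        WithPlus    = $s -match '^webclient+,$'   # t may repeat 1 or more times
        WithoutPlus = $s -match '^webclient,$'    # exactly one t
    }
}
$table
```

Only the first sample matches both patterns; webclientttt, matches only the version with +, and webclien, matches neither because + requires at least one t.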