Remove all characters except regex pattern in array - regex

I'm creating an array of all the versions of a particular package in a directory.
What I want to do is strip out all the characters except the version numbers.
The first array has info such as
GoogleChrome.45.45.34.nupkg
GoogleChrome.34.28.34.nupkg
So the output I need is
45.45.34
34.28.34
$dirList = Get-ChildItem $sourceDir -Recurse -Include "*.nupkg" -Exclude $pkgExclude |
    ForEach-Object { $_.Name }
$reg = '.*[0-9]*.nupkg'
$appName = 'GoogleChrome'
$ouText = $dirList | Select-String $appName$reg -AllMatches | % { $_.Matches.Value }
$ouText
$verReg = '(\d+)(.)(?!nupkg)'
The last regex matches the pattern of what I want to keep, but I can't figure out how to strip out what I don't need.

You do not need to post-process matches if you apply the right pattern from the start.
In order to extract . separated digits in between GoogleChrome. and .nupkg you may use
Select-String '(?<=GoogleChrome\.)[\d.]+(?=\.nupkg)' -AllMatches
See the regex demo
Details
(?<=GoogleChrome\.) - the location should be preceded with GoogleChrome. substring
[\d.]+ - one or more digits or/and .
(?=\.nupkg) - there must be .nupkg immediately to the right of the current location.
If .nupkg should not be relied upon, use
Select-String '(?<=GoogleChrome\.)\d+(?:\.\d+)+' -AllMatches
Here, \d+(?:\.\d+)+ will match 1 or more digits followed with 1 or more occurrences of a . and 1+ digits only if preceded with GoogleChrome..
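As a minimal sketch, both patterns can be tried against the two sample file names from the question. The $names array stands in for the real $dirList, and the variable names are only illustrative:

```powershell
# Sample file names from the question (stand-in for $dirList)
$names = 'GoogleChrome.45.45.34.nupkg', 'GoogleChrome.34.28.34.nupkg'

# Pattern 1: relies on the trailing .nupkg extension
$versions = $names |
    Select-String -Pattern '(?<=GoogleChrome\.)[\d.]+(?=\.nupkg)' -AllMatches |
    ForEach-Object { $_.Matches.Value }

# Pattern 2: does not rely on the extension
$versions2 = $names |
    Select-String -Pattern '(?<=GoogleChrome\.)\d+(?:\.\d+)+' -AllMatches |
    ForEach-Object { $_.Matches.Value }

$versions    # 45.45.34 and 34.28.34
```

Both variants yield the same two version strings for this input.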

(\d+.?)+(?!nupkg)
This would give you the desired output in the match; check the regex demo.

Related

Replacing a Matched Character in Powershell

I have a text file of 3 name entries:
# dot_test.txt
001 AALTON, Alan .....25 Every Street
006 JOHNS, Jason .... 3 Steep Street
002 BROWN. James .... 101 Browns Road
My task is to find instances of NAME. when it should be NAME, using the following:
Select-String -AllMatches -Path $input_path -Pattern '(?s)[A-Z]{3}.*?\D(?=\s|$)' -CaseSensitive |
ForEach-Object { if($_.Matches.Value -match '\.$'){$_.Matches.Value -replace '\,$'} }
The output is:
BROWN.
The conclusion is this script block identifies the instance of NAME. but fails to make the replacement.
Any suggestions on how to achieve this would be appreciated.
$_.Matches.Value -replace '\,$'
This attempts to replace a , (which you needn't escape as \,) at the end of ($) your match with the empty string (due to the absence of a second, replacement operand), i.e. it would effectively remove a trailing ,.
However, given that your match contains no , and that you instead want to replace its trailing . with ,, use the following:
$_.Matches.Value -replace '\.$', ',' # -> 'BROWN,'
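A small sketch of the difference between the two -replace calls, using the literal string 'BROWN.' in place of $_.Matches.Value:

```powershell
$match = 'BROWN.'

# One operand: removes a trailing comma; there is none, so nothing changes
$unchanged = $match -replace ',$'

# Two operands: replaces the trailing dot with a comma
$fixed = $match -replace '\.$', ','

$unchanged   # BROWN.
$fixed       # BROWN,
```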
You can use -replace directly, and if you need to replace both a comma and dot at the end of the string, use [.,]$ regex:
Select-String -AllMatches -Path $input_path -Pattern '(?s)[A-Z]{3}.*?\D(?=\s|$)' -CaseSensitive | % {$_.Matches.Value -replace '\.$', ','}
Details:
(?s)[A-Z]{3}.*?\D(?=\s|$) - matches
(?s) - RegexOptions.Singleline mode on and . can match line breaks
[A-Z]{3} - three uppercase ASCII letters
.*? - any zero or more chars as few as possible
\D - any non-digit char
(?=\s|$) - a positive lookahead that matches a location either immediately followed with a whitespace or end of string.
The \.$ pattern matches a . at the end of string.
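To see the whole pipeline at work without a file, the three sample lines can be piped in as strings. This is only a sketch: the $lines array replaces -Path $input_path, and there is no filter on the trailing dot, so every name is emitted.

```powershell
# Sample lines from the question, piped in instead of read from a file
$lines = @(
    '001 AALTON, Alan .....25 Every Street'
    '006 JOHNS, Jason .... 3 Steep Street'
    '002 BROWN. James .... 101 Browns Road'
)

# Extract each NAME token and turn a trailing dot into a comma
$names = $lines |
    Select-String -AllMatches -CaseSensitive -Pattern '(?s)[A-Z]{3}.*?\D(?=\s|$)' |
    ForEach-Object { $_.Matches.Value -replace '\.$', ',' }

$names   # AALTON,  JOHNS,  BROWN,
```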

Regular expression to locate one string appearing anywhere after another but before something

I have an EDI file. This is the piece in question:
N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST#GMAIL.COM
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST2#GMAIL.COM
I am using PowerShell:
Get-ChildItem 'C:\Temp\*.edi' | Where-Object {(Select-String -InputObject $_ -Pattern 'PER\*EM\*\w+#\w+\.\w+' -List)}
I want to find the email address that appears after the N1*ST, but before the N1*BY. I have the expression that works for an email address but I am stuck on how to only get the one value. The real issue is sometimes the email is there and other times it is not. So I really do want to ignore that second email after the N1*BY.
Thanks in advance for the help.
You can use
(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)
See the .NET regex demo.
Details
(?s) - a DOTALL (RegexOptions.Singleline in .NET) regex inline modifier making . match newline chars, too
(?<=N1\*ST.*) - a positive lookbehind that matches a location preceded with N1*ST and then any zero or more chars
PER\*EM\* - a PER*EM* string
\w+#\w+ - 1+ word chars, #, and 1+ word chars
\. - a dot
\w+ - 1+ word chars
(?=.*N1\*BY) - a positive lookahead that matches a location followed with any zero or more chars and then the N1*BY literal string.
NOTE: You need to read in the file contents with Get-Content $filepath -Raw in order to find the proper match.
Something like
Get-ChildItem 'C:\Temp\*.edi' | % { Get-Content $_ -Raw | Select-String -Pattern '(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)' } | % { $_.Matches.value }
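The effect of the two lookarounds can be checked on the sample segment without touching the file system. In this sketch the $edi string stands in for the output of Get-Content -Raw, and [regex]::Matches is used instead of Select-String purely for brevity:

```powershell
# The EDI fragment from the question as one multi-line string
$edi = @(
    'N1*ST*TEST'
    'N3*ADDRESS'
    'N4*CITY*ST*POSTAL'
    'PER*EM*TEST#GMAIL.COM'
    'N1*BY*TEST'
    'N3*ADDRESS'
    'N4*CITY*ST*POSTAL'
    'PER*EM*TEST2#GMAIL.COM'
) -join "`n"

$pattern = '(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)'

# Only the PER segment between N1*ST and N1*BY satisfies both lookarounds
$found = [regex]::Matches($edi, $pattern) | ForEach-Object { $_.Value }
$found   # PER*EM*TEST#GMAIL.COM
```

The second address is skipped because no N1*BY follows it, so the lookahead fails.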

Select-String: match a string only if it isn't preceded by a specific character

I have a list of files that contain either of the two strings:
"stuff" or ";stuff"
I'm trying to write a PowerShell script that will return only the files that contain "stuff". The script below currently returns all the files because, obviously, "stuff" is a substring of ";stuff".
For the life of me, I cannot figure out how to match only files that contain "stuff" without a preceding ;
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern "stuff" -SimpleMatch $_ }
Note: C:\temp\list\list.txt contains a list of file paths that are each passed to Select-String.
Thanks for the help.
You cannot perform the desired matching with literal substring searches (-SimpleMatch).
Instead, use a regex with a negative look-behind assertion ((?<!...)) to rule out stuff substrings preceded by a ; char: (?<!;)stuff
Applied to your command:
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }
Regex pitfalls:
It is tempting to use [^;]stuff instead, using a negated (^) character set ([...]) (see this answer); however, this will not work as expected if stuff appears at the very start of a line, because a character set - whether negated or not - only matches an actual character, not the start-of-the-line position.
It is then tempting to apply ? to the negated character set (for an optional match - 0 or 1 occurrence): [^;]?stuff. However, that would match a string containing ;stuff again, given that stuff is technically preceded by a "0-repeat occurrence" of the negated character set; thus, ';stuff' -match '[^;]?stuff' yields $true.
Only a look-behind assertion works properly in this case - see regular-expressions.info.
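The three candidate patterns can be compared directly with the -match operator. A short sketch on literal strings (the variable names are illustrative only):

```powershell
# Negative look-behind: behaves as intended
$lookbehindPlain = 'stuff'  -match '(?<!;)stuff'   # True
$lookbehindSemi  = ';stuff' -match '(?<!;)stuff'   # False

# Negated character set: misses stuff at the very start of the string
$negatedSetPlain = 'stuff'  -match '[^;]stuff'     # False

# Optional negated character set: matches ;stuff again
$optionalSetSemi = ';stuff' -match '[^;]?stuff'    # True
```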
To complement @mklement0's answer, I suggest an alternative approach to make your code easier to read and understand:
#requires -Version 4
@(Get-Content -Path 'C:\Temp\list\list.txt').
ForEach([IO.FileInfo]).
Where({ $PSItem | Select-String -Pattern '(?<!;)stuff' -Quiet })
This turns your strings into objects (System.IO.FileInfo) and uses the array methods ForEach and Where for conciseness. It also lets you pipe the paths as objects, which Select-String accepts from the pipeline, making the command easier to follow (I find long lists of parameters difficult to read).
The example code posted won't actually run, as it will look at each line as the -Path value.
What you need is to get the content, select the string you're after, then filter the results with Where-Object
Get-Content "C:\temp\list\list.txt" | Select-String -Pattern "stuff" | Where-Object {$_ -notmatch ";stuff"}
You could create a more complex regex if needed, but that depends on what the data in your files looks like.

Regex for multiple app versions

I'm trying to get a list of versions from my custom attribute in a PowerShell script. The attribute looks like this:
[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]
I ended up with a regex like this, but it doesn't work at all:
'\(\"([^\",?]*)\"+\)'
You should do this as a two-step process: first parse out the CompatibleVersions attribute, then split out the version numbers. Otherwise you will have difficulty finding the version numbers individually without also picking up other version-like numbers.
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
$versions = ($s | Select-String -Pattern 'CompatibleVersions\(([^)]+)\)' | % { $_.Matches }).Groups[1].Value
$versions.Split(',') | % { $_.Trim('"') } | Write-Host
# 1.7.1.0
# 1.7.1.1
# 1.2.2.3
Start by grabbing the parentheses pair and everything inside:
$string = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
if($string -match '\(([^)]+)\)'){
# Remove the parentheses themselves, split by comma and then trim the "
$versionList = $Matches[0].Trim("()") -split ',' |ForEach-Object Trim '"'
}
You may use
$s | select-string -pattern "\d+(?:\.\d+)+" -AllMatches | Foreach {$_.Matches} | ForEach-Object {$_.Value}
The \d+(?:\.\d+)+ pattern will match:
\d+ - 1 or more digits
(?:\.\d+)+ - 1 or more sequences of a . and 1+ digits.
See the regex demo on RegexStorm.
'"([.\d]+)"' will match any substring composed of dots and digits (\d) and enclosed in double quotes (")
A number between .. can be 0, but cannot be 00, 01 or similar.
Pay attention to the starting [
This is a regex for the check:
^\[assembly: CompatibleVersions\("(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*\)]$
Here is the regex with tests.
But if you are reading a list, you should use instead:
^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$
With it you will extract the "...","..."... sequence from inside the parentheses.
After that, split the resulting string by '","' into a list and remove any leftover " from the ends of the first and last elements. Now you have a list of valid version strings.
Alas, a regex alone cannot produce the list; you still need a split() step.
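The validate-then-split steps described above might look like this in PowerShell. This is a sketch only: the pattern is the second regex from this answer, copied unchanged, and Trim('"') is my choice for removing the leftover quotes.

```powershell
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'

# Strict pattern from the answer; group 1 holds the quoted, comma-separated list
$pattern = '^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$'

$versions = @()
if ($s -match $pattern) {
    # Split on the "," separators, then strip any leftover quotes
    $versions = $Matches[1] -split '","' | ForEach-Object { $_.Trim('"') }
}

$versions   # 1.7.1.0  1.7.1.1  1.2.2.3
```

An input with a component such as 01 fails the -match test, so $versions stays empty.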

Trying to match this using regular expressions in PowerShell

I am trying to use regular expressions to match certain lines in a file, but I am having some trouble.
The file contains text like this:
Mario, 123456789
Luigi, 234-567-890
Nancy, 345 5666 77533
Bowser, 348759823745908732589
Peach, 534785
Daisy, 123-456-7890
I'm trying to match just the numbers as either XXX-XXX-XXX or XXX XXX XXX pattern.
I've tried a few different ways, but it always accepts something I don't want it to, or it tells me everything is false.
I'm using PowerShell to do this.
At first I tried:
{$match = $i -match "\d{3}\-\d{3}\-\d{3}|\d{3}\ \d{3}\ \d{3}"
Write-Host $match}
But when I do that, it matches the long string of numbers and XXX-XXX-XXXXX.
I read something saying that n would match the exact quantity, so I tried that...
{$match = $i -match "\d{n3}\-\d{n3}\-\d{n3}|\d{n3}\ \d{n3}\ \{n3}"
Write-Host $match}
That made everything false...
So I tried
{$match = $i -match "\d\n{3}\-\d\n{3}\-\d\n{3}|\d\n{3}\ \d\n{3}\ \d\n{3}"
I also tried the lazy quantifier, ?:
{$match = $i -match "\d{3?}\-\d{3?}\-\d{3?}|\d{3?}\ \{3?}\ \{3?}"
Write-Host $match}
Still false...
The final thing I tried was this...
{$match = $i -match "\d[0-9\{3\}\-\d[0-9]\{3\}\-\d[0-9]{3\}|\d[0-9]\{3\}\ \d[0-9]\{3}\ \d[0-9]\{3\}"
Write-Host $match}
Still no luck...
The following pattern gives two matches:
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}'}
Luigi, 234-567-890
Daisy, 123-456-7890
If you want to exclude the last match, add the $ anchor (it represents the end of the string):
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}$'}
Luigi, 234-567-890
If you want to be very specific and match lines from start to end, also use the ^ anchor (it denotes the start of the string):
Get-Content .\test.txt | Where-Object {$_ -match '^\w+,\s+\d{3}[-\s]\d{3}[-\s]\d{3}$'}
Luigi, 234-567-890
Your first attempt is the closest. The {3} matches exactly 3 repetitions of the preceding token. I think the n you saw was supposed to represent any number, not an actual n character. The reason it matches the long strings is that you only specified that the match must find 3 digits, dash or space, 3 digits, dash or space, then 3 more digits. You did not specify that it doesn't count if there are more digits after that.
To not match when there is a number after, you can use a negative lookahead.
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)
Alternatively, if you want to only match at the end of the line, possibly with trailing space
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})\s*$
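A quick check of the negative-lookahead variant against the sample data, as a sketch; the $lines array stands in for the file contents:

```powershell
$lines = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# (?!\d) rejects a match that is directly followed by another digit
$pattern = '(\d{3}-\d{3}-\d{3}|\d{3} \d{3} \d{3})(?!\d)'
$hits = $lines | Where-Object { $_ -match $pattern }

$hits   # Luigi, 234-567-890
```

Daisy's line is excluded because the trailing 0 follows the third group of digits.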
As Gideon said, your first is the best place to start.
"\b\d{3}\-\d{3}\-\d{3}\b|\b\d{3}\ \d{3}\ \d{3}\b"
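The \b special character added before and after each alternative is a word boundary: a position between a word character (letter, digit or underscore) and a non-word character such as a space, newline, period or comma, or the start or end of the string. This ensures that a fourth digit right after the pattern (as in 9999) prevents a match, while a trailing period (as in 999.) does not.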
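The word-boundary behaviour can be verified on a few literal strings. A brief sketch, with variable names chosen only for illustration:

```powershell
# \b needs a word/non-word transition on each side of the three digits
$fourDigits = '9999' -match '\b\d{3}\b'   # False: the digits run on
$withPeriod = '999.' -match '\b\d{3}\b'   # True: the period ends the word

# The full pattern from this answer on two of the sample lines
$pattern = '\b\d{3}-\d{3}-\d{3}\b|\b\d{3} \d{3} \d{3}\b'
$luigi = 'Luigi, 234-567-890'  -match $pattern   # True
$daisy = 'Daisy, 123-456-7890' -match $pattern   # False
```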
Try this:
/(\d+[- ])+\d+/
It's better not to make regular expressions too rigid, unless you are absolutely sure that your input will not change.
This regex matches one or more digits followed by a space or a dash, repeats that group as many times as possible, and then requires at least one more digit.
When manipulating data in PowerShell, it usually is a good idea to create objects representing the data (after all, PowerShell is all about objects). Filtering based on object properties is usually easier and more robust. Your problem is a good example.
Here is what we are after:
the persons: $persons
where: where
the number of that person: $_.number
matches: -match
the pattern
starting with three digits: ^\d{3}
followed by three digits between dashes or spaces: (-\d{3}-|\ \d{3}\ )
ending on three digits: \d{3}$
Below is the entire script:
$persons = import-csv -Header "name", "number" -delimiter "," data.csv
$persons | where {$_.number -match "^\d{3}(\-\d{3}\-|\ \d{3}\ )\d{3}$"}
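The object-based approach can be tried without a data.csv file by feeding the sample lines to ConvertFrom-Csv. This is a sketch under two assumptions of mine: the in-memory $data array replaces the file, and Trim() is added in case the parser keeps the space after the comma.

```powershell
$data = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# Build objects with name and number properties
$persons = $data | ConvertFrom-Csv -Header 'name', 'number' -Delimiter ','

# Trim() is a precaution against a leading space in the number field
$hits = $persons | Where-Object {
    $_.number.Trim() -match '^\d{3}(-\d{3}-| \d{3} )\d{3}$'
}

$hits.name   # Luigi
```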
You can also use Select-String:
Select-String '(\d{3}[ -]){2}\d{3}$' .\file.txt | % {$_.Line}