Append character after each instance of a regex match in PowerShell - regex

I was wondering if it was possible to append a character (to be used as a delimiter later on) to each instance of a regex match in a string.
I'm parsing text for a string between < >, and have a working regex pattern -- though this collapses each instance of the match.
What I would like to do is append a , to each instance of a match, so I can call the .split(',') method later on and have a collection of strings I can loop through.
$testString = "<blah@gmail.com><blah1@gmail.com>"
$testpattern = [regex]::Match($testString, '(?<=<)(.*)(?=>)').Value
$testPattern will now be "blah@gmail.combblah1@gmail.com"
What I would like to do is add a delimiter between each instance of the match, so that I can call the .split() method and work with a collection after the fact.

$testpattern is blah@gmail.com><blah1@gmail.com
You should use <(.*)><(.*)> to keep both email addresses and then concatenate the two strings: $testpattern = $testpattern[0] + "your string you want inbetween" + $testpattern[1]
Not sure about 0 and 1; it depends on the language. (In .NET, and therefore in PowerShell, group 0 is the whole match and the capture groups are numbered from 1.)
Another point: be careful, if there are spaces or characters that are invalid in an email address, the pattern will still capture them. You should use something like <([a-zA-Z0-9\-@\._]*\@[a-zA-Z0-9-]*\.[a-z-A-Z]*)><([a-zA-Z0-9\-@\._]*\@[a-zA-Z0-9-]*\.[a-z-A-Z]*)>
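As a minimal sketch of another way to get the delimiter in, assuming the addresses never contain a > themselves: replacing the greedy .* with [^>]* keeps each match inside its own pair of angle brackets, and [regex]::Matches returns every occurrence instead of only the first. The variable names $addresses and $delimited are made up for this illustration.

```powershell
$testString = '<blah@gmail.com><blah1@gmail.com>'

# [^>]* cannot run past a closing '>', so each address is matched on its own
$addresses = [regex]::Matches($testString, '(?<=<)[^>]*(?=>)') |
    ForEach-Object { $_.Value }

# Join with the delimiter of your choice (hypothetical variable name)
$delimited = $addresses -join ','

# Splitting later gives back the individual addresses
$delimited.Split(',')
```

If the goal is only to loop over the addresses, $addresses is already a collection and the join/split round trip can be skipped.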

I know this isn't the only way to handle the problem above, and definitely not the most efficient -- but I ended up doing the following.
So to restate the question: I need to parse email headers (the To line) for all the SMTP addresses (the values between '<' and '>') and store all the addresses in a collection after the fact.
$EMLToCol = @()
$parseMe = $CDOMessage.to
# select just '<emailAddress>'
$parsed = Select-String -Pattern '(<.*?>)+' -InputObject $parseMe -AllMatches | ForEach-Object { $_.matches }
# remove this guy '<', and this guy '>'
$parsed = $parsed.Value | ForEach-Object {$_ -replace '<' -replace '>'}
# add to EMLToCol array
$parsed | ForEach-Object {$EMLToCol += $_}

Related

Select-String: match a string only if it isn't preceded by a specific character

I have a list of files that contain either of the two strings:
"stuff" or ";stuff"
I'm trying to write a PowerShell Script that will return only the files that contain "stuff". The script below currently returns all the files because obviously "stuff" is a substring of ";stuff"
For the life of me, I cannot figure out how to match only the files that contain "stuff" without a preceding ;
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern "stuff" -SimpleMatch $_ }
Note: C:\temp\list\list.txt contains a list of file paths that are each passed to Select-String.
Thanks for the help.
You cannot perform the desired matching with literal substring searches (-SimpleMatch).
Instead, use a regex with a negative look-behind assertion ((?<!..)) to rule out stuff substrings preceded by a ; char.: (?<!;)stuff
Applied to your command:
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }
Regex pitfalls:
It is tempting to use [^;]stuff instead, using a negated (^) character set ([...]) (see this answer); however, this will not work as expected if stuff appears at the very start of a line, because a character set - whether negated or not - only matches an actual character, not the start-of-the-line position.
It is then tempting to apply ? to the negated character set (for an optional match - 0 or 1 occurrence): [^;]?stuff. However, that would match a string containing ;stuff again, given that stuff is technically preceded by a "0-repeat occurrence" of the negated character set; thus, ';stuff' -match '[^;]?stuff' yields $true.
Only a look-behind assertion works properly in this case - see regular-expressions.info.
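The pitfalls above can be checked directly with the -match operator. This is a small sketch; the sample strings and the property names in the output objects are made up for the comparison.

```powershell
# Sample inputs (hypothetical): plain, commented-out, mid-line, and mixed
$samples = 'stuff', ';stuff', 'x stuff', 'a;stuff and stuff'

$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Input       = $s
        NegatedSet  = $s -match '[^;]stuff'     # needs a real character before 'stuff'
        OptionalSet = $s -match '[^;]?stuff'    # the set may match zero characters
        LookBehind  = $s -match '(?<!;)stuff'   # asserts "no ; immediately before"
    }
}

$results | Format-Table -AutoSize
```

For the input 'stuff' the negated set reports no match, and for ';stuff' the optional set reports a match; only the look-behind column is correct for both.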
To complement @mklement0's answer, I suggest an alternative approach to make your code easier to read and understand:
#requires -Version 4
@(Get-Content -Path 'C:\Temp\list\list.txt').
ForEach([IO.FileInfo]).
Where({ $PSItem | Select-String -Pattern '(?<!;)stuff' -Quiet })
This will turn your strings into objects (System.IO.FileInfo) and uses the array methods ForEach and Where for brevity. Further, this allows you to pipe the paths as objects, which will be accepted by the -Path parameter of Select-String, to make it more understandable (I find long lists of parameters difficult to read).
The example code posted won't actually run, as it will look at each line as the -Path value.
What you need is to get the content, select the string you're after, then filter the results with Where-Object:
Get-Content "C:\temp\list\list.txt" | Select-String -Pattern "stuff" | Where-Object {$_ -notmatch ";stuff"}
You could create a more complex regex if needed, but that depends on what the data in your files looks like.

How to use regex to remove everything except certain "key"/"character containing"

Running my code gives me this output in a txt file:
19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at ASSOS-032DEEE8EB423.local.:80 (interface 1)
So I just want to parse out string "ASSOS-032DEEE8EB423.local" and remove everything else from the txt file. I can't figure out how to use regex to do so to remove everything except string containing ASSOS-. So the thing is that the string will always contain ASSOS- but the rest is always changing to different numbers. So I'm trying to always be able to get ASSOS-XXXXXXXXXXX.local
This is what I'm trying:
$string = 'Get-Content C:\MyFile.Txt'
$pattern = ''
$string -replace $pattern, ' '
It's just that I don't know so much about regex and how to write it to parse out string containing "ASSOS-" and remove everything after ASSOS-XXXXXXXXXXX.local
I would pipe the file content to Select-String and return the values of matches for a string starting with "ASSOS-", ending with "local" and having whatever non-whitespace characters in between:
Get-Content test.txt | Select-String -Pattern "ASSOS-\S*local" | ForEach-Object {$_.Matches.Value}
A possible solution:
$str = "19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at **ASSOS-032DEEE8EB423.local**.:80 (interface 1)"
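$str -replace '(?s).*\*\*(.*?)\*\*.*', '$1'
The RegEx .*\*\*(.*?)\*\*.* captures all characters within **...**. The * have to be escaped by a \ to make it work. The (?s) prefix lets . match the line break as well, so the first line of the two-line string is removed too.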
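If the ** markers are not actually present in the file, a sketch that pulls the host name out directly may be simpler. It assumes the name always starts with ASSOS-, contains no whitespace, and ends at the first .local; the variable name $hostName is made up for the example.

```powershell
# The two-line sample output from the question, without any ** markers
$str = @'
19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at ASSOS-032DEEE8EB423.local.:80 (interface 1)
'@

# \S+? is lazy, so the match stops at the first '.local' after 'ASSOS-'
$hostName = [regex]::Match($str, 'ASSOS-\S+?\.local').Value
$hostName
```

Because the first line contains ASSOS\ rather than ASSOS-, only the name on the second line is matched.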

match regex and replace bug with special characters

I've built a script to read all Active Directory Group Memberships and save them to a file.
Problem is, the Get-ADPrincipalGroupMembership cmdlet outputs all groups like this:
CN=Group_Name,OU=Example Mail,OU=Example Management, DC=domain,DC=de
So I need to do a bit of a regex and/or replacement magic here to replace the whole line with just the first string beginning from "CN=" to the first ",".
The result would be like this:
Group_Name
So, there is one AD group that's not gonna be replaced. I already got an idea why tho, but I don't know how to work around this. In our AD there is a group with a special character, something like this:
CN=AD_Group_Name+up,OU=Example Mail,OU=Example Management, DC=domain,DC=de
So, because of the little "+" sign, the whole line doesn't even get touched.
Does anyone know why this is happening?
Import-Module ActiveDirectory
# Get Username
Write-Host "Please enter the Username you want to export the AD-Groups from."
$UserName = Read-Host "Username"
# Set Working-Dir and Output-File Block:
$WorkingDir = "C:\Users\USER\Desktop"
Write-Host "Working directory is set to $WorkingDir"
$OutputFile = $WorkingDir + "\" + $UserName + ".txt"
# Save Results to File
Get-ADPrincipalGroupMembership $UserName |
select -Property distinguishedName |
Out-File $OutputFile -Encoding UTF8
# RegEx-Block to find every AD-Group in Raw Output File and delete all
# unnecessary information:
[regex]$RegEx_mark_whole_Line = "^.*"
# The ^ matches the start of the input (in .NET, the start of each line only
# with the Multiline option) and .* will match zero or more characters other than a newline
[regex]$RegEx_mark_ADGroup_Name = "(?<=CN=).*?(?=,)"
# This regex matches everything behind the first "CN=" in line and stops at
# the first "," in the line. Then it should jump to the next line.
# Replace-Block (line by line): Replace whole line with just the AD group
# name (distinguishedName) of this line.
foreach ($line in Get-Content $OutputFile) {
    if ($line -like "CN=*") {
        $separator = "CN=",","
        $option = [System.StringSplitOptions]::RemoveEmptyEntries
        $ADGroup = $line.Split($separator, $option)
        (Get-Content $OutputFile) -replace $line, $ADGroup[0] |
            Set-Content $OutputFile -Encoding UTF8
    }
}
Your group name contains a character (+) that has a special meaning in a regular expression (one or more times the preceding expression). To disable special characters escape the search string in your replace operation:
... -replace [regex]::Escape($line), $ADGroup[0]
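A short sketch of the difference, using the sample distinguished name from the question. The variable names $rawMatch, $escapedMatch and $replaced are made up for the illustration.

```powershell
$line = 'CN=AD_Group_Name+up,OU=Example Mail,OU=Example Management, DC=domain,DC=de'

# Used as a pattern, 'e+' means "one or more e", so the literal '+' is never matched
$rawMatch = $line -match $line

# After escaping, every character of the line is matched literally
$escapedMatch = $line -match [regex]::Escape($line)

# The replacement from the question now touches the line
$replaced = $line -replace [regex]::Escape($line), 'AD_Group_Name+up'

$rawMatch, $escapedMatch, $replaced
```

The first comparison is false and the second is true, which is why the unescaped -replace left that one line untouched.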
However, I fail to see what you need that replacement for in the first place. Basically you're replacing a line in the output file with a substring from that line that you already extracted before. Just write that substring to the output file and you're done.
$separator = 'CN=', ','
$option = [StringSplitOptions]::RemoveEmptyEntries
(Get-Content $OutputFile) | ForEach-Object {
    $_.Split($separator, $option)[0]
} | Set-Content $OutputFile
Better yet, use the Get-ADObject cmdlet to expand the names of the group members:
Get-ADPrincipalGroupMembership $UserName |
Get-ADObject |
Select-Object -Expand Name
First off, depending on what you're doing here this might or might not be a good idea. The CN is /not/ immutable so if you're storing it somewhere as a key you're likely to run into problems down the road. The objectGUID property of the group is a good primary key, though.
As far as getting this value, I think you can simplify this a lot. The name property that the cmdlet outputs will always have your desired value:
Get-ADPrincipalGroupMembership <username> | select name
Ansgar's answer is much better in terms of using the regex, but I believe that in this case you could do a dirty workaround with the IndexOf function. In your if-statement you could do the following:
if ($line -like "CN=*") {
    $ADGroup = $line.Substring(3, $line.IndexOf(',')-3)
}
The reason this works here is that you know the output will begin with CN=YourGroupName meaning that you know that the string you want begins at the 4th character. Secondly, you know that the group name will not contain any comma, meaning that the IndexOf(',') will always find the end of that string so you don't need to worry about the nth occurrence of a string in a string.
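The index arithmetic can be spelled out as follows. This is only a sketch under the same assumptions as above: the line starts with CN= and the name contains no comma (a comma inside a CN would appear escaped as \, in a distinguished name and would cut the result short). The names $start and $end are made up for readability.

```powershell
$line = 'CN=AD_Group_Name+up,OU=Example Mail,OU=Example Management, DC=domain,DC=de'

$start = 'CN='.Length          # the name begins right after the 3-character prefix
$end   = $line.IndexOf(',')    # position of the first comma

if ($line -like 'CN=*' -and $end -gt $start) {
    # Length of the name is the distance between prefix end and first comma
    $ADGroup = $line.Substring($start, $end - $start)
}

$ADGroup
```

No regex is involved here, so the + in the group name needs no special handling.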

How to extract sub-pattern from regex match

When I run the regex match command below, either:
'abc123' -match '(\d+)|(\w+)|(abc123)|(25)'
or
[regex]::matches('abc123', '(\d+)|(\w+)|(abc123)|(25)')
Is there a way for me to extract the matching sub-pattern? In this case it would be the third capture block: 'abc123'
As far as I'm aware, you can't get the exact part of the regex that matched your string. If you build the regex programmatically from a list of sub-patterns, though, you can easily automate it.
$ToMatch = 'abc123FOO'
$PossibleMatches = @('\d+','\w+','abc123.+','25')
$JoinOn = ')|('
$Regex = "($($PossibleMatches -join $JoinOn))"
$CaughtGroup = [Regex]::Matches($ToMatch,$Regex).Groups | ? {$_.Success -and $_.Name -ne '0'}
$CaughtIndex = [int]$CaughtGroup.Name
# Capture groups are numbered from 1, while the array is indexed from 0
$CaughtMatch = $PossibleMatches[$CaughtIndex - 1]
"Matched Group $($CaughtIndex) '$($CaughtMatch)'"
will give you
Matched Group 2 '\w+'
(\w+ wins because the alternatives are tried left to right and it is the first one that matches at the start of the string.)
If this isn't OK for you (e.g. your regexes vary wildly), you might want to break up the program flow and try matching against an array of possible patterns first.
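Trying each pattern on its own could look like the sketch below. It anchors every pattern with ^(?:...)$ so that only patterns covering the whole input count, which is an assumption about what "matching" should mean here; $fullMatches is a made-up name.

```powershell
$ToMatch = 'abc123'
$PossibleMatches = @('\d+', '\w+', 'abc123', '25')

# Test the patterns one by one and keep those that match the entire input
$fullMatches = foreach ($pattern in $PossibleMatches) {
    if ($ToMatch -match "^(?:$pattern)$") { $pattern }
}

$fullMatches
```

Unlike the single alternation, this reports every sub-pattern that fits, here both \w+ and abc123, so the caller can decide which one to prefer.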

Regex for multiple app versions

I'm trying to get a list of versions from my custom attribute in a PowerShell script. The attribute looks like this:
[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]
I ended up with a regex like this, but it doesn't work at all:
'\(\"([^\",?]*)\"+\)'
You should do this as a two-step process: first you parse out the CompatibleVersions attribute, and then you split out the version numbers. Otherwise you will have difficulty finding the version numbers individually without also picking up other version-like numbers.
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
$versions = ($s | Select-String -Pattern 'CompatibleVersions\(([^)]+)\)' | % { $_.Matches }).Groups[1].Value
$versions.Split(',') | % { $_.Trim('"') } | Write-Host
# 1.7.1.0
# 1.7.1.1
# 1.2.2.3
Start by grabbing the parentheses pair and everything inside:
$string = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
if($string -match '\(([^)]+)\)'){
    # Remove the parentheses themselves, split by comma and then trim the "
    $versionList = $Matches[0].Trim("()") -split ',' |ForEach-Object Trim '"'
}
You may use
$s | select-string -pattern "\d+(?:\.\d+)+" -AllMatches | Foreach {$_.Matches} | ForEach-Object {$_.Value}
The \d+(?:\.\d+)+ pattern will match:
\d+ - 1 or more digits
(?:\.\d+)+ - 1 or more sequences of a . and 1+ digits.
See the regex demo on RegexStorm.
'"([.\d]+)"' will match any substring composed of dots and digits (\d) and enclosed in double quotes (")
Try it here
A number between the dots can be 0, but cannot be 00, 01 or similar.
Pay attention to the starting [
This is a regex for the check:
^\[assembly: CompatibleVersions\("(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*\)]$
Here is the regex with tests.
But if you are reading a list, you should use instead:
^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$
With it you will extract the ...","..." sequence inside the parentheses (the capture group starts after the first " and ends after the last one).
After that, split the resulting string by '","' into a list and remove any leftover " from the first and the last element. Now you have a list of valid version strings.
Alas, a regex cannot create a list without a split() function.
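In PowerShell, the validate-then-split steps described above might be sketched like this. The pattern is the capturing regex from this answer, unchanged; $s, $pattern and $versions are names chosen for the example.

```powershell
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'

# Strict pattern from the answer: four dot-separated numbers without leading zeros
$pattern = '^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$'

$versions = @()
if ($s -match $pattern) {
    # $Matches[1] holds the quoted list; split on "," and trim leftover quotes
    $versions = ($Matches[1] -split '","') | ForEach-Object { $_.Trim('"') }
}

$versions
```

If the line does not match the strict pattern (for example because of a component such as 01), $versions stays empty instead of returning partial results.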