PowerShell Select Query With Regex Pattern as sub-expression - regex

I'm trying to do a semi-one liner to replace the contents of a partial AD Distinguished Name column with the users actual name.
IE:
Pattern in $_.identity.DistinguishedName
CN=Touchdown§3939303030313134393535383932,CN=ExchangeActiveSyncDevices,CN=Guy\, Some,OU=Employees,OU=Departments and Categories,DC=something,DC=com
One Liner That doesn't work
$devices | Select #{N="Name";E={ (Get-AdUser -Identity ($_.Identity.DistinguishedName -match ".*\,CN=ExchangeActiveSyncDevices\,(.*)" | Out-Null; $Matches[1])).Name }}
This alone works....
$devices[0].Identity.DistinguishedName -match ".*\,CN=ExchangeActiveSyncDevices\,(.*)" | Out-Null; $Matches[1]
And Displays...
CN=Guy\, Some,OU=Employees,OU=Departments and Categories,DC=something,DC=com
This also works, which is similar to what i'm trying to achieve, but doesn't allow me to take the DistinguishedName and go lookup the actual name.
$devices | Select #{N="Name";E={ $_.Identity.DistinguishedName -match ".*\,CN=ExchangeActiveSyncDevices\,(.*)" | Out-Null; $Matches[1] }}
As soon as you try to do this it breaks down because i'm assuming your not allowed to use a ; to break to the next command when feeding that identity parameter in Get-ADUser.
Get-ADUser -Identity ($devices[0].Identity.DistinguishedName -match ".*\,CN=ExchangeActiveSyncDevices\,(.*)" | Out-Null; $Matches[1])
How would one accomplish this by using a select expression without having to populate a whole new separate variable and replace the original contents with the modified contents?

I don't understand why you're piping to Out-Null. If I understand your question, you're looking for a way to extract substrings from a regular expression? You can use Select-String; for example:
"Hello, world" | Select-String '^\S+, (\S+)' | ForEach-Object {
$_.Matches[0].Groups[1].Value
}
# outputs "world"
Bill

Related

PowerShell Replace Syntax: Regex Substitution and Environmental Variable

I am trying to substitute both regex and an environment variable and can't find the correct syntax (because of the mismatch of single and double quotes). The short script I am developing will rename files. Here is what my setup looks like a few of the ways I tried.
# Original File Name: (BRP-01-001-06K48b-SC-CC-01).tif
# Desired File Name: (BRP-12-345-06K48b-SC-CC-01).tif
# Variables defined by user:
PS ..\user> $pic,$change="01-001","12-345"
# The problem is with the "-replace" near the end of the command
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", '$1$change$3'); echo $new}
PS ..\user> (BRP-$change-06K48b-SC-CC-01).tif
# Also tried:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1$change`$3"); echo $new}
PS ..\user> $112-345-06K48b-SC-CC-01).tif
# If I put a space before $change:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1 $change`$3"); echo $new}
PS ..\user> (BRP- 12-345-06K48b-SC-CC-01).tif
In the last example it "works" if I add space before $change ... but I do not want the space. I realize I could do another replace operation to fix the space but I would like to do this all in one command if possible.
What syntax do I need to replace using both environment variables and regex substitutions?
Also, out of curiosity, once working, will this replace all occurrences within a file name or just the first. For instance, will the file:
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"
change to:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-12-345-06K48b-SC-OR-01)"
or only the first match, like:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"
Best practice is surrounding your capture group name in {} or using named capture groups within your substitution string. Using {} with your second example, should work out nicely.
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)" -replace "(\(BRP-)($pic)(-.{15}\))", "`${1}$change`${3}"
When PowerShell variables, capture groups, and string literals are in the replacement string, you can't use surrounding single quotes. Using surrounding double quotes allows inner expansion to happen. As a result, you will need to backtick escape $ used to identify capture groups.
Your second example has the proper syntax, typically, but because $change begins with digits, it creates unintended consequences. You are escaping $ in the substitution string to use capture groups 1 and 3. Since $change evaluates to 12-345, the intended capture group 1 is actually capture group 112, which doesn't exist. See below for an illustration of your second attempt:
"(\(BRP-)($pic)(-.{15}\))":
Capture Group 1: (BRP-
Capture Group 2: 01-001
Capture Group 3: -06K48b-SC-CC-01)
"`$1$change`$3" at runtime becomes $112-345$3 and then becomes $112-345-06K48b-SC-CC-01). Notice that $112 has been interpolated before the capture groups are substituted. Then capture group 112 is checked. Since it does not exist, $112 is just assumed to be a string.
The below might be what you are after,
$pic = "01-001"
$change = "12-345"
$RenameFiles_FilterStr = "*BRP-$pic*.tif"
gci $RenameFiles_FilterStr -recurse | % { $_.BaseName -replace $pic,$change }
# The above returns renamed strings (files not renamed yet). If the expected result matches the returned ones, then uncomment the below and run to rename the files
# gci $RenameFiles_FilterStr -recurse | % { Rename-Item -NewName ($_.BaseName -replace $pic,$change) }

Keep first regex match and discard others

Yep another regex question... I am using PowerShell to extract a simple number from a filename when looping through a folder like so:
# sample string "ABCD - (123) Sample Text Here"
Get-ChildItem $processingFolder -filter *.xls | Where-Object {
$name = $_.Name
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name) | { $_.Groups[1].Value }
}
All I am looking for is the number surrounded by brackets. This is successful, but it appears the $_.Name actually grabs more than just the name of the file, and the regex ends up picking up some other bits I don't want.
I understand why, as it's going through each regex match as an object and taking the value out of each and putting in $metric. I need some help editing the code so it only bothers with the first object.
I would just use -match etc if I wasn't bothered with the actual contents of the match, but it needs to be kept.
I don't see a cmdlet call before $_.Groups[1].Value which should be ForEach-Object but that is a minor thing. We need to make a small improvement on your regex pattern as well to account for the brackets but not include them in the return.
$processingFolder = "C:\temp"
$pattern = '\((\d+)\)'
Get-ChildItem $processingFolder -filter "*.xls" | ForEach-Object{
$details = ""
if($_.Name -match $pattern){$details = $matches[1]}
$_ | Add-Member -MemberType NoteProperty -Name Details -Value $details -PassThru
} | select name, details
This will loop all the files and try and match numbers in brackets. If there is more than one match it should only take the first one. We use a capture group in order to ignore the brackets in the results. Next we use Add-Member to make a new property called Details which will contain the matched value.
Currently this will return all files in the $processingFolder but a simple Where-Object{$_.Details} would return just the ones that have the property populated. If you have other properties that you need to make you can chain the Add-Members together. Just don't forget the -passthru.
You could also just make your own new object if you need to go that route with multiple custom parameters. It certainly would be more terse. That last question I answered has an example of that.
After doing some research in to the data being returned itself (System.Text.RegularExpressions.MatchCollection) I found the Item method, so called that on $metric like so:
$name = '(111) 123 456 789 Name of Report Here 123'
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name)
Write-Host $metric.Item(1)
Whilst probably not the best approach, it returns what I'm expecting for now.

Retain carriage returns in text filtered through a regular expression

I need to search though a folder of logs and retrieve the most recent logs. Then I need to filter each log, pull out the relevant information and save to another file.
The problem is the regular expression I use to filter the log is dropping the carriage return and the line feed so the new file just contains a jumble of text.
$Reg = "(?ms)\*{6}\sBEGIN(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
get-childitem "logfolder" -filter *.log |
where-object {$_.LastAccessTime -gt [datetime]$Test.StartTime} |
foreach {
$a=get-content $_;
[regex]::matches($a,$reg) | foreach {$_.groups[0].value > "MyOutFile"}
}
Log structure:
******* BEGIN MESSAGE *******
<Info line 1>
Date 18.03.2010 15:07:37 18.03.2010
<Info line 2>
File Number: 00000003
<Info line 3>
*Variable number of lines*
******* END MESSAGE *******
Basically capture everything between the BEGIN and END where the dates and file numbers are a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success with using Select-String on a multiline record.
As #Matt pointed out, you need to read the entire file as a single string if you want to do multiline matches. Otherwise your (multiline) regular expression would be applied to single lines one after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'C:\path\to\file.txt') -join "`r`n"
Get-Content 'C:\path\to\file.txt' | Out-String
Get-Content 'C:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('C:\path\to\file.txt')
Also, I'd modify the regular expression a little. Most of the time log messages may vary in length, so matching fixed lengths may fail if the log message changes. It's better to match on invariant parts of the string and leave the rest as variable length matches. And personally I find it a lot easier to do this kind of content extraction in several steps (makes for simpler regular expressions). In your case I would first separate the log entries from each other, and then filter the content:
$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = "(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}"
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"
Get-ChildItem 'C:\log\folder' -Filter '*.log' | ? {
$_.LastAccessTime -gt [DateTime]$Test.StartTime
} | % {
Get-Content $_.FullName -Raw |
Select-String -Pattern $re1 -AllMatches |
select -Expand Matches |
% {
$_.Groups[1].Value |
Select-String -Pattern $re2 |
select -Expand Matches |
select -Expand Groups |
select -Expand Value
}
} | Set-Content 'C:\path\to\output.txt'
BTW, don't use the redirection operator (>) inside a loop. It would overwrite the output file's content with each iteration. If you must write to a file inside a loop use the append redirection operator instead (>>). However, performance-wise it's usually better to put writing to output files at the end of the pipeline (see above).
Wanted to see if I could make that regex better but for now if you are using those regex modes you should be reading your text file in as a single string which helps a lot.
$a=get-content $_ -Raw
or if you don't have PowerShell 3.0
$a=(get-content $_) -join "`r`n"
I had to solve the problem of disappearing newlines in a completely different context. What you get when you do a get-content of a text file is an array of records, where each record is a line of text.
The only way I found to put the newline back in after some transformation was to use the automatic variable $OFS (output field separator). The default value is space, but if you set it to carriage return line feed, then you get separate records on separate lines.
So try this (it might work):
$OFS = "`r`n"

grep string between two other strings as delimiters

I have to do a report on how many times a certain CSS class appears in the content of our pages (over 10k pages). The trouble is, the header and footer contains that class, so a grep returns every single page.
So, how do I grep for content?
EDIT: I am looking for if a page has list-unstyled between <main> and </main>
So do I use a regular expression for that grep? or do I need to use PowerShell to have more functionality?
I have grep at my disposal and PowerShell, but I could use a portable software if that is my only option.
Ideally, I would get a report (.txt or .csv) with pages and line numbers where the class shows up, but just a list of the pages themselves would suffice.
EDIT: Progress
I now have this in PowerShell
$files = get-childitem -recurse -path w:\test\york\ -Filter *.html
foreach ($file in $files)
{
$htmlfile=[System.IO.File]::ReadAllText($file.fullName)
$regex="(?m)<main([\w\W]*)</main>"
if ($htmlfile -match $regex) {
$middle=$matches[1]
[regex]::Matches($middle,"list-unstyled")
Write-Host $file.fullName has matches in the middle:
}
}
Which I run with this command .\FindStr.ps1 | Export-csv C:\Tools\text.csv
it outputs the filename and path with string in the console, put does not add anything to the CSV. How can I get that added in?
What Ansgar Wiechers' answer says is good advice. Don't string search html files. I don't have a problem with it but it is worth noting that not all html files are the same and regex searches can produce flawed results. If tools exists that are aware of the file content structure you should use them.
I would like to take a simple approach that reports all files that have enough occurrences of the text list-unstyled in all html files in a given directory. You expect there to be 2? So if more than that show up then there is enough. I would have done a more complicated regex solution but since you want the line number as well I came up with this compromise.
$pattern = "list-unstyled"
Get-ChildItem C:\temp -Recurse -Filter *.html |
Select-String $pattern |
Group-Object Path |
Where-Object{$_.Count -gt 2} |
ForEach-Object{
$props = #{
File = $_.Group | Select-Object -First 1 -ExpandProperty Path
PatternFound = ($_.Group | Select-Object -ExpandProperty LineNumber) -join ";"
}
New-Object -TypeName PSCustomObject -Property $props
}
Select-String is a grep like tool that can search files for string. It reports the located line number in the file which I why we are using it here.
You should get output that looks like this on your PowerShell console.
File PatternFound
---- ------------
C:\temp\content.html 4;11;54
Where 4,11,54 is the lines where the text was found. The code filters out results where the count of lines is less than 3. So if you expect it once in the header and footer those results should be excluded.
You can create a regexp that will be suitable for multiline match. The regexp "(?m)<!-- main content -->([\w\W]*)<!-- end content -->" matches a multiline content delimited by your comments, with (?m) part meaning that this regexp has multiline option enabled. The group ([\w\W]*) matches everything between your comments, and also enables you to query $matches[1] which will contain your "main text" without headers and footers.
$htmlfile=[System.IO.File]::ReadAllText($fileToGrep)
$regex="(?m)<!-- main content -->([\w\W]*)<!-- end content -->"
if ($htmlfile -match $regex) {
$middle=$matches[1]
[regex]::Matches($middle,"list-unstyled")
}
This is only an example of how should you parse the file. You populate $fileToGrep with a file name which you desire to parse, then run this snippet to receive a string that contains all the list-unstyled strings in the middle of that file.
Don't use string matches for something like this. Analyze the DOM instead. That should allow you to exclude headers and footers by selecting the appropriate root element.
$ie = New-Object -COM 'InternetExplorer.Application'
$url = '...'
$classname = 'list-unstyled'
$ie.Navigate($url)
do { Start-Sleep -Milliseconds 100 } until ($ie.ReadyState -eq 4)
$root = $ie.Document.getElementsById('content-element-id')
$hits = $root.getElementsByTagName('*') | ? { $_.ClassName -eq $classname }
$hits.Count # number of occurrences of $classname below content element

Using powershell, in a csv doc, need to iterate and insert a character

So my csv file looks something like:
J|T|W
J|T|W
J|T|W
I'd like to iterate through, most likely using a regex so that after the two pipes and content \|.+{2}, and insert a tab character `t.
I'm assuming I'd use get-content to loop through, but I'm unsure of where to go from there.
Also...just thought of this, it is possible that the line will overrun to the next line, and therefore the two pipes will be on different lines, which I'm pretty sure makes a difference.
-Thanks
Ok, I'll move the comment discussion to an answer since it seems like it is a potentially valid solution:
Import-csv .\test.csv -Delimiter '|' -Header 'One', 'two', 'three' | %{$_.Three = "`t$($_.Three)"; $_} | Export-CSV .\test_result.cs
This works for a file that is known to have 3 fields. For a more generic solution, if you have the ability to determine the number of fields initially being exported to CSV, then:
Import-csv .\test.csv -Delimiter '|' -Header (1..$fieldCount) | %{$_.$fieldCount = "`t$($_.$fieldCount)"; $_} | Export-CSV .\test_result.cs
In PowerShell you can use the -replace operator with a regex e.g.:
$c = Get-Content foo.csv | Foreach {$_ -replace '<regex_here>','new_string'}
$c | Out-File foo.csv -encoding ascii
Note that in new_string you can refer to capture groups using $1 but you'll want to put that string in single quotes so PowerShell won't try to interpret $1 as a variable reference.