Multiline regex to match config block - regex

I am having some issues trying to match a certain config block (multiple ones) from a file. Below is the block that I'm trying to extract from the config file:
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
There are multiple ones just like this, each with a different MAC address. How do I match a config block across multiple lines?

The first problem you may run into is that in order to match across multiple lines, you need to process the file's contents as a single string rather than by individual line. For example, if you use Get-Content to read the contents of the file then by default it will give you an array of strings - one element for each line. To match across lines you want the file in a single string (and hope the file isn't too huge). You can do this like so:
$fileContent = [io.file]::ReadAllText("C:\file.txt")
Or in PowerShell 3.0 or later you can use Get-Content with the -Raw parameter:
$fileContent = Get-Content c:\file.txt -Raw
Then you need to specify a regex option to match across line terminators i.e.
SingleLine mode (. matches any char including line feed), as well as
Multiline mode (^ and $ match embedded line terminators), e.g.
(?smi) - note the "i" is to ignore case
e.g.:
C:\> $fileContent | Select-String '(?smi)([0-9a-f]{2}(-|\s*$)){6}.*?!' -AllMatches |
Foreach {$_.Matches} | Foreach {$_.Value}
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
Use the Select-String cmdlet to do the search because you can specify -AllMatches and it will output all matches whereas the -match operator stops after the first match. Makes sense because it is a Boolean operator that just needs to determine if there is a match.
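As a small sketch of that difference, the snippet below runs both approaches over a made-up two-block string (the sample data and the variable names $sample, $firstOnly and $all are assumptions for illustration, not part of the original file):

```powershell
# Hypothetical sample data: two config blocks joined into one string.
$sample = @(
    'ap71xx 00-01-23-45-67-89'
    ' use profile PROFILE'
    '!'
    'ap71xx 00-01-23-45-67-8A'
    ' use profile PROFILE'
    '!'
) -join "`n"

# -match only answers "is there a match?"; $Matches holds the first one.
$null = $sample -match '(?smi)ap71xx.*?!'
$firstOnly = $Matches[0]

# Select-String -AllMatches returns every match found in the single input string.
$all = @($sample |
    Select-String '(?smi)ap71xx.*?!' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value })

$all.Count   # 2
```

With a real file, $sample would come from Get-Content -Raw or ReadAllText as described above.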

In case this may still be of value to someone and depending on the actual requirement, the regex in Keith's answer doesn't need to be that complicated. If the user simply wants to output each block the following will suffice:
$fileContent = [io.file]::ReadAllText("c:\file.txt")
$fileContent |
Select-String '(?smi)ap71xx[^!]+!' -AllMatches |
%{ $_.Matches } |
%{ $_.Value }
The regex ap71xx[^!]+! will perform better, and .* is best avoided in a regular expression where a narrower pattern will do, because it can produce unexpected results. The pattern [^!]+! matches one or more characters other than an exclamation mark, followed by an exclamation mark.
If the start of the block isn't required in the output, the updated script is:
$fileContent |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
Groups[0] contains the whole matched string, Groups[1] will contain the string match within the parentheses in the regex.
If $fileContent isn't required for any further processing, the variable can be eliminated:
[io.file]::ReadAllText("c:\file.txt") |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
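
To make the Groups[0] versus Groups[1] distinction concrete, here is a minimal sketch that uses the .NET [regex]::Matches method in place of Select-String, on a made-up two-block string ($text, $found, $wholeBlocks and $bodies are illustrative names only):

```powershell
# Hypothetical input: two blocks in one string, separated by "!" lines.
$text = "ap71xx 00-01-23-45-67-89`n use profile PROFILE`n!`n" +
        "ap71xx 00-01-23-45-67-8A`n use profile OTHER`n!"

# [^!] also matches newlines, so no (?s) option is needed for this pattern.
$found = [regex]::Matches($text, '(?i)ap71xx([^!]+!)')

# Groups[0] is the whole match, including the leading "ap71xx".
$wholeBlocks = @($found | ForEach-Object { $_.Groups[0].Value })

# Groups[1] is only the part inside the parentheses; Trim() drops the leading space.
$bodies = @($found | ForEach-Object { $_.Groups[1].Value.Trim() })

$bodies[0]
```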

This regex will search for the text ap followed by any number of characters and new lines ending with a !:
(?si)(ap).+?\!{1}
So I was a little bored. I wrote a script that will break up the text file as you described (as long as it only contains the lines you displayed). It might work with other random lines, as long as they don't contain the key words: ap, profile, domain, hostname, or area. It will import them, and check line by line for each of the properties (MAC, Profile, domain, hostname, area) and place them into an object that can be used later. I know this isn't what you asked for, but since I spent time working on it, hopefully it can be used for some good. Here is the script if anyone is interested. It will need to be tweaked to your specific needs:
$Lines = Get-Content "c:\test\test.txt"
$varObjs = @()
for ($num = 0; $num -lt $lines.Count; $num =$varLast ) {
#Checks to make sure the line isn't blank or a !. If it is, it skips to next line
if ($Lines[$num] -match "!") {
$varLast++
continue
}
if (([regex]::Match($Lines[$num],"^\s.*$")).success) {
$varLast++
continue
}
$Index = [array]::IndexOf($lines, $lines[$num])
$b=0
$varObj = New-Object System.Object
while ($Lines[$num + $b] -notmatch "!" ) {
#Checks line by line to see what it matches, adds to the $varObj when it finds what it wants.
if ($Lines[$num + $b] -match "ap") { $varObj | Add-Member -MemberType NoteProperty -Name Mac -Value $([regex]::Split($lines[$num + $b],"\s"))[1] }
if ($lines[$num + $b] -match "profile") { $varObj | Add-Member -MemberType NoteProperty -Name Profile -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "domain") { $varObj | Add-Member -MemberType NoteProperty -Name rf-domain -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "hostname") { $varObj | Add-Member -MemberType NoteProperty -Name hostname -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
if ($Lines[$num + $b] -match "area") { $varObj | Add-Member -MemberType NoteProperty -Name area -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
$b ++
} #end While
#Adds the $varObj to $varObjs for future use
$varObjs += $varObj
$varLast = ($b + $Index) + 2
}#End for ($num = 0; $num -lt $lines.Count; $num = $varLast)
#displays the $varObjs
$varObjs

To me, a very clean and simple approach is to use a multiline block regex with named captures, like this:
# Based on this text configuration:
$configurationText = @"
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
"@
# We can build a multiline regex block with the strings to be captured.
# Here, I am using the regex '.*?', which roughly means 'capture anything, as little as possible'.
# A more specific regex can be defined for each field to capture.
# ( ) in the regex defines a group
# ?<> names a group
$regex = @"
(?<userId>.*?) (?<userCode>.*?)
use profile (?<userProfile>.*?)
use rf-domain (?<userDomain>.*?)
hostname (?<hostname>.*?)
area (?<area>.*?)
!
"@
# Let's see if this matches!
if($configurationText -match $regex)
{
# it does !
Write-Host "Config text is successfully matched, here are the matches:"
$Matches
}
else
{
Write-Host "Config text could not be matched."
}
This script outputs the following:
PS C:\Users\xdelecroix> C:\FusionInvest\powershell\regex-capture-multiline-stackoverflow.ps1
Config text is successfully matched, here are the matches:
Name Value
---- -----
hostname ACCESSPOINT
userProfile PROFILE
userCode 00-01-23-45-67-89
area inside
userId ap71xx
userDomain DOMAIN
0 ap71xx 00-01-23-45-67-89...
For more flexibility, you can use Select-String instead of -match, but this is not really important here, in the context of this sample.
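If the file holds several blocks, one possible extension of the same idea is to run a named-group pattern through [regex]::Matches and turn each match into an object. This is only a sketch under assumptions: the sample text, the per-field patterns (\S+ and a hex-and-dash class for the MAC) and the property names are mine, not part of the original answer.

```powershell
# Hypothetical sample: two blocks, joined with LF so line endings are predictable.
$configText = @(
    'ap71xx 00-01-23-45-67-89'
    ' use profile PROFILE'
    ' use rf-domain DOMAIN'
    ' hostname ACCESSPOINT'
    ' area inside'
    '!'
    'ap71xx 00-01-23-45-67-8A'
    ' use profile PROFILE2'
    ' use rf-domain DOMAIN2'
    ' hostname ACCESSPOINT2'
    ' area outside'
    '!'
) -join "`n"

# One named group per field; \r?\n tolerates both LF and CRLF files.
$pattern = '(?m)^(?<model>\S+) (?<mac>[0-9A-Fa-f-]+)\r?\n' +
           '\s*use profile (?<profile>\S+)\r?\n' +
           '\s*use rf-domain (?<domain>\S+)\r?\n' +
           '\s*hostname (?<hostname>\S+)\r?\n' +
           '\s*area (?<area>\S+)\r?\n!'

# Build one object per matched block.
$accessPoints = @(foreach ($m in [regex]::Matches($configText, $pattern)) {
    [PSCustomObject]@{
        Mac      = $m.Groups['mac'].Value
        Profile  = $m.Groups['profile'].Value
        Domain   = $m.Groups['domain'].Value
        Hostname = $m.Groups['hostname'].Value
        Area     = $m.Groups['area'].Value
    }
})

$accessPoints | Format-Table
```

The resulting objects can be piped straight to Export-Csv or filtered with Where-Object.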

Here's my take. If you don't need the regex, you can use -like or .contains(). The question never says what the search pattern is. Here's an example with a windows text file.
$file = (get-content -raw file.txt) -replace "`r" # avoid the line ending issue
$pattern = 'two
three
f.*' -replace "`r"
# just showing what they really are
$file -replace "`r",'\r' -replace "`n",'\n'
$pattern -replace "`r",'\r' -replace "`n",'\n'
$file -match $pattern
$file | select-string $pattern -quiet

Related

Matching list of words but not when found inside longer words

I have a list of keywords (sometimes with non-alphanumeric characters) that I’d like to find in a list of files. I can do that with the code below, but I want to avoid matching keywords if they are found inside another word, e.g.:
Keywords.csv:
Keywords
Lo.rem <-- Match if not prefixed by nor suffixed with a letter
is <-- Same
simply) <-- Match if not prefixed by a letter
printing. <-- Same
(text <-- Match if not suffixed with a letter
-and <-- Same
Files.csv:
Files
C:\AFolder\aFile.txt
C:\AFolder\AnotherFolder\anotherFile.txt
C:\AFolder\anotherFile2.txt
Here's my code so far if useful:
$keywords = (((Import-Csv "C:\Keywords.csv" | Where Keywords).Keywords)-replace '[[+*?()\\.]','\$&') #Import list of keywords to search for
$paths = ((Import-Csv "C:\Files.csv" | Where Files).Files) #Import list of files to look for matching keywords
$count = 0
ForEach ($path in $paths) {
$file = [System.IO.FileInfo]$path
Add-Content -Path "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Value $file.FullName #Create a file in C:\Matches and insert the path of the file being searched
$hash = @{}
Get-Content $file |
Select-String -Pattern $keywords -AllMatches |
Foreach {$_.Matches.Value} |
%{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} | #I don't remember what this does, probably fixes error messages I was getting
Out-File -FilePath "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Append -Encoding UTF8 #Appends keywords that were found to the file created
$count = $count +1
}
I’ve tried playing with regex negative lookahead/lookbehind but did not get anywhere, especially since I’m a beginner in PowerShell, e.g.:
Select-String -Pattern "(?<![A-Za-z])$($keywords)(?![A-Za-z])" -AllMatches
Any suggestions? Much appreciated
This will escape any regex reserved characters in your keyword list, join the keywords with | to specify the OR condition, and then wrap them in parentheses.
"(?<![A-Za-z])($(($keywords|%{[regex]::escape($_)}) -join '|'))(?![A-Za-z])"
That would be consumed as something like this:
"(?<![A-Za-z])(Lo\.rem|is|simply\)|printing\.|\(text|-and)(?![A-Za-z])"

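A quick way to check the escaped, lookaround-wrapped keyword pattern is to run it against a test sentence. In the sketch below the keyword list is taken from the question, while the sentence in $line and the variable names are made up for illustration:

```powershell
# Keywords from the question; the test sentence is hypothetical.
$keywords = 'Lo.rem', 'is', 'simply)', 'printing.', '(text', '-and'

# Escape each keyword, join with | and wrap in letter-boundary lookarounds.
$alternation = ($keywords | ForEach-Object { [regex]::Escape($_) }) -join '|'
$pattern = "(?<![A-Za-z])($alternation)(?![A-Za-z])"

$line = 'This is Lo.rem, a thesis (text -and more'

# "is" inside "This" and "thesis" is rejected by the lookbehind.
$hits = @([regex]::Matches($line, $pattern) | ForEach-Object { $_.Value })

$hits -join ', '   # is, Lo.rem, (text, -and
```

Note that [regex]::Matches is case-sensitive by default, whereas Select-String is not unless -CaseSensitive is given.
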
Powershell - Find and Replace then Save

I need to read 10K+ files, search the files line by line, for the string of characters after the word SUFFIX. Once I capture that string I need to remove all traces of it from the file then re-save the file.
With the example below - I would capture -4541. Then I would replace all occurrences of -4541 with NULL.
Once I replace all the occurrences I then save the changes.
Here is my Data:
ABSDOMN VER 1 D SUFFIX -4541
05 ST-CTY-CDE-FMHA-4541
10 ST-CDE-FMHA-4541 9(2)
10 CTY-CDE-FMHA-4541 9(3)
05 NME-CTY-4541 X(20)
05 LST-UPDTE-DTE-4541 9(06)
05 FILLER X
Here is a starting script. I can Display the line that has the word SUFFIX but I cannot capture the string after it. In this case -4541.
$CBLFileList = Get-ChildItem -Path "C:\IDMS" -File -Recurse
$regex = "\bSUFFIX\b"
$treat = $false
ForEach($CBLFile in $CBLFileList) {
Write-Host "Processing .... $CBLFile" -foregroundcolor green
Get-content -Path $CBLFile.FullName |
ForEach-Object {
if ($_ -match $regex) {
Write-Host "Found Match - $_" -foregroundcolor green
$treat=$true
}
}
}
Try the following:
Note: Be sure to make backup copies of the input files first, as they will be updated in place. Use -Encoding with Set-Content to specify the desired encoding, if it should be different from Set-Content's default.
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '(?<=SUFFIX) -\d+'
ForEach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -cmatch $regex) {
$toRemove = $Matches[0].Trim()
& { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
Set-Content -LiteralPath $CBLFile.FullName
}
}
Based on your feedback, the regex that worked for you in the end was (?<=SUFFIX).*$ (which could be simplified to (?<=SUFFIX).+ in this case), i.e. one that captures whatever follows substring SUFFIX, instead of only capturing a space followed by a - and one or more digits (\d+).
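The capture-then-remove step can be tried without touching any files. This is a minimal sketch on two lines from the sample data; the [regex]::Escape call is an addition of mine to guard against suffixes that contain regex metacharacters, and the variable names are illustrative:

```powershell
# Two lines from the question's sample data.
$firstLine = 'ABSDOMN VER 1 D SUFFIX -4541'
$dataLine  = '05 ST-CTY-CDE-FMHA-4541'

# The lookbehind (?<=SUFFIX) anchors the match without including "SUFFIX" in it.
if ($firstLine -cmatch '(?<=SUFFIX) -\d+') {
    $suffix = $Matches[0].Trim()

    # -creplace with no replacement operand deletes the matched text.
    $cleanedFirst = $firstLine -creplace '(?<=SUFFIX) -\d+'
    $cleanedData  = $dataLine -creplace [regex]::Escape($suffix)
}

$suffix        # -4541
$cleanedData   # 05 ST-CTY-CDE-FMHA
```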

PowerShell to match multiple lines with regex pattern

I wrote a PowerShell script and a regex to search two config text files for Management VLAN entries. For example, each text file has two management VLANs configured as below:
Config1.txt
123 MGMT_123_VLAN
234 MGMT_VLAN_234
Config2.txt
890 MGMT_VLAN_890
125 MGMT_VLAN_USERS
Below is my script. It has several problems.
First, if I ran the script with the $Mgmt_vlan = Select-String -Path $File -Pattern $String -AllMatches then the screen output shows the expected four (4) Mgmt vlan, but in the CSV file output shows as follow
Filename Mgmt_vlan
Config1.txt System.Object[]
Config2.txt System.Object[]
I ran the script the output on the console screen shows exactly four (4) Management vlans that I expected, but in the CSV file it did not. It shows only these vlans
Second, if I ran the script with $Mgmt_vlan = Select-String -Path $File -Pattern $String | Select -First 1
Then the CSV shows as follows:
Filename Mgmt_vlan
Config1.txt 123 MGMT_123_VLAN
Config2.txt 890 MGMT_VLAN_890
The second method Select -First 1 appears to select only the first match in the file. I tried to change it to Select -First 2 and then CSV shows column Mgmt_Vlan as System.Object[].
The result output to the screen shows exactly four(4) Mgmt Vlans as expected.
$folder = "c:\config_folder"
$files = Get-childitem $folder\*.txt
Function find_management_vlan($Text)
{
$Vlan = @()
foreach($file in files) {
Mgmt_Vlan = Select-String -Path $File -Pattern $Text -AllMatches
if($Mgmt_Vlan) # if there is a match
{
$Vlan += New-Object -PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = $Mgmt_vlan}
$Vlan | Select 'Filename', 'Mgmt_vlan' | export-csv C:\documents\Mgmt_vlan.csv
$Mgmt_Vlan # test to see if it shows correct matches on screen and yes it did
}
else
{
$Vlan += New-Object -PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = "Mgmt Vlan Not Found"}
$Vlan | Select 'Filename', 'Mgmt_vlan' | Export-CSV C:\Documents\Mgmt_vlan.csv
}
}
}
find_management_vlan "^\d{1,3}\s.MGMT_"
Regex correction
First of all, there are a lot of mistakes in this code.
So this is probably not code that you actually used.
Secondly, that pattern will not match your strings as posted: "^\d{1,3}\s.MGMT_" matches one to three digits, one whitespace character (equal to [\r\n\t\f\v ]), then any single character (except a line terminator), and then the literal MGMT_. The extra . requires a character between the whitespace and MGMT_, so it is not really what you want. In your case you can use, for example, ^\d{1,3}\sMGMT_, or replace \s with \s+ to allow more than one whitespace character.
Code Correction
Now back to your code... You create array $Vlan, that's ok.
After that, you collect all matching lines (in your case two from every file in your directory) and create a PSObject with two complex properties. One is a FileInfo from System.IO and the second is an array of match results (Select-String returns MatchInfo objects, so more than one match gives you an Object[]). Export-Csv calls .ToString() on every property of the object being processed. If you call .ToString() on an array (i.e. Mgmt_vlan) you get "System.Object[]", as per the default implementation. So you need a collection of "flat" objects if you want to make a CSV from it.
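The effect is easy to reproduce in memory with ConvertTo-Csv, which formats properties the same way as Export-Csv. The sketch below uses made-up row data, and joining the values with '; ' is just one possible way to flatten the property, not something the original code does:

```powershell
# Hypothetical row whose Mgmt_vlan property is an array.
$row = [PSCustomObject]@{
    Filename  = 'Config1.txt'
    Mgmt_vlan = @('123 MGMT_123_VLAN', '234 MGMT_VLAN_234')
}

# The array is rendered through ToString(), so the CSV shows the type name.
$broken = $row | ConvertTo-Csv -NoTypeInformation
$broken[1]   # "Config1.txt","System.Object[]"

# Flatten the array into a single string before converting.
$flat = [PSCustomObject]@{
    Filename  = $row.Filename
    Mgmt_vlan = $row.Mgmt_vlan -join '; '
}
$fixed = $flat | ConvertTo-Csv -NoTypeInformation
$fixed[1]    # "Config1.txt","123 MGMT_123_VLAN; 234 MGMT_VLAN_234"
```

Emitting one row per match is the other common way to keep the objects flat.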
The second big mistake is creating a function with more than one responsibility. In your case the function is responsible for gathering the data and then also for exporting it. That's a big no-no. So repair your code and move the export somewhere else. You can use, for example, something like this (I used Get-Content because I like it more, but you can use whatever you want to get your string collection):
function Get-ManagementVlans($pattern, $files)
{
$Vlans = @()
foreach ($file in $files)
{
# Use a name other than the automatic variable $Matches, which -imatch overwrites
$found = (Get-Content $file.FullName -Encoding UTF8).Where({$_ -imatch $pattern})
if ($found)
{
$Vlans += $found | % { New-Object -TypeName PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = $_.Trim()} }
}
else
{
$Vlans += New-Object -TypeName PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = "Mgmt Vlan Not Found"}
}
}
return $Vlans
}
function Export-ManagementVlans($path, $data)
{
#do something...
$data | Select Filename,Mgmt_vlan | Export-Csv "$path\Mgmt_vlan.csv" -Encoding UTF8 -NoTypeInformation
}
$folder = "C:\temp\soHelp"
$files = dir "$folder\*.txt"
$Vlans = Get-ManagementVlans -pattern "^\d{1,3}\sMGMT_" -files $files
$Vlans
Export-ManagementVlans -path $folder -data $Vlans
Summary
But in my opinion, building something like this is over-engineering for this case. You can easily do it in a one-liner (although you then get no row for a file that contains no match). This is where the power of PowerShell shows:
$pattern = "^\d{1,3}\s+MGMT_"
$path = "C:\temp\soHelp\"
dir $path -Filter *.txt -File | Get-Content -Encoding UTF8 | ? {$_ -imatch $pattern} | select @{l="FileName";e={$_.PSChildName}},@{l="Mgmt_vlan";e={$_}} | Export-Csv -Path "$path\Report.csv" -Encoding UTF8 -NoTypeInformation
or with Select-String:
dir $path -Filter *.txt -File | Select-String -Pattern $pattern -AllMatches | select FileName,@{l="Mgmt_vlan";e={$_.Line}} | Export-Csv -Path "$path\Report.csv" -Encoding UTF8 -NoTypeInformation

Powershell Return the highest 4 Digit Number Found in a String Pattern - Search Word Documents

I am trying to return the highest 4 digit number found in string pattern, in a set of documents.
String Pattern: 3 Letters dash 4 Digits
The word documents contain within them a document identifier code such as below.
Sample Files:
Car Parts.docx > CPW - 2345
CarHandles.docx > CPW - 8723
CarList.docx > CPA - 9083
I have referenced sample code that I am trying to adapt. I am not a VBA or powershell programmer - so I may be wrong in what I am trying to do?
I am happy to look at alternatives - on a Windows platform.
I have referenced this to get me started
http://chris-nullpayload.rhcloud.com/2012/07/find-and-replace-string-in-all-docx-files-recursively/
PowerShell: return the number of instances find in a file for a search pattern
Powershell: return filename with highest number
$list = gci "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
foreach ($foo in $list) {
$objWord = New-Object -ComObject word.application
$objWord.Visible = $False
$objDoc = $objWord.Documents.Open("$foo")
$objSelection = $objWord.Selection
$Pat1 = [regex]'[A-Z]{3}-[0-9]{4}' # Find the regex match 3 letters followed by 4 numbers eg HGW - 1024
$findtext= "$Pat1"
$highestNumber =
# Find the highest occurrence of this pattern found in the documents searched - output to text file or on screen
Sort-Object | # This may also be wrong -I added it for when I find the pattern
Select-Object -Last 1 -ExpandProperty Name
<# The below may not be needed - ?
$ReplaceText = ""
$ReplaceAll = 2
$FindContinue = 1
$MatchFuzzy = $False
$MatchCase = $False
$MatchPhrase = $false
$MatchWholeWord = $True
$MatchWildcards = $True
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False
$objSelection.Find.execute(
$FindText,
$MatchCase,
$MatchWholeWord,
$MatchWildcards,
$MatchSoundsLike,
$MatchAllWordForms,
$Forward,
$Wrap,
$Format,
$ReplaceText,
$ReplaceAll
}
}
#>
I appreciate any advice on how to proceed -
Try this:
# This library is needed to extract zip archives. A .docx is a zip archive
# .NET 4.5 or later is required
Add-Type -AssemblyName System.IO.Compression.FileSystem
# This function gets plain text from a word document
# adapted from http://stackoverflow.com/a/19503654/284111
# It is not ideal, but good enough
function Extract-Text([string]$fileName) {
#Generate random temporary file name for text extraction from .docx
$tempFileName = [Guid]::NewGuid().Guid
#Extract document xml into a variable ($text)
$entry = [System.IO.Compression.ZipFile]::OpenRead($fileName).GetEntry("word/document.xml")
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($entry,$tempFileName)
$text = [System.IO.File]::ReadAllText($tempFileName)
Remove-Item $tempFileName
#Remove actual xml tags and leave the text behind
$text = $text -replace '</w:r></w:p></w:tc><w:tc>', " "
$text = $text -replace '</w:r></w:p>', "`r`n"
$text = $text -replace "<[^>]*>",""
return $text
}
$fileList = Get-ChildItem "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
# Adapted from http://stackoverflow.com/a/36023783/284111
$fileList |
Foreach-Object {[regex]::matches((Extract-Text $_), '(?<=[A-Za-z]{3}\s*(?:-|–)\s*)\d{4}')} |
Select-Object -ExpandProperty captures |
Sort-Object value -Descending |
Select-Object -First 1 -ExpandProperty value
The main idea behind this is not to monkey around the COM api for Word, but instead just try and extract the text information from the document manually.
The way to get the highest number is first isolate it using a regex and then sort and select the first item. Something like this:
[regex]::matches($objSelection, '(?<=[A-Z]{3}\s*-\s*)\d{4}') `
| Select -ExpandProperty captures `
| sort value -Descending `
| Select -First 1 -ExpandProperty value `
| Add-Content outfile.txt
I think the problem you are having with your regex is that your example data contains spaces around the dash in the code, which you haven't allowed for in your pattern.
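The difference is easy to see on plain strings before involving any Word documents. In this sketch the first three codes come from the question, 'CPW-0012' is a made-up code without spaces, and the cast to [int] before sorting is my own choice:

```powershell
# Codes from the question plus one hypothetical code without spaces.
$codes = 'CPW - 2345', 'CPW - 8723', 'CPA - 9083', 'CPW-0012'

# The question's pattern: no whitespace allowed around the dash.
$strict = '[A-Z]{3}-[0-9]{4}'
$strictHits = @($codes | Where-Object { $_ -cmatch $strict })

# The answer's pattern: optional whitespace, digits isolated by a lookbehind.
$loose = '(?<=[A-Z]{3}\s*-\s*)\d{4}'
$highest = $codes |
    ForEach-Object { [regex]::Matches($_, $loose) } |
    ForEach-Object { [int]$_.Value } |
    Sort-Object -Descending |
    Select-Object -First 1

$strictHits   # CPW-0012
$highest      # 9083
```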

Search multiple words using regular expression in powershell

I am new to powershell. I highly appreciate any help you can provide for the below. I have a powershell script but not being able to complete to get all the data fields from the text file.
I have a file 1.txt as below.
I am trying to extract the values of "pid" and "ctl00_lblOurPrice" from the file in the table format below so that I can open it in Excel. Column headings are not important:
pid ctl00_lblOurPrice
0070362408 $6.70
008854787666 $50.70
Currently I am only able to get pid as below. Would like to also get the price for each pid. -->
0070362408
008854787666
c:\scan\1.txt:
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=0070362408'>
This is sentence B1.. This is sentence B2... This is sentence B3...
GFGFGHHGH
HHGHGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$6.70
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=008854787666'>
This is sentence B1.. This is sentence B2... This is sentence B3...
6GBNGH;L
887656HGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$50.70
...
...
Current powershell script:
$files=Get-ChildItem c:\scan -recurse
$output_file = 'c:\output\outdata.txt'
foreach ($file in $files) {
$input_path = $file
$regex = 'num=\d{1,13}'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % {
($_.Value) -replace "num=","" } | Out-File $output_file -Append }
Thanks in advance for your help
I'm going to assume that you either mean pid=\d{1,13} in your code, or that your sample text should have read num= instead of pid=. We will go with the assumption that it is in fact supposed to be pid.
In that case we will turn the entire file into one long string with -Join "", and then split it on "href" to create records for each site to parse against. Then we match for pid= and ending when it comes across a non-numeric character, and then we look for a dollar amount (a $ followed by numbers, followed by a period, and then two more numbers).
When we have a pair of PID/Price matches we can create an object with two properties, PID and Price, and output that. For this I will assign it to an array, to be used later. If you do not have PSv3 or higher you will have to change [PSCustomObject][ordered] into New-Object PSObject -Property but that loses the order of properties, so I like the former better and use it in my example here.
$files=Get-ChildItem C:\scan -recurse
$output_file = 'c:\output\outdata.csv'
$Results = @()
foreach ($file in $files) {
$Results += ((gc $File) -join "") -split "href" |?{$_ -match "pid=(\d+?)[^\d].*?(\$\d*?\.\d{2})"}|%{[PSCustomObject][ordered]@{"PID"=$Matches[1];"Price"=$Matches[2]}}
}
$Results | Select PID,Price | Export-Csv $output_file -NoTypeInformation
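To try the join-split-match idea without any files, the sketch below applies it to a few lines taken from the sample text. The slightly simplified pattern and the single ForEach-Object with an if test are my own variation, and $lines and $records are illustrative names:

```powershell
# A few lines from the sample file, held in memory instead of read with Get-Content.
$lines = @(
    "href='http://example.com/viewdetails.aspx?pid=0070362408'>"
    'GFGFGHHGH'
    'ctl00_lblOurPrice=$6.70'
    "href='http://example.com/viewdetails.aspx?pid=008854787666'>"
    'ctl00_lblOurPrice=$50.70'
)

# Join everything into one string, then split into one record per "href".
$records = @(($lines -join '') -split 'href' | ForEach-Object {
    # Group 1: digits after pid=; group 2: the first dollar amount that follows.
    if ($_ -match 'pid=(\d+)\D.*?(\$\d+\.\d{2})') {
        [PSCustomObject]@{ PID = $Matches[1]; Price = $Matches[2] }
    }
})

$records | Format-Table
```

Single quotes around the pattern avoid any question of variable expansion for the \$ part.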