I am trying to add quote characters around two fields in a file of comma separated lines. Here is one line of data:
1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
which I would like to become this:
1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I began developing my regular expression in a simple PowerShell script, and soon I have the following:
$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew
which gives me this output:
1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
Great! I'm all set. Extend this example to the general case of a file of similar lines of data:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}
This is a listing of test_data.csv:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
This is the output of my script:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
I have also tried this version of the script:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}
and obtained the same results.
My simple test script has convinced me that the regex is correct, but something happens when I use that regex inside a filter script in the Where-Object cmdlet.
What simple, yet critical, detail am I overlooking here?
Here is my PSVerion:
Major Minor Build Revision
----- ----- ----- --------
5 0 10586 117
You're misunderstanding how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does NOT output whatever you do inside that scriptblock (you'd use ForEach-Object for that).
You don't need either Where-Object or ForEach-Object, though. Just put Get-Content in parentheses and use that as the first operand for the -replace operator. You also don't need the 4th capturing group. I would recommend anchoring the expression at the beginning of the string, though.
(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
This seems to work here. I used ForEach-Object to process each record.
Get-Content test_data.csv |
ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
This also seems to work. Uses the ? to create a reluctant (lazy) capture.
Get-Content test_data.csv |
ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
I would just make a small change to what you have in order for this to work. Simply change the script to the following, noting that I changed the -FilterScript to a ForEach-Object and fixed a minor typo that you had on the last item in the regular expression with the quotes:
Get-Content c:\temp\test_data.csv | ForEach-Object {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`"`$4"
}
I tested this with the data you provided and it adds the quotes to the correct columns.
I am having some issues trying to match a certain config block (multiple ones) from a file. Below is the block that I'm trying to extract from the config file:
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
There are multiple ones just like this, each with a different MAC address. How do I match a config block across multiple lines?
The first problem you may run into is that in order to match across multiple lines, you need to process the file's contents as a single string rather than by individual line. For example, if you use Get-Content to read the contents of the file then by default it will give you an array of strings - one element for each line. To match across lines you want the file in a single string (and hope the file isn't too huge). You can do this like so:
$fileContent = [io.file]::ReadAllText("C:\file.txt")
Or in PowerShell 3.0 you can use Get-Content with the -Raw parameter:
$fileContent = Get-Content c:\file.txt -Raw
Then you need to specify a regex option to match across line terminators i.e.
SingleLine mode (. matches any char including line feed), as well as
Multiline mode (^ and $ match embedded line terminators), e.g.
(?smi) - note the "i" is to ignore case
e.g.:
C:\> $fileContent | Select-String '(?smi)([0-9a-f]{2}(-|\s*$)){6}.*?!' -AllMatches |
Foreach {$_.Matches} | Foreach {$_.Value}
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
Use the Select-String cmdlet to do the search because you can specify -AllMatches and it will output all matches whereas the -match operator stops after the first match. Makes sense because it is a Boolean operator that just needs to determine if there is a match.
In case this may still be of value to someone and depending on the actual requirement, the regex in Keith's answer doesn't need to be that complicated. If the user simply wants to output each block the following will suffice:
$fileContent = [io.file]::ReadAllText("c:\file.txt")
$fileContent |
Select-String '(?smi)ap71xx[^!]+!' -AllMatches |
%{ $_.Matches } |
%{ $_.Value }
The regex ap71xx[^!]*! will perform better and the use of .* in a regular expression is not recommended because it can generate unexpected results. The pattern [^!]+! will match any character except the exclamation mark, followed by the exclamation mark.
If the start of the block isn't required in the output, the updated script is:
$fileContent |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
Groups[0] contains the whole matched string, Groups[1] will contain the string match within the parentheses in the regex.
If $fileContent isn't required for any further processing, the variable can be eliminated:
[io.file]::ReadAllText("c:\file.txt") |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
This regex will search for the text ap followed by any number of characters and new lines ending with a !:
(?si)(a).+?\!{1}
So I was a little bored. I wrote a script that will break up the text file as you described (as long as it only contains the lines you displayed). It might work with other random lines, as long as they don't contain the key words: ap, profile, domain, hostname, or area. It will import them, and check line by line for each of the properties (MAC, Profile, domain, hostname, area) and place them into an object that can be used later. I know this isn't what you asked for, but since I spent time working on it, hopefully it can be used for some good. Here is the script if anyone is interested. It will need to be tweaked to your specific needs:
$Lines = Get-Content "c:\test\test.txt"
$varObjs = #()
for ($num = 0; $num -lt $lines.Count; $num =$varLast ) {
#Checks to make sure the line isn't blank or a !. If it is, it skips to next line
if ($Lines[$num] -match "!") {
$varLast++
continue
}
if (([regex]::Match($Lines[$num],"^\s.*$")).success) {
$varLast++
continue
}
$Index = [array]::IndexOf($lines, $lines[$num])
$b=0
$varObj = New-Object System.Object
while ($Lines[$num + $b] -notmatch "!" ) {
#Checks line by line to see what it matches, adds to the $varObj when it finds what it wants.
if ($Lines[$num + $b] -match "ap") { $varObj | Add-Member -MemberType NoteProperty -Name Mac -Value $([regex]::Split($lines[$num + $b],"\s"))[1] }
if ($lines[$num + $b] -match "profile") { $varObj | Add-Member -MemberType NoteProperty -Name Profile -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "domain") { $varObj | Add-Member -MemberType NoteProperty -Name rf-domain -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "hostname") { $varObj | Add-Member -MemberType NoteProperty -Name hostname -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
if ($Lines[$num + $b] -match "area") { $varObj | Add-Member -MemberType NoteProperty -Name area -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
$b ++
} #end While
#Adds the $varObj to $varObjs for future use
$varObjs += $varObj
$varLast = ($b + $Index) + 2
}#End for ($num = 0; $num -lt $lines.Count; $num = $varLast)
#displays the $varObjs
$varObjs
To me, a very clean and simple approach is to use a multiline bloc regex, with named captures, like this:
# Based on this text configuration:
$configurationText = #"
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
"#
# We can build a multiline regex bloc with the strings to be captured.
# Here, i am using the regex '.*?' than roughly means 'capture anything, as less as possible'
# A more specific regex can be defined for each field to capture.
# ( ) in the regex if for defining a group
# ?<> is for naming a group
$regex = #"
(?<userId>.*?) (?<userCode>.*?)
use profile (?<userProfile>.*?)
use rf-domain (?<userDomain>.*?)
hostname (?<hostname>.*?)
area (?<area>.*?)
!
"#
# Lets see if this matches !
if($configurationText -match $regex)
{
# it does !
Write-Host "Config text is successfully matched, here are the matches:"
$Matches
}
else
{
Write-Host "Config text could not be matched."
}
This script outputs the following:
PS C:\Users\xdelecroix> C:\FusionInvest\powershell\regex-capture-multiline-stackoverflow.ps1
Config text is successfully matched, here are the matches:
Name Value
---- -----
hostname ACCESSPOINT
userProfile PROFILE
userCode 00-01-23-45-67-89
area inside
userId ap71xx
userDomain DOMAIN
0 ap71xx 00-01-23-45-67-89...
For more flexibility, you can use Select-String instead of -match, but this is not really important here, in the context of this sample.
Here's my take. If you don't need the regex, you can use -like or .contains(). The question never says what the search pattern is. Here's an example with a windows text file.
$file = (get-content -raw file.txt) -replace "`r" # avoid the line ending issue
$pattern = 'two
three
f.*' -replace "`r"
# just showing what they really are
$file -replace "`r",'\r' -replace "`n",'\n'
$pattern -replace "`r",'\r' -replace "`n",'\n'
$file -match $pattern
$file | select-string $pattern -quiet