Regular expression seems not to work in Where-Object cmdlet

I am trying to add quote characters around two fields in a file of comma-separated lines. Here is one line of data:
1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
which I would like to become this:
1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I began developing my regular expression in a simple PowerShell script, and soon had the following:
$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew
which gives me this output:
1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
Great! I'm all set. Extend this example to the general case of a file of similar lines of data:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}
This is a listing of test_data.csv:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
This is the output of my script:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
I have also tried this version of the script:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}
and obtained the same results.
My simple test script has convinced me that the regex is correct, but something happens when I use that regex inside a filter script in the Where-Object cmdlet.
What simple, yet critical, detail am I overlooking here?
Here is my PSVersion:
Major  Minor  Build  Revision
-----  -----  -----  --------
5      0      10586  117

You're misunderstanding how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does NOT output whatever you do inside that scriptblock (you'd use ForEach-Object for that).
You don't need either Where-Object or ForEach-Object here, though. Just put Get-Content in parentheses and use the result as the first operand of the -replace operator. You also don't need the 4th capturing group, and I would recommend anchoring the expression at the beginning of the string:
(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
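To make the distinction concrete, here is a minimal sketch (the two-field strings are made up purely for illustration). Where-Object decides whether to keep each original input object based on the truthiness of the script block, while ForEach-Object emits whatever the block produces. A non-empty string returned by -replace is always truthy, which is why the original filter passed every line through unchanged:
'1,a', '2,b' | Where-Object  { $_ -replace ',', ';' }   # outputs 1,a and 2,b unchanged: the replaced
                                                        # string is truthy, so the ORIGINAL object is kept
'1,a', '2,b' | ForEach-Object { $_ -replace ',', ';' }  # outputs 1;a and 2;b: the block's result is emitted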

This seems to work here. I used ForEach-Object to process each record.
Get-Content test_data.csv |
ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
This also seems to work. It uses ? to make the quantifiers reluctant (lazy).
Get-Content test_data.csv |
ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
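For contrast, a tiny sketch (the sample string is made up) of what the ? changes: a greedy (.*) grabs as much as it can, so the groups split at the last comma instead of the first one.
$s = 'a,b,c,d,e'
$s -replace '(.*),(.*)', '[$1][$2]'    # greedy: [a,b,c,d][e]
$s -replace '(.*?),(.*)', '[$1][$2]'   # lazy:   [a][b,c,d,e]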

I would just make a small change to what you have in order for this to work. Change Where-Object -FilterScript to ForEach-Object, and fix a minor typo in the replacement string: the final $4 was not escaped with a backtick, so PowerShell expanded it as an (empty) variable before the regex ever saw it:
Get-Content c:\temp\test_data.csv | ForEach-Object {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`"`$4"
}
I tested this with the data you provided and it adds the quotes to the correct columns.

Related

How can I replace all lines in a file that match a pattern, using PowerShell?

I have a file with lines that I wish to remove, like the following:
key="Id" value=123"
key="FirstName" value=Name1"
key="LastName" value=Name2"
<!--key="FirstName" value=Name3"
key="LastName" value=Name4"-->
key="Address" value=Address1"
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
key="ReferenceNo" value=765
I have tried the following:
$values = @('key="FirstName"','key="Lastname"', 'add key="Address"');
$regexValues = [string]::Join('|',$values)
$lineprod = Get-Content "D:\test\testfile.txt" | Select-String $regexValues | Select-Object -ExpandProperty Line
if ($null -ne $lineprod)
{
    foreach ($value in $lineprod)
    {
        $prod = $value.Trim()
        $contentProd | ForEach-Object { $_ -replace $prod, "" } | Set-Content "D:\test\testfile.txt"
    }
}
The issue is that only some of the lines get replaced or removed, and some remain.
The output should be
key="Id" value=123"
key="ReferenceNo" value=765
But I seem to get
key="Id" value=123"
key="ReferenceNo" value=765
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
Any ideas as to why this is happening, or changes to the code above?
Based on your comment, the token 'add key="Address"' should be changed to just 'key="Address"'; then the concatenation logic to build your regex looks good. You need to use the -NotMatch switch so it matches anything but those values. Also, Select-String can read files directly, so Get-Content can be removed.
Note, the use of (...) in this case is important because you're reading from and writing to the same file in the same pipeline. Wrapping the statement in parentheses ensures that all output from Select-String is consumed before it is passed on through the pipeline. Otherwise, you would end up with an empty file.
$values = 'key="FirstName"', 'key="Lastname"', 'key="Address"'
$regexValues = [string]::Join('|', $values)
(Select-String D:\test\testfile.txt -Pattern $regexValues -NotMatch) |
ForEach-Object Line | Set-Content D:\test\testfile.txt
Outputs:
key="Id" value=123"
key="ReferenceNo" value=765

Extract string from text file via PowerShell

I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
This is what I want :
server01
server02
server03 test
This is the code I have so far:
$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR")
$Match = $Regex.Match($String)
You may use
[regex]::matches($String, '(?<=Equal\s*")[^"]+')
See the regex demo.
See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:
(?<=Equal\s*") - a location preceded with Equal and 0+ whitespaces and then a "
[^"]+ - consumes 1+ chars other than double quotation mark.
Demo:
$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}
Output:
server01
server02
server03 test
Here is a full snippet reading the file in, getting all matches and saving to file:
$newfile = 'file.txt'
$file = 'newtext.txt'
$regex = '(?<=Equal\s*")[^"]+'
Get-Content $file |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Value } |
Set-Content $newfile
Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:
$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@
[regex]::Matches($str, '".*?"').Value -replace '"'
Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see bottom.
As an aside: the above would be even more PowerShell-idiomatic if the -match operator - which only looks for one match - had a variant named, say, -matchall, so that one could write:
# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'
See this feature suggestion on GitHub.
Optional reading: performance comparison
Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.
Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.
Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).
In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g. (?<=EQUAL\s*")) and the use of capture groups (e.g., (.*?)).
Here is a performance comparison using the Time-Command function, averaging 1000 runs:
Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
{ [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
{ [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
{ $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
Format-Table Factor, Command
Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.
Factor Command
------ -------
1.00 [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85 [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07 [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35 $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value
If you want to export the results, you can do the following:
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value | sc "c:\temp\export.txt"
An alternative, reading the file directly with Select-String and using Wiktor's good regex:
Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt
Sample output:
> Get-Content .\NewFile.txt
server01
server02
server03 test

PowerShell: replace lines starting with a given pattern

Background:
I am trying to read a config file and place it in a string variable for further use. For now I've managed to remove the newlines, which was fairly simple. However, I'm having a bit of trouble with removing comments (lines starting with a #).
What I have so far
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","")
What I've tried
A lot of it is something along the lines of
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("#.*?`r`n","").Replace("`r`n","")
( .Replace("(?<=#).*?(?=`r`n)","") , .Replace('^[^#].*?`r`n','') etc)
A lot of the resources I've found cover how to iteratively read from a file and write back to it or to a new one, but what I need is for the result to stay in a variable and for the original file not to be altered in any way (also, I'd rather avoid using temp files or even extra variables if possible). I think there is something fundamental I'm missing about the input to Replace. (I also found this semi-relevant piece for when you're using ConvertFrom-Csv: Using Import-CSV in Powershell, ignoring commented lines.)
Sample input:
text;
weird text;
other-sort-of-text;
#commented out possibility;
more-input/with-comment;#comment
Sample output:
text;weird text;other-sort-of-text;more-input/with-comment;
Additional info:
I am going to run this on current builds of Windows 10; locally I seem to have PowerShell version 5.1.14393.693.
Split the string at semicolons, remove elements starting with a #, then join the result back into a string.
((Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","") -split ';' |
Where-Object { $_.Trim() -notlike '#*' }) -join ';'
This might be identical to some of the other responses, but here is how I would do it:
$Data = Get-Content $HOME/path/to/config.txt -Raw
(($Data -split(';')).replace("`r`n",'') | Where-Object { $_ -notlike '#*' }) -join(';')
Anyway you do it, remember that `r`n needs to be expanded, so it has to be enclosed in double quotes, unlike the rest of your characters.
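A quick way to see that expansion difference (purely illustrative):
"line1`r`nline2".Length   # 12 - the backtick sequences expand to real CR and LF characters
'line1`r`nline2'.Length   # 14 - in single quotes they stay as the literal characters ` r ` n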
#############solution 1 with convertfrom-string####################
#short version
(gc "$HOME/path/to/config.txt" | ? {$_ -notlike "#*"} | cfs -D ";").P1 -join ";"
#verbose version
(Get-Content "$HOME/path/to/config.txt" | where {$_ -notlike "#*"} | ConvertFrom-String -Delimiter ";").P1 -join ";"
#############Solution 2 with convertfrom-csv#######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | ConvertFrom-csv -Delimiter ";" -Header "P1").P1 -join ";"
#############Solution 3 with split #######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | %{$_.Split(';')[0]}) -join ";"

Replace thousands separators in CSV with regex

I'm running into problems trying to pull the thousands separators out of some currency values in a set of files. The "bad" values are delimited with commas and double quotes. There are other values in there that are < $1000 that present no issue.
Example of existing file:
"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"
Example of desired file (thousands separators removed):
"12345.67",12.34,"123456.78",1.00,"123456789.12"
I found a regex expression for matching the numbers with separators that works great, but I'm having trouble with the -replace operator. The replacement value is confusing me. I read about $& and I'm wondering if I should use that here. I tried $_, but that pulls out ALL my commas. Do I have to use $matches somehow?
Here's my code:
$Files = Get-ChildItem *input.csv
foreach ($file in $Files)
{
$file |
Get-Content | #assume that I can't use -raw
% {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
out-file output.csv -append -encoding ascii
}
Tony Hinkle's comment is the answer: don't use regex for this (at least not directly on the CSV file).
Your CSV is valid, so you should parse it as such, work on the objects (change the text if you want), then write a new CSV.
Import-Csv -Path .\my.csv | ForEach-Object {
    $_ | ForEach-Object {
        $_ -replace ',',''
    }
} | Export-Csv -Path .\my_new.csv
(this code needs work, specifically the middle as the row will have each column as a property, not an array, but a more complete version of your CSV would make that easier to demonstrate)
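To flesh that out a little, here is a sketch of the same idea under some assumptions: the sample file has no header row, so made-up column names are supplied via -Header, and every cell simply has its commas stripped. Note that Export-Csv re-quotes every field, so the quoting will not match the desired sample exactly.
$header = 1..5 | ForEach-Object { "Col$_" }        # hypothetical column names for the headerless file
Import-Csv -Path .\my.csv -Header $header | ForEach-Object {
    foreach ($p in $_.PSObject.Properties) {
        $p.Value = $p.Value -replace ','           # strip thousands separators from each cell
    }
    $_
} | Export-Csv -Path .\my_new.csv -NoTypeInformation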
You can try with this regex:
,(?=(\d{3},?)+(?:\.\d{1,3})?")
See the live demo, or in PowerShell:
% {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' }
But this is more about the challenge that regex brings. For proper work, use @briantist's answer, which is the clean way to do this.
I would use a simpler regex, and use capture groups instead of the entire capture.
I have tested the following regular expression with your input and found no issues.
% {$_ -replace '([\d]),([\d])','$1$2' }
E.g. find all commas with a digit before and after (so that the weird mixed splits don't matter) and remove the comma entirely.
This would have problems if two unquoted numeric fields ever appeared next to each other (say 12.34,1.00 with no quoted value between them), since that separator comma also has digits on both sides; it only works here because of the odd mixing of quoted and unquoted fields.

Regex replace contents of file and delete lines that don't match

I have a large log file where I want to extract certain types of lines. I have created a working regex to match these lines. How can I now use this regex to extract the lines and nothing else? I have tried
cat .\file | %{
    if($_ -match "..."){
        $_ -replace "...", '...'
    }
    else{
        $_ -replace ".*", ""
    }
}
Which almost works, but the lines that are not of interest still remain as blank lines (meaning the lines of interest are spaced VERY far apart).
The best way is to remove the else clause altogether. If you do that, then no object will be returned from that iteration of the ForEach-Object block.
cat .\file | %{
    if($_ -match "..."){
        $_ -replace "...", '...'
    }
}
Just to append to briantist's answer: you don't even need the loop structure. -match and -replace function as array operators, removing the need for the if and the ForEach-Object.
(Get-Content .\file) -match "..." -replace "...","..."
Get-Content is the target of the alias cat.