Below I have a .ps1 script that finds text matched by a regex pattern and appends to it, i.e. turns [pattern] into [pattern]Foo. Is there a simpler way to do this for multiple regex patterns, other than defining each regex as pattern2, pattern3, etc. and creating a separate "ForEach" to correspond to every regex? Because that's how I did it, and it works, but it looks very rudimentary.
$pattern1 = [regex]'([___)'
$pattern2 = [regex]'([___)'
Get-ChildItem 'C:\File\Location\*.txt' -Recurse | ForEach {
    (Get-Content $_ |
        ForEach { $_ -replace $pattern1, ('$1'+'FOO') } |
        ForEach { $_ -replace $pattern2, ('$1'+'FOO') }) |
        Set-Content $_
}
If you are replacing with the same replacement pattern, just use alternation:
$pattern = [regex]'(pattern1|pattern2)'
NOTE: in unanchored alternations, watch out for the order of the alternatives: if a shorter branch can match at a given location in the string, a longer one that appears after it won't get tested. E.g. (on|one|ones) will only match on in ones. See more about that in Remember That The Regex Engine Is Eager.
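For example, here is a minimal sketch of the alternation approach applied to the script above, using hypothetical literal fragments foo and bar in place of the real patterns:
$pattern = [regex]'(foo|bar)'
Get-ChildItem 'C:\File\Location\*.txt' -Recurse | ForEach-Object {
    (Get-Content $_ | ForEach-Object { $_ -replace $pattern, ('$1'+'FOO') }) |
        Set-Content $_
}
A single pass with the alternation handles both patterns, so no second ForEach is needed.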
I have a folder full of text documents in .adoc format that have some text in them. The text is following: link:lalala.html[lalala]. I want to replace this text with xref:lalala.adoc[lalala]. So, basically, just replace link: with xref:, .html with .adoc, leave all the rest unchanged.
But the problem is that lalala can be anything from a word to ../topics/halva.html.
I definitely know that I need to use regex patterns; I previously used a similar script, with the replace directives wrapped in a hashtable:
Get-ChildItem -Path *.adoc -file -recurse | ForEach-Object {
    $lines = Get-Content -Path $PSItem.FullName -Encoding UTF8 -Raw
    $patterns = @{
        '(\[\.dfn \.term])#(.*?)#' = '$1_$2_' ;
    }
    $option = [System.Text.RegularExpressions.RegexOptions]::Singleline
    foreach($k in $patterns.Keys){
        $pat = [regex]::new($k, $option)
        $lines = $pat.Replace($lines, $patterns.$k)
    }
    $lines | Set-Content -Path $PSItem.FullName -Encoding UTF8 -Force
}
Looks like I need a different script since the new task cannot be added as just another object. I could've just replaced each part separately, using two objects: replace link: with xref:, then replace .html with .adoc.
But this can interfere with other links that end with .html and don't start with link:. In the text, absolute links usually don't have link: in the beginning. They always start with http:// or https://. And they still may or may not end with .html. So the best idea is to take the whole string link:lalala.html[lalala] and try to replace it with xref:lalala.adoc[lalala].
I need the help of someone who knows regex and PowerShell; please, this would save me.
As a pattern, you might use
\blink:(.+?)\.html(?=\[[^][]*])
\blink: Match link:
(.+?) Capture 1+ chars, as few as possible, in group 1
\.html match .html
(?=\[[^][]*]) Assert an opening square bracket, then any chars other than square brackets, then a closing square bracket directly to the right
Regex demo
In the replacement, refer to group 1 with $1
xref:$1.adoc
Example
$Strings = @("link:lalala.html[lalala]", "link:../topics/halva.html[../topics/halva.html]")
$Strings -replace "\blink:(.+?)\.html(?=\[[^][]*])",'xref:$1.adoc'
Output
xref:lalala.adoc[lalala]
xref:../topics/halva.adoc[../topics/halva.html]
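To apply this across the folder of .adoc files, a sketch along the lines of the script in the question (keeping its path and UTF-8 encoding assumptions) could look like:
Get-ChildItem -Path *.adoc -File -Recurse | ForEach-Object {
    $text = Get-Content -Path $_.FullName -Encoding UTF8 -Raw
    $text -replace '\blink:(.+?)\.html(?=\[[^][]*])', 'xref:$1.adoc' |
        Set-Content -Path $_.FullName -Encoding UTF8
}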
I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
This is what I want :
server01
server02
server03 test
This is the code I have so far:
$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR")
$Match = $Regex.Match($String)
You may use
[regex]::matches($String, '(?<=Equal\s*")[^"]+')
See the regex demo.
See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:
(?<=Equal\s*") - a location preceded with Equal and 0+ whitespaces and then a "
[^"]+ - consumes 1+ chars other than double quotation mark.
Demo:
$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}
Output:
server01
server02
server03 test
Here is a full snippet reading the file in, getting all matches and saving to file:
$newfile = 'file.txt'
$file = 'newtext.txt'
$regex = '(?<=Equal\s*")[^"]+'
Get-Content $file |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Value } |
Set-Content $newfile
Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:
$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@
[regex]::Matches($str, '".*?"').Value -replace '"'
Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see the bottom.
As an aside: The above would be even more PowerShell-idiomatic if the -match operator - which only looks for a (one) match - had a variant named, say, -matchall, so that one could write:
# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'
See this feature suggestion on GitHub.
Optional reading: performance comparison
Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.
Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.
Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).
In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g., (?<=Equal\s*")) and the use of capture groups (e.g., (.*?)).
Here is a performance comparison using the Time-Command function, averaging 1000 runs:
Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
{ [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
{ [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
{ $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
Format-Table Factor, Command
Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.
Factor Command
------ -------
1.00 [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85 [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07 [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35 $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value
If you want to export the results, you can do the following:
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value | sc "c:\temp\export.txt"
An alternative reading the file directly with Select-String using Wiktor's good RegEx:
Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt
Sample output:
> Get-Content .\NewFile.txt
server01
server02
server03 test
I have a PowerShell script line that replaces (deletes) the characters between the second and third underscore with an "_":
get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}
Examples:
12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf
This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything in between I have used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work), but the output is:
12345_09_MoreText.pdf
The desired output would be:
12345_00001_09_2018_Text_MoreText.pdf
12345_00002_09_2018_Text_MoreText.pdf
12345_00003_09_2018_Text_MoreText.pdf
How do I correctly replace the second and third underscore and everything in between with an "_"?
If you don't want to use regex -
$files = get-childitem *.pdf #get all pdf files
$ModifiedFiles, $New = @() #declaring two arrays
foreach($file in $files)
{
    $ModifiedFiles = $file.Name.split("_") #split the file name on underscores
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #omitting anything between the second and third underscore
    $New = "$ModifiedFiles" -replace (" ", "_")
    Rename-Item -Path $file.FullName -NewName $New
}
Sample Data -
$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
$ModifiedFiles, $New = @() #declaring two arrays
foreach($file in $files)
{
    $ModifiedFiles = $file.split("_")
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #omitting anything between the second and third underscore
    $New = "$ModifiedFiles" -replace (" ", "_")
}
You may use
-replace '^((?:[^_]*_){2})[^_]+_', '$1'
See the regex demo
Details
^ - start of the line
((?:[^_]*_){2}) - Group 1 (the value will be referenced with $1 from the replacement pattern): two repetitions of
[^_]* - 0+ chars other than an underscore
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore
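A quick check against one of the sample names from the question (a sketch, not part of the original answer):
'12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf' -replace '^((?:[^_]*_){2})[^_]+_', '$1'
# -> 12345_00001_09_2018_Text_MoreText.pdf
To rename the files themselves, the same pattern can be dropped into the Rename-Item call from the question (with -WhatIf to preview first):
Get-ChildItem *.pdf | Rename-Item -NewName { $_.Name -replace '^((?:[^_]*_){2})[^_]+_', '$1' } -WhatIf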
To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:
Get-ChildItem *.pdf | Rename-Item -NewName { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
$_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
-join '_' reassembles the modified array into a _-separated string, yielding the desired result.
Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:
Get-ChildItem *.pdf |
Rename-Item -NewName { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf
Wouldn't it be nice if we could write [0..1 + 3..] instead?
This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
here's one other way ... using string methods.
'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
Split('_').
Where({
$_ -notmatch ','
}) -join '_'
result = 12345_00003_09_2018_Text_MoreText.pdf
that does the following ...
split on the underscores
toss out any item that has a comma in it
join the remaining items back into a string with underscores
i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]
I'm running into problems trying to pull the thousands separators out of some currency values in a set of files. The "bad" values are delimited with commas and double quotes. There are other values in there that are < $1000 that present no issue.
Example of existing file:
"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"
Example of desired file (thousands separators removed):
"12345.67",12.34,"123456.78",1.00,"123456789.12"
I found a regex expression for matching the numbers with separators that works great, but I'm having trouble with the -replace operator. The replacement value is confusing me. I read about $& and I'm wondering if I should use that here. I tried $_, but that pulls out ALL my commas. Do I have to use $matches somehow?
Here's my code:
$Files = Get-ChildItem *input.csv
foreach ($file in $Files)
{
$file |
Get-Content | #assume that I can't use -raw
% {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
out-file output.csv -append -encoding ascii
}
Tony Hinkle's comment is the answer: don't use regex for this (at least not directly on the CSV file).
Your CSV is valid, so you should parse it as such, work on the objects (change the text if you want), then write a new CSV.
Import-Csv -Path .\my.csv | ForEach-Object {
    $_ | ForEach-Object {
        $_ -replace ',',''
    }
} | Export-Csv -Path .\my_new.csv
(this code needs work, specifically the middle as the row will have each column as a property, not an array, but a more complete version of your CSV would make that easier to demonstrate)
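A slightly more complete sketch that walks each column via the row's PSObject.Properties (assuming the real file has a header row, which the sample above lacks; pass -Header to Import-Csv if it doesn't):
Import-Csv -Path .\my.csv | ForEach-Object {
    foreach ($prop in $_.PSObject.Properties) {
        $prop.Value = $prop.Value -replace ','   # strip the thousands separators from each field
    }
    $_
} | Export-Csv -Path .\my_new.csv -NoTypeInformation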
You can try with this regex:
,(?=(\d{3},?)+(?:\.\d{1,3})?")
See the live demo, or in PowerShell:
% {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' }
But it's more about the challenge that regex can bring. For proper work, use @briantist's answer, which is the clean way to do this.
I would use a simpler regex, and use capture groups instead of the entire capture.
I have tested the follow regular expression with your input and found no issues.
% {$_ -replace '([\d]),([\d])','$1$2' }
E.g. find all commas with a digit before and after (so that the mixed quoted/unquoted values don't matter) and remove the comma entirely.
This would have problems if your input had a scenario outside that odd mixing of quoted and unquoted values.
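A quick check against the sample line from the question (a sketch):
'"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"' -replace '([\d]),([\d])', '$1$2'
# -> "12345.67",12.34,"123456.78",1.00,"123456789.12"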
I have an array of strings like:
File1
File2
File1_s1
File2_s1
Print$
PSDrive
PSParentPath
I have a need to select all strings that do not conform to a dynamic set of rules. I don't really need fancy regex, I just want to match a dynamic amount of very simple regex rules. Basically:
$Arr | Where {($_.Name -notlike '_s1') -and ($_.Name -notlike 'Print$')}
But I need a dynamic amount of -ands specified by an input to the function. Is there any easy way to do this?
Ok so you can do this
$omits = "_s1","Print$"
$regex = '({0})' -f (($omits | ForEach-Object{[regex]::Escape($_)}) -join "|")
$arr | Where-Object{$_ -notmatch $regex}
$omits would contain the list of strings you want to -match/-notmatch. Then we take each member and run a regex escape on it ($ is a special regex character: the end-of-line anchor). We then take each escaped string and build an alternation group. So in the above example $regex would be
(_s1|Print\$)
Add more entries to $omits as you see fit. This would give the filtered results as:
File1
File2
PSDrive
PSParentPath
If you can be trusted to escape your own regex your options open up more.
$omits = "_s1","Print\$","^PS"
$regex = '({0})' -f ($omits -join "|")
That way the PS has to be at the beginning of the string.
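With those extra entries, the built pattern (a sketch of what the -f expression produces) would be
(_s1|Print\$|^PS)
which also filters out PSDrive and PSParentPath, leaving only File1 and File2.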
As @Matt suggests, without wildcards in your strings -NotLike is basically equivalent to -ne (and then you could just use -NotIn). I'm assuming your examples are missing wildcards but your actual patterns are not.
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike $pattern};
}
Or you could do some basic matching with:
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike "*$pattern*"};
}