Expansion of a variable in a regex pattern doesn't work

Expansion of a variable in a regex pattern doesn't work - regex

As a novice in powershell coding, I have some difficulties with expansion of a variable in PowerShell regex patterns.
What I wanted to do is:
Scan for logfiles that have been changed between two timeframes
For each of the logfiles, I get part of the name which indicates the date it is referencing to.
That date is stored in the variable $filedate.
Then go trough each line logfiles
Whenever I find a line that looks like:
14:00:15 blablabla
In a file named blabla20130620.log
I want that the data line becomes
2013-06-20 14:00:15 blablabla
It should write the output in append mode to a text file (to concatenate different log files)
Here is what I got until now (I'm testing in a sandbox now, so no comments etc...)
$Logpath = "o:\Log"
$prevcheck="2013-06-24 19:27:14"
$currenttd="{0:yyyy-MM-dd HH:mm:ss}" -f (get-date)
$batch = 1000
[regex]$match_regex = '^([01]\d|2[0-3]):([0-5]\d):([0-5]\d)'
If (Test-Path "$Logpath\test.txt"){
Remove-Item "$Logpath\test.txt"
}
$files=Get-ChildItem $LogPath\*.log | Where-Object { $_.LastWriteTime -ge "$prevcheck" - and $_.LastWriteTime -le "$currenttd" -and !$_.PSIsContainer }
foreach ($file in $files)
{
$filedate=$file.Name.Substring(6,4) + "-" + $file.Name.Substring(10,2) + "-" + $file.Name.Substring(12,2)
## This doesn't seem to work fine
## results look like:
## "$filedate" 14:00:15 blablabla
$replace_regex = '"$filedate" $_'
## I tried this too, but without success
## The time seems to dissappear now
## results look like:
## 2013-06-20 blablabla
#$replace_regex = iex('$filedate' + $_)
(Get-Content $file.PSPath -ReadCount $batch) |
foreach-object {if ($_ -match $match_regex) { $_ -replace $match_regex, $replace_regex} else { $_ }}|
out-file -Append "o:\log\test.txt"

You're over-complicating things.
You're comparing dates in your Where-Object filter, so you don't need to transform your reference dates to strings. Just use dates:
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
You can use a regular expression to extract the date from the file name and transform it into the desired format:
$filedate = $file.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Your regular expression for matching the time is overly correct. Use ^(\d{2}:\d{2}:\d{2}) instead. It's a little sloppier, but it will most likely suffice and is a lot easier on the eye.
To prepend the time-match with the date, use "$filedate `$1". The double quotes will cause $filedate to be expanded to the date from the file name, and the escaped $ (``$1`) will keep the grouped match (see Richard's explanation).
While you can assign the results from each step to variables, it'd be simpler to just use a single pipeline.
Try this:
$Logpath = "o:\Log"
$Logfile = "$Logpath\test.txt"
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
If (Test-Path -LiteralPath $Logfile) { Remove-Item $Logfile }
Get-ChildItem "$LogPath\*.log" | ? {
-not $_.PSIsContainer -and
$_.LastWriteTime -ge $prevcheck -and
$_.LastWriteTime -le $currenttd
} | % {
$filedate = $_.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Get-Content $_ | % {
$_ -replace '^(\d{2}:\d{2}:\d{2})', "$filedate `$1"
} | Out-File -Append $Logfile
}

In PowerShell strings have to be in double quotes (") for variable substitution. Single quoted (') strings do not perform variable substitution.
In your script (in which I suggest you indent the content of code blocks to make the structure easier to follow):
$replace_regex = '"$filedate" $_'
where the string is single quoted, so no variable substitution. This can be fixed by remembering the back-quote (`) character can be used to escape double quotes embedded in a double quoted string:
$replace_regex = "`"$filedate`" $_"
But remember:
$ is a regex meta-character, so if you want to include a $ in a regex in double quotes it will need to be escaped to avoid PSH treating it as the start of the variable name.
Any regex meta-characters in the variable will have their regex meaning. Consider escaping the content of the variable before substitution ([regex]::Escape(string)).

Related

Replacing any content inbetween second and third underscore

I have a PowerShell Scriptline that replaces(deletes) characters between the second and third underscore with an "_":
get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}
Examples:
12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf
This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything inbetween I have used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work) but the output is:
12345_09_MoreText.pdf
The desired output would be:
12345_00001_09_2018_Text_MoreText.pdf
12345_00002_09_2018_Text_MoreText.pdf
12345_00003_09_2018_Text_MoreText.pdf
How do I correctly replace the second and third underscore and everything inbetween with an "_"?

If you don't want to use regex -
$files = get-childitem *.pdf #get all pdf files
$ModifiedFiles, $New = #() #declaring two arrays
foreach($file in $files)
{
$ModifiedFiles = $file.split("_")
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #ommitting anything between second and third underscore
$New = "$ModifiedFiles" -replace (" ", "_")
Rename-Item -Path $file.FullName -NewName $New
}
Sample Data -
$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
$ModifiedFiles, $New = #() #declaring two arrays
foreach($file in $files)
{
$ModifiedFiles = $file.split("_")
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #ommitting anything between second and third underscore
$New = "$ModifiedFiles" -replace (" ", "_")
}

You may use
-replace '^((?:[^_]*_){2})[^_]+_', '$1'
See the regex demo
Details
^ - start of the line
((?:[^_]*_){2}) - Group 1 (the value will be referenced to with $1 from the replacement pattern): two repetitions of
[^_]* - 0+ chars other than an underscore
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore

To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:
Get-ChildItem *.pdf | Rename-Item { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
$_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
-join '_' reassembles the modified array into a _-separated string, yielding the desired result.
Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:
Get-ChildItem *.pdf |
Rename-Item { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf
Wouldn't it be nice if we could write [0..1 + 3..] instead?
This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.

here's one other way ... using string methods.
'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
Split('_').
Where({
$_ -notmatch ','
}) -join '_'
result = 12345_00003_09_2018_Text_MoreText.pdf
that does the following ...
split on the underscores
toss out any item that has a comma in it
join the remaining items back into a string with underscores
i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]

replace thousands separators in csv with regex

I'm running into problems trying to pull the thousands separators out of some currency values in a set of files. The "bad" values are delimited with commas and double quotes. There are other values in there that are < $1000 that present no issue.
Example of existing file:
"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"
Example of desired file (thousands separators removed):
"12345.67",12.34,"123456.78",1.00,"123456789.12"
I found a regex expression for matching the numbers with separators that works great, but I'm having trouble with the -replace operator. The replacement value is confusing me. I read about $& and I'm wondering if I should use that here. I tried $_, but that pulls out ALL my commas. Do I have to use $matches somehow?
Here's my code:
$Files = Get-ChildItem *input.csv
foreach ($file in $Files)
{
$file |
Get-Content | #assume that I can't use -raw
% {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
out-file output.csv -append -encoding ascii
}

Tony Hinkle's comment is the answer: don't use regex for this (at least not directly on the CSV file).
Your CSV is valid, so you should parse it as such, work on the objects (change the text if you want), then write a new CSV.
Import-Csv -Path .\my.csv | ForEach-Object {
$_ | ForEach-Object {
$_ -replace ',',''
}
} | Export-Csv -Path .\my_new.csv
(this code needs work, specifically the middle as the row will have each column as a property, not an array, but a more complete version of your CSV would make that easier to demonstrate)

You can try with this regex:
,(?=(\d{3},?)+(?:\.\d{1,3})?")
See Live Demo or in powershell:
% {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' }
But it's more about the challenge that regex can bring. For proper work, use #briantist answer which is the clean way to do this.

I would use a simpler regex, and use capture groups instead of the entire capture.
I have tested the follow regular expression with your input and found no issues.
% {$_ -replace '([\d]),([\d])','$1$2' }
eg. Find all commas with a number before and after (so that the weird mixed splits dont matter) and replace the comma entirely.
This would have problems if your input has a scenario without that odd mixing of quotes and no quotes.

Exclude array items based on dynamic criteria

I have an array of strings like:
File1
File2
File1_s1
File2_s1
Print$
PSDrive
PSParentPath
I have a need to select all strings that do not conform to a dynamic set of rules. I don't really need fancy regex, I just want to match a dynamic amount of very simple regex rules. Basically:
$Arr | Where {($_.Name -notlike '_s1') -and ($_.Name -notlike 'Print$')}
But I need a dynamic amount of -ands specified by an input to the function. Is there any easy way to do this?

Ok so you can do this
$omits = "_s1","Print$"
$regex = '({0})' -f (($omits | ForEach-Object{[regex]::Escape($_)}) -join "|")
$arr | Where-Object{$_ -notmatch $regex}
$omits would contain the list of strings you want to -match/-notmatch. Then we take each member and run a regex escape on it ($ is a special regex character. The end of line anchor) The take each scrubbed string and build a matching group. So in the above example $regex would be
(_s1|Print\$)
Add more entries to $omit as you see fit. Which would give the filtered results as
File1
File2
PSDrive
PSParentPath
If you can be trusted to escape your own regex your options open up more.
$omits = "_s1","Print\$","^PS"
$regex = '({0})' -f ($omits -join "|")
That way the PS has to be at the beginning of the string.

As the #Matt suggests, without wildcards in your strings -NotLike is basically equivalent to -ne (and then you could just use -NotIn). I'm assuming your examples are missing wildcards but your actual patterns are not.
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike $pattern};
}
Or you could do some basic matching with:
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike "*$pattern*"};
}

Multiline regex to match config block

I am having some issues trying to match a certain config block (multiple ones) from a file. Below is the block that I'm trying to extract from the config file:
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
There are multiple ones just like this, each with a different MAC address. How do I match a config block across multiple lines?

The first problem you may run into is that in order to match across multiple lines, you need to process the file's contents as a single string rather than by individual line. For example, if you use Get-Content to read the contents of the file then by default it will give you an array of strings - one element for each line. To match across lines you want the file in a single string (and hope the file isn't too huge). You can do this like so:
$fileContent = [io.file]::ReadAllText("C:\file.txt")
Or in PowerShell 3.0 you can use Get-Content with the -Raw parameter:
$fileContent = Get-Content c:\file.txt -Raw
Then you need to specify a regex option to match across line terminators i.e.
SingleLine mode (. matches any char including line feed), as well as
Multiline mode (^ and $ match embedded line terminators), e.g.
(?smi) - note the "i" is to ignore case
e.g.:
C:\> $fileContent | Select-String '(?smi)([0-9a-f]{2}(-|\s*$)){6}.*?!' -AllMatches |
Foreach {$_.Matches} | Foreach {$_.Value}
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
Use the Select-String cmdlet to do the search because you can specify -AllMatches and it will output all matches whereas the -match operator stops after the first match. Makes sense because it is a Boolean operator that just needs to determine if there is a match.

In case this may still be of value to someone and depending on the actual requirement, the regex in Keith's answer doesn't need to be that complicated. If the user simply wants to output each block the following will suffice:
$fileContent = [io.file]::ReadAllText("c:\file.txt")
$fileContent |
Select-String '(?smi)ap71xx[^!]+!' -AllMatches |
%{ $_.Matches } |
%{ $_.Value }
The regex ap71xx[^!]*! will perform better and the use of .* in a regular expression is not recommended because it can generate unexpected results. The pattern [^!]+! will match any character except the exclamation mark, followed by the exclamation mark.
If the start of the block isn't required in the output, the updated script is:
$fileContent |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
Groups[0] contains the whole matched string, Groups[1] will contain the string match within the parentheses in the regex.
If $fileContent isn't required for any further processing, the variable can be eliminated:
[io.file]::ReadAllText("c:\file.txt") |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }

This regex will search for the text ap followed by any number of characters and new lines ending with a !:
(?si)(a).+?\!{1}
So I was a little bored. I wrote a script that will break up the text file as you described (as long as it only contains the lines you displayed). It might work with other random lines, as long as they don't contain the key words: ap, profile, domain, hostname, or area. It will import them, and check line by line for each of the properties (MAC, Profile, domain, hostname, area) and place them into an object that can be used later. I know this isn't what you asked for, but since I spent time working on it, hopefully it can be used for some good. Here is the script if anyone is interested. It will need to be tweaked to your specific needs:
$Lines = Get-Content "c:\test\test.txt"
$varObjs = #()
for ($num = 0; $num -lt $lines.Count; $num =$varLast ) {
#Checks to make sure the line isn't blank or a !. If it is, it skips to next line
if ($Lines[$num] -match "!") {
$varLast++
continue
}
if (([regex]::Match($Lines[$num],"^\s.*$")).success) {
$varLast++
continue
}
$Index = [array]::IndexOf($lines, $lines[$num])
$b=0
$varObj = New-Object System.Object
while ($Lines[$num + $b] -notmatch "!" ) {
#Checks line by line to see what it matches, adds to the $varObj when it finds what it wants.
if ($Lines[$num + $b] -match "ap") { $varObj | Add-Member -MemberType NoteProperty -Name Mac -Value $([regex]::Split($lines[$num + $b],"\s"))[1] }
if ($lines[$num + $b] -match "profile") { $varObj | Add-Member -MemberType NoteProperty -Name Profile -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "domain") { $varObj | Add-Member -MemberType NoteProperty -Name rf-domain -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "hostname") { $varObj | Add-Member -MemberType NoteProperty -Name hostname -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
if ($Lines[$num + $b] -match "area") { $varObj | Add-Member -MemberType NoteProperty -Name area -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
$b ++
} #end While
#Adds the $varObj to $varObjs for future use
$varObjs += $varObj
$varLast = ($b + $Index) + 2
}#End for ($num = 0; $num -lt $lines.Count; $num = $varLast)
#displays the $varObjs
$varObjs

To me, a very clean and simple approach is to use a multiline bloc regex, with named captures, like this:
# Based on this text configuration:
$configurationText = #"
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
"#
# We can build a multiline regex bloc with the strings to be captured.
# Here, i am using the regex '.*?' than roughly means 'capture anything, as less as possible'
# A more specific regex can be defined for each field to capture.
# ( ) in the regex if for defining a group
# ?<> is for naming a group
$regex = #"
(?<userId>.*?) (?<userCode>.*?)
use profile (?<userProfile>.*?)
use rf-domain (?<userDomain>.*?)
hostname (?<hostname>.*?)
area (?<area>.*?)
!
"#
# Lets see if this matches !
if($configurationText -match $regex)
{
# it does !
Write-Host "Config text is successfully matched, here are the matches:"
$Matches
}
else
{
Write-Host "Config text could not be matched."
}
This script outputs the following:
PS C:\Users\xdelecroix> C:\FusionInvest\powershell\regex-capture-multiline-stackoverflow.ps1
Config text is successfully matched, here are the matches:
Name Value
---- -----
hostname ACCESSPOINT
userProfile PROFILE
userCode 00-01-23-45-67-89
area inside
userId ap71xx
userDomain DOMAIN
0 ap71xx 00-01-23-45-67-89...
For more flexibility, you can use Select-String instead of -match, but this is not really important here, in the context of this sample.

Here's my take. If you don't need the regex, you can use -like or .contains(). The question never says what the search pattern is. Here's an example with a windows text file.
$file = (get-content -raw file.txt) -replace "`r" # avoid the line ending issue
$pattern = 'two
three
f.*' -replace "`r"
# just showing what they really are
$file -replace "`r",'\r' -replace "`n",'\n'
$pattern -replace "`r",'\r' -replace "`n",'\n'
$file -match $pattern
$file | select-string $pattern -quiet

Powershell: Leave item alone if regex doesn't match

I have a list of pdf files (from daily processing), some with date stamps of various formatting, some without.
Example:
$f = #("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")
I am trying to remove the datestamp by stripping out dashes, then replacing any consecutive group of numbers (4 or more, I may change this to 6) with the string "DATESTAMP".
So far I have this:
$d = $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d
The output:
testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf
It works fine if the file has a datestamp but it seems to be freaking out the -replace and inserting DATESTAMP after every character. Is there a way to fix this? I tried to change it to a foreach loop but I couldn't figure out how to get true/false from regex.
Thanks in advance.

You can simply do:
PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf

$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"
Means in $_ replace every finding of ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP".
As in a filename with no timestamp (or at least 4 consecutive numbers) there is no match, it returns "" (an empty string).
Thus every empty string gets replaced with DATESTAMP. And such a empty string "" sits at the start of the string and after every other character.
Thats why you get this long string with every character surrounded by DATESTAMP.
To check if there even exists a \d{4,} in your string you should able to use
[regex]::IsMatch($_, "\d{4,}")
I'm no Powershell user but this line alone should do the job. But I'm not sure about being able to use the if in a pipeline and wether or not the assignment and the echo $d are needed
$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Expansion of a variable in a regex pattern doesn't work - regex

Related

Replacing any content inbetween second and third underscore

replace thousands separators in csv with regex

Exclude array items based on dynamic criteria

Multiline regex to match config block

Powershell: Leave item alone if regex doesn't match

Categories

Resources