PowerShell: Leave item alone if regex doesn't match

I have a list of PDF files (from daily processing), some with date stamps in various formats, some without.
Example:
$f = @("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")
I am trying to remove the datestamp by stripping out dashes, then replacing any consecutive group of numbers (4 or more, I may change this to 6) with the string "DATESTAMP".
So far I have this:
$d = $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d
The output:
testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf
It works fine if the file has a datestamp, but otherwise it seems to throw off -replace, which inserts DATESTAMP around every character. Is there a way to fix this? I tried to change it to a foreach loop, but I couldn't figure out how to get true/false from a regex.
Thanks in advance.

You can simply do:
PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf
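A minimal, self-contained check of that one-liner, using the three sample names from the question. It relies on -replace operating element-wise on an array; a name with no match, such as otherletter.pdf, comes back unchanged, which is the "leave it alone" behavior asked for.

```powershell
# Sample file names from the question
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# -replace works element-wise on arrays:
#   (\d{2}-){2}\d{2}  matches the dd-dd-dd form
#   \d{4,}            matches any run of 4 or more digits
# Elements without a match are returned unchanged.
$d = $f -replace "(\d{2}-){2}\d{2}|\d{4,}", "DATESTAMP"
$d
```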

$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"
This means: in $_, replace every match of ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP".
For a filename with no timestamp (that is, without at least 4 consecutive digits) there is no match, so [regex]::Matches returns an empty collection, which turns into "" (an empty string) when it is used as the pattern.
Thus the empty string gets replaced with DATESTAMP everywhere it matches, and an empty pattern matches at every position: at the start of the string, between each pair of characters, and at the end.
That's why you get this long string with every character surrounded by DATESTAMP.
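The effect can be reproduced in isolation. This sketch uses otherletter.pdf from the question plus a made-up string "abc" to show that the empty match collection becomes an empty string, and that an empty pattern matches at every position.

```powershell
# No run of 4+ digits in this name, so the collection is empty
$m = [regex]::Matches("otherletter.pdf", "\d{4,}")

# Converted to a string (as happens when it is used as the -replace pattern),
# the empty collection becomes ""
$pattern = [string]$m

# An empty pattern matches before, between and after every character
$demo = "abc" -replace $pattern, "X"

"Count: $($m.Count)"
"Pattern: '$pattern'"
$demo
```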
To check whether a \d{4,} match exists in your string at all, you should be able to use
[regex]::IsMatch($_, "\d{4,}")
I'm no PowerShell user, but the following line alone should do the job. I'm not sure, though, about using the if in a pipeline, or whether the assignment and the echo $d are needed:
$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}
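The same idea, written with the [regex]::IsMatch check mentioned above instead of -match, as a sketch run against the sample names from the question. Either test works; IsMatch simply returns the true/false value the question was asking about.

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

$d = $f | ForEach-Object {
    $name = $_ -replace "-", ""                  # strip the dashes first
    if ([regex]::IsMatch($name, "\d{4,}")) {     # true/false: is there a date stamp?
        $name -replace "\d{4,}", "DATESTAMP"
    }
    else {
        $name                                    # no date stamp: leave it alone
    }
}
$d
```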

Related

Replacing any content in between the second and third underscores

I have a PowerShell script line that replaces (deletes) the characters between the second and third underscores with an "_":
get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}
Examples:
12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf
This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything in between, I have used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work), but the output is:
12345_09_MoreText.pdf
The desired output would be:
12345_00001_09_2018_Text_MoreText.pdf
12345_00002_09_2018_Text_MoreText.pdf
12345_00003_09_2018_Text_MoreText.pdf
How do I correctly replace the second and third underscores, and everything in between, with an "_"?
If you don't want to use a regex:
$files = Get-ChildItem *.pdf # get all PDF files
foreach ($file in $files)
{
    # split the file *name*; the FileInfo object itself has no Split() method
    $ModifiedFiles = $file.Name.Split("_")
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } # omit everything between the second and third underscore
    $New = $ModifiedFiles -join "_" # reassemble the remaining tokens with underscores
    Rename-Item -Path $file.FullName -NewName $New
}
Sample data:
$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
foreach ($file in $files)
{
    $ModifiedFiles = $file.Split("_")
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } # omit everything between the second and third underscore
    $New = $ModifiedFiles -join "_"
    $New # output the new name instead of renaming
}
You may use
-replace '^((?:[^_]*_){2})[^_]+_', '$1'
Details
^ - start of the string
((?:[^_]*_){2}) - Group 1 (its value is referenced with $1 in the replacement pattern): two repetitions of
[^_]* - 0+ chars other than an underscore
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore
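A quick way to try that pattern on the three example names, as a sketch. For actual renaming, the same -replace would go inside a Rename-Item -NewName script block; that line is left as a comment with -WhatIf so nothing is renamed.

```powershell
$names = @(
    "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf",
    "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf",
    "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
)

# Group 1 = everything up to and including the 2nd underscore;
# the 3rd token and the underscore after it are dropped.
$result = $names -replace '^((?:[^_]*_){2})[^_]+_', '$1'
$result

# For real files (preview only, because of -WhatIf):
# Get-ChildItem *.pdf | Rename-Item -NewName { $_.Name -replace '^((?:[^_]*_){2})[^_]+_', '$1' } -WhatIf
```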
To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:
Get-ChildItem *.pdf | Rename-Item { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
$_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
-join '_' reassembles the modified array into a _-separated string, yielding the desired result.
Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:
Get-ChildItem *.pdf |
Rename-Item { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf
Wouldn't it be nice if we could write [0..1 + 3..] instead?
This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
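To try the split/slice/join approach without touching any files, here is a sketch that applies the dynamic-upper-bound version to the three example names as plain strings.

```powershell
$names = @(
    "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf",
    "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf",
    "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
)

$result = foreach ($name in $names) {
    $arr = $name -split '_'                          # tokens between underscores
    $arr[0..1 + 3..($arr.Count - 1)] -join '_'       # skip index 2, then rejoin
}
$result
```

The range expressions are evaluated first, so 0..1 + 3..6 becomes the index list 0, 1, 3, 4, 5, 6 before the slice is taken.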
here's one other way ... using string methods.
'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
Split('_').
Where({
$_ -notmatch ','
}) -join '_'
result = 12345_00003_09_2018_Text_MoreText.pdf
that does the following ...
split on the underscores
toss out any item that has a comma in it
join the remaining items back into a string with underscores
i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]

Parsing Data in PowerShell, with the format of Label:Data

I am doing an Invoke-WebRequest in PowerShell to a URL that does not return any HTML, just text. I need to pick out a specific part of this data, which is in the format Label:Data, with each piece of data on its own line. I'm looking for some ideas on how to accomplish this. Here is a sample of the $Response.Content data below. I am looking to isolate the speed-over-ground:0.0 value.
rate-of-turn:0.0
course-over-ground:293.0
speed-over-ground:0.0
heading-true:243.0
hdop:1.0
active-waypoint-name:
bearing-to-waypoint:
distance-to-waypoint:
cross-track-error:0
cross-track-error-limit:
cross-track-error-scale:0
lateral-speed-bow:0.09
lateral-speed-stern:-0.05
longitudinal-speed:-0.05
I guess it's a single string, rather than an array of lines. So, split it into lines:
$Response.Content -split "`r?`n"
Find the one which says speed-over-ground:
$line = $Response.Content -split "`r?`n" | Where-Object { $_ -match 'speed-over-ground' }
Split the text from the number, using the : separator, and take the second item, converted from text to a number if appropriate:
[decimal]$speedOverGround = $line.Split(':')[1]
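Those steps can be tested without a web request by substituting a few lines of the sample for $Response.Content. The helper name Get-FieldValue is hypothetical, and anchoring the pattern as ^label: is an addition, so that a label cannot accidentally match in the middle of another line.

```powershell
# A few lines from the sample, standing in for $Response.Content
$content = "rate-of-turn:0.0`r`ncourse-over-ground:293.0`r`nspeed-over-ground:0.0`r`nheading-true:243.0"

# Hypothetical helper: return the value after "label:" as a [decimal]
function Get-FieldValue([string]$Text, [string]$Label) {
    $line = $Text -split "`r?`n" |
        Where-Object { $_ -match ('^' + [regex]::Escape($Label) + ':') }
    [decimal]($line.Split(':')[1])
}

$speedOverGround = Get-FieldValue $content 'speed-over-ground'
$course = Get-FieldValue $content 'course-over-ground'
"speed-over-ground = $speedOverGround, course-over-ground = $course"
```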
Although, I might try to turn all of them into an object in a bulk transform. Complexity varies with the exact possible inputs, but this tries to convert numbers to numbers and leave empty ones as nulls:
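$data = New-Object -TypeName PSCustomObject
$Response.Content -split "`r?`n" | Where-Object { $_ -match ':' } |
    ForEach-Object {
        # split on the first ':' only, then trim both parts
        $name, $value = ($_ -split ':', 2).Trim()
        $decimalValue = [decimal]0
        if ([string]::IsNullOrEmpty($value))
        {
            $value = $null    # empty value: store a real $null
        }
        elseif ([decimal]::TryParse($value, [ref]$decimalValue))
        {
            $value = $decimalValue
        }
        $data | Add-Member -NotePropertyName $name -NotePropertyValue $value
    }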
# Then you can do:
$data.'speed-over-ground'
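A variation on the bulk transform, sketched against a few sample lines instead of $Response.Content. It swaps Add-Member for an [ordered] hashtable cast to [pscustomobject], and parses numbers with the invariant culture so "0.09" is read the same regardless of regional settings. Both choices are assumptions, not part of the original answer.

```powershell
# A few lines from the sample, standing in for $Response.Content
$content = "speed-over-ground:0.0`nheading-true:243.0`nactive-waypoint-name:`nlateral-speed-stern:-0.05`n"

$style = [System.Globalization.NumberStyles]::Number
$inv   = [System.Globalization.CultureInfo]::InvariantCulture
$props = [ordered]@{}

foreach ($line in $content -split "`r?`n") {
    if ($line -notmatch ':') { continue }            # skip blank or malformed lines
    $name, $value = ($line -split ':', 2).Trim()     # split on the first ':' only
    $number = [decimal]0
    if ([string]::IsNullOrEmpty($value)) {
        $props[$name] = $null                        # empty value: a real $null
    }
    elseif ([decimal]::TryParse($value, $style, $inv, [ref]$number)) {
        $props[$name] = $number                      # numeric text: a [decimal]
    }
    else {
        $props[$name] = $value                       # anything else stays a string
    }
}

$data = [pscustomobject]$props
$data.'speed-over-ground'
```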

Regex replace contents of file and delete lines that don't match

I have a large log file where I want to extract certain types of lines. I have created a working regex to match these lines. How can I now use this regex to extract the lines and nothing else? I have tried
cat .\file | %{
    if($_ -match "..."){
        $_ -replace "...", '...'
    }
    else{
        $_ -replace ".*", ""
    }
}
Which almost works, but the lines that are not of interest still remain as blank lines (meaning the lines of interest are spaced VERY far apart).
The best way is to remove the else clause altogether. If you do that, then no object will be returned from that iteration of the ForEach-Object block.
cat .\file | %{
    if($_ -match "..."){
        $_ -replace "...", '...'
    }
}
To add to briantist's answer: you don't even need the loop structure. -match and -replace also work as array operators (-match then returns only the matching elements), which removes the need for the if and ForEach-Object.
(Get-Content .\file) -match "..." -replace "...","..."
(cat is an alias for Get-Content.)
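A small sketch comparing the two answers on made-up log lines (the lines and the ^ERROR pattern are purely illustrative): the ForEach-Object loop without an else branch and the array-operator form return the same filtered, rewritten lines, with no blank lines left behind.

```powershell
# Made-up log lines standing in for (Get-Content .\file)
$lines = @("INFO start", "ERROR disk full", "INFO done", "ERROR timeout")

# Loop form without an else branch: non-matching lines produce no output at all
$viaLoop = $lines | ForEach-Object {
    if ($_ -match '^ERROR') { $_ -replace '^ERROR\s+', 'E: ' }
}

# Array-operator form: -match filters the array, -replace rewrites what is left
$viaOperators = $lines -match '^ERROR' -replace '^ERROR\s+', 'E: '

$viaOperators
```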

Powershell replace exact string

I want to replace a simple string "WEEK." (with a dot) in a text file with the string "TEST"
$LOG= "C:\FILE.TXT"
$A= "TEST"
(Get-Content $LOG) | Foreach { $_ -Replace "WEEK.", $A } | Set-Content $LOG;
The problem is that my file has this content:
WEEK_A WEEK.
And when I run my script the result is:
TESTA TEST
and the result that i want is:
WEEK_A TEST
I tried "^WEEK." and "^WEEK.$", but neither worked.
Can you help me with the regexp? Thanks
====== EDIT ==================
OK, I tried:
$LOG= "C:\FILE.TXT"
$A= "TEST"
(Get-Content $LOG) | Foreach { $_ -Replace "WEEK\.", $A } | Set-Content $LOG;
and it seems to work.
This happened because you used the pattern WEEK. and the dot was the problem: in regular expressions, the dot means "any character". That's why it replaced both "WEEK_" and "WEEK.".
When you added the backslash, the dot was escaped, i.e. it lost its special meaning, which made it work.
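The difference is easy to see side by side. This sketch runs the line from the question through the unescaped pattern, the hand-escaped one, [regex]::Escape, and .NET's String.Replace method, which does no regex matching at all. Note that String.Replace is case-sensitive, whereas -replace is case-insensitive by default.

```powershell
$line = "WEEK_A WEEK."
$A = "TEST"

$wrong = $line -replace "WEEK.", $A                    # '.' = any char, so WEEK_ matches too
$right = $line -replace "WEEK\.", $A                   # escaped dot = literal '.'
$auto  = $line -replace [regex]::Escape("WEEK."), $A   # let .NET add the escaping
$plain = $line.Replace("WEEK.", $A)                    # String.Replace: literal, no regex

$wrong, $right, $auto, $plain
```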

Expansion of a variable in a regex pattern doesn't work

As a novice in PowerShell coding, I have some difficulties with expanding a variable in PowerShell regex patterns.
What I wanted to do is:
Scan for logfiles that have been changed between two timeframes
For each of the logfiles, I get part of the name which indicates the date it is referencing to.
That date is stored in the variable $filedate.
Then go through each line of the log files
Whenever I find a line that looks like:
14:00:15 blablabla
In a file named blabla20130620.log
I want the data line to become
2013-06-20 14:00:15 blablabla
It should write the output in append mode to a text file (to concatenate different log files)
Here is what I have so far (I'm testing in a sandbox, so no comments etc.):
$Logpath = "o:\Log"
$prevcheck="2013-06-24 19:27:14"
$currenttd="{0:yyyy-MM-dd HH:mm:ss}" -f (get-date)
$batch = 1000
[regex]$match_regex = '^([01]\d|2[0-3]):([0-5]\d):([0-5]\d)'
If (Test-Path "$Logpath\test.txt"){
    Remove-Item "$Logpath\test.txt"
}
$files=Get-ChildItem $LogPath\*.log | Where-Object { $_.LastWriteTime -ge "$prevcheck" -and $_.LastWriteTime -le "$currenttd" -and !$_.PSIsContainer }
foreach ($file in $files)
{
    $filedate=$file.Name.Substring(6,4) + "-" + $file.Name.Substring(10,2) + "-" + $file.Name.Substring(12,2)
    ## This doesn't seem to work fine
    ## results look like:
    ## "$filedate" 14:00:15 blablabla
    $replace_regex = '"$filedate" $_'
    ## I tried this too, but without success
    ## The time seems to disappear now
    ## results look like:
    ## 2013-06-20 blablabla
    #$replace_regex = iex('$filedate' + $_)
    (Get-Content $file.PSPath -ReadCount $batch) |
        foreach-object {if ($_ -match $match_regex) { $_ -replace $match_regex, $replace_regex} else { $_ }}|
        out-file -Append "o:\log\test.txt"
}
You're over-complicating things.
You're comparing dates in your Where-Object filter, so you don't need to transform your reference dates to strings. Just use dates:
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
You can use a regular expression to extract the date from the file name and transform it into the desired format:
$filedate = $file.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Your regular expression for matching the time is more precise than it needs to be. Use ^(\d{2}:\d{2}:\d{2}) instead. It's a little sloppier, but it will most likely suffice and is a lot easier on the eye.
To prepend the date to the matched time, use "$filedate `$1". The double quotes cause $filedate to be expanded to the date from the file name, while the escaped $ (`$1) survives the string expansion and is used by -replace to insert the captured time (see Richard's explanation).
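A minimal sketch of the two replacements on their own, using the file name and log line from the question, so the quoting can be checked without any log files.

```powershell
# BaseName of blabla20130620.log from the question
$fileBaseName = "blabla20130620"
$filedate = $fileBaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'

# Double quotes expand $filedate; the backquote keeps $1 for the regex engine
$replacement = "$filedate `$1"
$result = "14:00:15 blablabla" -replace '^(\d{2}:\d{2}:\d{2})', $replacement

$replacement
$result
```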
While you can assign the results from each step to variables, it'd be simpler to just use a single pipeline.
Try this:
$Logpath = "o:\Log"
$Logfile = "$Logpath\test.txt"
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
If (Test-Path -LiteralPath $Logfile) { Remove-Item $Logfile }
Get-ChildItem "$LogPath\*.log" | ? {
    -not $_.PSIsContainer -and
    $_.LastWriteTime -ge $prevcheck -and
    $_.LastWriteTime -le $currenttd
} | % {
    $filedate = $_.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
    # pass the full path explicitly rather than relying on the FileInfo's string form
    Get-Content -LiteralPath $_.FullName | % {
        $_ -replace '^(\d{2}:\d{2}:\d{2})', "$filedate `$1"
    } | Out-File -Append $Logfile
}
In PowerShell, strings have to be in double quotes (") for variable substitution. Single-quoted (') strings do not perform variable substitution.
In your script (in which I suggest you indent the content of code blocks to make the structure easier to follow):
$replace_regex = '"$filedate" $_'
Here the string is single-quoted, so there is no variable substitution. This can be fixed by using the backquote (`) character to escape the double quotes embedded in a double-quoted string:
$replace_regex = "`"$filedate`" $_"
But remember:
$ is a regex metacharacter, so if you want to include a $ in a regex in double quotes (or a group reference such as $1 in a replacement string), it needs to be escaped as `$ to stop PowerShell from treating it as the start of a variable name.
Any regex meta-characters in the variable will have their regex meaning. Consider escaping the content of the variable before substitution ([regex]::Escape(string)).
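Both points in one sketch. The values are illustrative: $filedate is hard-coded, and the a.b label and sample strings are made up to show what happens when a variable containing a regex metacharacter is used in a pattern with and without [regex]::Escape.

```powershell
$filedate = "2013-06-20"

$single = '$filedate'        # single quotes: no substitution
$double = "$filedate"        # double quotes: substituted
$kept   = "$filedate `$1"    # backquote-escaped $ survives for the regex engine

# Variable content used inside a pattern: '.' in "a.b" would match any character
$label     = "a.b"
$unescaped = @("a.b=1", "axb=2") -match ('^' + $label)
$escaped   = @("a.b=1", "axb=2") -match ('^' + [regex]::Escape($label))

$single, $double, $kept
$unescaped.Count   # both lines match
$escaped.Count     # only "a.b=1" matches
```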