PowerShell regex to grab data between second set of double quotes

I have a dataset like the one below, and I've been trying to use split to get the second column of data. Can this be done with a regex? I just need the data in the second column, the UNC paths.
"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"
"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"
"\app\DATA\8161ST1\201901\20190111\2562235\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*"

A conceptually simple solution:
# Array of input lines, such as would be returned by Get-Content
$lines = @'
"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"
"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"
"\app\DATA\8161ST1\201901\20190111\2562235\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*"
'@ -split [Environment]::NewLine
# Extract the content of the last "..."-enclosed token from each line.
$lines | ForEach-Object { ($_ -split '"')[-2] }
The above yields:
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*
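Since the question asks specifically about a regex, here is a minimal sketch of a regex-based alternative. It assumes, as in the sample data, that the wanted value is the last "..."-enclosed token on each line and that the values themselves contain no double quotes. The variable name $paths is only illustrative.

```powershell
# Two sample lines in the same shape as the question's data
$lines = @(
    '"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"'
    '"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"'
)

# Capture the content of the last quoted token: a quote, any run of
# non-quote characters, a closing quote, then optional trailing whitespace.
$paths = $lines | ForEach-Object {
    if ($_ -match '"([^"]*)"\s*$') { $Matches[1] }
}

$paths
```

Lines that contain no quoted token at the end simply produce no output, whereas the -split variant would return whatever happens to sit at index -2.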

Related

PowerShell: Regex matching with Get-Content -Raw flag results in empty results

Solution was adding (?ms) to the front of my regex query
I am trying to search for chunks of text within a file while preserving the line breaks in each chunk.
When I define my variable as $variable = Get-Content $fromfile,
my function (below) finds the text I'm looking for, but the result is difficult to parse further because the line breaks are gone.
function FindBetween($first, $second, $importing){
$pattern = "$first(.*?)$second"
$result = [regex]::Match($importing, $pattern).Groups[1].Value
return $result
}
When I define my variable as $variable = Get-Content $fromfile -Raw, the output of my query is blank. I'm able to print the variable, and it does preserve the line breaks.
I run into the same issue regardless of whether I add \r\n to the end of my pattern, use @() around my variable definition, use -Delimiter \n, or any combination of those.
Whole code is here:
param($fromfile)
$working = get-content $fromfile -raw
function FindBetween($first, $second, $importing){
$pattern = "(?ms)$first(.*?)$second"
$result = [regex]::Match($importing, $pattern).Groups[1].Value
#$result = select-string -InputObject $importing -Pattern $pattern
return $result
}
FindBetween -first "host ####" -second "#### flag 2" -importing $working | Out-File "testresult.txt"
the file I'm testing it against looks like:
#### flag 1 host ####
stuff in between
#### flag 2 server ####
#### process manager ####
As to why I'm doing this:
I'm trying to automate taking a file that has defined sections with titles and outputting the content of those separate sections into a .csv (each section is formatted drastically different from each other). These files are all uniform to each other, containing the same sections and general content.
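If you're using -Raw, you probably need to change your regex to "(?ms)$first(.*?)$second" so that . will match newlines. Strictly speaking, only the s (single-line) option is needed for that; the m (multiline) option only changes how ^ and $ behave.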
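A small sketch of the difference, using a string that mimics the test file from the question (the LF line endings and the variable names $without and $with are assumptions for illustration). Without -Raw, Get-Content returns an array of lines, and passing that array where a single string is expected joins the lines with spaces, which is presumably why the original pattern matched but the line breaks were gone.

```powershell
# A single multi-line string, as Get-Content -Raw would return it
$text = "#### flag 1 host ####`nstuff in between`n#### flag 2 server ####`n#### process manager ####"

$first  = 'host ####'
$second = '#### flag 2'

# Without (?s), "." stops at the newline, so the match fails and the group is empty
$without = [regex]::Match($text, "$first(.*?)$second").Groups[1].Value

# With (?s), "." also matches newlines, so the chunk is captured with its line breaks
$with = [regex]::Match($text, "(?s)$first(.*?)$second").Groups[1].Value

"Without (?s): [$without]"
"With (?s):    [$with]"
```

If $first or $second could contain regex metacharacters, wrapping them in [regex]::Escape() before building the pattern would be a sensible precaution.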

Parsing Data in powershell, with the format of Label:Data

I am doing an Invoke-WebRequest in PowerShell to a URL that does not contain any HTML, just text. I need to pick out a specific part of this data, which is in the format Label:Data. Each piece of data is on its own separate line. I'm looking for some ideas on how to accomplish this. Here is a sample of the $Response.Content data below. I am looking to isolate the speed-over-ground:0.0
rate-of-turn:0.0
course-over-ground:293.0
speed-over-ground:0.0
heading-true:243.0
hdop:1.0
active-waypoint-name:
bearing-to-waypoint:
distance-to-waypoint:
cross-track-error:0
cross-track-error-limit:
cross-track-error-scale:0
lateral-speed-bow:0.09
lateral-speed-stern:-0.05
longitudinal-speed:-0.05
I guess it's a single string, rather than an array of lines. So, split it into lines:
$Response.Content -split "`r?`n"
Find the one which says speed-over-ground
$line = $Response.Content -split "`r?`n" | Where-Object { $_ -match 'speed-over-ground' }
Split the text from the number, using the : separator, and take the second item, converted from text to a number if appropriate:
[decimal]$speedOverGround = $line.Split(':')[1]
Although, I might try to turn all of them into an object in a bulk transform. Complexity varies with the exact possible inputs, but this tries to convert numbers to numbers and leave empty ones as nulls:
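$data = New-Object -TypeName PSCustomObject
$Response.Content -split "`r?`n" |
Where-Object { $_ -match ':' } |    # skip blank or malformed lines
ForEach-Object {
    # Split on the first colon only, then trim both parts
    $name, $value = ($_ -split ':', 2).Trim()
    $decimalValue = 0
    if ($value -eq '') {
        $value = $null              # an empty value becomes a real null
    }
    elseif ([decimal]::TryParse($value, [ref]$decimalValue)) {
        $value = $decimalValue
    }
    $data | Add-Member -NotePropertyName $name -NotePropertyValue $value
}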
# Then you can do:
$data.'speed-over-ground'
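If only the one value is needed, a single regex match against the whole string may be enough. The following is a sketch under the assumption that the label appears at most once and that its value is numeric; $content stands in for $Response.Content, and $speed is an illustrative name.

```powershell
# Stand-in for $Response.Content (a shortened copy of the sample data)
$content = "rate-of-turn:0.0`r`ncourse-over-ground:293.0`r`nspeed-over-ground:0.0`r`nheading-true:243.0"

# (?m) lets ^ and $ match at line boundaries; "." never crosses a newline here,
# and \r? absorbs the carriage return of CRLF line endings.
if ($content -match '(?m)^speed-over-ground:(.*?)\r?$') {
    $speed = [decimal]$Matches[1]
}

$speed
```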

replace thousands separators in csv with regex

I'm running into problems trying to pull the thousands separators out of some currency values in a set of files. The "bad" values are delimited with commas and double quotes. There are other values in there that are < $1000 that present no issue.
Example of existing file:
"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"
Example of desired file (thousands separators removed):
"12345.67",12.34,"123456.78",1.00,"123456789.12"
I found a regex expression for matching the numbers with separators that works great, but I'm having trouble with the -replace operator. The replacement value is confusing me. I read about $& and I'm wondering if I should use that here. I tried $_, but that pulls out ALL my commas. Do I have to use $matches somehow?
Here's my code:
$Files = Get-ChildItem *input.csv
foreach ($file in $Files)
{
$file |
Get-Content | #assume that I can't use -raw
% {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
out-file output.csv -append -encoding ascii
}
Tony Hinkle's comment is the answer: don't use regex for this (at least not directly on the CSV file).
Your CSV is valid, so you should parse it as such, work on the objects (change the text if you want), then write a new CSV.
Import-Csv -Path .\my.csv | ForEach-Object {
$_ | ForEach-Object {
$_ -replace ',',''
}
} | Export-Csv -Path .\my_new.csv
(This code needs work, specifically the inner loop: each row exposes its columns as properties, not as an array. A more complete version of your CSV would make that easier to demonstrate.)
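One way the middle part could be filled in is sketched below. It assumes the file has no header row, as in the sample, so the column names c1 to c5 are hypothetical placeholders, and it uses ConvertFrom-Csv / ConvertTo-Csv on an in-memory line instead of files to stay self-contained.

```powershell
# One sample line from the question; in practice this would come from Import-Csv
$inputLine = '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'

# Hypothetical column names, needed because the sample has no header row
$headers = 'c1', 'c2', 'c3', 'c4', 'c5'

$rows = @($inputLine | ConvertFrom-Csv -Header $headers)

foreach ($row in $rows) {
    # Each column is a property of the row object, so walk the properties
    foreach ($prop in $row.PSObject.Properties) {
        $prop.Value = $prop.Value -replace ',', ''
    }
}

$csvOut = $rows | ConvertTo-Csv -NoTypeInformation
$csvOut
```

Note that ConvertTo-Csv quotes every field and emits a header line by default, so the output is not byte-for-byte the desired file, although the values are the same.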
You can try with this regex:
,(?=(\d{3},?)+(?:\.\d{1,3})?")
See Live Demo or in powershell:
% {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' }
But this is more of a regex exercise. For real work, use @briantist's answer, which is the clean way to do this.
I would use a simpler regex and use capture groups instead of the entire match.
I have tested the following regular expression with your input and found no issues.
% {$_ -replace '([\d]),([\d])','$1$2' }
That is, find every comma with a digit directly before and after it (so that the mix of quoted and unquoted values doesn't matter) and drop the comma.
This would break if your input ever has two unquoted numeric fields next to each other (for example 12.34,1.00), because the delimiter between them would be removed as well.
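On the original question about $&: the expression ("$&" -replace ',','') is evaluated by PowerShell before -replace ever runs, so it cannot see the matched text. If a regex is used after all, one option is to pass a script block as the match evaluator to [regex]::Replace, so that the inner replacement runs once per match. A sketch, reusing the pattern from the question (the name $clean is illustrative):

```powershell
$line = '"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"'

# Quoted numbers with thousands separators, as in the question
$pattern = '"\d{1,3}(,\d{3})*(\.\d+)?"'

# The script block is called for every match; $m is the Match object
$clean = [regex]::Replace($line, $pattern, { param($m) $m.Value -replace ',', '' })

$clean
```

Because only the quoted values are matched, the commas that delimit fields are left alone.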

Expansion of a variable in a regex pattern doesn't work

As a novice in powershell coding, I have some difficulties with expansion of a variable in PowerShell regex patterns.
What I wanted to do is:
Scan for logfiles that have been changed between two timeframes
For each of the logfiles, I take the part of the name that indicates the date it refers to.
That date is stored in the variable $filedate.
Then go through each line of the logfiles
Whenever I find a line that looks like:
14:00:15 blablabla
In a file named blabla20130620.log
I want that the data line becomes
2013-06-20 14:00:15 blablabla
It should write the output in append mode to a text file (to concatenate different log files)
Here is what I have so far (I'm testing in a sandbox for now, so there are no comments, etc.)
$Logpath = "o:\Log"
$prevcheck="2013-06-24 19:27:14"
$currenttd="{0:yyyy-MM-dd HH:mm:ss}" -f (get-date)
$batch = 1000
[regex]$match_regex = '^([01]\d|2[0-3]):([0-5]\d):([0-5]\d)'
If (Test-Path "$Logpath\test.txt"){
Remove-Item "$Logpath\test.txt"
}
$files=Get-ChildItem $LogPath\*.log | Where-Object { $_.LastWriteTime -ge "$prevcheck" -and $_.LastWriteTime -le "$currenttd" -and !$_.PSIsContainer }
foreach ($file in $files)
{
$filedate=$file.Name.Substring(6,4) + "-" + $file.Name.Substring(10,2) + "-" + $file.Name.Substring(12,2)
## This doesn't seem to work fine
## results look like:
## "$filedate" 14:00:15 blablabla
$replace_regex = '"$filedate" $_'
## I tried this too, but without success
## The time seems to disappear now
## results look like:
## 2013-06-20 blablabla
#$replace_regex = iex('$filedate' + $_)
(Get-Content $file.PSPath -ReadCount $batch) |
foreach-object {if ($_ -match $match_regex) { $_ -replace $match_regex, $replace_regex} else { $_ }}|
out-file -Append "o:\log\test.txt"
}
You're over-complicating things.
You're comparing dates in your Where-Object filter, so you don't need to transform your reference dates to strings. Just use dates:
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
You can use a regular expression to extract the date from the file name and transform it into the desired format:
$filedate = $file.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Your regular expression for matching the time is overly correct. Use ^(\d{2}:\d{2}:\d{2}) instead. It's a little sloppier, but it will most likely suffice and is a lot easier on the eye.
To prepend the date to the time match, use "$filedate `$1". The double quotes cause $filedate to be expanded to the date from the file name, and the escaped $ (`$1) keeps the reference to the captured group (see Richard's explanation).
While you can assign the results from each step to variables, it'd be simpler to just use a single pipeline.
Try this:
$Logpath = "o:\Log"
$Logfile = "$Logpath\test.txt"
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
If (Test-Path -LiteralPath $Logfile) { Remove-Item $Logfile }
Get-ChildItem "$LogPath\*.log" | ? {
-not $_.PSIsContainer -and
$_.LastWriteTime -ge $prevcheck -and
$_.LastWriteTime -le $currenttd
} | % {
$filedate = $_.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Get-Content $_ | % {
$_ -replace '^(\d{2}:\d{2}:\d{2})', "$filedate `$1"
} | Out-File -Append $Logfile
}
In PowerShell, strings have to be in double quotes (") for variable substitution. Single-quoted (') strings do not perform variable substitution.
In your script (in which I suggest you indent the content of code blocks to make the structure easier to follow):
$replace_regex = '"$filedate" $_'
the string is single quoted, so no variable substitution takes place. This can be fixed by remembering that the backtick (`) character can be used to escape double quotes embedded in a double-quoted string:
$replace_regex = "`"$filedate`" `$_"
But remember:
$ also introduces substitutions in a regex replacement string (such as $1 or $_), so a $ meant for the regex engine must be escaped with a backtick inside double quotes (as in `$_ above) to keep PSH from treating it as the start of a variable name.
Any regex meta-characters in the variable will have their regex meaning when the variable is expanded into a pattern. Consider escaping the content of the variable before substitution ([regex]::Escape(string)).
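The quoting rules above can be checked with a short sketch. The sample values and the variable names ($expanded, $literal, $unescaped, $escaped) are made up for illustration.

```powershell
$filedate = '2013-06-20'
$line     = '14:00:15 blablabla'

# Double quotes: PowerShell expands $filedate, the backtick keeps $1 for the regex engine
$expanded = $line -replace '^(\d{2}:\d{2}:\d{2})', "$filedate `$1"

# Single quotes: nothing is expanded by PowerShell, and the regex engine
# does not know "$filedate", so that text stays in the output literally
$literal = $line -replace '^(\d{2}:\d{2}:\d{2})', '$filedate $1'

# A variable used inside a pattern keeps its regex meaning unless escaped
$name      = 'a.b'
$unescaped = 'axb a.b' -replace $name, 'X'                   # "." matches any character
$escaped   = 'axb a.b' -replace [regex]::Escape($name), 'X'  # "." matches a literal dot

$expanded, $literal, $unescaped, $escaped
```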

Powershell: Leave item alone if regex doesn't match

I have a list of pdf files (from daily processing), some with date stamps of various formatting, some without.
Example:
$f = @("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")
I am trying to remove the datestamp by stripping out dashes, then replacing any consecutive group of numbers (4 or more, I may change this to 6) with the string "DATESTAMP".
So far I have this:
$d = $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d
The output:
testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf
It works fine if the file has a datestamp but it seems to be freaking out the -replace and inserting DATESTAMP after every character. Is there a way to fix this? I tried to change it to a foreach loop but I couldn't figure out how to get true/false from regex.
Thanks in advance.
You can simply do:
PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf
$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"
means: in $_, replace every occurrence of the result of ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP".
In a filename with no timestamp (or fewer than 4 consecutive digits) there is no match, so the pattern ends up being "" (an empty string).
Thus every empty string gets replaced with DATESTAMP, and such an empty string "" sits at the start of the string and after every character.
That's why you get this long string with every character surrounded by DATESTAMP.
To check whether a \d{4,} match exists in your string at all, you should be able to use
[regex]::IsMatch($_, "\d{4,}")
I'm no PowerShell user, but the line below should do the job on its own. I'm not sure whether the if can be used in a pipeline, or whether the assignment and the echo $d are needed.
$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}
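A short sketch of both points: an empty pattern matches at every position, and -replace returns the input unchanged when the pattern does not match, so no explicit check is needed. The names $demo and $d are illustrative; the file names are the ones from the question.

```powershell
# An empty pattern matches before every character and at the end of the string
$demo = 'abc' -replace '', '-'

$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# Strip the dashes first, then replace any run of 4 or more digits.
# Names without such a run pass through unchanged.
$d = $f | ForEach-Object { ($_ -replace '-', '') -replace '\d{4,}', 'DATESTAMP' }

$demo
$d
```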