Extract certain values from string in .txt files with PowerShell - regex

I'm trying to extract certain values from multiple lines inside a .txt file with PowerShell. I'm currently using multiple Replace and Remove calls, but it doesn't work as expected and is more complex than it should be.
Is there a simpler way to do this?
My script:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$result1 = $file.replace(" <<< [NAK]#99","")
$result2 = $result1.remove(0,3) #this only works for the first line for some reason...
$result3 = $result2.replace("\(([^\)]+)\)", "") #this should remove the string within parentheses but doesn't work
}
.txt file:
29 09:10:16.874 (0133563471) <<< [NAK]#99[CAR]0998006798[CAR]
29 09:10:57.048 (0133603644) <<< [NAK]#99[CAR]0998019022[CAR]
29 09:59:56.276 (0136542798) <<< [NAK]#99[CAR]0998016987[CAR]
29 10:05:36.728 (0136883233) <<< [NAK]#99[CAR]0998050310[CAR]
29 10:55:36.792 (0139883179) <<< [NAK]#99[CAR]099805241D[CAR]0998028452[CAR]
29 11:32:16.737 (0142083132) <<< [NAK]#99[CAR]0998050289[CAR]0998031483[CAR]
29 11:34:16.170 (0142202566) <<< [NAK]#99[CAR]0998034787[CAR]
29 12:01:56.317 (0143862644) <<< [NAK]#99[CAR]0998005147[CAR]
The output I expect:
09:10:16.874 [CAR]0998006798[CAR]
09:10:57.048 [CAR]0998019022[CAR]
09:59:56.276 [CAR]0998016987[CAR]
10:05:36.728 [CAR]0998050310[CAR]
10:55:36.792 [CAR]099805241D[CAR]0998028452[CAR]
11:32:16.737 [CAR]0998050289[CAR]0998031483[CAR]
11:34:16.170 [CAR]0998034787[CAR]
12:01:56.317 [CAR]0998005147[CAR]

Or, more simply:
$Array = @()
foreach ($line in $file)
{
$Array += $line -replace '^..\s' -replace '\s\(.*\)' -replace '<<<.*#\d+'
}
$Array

Another option is to just grab the parts of a line you need with one regex and concat them:
$input_path = 'c:\data\in.txt'
$output_file = 'c:\data\out.txt'
$regex = '(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)'
Select-String -Path $input_path -Pattern $regex -AllMatches |
ForEach-Object { $_.Matches } |
ForEach-Object { [string]::Format("{0} {1}", $_.Groups[1].Value, $_.Groups[2].Value) } > $output_file
The regex is
(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)
Details:
(\d+(?::\d+)+\.\d+) - Group 1: one or more digits followed by one or more sequences of : and one or more digits, then . and again one or more digits
.*?\[NAK]#99 - any 0+ chars other than newline as few as possible up to the first [NAK]#99 literal char sequence
(.*) - Group 2: the rest of the line
After we get all matches, the $_.Groups[1].Value concatenated with $_.Groups[2].Value yield the expected output.
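A quick way to check the pattern before running it over a whole file is to apply it to a single sample line. This is only a sketch: the variable names are illustrative, and the sample line is copied from the question's input.

```powershell
# Sample line taken from the question's input file
$line  = '29 10:55:36.792 (0139883179) <<< [NAK]#99[CAR]099805241D[CAR]0998028452[CAR]'
$regex = '(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)'

# -match fills the automatic $Matches table; entries 1 and 2 are the capture groups
if ($line -match $regex) {
    $result = '{0} {1}' -f $Matches[1], $Matches[2]
    $result
}
```

Group 1 holds the timestamp and group 2 everything after [NAK]#99, so joining them with a space gives the line in the expected output format.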

Multiple issues.
Inside the loop you reference $file rather than $line. In the last operation, you're using the String.Replace() method with a regex pattern - something that method doesn't understand - use the -replace operator instead:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$line = $line.Replace(" <<< [NAK]#99","")
$line = $line.Remove(0,3)
# now use the -replace operator and output the result
$line -replace "\(([^\)]+)\)",""
}
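You could combine the two replacements into a single regular expression (the leading day number still has to be removed separately, for example with Remove(0,3)):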
$line -replace '\(\d{10}\)\ <<<\s+\[NAK]\#99',''
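To make the difference between the two approaches concrete, here is a small sketch run against one sample line from the question. The last pattern is my own assumption about the line layout (day number, timestamp, parenthesized counter, then <<< [NAK]#99), not something taken from the answers above.

```powershell
$line = '29 09:10:16.874 (0133563471) <<< [NAK]#99[CAR]0998006798[CAR]'

# String.Replace() searches for the literal text, so a regex pattern matches nothing
$viaMethod = $line.Replace('\(([^\)]+)\)', '')

# The -replace operator interprets the same text as a regular expression
$viaOperator = $line -replace '\(([^\)]+)\)', ''

# Hypothetical single-pass variant: keep only the timestamp (group 1) from the prefix
$clean = $line -replace '^\d+\s+(\S+)\s+\(\d+\)\s+<<<\s+\[NAK\]#99', '$1 '

$viaMethod -ceq $line   # True: the method call changed nothing
$clean
```

The single-pass variant is stricter than chained replacements: a line that does not follow the assumed layout is returned unchanged rather than partially trimmed.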

Get-Content using Regex on the filename + content

I want to print the content of a file (Get-Content), with the twist that I'm also locating the file (Get-ChildItem) by matching its name against a regex.
-File example:
ITOPS_Log [2022-06-18].txt
-File content: QQQ-9999999-QQQ
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
$content = (Get-Content $file.FullName)
ForEach ($line in $content) {
Write-Output "$line"
}
}
Output:
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 22/06/2022 16:11 64 ITOPS_Log [2022-06-18].txt
Get-Content : An object at the specified path C:\Users\Eddy\Desktop\ITOPS_Log [2022-06-18].txt does not exist, or has been filtered by the -Include or -Exclude parameter.
At C:\Users\Eddy\Desktop\Itops_Log_test_V2.ps1:15 char:25
+ $content = (Get-Content $file.FullName)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (System.String[]:String[]) [Get-Content], Exception
+ FullyQualifiedErrorId : ItemNotFound,Microsoft.PowerShell.Commands.GetContentCommand
I know I'm not yet using a regex on the content to extract only the number. But I'm already stuck without any filtering, and I need to solve this first before applying a regex to match the digits.
How can I print the content of the file with this code? I don't understand what I'm missing.
Continuing from my comment: the -Path parameter on Get-ChildItem and Get-Content tries to resolve wildcard characters, and because your file name contains square brackets, it treats them as a range of characters or numbers.
To avoid that, use -LiteralPath instead so nothing in the path gets interpreted.
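The following sketch reproduces the situation with a throwaway file in the temp folder. The file name and content are taken from the question; the temp location and variable names are assumptions for the demo.

```powershell
# Create a scratch file whose name contains square brackets
$name = 'ITOPS_Log [2022-06-18].txt'
$path = [System.IO.Path]::Combine([System.IO.Path]::GetTempPath(), $name)
Set-Content -LiteralPath $path -Value 'QQQ-9999999-QQQ'

# PowerShell considers [ and ] wildcard characters, which is why -Path misreads the name
$hasWildcards = [WildcardPattern]::ContainsWildcardCharacters($name)

# -LiteralPath takes the path exactly as written
$content = Get-Content -LiteralPath $path
$number  = [regex]::Match($content, '\d+').Value

$hasWildcards
$number

# Clean up the scratch file
Remove-Item -LiteralPath $path
```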
Then to test if the file has something resembling a date inside those square brackets, I would use an anchored regex on the file's BaseName property:
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem -LiteralPath $folder -Filter 'ITOPS_Log*.txt' -File |
Where-Object { $_.BaseName -match '\[\d{4}-\d{2}-\d{2}\]$' }
# show the found files on screen
$valid_files
#Read the file and print content.
foreach ($file in $valid_files) {
$content = (Get-Content -LiteralPath $file.FullName)
foreach ($line in $content) {
Write-Host $line
# or just the number?
Write-Host ([regex]'(\d+)').Match($line).Groups[1].Value
}
}
Regex details on the file's BaseName (--> File Name without extension):
\[ Match the character “[” literally
\d Match a single digit 0..9
{4} Exactly 4 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
\] Match the character “]” literally
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
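As a quick check of the anchored pattern, it can be run against a few candidate base names. The second and third names below are made up for the demo.

```powershell
$pattern = '\[\d{4}-\d{2}-\d{2}\]$'

# Only the first name ends with a bracketed yyyy-MM-dd style date
$names = 'ITOPS_Log [2022-06-18]', 'ITOPS_Log [2022-06-18] copy', 'ITOPS_Log 2022-06-18'
$hits  = @($names | Where-Object { $_ -match $pattern })
$hits
```

Because of the $ anchor, a name with anything after the closing bracket is rejected, which is why the pattern is tested against BaseName rather than the full Name with its extension.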
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
$content = (Get-Content -LiteralPath $file.FullName)
ForEach ($line in $content) {
Write-Output "$line"
}
}

Negative Lookbehind Works in Editor But Not in Powershell Script

I am attempting to replace every space in a string with comma-space, while not adding a second comma where one is already present in the string.
Test string:
'186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student'
Using the following code:
Get-Content -Path $filePath |
ForEach-Object {
$match = ($_ | Select-String $regexPlus).Matches.Value
$c = ($_ | Get-Content)
$c = $c -replace $match,', '
$c
}
The output is:
'186, ATKINS,, Cindy, Maria, 25, Every, Street, Smalltown,, Student'
My $regexPlus value is:
$regexPlus = '(?s)(?<!,)\s'
I have tested the negative lookbehind assertion in my editor and it works. Why does it not work in this PowerShell script? The regex101 online editor produces this curious mention of case sensitivity:
Negative Lookbehind (?<!,)
Assert that the Regex below does not match
, matches the character , with index 44 (decimal; 2C in hex or 54 in octal) literally (case sensitive)
I have tried editing to:
$match = ($_ | Select-String $regexPlus -CaseSensitive).Matches.Value
But it is still not working. Any ideas are welcome.
Part of the problem here is that you are running the regex yourself and then feeding its result into the replacement, when, as @WiktorStribiżew mentions, you can simply use -replace the way it is meant to be used: -replace does all the hard work for you.
When you do this:
$match = ($_ | Select-String $regexPlus).Matches.Value
You are right, you are trying to find Regex matches. Congratulations! It found a space character, but when you do this:
$c = $c -replace $match,', '
It interprets $match as a space character like this:
$c = $c -replace ' ',', '
And not as a regular expression that you might have been expecting. That's why it's not seeing the negative lookbehind for the commas, because all it is searching for are spaces, and it is dutifully replacing all the spaces with comma spaces.
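A short sketch makes this visible by inspecting what ends up in $match. It uses the test string and pattern from the question; the bracketed output is only there to make the space visible.

```powershell
$str       = '186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student'
$regexPlus = '(?s)(?<!,)\s'

# .Matches.Value is the text that was matched (one space), not the pattern itself
$match = ($str | Select-String $regexPlus).Matches.Value
"[$match]"

# Replacing that matched text means replacing every space, commas or not
$viaMatch = $str -replace $match, ', '

# Passing the pattern keeps the lookbehind in play
$viaPattern = $str -replace $regexPlus, ', '

$viaMatch
$viaPattern
```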
The solution is simply to pass the regex itself as the pattern of the -replace operator:
$regexPlus = '(?s)(?<!,)\s'
$c = $c -replace $regexPlus,', '
e.g. The negative lookbehind working as advertised:
PS C:\> $str = '186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student'
PS C:\> $regexPlus = '(?s)(?<!,)\s'
PS C:\> $str -replace $regexPlus,', '
186, ATKINS, Cindy, Maria, 25, Every, Street, Smalltown, Student
You can use
(Get-Content -Path $filePath) -replace ',*\s+', ', '
This replaces zero or more commas followed by one or more whitespace characters with a single comma + space.
More details:
,* - zero or more commas
\s+ - one or more whitespace chars.
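The two patterns behave differently once the input contains runs of whitespace. The sample string below is made up (note the double space after Every) to show the difference.

```powershell
# Hypothetical input with two spaces between the first two words
$str = 'Every  Street, Smalltown'

# The lookbehind replaces each whitespace character separately
$perChar = $str -replace '(?<!,)\s', ', '

# ,*\s+ consumes optional commas plus the whole run of whitespace in one match
$perRun = $str -replace ',*\s+', ', '

$perChar
$perRun
```

If the data can contain tabs or repeated spaces, the second form is probably the safer choice.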

Powershell - Find and Replace then Save

I need to read 10K+ files, search the files line by line, for the string of characters after the word SUFFIX. Once I capture that string I need to remove all traces of it from the file then re-save the file.
With the example below - I would capture -4541. Then I would replace all occurrences of -4541 with NULL.
Once I replace all the occurrences I then save the changes.
Here is my Data:
ABSDOMN VER 1 D SUFFIX -4541
05 ST-CTY-CDE-FMHA-4541
10 ST-CDE-FMHA-4541 9(2)
10 CTY-CDE-FMHA-4541 9(3)
05 NME-CTY-4541 X(20)
05 LST-UPDTE-DTE-4541 9(06)
05 FILLER X
Here is a starting script. I can display the line that contains the word SUFFIX, but I cannot capture the string after it (in this case -4541).
$CBLFileList = Get-ChildItem -Path "C:\IDMS" -File -Recurse
$regex = "\bSUFFIX\b"
$treat = $false
ForEach($CBLFile in $CBLFileList) {
Write-Host "Processing .... $CBLFile" -foregroundcolor green
Get-content -Path $CBLFile.FullName |
ForEach-Object {
if ($_ -match $regex) {
Write-Host "Found Match - $_" -foregroundcolor green
$treat=$true
}
}
Try the following:
Note: Be sure to make backup copies of the input files first, as they will be updated in place. Use -Encoding with Set-Content to specify the desired encoding, if it should be different from Set-Content's default.
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '(?<=SUFFIX) -\d+'
ForEach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -cmatch $regex) {
$toRemove = $Matches[0].Trim()
& { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
Set-Content -LiteralPath $CBLFile.FullName
}
}
Based on your feedback, the regex that worked for you in the end was (?<=SUFFIX).*$ (which could be simplified to (?<=SUFFIX).+ in this case), i.e. one that captures whatever follows substring SUFFIX, instead of only capturing a space followed by a - and one or more digits (\d+).
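Since a pattern such as (?<=SUFFIX).+ can capture arbitrary text, it may be worth escaping the captured value before using it as a pattern for the remaining lines. The sketch below works on in-memory lines adapted from the question rather than on files; the variable names are illustrative.

```powershell
$lines = @(
    'ABSDOMN VER 1 D SUFFIX -4541'
    '05 ST-CTY-CDE-FMHA-4541'
    '05 FILLER X'
)
$firstLine, $remainingLines = $lines

if ($firstLine -cmatch '(?<=SUFFIX).+') {
    # Escape the captured text so characters like . or ( are matched literally
    $toRemove = [regex]::Escape($Matches[0].Trim())
    $cleaned  = @($firstLine -creplace '(?<=SUFFIX).+') + @($remainingLines -creplace $toRemove)
}
$cleaned
```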

Select all backslashes between two chars

I am working on a PowerShell script and I've got several text files where I need to replace backslashes in lines that match this pattern: .. >\\%name% .. < .. (.. could be anything)
Example string from one of the files where the backslashes should match:
<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
Example string from one of the files where the backslashes should not match:
<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
So far I've managed to select every char between >\\%name% and < with this expression (Regex101):
(?<=>\\\\%name%)(.*)(?=<)
but I failed to select only the backslashes.
Is there a solution which I could not yet find?
I'd recommend selecting the relevant tags with an XPath expression and then do the replacement on the text body of the selected nodes.
$xml.SelectNodes('//Tag[substring(., 1, 8) = "\\%name%"]') | ForEach-Object {
$_.'#text' = $_.'#text' -replace '\\', '\\'
}
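For completeness, here is a self-contained sketch of the XPath approach. The Root wrapper element is an assumption (the question only shows the Tag lines), starts-with() is used in place of substring(), and / is assumed as the replacement character.

```powershell
# Hypothetical document built from the two example lines in the question
[xml]$xml = '<Root><Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag><Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag></Root>'

# Select only the Tag nodes whose text begins with \\%name%
foreach ($node in $xml.SelectNodes('//Tag[starts-with(., "\\%name%")]')) {
    # '\\' is the regex for one literal backslash; every backslash in the node is replaced
    $node.InnerText = $node.InnerText -replace '\\', '/'
}

$tags = $xml.SelectNodes('//Tag')
$tags[0].InnerText
$tags[1].InnerText
```

Note that this also rewrites the two leading backslashes of the selected node, since the replacement runs over the node's whole text.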
So here's my solution:
$original_file = $Filepath
$destination_file = $Filepath + ".new"
Get-Content -Path $original_file | ForEach-Object {
$line = $_
if ($line -match '(?<=>\\\\%name%)(.*)(?=<)'){
$line = $line -replace '\\','/'
}
$line
} | Set-Content -Path $destination_file
Remove-Item $original_file
Rename-Item $destination_file.ToString() $original_file.ToString()
So this replaces every \ with a / in lines that match the given pattern, but not in the way my question asked for.
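One way to match only the backslashes themselves is to rely on the .NET regex engine's support for variable-length lookbehind, which PowerShell's -replace uses. The pattern below is a sketch under that assumption and has only been reasoned through against the two example lines from the question.

```powershell
# A backslash that has >\\%name% somewhere before it and a < somewhere after it,
# with no < in between on either side
$pattern = '(?<=>\\\\%name%[^<]*)\\(?=[^<]*<)'

$shouldChange = '<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>'
$shouldStay   = '<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>'

$changed   = $shouldChange -replace $pattern, '/'
$unchanged = $shouldStay   -replace $pattern, '/'

$changed
$unchanged
```

The two backslashes that are part of \\%name% are left alone, because the lookbehind requires the full >\\%name% prefix to sit before the backslash being matched.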

RegEx Match whole line with first occurrence from the bottom of the file, upwards

I'm trying to parse a file with error codes.
I would only like the first occurrence from the bottom of the file to be returned.
So far, I've got this regex searching for the error code numbers, and it returns the whole line with the Multiline option, but it returns all lines in the file, not just the last one.
^.*?\b(639|640|460|458|664|148)\b.*$
I'm using PowerShell, so an example in PowerShell would be great.
Thank you.
Assuming your regex is correct for matching on a line then you should be able to do something like this:
$pattern = '^.*?\b(639|640|460|458|664|148)\b.*$'
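$content = @(Get-Content c:\somefile.txt) # @() keeps a one-line file as an array
for ($i = $content.Length - 1; $i -ge 0; $i--) {
if ($content[$i] -match $pattern) {
$Matches[0] # the whole matching line; use $Matches[1] for just the error code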
break
}
}
I'd use Select-String for this:
$filename = 'C:\path\to\input.txt'
$pattern = '\b(639|640|460|458|664|148)\b'
Get-Content $filename | Select-String $pattern | Select-Object -Last 1
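The same pipeline can be tried on in-memory lines to see what comes back. The log lines below are invented for the demo; Select-String returns a match object whose Line property holds the whole line.

```powershell
# Hypothetical log content; only the second and fourth lines carry a listed code
$lines   = 'ok 100', 'error 639 disk', 'info 200', 'error 458 net', 'done 999'
$pattern = '\b(639|640|460|458|664|148)\b'

$last = $lines | Select-String -Pattern $pattern | Select-Object -Last 1

$wholeLine = $last.Line                          # full text of the last matching line
$errorCode = $last.Matches[0].Groups[1].Value    # just the code captured by group 1

$wholeLine
$errorCode
```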