Get-Content using Regex on the filename + content - regex

I want to print the ouput of a file matching a regex (Get-Content), with the concern that I'm looking (Get-ChildItem) the File using Regex too.
-File example:
ITOPS_Log [2022-06-18].txt
-File content:QQQ-9999999-QQQ
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
$content = (Get-Content $file.FullName)
ForEach ($line in $content) {
Write-Output "$line"
}
}
Output:
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 22/06/2022 16:11 64 ITOPS_Log [2022-06-18].txt
Get-Content : An object at the specified path C:\Users\Eddy\Desktop\ITOPS_Log [2022-06-18].txt does not exist, or has been filtered by the -Include or -Exclude parameter.
At C:\Users\Eddy\Desktop\Itops_Log_test_V2.ps1:15 char:25
+ $content = (Get-Content $file.FullName)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (System.String[]:String[]) [Get-Content], Exception
+ FullyQualifiedErrorId : ItemNotFound,Microsoft.PowerShell.Commands.GetContentCommand
I know I'm not using Regex to match with the content if only want the number. But I got stuck already without filters and I need to solve this first and then apply regex to match the digits.
How can I print the ouput of the File with this code? I don't understand what I'm missing.

Continuing from my comment, The -Path parameter on Get-ChildItem and Get-Content tries to resolve wildcard characters and because your file has square brackets, it sees that as a range of characters or numbers.
To avoid that, use -LiteralPath instead so nothing in the path gets interpreted.
Then to test if the file has something resembling a date inside those square brackets, I would use an anchored regex on the file's BaseName property:
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem -LiteralPath $folder -Filter 'ITOPS_Log*.txt' -File |
Where-Object { $_.BaseName -match '\[\d{4}-\d{2}-\d{2}\]$' }
# show the found files on screen
$valid_files
#Read the file and print content.
foreach ($file in $valid_files) {
$content = (Get-Content -LiteralPath $file.FullName)
foreach ($line in $content) {
Write-Host $line
# or just the number?
Write-Host ([regex]'(\d+)').Match($line).Groups[1].Value
}
}
Regex details on the file's BaseName (--> File Name without extension):
\[ Match the character “[” literally
\d Match a single digit 0..9
{4} Exactly 4 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
\] Match the character “]” literally
$ Assert position at the end of the string (or before the line break at the end of the string, if any)

#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
$content = (Get-Content -LiteralPath $folder\$file)
ForEach ($line in $content) {
Write-Output "$line"
}
}

Related

Powershell - Find and Replace then Save

I need to read 10K+ files, search the files line by line, for the string of characters after the word SUFFIX. Once I capture that string I need to remove all traces of it from the file then re-save the file.
With the example below - I would capture -4541. Then I would replace all occurrences of -4541 with NULL.
Once I replace all the occurrences I then save the changes.
Here is my Data:
ABSDOMN VER 1 D SUFFIX -4541
05 ST-CTY-CDE-FMHA-4541
10 ST-CDE-FMHA-4541 9(2)
10 CTY-CDE-FMHA-4541 9(3)
05 NME-CTY-4541 X(20)
05 LST-UPDTE-DTE-4541 9(06)
05 FILLER X
Here is a starting script. I can Display the line that has the word SUFFIX but I cannot capture the string after it. In this case -4541.
$CBLFileList = Get-ChildItem -Path "C:\IDMS" -File -Recurse
$regex = "\bSUFFIX\b"
$treat = $false
ForEach($CBLFile in $CBLFileList) {
Write-Host "Processing .... $CBLFile" -foregroundcolor green
Get-content -Path $CBLFile.FullName |
ForEach-Object {
if ($_ -match $regex) {
Write-Host "Found Match - $_" -foregroundcolor green
$treat=$true
}
}
Try the following:
Note: Be sure to make backup copies of the input files first, as they will be updated in place. Use -Encoding with Set-Content to specify the desired encoding, if it should be different from Set-Content's default.
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '(?<=SUFFIX) -\d+'
ForEach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -cmatch $regex) {
$toRemove = $Matches[0].Trim()
& { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
Set-Content -LiteralPath $CBLFile.FullName
}
}
Based on your feedback, the regex that worked for you in the end was (?<=SUFFIX).*$ (which could be simplified to (?<=SUFFIX).+ in this case), i.e. one that captures whatever follows substring SUFFIX, instead of only capturing a space followed by a - and one or more digits (\d+).

Replacing any content inbetween second and third underscore

I have a PowerShell Scriptline that replaces(deletes) characters between the second and third underscore with an "_":
get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}
Examples:
12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf
This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything inbetween I have used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work) but the output is:
12345_09_MoreText.pdf
The desired output would be:
12345_00001_09_2018_Text_MoreText.pdf
12345_00002_09_2018_Text_MoreText.pdf
12345_00003_09_2018_Text_MoreText.pdf
How do I correctly replace the second and third underscore and everything inbetween with an "_"?
If you don't want to use regex -
$files = get-childitem *.pdf #get all pdf files
$ModifiedFiles, $New = #() #declaring two arrays
foreach($file in $files)
{
$ModifiedFiles = $file.split("_")
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #ommitting anything between second and third underscore
$New = "$ModifiedFiles" -replace (" ", "_")
Rename-Item -Path $file.FullName -NewName $New
}
Sample Data -
$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
$ModifiedFiles, $New = #() #declaring two arrays
foreach($file in $files)
{
$ModifiedFiles = $file.split("_")
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #ommitting anything between second and third underscore
$New = "$ModifiedFiles" -replace (" ", "_")
}
You may use
-replace '^((?:[^_]*_){2})[^_]+_', '$1'
See the regex demo
Details
^ - start of the line
((?:[^_]*_){2}) - Group 1 (the value will be referenced to with $1 from the replacement pattern): two repetitions of
[^_]* - 0+ chars other than an underscore
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore
To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:
Get-ChildItem *.pdf | Rename-Item { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
$_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
-join '_' reassembles the modified array into a _-separated string, yielding the desired result.
Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:
Get-ChildItem *.pdf |
Rename-Item { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf
Wouldn't it be nice if we could write [0..1 + 3..] instead?
This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
here's one other way ... using string methods.
'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
Split('_').
Where({
$_ -notmatch ','
}) -join '_'
result = 12345_00003_09_2018_Text_MoreText.pdf
that does the following ...
split on the underscores
toss out any item that has a comma in it
join the remaining items back into a string with underscores
i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]

Select all backslashes between two chars

I am working on a powershell script and I've got several text files where I need to replace backslashes in lines which matches this pattern: .. >\\%name% .. < .. (.. could be anything)
Example string from one of the files where the backslashes should match:
<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
Example string from one of the files where the backslashes should not match:
<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
So far I've managed to select every char between >\\%name% and < with this expression (Regex101):
(?<=>\\\\%name%)(.*)(?=<)
but I failed to select only the backslashes.
Is there a solution which I could not yet find?
I'd recommend selecting the relevant tags with an XPath expression and then do the replacement on the text body of the selected nodes.
$xml.SelectNodes('//Tag[substring(., 1, 8) = "\\%name%"]' | ForEach-Object {
$_.'#text' = $_.'#text' -replace '\\', '\\'
}
So here's my solution:
$original_file = $Filepath
$destination_file = $Filepath + ".new"
Get-Content -Path $original_file | ForEach-Object {
$line = $_
if ($line -match '(?<=>\\\\%name%)(.*)(?=<)'){
$line = $line -replace '\\','/'
}
$line
} | Set-Content -Path $destination_file
Remove-Item $original_file
Rename-Item $destination_file.ToString() $original_file.ToString()
So this will replace every \ with an / in the given pattern but not in the way which my question was about.

Extract certain values from string in .txt files with PowerShell

Im trying to extract certain values from multiple lines inside a .txt file with PowerShell. Im currently using multiple replace and remove cmd's but it doesn't work as expected and is a bit too complex.
Is there a more simple way to do this?
My script:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$result1 = $file.replace(" <<< [NAK]#99","")
$result2 = $result1.remove(0,3) #this only works for the first line for some reason...
$result3 = $result2.replace("\(([^\)]+)\)", "") #this should remove the string within paranthesis but doesn't work
.txt file:
29 09:10:16.874 (0133563471) <<< [NAK]#99[CAR]0998006798[CAR]
29 09:10:57.048 (0133603644) <<< [NAK]#99[CAR]0998019022[CAR]
29 09:59:56.276 (0136542798) <<< [NAK]#99[CAR]0998016987[CAR]
29 10:05:36.728 (0136883233) <<< [NAK]#99[CAR]0998050310[CAR]
29 10:55:36.792 (0139883179) <<< [NAK]#99[CAR]099805241D[CAR]0998028452[CAR]
29 11:32:16.737 (0142083132) <<< [NAK]#99[CAR]0998050289[CAR]0998031483[CAR]
29 11:34:16.170 (0142202566) <<< [NAK]#99[CAR]0998034787[CAR]
29 12:01:56.317 (0143862644) <<< [NAK]#99[CAR]0998005147[CAR]
The output i expect:
09:10:16.874 [CAR]0998006798[CAR]
09:10:57.048 [CAR]0998019022[CAR]
09:59:56.276 [CAR]0998016987[CAR]
10:05:36.728 [CAR]0998050310[CAR]
10:55:36.792 [CAR]099805241D[CAR]0998028452[CAR]
11:32:16.737 [CAR]0998050289[CAR]0998031483[CAR]
11:34:16.170 [CAR]0998034787[CAR]
12:01:56.317 [CAR]0998005147[CAR]
or more simple:
$Array = #()
foreach ($line in $file)
{
$Array += $line -replace '^..\s' -replace '\s\(.*\)' -replace '<<<.*#\d+'
}
$Array
Another option is to just grab the parts of a line you need with one regex and concat them:
$input_path = 'c:\data\in.txt'
$output_file = 'c:\data\out.txt'
$regex = '(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { [string]::Format("{0} {1}", $_.Groups[1].Value, $_.Groups[2].Value) } > $output_file
The regex is
(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)
See the regex demo
Details:
(\d+(?::\d+)+\.\d+) - Group 1: one or more digits followed with 1+ sequences of : and one or more digits, then . and again 1+ digits
.*?\[NAK]#99 - any 0+ chars other than newline as few as possible up to the first [NAK]#99 literal char sequence
(.*) - Group 2: the rest of the line
After we get all matches, the $_.Groups[1].Value concatenated with $_.Groups[2].Value yield the expected output.
Multiple issues.
Inside the loop you reference $file rather than $line. In the last operation, you're using the String.Replace() method with a regex pattern - something that method doesn't understand - use the -replace operator instead:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$line = $line.Replace(" <<< [NAK]#99","")
$line = $line.Remove(0,3)
# now use the -replace operator and output the result
$line -replace "\(([^\)]+)\)",""
}
You could do it all in one regular expression replacement:
$line -replace '\(\d{10}\)\ <<<\s+\[NAK]\#99',''

regex in powershell - not change three characters before text

Is there any easy way to do this?
input: 123215-85_01_test
expected output: 01_test
Another example
input: 12154_02_test
expected output: 02_test
There will be always string "test", but different numbering before
for example this code..
$path = "c:\tmp\*.sql"
get-childitem $path | forEach-object {
$name = $_.Name
$result = $name -replace "","" # I don't know how write this regex..
$extension = $_.Extension
$newName = $prefix+"_"+ $result -f, $extension
Rename-Item -Path $_.FullName -NewName $newName
}
There are two ways you go go at this. Simple split and join or you can use one of many regexes....
Split on underscore and rejoin last 2 elements
$split = "123215-85_01_test" -split "_"
$split[-2..-1] -join "_" # $split[-2,-1] would also work.
Regex to locate the data between the last underscores
"123215-85_01_test" -replace "^.*_(\d+)_(.*)$", '$1_$2'
Note this fails if there is more than 2 underscores.