Select all backslashes between two chars

I am working on a PowerShell script and I've got several text files where I need to replace backslashes in lines that match this pattern: .. >\\%name% .. < .. (.. could be anything)
Example string from one of the files where the backslashes should match:
<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
Example string from one of the files where the backslashes should not match:
<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
So far I've managed to select every char between >\\%name% and < with this expression (Regex101):
(?<=>\\\\%name%)(.*)(?=<)
but I failed to select only the backslashes.
Is there a solution which I could not yet find?

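I'd recommend selecting the relevant tags with an XPath expression and then doing the replacement on the text body of the selected nodes.
$xml.SelectNodes('//Tag[substring(., 1, 8) = "\\%name%"]') | ForEach-Object {
# the second argument is the replacement text; as written, '\\' inserts two literal backslashes, so substitute whatever you need
$_.'#text' = $_.'#text' -replace '\\', '\\'
}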
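As a rough, self-contained sketch of the XPath approach: the sample XML below is hypothetical (only the two `<Tag>` lines come from the question), `starts-with()` is swapped in for `substring()`, and `/` is assumed as the replacement character, as in the asker's own solution. The prefix is kept as-is and only the backslashes after it are converted.

```powershell
# Hypothetical sample document; the <Root> wrapper is an assumption for the demo.
[xml]$xml = @'
<Root>
  <Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
  <Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
</Root>
'@

$prefix = '\\%name%'   # literal prefix, 8 characters

# XPath 1.0 has no backslash escaping, so the prefix is compared literally.
foreach ($node in $xml.SelectNodes('//Tag[starts-with(., "\\%name%")]')) {
    # Keep the prefix untouched and convert only the backslashes after it.
    $rest = $node.InnerText.Substring($prefix.Length) -replace '\\', '/'
    $node.InnerText = $prefix + $rest
}

# Show the text of every Tag element after the change.
$texts = @($xml.SelectNodes('//Tag') | ForEach-Object { $_.InnerText })
$texts
```

Working on the parsed nodes means the second `<Tag>` is never touched, because it does not start with the prefix.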

So here's my solution:
$original_file = $Filepath
$destination_file = $Filepath + ".new"
Get-Content -Path $original_file | ForEach-Object {
$line = $_
if ($line -match '(?<=>\\\\%name%)(.*)(?=<)'){
$line = $line -replace '\\','/'
}
$line
} | Set-Content -Path $destination_file
Remove-Item $original_file
Rename-Item $destination_file.ToString() $original_file.ToString()
So this replaces every \ with a / on lines that match the given pattern, but not in the way my question asked for.
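For the original goal of matching only the backslashes, one possible sketch relies on the fact that the .NET regex engine (which PowerShell's `-replace` uses) allows variable-length lookbehind. The pattern below is an assumption built from the question's own expression, not something any of the answers proposed: it matches a backslash only when `>\\%name%` appears earlier with no `<` in between.

```powershell
# Match a single backslash that is preceded by ">\\%name%" plus any run of non-"<" characters.
$pattern = '(?<=>\\\\%name%[^<]*)\\'

$shouldChange = '<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>'
$shouldStay   = '<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>'

$changed   = $shouldChange -replace $pattern, '/'
$unchanged = $shouldStay   -replace $pattern, '/'

$changed
$unchanged
```

The two backslashes in front of `%name%` are left alone because the lookbehind needs the full prefix to appear before the backslash being matched.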

Related

Get-Content using Regex on the filename + content

I want to print the content of a file (Get-Content), where the file itself is also located by matching its name against a regex (Get-ChildItem).
-File example:
ITOPS_Log [2022-06-18].txt
-File content:QQQ-9999999-QQQ
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
$content = (Get-Content $file.FullName)
ForEach ($line in $content) {
Write-Output "$line"
}
}
Output:
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 22/06/2022 16:11 64 ITOPS_Log [2022-06-18].txt
Get-Content : An object at the specified path C:\Users\Eddy\Desktop\ITOPS_Log [2022-06-18].txt does not exist, or has been filtered by the -Include or -Exclude parameter.
At C:\Users\Eddy\Desktop\Itops_Log_test_V2.ps1:15 char:25
+ $content = (Get-Content $file.FullName)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (System.String[]:String[]) [Get-Content], Exception
+ FullyQualifiedErrorId : ItemNotFound,Microsoft.PowerShell.Commands.GetContentCommand
I know I'm not yet using a regex to match the content if I only want the number. But I'm already stuck without any filters, so I need to solve this first and then apply a regex to match the digits.
How can I print the content of the file with this code? I don't understand what I'm missing.

Continuing from my comment: the -Path parameter on Get-ChildItem and Get-Content tries to resolve wildcard characters, and because your file name contains square brackets, it treats them as a range of characters or numbers.
To avoid that, use -LiteralPath instead so nothing in the path gets interpreted.
Then to test if the file has something resembling a date inside those square brackets, I would use an anchored regex on the file's BaseName property:
#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem -LiteralPath $folder -Filter 'ITOPS_Log*.txt' -File |
Where-Object { $_.BaseName -match '\[\d{4}-\d{2}-\d{2}\]$' }
# show the found files on screen
$valid_files
#Read the file and print content.
foreach ($file in $valid_files) {
$content = (Get-Content -LiteralPath $file.FullName)
foreach ($line in $content) {
Write-Host $line
# or just the number?
Write-Host ([regex]'(\d+)').Match($line).Groups[1].Value
}
}
Regex details on the file's BaseName (--> File Name without extension):
\[ Match the character “[” literally
\d Match a single digit 0..9
{4} Exactly 4 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
- Match the character “-” literally
\d Match a single digit 0..9
{2} Exactly 2 times
\] Match the character “]” literally
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
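A small sketch of the -LiteralPath point, using a throwaway file in the temp folder (the location and the way the file is created are assumptions for the demo; the file name and content mirror the question):

```powershell
# Hypothetical demo file; created through .NET so no wildcard handling is involved.
$path = Join-Path ([IO.Path]::GetTempPath()) 'ITOPS_Log [2022-06-18].txt'
[IO.File]::WriteAllText($path, 'QQQ-9999999-QQQ')

# -LiteralPath treats the square brackets as plain characters.
$content = Get-Content -LiteralPath $path
$content

# Pull the first run of digits out of the content.
$number = [regex]::Match($content, '\d+').Value
$number

# Clean up the demo file, again without wildcard interpretation.
Remove-Item -LiteralPath $path
```

The same -LiteralPath switch is used for the cleanup, since Remove-Item's -Path would also try to interpret the brackets.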

#Find the File using Regex:
$folder = "C:\Users\Eddy\Desktop"
$valid_files = Get-ChildItem $folder| Where-Object { $_.Name -match 'ITOPS_Log.\[\d{4}-\d{2}-\d{2}\].txt' }
Write-Output $valid_files
#Read the file and print content.
Foreach ($file in $valid_files) {
# use the full path so the file resolves regardless of the current location
$content = (Get-Content -LiteralPath $file.FullName)
ForEach ($line in $content) {
Write-Output "$line"
}
}

Replacing any content inbetween second and third underscore

I have a PowerShell Scriptline that replaces(deletes) characters between the second and third underscore with an "_":
get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}
Examples:
12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf
This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything in between I have used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work) but the output is:
12345_09_MoreText.pdf
The desired output would be:
12345_00001_09_2018_Text_MoreText.pdf
12345_00002_09_2018_Text_MoreText.pdf
12345_00003_09_2018_Text_MoreText.pdf
How do I correctly replace the second and third underscore and everything in between with an "_"?

If you don't want to use regex -
$files = Get-ChildItem *.pdf #get all pdf files
$ModifiedFiles, $New = @() #initializing two variables
foreach($file in $files)
{
$ModifiedFiles = $file.Name.Split("_") #split the file name (not the FileInfo object) into tokens
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #omitting anything between second and third underscore
$New = $ModifiedFiles -join "_" #rejoin with underscores, leaving spaces inside tokens alone
Rename-Item -Path $file.FullName -NewName $New
}
Sample Data -
$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
$ModifiedFiles, $New = @() #initializing two variables
foreach($file in $files)
{
$ModifiedFiles = $file.Split("_")
$ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] } #omitting anything between second and third underscore
$New = $ModifiedFiles -join "_" #rejoin with underscores, leaving spaces inside tokens alone
$New
}

You may use
-replace '^((?:[^_]*_){2})[^_]+_', '$1'
See the regex demo
Details
^ - start of the line
((?:[^_]*_){2}) - Group 1 (the value will be referenced to with $1 from the replacement pattern): two repetitions of
[^_]* - 0+ chars other than an underscore
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore
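To see the pattern at work without touching any files, here is a quick sketch that applies it to the three sample names from the question (the variable names are just for illustration):

```powershell
$names = @(
    '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'
    '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'
    '12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'
)

# -replace works element by element on an array; $1 puts the first two tokens back.
$renamed = $names -replace '^((?:[^_]*_){2})[^_]+_', '$1'
$renamed
```

Note the single quotes around `'$1'`: in a double-quoted string PowerShell would try to expand `$1` as a variable before the regex engine ever sees it.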

To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:
Get-ChildItem *.pdf | Rename-Item -NewName { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
$_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
-join '_' reassembles the modified array into a _-separated string, yielding the desired result.
Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:
Get-ChildItem *.pdf |
Rename-Item -NewName { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf
Wouldn't it be nice if we could write [0..1 + 3..] instead?
This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
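If the auxiliary variable feels clumsy inline, one option is to wrap the slicing in a small helper. `Remove-ThirdToken` below is a hypothetical name, and the sketch assumes every input has at least four `_`-separated tokens (with fewer, the `3..($tokens.Count - 1)` range would run backwards).

```powershell
function Remove-ThirdToken {
    param([string]$Name)

    $tokens = $Name -split '_'
    # Assumption: at least four tokens, so both slices are ascending ranges.
    ($tokens[0..1] + $tokens[3..($tokens.Count - 1)]) -join '_'
}

$result = Remove-ThirdToken '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'
$result
```

With real files this could be called as `Get-ChildItem *.pdf | Rename-Item -NewName { Remove-ThirdToken $_.Name } -WhatIf`.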

here's one other way ... using string methods.
'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
Split('_').
Where({
$_ -notmatch ','
}) -join '_'
result = 12345_00003_09_2018_Text_MoreText.pdf
that does the following ...
split on the underscores
toss out any item that has a comma in it
join the remaining items back into a string with underscores
i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]

Extract certain values from string in .txt files with PowerShell

I'm trying to extract certain values from multiple lines inside a .txt file with PowerShell. I'm currently using multiple replace and remove commands, but it doesn't work as expected and is a bit too complex.
Is there a more simple way to do this?
My script:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$result1 = $file.replace(" <<< [NAK]#99","")
$result2 = $result1.remove(0,3) #this only works for the first line for some reason...
$result3 = $result2.replace("\(([^\)]+)\)", "") #this should remove the string within paranthesis but doesn't work
.txt file:
29 09:10:16.874 (0133563471) <<< [NAK]#99[CAR]0998006798[CAR]
29 09:10:57.048 (0133603644) <<< [NAK]#99[CAR]0998019022[CAR]
29 09:59:56.276 (0136542798) <<< [NAK]#99[CAR]0998016987[CAR]
29 10:05:36.728 (0136883233) <<< [NAK]#99[CAR]0998050310[CAR]
29 10:55:36.792 (0139883179) <<< [NAK]#99[CAR]099805241D[CAR]0998028452[CAR]
29 11:32:16.737 (0142083132) <<< [NAK]#99[CAR]0998050289[CAR]0998031483[CAR]
29 11:34:16.170 (0142202566) <<< [NAK]#99[CAR]0998034787[CAR]
29 12:01:56.317 (0143862644) <<< [NAK]#99[CAR]0998005147[CAR]
The output i expect:
09:10:16.874 [CAR]0998006798[CAR]
09:10:57.048 [CAR]0998019022[CAR]
09:59:56.276 [CAR]0998016987[CAR]
10:05:36.728 [CAR]0998050310[CAR]
10:55:36.792 [CAR]099805241D[CAR]0998028452[CAR]
11:32:16.737 [CAR]0998050289[CAR]0998031483[CAR]
11:34:16.170 [CAR]0998034787[CAR]
12:01:56.317 [CAR]0998005147[CAR]

or more simply:
$Array = @()
foreach ($line in $file)
{
$Array += $line -replace '^..\s' -replace '\s\(.*\)' -replace '<<<.*#\d+'
}
$Array

Another option is to just grab the parts of a line you need with one regex and concat them:
$input_path = 'c:\data\in.txt'
$output_file = 'c:\data\out.txt'
$regex = '(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { [string]::Format("{0} {1}", $_.Groups[1].Value, $_.Groups[2].Value) } > $output_file
The regex is
(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)
See the regex demo
Details:
(\d+(?::\d+)+\.\d+) - Group 1: one or more digits followed with 1+ sequences of : and one or more digits, then . and again 1+ digits
.*?\[NAK]#99 - any 0+ chars other than newline as few as possible up to the first [NAK]#99 literal char sequence
(.*) - Group 2: the rest of the line
After we get all matches, the $_.Groups[1].Value concatenated with $_.Groups[2].Value yield the expected output.
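The same two capture groups can be tried on in-memory strings with the `-match` operator and the automatic `$Matches` variable, which may be handy for testing before pointing the regex at real files. This is only a sketch; the two sample lines are copied from the question.

```powershell
$regex = '(\d+(?::\d+)+\.\d+).*?\[NAK]#99(.*)'

$lines = @(
    '29 09:10:16.874 (0133563471) <<< [NAK]#99[CAR]0998006798[CAR]'
    '29 10:55:36.792 (0139883179) <<< [NAK]#99[CAR]099805241D[CAR]0998028452[CAR]'
)

$output = foreach ($line in $lines) {
    if ($line -match $regex) {
        # $Matches[1] is the time stamp, $Matches[2] the rest of the line after [NAK]#99.
        '{0} {1}' -f $Matches[1], $Matches[2]
    }
}
$output
```

The leading `29` is skipped automatically because it is not followed by a colon, so the first place Group 1 can match is the time stamp.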

Multiple issues.
Inside the loop you reference $file rather than $line. In the last operation, you're using the String.Replace() method with a regex pattern - something that method doesn't understand - use the -replace operator instead:
$file = Get-Content "C:\RS232_COM2*"
foreach($line in $file){
$line = $line.Replace(" <<< [NAK]#99","")
$line = $line.Remove(0,3)
# now use the -replace operator and output the result
$line -replace "\(([^\)]+)\)",""
}
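You could do it all in one regular expression replacement (the first alternative also strips the leading day number, so the result matches the expected output):
$line -replace '^\d+\s+|\(\d{10}\) <<<\s+\[NAK]#99',''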

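To make the difference between the String.Replace() method and the -replace operator concrete, here is a minimal side-by-side sketch on one intermediate line from the example (variable names are illustrative only):

```powershell
$line = '09:10:16.874 (0133563471)[CAR]0998006798[CAR]'

# String.Replace() looks for the literal text "\(([^\)]+)\)", which is not in the line.
$viaMethod = $line.Replace('\(([^\)]+)\)', '')

# The -replace operator treats the same text as a regular expression.
$viaOperator = $line -replace '\(([^\)]+)\)', ''

$viaMethod
$viaOperator
```

The method call returns the line unchanged without raising an error, which is why this kind of mistake is easy to overlook.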
regex in powershell - not change three characters before text

Is there any easy way to do this?
input: 123215-85_01_test
expected output: 01_test
Another example
input: 12154_02_test
expected output: 02_test
There will be always string "test", but different numbering before
for example this code..
$path = "c:\tmp\*.sql"
get-childitem $path | forEach-object {
$name = $_.Name
$result = $name -replace "","" # I don't know how write this regex..
$extension = $_.Extension
$newName = $prefix+"_"+ $result -f, $extension
Rename-Item -Path $_.FullName -NewName $newName
}

There are two ways you can go at this: a simple split and join, or one of many regexes.
Split on underscore and rejoin last 2 elements
$split = "123215-85_01_test" -split "_"
$split[-2..-1] -join "_" # $split[-2,-1] would also work.
Regex to locate the data between the last underscores
"123215-85_01_test" -replace "^.*_(\d+)_(.*)$", '$1_$2'
Note that the result can be surprising if there are more than 2 underscores: the greedy ^.*_ skips ahead to the last _<digits>_ group in the name.
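Since the question is about renaming *.sql files, the extension has to be kept out of the split. A possible sketch on plain strings follows; the `.sql` names are made up from the question's two inputs, and with real files `$_.BaseName` and `$_.Extension` would play the same roles as the two .NET calls.

```powershell
$names = '123215-85_01_test.sql', '12154_02_test.sql'

$results = foreach ($name in $names) {
    $base = [IO.Path]::GetFileNameWithoutExtension($name)
    $ext  = [IO.Path]::GetExtension($name)

    # Keep the last two underscore-separated tokens, then put the extension back.
    (($base -split '_')[-2..-1] -join '_') + $ext
}
$results
```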

Multiline Regex in PowerShell

I have this PowerShell script whose main purpose is to search through HTML files within a folder, find specific HTML markup, and replace it with what I tell it to.
I have been able to do 3/4 of my find and replaces perfectly. The one I am having trouble with involves a Regular Expression.
This is the markup that I am trying to make my regex find and replace:
<a href="programsactivities_skating.html"><br />
</a>
Here is the regex I have so far, along with the function I am using it in:
automate -school "C:\Users\$env:username\Desktop\schools\$question" -query '(?mis)(?!exclude1|exclude2|exclude3)(<a[^>]*?>(\s| |<br\s?/?>)*</a>)' -replace ''
And here is the automate function:
function automate($school, $query, $replace) {
$processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
foreach ($file in $processFiles) {
$text = Get-Content $file
$text = $text -replace $query, $replace
$text | Out-File $file -Force -Encoding utf8
}
}
I have been trying to figure out the solution to this for about 2 days now, and just can't seem to get it to work. I have determined that the problem is that I need to tell my regex to account for multiple lines, and that's what I'm having trouble with.
Any help anyone can provide is greatly appreciated.
Thanks in Advance.

Get-Content produces an array of strings, where each string contains a single line from your input file, so you won't be able to match text passages spanning more than one line. You need to merge the array into a single string if you want to be able to match more than one line:
$text = Get-Content $file | Out-String
or
[String]$text = Get-Content $file
or
$text = [IO.File]::ReadAllText($file)
Note that the 1st and 2nd method don't preserve line breaks from the input file. Method 2 simply mangles all line breaks, as Keith pointed out in the comments, and method 1 puts <CR><LF> at the end of each line when joining the array. The latter may be an issue when dealing with Linux/Unix or Mac files.
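In PowerShell 3.0 and later, Get-Content also has a -Raw switch that returns the whole file as one string with its line breaks intact. The sketch below contrasts per-line and whole-file replacement on a temporary file; the file content and the simplified pattern are assumptions for the demo, not the asker's full expression.

```powershell
# Hypothetical demo file containing the markup from the question.
$path = [IO.Path]::GetTempFileName()
$html = @(
    '<p>keep</p>'
    '<a href="programsactivities_skating.html"><br />'
    '</a>'
) -join "`r`n"
[IO.File]::WriteAllText($path, $html)

# Simplified pattern: an <a> element containing only whitespace and <br> tags.
$pattern = '<a[^>]*>(\s|<br\s?/?>)*</a>'

# Array of lines: the opening and closing tags are in different strings, so nothing matches.
$perLine = (Get-Content -LiteralPath $path) -replace $pattern, ''

# Single string: \s also matches the line break, so the whole element is removed.
$whole = (Get-Content -LiteralPath $path -Raw) -replace $pattern, ''

$perLine
$whole

Remove-Item -LiteralPath $path
```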

I don't get what it is you're trying to do with those Exclude elements, but I find multi-line regex is usually easier to construct in a here-string:
$text = @'
<a href="programsactivities_skating.html"><br />
</a>
'@
$regex = @'
(?mis)<a href="programsactivities_skating.html"><br />
\s+?</a>
'@
$text -match $regex
True

Get-Content returns an array of strings; you want to concatenate those strings into a single one:
function automate($school, $query, $replace) {
$processFiles = Get-ChildItem -Exclude *.bak -Include "*.html", "*.HTML", "*.htm", "*.HTM" -Recurse -Path $school
foreach ($file in $processFiles) {
$text = ""
Get-Content $file | % { $text += $_ + "`r`n" } # append each line; do not assign the (empty) pipeline output back to $text
$text = $text -replace $query, $replace
$text | Out-File $file -Force -Encoding utf8
}
}
