How to replace text and line endings in multiple files in PowerShell? - regex

I have multiple files like this:
TEST:200333
75252
TEST:198234
201756
TEST:201616
274
TEST:200118
934521
TEST:123456
1234
and I want an output like this
200333;75252
198234;201756
201616;274
200118;934521
123456;1234
I tried this code but it doesn't work:
powershell -Command "(gc myFile.txt) -replace 'TEST:(\.+)\r\n(\.+)\r', '\1;\2' | Out-File -encoding ASCII mynewFile.txt"

You can use
powershell -Command "(gc myFile.txt -Raw) -replace '(?m)^TEST:(\d+)\r?\n(\d+)\r?$', '$1;$2' | Out-File -encoding ASCII mynewFile.txt"
Or,
powershell -Command "[system.io.file]::ReadAllText('myFile.txt') -replace '(?m)^TEST:(\d+)\r?\n(\d+)\r?$', '$1;$2' | Out-File -encoding ASCII mynewFile.txt"
Note the use of the -Raw option, which reads the whole file into a single string so that the pattern can match across line breaks.
The regex matches
(?m) - multiline mode on
^ - line start
TEST: - some fixed text
(\d+) - Group 1: one or more digits
\r?\n - a CRLF (carriage return + line feed)/LF (line feed) line ending
(\d+) - Group 2: one or more digits
\r? - an optional CR (carriage return)
$ - end of a line.
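To check the pattern without touching any files, it can be run against an in-memory string. This is only a minimal sketch: the $sample and $joined variable names and the two-record sample are made up for illustration, and the sample assumes CRLF line endings.

```powershell
# Hypothetical sample standing in for the contents of myFile.txt (CRLF line endings)
$sample = "TEST:200333`r`n75252`r`nTEST:198234`r`n201756`r`n"

# Same pattern as in the answer: join each TEST:<digits> line with the digits on the next line
$joined = $sample -replace '(?m)^TEST:(\d+)\r?\n(\d+)\r?$', '$1;$2'

# The match consumes the CR at the end of the second line, so only the LF remains between records
$joined
```

The trailing \r? matters here because $ in multiline mode only matches right before a line feed, not before a carriage return.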

A no-regex approach that just concatenates every other line:
powershell -Command "gc myFile.txt | % {$i=0}{if($i++%2){$prev+';'+$_}else{$prev=$_.Substring(5)}} | sc mynewFile.txt -enc ASCII"
Less code-golfy version:
Get-Content myFile.txt | ForEach-Object -Begin {$i = 0} {
    if ($i++ % 2 -ne 0) {        # odd line
        "$prev;$_"               # output "value from previous line;current line"
    } else {                     # even line
        $prev = $_.Substring(5)  # remember value, cut off the "TEST:"
    }
} | Set-Content mynewFile.txt -Encoding ASCII

Related

Regex for multiple non-consecutive backslash for each line not working

I'm trying to list all the files that contain multiple non-consecutive backslashes in each line.
Here's my script in powershell
Get-ChildItem -Path "D:\config_files" -Include "*.xml","*.txt" -Recurse |
    ForEach-Object {
        $file = $_.FullName
        (Get-Content $file) |
            Where-Object {
                $_ -match '^(.*)=(")(.*?[^\\])(\\.*)(")(.*)$'
            } |
            Select-Object -Unique |
            ForEach-Object {
                Write-Host "$file : $_"
                $_ | Out-File -FilePath 'matches.txt' -Append
            }
    }
Here's my regex
^(.*)=(")(.*?[^\\])(\\.*)(")(.*)$
These are the expected conditions.
starts with characters
followed by ="
contains non-consecutive backslash
followed by "
End with any characters
The regex should detect the text below
<add key="12345 value="\\machine\001\0z991\master" />
<settings file="..\app\service\config\settings.config">
<key="config" value="..\app\bin\config"/>
The problem is that it only works on a single line. I already added '$' for the end of the line.

Remove thousands separator from a line starting with Total Value

I have a text file which was generated with Powershell.
There is a line that starts with Total Value: $
That line has a dollar amount which contains a thousands separator comma.
I would like to delete that comma, but only in that line.
I have tried the following, but it also removes commas in places where I do not want that to happen.
$Files = Get-ChildItem "C:\Users\User\Summary.txt"
foreach ($file in $Files)
{
    $file |
        Get-Content |
        % { $_ -replace '([\d]),([\d])','$1$2' } |
        Out-File "C:\Users\User\Summary2.csv" -Append -Encoding ascii
}
This works, but again it removes commas in parts of the file where I want them to remain.
Any assistance is appreciated.
You can use
foreach ($file in $Files)
{
    (Get-Content $file -Raw) -replace '(?m)(?<=^Total\s+Value:\s*\$[\d,]*),','' |
        Out-File "C:\Users\User\Summary2.csv" -Append -Encoding ascii
}
Details:
Get-Content $file -Raw reads the contents of the file into a single string
(?m)(?<=^Total\s+Value:\s*\$[\d,]*), is a regex that matches
(?m) - multiline mode, so that ^ matches at the start of every line
(?<=^Total\s+Value:\s*\$[\d,]*) - a positive lookbehind that matches a location that is immediately preceded with
^ - start of a line
Total\s+Value: - Total Value: string with one or more whitespace characters between the two words
\s* - zero or more whitespace characters
\$[\d,]* - a $ char and then zero or more digits or commas (the integer part of a dollar amount)
, - a comma (it is removed because the -replace operator is used with an empty replacement string, which can even be omitted)
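As a quick sanity check, the lookbehind can be tried on a small in-memory string. This is just a sketch: the three sample lines and the $report and $clean variable names are invented for illustration, not taken from the real Summary.txt.

```powershell
# Hypothetical three-line sample; only the middle line starts with "Total Value: $"
$report = "Items: 1,200`r`nTotal Value: `$1,234,567.89`r`nNotes: a,b,c"

# Remove commas only where the text before them, back to the line start, is
# "Total Value: $" followed by digits and commas. The replacement string is omitted,
# so each matched comma is replaced with nothing.
$clean = $report -replace '(?m)(?<=^Total\s+Value:\s*\$[\d,]*),'

$clean
```

Commas on the other two lines are left alone because the lookbehind cannot be satisfied there. Note that this relies on .NET allowing variable-length lookbehinds, which many other regex engines do not.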

Powershell conditional replacement of a character sequence in a tab delimited file

I would like to conditionally replace a character sequence from strings in a tab delimited file.
In the example below, I want to replace 'apple' with 'orange' when the character sequence starts with 'DEF'. 'xxx' can be any characters of any length (but is unlikely to be 'DEF' or 'apple').
ie:
xxxDEFxxxapplexxx<tab>DEFxxxapplexxx<tab>xxxDEFxxxapplexxx
to:
xxxDEFxxxapplexxx<tab>DEFxxxorangexxx<tab>xxxDEFxxxapplexxx
Powershell script:
$fileName = "tabfile.txt"
(Get-Content -Path $fileName -Encoding UTF8) |
Foreach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange"} else { $_ } } |
Set-Content -Path $fileName
It works fine when each string is separated by a new line (rather than a tab).
Output:
xxxDEFxxxapplexxx
DEFxxxorangexxx
xxxDEFxxxapplexxx
but doesn't work when the strings are separated by tabs (or spaces):
Output:
xxxDEFxxxapplexxx<tab>DEFxxxapplexxx<tab>xxxDEFxxxapplexxx
Thanks.
With help from the comments by iRon and Thomas, I figured out something that works:
Split the string at the tabs to create an array:
Get-Content with -Delimiter "`t" parameter.
Conditional match and replace text on each element:
Foreach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange"} else { $_ } } |
Recreate original string by joining each element of the array with a tab character:
Join-String -Separator "`t"
Complete code:
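# The parentheses make Get-Content finish reading before Out-File opens the same file for writing
(Get-Content -Path "tabfile.txt" -Delimiter "`t") |
    ForEach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange" } else { $_ } } |
    Join-String -Separator "`t" |
    Out-File "tabfile.txt"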
You do not need any conditional logic here because -replace does it for you implicitly: if there is no match, the string input is returned as is.
The regex you can use is
(?<=(?:^|\t)DEF_)apple
Add a \b word boundary after apple if it must not be followed by _, a letter, or a digit. Add (?![^\W_]) instead if it must not be followed by a letter or digit but may be followed by _.
Details:
(?<=(?:^|\t)DEF_) - a positive lookbehind that matches a location that is immediately preceded with start of string (^) or (|) a tab (\t) and DEF_
apple - an apple string.
In Powershell, you could use
(Get-Content -Path $fileName -Encoding UTF8) -replace "(?<=(?:^|\t)DEF_)apple", "orange" | Set-Content -Path $fileName
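Note that the pattern above expects DEF_ to sit directly before apple, while the sample line in the question has other characters between DEF and apple. For that shape of data, a variant like the one below may be closer to what is needed. Treat it as a sketch: the [^\t]* part is my assumption about the field layout, and the $line and $fixed variable names are made up for illustration.

```powershell
# Sample line from the question, with real tab characters between the three fields
$line = "xxxDEFxxxapplexxx`tDEFxxxapplexxx`txxxDEFxxxapplexxx"

# Replace apple only when its field (start of string or right after a tab) begins with DEF.
# [^\t]* lets any non-tab characters sit between DEF and apple without crossing into another field.
$fixed = $line -replace '(?<=(?:^|\t)DEF[^\t]*)apple', 'orange'

$fixed
```

Only the middle field changes, because it is the only one where DEF comes right after a tab (or the start of the string).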

How to modify this regex to work in Powershell

So I have this regex https://regex101.com/r/xG8oX2/2 which gives me the matches I want.
But when I run this PowerShell script, it gives me no matches. What should I modify in this regex to get the same matches in PowerShell?
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*\r*\n*.*\n.*ReportLayoutID=(\d{1,7})';
$reportLayoutIDList = Get-Content -Path bigOptions.txt | Out-String |
    Select-String -Pattern $pattern2 -AllMatches |
    Select-Object -ExpandProperty Matches |
    Select-Object @{n="ReportHash";e={$_.Groups["reportHash"]}},
                  @{n="LayoutID";e={$_.Groups["reportLayoutID"]}};
$reportLayoutIDList |
    Export-Csv reportLayoutIDList.csv;
The problem is your line breaks. On Windows, line breaks are CRLF (\r\n), while on UNIX etc. they're just LF (\n).
So either you need to modify the input to only use LF, or you need to replace \n with \r\n in your regex.
As @briantist mentioned, using \r?\n will match either way.
Thank you to both Frode F and briantist.
This is the regex pattern that worked in Powershell:
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*[\r?\n]*.*[\r?\n].*ReportLayoutID=(?<reportLayoutID>\d+)';
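One thing worth knowing about that final pattern: [\r?\n] is a character class, so it matches a single CR, LF, or literal ? character rather than "an optional CR followed by LF". It happens to work here, but \r?\n outside of brackets is the more precise form. A minimal sketch of the difference, using made-up two-line samples and a shortened pattern (the $crlf, $lf, $pattern, $ids and $classMatchesQuestionMark names are mine, not from the original script):

```powershell
# Two hypothetical inputs that differ only in their line endings
$crlf = "Start`r`nReportLayoutID=42"
$lf   = "Start`nReportLayoutID=42"

# \r?\n matches both a Windows (CRLF) and a UNIX (LF) line break
$pattern = 'Start\r?\nReportLayoutID=(?<reportLayoutID>\d+)'

# Collect the named group from each input
$ids = foreach ($s in $crlf, $lf) {
    if ($s -match $pattern) { $Matches['reportLayoutID'] }
}
$ids

# The bracketed form is a character class, so it also accepts a literal question mark
$classMatchesQuestionMark = '?' -match '^[\r?\n]$'
$classMatchesQuestionMark
```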

Powershell regex group replacing

I want to replace some text in every script file in folder, and I'm trying to use this PS code:
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object { (Get-Content $_.fullname) -replace $pattern, 'replace text' | Set-Content $_.fullname }
But I have no idea how to keep first part of expression, and just replace the second one. Any idea how can I do this? Thanks.
I'm not sure the regex you provided for table names is correct, but in any case you can reference captured groups in the replacement with $1, $2 and so on, using the following syntax: 'Doe, John' -ireplace '(\w+), (\w+)', '$2 $1'
Note that the replacement pattern either needs to be in single quotes ('') or have the $ signs of the replacement group specifiers escaped ("`$2 `$1").
# It may be better to use: $pattern = '(FROM) (?<replacement_place>[a-zA-Z0-9_.]{1,7})'
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object {
    (Get-Content $_.FullName) |
        ForEach-Object { $_ -replace $pattern, '$1 replace text' } |
        Set-Content $_.FullName -Force
}
If you need to reference other variables in your replacement expression (as you may), you can use a double-quoted string and escape the capture dollars with a backtick:
{ $_ -replace $pattern, "`$1 replacement text with $somePoshVariable" } |
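Both quoting styles can be compared on a single string before running them over real files. This is only a sketch: it uses the simpler pattern suggested in the comment above, and the sample SQL line, the $newName value, and the $sql, $single and $double variable names are invented for illustration.

```powershell
# Simpler pattern from the comment above: group 1 is FROM, the named group is the table name
$pattern = '(FROM) (?<replacement_place>[a-zA-Z0-9_.]{1,7})'

# Hypothetical inputs
$sql     = 'SELECT * FROM old.Tab WHERE id = 1'
$newName = 'dbo.New'

# Single quotes: $1 reaches the regex engine untouched and is expanded to group 1
$single = $sql -replace $pattern, '$1 replace text'

# Double quotes: `$1 is escaped so the regex engine still sees $1, while $newName is expanded by PowerShell
$double = $sql -replace $pattern, "`$1 $newName"

$single
$double
```

Without the backtick, PowerShell would try to expand $1 as one of its own variables before the regex engine ever sees it, and the kept part of the match would silently disappear.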