Regex replace multilines in powershell

I want to replace these lines in my AssemblyInfo.cs, which is encoded in UTF-8 and has Windows CRLF line endings:
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
with these:
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
To do so, I have a PowerShell script that parses all my files and performs the replacement.
The regex I prepared on regex101 is this one, and it works there:
<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\("2\.0\.0\.0"\)\]\n\[assembly:\sAssemblyFileVersion\("2\.0\.0\.0"\)\]\n=======\n\[assembly:\sAssemblyVersion\("1\.1\.0\.0"\)\]\n\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\n>>>>>>>\sv1_final_release
I can't get -replace to match across the newlines.
But when targeting only <<<<<<<\sHEAD, it matches and the replacement is performed.
All of the following variations failed:
<<<<<<<\sHEAD\n\[assembly: no error, no replacement
<<<<<<<\sHEAD\r\n\[assembly: no error, no replacement
<<<<<<<\sHEAD`r`n\[assembly: no error, no replacement; Write-Host prints it as
<<<<<<<\sHEAD
\[assembly:
It's not about /gm or (*CRLF).
My PowerShell code, for reference:
$ConflictVersionRegex = "<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\(`"2\.0\.0\.0`"\)\]\n\[assembly:\sAssemblyFileVersion\(`"2\.0\.0\.0`"\)\]\n=======\n\[assembly:\sAssemblyVersion\(`"1\.1\.0\.0`"\)\]\n\[assembly: AssemblyFileVersion\(`"1\.1\.0\.0`"\)\]\n>>>>>>>\sv1_final_release"
$ConflictVersionRegexTest = "<<<<<<<\sHEAD`r`n\[assembly:"
$fileContent = Get-Content($filePath)
$filecontent = $filecontent -replace $ConflictVersionRegexTest, $AssemblyNewVersion
[System.IO.File]::WriteAllLines($filePath, $fileContent, $Utf8NoBomEncoding)
What am I missing? Why is it not replacing?
Many thanks

Based on feedback from Poutrathor (the OP), there were two problems:
The primary problem was that Get-Content($filePath) (which should be written as Get-Content $filePath; see note [1]) reads the file line by line, which results in an array of lines when captured in a variable.
-replace then operates on each input line individually, which means that the line-spanning regex won't match anything.
Solution: Use Get-Content -Raw (PSv3+) to read the file as a whole into a single, multi-line string.
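As a rough illustration of why this matters, here is a minimal Python sketch. Python stands in for PowerShell (re.sub plays the role of -replace), and the shortened conflict block is a hypothetical sample. The same line-spanning pattern finds nothing when applied line by line, but matches once the whole text is a single string, which is what Get-Content -Raw gives you.

```python
import re

# Hypothetical, shortened conflict block with Windows (CRLF) line endings.
text = "<<<<<<< HEAD\r\n[assembly: X]\r\n=======\r\n[assembly: Y]\r\n>>>>>>> v1\r\n"

# A pattern that spans several lines; \r?\n accepts CRLF or LF.
pattern = r"<<<<<<< HEAD\r?\n(.*?)\r?\n=======\r?\n.*?\r?\n>>>>>>> v1"

# Line by line (like Get-Content without -Raw): no single line contains
# the whole block, so nothing is replaced.
lines = text.split("\r\n")
per_line = [re.sub(pattern, r"\1", line) for line in lines]

# Whole string (like Get-Content -Raw): the block matches and is replaced
# by the HEAD side of the conflict (capture group 1).
whole = re.sub(pattern, r"\1", text)

print(per_line == lines)  # True
print(repr(whole))        # '[assembly: X]\r\n'
```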
Secondarily, you mention needing to replace the regex newline (end-of-line) escape sequence (\n) (LF) with its PowerShell string-interpolation counterpart (`n) - note that PowerShell uses `, the backtick, as the escape character:
Note that that is only necessary in the replacement string, in order to create actual, literal newlines (line breaks) on output - as opposed to using regex construct \n for matching newlines.
However, on Windows, newlines are typically CRLF sequences, i.e., a CR (\r, `r) immediately followed by a LF (\n / `n) - i.e., \r\n/ `r`n - whereas on Unix-like platforms they are just LF, \n / `n.
If you're not sure which style of newlines given input has, use \r?\n to match newlines in a cross-platform-compatible manner.
If you don't care which specific newline style the input uses, \r?\n is safe to use routinely, as a matter of habit.
Therefore:
In your regex, you can choose between \r\n and `r`n in your case, but note that:
`r`n only works in double-quoted "..." strings.
It is generally preferable to use literal, single-quoted strings to store regexes - which requires use of \r\n (Windows) / \n (Unix) / \r?\n (platform-agnostic) - so that there's no confusion over which parts of the string PowerShell interpolates up front vs. which parts are interpreted by the regex engine.
In your replacement string, use `r`n inside "..." to create actual newlines.
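The same split between regex escapes and real line breaks exists in other languages. In this Python sketch (an analogy, not PowerShell; the sample strings are hypothetical), the pattern is a raw string, so \r?\n reaches the regex engine untouched and matches both CRLF and LF input, while the replacement is a normal string whose \r\n escapes become actual CRLF line breaks, much like `r`n inside "...".

```python
import re

# Raw string: the regex engine itself sees \r?\n, which matches CRLF or LF.
pattern = r"line1\r?\nline2"

# Normal string: Python turns "\r\n" into a real CR+LF pair.
replacement = "lineA\r\nlineB"

samples = ["line1\r\nline2", "line1\nline2"]  # Windows-style, then Unix-style
results = [re.sub(pattern, replacement, s) for s in samples]
print(results)  # ['lineA\r\nlineB', 'lineA\r\nlineB']
```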
As an alternative to using escape sequences to represent newlines, you can use here-strings to conveniently define multi-line strings with actual newlines (line breaks), as shown in Paweł Dyl's answer, but there's a caveat:
Here-strings invariably have the same style of newline as the enclosing script file, which means that:
A regex based on a here-string will only match if the input happens to have the same style of newlines as the script file.
A replacement string based on a here-string will invariably use the script file's newline style.
[1] Your call looks like a .NET method call and while it happens to work in this case, such syntax confusion should be avoided: PowerShell cmdlets and functions are invoked like shell commands: without parentheses ((...)) and with whitespace-separated arguments.

See following demo:
$newText = @'
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
'@
$src = @'
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Other lines and second instance
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Some other lines
'@
# Join the pattern pieces with + (commas would build an array); \s+ matches CRLF or LF
$src -replace ('<<<<<<< HEAD\s+' +
'\[assembly: AssemblyVersion\("2\.0\.0\.0"\)\]\s+' +
'\[assembly: AssemblyFileVersion\("2\.0\.0\.0"\)\]\s+' +
'=======\s+' +
'\[assembly: AssemblyVersion\("1\.1\.0\.0"\)\]\s+' +
'\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\s+' +
'>>>>>>> v1_final_release'), $newText
Also, make sure your contents are read as one large string. This can be achieved using Get-Content $path -Raw or [System.IO.File]::ReadAllText($path).
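For comparison, here is a hedged Python sketch of the same "\s+ between the lines" idea. It is a variation rather than a port: it builds the pattern from the literal lines with re.escape instead of hand-escaping the brackets, parentheses and dots, and the CRLF sample text is a hypothetical, shortened AssemblyInfo.cs. Like -replace, re.sub replaces every occurrence.

```python
import re

# Hypothetical, shortened AssemblyInfo.cs content with CRLF line endings.
src = (
    "<<<<<<< HEAD\r\n"
    '[assembly: AssemblyVersion("2.0.0.0")]\r\n'
    '[assembly: AssemblyFileVersion("2.0.0.0")]\r\n'
    "=======\r\n"
    '[assembly: AssemblyVersion("1.1.0.0")]\r\n'
    '[assembly: AssemblyFileVersion("1.1.0.0")]\r\n'
    ">>>>>>> v1_final_release\r\n"
    "Some other lines\r\n"
)

# The conflict block's lines, matched literally (re.escape handles [ ] ( ) .);
# \s+ between them accepts CRLF or LF line breaks.
block_lines = [
    "<<<<<<< HEAD",
    '[assembly: AssemblyVersion("2.0.0.0")]',
    '[assembly: AssemblyFileVersion("2.0.0.0")]',
    "=======",
    '[assembly: AssemblyVersion("1.1.0.0")]',
    '[assembly: AssemblyFileVersion("1.1.0.0")]',
    ">>>>>>> v1_final_release",
]
pattern = r"\s+".join(re.escape(line) for line in block_lines)

new_text = (
    '[assembly: AssemblyVersion("2.0.0.0")]\r\n'
    '[assembly: AssemblyFileVersion("2.0.0.0")]'
)

# A function replacement inserts new_text verbatim (no backslash processing).
result = re.sub(pattern, lambda m: new_text, src)
print(result)
```

Note that \s+ also tolerates blank lines or trailing spaces between the block's lines, which is usually acceptable for this kind of cleanup.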

Related

PowerShell v5.1: problem with replacing text in a file using regex patterns

I tried to follow answers provided here and here.
In a "test1.txt" file I have these contents:
20220421
20220422
20220423
20220424:
222
I want to replace the contents so that they would look like this in the output file "test2.txt":
20220421:
20220422:
20220423:
20220424:
222
I attempted to achieve this with the following code:
(Get-Content '.\test1.txt').replace('^\d{8}$', '^\d{8}:$') | Out-File '.\test2.txt'
However, instead of the expected results, I got the following content in "test2.txt":
20220421
20220422
20220423
20220424:
222
Can someone explain why I'm not achieving the expected results?

You are not using the regex-based -replace operator (the .replace() string method you call performs a literal, non-regex replacement), and you are using a regex in the replacement instead of a proper replacement pattern.
You can use
((Get-Content '.\test1.txt') -replace '^(\d{8}):?\r?$', '$1:') | Out-File '.\test2.txt'
The ^(\d{8}):?\r?$ regex matches eight digits capturing them into Group 1, and then an optional colon, an optional CR and then asserts the end of string position.
The replacement is Group 1 value ($1) plus the colon char.
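The same pattern can be tried out in Python as a quick sketch (Python writes the group reference as \1 where PowerShell uses $1; the sample lines are the ones from the question). Applying it per line mirrors Get-Content's array of lines; for a single multi-line string, the MULTILINE flag makes ^ and $ match at each line, which corresponds to (?m) in a .NET regex.

```python
import re

# The input lines from the question, one string per line (like Get-Content).
lines = ["20220421", "20220422", "20220423", "20220424:", "222"]

# Same pattern as in the answer; Python writes the group reference as \1.
pattern = r"^(\d{8}):?\r?$"

fixed = [re.sub(pattern, r"\1:", line) for line in lines]
print(fixed)  # ['20220421:', '20220422:', '20220423:', '20220424:', '222']

# One multi-line string instead: MULTILINE makes ^ and $ match at every line.
text = "\n".join(lines)
fixed_text = re.sub(pattern, r"\1:", text, flags=re.MULTILINE)
print(fixed_text.splitlines() == fixed)  # True
```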

PowerShell's regex engine treats ^ and $ as the beginning and end of the whole string, not of individual lines, unless multiline mode ((?m)) is enabled.
This is not quite what you want - I can't get the line break in the replacement string.
@"
20220421
20220422
20220423
20220424:
222
"@ -replace "(\d{8})\n",'$1:'
line breaks not working

Perl multiline regex in windows

I'm stuck with this scenario, I have this regex
*Input added here for clarity:
181221533;MG;3;1476729;<vars> <vint> <name>mtest</name> <storedPrecedure>f_sc_mtest</SP> <base>M_data</base> <dataType>I</dataType> <timeMS>17</timeMS> <ttidr>abc</ttidr> <base>S</base> <valor>0</valor> </vint> </vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;MG;6314429;740484;<vars> <vint> <name>mtest</name> <sP>f_sc_mtest</sP> <base>sscy</base> <dataType>I</dataType> <timeMS>16</timeMS> <ttidr>abc</Idtype> <base>S</base> <valor>4</valor> </vint></vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeMS>0</timeMS> <Idtype>XYZ</Idtype> <base>O</base> </vint>
</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeProcess>1</timeProcess> <Idtype>XYZ</Idtype> <base>O</base> </vint>
</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36
I want to implement this regex in Perl with multiline support because, as you can see in the sample, some records contain line breaks; the regex searches for the 'incomplete' lines (and the extra line) and fixes them (each record/line should end with a datetime).
This is what I'm attempting with Perl:
perl.exe -0777 -i -pe "s/(?m)^(.*)(>)([\n]+)(<)(.*)([\n]+)(\s*)$/$1$2 $4$5/igs" "sample.txt"
It doesn't seem to work: I keep getting the same text file. I'm using Perl from a portable Git installation (v5.34.0).
Is there something I'm missing?
Edit: This is what the output should look like:
181221533;MG;3;1476729;<vars> <vint> <name>mtest</name> <storedPrecedure>f_sc_mtest</SP> <base>M_data</base> <dataType>I</dataType> <timeMS>17</timeMS> <ttidr>abc</ttidr> <base>S</base> <valor>0</valor> </vint> </vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;MG;6314429;740484;<vars> <vint> <name>mtest</name> <sP>f_sc_mtest</sP> <base>sscy</base> <dataType>I</dataType> <timeMS>16</timeMS> <ttidr>abc</Idtype> <base>S</base> <valor>4</valor> </vint></vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeMS>0</timeMS> <Idtype>XYZ</Idtype> <base>O</base> </vint> </vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeProcess>1</timeProcess> <Idtype>XYZ</Idtype> <base>O</base> </vint> </vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36

Capture the whole record and replace all newlines in it with a space, using another regex inside the replacement part (courtesy of the /e modifier). Then replace runs of multiple newlines with a single one:
perl.exe -0777 -wpe'
s{ (?:^|\R)\K (\d{9}; .*? \s+\d\d:\d\d:\d\d) }{$1 =~ s/\n+/ /r}segx; s{\n+}{\n}g
' file.txt
I consider a "record" to be: [0-9]{9}; at the beginning of a line or of the file, then everything up to and including a timestamp preceded by whitespace. These details about the start and end of a record should protect against accidentally matching unexpected patterns inside the tags.
This is cumbersome, but I hope it captures the record correctly, even if some details change.
Apparently the above fails on Windows as it stands, while it is confirmed to work on Linux (the only system I can try it on right now).
The issue must be the newlines, so try replacing \n in the patterns with \R or \r\n, in particular in the regex embedded in the replacement part. Or, to be safe and perhaps portable, replace \n with (\r?\n), so that the carriage return is optional and need not be present for a match.
So either
s{ (?:^|\R)\K (\d{9}; .*? \s+\d\d:\d\d:\d\d) }{$1 =~ s/\R+/ /r}segx; s{\R+}{\r\n}g
or
s{ (?:^|\R)\K(\d{9};.*?\s+\d\d:\d\d:\d\d) }{$1 =~ s/(\r\n)+/ /r}segx; s{(\r\n)+}{\r\n}g
But \R should match it on Windows, so you should be able to use \R for matching and \r\n where needed in replacements. See \R under "Misc" in perlbackslash.
Better yet, if it works, is to use PerlIO layers. Normally a Windows build of Perl adds the :crlf layer by default, but that seems not to be the case here.
In a one-liner try:
perl.exe -0777 -Mopen=:std,IO,:crlf -wpe'...'
Or, use the "one-liner" as a normal program without the file-processing switches, set up the layer via the open pragma, and read the input yourself:
perl -wE'use open IO => ":crlf"; $_ = do { local $/; <> }; s{...}{...}; say' file
With layers set like this (in either way) use the regex with \n.
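The capture-then-rewrite approach (a nested substitution via /e) has a direct counterpart in regex engines that accept a function as the replacement. The following Python sketch is an assumption-laden port, not the answer's tested Perl: Python has neither \K nor \R, so ^ with re.MULTILINE marks the record start and (?:\r?\n)+ stands in for \R+, and the shortened input is hypothetical.

```python
import re

# Hypothetical, shortened records: each starts with nine digits and ";" and
# ends with a "dd/mm/yyyy hh:mm:ss" timestamp; two are split across lines.
data = (
    "181221533;<valor>0</valor> </vars>;889;01/01/2019;29/05/2019 17:17:48\n"
    "182652984;<base>O</base> </vint>\n"
    "</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22\n"
    "\n"
    "182652988;<base>O</base> </vint>\n"
    "</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36\n"
)

# ^ with MULTILINE replaces (?:^|\R)\K; DOTALL lets .*? run across newlines.
record = re.compile(r"^(\d{9};.*?\s+\d\d:\d\d:\d\d)", re.MULTILINE | re.DOTALL)

# The function replacement plays the role of Perl's /e: inside each record,
# every run of line breaks becomes a single space.
joined = record.sub(lambda m: re.sub(r"(?:\r?\n)+", " ", m.group(1)), data)

# Finally squeeze the remaining runs of line breaks into one "\n".
result = re.sub(r"(?:\r?\n)+", "\n", joined)
print(result, end="")
```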

If the issue is having newlines in the wrong place, either multiple newlines in a row, or before a <, you may get away with something simple like this:
use strict;
use warnings;
my $str = do { local $/; <DATA> };
$str =~ s/\n(?=[<\n])//g;
print $str;
__DATA__
181221533;<valor>0</valor></vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;</vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;</vint>
</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988; </vint>
</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36
(I shortened the input to make it readable)
Output:
181221533;<valor>0</valor></vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;</vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;</vint></vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988; </vint></vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36
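For readers outside Perl, the same lookahead substitution carries over unchanged to Python's re module. This sketch reuses the shortened input from this answer and should reproduce the output shown; like the Perl version, it assumes LF line endings.

```python
import re

# The shortened input from the answer (LF line endings).
data = (
    "181221533;<valor>0</valor></vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48\n"
    "182652972;</vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48\n"
    "182652984;</vint>\n"
    "</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22\n"
    "182652988; </vint>\n"
    "</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36\n"
)

# Drop every newline that is directly followed by "<" or by another newline.
result = re.sub(r"\n(?=[<\n])", "", data)
print(result, end="")
```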

This seems to produce the wanted output:
perl.exe -0777 -pe "s: *\n(?=</): :g;s/\n+/\n/g"
The first substitution replaces any run of spaces followed by a newline that precedes </ with a single space.
The second substitution replaces runs of multiple newlines with a single one. You can also use a transliteration instead: tr/\n//s, where /s "squeezes" repeated newlines.
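Sketched in Python (the two-record sample is hypothetical and uses LF line endings), the two substitutions look like this; re.sub(r"\n+", "\n", ...) does the same job as tr/\n//s.

```python
import re

# Hypothetical two-record sample: each record is broken before "</vars>",
# and an empty line separates the records.
text = (
    "182652984;<base>O</base> </vint>\n"
    "</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22\n"
    "\n"
    "182652988;<base>O</base> </vint>\n"
    "</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36\n"
)

# 1) Spaces plus the newline that come right before "</" become one space.
step1 = re.sub(r" *\n(?=</)", " ", text)

# 2) Runs of newlines are squeezed into a single newline.
result = re.sub(r"\n+", "\n", step1)
print(result, end="")
```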

How can I replace anything before first forward slash using bash script?

In a GitHub workflow I have the following command:
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]')
This would return a value like something/something-else/another
I am looking to extend this script to replace everything before the first forward slash with thisword.
This would output thisword/something-else/another.
Can a regex be used in this one-line script to do the replacement? I believe I could use the regex /^[^/]+/, but I'm unsure how to combine it with the current script.

Depending on the version and distro of sed (apologies, but there are many with different syntax and flags), you might be able to do something like:
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]' | sed 's/^[a-z]*\//thisword\//' )
sed finds and replaces text starting at the beginning of the line (^): any number (*) of lowercase letters ([a-z]), followed by the first slash. The slashes in the pattern and replacement are escaped with a backslash (\) so that sed doesn't take them as delimiters. To clarify sed's use of /, here's the same command with your search and replacement text stripped out: sed 's/find/replace/'.
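To check the regex itself outside the workflow, here is a small Python sketch (the branch names are hypothetical). ^[a-z]*/ is the sed pattern from this answer; ^[^/]*/ is close to the /^[^/]+/ idea from the question (with * instead of +) and also accepts digits and hyphens in the first segment.

```python
import re

ref = "something/something-else/another"  # value from the question

# The sed pattern: leading lowercase letters, then the first "/".
narrow = re.sub(r"^[a-z]*/", "thisword/", ref)
print(narrow)  # thisword/something-else/another

other = "feature-12/x/y"  # hypothetical branch name with a digit and hyphen
print(re.sub(r"^[a-z]*/", "thisword/", other))  # unchanged: feature-12/x/y

# Broader variant: anything except "/" in the first segment.
broad = re.sub(r"^[^/]*/", "thisword/", other)
print(broad)  # thisword/x/y
```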

Try the regex below:
^([a-z]*)(\/)
function formatData() {
var str = "something/something-else/another";
// Replace the leading lowercase segment and the first slash.
var res = str.replace(/^([a-z]*)(\/)/, "thisword/");
document.getElementById("demo").innerHTML = res;
}

Assuming MY_DIR holds something/something-else/another, you can use
MY_DIR="something/something-else/another"
MY_DIR="thisword/${MY_DIR#*/}"
echo "$MY_DIR"
See the online demo.
This is an example of parameter expansion, where # removes the shortest prefix matching the pattern that follows it, and the */ glob matches any text up to and including a /. Because the shortest match is removed, only the first path segment is stripped.
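The parameter expansion can be mirrored without a regex. In this Python sketch, replace_first_segment is a hypothetical helper name, and str.partition splits at the first / just as ${MY_DIR#*/} strips the shortest prefix. As in bash, a value without any / is kept whole.

```python
def replace_first_segment(path: str, word: str) -> str:
    """Hypothetical helper mirroring "$word/${path#*/}" in bash."""
    # partition splits at the FIRST "/", like removing the shortest "*/" prefix.
    _head, sep, tail = path.partition("/")
    # Without any "/", ${path#*/} leaves the value unchanged.
    rest = tail if sep else path
    return f"{word}/{rest}"


print(replace_first_segment("something/something-else/another", "thisword"))
# thisword/something-else/another
```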

Replacing a block of text in powershell

I have the following (sample) text:
line1
line2
line3
I would like to use the powershell -replace method to replace the whole block with:
lineA
lineB
lineC
I'm not sure how to format this to account for the carriage returns/line breaks... Just encapsulating it in quotes like this doesn't work:
{$_ -replace "line1
line2
line3",
"lineA
lineB
lineC"}
How would this be achieved? Many thanks!

There is nothing syntactically wrong with your command - it's fine to spread string literals and expressions across multiple lines (but see caveat below), so the problem likely lies elsewhere.
Caveat re line endings:
If you use actual line breaks in your string literals, they'll implicitly be encoded based on your script file's line-ending style (CRLF on Windows, LF-only on Unix) - and may not match the line endings in your input.
By contrast, if you use the control-character escapes `r`n (CRLF) or `n (LF-only) in double-quoted strings, as demonstrated below, you can not only represent multiline strings on a single line, but also make the line-ending style explicit and independent of the script file's own line endings, which is preferable.
In the remainder of this answer I'm assuming that the input has CRLF (Windows-style) line endings; to handle LF-only (Unix-style) input instead, simply replace all `r`n instances with `n.
I suspect that you're not sending your input as a single, multiline string, but line by line, in which case your replacement command will never find a match.
If your input comes from a file, be sure to use Get-Content's -Raw parameter to ensure that the entire file content is sent as a single string, rather than line by line; e.g.:
Get-Content -Raw SomeFile |
ForEach-Object { $_ -replace "line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC" }
Alternatively, since you're replacing literals, you can use the [string] type's Replace() method, which operates on literals (this has the advantage that you don't need to worry about escaping regular-expression metacharacters in the search string or $ substitutions in the replacement string):
Get-Content -Raw SomeFile |
ForEach-Object { $_.Replace("line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC") }
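A quick Python sketch (a stand-in for PowerShell; the CRLF sample text is hypothetical) contrasts the two approaches: a literal str.replace, like [string].Replace(), needs no escaping, while a regex replacement needs re.escape on the search text and some care with the replacement text.

```python
import re

# Hypothetical CRLF input, read as one string (as with Get-Content -Raw).
text = "intro\r\nline1\r\nline2\r\nline3\r\noutro\r\n"

old = "line1\r\nline2\r\nline3"
new = "lineA\r\nlineB\r\nlineC"

# Literal replacement, like [string].Replace(): nothing needs escaping.
literal = text.replace(old, new)

# Regex replacement, like -replace: escape the search text, and pass the new
# text through a function so Python does not interpret backslashes in it
# (PowerShell's -replace gives $ in the replacement a special meaning instead).
regex = re.sub(re.escape(old), lambda m: new, text)

print(literal == regex)  # True
print(repr(literal))     # 'intro\r\nlineA\r\nlineB\r\nlineC\r\noutro\r\n'
```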
MatthewG's answer adds a twist that makes the replacement more robust: appending a final line break to ensure that only a line matching line 3 exactly is considered:
"line1`r`nline2`r`nline3" -> "line1`r`nline2`r`nline3`r`n" and
"lineA`r`nlineB`r`nlineC" -> "lineA`r`nlineB`r`nlineC`r`n"

In Powershell you can use `n (backtick-n) for a newline character.
-replace "line1`nline2`nline3`n", "lineA`nlineB`nlineC`n"

Regular expression for carriage returns occurring at the beginning or end of a file

I am looking for a way to remove 'stray' carriage returns occurring at the beginning or end of a file, i.e.:
\r\n <-- remove this guy
some stuff to say \r\n
some more stuff to say \r\n
\r\n <-- remove this guy
How would you match \r\n followed by 'nothing' or preceded by 'nothing'?

Try this regular expression:
^(\r\n)+|\r\n(\r\n)+$

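Depending on the language, use either the following regex in default (non-multiline) mode:
^\r\n|\r\n$
Or this regex:
\A\r\n|\r\n\z
The first one works in e.g. Perl, where without the /m modifier ^ and $ match only at the beginning and end of the string ($ also just before a final newline), whereas with /m they match at every line. The latter works in e.g. Ruby, where ^ and $ always match at line boundaries, so \A and \z are needed to anchor to the whole string.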
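In Python, \A anchors to the start of the string and \Z to its very end (Python's equivalent of \z in Perl and Ruby), so the anchored version can be sketched as follows; the sample text is hypothetical. The second call shows why ^ in multiline mode is the wrong tool here: it also matches an empty line in the middle.

```python
import re

# Hypothetical input with one stray CRLF at each end.
text = "\r\nsome stuff to say\r\nsome more stuff to say\r\n\r\n"

# \A = start of string, \Z = very end of string.
trimmed = re.sub(r"\A\r\n|\r\n\Z", "", text)
print(repr(trimmed))  # 'some stuff to say\r\nsome more stuff to say\r\n'

# By contrast, ^ in MULTILINE mode matches at every line start, so an empty
# line in the middle of the text is removed too.
middle = re.sub(r"^\r\n", "", "a\r\n\r\nb", flags=re.MULTILINE)
print(repr(middle))  # 'a\r\nb'
```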

Here's a sed version that strips the leading and trailing blank lines in place:
sed -i .bak -e '/./,$!d' -e :a -e '/^\n*$/{$d;N;ba' -e '}' foo.txt
The -i tells it to perform the edit in place, and the .bak tells it to back up the original with a .bak extension first (this is BSD/macOS sed syntax; GNU sed expects the suffix attached, as -i.bak). If disk space is a concern, you can use '' instead of .bak and no backup will be made. I don't recommend it unless absolutely necessary, though.
The first command ('/./,$!d') gets rid of all leading blank lines, and the rest handles all trailing blank lines.
See this list of handy sed 1-liners for other interesting things you can chain together.

^\s+|\s+$
\s is whitespace (space, \r, \n, tab)
+ is saying 1 or more
$ is saying at the end of the input
^ is saying at the start of the input
