Replacing a block of text in powershell - regex

I have the following (sample) text:
line1
line2
line3
I would like to use the PowerShell -replace operator to replace the whole block with:
lineA
lineB
lineC
I'm not sure how to format this to account for the carriage returns/line breaks... Just encapsulating it in quotes like this doesn't work:
{$_ -replace "line1
line2
line3",
"lineA
lineB
lineC"}
How would this be achieved? Many thanks!

There is nothing syntactically wrong with your command - it's fine to spread string literals and expressions across multiple lines (but see caveat below), so the problem likely lies elsewhere.
Caveat re line endings:
If you use actual line breaks in your string literals, they'll implicitly be encoded based on your script file's line-ending style (CRLF on Windows, LF-only on Unix) - and may not match the line endings in your input.
By contrast, if you use the control-character escapes `r`n (CRLF) vs. `n (LF-only) in double-quoted strings, as demonstrated below, you can not only represent multiline strings on a single line, but also make the line-ending style explicit and independent of the script file's own line-ending style, which is preferable.
In the remainder of this answer I'm assuming that the input has CRLF (Windows-style) line endings; to handle LF-only (Unix-style) input instead, simply replace all `r`n instances with `n.
I suspect that you're not sending your input as a single, multiline string, but line by line, in which case your replacement command will never find a match.
If your input comes from a file, be sure to use Get-Content's -Raw parameter to ensure that the entire file content is sent as a single string, rather than line by line; e.g.:
Get-Content -Raw SomeFile |
ForEach-Object { $_ -replace "line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC" }
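To see why line-by-line input defeats a multi-line pattern, here is a minimal Python sketch of the same situation. It is an analogy rather than PowerShell code: `re.sub` stands in for -replace, iterating over `splitlines()` stands in for Get-Content without -Raw, and the CRLF sample content is made up.

```python
import re

# Made-up file content with CRLF (Windows-style) line endings.
content = "line1\r\nline2\r\nline3\r\nmore\r\n"

pattern = "line1\r\nline2\r\nline3"
replacement = "lineA\r\nlineB\r\nlineC"

# Line by line (like Get-Content without -Raw): no single line contains
# the whole block, so nothing is replaced.
per_line = [re.sub(pattern, replacement, line) for line in content.splitlines()]

# Whole content as one string (like Get-Content -Raw): the block matches.
whole = re.sub(pattern, replacement, content)

print(per_line)     # ['line1', 'line2', 'line3', 'more']
print(repr(whole))  # 'lineA\r\nlineB\r\nlineC\r\nmore\r\n'
```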
Alternatively, since you're replacing literals, you can use the [string] type's Replace() method, which operates on literals; this has the advantage that you don't need to escape regular-expression metacharacters in the search string (or $ in the replacement string):
Get-Content -Raw SomeFile |
ForEach-Object { $_.Replace("line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC") }
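The difference between literal and regex replacement is easy to demonstrate with a search string that contains metacharacters. In this rough Python analogy, `str.replace` plays the role of the [string] type's Replace() method and `re.sub` plays the role of -replace; the sample text is made up.

```python
import re

text = "version (1.0)"
old, new = "(1.0)", "(2.0)"

# Literal replacement (analogous to [string].Replace()): no character
# in `old` has special meaning.
literal = text.replace(old, new)

# Regex replacement with an unescaped pattern: "(1.0)" is parsed as a
# capturing group around "1", any character, "0", so only "1.0" is
# matched and the original parentheses stay in place.
unescaped = re.sub(old, new, text)

# Escaping the pattern restores literal matching.
escaped = re.sub(re.escape(old), new, text)

print(literal)    # version (2.0)
print(unescaped)  # version ((2.0))
print(escaped)    # version (2.0)
```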
MatthewG's answer adds a twist that makes the replacement more robust: appending a final line break to ensure that only a line matching line 3 exactly is considered:
"line1`r`nline2`r`nline3" -> "line1`r`nline2`r`nline3`r`n" and
"lineA`r`nlineB`r`nlineC" -> "lineA`r`nlineB`r`nlineC`r`n"

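Why the trailing line break helps can be shown with a made-up input whose third line is line30 rather than line3. This Python sketch uses `str.replace` as a stand-in for the literal Replace() approach:

```python
# Made-up input whose third line is "line30", not "line3".
text = "line1\r\nline2\r\nline30\r\n"

# Without a trailing line break, "line3" also matches the start of "line30".
loose = text.replace("line1\r\nline2\r\nline3", "lineA\r\nlineB\r\nlineC")

# With the trailing line break, only an exact "line3" line matches.
strict = text.replace("line1\r\nline2\r\nline3\r\n", "lineA\r\nlineB\r\nlineC\r\n")

print(repr(loose))   # 'lineA\r\nlineB\r\nlineC0\r\n'
print(repr(strict))  # 'line1\r\nline2\r\nline30\r\n'
```

One consequence of the stricter form: if line3 is the very last line and the input has no trailing newline, the block is no longer matched.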
In PowerShell you can use `n (backtick-n) for a newline character.
-replace "line1`nline2`nline3`n", "lineA`nlineB`nlineC`n"

Related

How to replace lines depending on the remaining text in file using PowerShell

I need to edit a .txt file using PowerShell. The problem is that I need to apply a change to a string only if the remaining part of the string matches some pattern. For example, I need to change 'specific_text' to 'other_text' only if the line ends with 'pattern':
'specific_text and pattern' -> changes to 'other_text and pattern'
But if the line doesn't end with pattern, I don't need to change it:
'specific_text and something else' -> no changes
I know about the -replace operator in PowerShell, but as far as I know it simply replaces every match of the regex. There is also the Select-String cmdlet, but I couldn't combine the two properly. My idea was to do it this way:
((get-content myfile.txt | select-string -pattern "pattern") -Replace "specific_text", "other_text") | Out-File myfile.txt
But this call rewrites the whole file and keeps only the lines that matched.
You may use
(get-content myfile.txt) -replace 'specific_text(?=.*pattern$)', "other_text" | Out-File myfile.txt
The specific_text(?=.*pattern$) pattern matches
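specific_text - the literal text specific_text
(?=.*pattern$) - a positive lookahead: specific_text must be immediately followed by any 0 or more characters other than a newline (as many as possible) and then pattern at the end of the string ($). Since Get-Content without -Raw passes each line as a separate string, "end of the string" here means end of the line.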
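The lookahead behaves the same way in Python's re module, which makes for a quick sanity check of the pattern. This sketch applies it to the two sample lines from the question, one line at a time, as -replace does for each element of the array that Get-Content returns:

```python
import re

# Sample lines from the question, processed one at a time.
lines = [
    "specific_text and pattern",
    "specific_text and something else",
]

# The lookahead checks the rest of the line but consumes nothing,
# so " and pattern" is left intact.
rx = re.compile(r"specific_text(?=.*pattern$)")
result = [rx.sub("other_text", line) for line in lines]

print(result)  # ['other_text and pattern', 'specific_text and something else']
```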

BASH - Replacement of regex match within a file

Given the following files:
input_file:
My inputfile, contains multiple line
and also special characters {}[]ä/
template_file:
Contains multiple lines,
also special characters {}[]ä/
##regex_match## <= must be replaced by input_file
Content goes on
abc
output_file:
Contains multiple lines,
also special characters {}[]ä/
My inputfile, contains multiple line
and also special characters {}[]ä/
Content goes on
abc
I thought about sed but that would be very cumbersome because of escaping and newlines. Is there any other solution in BASH?
A Perl solution, just for variety's sake:
perl -0777 -lpe'
BEGIN {
open $fh, "<", "input_file";
$input = $fh->getline
}
s/##regex_match##/$input/
' < template_file > output_file
sed -n -e '/##regex_match##/{r input_file' -e 'b' -e '}; p' template_file
If the regex is matched, read and output the input file and branch (end processing of the line and don't print it). Otherwise print the line.
The use of -e delimits parts of the sed commands so that the r command which reads the input file knows where the name of the file ends. Otherwise it would greedily consume the following sed commands as if they were part of the file name.
The curly braces delimit a block in the program that's like an if statement.
I tested this on macOS, but it should work much the same with GNU sed. macOS (BSD) sed is much pickier about -e (among other differences that don't come into play here).
A very slight variation on the technique Dennis Williamson already posted, merely for discussion purposes:
sed '/##regex_match##/ {
r input_file
d
}' template_file
Contains multiple lines,
also special characters {}[]ä/
My inputfile, contains multiple line
and also special characters {}[]ä/
Content goes on
abc
Cf. the manual.
He used -e options to pass commands, whereas I separated them with newlines. Usually a semicolon is enough, but r treats the rest of the line as the file name, so any command after it on the same line would be read as part of that name.
The d prevents the tag pattern from being printed.
With any awk in any shell on every UNIX box and with any characters:
$ awk 'NR==FNR{rec=rec sep $0; sep=ORS; next} /##regex_match##/{$0=rec} 1' input_file template_file
Contains multiple lines,
also special characters {}[]ä/
My inputfile, contains multiple line
and also special characters {}[]ä/
Content goes on
abc
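For comparison, here is the same line-oriented idea as the sed and awk answers, sketched in Python. The function name `splice` and its `marker` default are hypothetical; the point is that the inserted text is handled as a plain string, so none of its characters need escaping.

```python
def splice(template: str, insert: str, marker: str = "##regex_match##") -> str:
    """Replace each template line containing `marker` with `insert`, verbatim."""
    out = []
    for line in template.splitlines(keepends=True):
        if marker in line:
            # Plain string operations: characters such as \ & / { } [ ] in
            # `insert` are copied as-is, with no escaping needed.
            out.append(insert if insert.endswith("\n") else insert + "\n")
        else:
            out.append(line)
    return "".join(out)


template = (
    "Contains multiple lines,\n"
    "also special characters {}[]/\n"
    "##regex_match## <= must be replaced by input_file\n"
    "Content goes on\n"
    "abc\n"
)
insert = "My inputfile, contains multiple line\nand also special characters {}[]/\n"
print(splice(template, insert), end="")
```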

Regex replace multilines in powershell

I want to replace these lines in my AssemblyInfo.cs, which is encoded in UTF-8 with Windows CRLF at the end of each line:
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
by these
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
To do so, I have a powershell script that will parse through all my files and do the replacement.
The regex I prepared on regex101 is the following, and it works there:
<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\("2\.0\.0\.0"\)\]\n\[assembly:\sAssemblyFileVersion\("2\.0\.0\.0"\)\]\n=======\n\[assembly:\sAssemblyVersion\("1\.1\.0\.0"\)\]\n\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\n>>>>>>>\sv1_final_release
I can't manage to make the -replace work on the new lines.
But when targeting only <<<<<<<\sHEAD, it matches and replacing is performed.
All the following variations failed:
<<<<<<<\sHEAD\n\[assembly: no error no replacement
<<<<<<<\sHEAD\r\n\[assembly: no error no replacement
<<<<<<<\sHEAD`r`n\[assembly: no error no replacement, write-host prints it as
<<<<<<<\sHEAD
\[assembly:
It's not about /gm or (*CRLF)
My PowerShell code, for reference:
$ConflictVersionRegex = "<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\(`"2\.0\.0\.0`"\)\]\n\[assembly:\sAssemblyFileVersion\(`"2\.0\.0\.0`"\)\]\n=======\n\[assembly:\sAssemblyVersion\(`"1\.1\.0\.0`"\)\]\n\[assembly: AssemblyFileVersion\(`"1\.1\.0\.0`"\)\]\n>>>>>>>\sv1_final_release"
$ConflictVersionRegexTest = "<<<<<<<\sHEAD`r`n\[assembly:"
$fileContent = Get-Content($filePath)
$filecontent = $filecontent -replace $ConflictVersionRegexTest, $AssemblyNewVersion
[System.IO.File]::WriteAllLines($filePath, $fileContent, $Utf8NoBomEncoding)
What am I missing ? Why is it not replacing ?
Many thanks
Based on feedback from Poutrathor (the OP), there were two problems:
The primary problem was that Get-Content($filePath) (which should be written as Get-Content $filePath; see [1]) reads the file line by line, which results in an array of lines when captured in a variable.
-replace then operates on each input line individually, which means that the line-spanning regex won't match anything.
Solution: Use Get-Content -Raw (PSv3+) to read the file as a whole into a single, multi-line string.
Secondarily, you mention needing to replace the regex newline (end-of-line) escape sequence (\n) (LF) with its PowerShell string-interpolation counterpart (`n) - note that PowerShell uses `, the backtick, as the escape character:
Note that that is only necessary in the replacement string, in order to create actual, literal newlines (line breaks) on output - as opposed to using regex construct \n for matching newlines.
However, on Windows, newlines are typically CRLF sequences, i.e., a CR (\r, `r) immediately followed by a LF (\n / `n) - i.e., \r\n/ `r`n - whereas on Unix-like platforms they are just LF, \n / `n.
If you're not sure which style of newlines given input has, use \r?\n to match newlines in a cross-platform-compatible manner.
If you don't care which specific newline style the input has, this is safe to use routinely, as a matter of habit.
Therefore:
In your regex, while in your case you can choose between \r\n and `r`n, note that:
`r`n only works in double-quoted "..." strings.
It is generally preferable to use literal, single-quoted strings to store regexes - which requires use of \r\n (Windows) / \n (Unix) / \r?\n (platform-agnostic) - so that there's no confusion over which parts of the string PowerShell interpolates up front vs. which parts are interpreted by the regex engine.
In your replacement string, use `r`n inside "..." to create actual newlines.
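A quick way to check the \r?\n advice is a small Python test; Python's re should treat \r, \n and \[ the same way .NET regexes do for this purpose. The `[assembly: X]` sample lines are made up.

```python
import re

crlf = "<<<<<<< HEAD\r\n[assembly: X]\r\n"  # Windows-style input
lf = "<<<<<<< HEAD\n[assembly: X]\n"        # Unix-style input

# Raw string: \r, \n and \[ reach the regex engine untouched, much like a
# single-quoted PowerShell string passes them through to -replace.
portable = re.compile(r"<<<<<<< HEAD\r?\n\[assembly:")
lf_only = re.compile(r"<<<<<<< HEAD\n\[assembly:")

print([bool(portable.search(s)) for s in (crlf, lf)])  # [True, True]
print([bool(lf_only.search(s)) for s in (crlf, lf)])   # [False, True]
```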
As an alternative to using escape sequences to represent newlines, you can use here-strings to conveniently define multi-line strings with actual newlines (line breaks), as shown in Paweł Dyl's answer, but there's a caveat:
Here-strings invariably have the same style of newline as the enclosing script file, which means that:
A regex based on a here-string will only match if the input happens to have the same style of newlines as the script file.
A replacement string based on a here-string will invariably use the script file's newline style.
[1] Your call looks like a .NET method call and while it happens to work in this case, such syntax confusion should be avoided: PowerShell cmdlets and functions are invoked like shell commands: without parentheses ((...)) and with whitespace-separated arguments.
See the following demo:
$newText = @'
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
'@
$src = @'
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Other lines and second instance
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Some other lines
'@
# Join the pattern pieces with + (not commas) so -replace receives a single pattern string
$src -replace ('<<<<<<< HEAD\s+'+
'\[assembly: AssemblyVersion\("2\.0\.0\.0"\)\]\s+'+
'\[assembly: AssemblyFileVersion\("2\.0\.0\.0"\)\]\s+'+
'=======\s+'+
'\[assembly: AssemblyVersion\("1\.1\.0\.0"\)\]\s+'+
'\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\s+'+
'>>>>>>> v1_final_release'),$newText
Also, make sure your contents are read as one large string. This can be achieved using Get-Content $path -Raw or [System.IO.File]::ReadAllText($path).
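Here is a rough Python rendering of the demo above (illustrative only; the CRLF sample content is assembled from the question's lines). It shows that reading the whole file as one string, with \s+ between the pieces, matches both conflict blocks regardless of line-ending style:

```python
import re

conflict = "\r\n".join([
    "<<<<<<< HEAD",
    '[assembly: AssemblyVersion("2.0.0.0")]',
    '[assembly: AssemblyFileVersion("2.0.0.0")]',
    "=======",
    '[assembly: AssemblyVersion("1.1.0.0")]',
    '[assembly: AssemblyFileVersion("1.1.0.0")]',
    ">>>>>>> v1_final_release",
])
# The whole file as ONE string (cf. Get-Content -Raw), with CRLF endings.
src = conflict + "\r\nOther lines\r\n" + conflict + "\r\nSome other lines\r\n"

new_text = ('[assembly: AssemblyVersion("2.0.0.0")]\r\n'
            '[assembly: AssemblyFileVersion("2.0.0.0")]')

# \s+ between the pieces matches CRLF and LF alike.
pattern = (
    r'<<<<<<< HEAD\s+'
    r'\[assembly: AssemblyVersion\("2\.0\.0\.0"\)\]\s+'
    r'\[assembly: AssemblyFileVersion\("2\.0\.0\.0"\)\]\s+'
    r'=======\s+'
    r'\[assembly: AssemblyVersion\("1\.1\.0\.0"\)\]\s+'
    r'\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\s+'
    r'>>>>>>> v1_final_release'
)

# new_text contains no backslashes, so re.sub inserts it verbatim.
result = re.sub(pattern, new_text, src)
print(result.count("<<<<<<<"))  # 0
```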

BASH escaping double quotes within single quotes

I'm trying to write a bash function that would escape all double quotes within single quotes, e.g.:
'I need to escape "these" quotes with backslashes'
would become
'I need to escape \"these\" quotes with backslashes'
My take on it was:
Find pairs of single quotes in the input and extract them with grep
Pipe into sed, escape double quotes
Sed again the whole input and replace grep match with sedded match
I got as far as producing the correctly escaped quoted section, but substituting it back into the whole input fails.
The script code, copy-pasted:
# $1 - Full name, $2 - minified name
adjust_quotes ()
{
SINGLE_QUOTES=`grep -Eo "'.*'" $2`
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
sed -r "s|'.*'|$ESCAPED_QUOTES|g" "$2" > "$2.escaped"
mv "$2.escaped" $2
echo "Quotes escaped within single quotes on $2"
}
Random additional questions:
In the console, escaping the quote with only two backslashes works, but when the code is put in the script, I need four. I'd love to know why.
Could I modify this code into a loop to escape all pairs of single quotes, one after another until EOF?
Thanks!
P.S. I know this would probably be easier to do in e.g. Python, but I really need to keep it in bash.
Using BASH string replacement:
s='I need to escape "these" quotes with backslashes'
r="${s//\"/\\\"}"
echo "$r"
I need to escape \"these\" quotes with backslashes
Here's a pure bash solution, which does the transformation on stdin, printing to stdout. It reads the entire input into memory, so it won't work with really enormous files.
escape_enclosed_quotes() (
IFS=\'
read -d '' -r -a fields
for ((i=1; i<${#fields[@]}; i+=2)); do
fields[i]=${fields[i]//\"/\\\"}
done
printf %s "${fields[*]}"
)
I deliberately enclosed the body of the function in parentheses rather than braces, in order to force the body to run in a subshell. That limits the modification of IFS to the body, as well as implicitly making the variables used local.
The function uses the read builtin to read the entire input (since the line delimiter is set to NUL with -d '') into an array (-a) using a single quote as the field separator (IFS=\'). The result is that the parts of the input surrounded with single quotes are in the odd positions of the array, so the function loops over the odd indices to do the substitution only for those fields. I use bash's find-and-replace syntax instead of deferring to an external utility like sed.
This being bash, there are a couple of gotchas:
If the file contains a NUL, the rest of the file will be ignored.
If the last line of the file does not end with a newline, and the last character of that line is a single quote, it will not be output.
Both of the above conditions are impossible in a portable text file, so it's probably OK. All the same, worth taking note.
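The split-on-single-quotes idea is easy to check outside bash. Here is a minimal Python sketch (the function name just mirrors the bash answer; it is not part of any library) that escapes double quotes only in the odd-numbered fields:

```python
def escape_enclosed_quotes(text: str) -> str:
    # Split on single quotes: the odd-indexed fields are the parts that were
    # enclosed in single quotes (the same idea as the bash IFS=\' trick).
    fields = text.split("'")
    for i in range(1, len(fields), 2):
        fields[i] = fields[i].replace('"', '\\"')
    # Rejoin with the single quotes that split() removed.
    return "'".join(fields)


print(escape_enclosed_quotes("'I need to escape \"these\" quotes with backslashes'"))
# 'I need to escape \"these\" quotes with backslashes'
```

Because Python strings can hold NUL characters and str.split keeps trailing empty fields, this version should not be affected by the two bash gotchas listed above.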
The supplementary question: why are the extra backslashes needed in
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
Answer: It has nothing to do with that line being in a script. It has to do with your use of backticks (`...`) for command substitution, and the idiosyncratic and often unpredictable handling of backslashes inside backticks. This syntax is deprecated. Do not use it. (Not even if you see someone else using it in some random example on the internet.) If you had used the recommended $(...) syntax for command substitution, it would have worked as expected:
ESCAPED_QUOTES=$(echo $SINGLE_QUOTES | sed 's|"|\\"|g')
(More information is in the Bash FAQ linked above.)

Perl regex substitution not working with global modifier

I have code that looks like the following:
s/(["\'])(?:\\?+.)*?\1/(my $x = $&) =~ s|^(["\'])(.*src=)([\'"])\/|$1$2$3$1.\\$baseUrl.$1\/|g;$x/ge
Ignoring the last bit (and only leaving the part where the problems occur) the code becomes:
s/(["\'])(?:\\?+.)*?\1/replace-text-here/g
I have tried using both, but I still get the same problem: even though I am using the g modifier, this regex only matches and replaces the first occurrence. I don't know whether this is a Perl bug, but I was using a regex that matches everything between two quotes and also handles escaped quotes, following this blog post. As I understand it, the regex should match everything between the two quotes, replace it, and then look for another instance of the pattern, because of the g modifier.
For a bit of background information, I am not using any version declarations, and strict and warnings are turned on, yet no warnings have shown up. My script reads an entire file into a scalar (including newlines), then the regex operates directly on that scalar. It does seem to work on each line individually - just not multiple times on one line. Perl version 5.14.2, running on Cygwin 64-bit. It could be that Cygwin (or the Perl port) is messing something up, but I doubt it.
I also tried another example from that blog post, with atomic groups and possessive quantifiers replaced with equivalent code but without those features, but this problem still plagued me.
Examples:
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
Should become (with the shortened regex):
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
Yet it only becomes:
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
<?php echo ($sub->getTarget() != "")?"target=\"".$sub->getTarget()."\"":""; ?>
Should become:
<?php echo ($sub->getTarget() != replace-text-here)?replace-text-here.$sub->getTarget().replace-text-here:replace-text-here; ?>
And as above, only the first occurrence is changed.
(And yes, I realise this may spark a "don't use regex to parse HTML/PHP" response. But in this case I think regex is appropriate: I am not looking for context, I am looking for a string (anything within quotes) and performing an operation on that string, which is what regex is for.)
And just a note - these regexes are running in an eval function, and the actual regex is encoded in a single quoted string (which is why the single quotes are escaped). I will try any presented solution directly though to rule out my bad programming.
EDIT: As requested, a short script that presents the problems:
#!/usr/bin/perl -w
use strict;
my $data = "this is the first line, where nothing much happens
but on the second line \"we suddenly have some double quotes\"
and on the third line there are 'single quotes'
but the fourth line has \"double quotes\" AND 'single quotes', but also another \"double quote\"
the fifth line has the interesting one - \"double quoted string 'with embedded singles' AND \\\"escaped doubles\\\"\"
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
";
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
my $regex2 = 's/([\'"]).*?\1/replaced2!/g';
print $data."\n";
$_ = $data; # to make the regex operate on $_, as per the original script
eval($regex);
print $_."\n";
$_ = $data;
eval($regex2);
print $_; # just an example of an eval, but without the fancy possessive quantifiers
This produces the following output for me:
this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\""
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\replaced!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
this is the first line, where nothing much happens
but on the second line replaced2!
and on the third line there are replaced2!
but the fourth line has replaced2! AND replaced2!, but also another replaced2!
the fifth line has the interesting one - replaced2!escaped doubles\replaced2!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
Even within single-quotes, \\ gets processed as \, so this:
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
sets $regex to this:
s/(["'])(?:\?+.)*?\1/replaced!/g
which requires each character in the quoted-string to be preceded by one or more literal question-marks (\?+). Since you don't have lots of question-marks, this effectively means that you're requiring the string to be empty, either "" or ''.
The minimal fix is to add more backslashes:
my $regex = 's/(["\'])(?:\\\\?+.)*?\\1/replaced!/g';
but you really might want to rethink your approach. Do you really need to save the whole regex-replacement command as a string and run it via eval?
Update: this:
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
should be:
my $regex = 's/(["\'])(?:\\\\?+.)*?\1/replaced!/g';
since those single quotes there in the assignment turn \\ into \ and you want the regex to end up with \\.
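The effect of the lost backslash can be reproduced with Python's re module, used here only for illustration (Perl's engine should behave the same way for these simple constructs). The sketch applies both the pattern the eval'd code actually received and, roughly, the intended pattern to the first example line from the question:

```python
import re

line = '''<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>'''

# What the eval'd code actually received: the single-quoted Perl string
# turned \\ into \, so every character must be preceded by literal "?"s.
broken = r"""(["'])(?:\?+.)*?\1"""

# Roughly what was intended: an optional backslash, then any character.
# (The possessive "?+" is dropped, since Python's re only supports
# possessive quantifiers from 3.11 on.)
intended = r"""(["'])(?:\\?.)*?\1"""

print(re.sub(broken, "replace-text-here", line))    # only "" is replaced
print(re.sub(intended, "replace-text-here", line))  # both strings are replaced
```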
Please boil your problem down to a short script that demonstrates the problem (including input, bad output, eval and all). Taking what you do show and trying it:
use strict;
use warnings;
my $input = <<'END';
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
END
(my $output = $input) =~ s/(["\'])(?:\\?+.)*?\1/replace-text-here/g;
print $input,"becomes\n",$output;
produces for me:
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
becomes
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
as I would expect. What does it do for you?