Replace a character with control character Field Separator (\034) in PowerShell - regex

(Get-Content C:\Users\georgeji\Desktop\KAI\KAI_Block_2\Temp\KAI_ORDER_DATARECON3.NONPUBLISH) | Foreach-Object {$_ -replace "~", "\034"} | Set-Content C:\Users\georgeji\Desktop\KAI\KAI_Block_2\Temp\KAI_ORDER_DATARECON4.NONPUBLISH
I am using the command above to replace ~ in a text file with the Field Separator character.
The command runs successfully, but when I open the output file in Notepad++, I just see the plain text \034.
Ex:
Company_Identifier\034Primary_Transaction_ID
But the output should be like below
Please Help

Use
-replace "~", [char]0x1C
If you want to use it inside a longer string, you may use
-replace "~", "more $([char]0x1C) text"
The point here is that PowerShell does not support octal character escapes (nor \xYY or \uXXXX); the escape sequences it supports are limited to the following (see source):
`0 Null
`a Alert bell/beep
`b Backspace
`f Form feed (use with printer output)
`n New line
`r Carriage return
`r`n Carriage return + New line
`t Horizontal tab
`v Vertical tab (use with printer output)
The `r (carriage return) is ignored in the PowerShell ISE (Integrated Scripting Environment) host console; it does work in a regular PowerShell console session.
Using the Escape character to avoid special meaning.
`` To avoid using a Grave-accent as the escape character
`# To avoid using # to create a comment
`' To avoid using ' to delimit a string
`" To avoid using " to delimit a string
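Since an octal escape such as \034 is not available, the code point can be computed from the octal digits and then cast to [char]. The following is a minimal sketch under that assumption; the variable names and the sample string are illustrative only.

```powershell
# Convert the octal digits 034 to their numeric code point (28, i.e. 0x1C).
$codePoint = [Convert]::ToInt32('034', 8)

# Cast the number to [char] to obtain the actual Field Separator character.
$fs = [char]$codePoint

# -replace converts the [char] operand to a one-character string.
$result = 'Company_Identifier~Primary_Transaction_ID' -replace '~', $fs

# Inspect the code point at the position where ~ used to be (index 18).
[int]($result[18])
```

Casting explicitly keeps the intent visible in the script, whereas "\034" inside a PowerShell string is just four literal characters.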

Windows PowerShell does not currently (see below) have escape sequences for character literals like \034.
Instead, what you can do is cast the numerical ASCII or Unicode value to [char] in a subexpression:
"Company_Identifier$([char]0x1C)Primary_Transaction_ID"
Similarly, you can provide the same cast expression as the right-most operand to -replace, it'll be converted to a single-character string:
... |Foreach-Object {$_ -replace "~", [char]0x1C} |...
PowerShell 6.0 (currently in beta) introduces the `u{} escape sequence for unicode codepoint literals:
... |Foreach-Object {$_ -replace "~", "`u{1C}"} |...
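To check the result without relying on how an editor renders control characters, the output file can be read back and the FS characters counted. This is a sketch only: the temp-file names and sample content are hypothetical stand-ins for the real input and output paths.

```powershell
# Hypothetical sample files standing in for the real input and output.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'fs_demo_in.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'fs_demo_out.txt'
Set-Content -Path $inPath -Value 'Company_Identifier~Primary_Transaction_ID'

# Field Separator, code point 0x1C (octal 034).
$fs = [char]0x1C

# Replace every ~ with the FS character, line by line.
(Get-Content -Path $inPath) |
    ForEach-Object { $_ -replace '~', $fs } |
    Set-Content -Path $outPath

# Read the output back and count the FS characters it contains.
$outText = Get-Content -Path $outPath -Raw
$fsCount = @($outText.ToCharArray() | Where-Object { $_ -eq $fs }).Count
$fsCount
```

The count should equal the number of ~ characters in the input if the replacement worked.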

Related

Regex replace multilines in powershell

I want to replace these lines in my AssemblyInfo.cs, which is encoded in UTF-8 with Windows CRLF at the end of each line:
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
by these
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
To do so, I have a powershell script that will parse through all my files and do the replacement.
The regex I prepared on regex101 is this one, and it works there:
<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\("2\.0\.0\.0"\)\]\n\[assembly:\sAssemblyFileVersion\("2\.0\.0\.0"\)\]\n=======\n\[assembly:\sAssemblyVersion\("1\.1\.0\.0"\)\]\n\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\n>>>>>>>\sv1_final_release
I can't manage to make the -replace work on the new lines.
But when targeting only <<<<<<<\sHEAD, it matches and replacing is performed.
All the following variations failed :
<<<<<<<\sHEAD\n\[assembly: no error no replacement
<<<<<<<\sHEAD\r\n\[assembly: no error no replacement
<<<<<<<\sHEAD`r`n\[assembly: no error no replacement, write-host prints it as
<<<<<<<\sHEAD
\[assembly:
It's not about /gm or (*CRLF)
My powershell instruction for info :
$ConflictVersionRegex = "<<<<<<<\sHEAD\n\[assembly:\sAssemblyVersion\(`"2\.0\.0\.0`"\)\]\n\[assembly:\sAssemblyFileVersion\(`"2\.0\.0\.0`"\)\]\n=======\n\[assembly:\sAssemblyVersion\(`"1\.1\.0\.0`"\)\]\n\[assembly: AssemblyFileVersion\(`"1\.1\.0\.0`"\)\]\n>>>>>>>\sv1_final_release"
$ConflictVersionRegexTest = "<<<<<<<\sHEAD`r`n\[assembly:"
$fileContent = Get-Content($filePath)
$filecontent = $filecontent -replace $ConflictVersionRegexTest, $AssemblyNewVersion
[System.IO.File]::WriteAllLines($filePath, $fileContent, $Utf8NoBomEncoding)
What am I missing ? Why is it not replacing ?
Many thanks
Based on feedback from Poutrathor (the OP), there were two problems:
The primary problem was that Get-Content($filePath) (which should be written as Get-Content $filePath [1]) reads the file line by line, which results in an array of lines when captured in a variable.
-replace then operates on each input line individually, which means that the line-spanning regex won't match anything.
Solution: Use Get-Content -Raw (PSv3+) to read the file as a whole into a single, multi-line string.
Secondarily, you mention needing to replace the regex newline (end-of-line) escape sequence \n (LF) with its PowerShell string-interpolation counterpart `n (note that PowerShell uses `, the backtick, as the escape character).
That is only necessary in the replacement string, in order to create actual, literal newlines (line breaks) on output; for matching newlines, the regex construct \n works as is.
However, on Windows, newlines are typically CRLF sequences, i.e., a CR (\r, `r) immediately followed by a LF (\n / `n) - i.e., \r\n/ `r`n - whereas on Unix-like platforms they are just LF, \n / `n.
If you're not sure which style of newlines given input has, use \r?\n to match newlines in a cross-platform-compatible manner.
If you don't care what specific newlines the input has, this is safe to use methodically, as a matter of habit.
Therefore:
In your regex, while in your case you can choose between \r\n and `r`n, note that:
`r`n only works in double-quoted "..." strings.
It is generally preferable to use literal, single-quoted strings to store regexes - which requires use of \r\n (Windows) / \n (Unix) / \r?\n (platform-agnostic) - so that there's no confusion over which parts of the string PowerShell interpolates up front vs. which parts are interpreted by the regex engine.
In your replacement string, use `r`n inside "..." to create actual newlines.
As an alternative to using escape sequences to represent newlines, you can use here-strings to conveniently define multi-line strings with actual newlines (line breaks), as shown in Paweł Dyl's answer, but there's a caveat:
Here-strings invariably have the same style of newline as the enclosing script file, which means that:
A regex based on a here-string will only match if the input happens to have the same style of newlines as the script file.
A replacement string based on a here-string will invariably use the script file's newline style.
[1] Your call looks like a .NET method call and while it happens to work in this case, such syntax confusion should be avoided: PowerShell cmdlets and functions are invoked like shell commands: without parentheses ((...)) and with whitespace-separated arguments.
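The difference between line-by-line and whole-file reading can be seen in a small experiment. This is a sketch, assuming a hypothetical temp file with CRLF line endings; the file name and content are made up for illustration.

```powershell
# Hypothetical sample file with CRLF line endings.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw_demo.txt'
[System.IO.File]::WriteAllText($path, "<<<<<<< HEAD`r`n[assembly: X]`r`n")

# Single-quoted regex; \r?\n matches both CRLF and LF-only newlines.
$pattern = '<<<<<<< HEAD\r?\n\[assembly:'

# Without -Raw: an array of lines, so the pattern is tested against each line separately.
$lines = Get-Content $path
$matchedPerLine = [bool]($lines -match $pattern)

# With -Raw: one multi-line string, so the pattern can span the line break.
$text = Get-Content $path -Raw
$matchedRaw = $text -match $pattern

"Per line: $matchedPerLine; raw: $matchedRaw"
```

Only the -Raw variant can match, because a pattern containing a newline never fits inside a single line of the array.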
See following demo:
# Replacement text, defined as a single-quoted here-string
$newText = @'
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
'@
# Sample input containing two conflict blocks
$src = @'
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Other lines and second instance
<<<<<<< HEAD
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]
=======
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
>>>>>>> v1_final_release
Some other lines
'@
# The pattern is concatenated into one string; \s+ matches the line breaks
$src -replace ('<<<<<<< HEAD\s+'+
'\[assembly: AssemblyVersion\("2\.0\.0\.0"\)\]\s+'+
'\[assembly: AssemblyFileVersion\("2\.0\.0\.0"\)\]\s+'+
'=======\s+'+
'\[assembly: AssemblyVersion\("1\.1\.0\.0"\)\]\s+'+
'\[assembly: AssemblyFileVersion\("1\.1\.0\.0"\)\]\s+'+
'>>>>>>> v1_final_release'),$newText
Also, make sure your contents are read as one large string. This can be achieved using Get-Content $path -Raw or [System.IO.File]::ReadAllText($path).
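For the file round trip, a sketch along the following lines may help. It is a variation, not the method from the answers: it assumes a hypothetical temp file, uses a generalized pattern that keeps whatever is on the HEAD side of the conflict, and writes the result back as UTF-8 without a BOM.

```powershell
# Hypothetical sample file containing one merge-conflict block (CRLF newlines).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'assemblyinfo_demo.cs'
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$conflict = "<<<<<<< HEAD`r`nA`r`n=======`r`nB`r`n>>>>>>> v1_final_release`r`n"
[System.IO.File]::WriteAllText($path, $conflict, $utf8NoBom)

# Read the whole file as a single string.
$text = [System.IO.File]::ReadAllText($path)

# Assumed generalization: capture the HEAD side in a named group and drop the rest.
$pattern = '<<<<<<< HEAD\r?\n(?<head>[\s\S]*?)=======\r?\n[\s\S]*?>>>>>>> v1_final_release\r?\n'
$resolved = $text -replace $pattern, '${head}'

# Write the result back as UTF-8 without a BOM.
[System.IO.File]::WriteAllText($path, $resolved, $utf8NoBom)
```

The replacement string is single-quoted so that PowerShell leaves ${head} alone and the regex engine expands it.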

Powershell: how to replace quoted text from a batch file

I have a text file that contains:
#define VERSION "0.1.2"
I need to replace that version number from a running batch file.
set NEW_VERSION="0.2.0"
powershell -Command "(gc BBB.iss) -replace '#define VERSION ', '#define VERSION %NEW_VERSION% ' | Out-File BBB.iss"
I know that my match pattern is not correct. I need to select the entire line including the "0.2.0", but I can't figure out how to escape all that because it's all enclosed in double quotes so it runs in a batch file.
I'm guessing that [0-9].[0-9].[0-9] will match the actual old version number, but what about the quotes?
but what about the quotes?
When calling PowerShell's CLI from cmd.exe (a batch file) with powershell -command "....", use \" to pass embedded ".
(This may be surprising, given that PowerShell-internally you typically use `" or "" inside "...", but it is the safe choice from the outside.[1])
Note:
While \" works robustly on the PowerShell side, it can situationally break cmd.exe's parsing. In that case, use "^"" (sic) with powershell.exe (Windows PowerShell), and "" with pwsh.exe (PowerShell (Core) 7+), inside overall "..." quoting. See this answer for details.
Here's an approach that matches and replaces everything between "..." after #define VERSION :
:: Define the new version *without* double quotes
set NEW_VERSION=0.2.0
powershell -Command "(gc BBB.iss) -replace '(?<=#define VERSION\s+\").+?(?=\")', '%NEW_VERSION%' | Set-Content -Encoding ascii BBB.iss"
Note that using Out-File (as used in the question) to rewrite the file creates a UTF-16LE ("Unicode") encoded file, which may be undesired; use Set-Content -Encoding ... to control the output encoding. The above command uses Set-Content -Encoding ascii as an example.
Also note that rewriting an existing file this way (read existing content into memory, write modified content back) bears the slight risk of data loss, if writing the file gets interrupted.
(?<=#define VERSION\s+\") is a look-behind assertion ((?<=...)) that matches literal #define VERSION followed by at least one whitespace character (\s+) and a literal ".
Note how the " is escaped as \", which - surprisingly - is how you need to escape literal " characters when you pass a command to PowerShell from cmd.exe (a batch file).[1]
.+? then non-greedily (?) matches one or more (+) characters (.)...
...until the closing " (escaped as \") is found via (?=\"), a look-ahead assertion ((?=...)).
The net effect is that only the characters between "..." are matched - i.e., the mere version number - which then allows replacing it with just '%NEW_VERSION%', the new version number.
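The look-around pattern can be tried directly inside a PowerShell session, where the embedded " needs no \ escaping because the regex is in a single-quoted string. A minimal sketch, with a sample line and variable names chosen for illustration:

```powershell
# Sample line as it would appear in the .iss file.
$line = '#define VERSION "0.1.2"'
$newVersion = '0.2.0'

# Match only the text between the double quotes that follow '#define VERSION'.
$updated = $line -replace '(?<=#define VERSION\s+").+?(?=")', $newVersion
$updated
```

Testing the regex this way separates regex problems from cmd.exe quoting problems.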
A simpler alternative, if all that is needed is to replace the 1st line, without needing to preserve specific information from it:
powershell -nop -Command "@('#define VERSION \"%NEW_VERSION%\"') + (gc BBB.iss | Select -Skip 1) | Set-Content -Encoding ascii BBB.iss"
The command simply creates an array (@(...)) of output lines from the new 1st line and (+) all but the 1st line from the existing file (gc ... | Select-Object -Skip 1) and writes that back to the file.
[1] When calling from cmd.exe, escaping an embedded " as "" sometimes, but not always, works (try
powershell -Command "'Nat ""King"" Cole'").
Instead, \"-escaping is the safe choice.
`", which is the typical PowerShell-internal way to escape " inside "...", never works when calling from cmd.exe.
You can try this,
powershell -Command "(gc BBB.iss) -replace '(?m)^\s*#define VERSION .*$', '#define VERSION %NEW_VERSION% ' | Out-File BBB.iss"
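If you want to keep the double quotes (with NEW_VERSION defined without quotes), escape them as \" so that they survive the call from cmd.exe:
powershell -Command "(gc BBB.iss) -replace '(?m)^\s*#define VERSION .*$', '#define VERSION \"%NEW_VERSION%\"' | Out-File BBB.iss"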

Replacing a block of text in powershell

I have the following (sample) text:
line1
line2
line3
I would like to use the powershell -replace method to replace the whole block with:
lineA
lineB
lineC
I'm not sure how to format this to account for the carriage returns/line breaks... Just encapsulating it in quotes like this doesn't work:
{$_ -replace "line1
line2
line3",
"lineA
lineB
lineC"}
How would this be achieved? Many thanks!
There is nothing syntactically wrong with your command - it's fine to spread string literals and expressions across multiple lines (but see caveat below), so the problem likely lies elsewhere.
Caveat re line endings:
If you use actual line breaks in your string literals, they'll implicitly be encoded based on your script file's line-ending style (CRLF on Windows, LF-only on Unix) - and may not match the line endings in your input.
By contrast, if you use control-character escapes `r`n (CRLF) vs. `n (LF-only) in double-quoted strings, as demonstrated below, you're not only able to represent multiline strings on a single line, but you also make the line-ending style explicit and independent of the script file's own encoding, which is preferable.
In the remainder of this answer I'm assuming that the input has CRLF (Windows-style) line endings; to handle LF-only (Unix-style) input instead, simply replace all `r`n instances with `n.
I suspect that you're not sending your input as a single, multiline string, but line by line, in which case your replacement command will never find a match.
If your input comes from a file, be sure to use Get-Content's -Raw parameter to ensure that the entire file content is sent as a single string, rather than line by line; e.g.:
Get-Content -Raw SomeFile |
ForEach-Object { $_ -replace "line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC" }
Alternatively, since you're replacing literals, you can use the [string] type's Replace() method, which operates on literals (which has the advantage of not having to worry about escaping regular-expression metacharacters in the search string, or $ in the replacement string):
Get-Content -Raw SomeFile |
ForEach-Object { $_.Replace("line1`r`nline2`r`nline3", "lineA`r`nlineB`r`nlineC") }
MatthewG's answer adds a twist that makes the replacement more robust: appending a final line break to ensure that only a line matching line 3 exactly is considered:
"line1`r`nline2`r`nline3" -> "line1`r`nline2`r`nline3`r`n" and
"lineA`r`nlineB`r`nlineC" -> "lineA`r`nlineB`r`nlineC`r`n"
In Powershell you can use `n (backtick-n) for a newline character.
-replace "line1`nline2`nline3`n", "lineA`nlineB`nlineC`n"
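If the line-ending style of the input is not known, the pattern can use \r?\n instead of a fixed `r`n or `n. The following sketch uses made-up sample strings and shows both that and the effect of the trailing line break.

```powershell
# Replacement block with explicit CRLF line endings.
$replacement = "lineA`r`nlineB`r`nlineC`r`n"

# Same content with CRLF and with LF-only line endings.
$crlfInput = "line1`r`nline2`r`nline3`r`nline4`r`n"
$lfInput   = "line1`nline2`nline3`nline4`n"

# \r?\n matches either newline style; the final \r?\n anchors the end of line3.
$pattern  = 'line1\r?\nline2\r?\nline3\r?\n'
$fromCrlf = $crlfInput -replace $pattern, $replacement
$fromLf   = $lfInput   -replace $pattern, $replacement

# Without the trailing newline, 'line3' would also match the start of 'line30'.
$tricky = "line1`r`nline2`r`nline30`r`n"
$loose  = $tricky -replace 'line1\r?\nline2\r?\nline3', 'X'
$strict = $tricky -replace $pattern, 'X'
```

Here $loose ends up with a stray 0 after the replacement, while $strict leaves the non-matching input untouched.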

Powershell: using -split "\s+" as opposed to .split "\s+"

Prelude
I am trying to perform an operation which requires me to parse every individual word in a particular file. The most straightforward way of doing this would be to load the text using:
$content = Get-Content -Path .\<filename>
Then I will break every individual word into an individual line (this allows me to do a word count AND single word search very quickly). The problem is when I then use this line of code:
$content.split("\s+")
which should create a new line (split) on every (one or more) whitespace character. Unfortunately, my results look like this:
$content.split("\s+")
The SpeechSynthe
izer cla
provide
acce
to the functionality of a
peech
ynthe
i
engine that i
in
talled on the ho
t computer. In
talled
peech
ynthe
i
engine
But when I run
$content -split("\s+")
The results will come out correctly:
$content -split("\s+")
The
SpeechSynthesizer
class
provides
access
to
the
functionality
of
a
speech
synthesis
My question
Using PowerShell v4, I am having trouble understanding the difference between performing the operation
$content.split("\s+")
and
$content -split("\s+")
and why they output different results.
Is that functionality just broken?
Is there some other difference that I am not aware of at play here?
See Powershelladmin wiki:
The -split operator takes a regular expression, and to split on an arbitrary amount of whitespace, you can use the regexp "\s+".
And
To split on a single, or multiple, characters, you can also use the System.String object method Split().
PS C:\> 'a,b;c,d'.Split(',') -join ' | '
a | b;c | d
PS C:\> 'a,b;c,d'.Split(',;') -join ' | '
a | b | c | d
So, with $content.split("\s+") you just passed the characters to split on (the literal \, s and +), not a regex that matches whitespace; that is why every s disappears from the output.
In $content -split("\s+"), \s+ is a regex pattern matching 1 or more whitespace symbols.
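The two behaviours can be compared side by side. This sketch uses a made-up sample sentence and passes an explicit char array to Split(), an assumption made here so that the characters-as-separators behaviour does not depend on which Split() overload a given PowerShell version selects for a string argument.

```powershell
# Sample text with a double space between two of the words.
$text = 'The SpeechSynthesizer class  provides access'

# Regex split: \s+ matches one or more whitespace characters.
$byRegex = $text -split '\s+'

# Method split on the literal characters \, s and +.
$byChars = $text.Split('\s+'.ToCharArray())

# The .NET regex API gives the same result as the -split operator here.
$byRegexApi = [regex]::Split($text, '\s+')

$byRegex -join ' | '
$byChars[0]
```

The first element of $byChars is cut at the first lowercase s, which mirrors the broken output shown in the question.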

Usage of | in PowerShell regex

I'm trying to split some text using PowerShell, and I'm doing a little experimenting with regex, and I would like to know exactly what the "|" character does in a PowerShell regex. For example, I have the following line of code:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[|\]')}
Running this line of code gives me the following output:
-blank line-
02
: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png
If I run the code without the "|" in the -split statement as such:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[\]')}
I get the following output without the [] being stripped (essentially it's just displaying the select-string output):
[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png
If I modify the code and run it like this:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[|')}
In the output, the [ is stripped from the beginning but the output has a carriage return after each character (I did not include the full output for space purposes).
0
2
]
:
.
/
m
e
The Pipe character, "|", separates alternatives in regex.
You can see all the metacharacters defined here:
http://regexlib.com/CheatSheet.aspx?AspxAutoDetectCookieSupport=1
The answers already explain what the | is for but I would like to explain what is happening with each example that you have above.
-split '\[|\]': You are matching either [ or ], which is why you get 3 results. The first is an empty string: the (empty) text before the first [, which sits at the very beginning of the line.
-split '\[\]': Since you are omitting the | symbol in this example, you are requesting to split on the character sequence [], which does not appear in your string. This contrasts with the string method .Split('\[\]'), which treats its argument as a set of literal characters and would split on each \, [ and ] individually. This is by design.
-split '\[|': Here you are running into a caveat of not specifying the right-hand operand for the | operator. To quote the help from Regex101 when this regex is specified:
(null, matches any position)
Warning: An empty alternative effectively truncates the regex at this
point because it will always find a zero-width match
Which is why the last example split on every element. Also, I don't think any of this is PowerShell only. This behavior should be seen on other engines as well.
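The three cases can be reproduced on a shortened sample string (the path below is abbreviated for illustration and is not the one from the question):

```powershell
$s = '[02]: ./media/acs-01.png'

# Alternation: split on either [ or ].
$both = $s -split '\[|\]'

# No alternation: split on the literal sequence [], which does not occur.
$none = @($s -split '\[\]')

# Empty right-hand alternative: matches [ or the empty string at any position.
$empty = '[02]' -split '\[|'

"both: $($both.Count) parts; none: $($none.Count) part"
$empty -join ','
```

With the empty alternative, the characters come back one per element, possibly surrounded by empty strings, because a zero-width match is found between every pair of characters.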
Walter Mitty is correct, | is for alternation.
You can also use [Regex]::Escape("string") in PowerShell and it will return a string that has all the special characters escaped. So you can use that on any strings you want to match literally (or to determine if a specific character does or can have special meaning in a regex).
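As a short sketch of that idea, here is what happens when the text to split on is itself a | character (the sample string is made up):

```powershell
$text = 'a|b|c'

# Escape the pipe so the regex engine treats it as a literal character.
$escaped = [regex]::Escape('|')
$parts = $text -split $escaped

# Unescaped, | is an alternation between two empty patterns and matches everywhere.
$unescaped = $text -split '|'

$escaped
$parts -join ', '
```

Comparing the escaped string with the original is also a quick way to see whether a character is special in a regex: if the two differ, it is.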