Powershell: how to replace quoted text from a batch file - regex

I have a text file that contains:
#define VERSION "0.1.2"
I need to replace that version number from a running batch file.
set NEW_VERSION="0.2.0"
powershell -Command "(gc BBB.iss) -replace '#define VERSION ', '#define VERSION %NEW_VERSION% ' | Out-File BBB.iss"
I know that my match pattern is not correct. I need to select the entire line, including the "0.1.2", but I can't figure out how to escape all of that, because the whole command is enclosed in double quotes so that it can run from a batch file.
I'm guessing that [0-9].[0-9].[0-9] will match the actual old version number, but what about the quotes?

but what about the quotes?
When calling PowerShell's CLI from cmd.exe (a batch file) with powershell -command "....", use \" to pass embedded ".
(This may be surprising, given that PowerShell-internally you typically use `" or "" inside "...", but it is the safe choice from the outside.[1])
Note:
While \" works robustly on the PowerShell side, it can situationally break cmd.exe's parsing. In that case, use "^"" (sic) with powershell.exe (Windows PowerShell), and "" with pwsh.exe (PowerShell (Core) 7+), inside overall "..." quoting. See this answer for details.
Here's an approach that matches and replaces everything between "..." after #define VERSION :
:: Define the new version *without* double quotes
set NEW_VERSION=0.2.0
powershell -Command "(gc BBB.iss) -replace '(?<=#define VERSION\s+\").+?(?=\")', '%NEW_VERSION%' | Set-Content -Encoding ascii BBB.iss"
Note that in Windows PowerShell (powershell.exe), using Out-File (as in the question) to rewrite the file creates a UTF-16LE ("Unicode") encoded file, which may be undesired; use Set-Content -Encoding ... to control the output encoding. The command above uses Set-Content -Encoding ascii as an example.
Also note that rewriting an existing file this way (read existing content into memory, write modified content back) bears the slight risk of data loss, if writing the file gets interrupted.
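If that data-loss risk matters, one option is to write the modified content to a temporary file first and only then move it over the original. The following is a minimal sketch meant for a .ps1 script rather than a -Command string; the function name Update-VersionDefine and the .tmp suffix are hypothetical, not part of the answer. Because this code does not pass through cmd.exe, the literal " in the regex is written as-is rather than as \".

```powershell
# Hypothetical helper: replaces the quoted version number in "#define VERSION "..."" lines.
function Update-VersionDefine {
    param(
        [string] $Path,        # file to rewrite, e.g. BBB.iss
        [string] $NewVersion   # new version WITHOUT quotes, e.g. 0.2.0
    )
    # Read all lines first so the file is no longer being read when we write.
    $lines = Get-Content -LiteralPath $Path
    # Same look-behind/look-ahead pattern as in the answer, minus the cmd.exe \" escaping.
    $updated = $lines -replace '(?<=#define VERSION\s+").+?(?=")', $NewVersion
    # Write to a temporary file; the original is untouched until the move below,
    # so an interrupted write does not leave a truncated original behind.
    $tmp = "$Path.tmp"
    Set-Content -LiteralPath $tmp -Value $updated -Encoding ascii
    # -Force lets Move-Item overwrite the existing original file.
    Move-Item -LiteralPath $tmp -Destination $Path -Force
}
```

Usage would be something like Update-VersionDefine -Path BBB.iss -NewVersion 0.2.0.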
(?<=#define VERSION\s+\") is a look-behind assertion ((?<=...)) that matches literal #define VERSION followed by at least one whitespace character (\s+, typically a space or tab) and a literal "
Note how the " is escaped as \", which, surprisingly, is how you need to escape literal " characters when you pass a command to PowerShell from cmd.exe (a batch file).[1]
.+? then non-greedily (?) matches one or more (+) characters (.)...
...until the closing " (escaped as \") is found via (?=\"), a look-ahead assertion ((?=...))
The net effect is that only the characters between "..." are matched - i.e., the mere version number - which then allows replacing it with just '%NEW_VERSION%', the new version number.
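To see exactly what the pattern matches, it can be tried directly in a PowerShell session. This is a small sketch with made-up sample lines; outside cmd.exe the literal " is written as-is inside the single-quoted regex, not as \".

```powershell
# Made-up sample lines; the second one has a tab (not a space) after VERSION.
$lines = @('#define VERSION "0.1.2"', "#define VERSION`t`"1.0.0`"", '#define NAME "setup"')

# Look-behind: "#define VERSION", whitespace, opening quote. Look-ahead: closing quote.
# Only the text between the quotes is part of the match.
$pattern = '(?<=#define VERSION\s+").+?(?=")'

# -replace works element by element on an array; non-matching lines pass through unchanged.
$updated = $lines -replace $pattern, '0.2.0'
$updated
```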
A simpler alternative, if all that is needed is to replace the 1st line, without needing to preserve specific information from it:
powershell -nop -Command "@('#define VERSION \"%NEW_VERSION%\"') + (gc BBB.iss | Select -Skip 1) | Set-Content -Encoding ascii BBB.iss"
The command simply creates an array (@(...)) of output lines from the new 1st line and (+) all but the 1st line of the existing file (gc ... | Select-Object -Skip 1), and writes that back to the file.
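The same array construction, written as it would look inside a PowerShell script, can be checked with an in-memory stand-in for the file content. The sample lines below are made up for illustration.

```powershell
# Stand-in for (gc BBB.iss): made-up file content.
$lines = @('#define VERSION "0.1.2"', '#define NAME "setup"', '#define PUBLISHER "me"')
$newVersion = '0.2.0'

# New first line (`" is PowerShell's own escape for " inside "..."),
# followed by every line of the old content except the first.
$result = @("#define VERSION `"$newVersion`"") + ($lines | Select-Object -Skip 1)
$result
```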
[1] When calling from cmd.exe, escaping an embedded " as "" sometimes, but not always, works (try
powershell -Command "'Nat ""King"" Cole'").
Instead, \"-escaping is the safe choice.
`", which is the typical PowerShell-internal way to escape " inside "...", never works when calling from cmd.exe.

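You can try this:
powershell -Command "(gc BBB.iss) -replace '(?m)^\s*#define VERSION .*$', '#define VERSION %NEW_VERSION% ' | Out-File BBB.iss"
If you want to keep the double quotes (with NEW_VERSION defined without quotes, e.g. set NEW_VERSION=0.2.0), escape the embedded " as \" so that it doesn't end the batch file's "..." argument:
powershell -Command "(gc BBB.iss) -replace '(?m)^\s*#define VERSION .*$', '#define VERSION \"%NEW_VERSION%\"' | Out-File BBB.iss"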

Related

Powershell - Extract Non-UTF-8 Characters from multiple files and Re-write the new files and create a new file with the bad Characters (ebcdic?)

I have a small script that I can use to find and replace characters or strings in a file. It works, and I can use it to replace the non-UTF-8 characters.
What I need to do is run the script once and replace all the invalid data in one shot AND create another file that has the File name and bad characters.
Right now I have to run the script over and over with however many invalid characters I can ID by eyeball. Then I edit my tracking file with the contents of the script I ran and the File I ran it against.
Not efficient at all. Just to be clear, I have almost no clue how to code the second part of keeping track of what is corrected.
Can anyone offer a better way of doing this?
Thank you,
-Ron
$old = 'BAD DATA'
$new = ' '
$configFiles = Get-ChildItem . *.* -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace "$old", "$new" } |
Set-Content $file.PSPath
}
Here is a sample of my DATA...
"PARTHENIA STREET °212 "," "," "," ","CAUGA PARK "
The data '°' in hex is C2 B0. In the original file, before the FTP transfer, this was a single byte, hex 09. Not only did it convert wrongly, it also added a byte to the file.
Here's an example translating EBCDIC to ASCII, based on ASCII-to-EBCDIC or EBCDIC-to-ASCII and Working with non-native PowerShell encoding (EBCDIC), but note that the EBCDIC file is completely unrecognizable as text. It doesn't have a BOM.
The file was downloaded with sftp, but it sounds like it was already corrupted.
# Windows PowerShell 5.1 only: -Encoding Byte was removed in PowerShell 6+ (use -AsByteStream there)
"hi`tthere","how`tare" | set-content file.txt # tab 0x09 in the middle
# From ASCII to EBCDIC (the tab becomes byte 0x05 in EBCDIC)
$asciibytes = get-content file.txt -Encoding byte
$rawstring = [System.Text.Encoding]::ASCII.GetString($asciibytes)
$ebcdicbytes = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us').GetBytes($rawstring)
$ebcdicbytes | set-content ebcidic.txt -Encoding Byte
# From EBCDIC to ASCII
$ebcdicbytes = get-content ebcidic.txt -Encoding byte
$rawstring = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us').GetString($ebcdicbytes)
$asciibytes = [System.Text.Encoding]::ASCII.GetBytes($rawstring)
$asciibytes | set-content ascii.txt -Encoding Byte
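Because -Encoding Byte exists only in Windows PowerShell 5.1, here is a version-independent sketch of the same round trip that calls the .NET file APIs directly. Convert-FileEncoding is a hypothetical helper name. Under PowerShell 7 the call GetEncoding(37) assumes the legacy code-page encodings are available, which is the same assumption behind the -Encoding ebcdic-cp-us example further down.

```powershell
# Hypothetical helper: re-encodes a whole file from one encoding to another.
# Avoids -Encoding Byte, so it should work in Windows PowerShell 5.1 and PowerShell 7+.
function Convert-FileEncoding {
    param(
        [string] $InPath,
        [string] $OutPath,
        [System.Text.Encoding] $From,
        [System.Text.Encoding] $To
    )
    # [System.IO.File] resolves relative paths against the process directory,
    # not PowerShell's current location, so resolve both paths first.
    $inFull  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($InPath)
    $outFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutPath)
    $text = $From.GetString([System.IO.File]::ReadAllBytes($inFull))
    [System.IO.File]::WriteAllBytes($outFull, $To.GetBytes($text))
}

# Code page 37 is US EBCDIC, the code page also known as 'ebcdic-cp-us'.
$ebcdic = [System.Text.Encoding]::GetEncoding(37)
$ascii  = [System.Text.Encoding]::ASCII
```

Usage mirrors the script above, e.g. Convert-FileEncoding file.txt ebcidic.txt $ascii $ebcdic and then Convert-FileEncoding ebcidic.txt ascii.txt $ebcdic $ascii.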
Here's a script called nonascii.ps1 that strips non-ASCII characters (anything outside the printable range from space to tilde in the ASCII table, except tab) and writes the result back to the same file.
(get-content $args[0]) -replace '[^ -~\t]' | set-content $args[0]
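The question also asks for a record of which bad characters were removed from which file. Extending the one-liner above, here is a sketch (not from the answer) that returns the removed characters alongside the cleaned text. Remove-NonAscii, Invoke-NonAsciiCleanup and the default log name bad-chars.log are hypothetical names. Lines are processed one at a time, as in the one-liner, because reading with -Raw would make the regex strip the line breaks too. Invoke-NonAsciiCleanup rewrites files in place, so try it on a copy first.

```powershell
# Hypothetical helper: strips non-ASCII characters (same class as nonascii.ps1)
# and reports the distinct characters that were removed.
function Remove-NonAscii {
    param([string] $Text)
    $pattern = '[^ -~\t]'
    # @(...) guarantees an array, even when nothing matched.
    $bad = @([regex]::Matches($Text, $pattern) | ForEach-Object { $_.Value } | Select-Object -Unique)
    [pscustomobject]@{
        Text    = $Text -replace $pattern
        Removed = $bad
    }
}

# Hypothetical driver: cleans every file under $Root and appends one log line per
# changed file: the full path, a tab, then the removed characters as U+XXXX codes.
function Invoke-NonAsciiCleanup {
    param([string] $Root = '.', [string] $LogPath = 'bad-chars.log')
    Get-ChildItem -LiteralPath $Root -File -Recurse | ForEach-Object {
        $file = $_
        $results = @(Get-Content -LiteralPath $file.FullName | ForEach-Object { Remove-NonAscii $_ })
        $removed = @($results | ForEach-Object { $_.Removed } | Select-Object -Unique)
        if ($removed.Count -gt 0) {
            Set-Content -LiteralPath $file.FullName -Value ($results | ForEach-Object { $_.Text })
            $codes = $removed | ForEach-Object { 'U+{0:X4}' -f [int][char]$_ }
            Add-Content -LiteralPath $LogPath -Value ("{0}`t{1}" -f $file.FullName, ($codes -join ' '))
        }
    }
}
```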
Note that Windows PowerShell 5.1's Get-Content can't recognize UTF-8 files without a BOM unless you pass the -Encoding utf8 parameter:
get-content file -encoding utf8
Also note that PowerShell 6.2 and above can use any encoding known to .NET, although tab completion doesn't reflect this:
"hi`tthere" | set-content ebcidic.txt -encoding ebcdic-cp-us
get-content ebcidic.txt -encoding ebcdic-cp-us

Powershell JSON transformation removing unicode escape chars without removing literal \n

My issue is similar to this question:
Json file to powershell and back to json file
Importing and exporting ARM templates in PowerShell using ConvertFrom-Json and ConvertTo-Json introduces Unicode escape sequences.
I used the code here to unescape again.
Some example code (multiline for clarity):
$armADF = Get-Content -Path $armFile -Raw | ConvertFrom-Json
$armADFString = $armADF | ConvertTo-Json -Depth 50
$armADFString |
ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
Out-File $outputFile
Here's the doco I've been reading for Unescape
This results in the output file being identical, except that all instances of literal \n (that were in the original JSON file) are turned into actual newline characters, which breaks the ARM template.
If I don't include the Unescape code, the \n sequences are preserved, but so are the Unicode escape sequences, which also breaks the ARM template.
It seems like I need to pre-escape the \n so when I call Unescape they are turned into nice little \n. I've tried a couple of things like adding this before calling unescape.
$armADFString = $armADFString -replace("\\n","\u000A")
Which does not give me the results I need.
Anyone come across this and solved it? Any accomplished escape artists?
I reread the Unescape doco and noticed that it would also basically remove leading \ characters so I tried this unlikely bit of code:
$armADF = Get-Content -Path $armFile -Raw | ConvertFrom-Json
$armADFString = $armADF | ConvertTo-Json -Depth 50
$armADFString = $armADFString -replace("\\n","\\n")
$armADFString |
ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
Out-File $outputFile
Of course - replacing \\n with \\n makes complete sense :|
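Why the "\\n" to "\\n" replacement works: the two arguments are interpreted differently. The first is a regex, where \\ means one literal backslash; the second is a replacement string, where a backslash has no special meaning (only $ does). This sketch uses a made-up JSON snippet to show each stage.

```powershell
# Made-up JSON text: in a single-quoted PowerShell string the backslashes are literal,
# so \n and \u003c / \u003e are escape sequences, not real characters.
$json = '{"cmd":"line1\nline2","tag":"\u003cb\u003e"}'

# Regex '\\n' = a literal backslash followed by n.
# Replacement '\\n' is inserted verbatim, so every \n becomes \\n.
$pre = $json -replace '\\n', '\\n'

# Unescape turns \\ back into a single \ (leaving \n as text)
# and decodes \u003c -> < and \u003e -> >.
$out = [regex]::Unescape($pre)
$out
```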
More than happy for anyone to pose a more elegant solution.
EDIT: I am deploying ADF ARM templates, which are themselves JSON based. To cut a long story short, I also found I needed to add this to stop it unescaping legitimately escaped quotes:
$armADFString = $armADFString -replace('\\"','\\"')
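Since a more elegant solution was invited, a different technique (not the one used above) is to decode only the \uXXXX sequences and leave every other JSON escape (\n, \", \\) untouched, so no pre-escaping is needed. This is a sketch: ConvertFrom-UnicodeEscape is a hypothetical name, and it deliberately keeps the escapes for control characters, " and \ so that the output remains valid JSON.

```powershell
# Hypothetical helper: decodes \uXXXX escapes in JSON text, leaving other escapes alone.
function ConvertFrom-UnicodeEscape {
    param([string] $Json)
    # Match either an escaped backslash (\\) or a \uXXXX sequence. Consuming \\ pairs
    # keeps text like \\u0041 (a literal backslash followed by "u0041") intact.
    $pattern = '\\(\\|u([0-9a-fA-F]{4}))'
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        if (-not $m.Groups[2].Success) { return $m.Value }   # a \\ pair: keep as-is
        $code = [Convert]::ToInt32($m.Groups[2].Value, 16)
        # Keep escapes whose decoded character would break the JSON string.
        if ($code -lt 0x20 -or $code -eq 0x22 -or $code -eq 0x5C) { return $m.Value }
        return [string][char]$code
    }
    [regex]::Replace($Json, $pattern, $evaluator)
}
```

It would slot in where Unescape is called above, e.g. ConvertFrom-UnicodeEscape $armADFString | Out-File $outputFile.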

Replace a character with the control character Field Separator (\034) in PowerShell

(Get-Content C:\Users\georgeji\Desktop\KAI\KAI_Block_2\Temp\KAI_ORDER_DATARECON3.NONPUBLISH) | Foreach-Object {$_ -replace "~", "\034"} | Set-Content C:\Users\georgeji\Desktop\KAI\KAI_Block_2\Temp\KAI_ORDER_DATARECON4.NONPUBLISH
I am using the command above to replace ~ in a text file with the Field Separator character.
The command runs successfully, but when I open the output file in Notepad++, I just see the literal text \034.
Ex:
Company_Identifier\034Primary_Transaction_ID
But the output should be like below
Please Help
Use
-replace "~", [char]0x1C
If you want to use it inside a longer string, you may use
-replace "~", "more $([char]0x1C) text"
The point here is that, in Windows PowerShell, you cannot use an octal character representation (nor \xYY or \uXXXX), since the escape sequences it supports are limited to the following (see source):
`0 Null
`a Alert bell/beep
`b Backspace
`f Form feed (use with printer output)
`n New line
`r Carriage return
`r`n Carriage return + New line
`t Horizontal tab
`v Vertical tab (use with printer output)
The `r (carriage return) is ignored in the console of the PowerShell Integrated Scripting Environment (ISE) host application, but it does work in a regular PowerShell console session.
Using the Escape character to avoid special meaning.
`` To avoid using a Grave-accent as the escape character
`# To avoid using # to create a comment
`' To avoid using ' to delimit a string
`" To avoid using " to delimit a string
Windows PowerShell does not currently (see below) have escape sequences for character literals like \034.
Instead, you can cast the numeric ASCII or Unicode value to [char] in a subexpression:
"Company_Identifier$([char]0x1C)Primary_Transaction_ID"
Similarly, you can provide the same cast expression as the right-hand operand of -replace; it will be converted to a single-character string:
... |Foreach-Object {$_ -replace "~", [char]0x1C} |...
PowerShell 6.0 and later (6.0 was still in beta when this answer was written) introduce the `u{} escape sequence for Unicode code point literals:
... |Foreach-Object {$_ -replace "~", "`u{1C}"} |...
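For reference, \034 is an octal (C-style) escape: 3 × 8 + 4 = 28 decimal = 0x1C, the ASCII FS control character (officially named File Separator). If strings containing such octal escapes need to be converted, a sketch like the following could expand them; Expand-OctalEscape is a hypothetical helper, not a built-in cmdlet.

```powershell
# \034 octal -> 28 decimal -> 0x1C (FS)
$fs = [char][Convert]::ToInt32('034', 8)

# Hypothetical helper: expands three-digit octal escapes such as \034
# in a plain string into the characters they denote.
function Expand-OctalEscape {
    param([string] $Text)
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        [string][char][Convert]::ToInt32($m.Groups[1].Value, 8)
    }
    [regex]::Replace($Text, '\\([0-7]{3})', $evaluator)
}

# Both produce the FS character between the two field names.
$viaReplace = 'Company_Identifier~Primary_Transaction_ID' -replace '~', $fs
$viaOctal   = Expand-OctalEscape 'Company_Identifier\034Primary_Transaction_ID'
```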

Powershell replace function has escape characters

I am writing a batch script in which I am trying to replace a value in a prop file. I am using PowerShell for the replacement because I couldn't find a comparable way to do it in a batch script.
powershell -Command "(gc %PROPFILEPATH%) -replace '%FTPoldfilepath%', '%FTPnewfile%' | Set-Content %PROPFILEPATH%"
The variables %PROPFILEPATH%, %FTPoldfilepath% and %FTPnewfile% contain double backslashes (Eg: C:\\testing\\feed)
I realize that backslashes need to be escaped; can anyone show me how to apply that escaping here?
Use double backslashes. It does not hurt if they come through doubled, or even tripled.
You will need to use $ENV:PROPFILEPATH, $ENV:FTPoldfilepath, and $ENV:FTPnewfile in place of %PROPFILEPATH%, '%FTPoldfilepath%', and '%FTPnewfile%'.
If your goal is to load the current path, replace the old path with the new one and save the new path, consider doing so with a full script instead of a single command:
$oldftppath = 'c:\some\path'
$newftppath = 'c:\new\path'
$newpath = $ENV:PROFILEPATH.replace($oldftppath,$newftppath)
But then it gets tricky. If you need a persistent environment variable, you need to use the .NET Framework to set it. https://technet.microsoft.com/en-us/library/ff730964.aspx
[Environment]::SetEnvironmentVariable("TestVariable", "Test value.", "User")
So, using this syntax:
[Environment]::SetEnvironmentVariable("PROFILEPATH", "$newpath", "User")
Or use "Machine" instead of "User" as the target.
For one thing, as @Xalorous mentioned, you'll have to use PowerShell syntax for accessing environment variables:
powershell -Command "(gc $env:PROPFILEPATH) -replace $env:FTPoldfilepath, $env:FTPnewfile | Set-Content $env:PROPFILEPATH"
Also, only the search string needs to be escaped, not the replacement string (backslashes have no special meaning in the replacement; only $ does). You can use the Escape() method of the regex class for that:
powershell -Command "(gc $env:PROPFILEPATH) -replace [regex]::Escape($env:FTPoldfilepath), $env:FTPnewfile | Set-Content $env:PROPFILEPATH"
Escaping is required here, because the -replace operator treats the search string as a regular expression.
However, since you apparently want just a simple string replacement, not a regular expression match, you could also use the Replace() method of the source string:
powershell -Command "(gc $env:PROPFILEPATH) | % { $_.Replace($env:FTPoldfilepath, $env:FTPnewfile) } | Set-Content $env:PROPFILEPATH"
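The difference between the unescaped, escaped and literal routes can be checked with plain strings. This is a sketch with made-up values standing in for the file content, %FTPoldfilepath% and %FTPnewfile%; it also doubles any $ in the replacement, because $ (unlike \) is special in -replace replacement strings.

```powershell
# Made-up sample values with doubled backslashes, as described in the question.
$content = 'ftp.path=C:\\testing\\feed\\in'
$old     = 'C:\\testing\\feed'
$new     = 'D:\\archive\\feed'

# Unescaped: in the pattern, \\ means ONE literal backslash, so nothing matches.
$unescaped = $content -replace $old, $new

# Regex route: escape the pattern; double any $ so the replacement is taken literally.
$viaRegex = $content -replace [regex]::Escape($old), $new.Replace('$', '$$')

# Literal route: String.Replace() does no regex processing at all.
$viaLiteral = $content.Replace($old, $new)
```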
As a side note, since you're using PowerShell anyway, you should seriously consider writing the whole script in PowerShell. It usually makes things a lot easier.

Usage of | in PowerShell regex

I'm trying to split some text using PowerShell, and I'm doing a little experimenting with regex, and I would like to know exactly what the "|" character does in a PowerShell regex. For example, I have the following line of code:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[|\]')}
Running this line of code gives me the following output:
-blank line-
02
: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png
If I run the code without the "|" in the -split statement as such:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[\]')}
I get the following output without the [] being stripped (essentially it's just displaying the select-string output):
[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png
If I modify the code and run it like this:
"[02]: ./media/active-directory-dotnet-how-to-use-access-control/acs-01.png" | select-string '\[\d+\]:' | foreach-object {($_ -split '\[|')}
In the output, the [ is stripped from the beginning, but the output has a line break after each character (I did not include the full output for space purposes).
0
2
]
:
.
/
m
e
The Pipe character, "|", separates alternatives in regex.
You can see all the metacharacters defined here:
http://regexlib.com/CheatSheet.aspx?AspxAutoDetectCookieSupport=1
The answers already explain what the | is for but I would like to explain what is happening with each example that you have above.
-split '\[|\]': You are matching either [ or ], which is why you get 3 results. The first is an empty string, representing the (empty) text at the beginning of the line before the first [.
-split '\[\]': Since you omit the | symbol in this example, you are requesting to split on the character sequence [], which does not appear in your string. Contrast this with the code $_.split('\[\]'), which in Windows PowerShell splits on each of the individual characters \, [ and ], because the string argument is converted to a character array. This is by design.
-split '\[|': Here you are running into a caveat of not specifying the right hand operand for the | operator. To quote the help from Regex101 when this regex is specified:
(null, matches any position)
Warning: An empty alternative effectively truncates the regex at this
point because it will always find a zero-width match
This is why the last example splits at every position, i.e. between every character. Also, I don't think any of this is PowerShell-specific; the same behavior should be seen in other regex engines as well.
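The three cases can be reproduced with a plain string (a shortened, made-up version of the sample line) instead of Select-String output. The exact empty strings produced by the third case depend on the engine's handling of zero-width matches, so the comments describe only its overall shape.

```powershell
$line = '[02]: ./media/acs-01.png'   # shortened sample line

# Alternation: split on either [ or ]  ->  '', '02', ': ./media/acs-01.png'
$alt = $line -split '\[|\]'

# No alternation: split only on the two-character sequence [], which never occurs
$seq = $line -split '\[\]'

# Empty second alternative: a zero-width match is possible at every position,
# so the string is cut into single characters (plus empty strings).
$empty = $line -split '\[|'
```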
Walter Mitty is correct, | is for alternation.
You can also use [Regex]::Escape("string") in PowerShell, and it will return a string that has all the special characters escaped. So you can use that on any strings you want to match literally (or to determine whether a specific character does or can have special meaning in a regex).
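A short sketch of that idea: escaping a lone | turns it from an alternation operator (with two empty alternatives) into a literal pipe character.

```powershell
# [regex]::Escape('|') returns '\|', which matches a literal pipe.
$escapedPipe = [regex]::Escape('|')

# Unescaped: '|' is empty-or-empty, so it splits at every position.
$pieces = 'a|b' -split '|'

# Escaped: splits only at the literal pipe  ->  'a', 'b'
$literal = 'a|b' -split $escapedPipe
```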