powershell -replace: surround captured regex group with dollar signs like: $group$ - regex

I want to replace strings like url: `= this.url` with url: $url$
I got quite close with this:
(Get-Content '.\file') -Replace "``= this.(\w+)``", "$ `$1$"
with output url: $ url$.
But when I remove extra space then the output breaks.
How can I escape/modify "$`$1$" so that it works?

You can use
-Replace "``= this\.(\w+)``", '$$$1$$'
Note that
The . must be escaped in the regex pattern
'$$$1$$' is a $$$1$$ string that contains:
$$ - a literal single $ char
$1 - the backreference to the first capturing group
$$ - a literal single $ char.
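As a hedged illustration of the breakdown above, the following sketch applies the substitution to a sample line. The sample line and the variable names are assumptions made for demonstration. It also checks that the single-quoted substitution and a fully backtick-escaped double-quoted one are the same string.

```powershell
# Sample input (assumed); single quotes keep the backticks verbatim.
$line = 'url: `= this.url`'

# Single-quoted operands: PowerShell expands nothing up front.
$pattern      = '`= this\.(\w+)`'
$substitution = '$$$1$$'    # $$ -> literal $, $1 -> capture group 1, $$ -> literal $

$result = $line -replace $pattern, $substitution
$result                     # url: $url$

# Equivalent double-quoted substitution: every $ needs a backtick in front.
$sameString = ("`$`$`$1`$`$" -ceq $substitution)
$sameString                 # True
```

The double-quoted form works, but it is harder to read, which is why the single-quoted form is usually the safer choice.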

PowerShell 7 version of -replace with a scriptblock as the 2nd argument. I'm assigning $_ to $a just to inspect it afterwards. Note that the backquote is a special character inside double quotes, which I'm avoiding by using single quotes.
'url: `= this.url`' -replace '`= this\.(\w+)`', {$a = $_; '$' + $_.groups[1] + '$'}
url: $url$
$a
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 5
Length : 12
Value : `= this.url`
ValueSpan :

tl;dr
# * Consistent use of '...', obviating the need to `-escape ` and $
# * Verbatim $ chars. in the substitution string escaped as $$
# * Capture-group reference $1 represented as ${1} for visual clarity.
(Get-Content .\file) -replace '`= this\.(\w+)`', '$$${1}$$'
Background information and guidance:
In the substitution operand of PowerShell's regex-based -replace operator, a verbatim $ character must be escaped as $$, given that $-prefixed tokens have special meaning, namely to refer to results of the regex matching operation, such as $1 in your command (a reference to what the 1st, unnamed capture group in the search regex captured).
Unlike what the docs partially suggest, such a substitution string is not itself a regex, and any other characters are used verbatim.
To programmatically escape $ for verbatim use in a substitution string, it's simplest to use the .Replace() .NET string method, which performs verbatim (literal) replacements (assuming that all $ instances are to be escaped); e.g. '$foo$'.Replace('$', '$$')
Note that, situationally, a capture-group reference such as $1 may need to be disambiguated as ${1}, and you may always choose to do that for visual clarity, as shown above.
Only the search operand is a regex, and there all characters that are regex metacharacters must be \-escaped in order to be used verbatim, which can be done:
character-individually, in string literals (amount: \$)
programmatically, for entire strings, using [regex]::Escape() ([regex]::Escape('amount: $'))
To avoid confusion over up-front string interpolation by PowerShell vs. what the .NET regex engine ends up seeing, it's best to consistently use verbatim (single-quoted) strings ('...') rather than expandable (double-quoted) strings ("...").
If embedding PowerShell variable values is needed, use techniques such as:
string concatenation ('^' + [regex]::Escape($foo) + '$')
or -f, the format operator ('^{0}$' -f [regex]::Escape($foo))
In your case, using '...' helps you avoid the `-escaping that "..." requires to make PowerShell treat $ and ` (and ") verbatim, as shown above.
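To make the two kinds of escaping concrete, here is a minimal sketch that treats both operands as literal text. The input, search text, and replacement text are hypothetical values chosen for illustration. The search operand goes through [regex]::Escape(), the substitution operand gets its $ characters doubled.

```powershell
# Hypothetical values, all in verbatim (single-quoted) strings.
$text        = 'Total amount: $'
$search      = 'amount: $'       # literal text to find; contains a regex metacharacter
$replacement = '$5 (approx.)'    # literal text to insert; contains a $

# Escape each operand for the role it plays.
$pattern      = [regex]::Escape($search)         # regex side: metacharacters get \-escaped
$substitution = $replacement.Replace('$', '$$')  # substitution side: $ becomes $$

$result = $text -creplace $pattern, $substitution
$result                          # Total $5 (approx.)
```

-creplace is used here only so that the match is case-sensitive, like a literal .Replace() call would be.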
For a comprehensive overview of PowerShell's -replace operator, see this answer.

Related

How to add a line-break and a back reference in Powershell [duplicate]

I am completely clueless on how to use regex and need some help on the problem above. I need to replace <> with new lines but keep the string between <>. So
<'sample text'><'sample text 2'>
becomes
'sample text'
'sample text 2'
\<([^>]*)\>
This regex will capture the text between < and > into a capture group, which you can then reference again and put a newline after it.
\1\n
Check it out here.
EDIT:
In PowerShell
PS C:\Users\shtabriz> $string = "<'sample text'><'sample text 2'>"
PS C:\Users\shtabriz> $regex = "\<([^>]*)\>"
PS C:\Users\shtabriz> [regex]::Replace($string, $regex, '$1'+"`n")
'sample text'
'sample text 2'
This works for me in Textpad:
Example:
String:
" 1) Navigate to record. 2) Navigate to the tab and select. 3) Click the field. 4) Click on the tab and scroll."
Note: For the search/replace below, do NOT include the quotes; I used them to show the presence of a space in the search term
Search: "[0-9]+) "
Replace: "\n$0"
Resulting String:
Navigate to record.
Navigate to the tab and select.
Click the field.
Click on the tab and scroll.
(note... stackoverflow changed my ")" to a ".")
To complement Shawn Tabrizi's helpful answer with a more PowerShell-idiomatic solution and some background information:
PowerShell surfaces the functionality of the .NET System.Text.RegularExpressions.Regex.Replace() method ([regex]::Replace(), from PowerShell) via its own -replace operator.
The most concise solution (but see below for potential pitfalls):
# Note the escaped "$" ("`$")
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', "`$1`n"
Output:
'sample text'
'sample text 2'
$1 is a numbered capture-group substitution, referring to what the 1st (and only) capture group inside the regex ((...)) captured, which are the strings between < and > (.*? is a non-greedy expression that matches any run of characters but stops once the next construct, > in this case, is found).
However, inside a double-quoted string ("..."), also known as an expandable string, $1 would be interpreted as a PowerShell variable reference, so the $ character must be escaped in order to be preserved, using the backtick (`), PowerShell's general escape character: "`$1"
Conversely, if you want the .NET API not to interpret a $ character in the substitution string, use $$ (either $$ inside '...', or "`$`$" inside "...") - but note that inside the regex operand a verbatim $ must be escaped as \$.
"`n" is a PowerShell escape sequence that can be used inside expandable strings (only) - see the conceptual about_Special_Characters help topic.
Caveat:
While convenient here, there are pitfalls with respect to using expandable strings as the regexes and substitution operands, as it isn't always obvious what PowerShell expands (interpolates) up front, and what the .NET API ends up seeing as a result.
Therefore, it is generally preferable to use single-quoted strings ('...', also known as verbatim strings) - both for the substitution operand and the regex itself, and - if needed - use an expression ((...)) to build the overall string, which allows you to separate the verbatim (pass-through) parts from interpolated parts.
This is what Shawn did in his answer; translated to a -replace operation:
# Note the expression used to build the substitution string
# from a verbatim ('...') and an interpolated ("...") part.
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ('${1}' + "`n")
Another option, using -f, the format operator:
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ("{0}`n" -f '${1}')
Note the use of ${1} instead of just $1: Enclosing the number / name of the referenced capture group in {...} disambiguates it from the characters that follow, which avoids another pitfall, as the following example shows (incidentally, PowerShell's own variable references can be disambiguated the same way):
# FAILS and results in 'f$142', because the .NET API sees
# '$142' as the substitution string, and there is no 142nd capture group.
$suffix = '42'; 'foo' -replace '(oo)', ('$1' + $suffix)
# OK, with disambiguation via {...} -> 'foo42'
$suffix = '42'; 'foo' -replace '(oo)', ('${1}' + $suffix)

Matching a variable that contains Regex characters [duplicate]

I am doing a string replacement in PowerShell. I have no control over the strings that are being replaced, but I can reproduce the issue I'm having this way:
> 'word' -replace 'word','##$+'
##word
When the actual output I need is
> 'word' -replace 'word','##$+'
##$+
The string $+ is being expanded to the word being replaced, and there's no way that I can find to stop this from happening. I've tried escaping the $ with \ (as if it was a regex), with backtick ` (as is the normal PowerShell way). For example:
> 'word' -replace 'word',('##$+' -replace '\$','`$')
##`word
How can I replace with a literal $+ in PowerShell? For what it's worth I'm running PowerShell Core 6, but this is reproducible on PowerShell 5 as well.
Instead of using the -replace operator, you can use the .Replace() method like so:
PS> 'word'.Replace('word','##$+')
##$+
The .Replace() method comes from the .NET String class whereas the -Replace operator is implemented using System.Text.RegularExpressions.Regex.Replace().
More info here: https://vexx32.github.io/2019/03/20/PowerShell-Replace-Operator/
I couldn't find it documented but the "Visual Basic" style escaping rule worked, repeat the character.
'word' -replace 'word','##$$+' gives you: ##$+
It's confusing how these codes aren't documented under "about comparison operators". Except I have a closed bug report about them underneath if you look, 'wow, where are all these -replace 2nd argument codes documented?'. https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_comparison_operators?view=powershell-7 That's become my reference. "$+" means "Substitutes the last submatch captured". Doubling up the dollar signs works, 'substitutes a single "$" literal':
'word' -replace 'word','##$$+'
##$+
Or use the scriptblock version of the second argument in PS 6 and above (may be slower):
'word' -replace 'word',{'##$+'}
tl;dr:
Double the $ in your replacement operand to use it verbatim:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$'
##$+
PowerShell's -replace operator:
uses a regex (regular expression) as the search (1st) operand.
If you want to use a search string verbatim, you must escape it:
programmatically: with [regex]::Escape()
or, in string literals, you can alternatively \-escape individual characters that would otherwise be interpreted as regex metacharacters.
uses a non-literal string that can refer to what the regex matched as the replacement (2nd) operand, via $-prefixed tokens such as $& or $+ (see link above or Substitutions in Regular Expressions).
To use a replacement string verbatim, double any $ chars. in it, which is programmatically most easily done with .Replace('$', '$$') (see below).
If both your search string and your replacement string are to be used verbatim, consider using the [string] type's .Replace() method instead, as shown in Brandon Olin's helpful answer.
Caveat: .Replace() is case-sensitive by default, whereas -replace is case-insensitive (as PowerShell generally is); use a different .Replace() overload for case-insensitivity, or, conversely, use the -creplace variant in PowerShell to get case-sensitivity.
[PowerShell (Core) 7+ only] Case-insensitive .Replace() example:
'FOO'.Replace('o', '#', 'CurrentCultureIgnoreCase')
.Replace() only accepts a single string as input, whereas -replace accepts an array of strings as the LHS; e.g.:
'hi', 'ho' -replace 'h', 'f' # -> 'fi', 'fo'
.Replace() is faster than -replace, though that will only matter in loops with high iteration counts.
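The case-sensitivity difference listed above is easy to overlook, so here is a small sketch of it. The variable names are made up for illustration.

```powershell
# .Replace() is case-sensitive by default: lowercase 'o' does not match 'O'.
$viaMethod    = 'FOO'.Replace('o', '#')     # FOO

# -replace is case-insensitive by default.
$viaOperator  = 'FOO' -replace 'o', '#'     # F##

# -creplace is the case-sensitive variant of the operator.
$viaCOperator = 'FOO' -creplace 'o', '#'    # FOO
```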
If you were to stick with the -replace operator in your case:
As stated, doubling the $ in your replacement operand ensures that they're treated verbatim in the replacement:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$$'
##$+
To do this simple escaping programmatically, you can leverage the .Replace() method:
'word' -replace 'word', '##$+'.Replace('$', '$$')
You could also do it with a nested -replace operation, but that gets unwieldy (\$ escapes a $ in the regex; $$ represent a single $ in the replacement string):
# Same as above.
'word' -replace 'word', ('##$+' -replace '\$', '$$$$')
To put it differently: The equivalent of:
'word'.Replace('word', '##$+')
is (note the use of the case-sensitive variant of the -replace operator, -creplace):
'word' -creplace [regex]::Escape('word'), '##$+'.Replace('$', '$$')
However, as stated, if both the search string and the replacement operand are to be used verbatim, using .Replace() is preferable, both for concision and performance.

Escape string in PowerShell Regex to a regular string

I have a string that contains some characters that are special in a regex, such as $, (, = and ?. This string is a password that I need to pass to an application that I'm building.
The code that I'm trying to use is:
$x = 'GIWs#K?hks2v&HKXb$S9=HK*AZN=i!(S?7'
[Regex]::Escape($x)
I've already tried [Regex]::Escape(), but it doesn't meet my requirements: I need to pass the string as a password, and the method escapes the regex characters with \.
Perhaps after I'm doing the [Regex]::Escape() should I try to delete the \ that I'm getting from the result of the command?
After running the [Regex]::Escape() this is the result I'm getting when printing the output:
GIWs#K\?hks2v&HKXb=HK\*AZN=i!\(S\?7
I'm trying to achieve the string without the ' \ ' characters but with the Escape function:
GIWs#K?hks2v&HKXb=HK*AZN=i!(S?7
This is not an answer because I don't know what the problem actually is. However, there are some inherent problems with your current attempt to handle the password string. If you use double quotes ("") around a string, PowerShell will interpolate the string inside the quotes. So any alphanumeric characters following an unescaped $, will be considered a variable name during interpolation. If that variable has no value, $variable will be replaced with a null value. You can see this behavior below:
"rt4837s$GT=\"
rt4837s=\
You should use single quotes ('') when quoting string literals (characters that will be left as is). PowerShell will not attempt interpolation when unescaped single quote pairs are encountered unless there is quote nesting. See below:
'rt4837s$GT=\'
rt4837s$GT=\
If you need a regex escaped string, the same rules apply from above and you should use single quotes.
[regex]::escape('dfaseryh$S9=r??*')
dfaseryh\$S9=r\?\?\*
If for any reason, you need to access that string later without the escape characters, then you can use the regex method Unescape().
[regex]::unescape('dfaseryh\$S9=r\?\?\*')
dfaseryh$S9=r??*
Practical Example of Using Regex Replace:
$OriginalString = 'Username = Anonymous; Password = <password>'
$regexReplace = [regex]::Escape('<password>')
$Password = 'GIWs#K?hks2v&HKXb$S9=HK*AZN=i!(S?7'
$OriginalString -replace $regexReplace,($Password -replace '\$','$$$$')
# Output
Username = Anonymous; Password = GIWs#K?hks2v&HKXb$S9=HK*AZN=i!(S?7
In the code above, $OriginalString is just an ordinary string that can be retrieved from any command or set by a coder. It contains a string <password> that we want to replace with a complex password string GIWs#K?hks2v&HKXb$S9=HK*AZN=i!(S?7.
$Password contains the complex password. Since we only care about replacing <password> and are choosing to use regex replace operator -replace, we need a valid regex expression for matching <password>. There is a caveat here though. When using -replace, the $ in the replacement string is used to prefix capture group names. So there can be cases where the literal string has an unintentional replacement. Capture group 0 is always there if there is a match. So $0 will always cause issues without proper escaping. It is probably best to just escape $ regardless.
For the regex match, we use [regex]::Escape('<password>') since we are unsure if <> are special in regex. If there are no special characters, then the string within the regex expression will not be modified. If it does contain special characters, they will be escaped with \.
As a result, <password> is replaced with GIWs#K?hks2v&HKXb$S9=HK*AZN=i!(S?7.
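The $0 pitfall mentioned above can be reproduced with a short sketch. The template and the password 'pa$0word' are hypothetical values picked so that the problem shows up.

```powershell
# Hypothetical template and password; the password happens to contain $0.
$template = 'Password = <password>'
$secret   = 'pa$0word'
$pattern  = [regex]::Escape('<password>')

# Unescaped: $0 is replaced by the whole match, corrupting the password.
$broken = $template -replace $pattern, $secret
$broken    # Password = pa<password>word

# Escaped: doubling each $ makes the replacement literal.
$fixed = $template -replace $pattern, $secret.Replace('$', '$$')
$fixed     # Password = pa$0word
```

Doubling the $ with .Replace('$', '$$') is equivalent to the nested -replace '\$','$$$$' used in the example, just without a second regex operation.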
A recap of the syntax is as follows:
'String With Something You Want to Replace' -replace 'Regex Expression to Match String You Want to Replace','Replacement That Is a Literal String With Escaped $'

How to use a variable as part of a regular expression in PowerShell

I want to Select-String parts of a file path starting at a string value that is contained in a variable. Let me explain this in an abstracted example.
Let's assume this path: /docs/reports/test reports/document1.docx
Using a regular expression I can get the required string like so:
'^.*(?=\/test\s)'
https://regex101.com/r/6mBhLX/5
The resulting string is '/test reports/document1.docx'.
Now, for this to work I have to use the literal string 'test'. However, I would like to know how to use a variable that contains 'test', e.g. $myString.
I already looked at How do you use a variable in a regular expression?, but I couldn't figure out how to adapt this for PowerShell.
I suggest using $([regex]::escape($myString)) inside a double quoted string literal:
$myString="[test]"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
Or, in case you do not want to worry with additional escaping, use a regular concatenation using + operator:
$pattern = '^.*(?=/' + [regex]::escape($myString) +'\s)'
The resulting $pattern will look like ^.*(?=/\[test]\s). Since the $myString variable is a literal string, you need to escape all special regex metacharacters (with [regex]::escape()) that may be inside it for the regex engine to interpret it as literal chars.
In your case, you may use
$s = '/docs/reports/test reports/document1.docx'
$myString="test"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
$s -replace $pattern
Result: /test reports/document1.docx
Wiktor Stribiżew's helpful answer provides the crucial pointer:
Use [regex]::Escape() in order to escape a string for safe inclusion in a regex (regular expression) so that it is treated as a literal;
e.g., [regex]::Escape('$10?') yields \$10\? - the characters with special meaning to a regex were \-escaped.
However, I suggest using '...', i.e., building the regex from single-quoted aka verbatim strings:
$myString='test'
$regex = '^.*(?=/' + [regex]::escape($myString) + '\s)'
Using the -f operator - $regex = '^.*(?=/{0}\s)' -f [regex]::Escape($myString) - works too and is perhaps visually cleaner, but note that -f - unlike string concatenation with + - is culture-sensitive, which can lead to different results.
Using '...' strings in regex contexts in PowerShell is a good habit to form:
By avoiding "...", so-called expandable strings, you avoid additional up-front interpretation (interpolation a.k.a. expansion) of the string, which can have unexpected effects, given that $ has special meaning in both contexts: the start of a variable reference or subexpression when string-expanding, and the end-of-input marker in regexes.
Using "..." can be especially tricky in the replacement string of the regex-based -replace operator, in whose replacement string operand tokens such as $1 refer to capture-group results, and if you used "$1", PowerShell would try to expand a $1 variable, which presumably doesn't exist, resulting in the empty string.
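A quick sketch of that last pitfall. It assumes that no PowerShell variable named 1 is defined in the session and that strict mode is off, so "$1" expands to the empty string. The variable names are illustrative.

```powershell
# Expandable string: PowerShell expands $1 (an undefined variable) to nothing
# before the regex engine sees the operand, so the substitution is just '[]'.
$expanded = 'foo' -replace '(o+)', "[$1]"
$expanded    # f[]

# Verbatim string: the regex engine receives '[$1]' and inserts capture group 1.
$verbatim = 'foo' -replace '(o+)', '[$1]'
$verbatim    # f[oo]
```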
Just write the variable within double quotes ("pattern"), like this:
PS > $pattern = "^\d+\w+"
PS > "357test*&(fdnsajkfj" -match $pattern # return true
PS > "357test*&(fdnsajkfj" -match "$pattern.*\w+$" # return true
PS > "357test*&(fdnsajkfj" -match "$pattern\w+$" # return false
Please have a try. :)

Perl: how to use string variables as search pattern and replacement in regex

I want to use string variables for both search pattern and replacement in regex. The expected output is like this,
$ perl -e '$a="abcdeabCde"; $a=~s/b(.)d/_$1$1_/g; print "$a\n"'
a_cc_ea_CC_e
But when I moved the pattern and replacement to a variable, $1 was not evaluated.
$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/g; print "$a\n"'
a_$1$1_ea_$1$1_e
When I use "ee" modifier, it gives errors.
$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/gee; print "$a\n"'
Scalar found where operator expected at (eval 1) line 1, near "$1$1"
(Missing operator before $1?)
Bareword found where operator expected at (eval 1) line 1, near "$1_"
(Missing operator before _?)
Scalar found where operator expected at (eval 2) line 1, near "$1$1"
(Missing operator before $1?)
Bareword found where operator expected at (eval 2) line 1, near "$1_"
(Missing operator before _?)
aeae
What do I miss here?
Edit
Both $p and $r are written by myself. What I need is to do multiple similar regex replacing without touching the perl code, so $p and $r have to be in a separate data file. I hope this file can be used with C++/python code later.
Here are some examples of $p and $r.
^(.*\D)?((19|18|20)\d\d)年 $1$2<digits>年
^(.*\D)?(0\d)年 $1$2<digits>年
([TKZGD])(\d+)/(\d+)([^\d/]) $1$2<digits>$3<digits>$4
([^/TKZGD\d])(\d+)/(\d+)([^/\d]) $1$3分之$2$4
With $p="b(.)d"; you are getting a string with literal characters b(.)d. In general, regex patterns are not preserved in quoted strings and may not have their expected meaning in a regex. However, see Note at the end.
This is what qr operator is for: $p = qr/b(.)d/; forms the string as a regular expression.
As for the replacement part and /ee, the problem is that $r is first evaluated, to yield _$1$1_, which is then evaluated as code. Alas, that is not valid Perl code. The _ are barewords and even $1$1 itself isn't valid (for example, $1 . $1 would be).
The provided examples of $r have $Ns mixed with text in various ways. One way to parse this is to extract all $N and all else into a list that maintains their order from the string. Then, that can be processed into a string that will be valid code. For example, we need
'$1_$2$3other' --> $1 . '_' . $2 . $3 . 'other'
which is valid Perl code that can be evaluated.
The part of breaking this up is helped by split's capturing in the separator pattern.
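sub repl {
    my ($r) = @_;
    # Split on $N tokens, keeping them via the capturing separator; drop empty strings
    my @terms = grep { $_ } split /(\$\d)/, $r;
    # Wrap every non-$N term in single quotes, then join all terms with the . operator
    return join '.', map { /^\$/ ? $_ : q(') . $_ . q(') } @terms;
}
$var =~ s/$p/repl($r)/gee;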
With capturing /(...)/ in split's pattern, the separators are returned as a part of the list. Thus this extracts from $r an array of terms which are either $N or other, in their original order and with everything (other than trailing whitespace) kept. This includes possible (leading) empty strings so those need be filtered out.
Then every term other than $Ns is wrapped in '', so when they are all joined by . we get a valid Perl expression, as in the example above.
Then /ee will have this function return the string (such as above), and evaluate it as valid code.
We are told that safety of using /ee on external input is not a concern here. Still, this is something to keep in mind. See this post, provided by Håkon Hægland in a comment. Along with the discussion it also directs us to String::Substitution. Its use is demonstrated in this post. Another way to approach this is with replace from Data::Munge.
For more discussion of /ee see this post, with several useful answers.
Note on using "b(.)d" for a regex pattern
In this case, with parens and dot, their special meaning is maintained. Thanks to kangshiyin for an early mention of this, and to Håkon Hægland for asserting it. However, this is a special case. Double-quoted strings break many patterns outright, since interpolation is done -- for example, "\w" is just an escaped w (an escape that is unrecognized). Single quotes should work, as there is no interpolation. Still, strings intended for use as regex patterns are best formed using qr, as we are getting a true regex. Then all modifiers may be used as well.