Matching a variable that contains Regex characters [duplicate] - regex

I am doing a string replacement in PowerShell. I have no control over the strings that are being replaced, but I can reproduce the issue I'm having this way:
> 'word' -replace 'word','##$+'
##word
When the actual output I need is
> 'word' -replace 'word','##$+'
##$+
The string $+ is being expanded to the word being replaced, and I can't find any way to stop this from happening. I've tried escaping the $ with \ (as if it were a regex) and with a backtick ` (the normal PowerShell way). For example:
> 'word' -replace 'word',('##$+' -replace '\$','`$')
##`word
How can I replace with a literal $+ in PowerShell? For what it's worth I'm running PowerShell Core 6, but this is reproducible on PowerShell 5 as well.

Instead of using the -replace operator, you can use the .Replace() method like so:
PS> 'word'.Replace('word','##$+')
##$+
The .Replace() method comes from the .NET String class whereas the -Replace operator is implemented using System.Text.RegularExpressions.Regex.Replace().
More info here: https://vexx32.github.io/2019/03/20/PowerShell-Replace-Operator/

I couldn't find it documented, but the "Visual Basic"-style escaping rule worked: repeat the character.
'word' -replace 'word','##$$+' gives you: ##$+

It's confusing that these codes aren't documented under "about comparison operators", although there is a closed bug report of mine about them further down that page ('wow, where are all these -replace 2nd argument codes documented?'): https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_comparison_operators?view=powershell-7 That's become my reference. "$+" means "Substitutes the last submatch captured". Doubling up the dollar signs works ('substitutes a single "$" literal'):
'word' -replace 'word','##$$+'
##$+
Or use the scriptblock version of the second argument in PS 6 and above (may be slower):
'word' -replace 'word',{'##$+'}

tl;dr:
Double the $ in your replacement operand to use it verbatim:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$'
##$+
PowerShell's -replace operator:
uses a regex (regular expression) as the search (1st) operand.
If you want to use a search string verbatim, you must escape it:
programmatically: with [regex]::Escape()
or, in string literals, you can alternatively \-escape individual characters that would otherwise be interpreted as regex metacharacters.
uses a non-literal string that can refer to what the regex matched as the replacement (2nd) operand, via $-prefixed tokens such as $& or $+ (see link above or Substitutions in Regular Expressions).
To use a replacement string verbatim, double any $ chars. in it, which is programmatically most easily done with .Replace('$', '$$') (see below).
If both your search string and your replacement string are to be used verbatim, consider using the [string] type's .Replace() method instead, as shown in Brandon Olin's helpful answer.
Caveat: .Replace() is case-sensitive by default, whereas -replace is case-insensitive (as PowerShell generally is); use a different .Replace() overload for case-insensitivity, or, conversely, use the -creplace variant in PowerShell to get case-sensitivity.
[PowerShell (Core) 7+ only] Case-insensitive .Replace() example:
'FOO'.Replace('o', '#', 'CurrentCultureIgnoreCase')
.Replace() only accepts a single string as input, whereas -replace accepts an array of strings as the LHS; e.g.:
'hi', 'ho' -replace 'h', 'f' # -> 'fi', 'fo'
.Replace() is faster than -replace, though that will only matter in loops with high iteration counts.
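The case-sensitivity and array-input caveats above can be checked with a small sketch. The variable names are illustrative only, and the ForEach-Object line is just one possible way to apply .Replace() per element:

```powershell
# Case sensitivity: .Replace() is case-sensitive, -replace is not.
$viaMethod   = 'FOO'.Replace('o', '#')      # no lowercase 'o' in 'FOO', so nothing changes
$viaReplace  = 'FOO' -replace 'o', '#'      # case-insensitive by default
$viaCReplace = 'FOO' -creplace 'o', '#'     # explicitly case-sensitive

# Array input: -replace operates on each element of an array LHS,
# whereas .Replace() has to be called per string.
$fromOperator = 'hi', 'ho' -replace 'h', 'f'
$fromMethod   = 'hi', 'ho' | ForEach-Object { $_.Replace('h', 'f') }

$viaMethod      # FOO
$viaReplace     # F##
$viaCReplace    # FOO
$fromOperator   # fi, fo
$fromMethod     # fi, fo
```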
If you were to stick with the -replace operator in your case:
As stated, doubling the $ in your replacement operand ensures that they're treated verbatim in the replacement:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$$'
##$+
To do this simple escaping programmatically, you can leverage the .Replace() method:
'word' -replace 'word', '##$+'.Replace('$', '$$')
You could also do it with a nested -replace operation, but that gets unwieldy (\$ escapes a $ in the regex; $$ represent a single $ in the replacement string):
# Same as above.
'word' -replace 'word', ('##$+' -replace '\$', '$$$$')
To put it differently: The equivalent of:
'word'.Replace('word', '##$+')
is (note the use of the case-sensitive variant of the -replace operator, -creplace):
'word' -creplace [regex]::Escape('word'), '##$+'.Replace('$', '$$')
However, as stated, if both the search string and the replacement operand are to be used verbatim, using .Replace() is preferable, both for concision and performance.
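As a minimal sketch of that equivalence when neither operand is known in advance, the following escapes both operands programmatically and compares the result with .Replace(). The input values and variable names are made up for illustration:

```powershell
# Hypothetical inputs: both contain characters that -replace would treat specially.
$inputText   = 'a.b axb'
$search      = 'a.b'     # unescaped, '.' matches any character, so 'axb' would match too
$replacement = '##$+'    # unescaped, '$+' is a substitution token

# Escape the search operand for the regex engine, and double each '$' in the replacement.
$escapedSearch      = [regex]::Escape($search)
$escapedReplacement = $replacement.Replace('$', '$$')

$viaOperator = $inputText -creplace $escapedSearch, $escapedReplacement
$viaMethod   = $inputText.Replace($search, $replacement)

$viaOperator   # ##$+ axb
$viaMethod     # ##$+ axb
```

Without the two escaping steps, 'axb' would also be matched and $+ would expand to the matched text instead of appearing literally.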

Related

powershell -replace: surround captured regex group with dollar signs like: $group$

I want to replace strings like url: `= this.url` with url: $url$
I got quite close with this:
(Get-Content '.\file') -Replace "``= this.(\w+)``", "$ `$1$"
with output url: $ url$.
But when I remove extra space then the output breaks.
How can I escape/modify "$`$1$" so that it works?
You can use
-Replace "``= this\.(\w+)``", '$$$1$$'
Note that
The . must be escaped in the regex pattern
'$$$1$$' is a $$$1$$ string that contains:
$$ - a literal single $ char
$1 - the backreference to the first capturing group
$$ - a literal single $ char.
Powershell 7 version of -replace with a scriptblock 2nd argument. Just assigning $_ into $a to look at it. Note the backquote is a special character inside doublequotes, which I'm avoiding.
'url: `= this.url`' -replace '`= this\.(\w+)`', {$a = $_; '$' + $_.groups[1] + '$'}
url: $url$
$a
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 5
Length : 12
Value : `= this.url`
ValueSpan :
tl;dr
# * Consistent use of '...', obviating the need to `-escape ` and $
# * Verbatim $ chars. in the substitution string escaped as $$
# * Capture-group reference $1 represented as ${1} for visual clarity.
(Get-Content .\file) -replace '`= this\.(\w+)`', '$$${1}$$'
Background information and guidance:
In the substitution operand of PowerShell's regex-based -replace operator, a verbatim $ character must be escaped as $$, given that $-prefixed tokens have special meaning, namely to refer to results of the regex matching operation, such as $1 in your command (a reference to what the 1st, unnamed capture group in the search regex captured).
Unlike what the docs partially suggest, such a substitution string is not itself a regex, and any other characters are used verbatim.
To programmatically escape $ for verbatim use in a substitution string, it's simplest to use the .Replace() .NET string method, which performs verbatim (literal) replacements (assuming that all $ instances are to be escaped); e.g. '$foo$'.Replace('$', '$$')
Note that, situationally, a capture-group reference such as $1 may need to be disambiguated as ${1}, and you may always choose to do that for visual clarity, as shown above.
It is only the search operand that is a regex, and there all characters that are regex metacharacters must be \-escaped in order to be used verbatim, which can be done:
character-individually, in string literals (amount: \$)
programmatically, for entire strings, using [regex]::Escape() ([regex]::Escape('amount: $'))
To avoid confusion over up-front string interpolation by PowerShell vs. what the .NET regex engine ends up seeing, it's best to consistently use verbatim (single-quoted) strings ('...') rather than expandable (double-quoted) strings ("...").
If embedding PowerShell variable values is needed, use techniques such as:
string concatenation ('^' + [regex]::Escape($foo) + '$')
or -f, the format operator ('^{0}$' -f [regex]::Escape($foo))
In your case, using '...' helps you avoid the `-escaping that "..." requires to make PowerShell treat $ and ` (and ") verbatim, as shown above.
For a comprehensive overview of PowerShell's -replace operator, see this answer.
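The guidance above can be tried out without a file. This is a sketch only: the sample line, the $delimiter variable, and the 'amount: $' search string are illustrative assumptions rather than part of the original question:

```powershell
$line    = 'url: `= this.url`'
$pattern = '`= this\.(\w+)`'          # single-quoted: no `-escaping needed

# Verbatim '$' written as '$$', capture group referenced as '${1}'.
$fixed = $line -replace $pattern, '$$${1}$$'

# Same result when the delimiter comes from a variable and is escaped programmatically.
$delimiter        = '$'
$escapedDelimiter = $delimiter.Replace('$', '$$')
$built = $line -replace $pattern, ($escapedDelimiter + '${1}' + $escapedDelimiter)

# Search side: escape a whole string, then add regex anchors around it.
$needle   = 'amount: $'
$anchored = '^' + [regex]::Escape($needle) + '$'
$matchesLiteral = 'amount: $' -match $anchored
$matchesOther   = 'amount: 5' -match $anchored

$fixed            # url: $url$
$built            # url: $url$
$matchesLiteral   # True
$matchesOther     # False
```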

How to add a line-break and a back reference in Powershell [duplicate]

I am completely clueless on how to use regex and need some help on the problem above. I need to replace <> with new lines but keep the string between <>. So
<'sample text'><'sample text 2'>
becomes
'sample text'
'sample text 2'
\<([^>]*)\>
This regex will capture the text between < and > into a capture group, which you can then reference again and follow with a newline.
\1\n
Check it out here.
EDIT:
In PowerShell
PS C:\Users\shtabriz> $string = "<'sample text'><'sample text 2'>"
PS C:\Users\shtabriz> $regex = "\<([^>]*)\>"
PS C:\Users\shtabriz> [regex]::Replace($string, $regex, '$1'+"`n")
'sample text'
'sample text 2'
This works for me in Textpad:
Example:
String:
" 1) Navigate to record. 2) Navigate to the tab and select. 3) Click the field. 4) Click on the tab and scroll."
Note: For the search/replace below, do NOT include the quotes; I used them to show the presence of a space in the search term
Search: "[0-9]+) "
Replace: "\n$0"
Resulting String:
Navigate to record.
Navigate to the tab and select.
Click the field.
Click on the tab and scroll.
(note... stackoverflow changed my ")" to a ".")
To complement Shawn Tabrizi's helpful answer with a more PowerShell-idiomatic solution and some background information:
PowerShell surfaces the functionality of the .NET System.Text.RegularExpressions.Regex.Replace() method ([regex]::Replace(), from PowerShell) via its own -replace operator.
The most concise solution (but see below for potential pitfalls):
# Note the escaped "$" ("`$")
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', "`$1`n"
Output:
'sample text'
'sample text 2'
$1 is a numbered capture-group substitution, referring to what the 1st (and only) capture group inside the regex ((...)) captured, which are the strings between < and > (.*? is a non-greedy expression that matches any run of characters but stops once the next construct, > in this case, is found).
However, inside a double-quoted string ("..."), also known as an expandable string, $1 would be interpreted as a PowerShell variable reference, so the $ character must be escaped in order to be preserved, using the backtick (`), PowerShell's general escape character: "`$1"
Conversely, if you want the .NET API not to interpret a $ character in the substitution string, use $$ (either $$ inside '...', or "`$`$" inside "...") - but note that inside the regex operand a verbatim $ must be escaped as \$.
"`n" is a PowerShell escape sequence that can be used inside expandable strings (only) - see the conceptual about_Special_Characters help topic.
Caveat:
While convenient here, there are pitfalls with respect to using expandable strings as the regexes and substitution operands, as it isn't always obvious what PowerShell expands (interpolates) up front, and what the .NET API ends up seeing as a result.
Therefore, it is generally preferable to use single-quoted strings ('...', also known as verbatim strings) - both for the substitution operand and the regex itself, and - if needed - use an expression ((...)) to build the overall string, which allows you to separate the verbatim (pass-through) parts from interpolated parts.
This is what Shawn did in his answer; translated to a -replace operation:
# Note the expression used to build the substitution string
# from a verbatim ('...') and an interpolated ("...") part.
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ('${1}' + "`n")
Another option, using -f, the format operator:
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ("{0}`n" -f '${1}')
Note the use of ${1} instead of just $1: Enclosing the number / name of the referenced capture group in {...} disambiguates it from the characters that follow, which avoids another pitfall, as the following example shows (incidentally, PowerShell's own variable references can be disambiguated the same way):
# FAILS and results in 'f$142', because the .NET API sees
# '$142' as the substitution string, and there is no 142nd capture group.
$suffix = '42'; 'foo' -replace '(oo)', ('$1' + $suffix)
# OK, with disambiguation via {...} -> 'foo42'
$suffix = '42'; 'foo' -replace '(oo)', ('${1}' + $suffix)
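The same {...} disambiguation also works for named capture groups, and it combines with $$ when a verbatim $ has to follow the group reference. A small sketch (the group name tail is made up for illustration):

```powershell
$suffix = '42'

# Named capture group, referenced as ${tail} in the substitution string.
$named = 'foo' -replace '(?<tail>oo)', ('${tail}' + $suffix)

# A verbatim '$' after the group reference: '$$' inside '...'
$withDollar = 'foo' -replace '(oo)', ('${1}' + '$$' + $suffix)

# The same substitution string built as an expandable string,
# with every '$' meant for the .NET API escaped as `$
$withDollarDq = 'foo' -replace '(oo)', "`${1}`$`$$suffix"

$named          # foo42
$withDollar     # foo$42
$withDollarDq   # foo$42
```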

Regex-based string replacement operation not working [duplicate]

I have a file exp.txt with below content
hi. what are you doing. are you fine?
This is my code:
$ar = @("are you fine?")
for($i=0;$i -lt $ar.Length; $i++)
{
(Get-Content "exp.txt") -replace $ar[$i], "$&~" | Set-Content "exp.txt"
}
I want to append ~ to the end of whatever text I put in the $ar variable. But the above code does not work if I put "are you fine?" in $ar. Symbols like "*", "$" and "+" also don't work. I need help.
The -replace uses regular expressions, so special characters are interpreted. Looks like you don't want that, so try using
(Get-Content "exp.txt").Replace("something", "something else")
I'm not clear on exactly what you want in the end, but this should get you past the problem with special characters.
In order to treat a search expression passed to the regex-based -replace operator as a literal string, you must escape regex metacharacters such as ?, which you can do programmatically by calling [regex]::Escape() (in a literal regex string, you could manually escape indiv. chars. with \):
(Get-Content exp.txt) -replace [regex]::Escape($ar[$i]), '$&~'
Note how I've used '...' (single-quoting) for the replacement operand, which is preferable so as to avoid confusion with the string expansion that PowerShell itself performs up front in "..." (double-quoted) strings. Similarly, '...' is preferable for a literal search regex.
Sticking with the regex-based -replace operator in your case makes sense for two reasons:
You want to (directly and succinctly) refer to whatever was matched by the regex in the replacement expression, via placeholder $&.
You want to perform replacements on an array of input strings in a single operation (like many PowerShell operators, -replace supports an array of strings as its LHS, and then operates on each element).
By contrast, in cases where literal string substitutions on a single input string are needed, use of the [string] type's .Replace() method is the simpler and faster option, but note that it is case-sensitive by default (unlike -replace).
See this answer for details.
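To see the effect of [regex]::Escape() without touching a file, here is a minimal in-memory sketch. The $text and $terms values stand in for the file content and the $ar array, and the second term is an added assumption:

```powershell
$text  = 'hi. what are you doing. are you fine?'
$terms = 'are you fine?', 'hi.'

# Unescaped: 'e?' is read as "optional e", so the match stops before the literal '?'.
$unescaped = $text -replace $terms[0], '$&~'

# Escaped: every term is matched literally, and '$&' re-inserts the matched text.
$result = $text
foreach ($term in $terms) {
    $result = $result -replace [regex]::Escape($term), '$&~'
}

$unescaped   # hi. what are you doing. are you fine~?
$result      # hi.~ what are you doing. are you fine?~
```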

How to use a variable as part of a regular expression in PowerShell

I want to Select-String parts of a file path starting at a string value that is contained in a variable. Let me explain this in an abstracted example.
Let's assume this path: /docs/reports/test reports/document1.docx
Using a regular expression I can get the required string like so:
'^.*(?=\/test\s)'
https://regex101.com/r/6mBhLX/5
The resulting string is '/test reports/document1.docx'.
Now, for this to work I have to use the literal string 'test'. However, I would like to know how to use a variable that contains 'test', e.g. $myString.
I already looked at How do you use a variable in a regular expression?, but I couldn't figure out how to adapt this for PowerShell.
I suggest using $([regex]::escape($myString)) inside a double quoted string literal:
$myString="[test]"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
Or, in case you do not want to worry with additional escaping, use a regular concatenation using + operator:
$pattern = '^.*(?=/' + [regex]::escape($myString) +'\s)'
The resulting $pattern will look like ^.*(?=/\[test]\s). Since the $myString variable is a literal string, you need to escape all special regex metacharacters (with [regex]::escape()) that may be inside it for the regex engine to interpret it as literal chars.
In your case, you may use
$s = '/docs/reports/test reports/document1.docx'
$myString="test"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
$s -replace $pattern
Result: /test reports/document1.docx
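To illustrate why the escaping matters, the following sketch uses a variable value that contains regex metacharacters. The path with a literal [test] folder is a made-up variation on the example above:

```powershell
$s        = '/docs/reports/[test] reports/document1.docx'
$myString = '[test]'

# Unescaped: [test] is read as a character class (one of t, e, s), so nothing matches.
$rawPattern = '^.*(?=/' + $myString + '\s)'
$rawResult  = $s -replace $rawPattern

# Escaped: the brackets are matched literally.
$escapedPattern = '^.*(?=/' + [regex]::Escape($myString) + '\s)'
$escapedResult  = $s -replace $escapedPattern

$rawResult       # /docs/reports/[test] reports/document1.docx (unchanged)
$escapedResult   # /[test] reports/document1.docx
```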
Wiktor Stribiżew's helpful answer provides the crucial pointer:
Use [regex]::Escape() in order to escape a string for safe inclusion in a regex (regular expression) so that it is treated as a literal;
e.g., [regex]::Escape('$10?') yields \$10\? - the characters with special meaning to a regex were \-escaped.
However, I suggest using '...', i.e., building the regex from single-quoted aka verbatim strings:
$myString='test'
$regex = '^.*(?=/' + [regex]::escape($myString) + '\s)'
Using the -f operator, as in $regex = '^.*(?=/{0}\s)' -f [regex]::Escape($myString), works too and is perhaps visually cleaner, but note that -f - unlike string concatenation with + - is culture-sensitive, which can lead to different results.
Using '...' strings in regex contexts in PowerShell is a good habit to form:
By avoiding "...", so-called expandable strings, you avoid additional up-front interpretation (interpolation a.k.a. expansion) of the string, which can have unexpected effects, given that $ has special meaning in both contexts: the start of a variable reference or subexpression when string-expanding, and the end-of-input marker in regexes.
Using "..." can be especially tricky in the replacement string of the regex-based -replace operator, in whose replacement string operand tokens such as $1 refer to capture-group results, and if you used "$1", PowerShell would try to expand a $1 variable, which presumably doesn't exist, resulting in the empty string.
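The "$1" pitfall just described can be reproduced with a short sketch. It assumes that no PowerShell variable named $1 is defined in the session and that strict mode is off (with strict mode on, referencing the undefined variable would raise an error instead):

```powershell
# Expandable string: PowerShell expands the (undefined) variable $1 to an empty
# string up front, so the .NET API only ever sees '<>'.
$expanded = 'foo' -replace '(o+)', "<$1>"

# Verbatim string: '$1' reaches the .NET API intact and refers to the capture group.
$verbatim = 'foo' -replace '(o+)', '<$1>'

# Expandable string with the '$' escaped: equivalent to the verbatim form.
$escaped = 'foo' -replace '(o+)', "<`$1>"

$expanded   # f<>
$verbatim   # f<oo>
$escaped    # f<oo>
```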
Just write the variable within double quotes ("pattern"), like this:
PS > $pattern = "^\d+\w+"
PS > "357test*&(fdnsajkfj" -match $pattern # return true
PS > "357test*&(fdnsajkfj" -match "$pattern.*\w+$" # return true
PS > "357test*&(fdnsajkfj" -match "$pattern\w+$" # return false
Please have a try. :)

Perl split pattern

According to the perldoc, the syntax for split is:
split /PATTERN/,EXPR,LIMIT
But the PATTERN can also be a single- or double-quoted string: split "PATTERN", EXPR. What difference does it make?
Edit: A difference I'm aware of is splitting on backslashes: split /\\/ vs split '\\'. The second form doesn't work.
It looks like it uses that as "an expression to specify patterns":
The pattern /PATTERN/ may be replaced
with an expression to specify patterns
that vary at runtime. (To do runtime
compilation only once, use
/$variable/o .)
edit: I tested it with this:
my $foo = 'a:b:c,d,e';
print join(' ', split("[:,]", $foo)), "\n";
print join(' ', split(/[:,]/, $foo)), "\n";
print join(' ', split(/\Q[:,]\E/, $foo)), "\n";
Except for the ' ' special case, it looks just like a regular expression.
PATTERN is always interpreted as... well, a pattern -- never as a literal value. It can be either a regex [1] or a string. Strings are compiled to regexes. For the most part the behavior is the same, but there can be subtle differences caused by the double interpretation.
The string '\\' only contains a single backslash. When interpreted as a pattern, it's as if you had written /\/, which is invalid:
C:\>perl -e "print join ':', split '\\', 'a\b\c'"
Trailing \ in regex m/\/ at -e line 1.
Oops!
Additionally, there are two special cases:
The empty pattern //, which splits on the empty string.
A single space ' ', which splits on whitespace after first trimming any leading or trailing whitespace.
1. Regexes can be supplied either inline /.../ or via a precompiled qr// quoted string.
I believe there's no difference. A string pattern is also interpreted as a regular expression.
perl -e 'print join("-",split("[a-e]","regular"))';
r-gul-r
As you see, the delimiter is interpreted as a regular expression, not a string literal.
So, it's mostly the same - with one important exception: split(" ",... ) and split(/ /,... ) are different.
I prefer to use /PATTERN/ to avoid confusion, it's easy to forget that it's a regexp otherwise.
Two observable rules:
the special case split(" ") behaves like split(/\s+/), except that leading whitespace is skipped first, so no empty leading field is produced.
for everything else (it seems—don't nail me), split("something") is equal to split(/something/)