Regex-based string replacement operation not working [duplicate] - regex

This question already has an answer here:
powershell: how to escape all regex characters from a string
(1 answer)
Closed 3 years ago.
I have a file exp.txt with below content
hi. what are you doing. are you fine?
This is my code:
$a = "are you fine?"
for($i=0;$i -lt $ar.Length; $i++)
{
(Get-Content "exp.txt") -replace $ar[$i], "$&~" | Set-Content "exp.txt"
}
I want to append ~ at the end of whatever text i put in $ar variable. But the above code does not work if i type "are you fine?" in $ar variable. Symbols like "*" "$" "+" also dont work. i need help

The -replace uses regular expressions, so special characters are interpreted. Looks like you don't want that, so try using
(Get-Content "exp.txt").Replace("something", "something else")
I'm not clear on exactly what you want in the end, but this should get you past the problem with special characters.

In order to treat a search expression passed to the regex-based -replace operator as a literal string, you must escape regex metacharacters such as ?, which you can do programmatically by calling [regex]::Escape() (in a literal regex string, you could manually escape indiv. chars. with \):
(Get-Content exp.txt) -replace [regex]::Escape($ar[$i]), '$&~'
Note how I've used '...' (single-quoting) for the replacement operand, which is preferable so as to avoid confusion with the string expansion that PowerShell itself performs up front in "..." (double-quoted) strings. Similarly, '...' is preferable for a literal search regex.
Sticking with the regex-based -replace operator in your case makes sense for two reasons:
You want to (directly and succinctly) refer to whatever was matched by the regex in the replacement expression, via placeholder $&.
You want to perform replacements on an array of input strings in a single operation (like many PowerShell operators, -replace supports an array of strings as its LHS, and then operates on each element).
By contrast, in cases where literal string substitutions with a single input string is needed, use of the [string] type's .Replace() method is the simpler and faster option, but note that it is case-sensitive by default (unlike -replace).
See this answer for details.

Related

powershell -replace: surround captured regex group with dollar signs like: $group$

I want to replace strings like url: `= this.url` with url: $url$
I got quite close with this:
(Get-Content '.\file') -Replace "``= this.(\w+)``", "$ `$1$"
with output url: $ url$.
But when I remove extra space then the output breaks.
How can I escape/modify "$`$1$" so that it works?
You can use
-Replace "``= this\.(\w+)``", '$$$1$$'
Note that
The . must be escaped in the regex pattern
'$$$1$$' is a $$$1$$ string that contains:
$$ - a literal single $ char
$1 - the backreference to the first capturing group
$$ - a literal single $ char.
Powershell 7 version of -replace with a scriptblock 2nd argument. Just assigning $_ into $a to look at it. Note the backquote is a special character inside doublequotes, which I'm avoiding.
'url: `= this.url`' -replace '`= this\.(\w+)`', {$a = $_; '$' + $_.groups[1] + '$'}
url: $url$
$a
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 5
Length : 12
Value : `= this.url`
ValueSpan :
tl;dr
# * Consistent use of '...', obviating the need to `-escape ` and $
# * Verbatim $ chars. in the substitution string escaped as $$
# * Capture-group reference $1 represented as ${1} for visual clarity.
(Get-Content .\file) -replace '`= this\.(\w+)`', '$$${1}$$'
Background information and guidance:
In the substitution operand of PowerShell's regex-based -replaceoperator, a verbatim $ character must be escaped as $$, given that $-prefixed tokens have special meaning, namely to refer to results of the regex matching operation, such as $1 in your command (a reference to what the 1st, unnamed capture group in the search regex captured).
Unlike what the docs partially suggest, such a substitution string is not itself a regex, and any other characters are used verbatim.
To programmatically escape $ for verbatim use in a substitution string, it's simplest to use the .Replace() .NET string method, which performs _verbatim (literal) replacements (assuming that all $ instance are to be escaped; e.g. '$foo$'.Replace('$', '$$')
Note that, situationally, a capture-group reference such as $1 may need to be disambiguated as ${1}, and you may always choose to do that for visual clarity, as shown above.
It is only the search operand is a regex, and there all characters that are regex metacharacters must be \-escaped in order to be used verbatim, which can be done:
character-individually, in string literals (amount: \$)
programmatically, for entire strings, using [regex]::Escape() ([regex]::Escape('amount: $'))
To avoid confusion over up-front string interpolation by PowerShell vs. what the .NET regex engine ends up seeing, it's best to consistently use verbatim (single-quoted) strings ('...') rather than expandable (double-quoted) strings ("...").
If embedding PowerShell variable values is needed, use techniques such as:
string concatenation ('^' + [regex]::Escape($foo) + '$')
or -f, the format operator ('^{0}$' -f [regex]::Escape($foo))
In your case, using '...' helps you avoid the `-escaping that "..." requires to make PowerShell treat $ and ` (and ") verbatim, as shown above.
For a comprehensive overview of PowerShell's -replace operator, see this answer.

matching cond in perl using double exclaimation

if ($a =~ m!^$var/!)
$var is a key in a two dimensional hash and $a is a key in another hash.
What is the meaning of this expressions?
This is a regular expression ("regex"), where the ! character is used as the delimiter for the pattern that is to be matched in the string that it binds to via the =~ operator (the $a† here).
It may clear it up to consider the same regex with the usual delimiter instead, $a =~ /^$var\// (then m may be omitted); but now any / used in the pattern clearly must be escaped. To avoid that unsightly and noisy \/ combo one often uses another character for the delimiter, as nearly any character may be used (my favorite is the curlies, m{^$var/}). ‡ §
This regex in the question tests whether the value in the variable $a begins with (by ^ anchor) the value of the variable $var followed by / (variables are evaluated and the result used). §
† Not a good choice for a variable name since $a and $b are used by the builtin sort
‡ With the pattern prepared ahead of time the delimiter isn't even needed
my $re = qr{^$var/};
if ($string =~ $re) ...
(but I do like to still use // then, finding it clearer altogether)
Above I use qr but a simple q() would work just fine (while I absolutely recommend qr). These take nearly any characters for the delimiter, as well.
§ Inside a pattern the evaluated variables are used as regex patterns, what is wrong in general (when this is intended they should be compiled using qr and thus used as subpatterns).
An unimaginative example: a variable $var = q(\s) (literal backslash followed by letter s) evaluated inside a pattern yields the \s sequence which is then treated as a regex pattern, for whitespace. (Presumably unintended; we just wanted \ and s.)
This is remedied by using quotemeta, /\Q$var\E/, so that possible metacharacters in $var are escaped; this results in the correct pattern for the literal characters, \\s. So a correct way to write the pattern is m{^\Q$var\E/}.
Failure to do this also allows the injection bug. Thanks to ikegami for commenting on this.
The match operator (m/.../) is one of Perl's "quote-like" operators. The standard usage is to use slashes before and after the regex that goes in the middle of the operator (and if you use slashes, then you can omit the m from the start of the operator). But if the regex itself contains a slash then it is convenient to use a different delimiter instead to avoid having to escape the embedded slash. In your example, the author has decided to use exclamation marks, but any non-whitespace character can be used.
Many Perl operators work like this - m/.../, s/.../.../, tr/.../.../, q/.../, qq/.../, qr/.../, qw/.../, qx/.../ (I've probably forgotten some).

Matching a variable that contains Regex characters [duplicate]

I am doing a string replacement in PowerShell. I have no control over the strings that are being replaced, but I can reproduce the issue I'm having this way:
> 'word' -replace 'word','##$+'
##word
When the actual output I need is
> 'word' -replace 'word','##$+'
##$+
The string $+ is being expanded to the word being replaced, and there's no way that I can find to stop this from happening. I've tried escaping the $ with \ (as if it was a regex), with backtick ` (as is the normal PowerShell way). For example:
> 'word' -replace 'word',('##$+' -replace '\$','`$')
##`word
How can I replace with a literal $+ in PowerShell? For what it's worth I'm running PowerShell Core 6, but this is reproducible on PowerShell 5 as well.
Instead of using the -replace operator, you can use the .Replace() method like so:
PS> 'word'.Replace('word','##$+')
##$+
The .Replace() method comes from the .NET String class whereas the -Replace operator is implemented using System.Text.RegularExpressions.Regex.Replace().
More info here: https://vexx32.github.io/2019/03/20/PowerShell-Replace-Operator/
I couldn't find it documented but the "Visual Basic" style escaping rule worked, repeat the character.
'word' -replace 'word','##$$+' gives you: ##$+
It's confusing how these codes aren't documented under "about comparison operators". Except I have a closed bug report about them underneath if you look, 'wow, where are all these -replace 2nd argument codes documented?'. https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_comparison_operators?view=powershell-7 That's become my reference. "$+" means "Substitutes the last submatch captured". Doubling up the dollar signs works, 'substitutes a single "$" literal':
'word' -replace 'word','##$$+'
##$+
Or use the scriptblock version of the second argument in PS 6 and above (may be slower):
'word' -replace 'word',{'##$+'}
tl;dr:
Double the $ in your replacement operand to use it verbatim:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$'
##$+
PowerShell's -replace operator:
uses a regex (regular expression) as the search (1st) operand.
If you want to use a search string verbatim, you must escape it:
programmatically: with [regex]::Escape()
or, in string literals, you can alternatively \-escape individual characters that would otherwise be interpreted as regex metacharacters.
uses a non-literal string that can refer to what the regex matched as the replacement (2nd) operand, via $-prefixed tokens such as $& or $+ (see link above or Substitutions in Regular Expressions).
To use a replacement string verbatim, double any $ chars. in it, which is programmatically most easily done with .Replace('$', '$$') (see below).
If both your search string and your replacement string are to be used verbatim, consider using the [string] type's .Replace() method instead, as shown in Brandon Olin's helpful answer.
Caveat: .Replace() is case-sensitive by default, whereas -replace is case-insensitive (as PowerShell generally is); use a different .Replace() overload for case-insensitivity, or, conversely, use the -creplace variant in PowerShell to get case-sensitivity.
[PowerShell (Core) 7+ only] Case-insensitive .Replace() example:
'FOO'.Replace('o', '#', 'CurrentCultureIgnoreCase')
.Replace() only accepts a single string as input, whereas -replace accepts an array of strings as the LHS; e.g.:
'hi', 'ho' -replace 'h', 'f' # -> 'fi', 'fo'
.Replace() is faster than -replace, though that will only matter in loops with high iteration counts.
If you were to stick with the -replace operator in your case:
As stated, doubling the $ in your replacement operand ensures that they're treated verbatim in the replacement:
PS> 'word' -replace 'word', '##$$+' # note the doubled '$$'
##$+
To do this simple escaping programmatically, you can leverage the .Replace() method:
'word' -replace 'word', '##$+'.Replace('$', '$$')
You could also do it with a nested -replace operation, but that gets unwieldy (\$ escapes a $ in the regex; $$ represent a single $ in the replacement string):
# Same as above.
'word' -replace 'word', ('##$+' -replace '\$', '$$$$')
To put it differently: The equivalent of:
'word'.Replace('word', '##$+')
is (note the use of the case-sensitive variant of the -replace operator, -creplace):
'word' -creplace [regex]::Escape('word'), '##$+'.Replace('$', '$$')
However, as stated, if both the search string and the replacement operand are to be used verbatim, using .Replace() is preferable, both for concision and performance.

How to use Regular expression in perl if both regular expression and strings are variables

I have two variables coming from some user inputs. One is a string that needs to be checked and other one is a regular expression as below.
Following code doesn't work.
my $pattern = "/^current.*$/";
my $name = "currentStateVector";
if($name =~ $pattern) {
print "matches \n";
} else {
print "doesn't match \n";
}
And following does.
if($name =~ /^current.*$/) {
print "matches \n";
} else {
print "doesn't match \n";
}
What's the reason for this. I've the regular expression stored in a variable. Is there another way to store this variable or modify it?
The double-quotes that you use interpolate -- they first evaluate what's inside them (variables, escapes, etc) and return a string built with evaluations' results and remaining literals. See Gory details of parsing quoting constructs for an illuminating discussion, with lots of detail.
And your example string happens to have a $/ there, which is one of Perl's global variables (see perlvar) so $pattern is different than expected; print it to see. (In this case the / is erroneous as discussed below but the point stands.)
Instead, either use single quotes to avoid interpretation of characters like $ and \ (etc) so that they are used in regex as such
my $pattern = q(^current.*$);
or, better, use the regex-specific qr operator
my $pattern = qr/^current.*$/;
which builds from its string a proper regex pattern (a special type of Perl value), and allows use of modifiers. In this case you need to escape characters that have a special meaning in regex if you want them to be treated as literals.
Note that there's no need for // for the regex, and they wouldn't be a part of the pattern anyway -- having them around the actual pattern is wrong.
Also, carefully consider all circumstances under which user input may end up being used.
It is brought up in a comment that users may submit a "pattern" with extra /'s. That'd be wrong, as mentioned above; only the pattern itself should be given (surrounded on the command-line by ', so that the shell doesn't interpret particular characters in it). More detail follows.
The /'s are clearly not meant as a part of the pattern, but are rather intended to come with the match operator, to delimit (quote) the regex pattern itself (in the larger expression) so that one can use string literals in the pattern. Or they are used for clarity, and/or to be able to specify global modifiers (even though those can be specified inside patterns as well).
But then if users still type them around the pattern the regex will use those characters as a part of the pattern and will try to match a leading /, etc; it will fail, quietly. Make sure that users know that they need to give a pattern alone, with no delimiters.
If this is likely to be a problem I'd check for delimiters and if found carry on with a "loud" (clear) warning. What makes this tricky is the fact that a pattern starting and ending with a slash is legitimate -- it is possible, if somewhat unlikely, that a user may want actual /'s in their pattern. So you can only ask, or raise a warning, not abort.
Note that with a pattern given in a variable, or with an expression yielding a pattern at runtime, the explicit match operator and delimiters aren't needed for matching; the variable or the expression's return is taken as a search pattern and used for matching. See The basics (perlre) and Binding Operators (perlop).
So you can do simply $name =~ $pattern. Of course $name =~ /$pattern/ is fine as well, where you can then give global modifiers after the closing /
The slashes are part of the matching operator m//, not part of the regex.
When I populate the regex from user input
my $pattern = shift;
and run the script as
58663971.pl '^current.*$'
it matches.

How to use a variable as part of a regular expression in PowerShell

I want to Select-String parts of a file path starting at a string value that is contained in a variable. Let me explain this in an abstracted example.
Let's assume this path: /docs/reports/test reports/document1.docx
Using a regular expression I can get the required string like so:
'^.*(?=\/test\s)'
https://regex101.com/r/6mBhLX/5
The resulting string is '/test reports/document1.docx'.
Now, for this to work I have to use the literal string 'test'. However, I would like to know how to use a variable that contains 'test', e.g. $myString.
I already looked at How do you use a variable in a regular expression?, but I couldn't figure out how to adapt this for PowerShell.
I suggest using $([regex]::escape($myString)) inside a double quoted string literal:
$myString="[test]"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
Or, in case you do not want to worry with additional escaping, use a regular concatenation using + operator:
$pattern = '^.*(?=/' + [regex]::escape($myString) +'\s)'
The resulting $pattern will look like ^.*(?=/\[test]\s). Since the $myString variable is a literal string, you need to escape all special regex metacharacters (with [regex]::escape()) that may be inside it for the regex engine to interpret it as literal chars.
In your case, you may use
$s = '/docs/reports/test reports/document1.docx'
$myString="test"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
$s -replace $pattern
Result: /test reports/document1.docx
Wiktor Stribiżew's helpful answer provides the crucial pointer:
Use [regex]::Escape() in order to escape a string for safe inclusion in a regex (regular expression) so that it is treated as a literal;
e.g., [regex]::Escape('$10?') yields \$10\? - the characters with special meaning to a regex were \-escaped.
However, I suggest using '...', i.e., building the regex from single-quoted aka verbatim strings:
$myString='test'
$regex = '^.*(?=/' + [regex]::escape($myString) + '\s)'
Using the -f operator - $regex = '^.*(?=/{0}'\s)' -f [regex]::Escape($myString) works too and is perhaps visually cleaner, but note that -f - unlike string concatenation with + - is culture-sensitive, which can lead to different results.
Using '...' strings in regex contexts in PowerShell is a good habit to form:
By avoiding "...", so-called expandable strings, you avoid additional up-front interpretation (interpolation a.k.a expansion) of the string, which can have unexpected effects, given that $ has special meaning in both contexts: the start of
a variable reference or subexpression when string-expanding, and the end-of-input marker in regexes.
Using "..." can be especially tricky in the replacement string of the regex-based -replace operator, in whose replacement string operand tokens such as $1 refer to capture-group results, and if you used "$1", PowerShell would try to expand a $1 variable, which presumably doesn't exist, resulting in the empty string.
Just write the variable within double quotes ("pattern"), like this:
PS > $pattern = "^\d+\w+"
PS > "357test*&(fdnsajkfj" -match $pattern # return true
PS > "357test*&(fdnsajkfj" -match "$pattern.*\w+$" # return true
PS > "357test*&(fdnsajkfj" -match "$pattern\w+$" # return false
Please have a try. :)