The regular expression used in JavaScript does not work in Java - regex

replace(/[.?+^$[\]\\(){}|-]/g, '\\$&');
But it doesn't work in Java.
So I changed the code as follows.
replace(/[.?+^$[\\]\\\\(){}|-]/g, '\\\\$&');
It doesn't work when I change it. Please help me :(

In Java, String.replace does not take a regex; it replaces literal text. For a regex you need replaceFirst or replaceAll.
Since you are using the /g flag in JavaScript to replace every match, use replaceAll.
In JavaScript, $& in the replacement refers to the full match.
So you want to replace the full match (which is one of these characters [.?+^$[\]\\(){}|-]) with itself, preceded by a \
In Java you can use $0 instead to refer to the full match.
In Java you should also escape the opening square bracket inside the character class (\\[ in a string literal), because an unescaped [ there starts a nested character class.
For example
System.out.println("{test?test^}".replaceAll("[.?+^$\\[\\]\\\\()\\{}|-]", "\\\\$0"));
Output
\{test\?test\^\}
The same output in JavaScript
console.log("{test?test^}".replace(/[.?+^$[\]\\(){}|-]/g, '\\$&'));
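For comparison, here is a minimal sketch of the same substitution in Python (not part of the original question; Python is used only for illustration). There the full match is written \g<0> in the replacement template. The name escape_specials is made up for this example:

```python
import re

# Same character class as above; [ and ] are escaped inside the class.
SPECIALS = re.compile(r'[.?+^$\[\]\\(){}|-]')

def escape_specials(text):
    # In the template, \\ is a literal backslash and \g<0> is the whole match.
    return SPECIALS.sub(r'\\\g<0>', text)

print(escape_specials("{test?test^}"))  # \{test\?test\^\}
```

The point is only that each language spells "the full match" differently: $& in JavaScript, $0 in Java, \g<0> in Python.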


Using Ruby gsub with regex as replacement

Ruby's gsub supports using a regex as the pattern to match against the input,
and it also allows referring to a match group by number in the replacement.
For example, with a regex that detects a lowercase letter at the beginning of any word, putting an x before it and a y after it
gives the expected result:
"testing gsub".gsub(/(?<=\b)[a-z]/,'x\0y')
#=> "xtyesting xgysub"
But suppose I want to convert the matched text to uppercase.
In other regex flavors, one can normally do this with \U\$0, as explained here.
Unfortunately, when I try it like this:
"testing gsub".gsub(/(?<=\b)[a-z]/,'\U\0')
#=> "\\Utesting \\Ugsub"
Also, if I try using a raw regex in the replacement field like this:
"testing gsub".gsub(/(?<=\b)[a-z]/,/\U\0/)
I get a type error:
TypeError (no implicit conversion of Regexp into String)
I'm totally aware of the option to do it using maps like this:
"testing gsub".gsub(/(?<=\b)[a-z]/,&:upcase)
But unfortunately, the rules (pattern, replacement) are being loaded from a .yaml file and they are applied to string this way:
input.gsub(rule['pattern'], rule['replacement'])
and I am not able to store &:upcase in .yaml to be taken as a raw string
A workaround would be to detect whether the replacement loaded from the file is the string "upcase"
and, if so, do it this way:
"testing gsub".gsub(/(?<=\b)[a-z]/) {|l| l.send("upcase")}
But I don't want to modify this logic:
input.gsub(rule['pattern'], rule['replacement'])
If there is a workaround to either use regex in gsub replacement, or to store methods like &:upcase in YAML without being loaded as a string, it'd be perfect.
Thanks!
TL;DR
You can't do what you want the way you want. This is documented in the Onigmo source. You'll have to use a different approach, or refactor other areas of your code to simulate the behavior you want.
Escapes Like \U Not Available in Ruby
Special escapes like \U come from Perl, and GNU sed supports them as an extension. They are not part of Ruby's current regular expression engine. The Onigmo source clearly mentions that these escapes are missing:
A-3. Missing features compared with perl 5.18.0
+ \N{name}, \N{U+xxxx}, \N
+ \l,\u,\L,\U, \C
+ \v, \V, \h, \H
+ (?{code})
+ (??{code})
+ (?|...)
+ (?[])
+ (*VERB:ARG)
Other Approaches
You can do what you want in a number of different ways, such as using the block form of String#gsub to call String#upcase on each match. For example:
"testing gsub".gsub(/\b\p{Lower}+/) { |m| m.upcase }
#=> "TESTING GSUB"
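You will also have to use the block form if you want to reliably reference match variables like $& or $1. A replacement given as a double-quoted string is interpolated once, before gsub runs, so the variables refer to text from a previous match (or are nil if there was none). For illustration, assuming an earlier match left $& set to "bar", consider: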
"foo bar".gsub /\b\p{Lower}+/, "#{$&.upcase}"
#=> "BAR BAR"
As this is primarily an X/Y problem, you may be happier with the answers you receive if you post a related question with an example of your YAML source and your current code for parsing your regular expression matches/substitutions. Perhaps there's a way to wrap or refactor your code that you haven't considered, but you aren't going to be able to solve this the way you want.
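One possible shape for such a wrapper, sketched in Python rather than Ruby purely for illustration: the rule file stores a transform name as a plain string, and a small whitelist maps that name to a function. Everything here (TRANSFORMS, apply_rule, the rule layout) is hypothetical and not taken from the asker's code:

```python
import re

# Hypothetical whitelist: names allowed in the rules file -> callables.
TRANSFORMS = {
    "upcase": str.upper,
    "downcase": str.lower,
}

def apply_rule(text, rule):
    """Apply one rule; 'replacement' is a transform name or a plain template."""
    replacement = rule["replacement"]
    transform = TRANSFORMS.get(replacement)
    if transform is not None:
        # Callback form: the function receives each match object.
        return re.sub(rule["pattern"], lambda m: transform(m.group(0)), text)
    # Otherwise treat the replacement as an ordinary template string.
    return re.sub(rule["pattern"], replacement, text)

rules = [
    {"pattern": r"\b[a-z]", "replacement": "upcase"},
    {"pattern": r"\b[A-Z]", "replacement": r"x\g<0>y"},
]

print(apply_rule("testing gsub", rules[0]))  # Testing Gsub
```

The trade-off is that a replacement whose literal text happens to equal a transform name becomes ambiguous, so a real rules file would probably want a separate key for the transform.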

Regex doesn't match. Online generator does

I want to check this kind of string with a regex:
2020_2021_01_01
I've put it in a variable, say $session,
so I do:
if [[ "$session" =~ \d{4}[_]\d{4}[_]\d{2}[_]\d{2} ]]; then
stuff
fi
you see...it doesn't work... but I don't know why....
any help?
THANKS!
The bash manual rather tersely explains that when the =~ operator "is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3))".
Here, regex(3) is a reference to man 3 regex, which might explain what an "extended regular expression" is. A longer description would be "POSIX standard extended regular expressions", and you can find the documentation for those in the POSIX document. If you're using an online regular expression tester, make sure you select "POSIX regular expressions".
In short, they don't include Perlisms like \d. You can write [[:digit:]] or (if you are using the C locale) [0-9].
So your regex could have been written:
([[:digit:]]{4}_){2}[[:digit:]]{2}_[[:digit:]]{2}
(there is no need to quote _). However, be aware that the =~ operator looks for a substring which matches the pattern, rather than testing whether the left-hand operand precisely matches the pattern. So you quite possibly actually wanted an anchored match:
^([[:digit:]]{4}_){2}[[:digit:]]{2}_[[:digit:]]{2}$
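The difference between a substring match and an anchored match can be sketched in Python (used here only for illustration; Python's re module does not support [[:digit:]], so [0-9] stands in for it, and the function names are made up):

```python
import re

# Same structure as the pattern above, with [0-9] instead of [[:digit:]].
PATTERN = r"([0-9]{4}_){2}[0-9]{2}_[0-9]{2}"

def contains_session(s):
    # Like an unanchored =~ test: succeeds if any substring matches.
    return re.search(PATTERN, s) is not None

def is_session(s):
    # Like the anchored ^...$ form: the whole string must match.
    return re.fullmatch(PATTERN, s) is not None

print(contains_session("x2020_2021_01_01y"))  # True
print(is_session("x2020_2021_01_01y"))        # False
print(is_session("2020_2021_01_01"))          # True
```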
The backslash character is an escape character in the bash shell. In your example, I think that's making the regular expression read like this:
d{4}[_]d{4}[_]d{2}[_]d{2}
You could confirm this by testing, setting $session to dddd_dddd_dd_dd
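To work around this and preserve the backslash in the regular expression, you need to escape it. In your case, preceding each backslash with an extra backslash may do the trick: the shell sees the two backslashes and leaves one as part of the pattern. Whether the regex library then treats \d as a digit is a separate matter, since \d is not part of POSIX extended regular expressions; [0-9] or [[:digit:]] is the portable choice. With the backslashes doubled, the test reads: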
if [[ "$session" =~ \\d{4}[_]\\d{4}[_]\\d{2}[_]\\d{2} ]]; then
I'm not sure whether other characters also need to be escaped. This calls for a really short script, one that you can change and run, to figure out what's working and what's not: can you match the start of the string, a single digit character, and so on?
(The whole escaping thing gets funkier... inside double quotes, inside single quotes, ...)
There was a website I used to use: you put in the string you wanted, and it gave back what it needed to look like in a shell script. I don't have a link to it anymore. There's probably a regular expression tester that lets you test "bash" regular expressions.

How to escape regular expression characters from variable in JMeter?

Problem is simple. I have regular expression used to extract some data from response. It looks like that:
<input type="hidden" +name="reportpreset_id" +value="(\w+)" *>${reportPresetName}</td>
Problem is that variable ${reportPresetName} may contain characters used by regular expression like parenthesis or dots.
I've tried to surround this variable with \Q and \E (based on that), but apparently these markers don't work (Java supports them, so I'm confused).
When I add those markers, the expression fails for any content of the ${reportPresetName} variable (even in cases where it worked without them).
I've checked the list of functions in JMeter, but I didn't find anything useful.
Does anyone know how to escape regular expression characters in JMeter?
update:
When I use \Q and \E in an assertion, it fails. When I copy the regular expression from the assertion log in "View Results Tree" and test it against the recorded response data, it works! So it looks like some kind of bug in JMeter.
JMeter uses Jakarta ORO as its regexp engine in the Regexp Extractor and Regexp Tester:
http://jmeter.apache.org/usermanual/regular_expressions.html
But it uses Java Regexp Engine for search in HTML/Text Viewer.
Read:
http://jmeter.apache.org/usermanual/regular_expressions.html §20.4
Please note that ORO does not support the \Q and \E meta-characters.
[In other RE engines, these can be used to quote a portion of an RE so that the
meta-characters stand for themselves.]
A solution for you would be to add a JSR223 post processor using Groovy after regexp that extracts the var and escapes regexp chars using:
org.apache.oro.text.regex.Perl5Compiler.quotemeta(String valueToEscape)
As of upcoming version 2.9, a new function has been created to do so:
__escapeOroRegexpChars(String to escape, Variable Name)
\Q and \E work in Java, see Pattern.
In Java string literals, backslashes have to be doubled, though, so you might need to use (\\w+) and, of course, \\Q and \\E.
I am not sure in your case, as I don't understand your context, actually (never used JMeter so far).
In case JMeter does not support \\Q and \\E (I don't know whether it does), you can write your own function that walks through the string character by character and escapes each one as follows:
if the character is a letter or a digit, leave it unchanged (prefixing it with a backslash could turn it into an escape such as \\s, \\t or \\1)
if the character is \, replace it with \\\\
otherwise prefix the character with \\
This is not the optimal method, but it covers every metacharacter.
For example, for the input
This is-a\string 12&$34|!`^5
you will get
This\\ is\\-a\\\\string\\ 12\\&\\$34\\|\\!\\`\\^5
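A cautious variant of this per-character approach can be sketched in Python (chosen only for illustration; in JMeter itself this would be Groovy or Java). It leaves letters, digits and underscores alone, because a backslash in front of those can create an escape such as \s or \1, and prefixes everything else with a single backslash. The name escape_non_alnum is made up for this sketch:

```python
import re

def escape_non_alnum(text):
    """Backslash-escape every character except letters, digits and underscore."""
    out = []
    for ch in text:
        if ch.isalnum() or ch == "_":
            out.append(ch)         # safe as-is; escaping could change its meaning
        else:
            out.append("\\" + ch)  # one literal backslash, then the character
    return "".join(out)

sample = "This is-a\\string 12&$34|!`^5"
escaped = escape_non_alnum(sample)
# The escaped text, used as a pattern, matches the original string literally.
print(re.fullmatch(escaped, sample) is not None)  # True
```

In Python specifically, the standard library's re.escape already does this job; the loop is only meant to show the idea.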

How to search (using regex) for a regex literal in text?

I just stumbled on a case where I had to remove quotes surrounding a specific regex pattern in a file, and the immediate conclusion I came to was to use vim's search and replace util and just escape each special character in the original and replacement patterns.
This worked (after a little tinkering), but it left me wondering if there is a better way to do these sorts of things.
The original regex (quoted): '/^\//' to be replaced with /^\//
And the search/replace pattern I used:
s/'\/\^\\\/\/'/\/\^\\\/\//g
Thanks!
You can use almost any character as the regex delimiter. This will save you from having to escape forward slashes. You can also use groups to extract the regex and avoid re-typing it. For example, try this:
:s#'\(\\^\\//\)'#\1#
I do not know if this will work for your case, because the example you listed and the regex you gave do not match up. (The regex you listed will match '/^\//', not '\^\//'. Mine will match the latter. Adjust as necessary.)
Could you avoid using regex entirely by using a nice simple string search and replace?
Please check whether this works for you. Give the line number before the substitute command, or place the cursor on the line first:
:s:'\(.*\)':\1:
I used Vim 7.1 for this. You can also visually select the lines the command should run on first (press "v" or "V" and move the cursor accordingly).

regex implementation to replace group with its lowercase version

Is there any regex implementation that allows replacing a group with its lowercase version?
If your regex flavor supports it, you can use \L, like so with GNU sed:
sed -r 's/(^.*)/\L\1/'
In Perl, you can do:
$string =~ s/(some_regex)/lc($1)/ge;
The /e option causes the replacement expression to be interpreted as Perl code to be evaluated, whose return value is used as the final replacement value. lc($x) returns the lowercased version of $x. (Not sure but I assume lc() will handle international characters correctly in recent Perl versions.)
/g means match globally. Omit the g if you only want a single replacement.
If you're using an editor like SublimeText or TextMate [1], there's a good chance you can use
\L$1
as your replacement, where $1 refers to something from the regular expression that you put parentheses around. For example [2], here's something I used to downcase field names in some SQL, getting everything to the right of the 'as' at the end of any given line. First the "find" regular expression:
(as|AS) ([A-Za-z_]+)\s*,$
and then the replacement expression:
$1 '\L$2',
If you use Vim (or presumably gvim), then you'll want to use \L\1 instead of \L$1, but there's another wrinkle that you'll need to be aware of: Vim reverses the syntax between literal parenthesis characters and escaped parenthesis characters. So to designate a part of the regular expression to be included in the replacement ("captured"), you'll use \( at the beginning and \) at the end. Think of \ not as escaping a special character to make it a literal, but as marking the beginning of a special character (as with \s, \w, \b and so forth). It may seem odd if you're not used to it, but it is actually perfectly logical if you think of it in the Vim way.
[1] I've tested this in both TextMate and SublimeText and it works as-is, but some editors use \1 instead of $1. Try both and see which your editor uses.
[2] I just pulled this regex out of my history. I always tweak regexen while using them, and I can't promise this is the final version, so I'm not suggesting it's fit for the purpose described, and especially not with SQL formatted differently from the SQL I was working on, just that it's a specific example of downcasing in regular expressions. YMMV. UAYOR.
Several answers have noted the use of \L. However, \E is also worth knowing about if you use \L.
\L converts everything up to the next \U or \E to lowercase. ... \E turns off case conversion.
(Source: https://www.regular-expressions.info/replacecase.html )
So, suppose you wanted to use rename to lowercase part of some file names like this:
artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
artist_-_album_-_Another_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
you could do something like:
rename -v 's/^(.*_-_)(.*)(_-_.*\.m4a)/$1\L$2\E$3/g' *
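In Perl, if you just want to lowercase the ASCII letters in the whole string, there's also transliteration (no regex involved):
$string =~ tr/A-Z/a-z/;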
Most Regex implementations allow you to pass a callback function when doing a replace, hence you can simply return a lowercase version of the match from the callback.
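As a minimal sketch of the callback approach, here is how it might look in Python, where re.sub accepts a function in place of a replacement string. The pattern and the name lowercase_alias are made up for this example:

```python
import re

def lowercase_alias(sql_line):
    """Lowercase the identifier that follows AS/as, leaving the rest untouched."""
    def repl(match):
        # group(1) is the keyword, group(2) the identifier to lowercase.
        return f"{match.group(1)} {match.group(2).lower()}"
    return re.sub(r"\b(AS|as) ([A-Za-z_]+)", repl, sql_line)

print(lowercase_alias("SELECT Foo AS UserName,"))  # SELECT Foo AS username,
```

The callback receives the match object, so any group can be transformed independently before the pieces are put back together.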