Double-escaping regex from inside a Groovy expression - regex

Note: I had to simplify my actual use case to spare SO a lot of backstory. So if your first reaction to this question is: why would you ever do this, trust me, I just need to.
I'm trying to write a Groovy expression that replaces double-quotes (") that appear in a string with single-quotes (').
// BEFORE: Replace my "double" quotes with 'single' quotes.
String toReplace = "Replace my \"double-quotes\" with 'single' quotes.";
// Wrong: compiler error
String replacerExpression = "toReplace.replace(""", "'");";
Binding binding = new Binding();
binding.setVariable("toReplace", toReplace);
GroovyShell shell = new GroovyShell(binding);
// AFTER: Replace my 'double' quotes with 'single' quotes.
String replacedString = (String)shell.evaluate(replacerExpression);
The problem is, I'm getting a compile error on the line where I assign replacerExpression:
Syntax error on token ""toReplace.replace("", { expected
I think it's because I need to escape the double-quote character (") inside the string, but since it's a string-inside-a-string, I'm not sure how to escape it properly here. Any ideas?

You need to escape the quote within quotes in this line:
String replacerExpression = "toReplace.replace(""", "'");";
The string will be evaluated twice: once as a string literal, and once as a script. This means you have to escape it with a backslash, and escape the backslash too. Also, with the embedded quotes, it'll be much more readable if you use triple quotes.
Try this (in groovy):
String replacerExpression = """toReplace.replace("\\"", "'");""";
In Java, you're stuck with using backslashes to escape all the quotes and the embedded backslash:
String replacerExpression = "toReplace.replace(\"\\\"\", \"\'\");";
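To see the two layers separately, it can help to print the script text that the Java literal produces before handing it to the shell. The sketch below does only that and does not call GroovyShell; the class and method names are made up for illustration.

```java
final class ScriptTextDemo {
    // The value of the Java literal, i.e. the text that GroovyShell would parse.
    static String scriptText() {
        return "toReplace.replace(\"\\\"\", \"\'\");";
    }

    public static void main(String[] args) {
        // Layer 1 (the Java literal) is already resolved here.
        // The one remaining backslash is consumed by layer 2 (the Groovy parser).
        System.out.println(scriptText()); // prints: toReplace.replace("\"", "'");
    }
}
```

If the printed line is not valid Groovy on its own, the literal is missing a level of escaping.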

Triple-quotes work well, but one can also use a single-quoted string to specify a double-quote, and a double-quoted string to specify a single-quote.
Consider this:
String toReplace = "Replace my \"double-quotes\" with 'single' quotes."
// key line:
String replacerExpression = """toReplace.replace('"', "'");"""
Binding binding = new Binding(); binding.setVariable("toReplace", toReplace)
GroovyShell shell = new GroovyShell(binding)
String replacedString = (String)shell.evaluate(replacerExpression)
That is, after the string literal evaluation, this is evaluated in the Groovy shell:
toReplace.replace('"', "'");
If that is too hard on the eyes, replace the "key line" above with another style (using slashy strings):
String ESC_DOUBLE_QUOTE = /'"'/
String ESC_SINGLE_QUOTE = /"'"/
String replacerExpression = """toReplace.replace(${ESC_DOUBLE_QUOTE}, ${ESC_SINGLE_QUOTE});"""

Please try to use regular expressions to solve this kind of problem, instead of wrestling with the escaping of quotes.
I have put up a solution using the Groovy console. Please see if that helps.

Related

Replace every " with \" in Lua

X-Problem: I want to dump an entire lua-script to a single string-line, which can be compiled into a C-Program afterwards.
Y-Problem: How can you replace every " with \" ?
I think it makes sense to try something like this
data = string.gsub(line, "c", "\c")
where c is the "-character. But this does not work of course.
You need to escape both quotes and backslashes, if I understand your Y problem:
data = string.gsub(line, "\"", "\\\"")
or use the other single quotes (still escape the backslash):
data = string.gsub(line, '"', '\\"')
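For comparison, here is a minimal sketch of the same literal replacement in Java, where the string-literal escaping works the same way. escapeQuotes is a hypothetical helper name, and String.replace is used because it treats both arguments as plain text rather than as patterns.

```java
final class QuoteEscaper {
    // Hypothetical helper: puts a backslash in front of every double quote.
    static String escapeQuotes(String line) {
        // First argument: the one-character string consisting of a double quote.
        // Second argument: a two-character string, a backslash followed by a double quote.
        return line.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("print(\"hi\")")); // prints: print(\"hi\")
    }
}
```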
A solution to your X-Problem is to safely escape any sequence that could interfere with the interpreter.
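Lua has the %q option for string.format, which formats and escapes the provided string so that it can be safely read back by Lua. The result is often usable as a C string literal too, but check it first: %q writes a newline as a backslash followed by a real line break, which a C compiler does not read as a newline character.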
Example string: This \string's truly"tricky
If you just enclosed it in either single or double-quotes, there'd still be a quote that ended the string early. Also there's the invalid escape sequence \s.
Imagine this string was already properly handled in Lua, so we'll just pass it as a parameter:
string.format("%q", 'This \\string\'s truly"tricky')
returns (notice, I used single-quotes in code input):
"This \\string's truly\"tricky"
Now that's a completely valid Lua string that can be written and read from a file. No need to manually escape every special character and risk implementation mistakes.
To implement your Y approach correctly, that is, to escape (invalid) characters with \, use pattern matching and replace each match with a prefix followed by the captured string:
string.gsub('he"ll"o', "[\"']", "\\%1") -- will prepend backslash to any quote
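The same prefix-plus-match idea can be sketched in Java with a regex replacement, where $0 plays the role of the matched text. prefixQuotes is a hypothetical helper name; note that the replacement string has its own escaping rules on top of the string literal's.

```java
final class QuotePrefixer {
    // Hypothetical helper: prepends a backslash to every single or double quote.
    static String prefixQuotes(String s) {
        // Pattern: a character class holding the two quote characters.
        // Replacement: four backslashes in source are two in the string value,
        // which the regex engine reads as one literal backslash; $0 is the whole match.
        return s.replaceAll("[\"']", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(prefixQuotes("he\"ll\"o")); // prints: he\"ll\"o
    }
}
```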

Escaping dollars groovy

I'm having trouble escaping double dollars from a string to be used with regex functions pattern/matcher.
This is part of the String:
WHERE oid_2 = $$test$$ || oid_2 = $$test2$$
and this is the closest code I've tried to get near the solution:
List<String> strList = new ArrayList<String>();
Pattern pattern = Pattern.compile("\$\$.*?\$\$");
log.debug("PATTERN: "+pattern)
Matcher matcher = pattern.matcher(queryText);
while (matcher.find()) {
strList.add(matcher.group());
}
log.debug(strList)
This is the debug output I get:
- PATTERN: $$.*?$$
- []
So the pattern is actually right, but the placeholders are not found in the string.
As a test I've tried to replace "$$test$$" with "XXtestXX" and everything works perfectly. What am I missing? I've tried "/$" strings, "\\" but still have no solution.
Note that a $ in regex matches the end of the string. To use it as a literal $ symbol, you need to escape it with a literal backslash.
You used "\$\$.*?\$\$", which becomes the literal string $$.*?$$. That pattern matches 2 end-of-string positions, then any 0+ chars, as few as possible, and then 2 end-of-string positions again, which makes little sense. You actually need one backslash to escape the $ that Groovy uses to inject variables into a double-quoted string literal, plus 2 backslashes to define the literal backslash that the regex engine needs: "\\\$\\\$.*?\\\$\\\$".
However, when you work with regex, slashy strings are quite helpful since all you need to escape a special char is a single backslash.
Here is a sample code extracting all matches from the string you have in Groovy:
def regex = /\$\$.*?\$\$/;
def s = 'WHERE oid_2 = $$test$$ || oid_2 = $$test2$$'
def m = s =~ regex
(0..<m.count).each { print m[it] + '\n' }
See the online demo.
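For readers calling the same java.util.regex classes from plain Java, here is a minimal sketch of the extraction. Java strings have no $ interpolation, so only the regex-level escape remains. The class name, method name and constant are made up for illustration; the query text is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarPlaceholders {
    // Two backslashes in source are one in the string value, so the regex engine
    // sees an escaped dollar sign and matches it literally.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\$.*?\\$\\$");

    // Collects every $$...$$ placeholder in order of appearance.
    static List<String> find(String queryText) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(queryText);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String query = "WHERE oid_2 = $$test$$ || oid_2 = $$test2$$";
        System.out.println(find(query)); // prints: [$$test$$, $$test2$$]
    }
}
```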
Anyone who gets here might like to know another answer to this, if you want to use Groovy slashy strings:
myComparisonString ==~ /.*something costs [$]stuff.*/
I couldn't find another way of putting a $ in a slashy string, at least if the $ is to be followed by text. If, conversely, it is followed by a number (or presumably any non-letter), this will work:
myComparisonString ==~ /.*something costs \$100.*/
... the trouble being, of course, that the GString "compiler" (if that's its name) would recognise "$stuff" as an interpolated variable.

Regex is grabbing preceding character

So I am experiencing some inconsistent behavior in my regex
My regex:
(?<=test\\\\)(.*)(?=\",)
The input string:
"test.exe /c echo teststring > \\\\.\\test\\teststring",
When I run this on https://Regex101.com, I get the value teststring. However, when I run this in F#:
Regex.Match(inputString, "(?<=test\\\\)(.*)(?=\",)")
I get \teststring back. My goal is to get just teststring. I'm not sure what I'm doing wrong.
I had success using triple quoted strings. Then only the regex escapes need be considered, and not the F# string escapes.
let inputString = """test.exe /c echo teststring > \\\\.\\test\\teststring","""
let x = Regex.Match(inputString, """(?<=test\\\\)(.*)(?=\",)""")
"teststring" comes out
The string in your source comes out as
(?<=test\\)(.*)(?=",)
If you don't want to use triple-quoted or verbatim strings, you will have to write this in F#:
"(?<=test\\\\\\\\)(.*)(?=\\\",)"
This string in F# uses backslashes to escape backslashes and a quote character. There are eight backslashes in a row in one place, and this then becomes four actual backslashes in the string value. There is also this:
\\\"
which translates to one actual \ and one actual " in the actual string value.
So then we end up with a string value of
(?<=test\\\\)(.*)(?=\",)
This then is the actual string value fed to the regex engine. The regex engine, like the F# compiler, also uses the backslash to escape characters. That's why any actual backslash had to be doubled and then doubled again.
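Java string literals double backslashes the same way, so the effect can be reproduced there as a rough sketch. It assumes the input contains the backslashes exactly as displayed in the question; the class, method and constant names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackslashLayers {
    // The input from the question as a Java literal: every real backslash is doubled.
    static final String INPUT =
        "\"test.exe /c echo teststring > \\\\\\\\.\\\\test\\\\teststring\",";

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regexValue, String input) {
        Matcher m = Pattern.compile(regexValue).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Four backslashes in source, two in the string value:
        // the regex engine looks for ONE literal backslash after "test".
        System.out.println(firstGroup("(?<=test\\\\)(.*)(?=\",)", INPUT));
        // Eight backslashes in source, four in the string value:
        // the regex engine looks for TWO literal backslashes after "test".
        System.out.println(firstGroup("(?<=test\\\\\\\\)(.*)(?=\\\",)", INPUT));
    }
}
```

The first call returns the value with a leading backslash, as in the question; the second returns teststring alone.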

How to change "It's" to "It is" (without apostrophe) using str_replace?

I want to replace from Facebook's relationships string "It's complicated" to other text.
The line is like this:
$user->relationship = str_replace(array('single', 'It's complicated'), array('Soltero(a)', 'Es complicado'),$data['relationship_status']);
Using 'It's complicated', 'It&apos;s complicated' or 'It's complicated' does not work.
Any suggestions?
Thanks a lot.
Regards.
If you want to use a literal single quote (') inside a single-quoted string, you have to escape it, like this:
$str = '\''; // single quote
You could try this.
$user->relationship = str_replace(array('single', 'It\'s complicated'), array('Soltero(a)', 'Es complicado'),$data['relationship_status']);
PHP cannot recognize a literal single quote (') inside a single-quoted string unless it is escaped. Here is the explanation:
Strings literal
The same happens with a literal double quote (") inside a double-quoted string.

replaceFirst in Groovy throws Illegal group reference

I have the following code:
String newStr = "aa\$";
print newStr;
print "wwwww ? eeee".replaceFirst("\\?", "'${newStr}'"); // (3)
and I keep getting -- at line 3 -- the following error:
Caught: java.lang.IllegalArgumentException: Illegal group reference
at com.example.MyBuilder.main(MyBuilder.groovy:196)
It looks like that replaceFirst ignores that $ was escaped. How could I let my code run? Does anybody experience such an error?
First
String newStr == "aa\$"
should be
String newStr = "aa\$"
Then, because you are using normal strings to declare your regex, you need to escape the dollar sign twice, once for Groovy and once for the regex engine:
String newStr = "aa\\\$"
Or, use slashy strings:
String newStr = /aa\$/
I have found a working solution for my problem: String newStr = "aa\\\$";
You need three backslashes. The first backslash (counting from right to left) escapes the $ so that the Groovy interpreter does not treat it as the start of a variable.
The two remaining backslashes escape the $ for replaceFirst, because Matcher.appendReplacement(), which replaceFirst calls internally, interprets $ as a group reference. This behavior is unexpected but well documented in the JavaDoc:
backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string
ps. After fighting with escaping other "special" symbol -- backslash -- I switched to String.replace(CharSequence,CharSequence).
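Both ways out can be sketched in plain Java, since Groovy delegates to the same java.lang.String and java.util.regex methods. Matcher.quoteReplacement keeps replaceFirst but neutralises $ and \ in the replacement, while String.replace avoids regex entirely (and replaces every occurrence, not only the first). The class and method names below are made up for illustration.

```java
import java.util.regex.Matcher;

final class LiteralReplacement {
    // Keeps the regex-based replaceFirst, but treats the replacement as plain text.
    static String viaQuoteReplacement(String text, String replacement) {
        return text.replaceFirst("\\?", Matcher.quoteReplacement(replacement));
    }

    // No regex at all: neither argument is interpreted, and all occurrences are replaced.
    static String viaPlainReplace(String text, String replacement) {
        return text.replace("?", replacement);
    }

    public static void main(String[] args) {
        String newStr = "'aa$'";
        try {
            // The raw $ is read as the start of a group reference and rejected.
            "wwwww ? eeee".replaceFirst("\\?", newStr);
        } catch (IllegalArgumentException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
        System.out.println(viaQuoteReplacement("wwwww ? eeee", newStr)); // prints: wwwww 'aa$' eeee
        System.out.println(viaPlainReplace("wwwww ? eeee", newStr));     // prints: wwwww 'aa$' eeee
    }
}
```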