Regex replace escaped characters - regex

I would like to define a regex pattern which replaces escaped characters with the corresponding value.
For example the string
xy\tz\\x
Should be converted to
xy{tab}z\x
The problem is how to handle things like
xy\\\\\t
this string should become
xy\\{tab}
I don't know how to create a pattern which matches only odd backslashes.

A simple approach uses several passes, though it can mishandle input such as \\t (an escaped backslash followed by a literal t), because the first pass turns it into \t. To start, collapse each pair of backslashes:
s/\\\\/\\/g
This replaces two backslashes with a single one.
Then you can just apply one pattern per escaped character:
s/\\t/\t/g
The trick here is to escape the backslash you want to replace. What this'll do is replace the literal string "\t" with a tab character.
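One way to sidestep ordering problems between passes is to consume each backslash together with the character after it in a single left-to-right scan. The following is a minimal Python sketch of that idea, not the answerer's method; the ESCAPES table and the unescape name are assumptions made for illustration.

```python
import re

# Hypothetical escape table for illustration; extend it as needed.
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}

def unescape(text):
    """Replace each backslash escape in a single left-to-right pass."""
    def replace(match):
        # group(1) is the character following the backslash;
        # unknown escapes are left unchanged.
        return ESCAPES.get(match.group(1), match.group(0))
    # The pattern always consumes a backslash together with the next character,
    # so a doubled backslash can never be misread as the start of another escape.
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)

print(repr(unescape(r"xy\tz\\x")))
print(repr(unescape("xy" + "\\" * 5 + "t")))
```

Because the scan never revisits a character, an even run of backslashes collapses to half its length, and only an odd run leaves a backslash free to combine with the following letter.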

Related

Regex in Flutter to find double quotes enclosed words and escaped single quotes

I am using this Regex in my Flutter App to find words enclosed by single-quotes that end with a .tr:
r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b"
Now I need another expression that is almost the same but looks for words enclosed by double quotes, ending with .tr, that might contain escaped single quotes.
I tried simply changing the single quotes to double quotes in the first expression, but Flutter gives me errors. I need to escape some characters, but I cannot make it work. Any ideas?
An edge case it should match is:
"Hello, I\'m Chris".tr
You may use this regex for double quoted text that can have any escaped character followed by .tr and word boundary:
r""""[^"\\]*(?:\\.[^"\\]*)*"\s*\.tr\b"""
You need to put a \ before every " in your RegExp's source. Try this:
RegExp regExp = new RegExp(r'\"[^\"\\]*(?:\\.[^\"\\]*)*\"\s*\.tr\b');
print("${regExp.hasMatch('"Hello, I\'m Chris".tr')}"); // result = true

How to exclude part of a string using regex and add this part at the end of the string?

I've got a little problem with regex.
I have a few strings in one file that look like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define the correct regex and rename those strings to:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
So far I have this regex:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding this in the file
But what should the replacement look like? I need the numbers from the original string at the end of the new name.
.D%%ODATE.SYSCOP.# - this is just a string, not a regex, and it didn't work
Any ideas?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth capturing group (pair of parentheses) in the pattern, so you can rearrange the matched parts as needed.
Note that . must be escaped to match a literal dot.
Mind the raw string literal: the r prefix before the string literals helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2'; print both literals to see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is kept as a literal backslash, which is exactly what is needed to build regex escape sequences. (Note that r'\n' and '\n' both match a newline when used as patterns: the first is a regex escape sequence matching a newline, and the second is a literal LF character.)
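As a quick check of the points above, this short sketch applies the substitution to the three sample names from the question and compares the raw and non-raw replacement strings. The names list and the variable names are assumptions for illustration only.

```python
import re

# Sample names taken from the question.
names = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

# Three capturing groups: ".SYSCOP", the two digits, ".D%%ODATE".
pattern = re.compile(r"(\.SYSCOP)([0-9]{2})(\.D%%ODATE)")

# Reorder the groups: group 3, then group 1, then ".#" and the digits.
renamed = [pattern.sub(r"\3\1.#\2", name) for name in names]
print(renamed)

# Without the r prefix, Python reads \3, \1 and \2 as octal character escapes,
# so the two literals have different lengths.
print(len(r"\3\1.#\2"), len("\3\1.#\2"))
```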

Escaping filename suffix in a regex?

I'm trying to match a variable length string followed by the filetype suffix in an XML filename using a regex:
varrrrrriableLengthString.xml
Currently I'm using this regex with a greedy match. The second backslash escapes the first, which in turn escapes the dot.
[A-Za-z0-9]+\\.[xX][mM][lL]
I've tested this on RegExr, and it matches with only one backslash. However, my CPP parser requires the double backslash.
How can I properly escape the filename suffix?
You can also escape chars using the [] notation, in your case [.]. The main advantage is that there is no "one or two backslashes?" question anymore, and IMHO it is more readable.
It just does not work with brackets, i.e. to escape a [ (or ]), you still have to use \[ (or \\[ for a string literal) and not [[].
Backslashes still have to be escaped using another backslash too.
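To illustrate the bracket notation, here is a small sketch in Python rather than the asker's C++ (the idea carries over, but the surrounding API does not). The XML_NAME name and the test strings are made up for the example.

```python
import re

# [.] matches a literal dot without any backslash, so the pattern reads the
# same in a raw string, a normal string literal, or an online regex tester.
XML_NAME = re.compile("[A-Za-z0-9]+[.][xX][mM][lL]")

print(bool(XML_NAME.fullmatch("varrrrrriableLengthString.xml")))
print(bool(XML_NAME.fullmatch("varrrrrriableLengthStringxml")))
```

An unescaped dot would also accept the second string, since it matches any character; the bracketed form only accepts a real dot.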

What regex expression will match all characters except ", except when it is \"?

I'm trying to parse an apache log, and I'm having problems with the right syntax for the referer because it is a string inside " (double-quotes), that can also have \" inside it.
"([^"]*)" doesn't work when there is a \" in the string.
How do I start at the 1st double-quote, then take all characters that are not double-quotes, unless it's \", in which case I include it, and keep going?
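You could use this:
"((?:\\"|[^"])*)"
It will match zero or more occurrences of either a backslash-double-quote pair or any character other than a double quote, all surrounded by double quotes. The \\" alternative is listed first so that a backslash is consumed together with the quote it escapes; with [^"] first, the backslash would match on its own and the match would end at the escaped quote.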
Could there be other escapes in the string, for example "hello \\"? In that case, you need a more general approach:
"((?:\\.|[^"\\])*)"
How about this? A negative-lookbehind to exclude a \ before the closing "
"(.+?)(?<!\\)"
This will match two quotes with any number of escaped quotes in-between:
"\([^"]\|\\"\)*"
First it looks for a quote. Next it searches for zero to infinity of the following:
a non-quote character
a quote character preceded by a backslash
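For comparison, the more general pattern suggested earlier in this thread, "((?:\\.|[^"\\])*)", can be tried out with a short Python sketch. The sample line is invented and is not a real Apache log entry; quoted_fields is a hypothetical helper name.

```python
import re

# A backslash followed by any character is consumed as a unit; otherwise any
# character that is neither a double quote nor a backslash is accepted.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')

def quoted_fields(line):
    """Return the contents of every double-quoted field in the line."""
    return QUOTED.findall(line)

# Invented sample: one field with escaped quotes, one ending in an escaped backslash.
sample = r'"http://example.com/?q=\"hi\"" "hello \\"'
print(quoted_fields(sample))
```

The second field shows why the general form matters: the escaped backslash right before the closing quote does not prevent the field from ending there.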

Regex match backslash star

Can't work this one out, this matches a single star:
// Escaped multiply
Text = Text.replace(new RegExp("\\*", "g"), '[MULTIPLY]');
But I need it to match \*, I've tried:
\\*
\\\\*
\\\\\*
Can't work it out, thanks for any help!
You were close, \\\\\\* would have done it.
Better use verbatim strings, that makes it easier:
RegExp(@"\\\*", "g")
\\ matches a literal backslash (\\\\ in a normal string), \* matches an asterisk (\\* in a normal string).
Remember that there are two 'levels' of escaping.
First, you are escaping your strings for the C# compiler; second, you are escaping them for the regex engine.
If you want to match "\*" literally, then you need to escape both of these characters for the regex engine, since otherwise they mean something different. We escape these with backslashes, so you will have "\\\*".
Then, we have to escape the backslashes in order to write them as a literal string. This means replacing each backslash with two backslashes: "\\\\\\*".
Instead of this last part, we could use a "verbatim string", where no escapes are applied. In this case, you only need the result from the first escaping: @"\\\*".
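The two levels of escaping can be checked with a short sketch. It uses Python, where a raw string plays roughly the role of a C# verbatim string; the variable names are made up for the example.

```python
import re

normal = "\\\\\\*"  # every backslash doubled for the string literal
raw = r"\\\*"       # raw string: only the regex-level escaping remains

text = "3 \\* 4"    # the actual characters are: 3 \* 4

# Both literals produce the same four-character pattern: \\\*
print(normal == raw)
# The pattern matches a literal backslash followed by a literal asterisk.
print(re.sub(raw, "[MULTIPLY]", text))
```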
Your syntax is completely wrong. It looks more like JavaScript than C#.
This works fine:
string Text = "asdf*sadf";
Text = Regex.Replace(Text, "\\*", "[MULTIPLY]");
Console.WriteLine(Text);
Output:
asdf[MULTIPLY]sadf
To match \* you would use the pattern "\\\\\\*".