Regex: How to replace with the string literal "\1"?

I have a string, say r"a". I want to replace every r"a" with the string r"\1", but my regex engine does not understand this.
I have tried:
r"\1" -- crashes (can't match group 1 because there is no group 1)
r"\\1" -- crashes (not sure why)
Is this a limitation of my (proprietary) regex engine, or is it a general problem? Is there an elegant way of solving it? (I could e.g. replace "a" by "/1" and then StrReplace( "/", r"\" )... but that's not nice!)

The correct way would be to use r"\\1" as a replace string. So if your proprietary regex engine/language chokes on a \\, you should fix this bug.
If you look at your example, you don't need a regex engine at all. But perhaps the example is simpler than the actual requirement...
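As an illustration of the doubled-backslash approach, here is a minimal sketch using Python's re module. This is an assumption for demonstration only: the engine in the question is proprietary and may treat replacement strings differently. The sample text "banana" is hypothetical.

```python
import re

text = "banana"  # hypothetical sample input

# In a replacement template, "\\" stands for one literal backslash, so
# r"\\1" produces the two characters "\" and "1" rather than a group reference.
via_template = re.sub("a", r"\\1", text)

# A callable replacement is inserted verbatim, with no template processing.
via_function = re.sub("a", lambda m: r"\1", text)

# A fixed search string needs no regex at all.
via_str_replace = text.replace("a", r"\1")

print(via_template)  # b\1n\1n\1
```

All three variants give the same result here; the plain string replacement is the simplest when the search text is a literal.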

Related

Using "#" in a regular expression with VB.NET

Assuming I have to check whether "#" exists in a given string, should I put a backslash before it or not? So far both seem to work for me, but I'm not sure whether that will always hold on any Windows host (this is part of a VB.NET application that has to work worldwide).
The string: Hello #world
Pattern1: Hello #world
Pattern 2: Hello \#world
Which one should I use to get the most precise matching? pattern1 or pattern2?
I work with VB.NET on VS2010 (.NET FW 3.5)
Thank you
# is not a special regex character by default in .NET, so both patterns match the same text and you can use whichever you prefer. For readability's sake, you should probably stick to the pattern without the backslash. The one exception is RegexOptions.IgnorePatternWhitespace: with that option an unescaped # starts a comment, so escape it if you enable it.
You can find the complete list of special regex characters in .NET here.
I would suggest leaving this decision to the regex engine: just use its Regex.Escape function, which escapes whatever needs escaping.
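The same idea can be sketched in Python, whose re module behaves analogously for this case (an assumption used for illustration only; the question itself concerns .NET, where the counterpart calls are Regex.Escape and RegexOptions.IgnorePatternWhitespace). Here re.VERBOSE plays the role of the whitespace-ignoring option and re.escape the role of the escaping helper.

```python
import re

text = "Hello #world"

# Default mode: "#" is an ordinary character, escaped or not.
plain = re.search(r"Hello #world", text)
escaped = re.search(r"Hello \#world", text)

# Verbose mode: unescaped whitespace is ignored and "#" starts a comment,
# so this pattern is effectively just "Hello".
verbose_unescaped = re.fullmatch(r"Hello #world", "Hello", re.VERBOSE)

# Letting the library escape the literal keeps it correct in both modes.
safe_pattern = re.escape(text)
verbose_escaped = re.fullmatch(safe_pattern, text, re.VERBOSE)
```

Escaping through the library means the pattern keeps matching the literal text even if an option that changes the meaning of # is switched on later.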

Regex match first characters of string

I am trying to create a regex that will match the first 3 characters of a string,
If I have a string ABCFFFF I want to verify that the first 3 characters are ABC.
It's pretty straightforward, the pattern would be ^ABC
As others may point out, using regular expressions for such a task is overkill. Any programming language with regex support can do it better with simple string manipulation.
Just simple regex will work:
/^ABC/
But I am not sure this is a good use case for regex. Consider using a substring or prefix check in whatever language/platform you're using.
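For comparison, here is a small sketch of both options in Python (the question does not name a language, so this choice and the function names are assumptions):

```python
import re

def starts_with_abc_regex(s: str) -> bool:
    # "^" anchors the pattern at the start of the string.
    return re.search(r"^ABC", s) is not None

def starts_with_abc_plain(s: str) -> bool:
    # Plain prefix check, no regex engine involved.
    return s.startswith("ABC")

print(starts_with_abc_regex("ABCFFFF"), starts_with_abc_plain("ABCFFFF"))
```

Both functions agree on every input; the plain prefix check is usually the easier one to read.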
"^ABC" should work. '^' matches the start in most regex implementations.

RegExp extraction

Here's the input string:
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif)
With this RegExp: \'.+.xml\'
... we get this:
'mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml'
... but I want to extract only this:
http://www.something.com/videos/JohnsAwesomeCaption.xml
Any suggestions? I'm sure this problem has been asked before, but it's difficult to search for. I'll be happy to Accept a solution.
Thanks!
If you want to get everything within quotes that starts with http:
(?<=')http:[^']+(?=')
If you only want those ending with .xml
(?<=')http:[^']+\.xml(?=')
It doesn't select the quotation marks (as you asked)
It's fast!
Fair warning: it only works if the regex engine you're using can handle lookbehind
Knowing the language would be helpful. Basically, you are having a problem because the + quantifier is greedy, meaning it will match the largest part of the string that it can. You need to use a non-greedy quantifier, which will match as little as possible.
We will need to know which language you're using to know what the syntax for the non-greedy quantifier should be.
Here is a Perl recipe. As a side note, instead of .+, you probably want to match [^.]+.xml.
\'.+?.xml\'
should work if your language supports perl-like regexes.
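One caveat worth checking against the sample input: a regex match always begins at the leftmost position where a match is possible, so a lazy quantifier only shortens the right-hand end. The sketch below uses Python's re module (an assumption, since the question does not name a language) to compare the lazy pattern with one that forbids quotes inside the match. The dot before xml is escaped here so that it matches only a literal dot.

```python
import re

# Input string copied from the question (including its missing final quote).
call = ("loadMedia('mediacontainer1', "
        "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
        "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
        "'/videos/video-splash-image.gif)")

# Lazy quantifier: the match still starts at the first quote in the string.
lazy = re.search(r"'.+?\.xml'", call).group()

# Negated character class: the match cannot cross a quote, so it is
# confined to a single quoted argument.
bounded = re.search(r"'[^']+\.xml'", call).group()

print(lazy.startswith("'mediacontainer1'"))
print(bounded)
```

With this input the lazy version returns the same span as the greedy one, while the negated class isolates the quoted .xml URL.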
This should work (tested in JavaScript, but I'm pretty sure it would work in most engines):
'[^']+?\.xml'
it looks for these rules
starts with '
is followed by anything but '
ends in .xml'
You can demo it at http://RegExr.com?2tp6q
In .NET this regex works for me:
\'[\w:/.]+\.xml\'
breaking it down:
a ' character
followed by a word character or ':' or '/' or '.' any number of times (which matches the url bit)
followed by '.xml' (which differentiates the sought string from the other urls which it will match without this)
followed by another ' character
I tested it here
Edit
I missed that you don't want the quotes in the result. In that case, as has been pointed out, you need to use lookbehind and lookahead so that the quotes take part in the search but not in the result. Again in .NET:
(?<=')[\w:/.]+\.xml(?=')
but I think the best solution is a combination of those offered already:
(?<=')[^']+\.xml(?=')
which seems the simplest to read, at least to me.
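A short sketch of that combined pattern, again in Python as an assumed host language (its re module supports fixed-width lookbehind, which is all this pattern needs). A capture group is shown as an alternative for engines without lookbehind.

```python
import re

# Input string copied from the question (including its missing final quote).
call = ("loadMedia('mediacontainer1', "
        "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
        "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
        "'/videos/video-splash-image.gif)")

# Lookarounds require the surrounding quotes without consuming them.
with_lookaround = re.search(r"(?<=')[^']+\.xml(?=')", call).group()

# Alternative: match the quotes, but read only the captured group.
with_group = re.search(r"'([^']+\.xml)'", call).group(1)

print(with_lookaround)
```

Both approaches return the URL without its quotes; the capture-group form is the more portable of the two.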

Replacing char in a String with Regular Expression

I got a string like this:
PREFIX-('STRING WITH SPACES TO REPLACE')
and i need this:
PREFIX-('STRING_WITH_SPACES_TO_REPLACE')
I'm using Notepad++ for the regex search and replace, but I'm sure every other editor capable of regex replacements can do it too.
I'm using:
PREFIX-\('(.*)(\s)(.*)'\)
for search and
PREFIX-('\1_\3')
for replace
but that replaces only one space from the string.
The regex search feature in Notepad++ is very, very weak. The only way I can see to do this in NPP is to manually select the part of the text you want to work on, then do a standard find/replace with the In selection box checked.
Alternatively, you can run the document through an external script, or you can get a better editor. EditPad Pro has the best regex support I've ever seen in an editor. It's not free, but it's worth paying for. In EPP all I had to do was this:
search: ((?:PREFIX-\('|\G)[^\s']+)\s+
replace: $1_
EDIT: \G matches the position where the previous match ended, or the beginning of the input if there was no previous match. In other words, the first time you apply the regex, \G acts like \A. You can prevent that by adding a negative lookahead, like so:
((?:PREFIX-\('|(?!\A)\G)[^\s']+)\s+
If you want to prevent a match at the very beginning of the text no matter what it starts with, you can move the lookahead outside the group:
(?!\A)((?:PREFIX-\('|\G)[^\s']+)\s+
And, just in case you were wondering, a lookbehind will work just as well as a lookahead:
((?:PREFIX-\('|(?<!\A)\G)[^\s']+)\s+
You have to keep matching from the beginning of the string until you can match no more.
find /(PREFIX-\('[^\s']*)\s([^']*'\))/
replace $1_$2
like: while (s/(PREFIX-\('[^\s']*)\s([^']*'\))/$1_$2/) {}
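The repeat-until-stable idea can be sketched in Python as follows. The language and the function name are assumptions for illustration; the pattern is the one given above, and re.subn is used because it reports how many substitutions were made.

```python
import re

# Each pass replaces the first remaining whitespace character inside a
# PREFIX-('...') block; the second group keeps the rest of the block.
STEP = re.compile(r"(PREFIX-\('[^\s']*)\s([^']*'\))")

def underscore_by_repetition(text: str) -> str:
    while True:
        text, count = STEP.subn(r"\1_\2", text)
        if count == 0:  # nothing left to replace
            return text

print(underscore_by_repetition("PREFIX-('STRING WITH SPACES TO REPLACE')"))
```

Each pass handles one space per block, so the loop runs once per space in the longest block plus one final pass that finds nothing.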
How about using Replace all for about 20 times? Or until you're sure no string contains more spaces
Due to the nature of regex, it's not possible to do this in one step with an ordinary regular expression.
If I were in your place, I would do the replacement in several steps:
Find such patterns and mark them with a special character
(like replacing STRING WITH SPACES TO REPLACE with #STRING WITH SPACES TO REPLACE#).
Replace #([^#\s]*)\s with #\1_ several times.
Remove the markers!
I looked into the regex tool in Notepad++ a little, because I didn't know what it could do.
I concluded that it isn't powerful enough to do what you want.
You will need to learn and use a programming language with real regex capabilities. There are a number of them. Personally, I use Python. It would take one minute to do what you want with it.
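A minimal sketch of what such a Python script might look like, assuming the quoted text never contains an apostrophe (the function name is hypothetical). The regex only locates each PREFIX-('...') block; a replacement function then swaps the spaces inside it in a single pass.

```python
import re

# Capture the text between the quotes of each PREFIX-('...') block.
BLOCK = re.compile(r"PREFIX-\('([^']*)'\)")

def underscore_spaces(text: str) -> str:
    def fix(match: re.Match) -> str:
        inner = match.group(1).replace(" ", "_")
        return f"PREFIX-('{inner}')"
    return BLOCK.sub(fix, text)

print(underscore_spaces("PREFIX-('STRING WITH SPACES TO REPLACE')"))
```

Text outside the matched blocks is left untouched, so spaces elsewhere in the document survive.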
You'd have to run the replace several times for each space but this regex will work
/(?<=PREFIX-\(')([^\s]+)\s+/g
Replace with
\1_ or $1_
See it working at http://refiddle.com/10z

Boost wregex throwing exception, regex syntax wrong?

I have imported Boost library in to a .dll that I am using. I am trying to parse a string using:
boost::wregex regPlayerAtSeat(L"*Governor: Seat.?[1-9].*");
But all I get is an 'interop service' exception. Is the syntax of my regex wrong?
Thanks, R.
The first * doesn't appear to have any characters before it. In regex it acts as a quantifier, not a wildcard like in UNIX command lines and so forth. You probably want something like .* in its place, but that's partly just a guess. The full regex would then look like this:
boost::wregex regPlayerAtSeat(L".*Governor: Seat.?[1-9].*");
.* will match zero or more repetitions of (almost) any character (probably not newlines, but I don't know the inner workings of the boost regex engine). Is that what you were going for at the beginning of your string? Alternatively, since you haven't anchored your regex, you might be able to just use:
boost::wregex regPlayerAtSeat(L"Governor: Seat.?[1-9]");
This will depend on what exactly you're trying to match and what format it is in, however.
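The leading-quantifier problem is not specific to Boost. As a rough illustration in Python (an assumption: its re module stands in for boost::wregex here, and the sample line is hypothetical), a pattern that starts with * is rejected at compile time, while the unanchored version can be used with a search.

```python
import re

# A quantifier with nothing before it is a syntax error.
try:
    re.compile(r"*Governor: Seat.?[1-9].*")
    leading_star_error = None
except re.error as exc:
    leading_star_error = exc

# Without anchors, a search finds the text anywhere in the line,
# so the surrounding ".*" is not needed.
unanchored = re.compile(r"Governor: Seat.?[1-9]")
sample = "Table 4 Governor: Seat 3 is open"  # hypothetical input line
found = unanchored.search(sample)

print(leading_star_error is not None, found.group())
```

In Boost the corresponding distinction is between boost::regex_match, which must match the whole input, and boost::regex_search, which looks for the pattern anywhere in it.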