Find replace named groups regexp in Geany - regex

I am trying to change public methods to protected for methods that carry a marker comment.
This is because I am using phpunit to test some of those methods, but they really don't need to be public, so I'd like to switch them on the production server and switch back when testing.
Here is the method declaration:
public function extractFile($fileName){ //TODO: change to protected
This is the regexp:
(?<ws>^\s+)(?<pb>public)(?<fn>[^/\n]+)(?<cm>//TODO: change to protected)
If I replace it with:
\1protected\3\//TODO: change back to public for testing
It seems to be working, but what I cannot get to work is referring to the groups by name in the replacement text. I have to use \1 to get the first group. Why name the groups if you can't access them in the replacement text? I tried things like <ws> and $ws, but that doesn't work.
What is the replacing text if I want to replace \1 with the <ws> named group?

The ?<ws> named group syntax is the same as that used by .NET/Perl. For those regex engines the replacement string reference for the named group is ${ws}. This means your replacement string would be:
${ws}protected${fn}\//TODO: change back to public for testing
The \k<ws> reference mentioned by m.buettner is only used for backreferences in the actual regex.
Extra Information:
It seems like Geany also allows use of Python style named groups:
?P<ws> is the capturing syntax
\g<ws> is the replacement string syntax
(?P=ws) is the regex backreference syntax
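For comparison, here is how the Python-style syntax behaves in Python's own re module. This is only a sketch of the syntax itself: it runs in Python, not in Geany, so it says nothing about what Geany's replace field accepts. The sample line is the method declaration from the question.

```python
import re

# Python-style named groups: (?P<name>...) in the pattern,
# \g<name> in the replacement string.
pattern = re.compile(
    r"(?P<ws>^\s+)(?P<pb>public)(?P<fn>[^/\n]+)(?P<cm>//TODO: change to protected)",
    re.MULTILINE,
)

line = "    public function extractFile($fileName){ //TODO: change to protected"
replacement = r"\g<ws>protected\g<fn>//TODO: change back to public for testing"

result = pattern.sub(replacement, line)
print(result)
```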
EDIT:
It looks like my hope for a solution didn't pan out. From the manual,
A subpattern can be named in one of three ways: (?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing parentheses from other parts of the pattern, such as backreferences, recursion, and conditions, can be made by name as well as by number.
And further down:
Back references to named subpatterns use the Perl syntax \k<name> or \k'name' or the Python syntax (?P=name).
and
A subpattern that is referenced by name may appear in the pattern before or after the reference.
So, my inference of the syntax for using named groups was correct. Unfortunately, they can only be used in the matching pattern. That answers your question "Why name groups...?".
How stupid is this? If you go to all the trouble to implement named groups and their usage in the matching pattern, why not also implement usage in the replacement string?

Related

Regex Jersey Rest Service

I have the following regex in jersey, that works:
/artist_{artistUID: [1-9][0-9]*}
however, if i do
/{artistUID: [artist_][1-9][0-9]*}
it does not. I do not understand how the regexes are built, and I cannot find any good documentation for it. What I want to do is something like this:
/{artistUID: ([uartist_]|[artist_])[1-9][0-9]*}
to recognize terms like "artist_123" and "uartist_123" and store them in the artistUID value.
You can use an alternation group ((...|...)) rather than a character class [...], which matches exactly one of the characters listed inside it.
Use
/{artistUID: (uartist|artist)_[1-9][0-9]*}
Or to make it shorter, use a ? quantifier after u to make it optional:
/{artistUID: u?artist_[1-9][0-9]*}
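The difference is easy to check outside Jersey. The sketch below uses Python's re module on just the regex part of the path template (the part after "artistUID:"), assuming the whole path segment has to match; the helper name matches is made up for the illustration.

```python
import re

# Character class: [artist_] matches ONE character out of a, r, t, i, s, _
char_class = re.compile(r"[artist_][1-9][0-9]*")
# Alternation group: matches the whole word "uartist" or "artist"
alternation = re.compile(r"(uartist|artist)_[1-9][0-9]*")
# Optional leading "u"
optional_u = re.compile(r"u?artist_[1-9][0-9]*")

def matches(pattern, text):
    """Hypothetical helper: True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

for text in ("artist_123", "uartist_123", "a1"):
    print(text, matches(char_class, text), matches(alternation, text), matches(optional_u, text))
```

The character-class version accepts "a1" but rejects "artist_123", which is why the second route in the question never matched.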

What is the purpose of this non-capturing group?

I am trying to understand the inner pipings of express.js, but I'm having a little trouble on one thing.
If you add a new route, like such:
app.get("/hello/darkness/myold/:name", ...)
The string I provided internally becomes a regular expression. Now, I worked out what I thought the regex should be internally, and I came up with:
^\/hello\/darkness\/myold\/([^\/]+?)\/?$
The ([^\/]+?) will capture the name parameter, \/? is present if strict routing is disabled, and the whole thing is encapsulated in ^...$. However, when I went and looked what is actually stored inside express, it's actually this:
^\/hello\/darkness\/myold\/(?:([^\/]+?))\/?$
As you can see, there is a non-capturing group around the capturing group. My question is: what is the purpose of this non-capturing group?
The method I used to see what regex express.js was using internally was simply to make an invalid regex and view the error console:
app.get('/hello/darkness/myold/:friend/[', function(req, res){});
yields
SyntaxError: Invalid regular expression: ^\/hello\/darkness\/myold\/(?:([^\/]+?))\/[\/?$
The answer to this question is that the non-capturing group is a relic of the case where a parameter is optional. Consider the difference between the following two routes:
/hello/:world/goodbye
/hello/:world?/goodbye
They will generate, respectively:
^\/hello\/(?:([^\/]+?))\/goodbye\/?$
^\/hello(?:\/([^\/]+?))?\/goodbye\/?$
Note the important but subtle change that happens to the non-capturing group when an optional parameter is present.
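To see the effect of that change, the two generated expressions can be tried directly. This is a small sketch using Python's re module rather than express itself; the patterns are copied from above and the test paths are made up.

```python
import re

# Regex generated for /hello/:world/goodbye (parameter required)
required = re.compile(r"^\/hello\/(?:([^\/]+?))\/goodbye\/?$")
# Regex generated for /hello/:world?/goodbye (parameter optional):
# the leading slash moves INSIDE the non-capturing group, and the
# whole group becomes optional.
optional = re.compile(r"^\/hello(?:\/([^\/]+?))?\/goodbye\/?$")

with_param = "/hello/world/goodbye"
without_param = "/hello/goodbye"

print(required.match(with_param).group(1))     # parameter captured
print(required.match(without_param))           # no match
print(optional.match(with_param).group(1))     # parameter captured
print(optional.match(without_param).group(1))  # matches, parameter absent
```

The non-capturing group lets the slash and the parameter be made optional together, while the capture group numbering for the parameter itself stays the same in both cases.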

Regex find/replace

I am attempting to do some find and replace on a java source file.
Currently my classes have invalid names (imported from a tool that did poor auto naming) of the form:
public class [0-9]{2}[A-Za-z]+
I would like to insert underscores around the digits, resulting in a valid class name of the form
public class [_][0-9]{2}[_][A-Za-z]+
However, using Eclipse's find and replace tool with the regex box checked, putting these patterns in both the find and replace fields does not format the output as I'd like.
It takes
02ListOfValidAppIDs
and makes it
[-][0-9]{2}[_][A-Za-z]+
instead of
_02_ListOfValidAppIDs
How can you make the regex keep the arbitrary number and text and just plug them in for the replace string?
(Edit: As a note, with the preview feature I can see that eclipse is correctly finding all of the names I wish to replace, and nothing else)
I'm not sure of the exact flavor of Regex that you'll need, but something like this should get you started in the right direction.
Update your "Find" pattern to use capture groups:
(public class )([0-9]{2})([A-Za-z]+)
And then reference those captures in the replacement:
\1_\2_\3
NOTE: Some flavors of Regex will use {} instead of () to represent a captured group, and some flavors will use $1, $2, $3, etc. as the reference instead of using \#.
In Eclipse's Find/Replace dialogue, this works fine if you use
([0-9]+)([A-Za-z]+)
in the Find box and
_$1_$2
in the Replace with box.
Of course, the Regular expressions box must be checked too.
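The same capture-and-reference idea can be tried outside the editor. Below is a short sketch in Python's re module, which uses the \1 style of reference in replacement strings (Eclipse uses $1, as noted above). The sample class name is the one from the question.

```python
import re

# Three capture groups: the keyword prefix, the two digits, the rest of the name
find = re.compile(r"(public class )([0-9]{2})([A-Za-z]+)")
# Reference the captures by number and insert underscores around the digits
replace_with = r"\1_\2_\3"

source = "public class 02ListOfValidAppIDs"
renamed = find.sub(replace_with, source)
print(renamed)
```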

Regex capturing named groups in a language that doesn't support them using a meta regex?

I am using Haskell and I can't seem to find a regex package that supports named groups, so I have to implement them somehow myself.
Basically, a user of my API would supply a regex with named groups and get the captured groups back in a map,
so
/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj on /foo/hhhh/bar/jjj
would give
[("name","foo"),("surname","bar")]
I am writing a trivial specification implementation that works on relatively small strings, so for now performance is not a main issue.
To solve this, I thought I'd write a meta regex that will apply on the user's regex
/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj
to extract the names of groups and replace them with nothing to get
0 -> name
1 -> surname
and the regex becomes
/([a-z]*)/hhhh/([a-z]*)/jjj
then apply it to the string and use the indices to pair the group names with the matched text.
Two questions:
does it seem like a good idea?
what is the meta regex that I need to capture and replace the named groups syntax
for those unfamiliar with named groups http://www.regular-expressions.info/named.html
note: all I need from named groups is that the user can give names to matches, so a subset of named groups that only gives me this is ok.
The more generally you want to apply your solution, the more complex your problem becomes. For instance, in your approach, you want to remove the named groups and use the indexes (indices?) to match. This seems like a good start, but you have to consider a few things:
If you replace the (?<name>blah) with (blah) then you also have to replace the /name with /1 or /2 or whatever.
What happens if the user includes unnamed groups as well? For example: ([a-z]{3})/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj on /foo/hhhh/bar/jjj. In this case, your numbering will not work because group 1 is the user-defined unnamed group.
See this post for some inspiration, as it seems others have successfully tried the same (albeit in Java)
Regex Named Groups in Java
Perhaps you should use parser combinators. This looks sufficiently complicated that it would be cleaner and more maintainable to step out and use Parsec or Attoparsec, instead of trying to push regexes further towards parsing.
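As a rough illustration of the meta-regex idea and the numbering pitfall, here is a minimal sketch. It is written in Python only to show the logic (Python already supports named groups, so nobody would need this there); the function names strip_named_groups and match_named are invented for the example. It assumes the pattern contains no parentheses inside character classes and no named backreferences, both of which a real implementation would have to handle.

```python
import re

# Meta regex: an escaped character (left alone), or an opening parenthesis
# optionally followed by "?P<name>" (named group) or another "?" construct.
_TOKEN = re.compile(r"\\.|\((?:\?P<(?P<name>\w+)>|(?P<special>\?))?")

def strip_named_groups(pattern):
    """Return (pattern without names, {1-based group index: name})."""
    names = {}
    index = 0

    def repl(m):
        nonlocal index
        text = m.group(0)
        if text.startswith("\\"):
            return text            # escaped character such as \( stays as-is
        if m.group("special"):
            return text            # (?: ...) and similar do not capture
        index += 1                 # every capturing group gets a number
        if m.group("name"):
            names[index] = m.group("name")
        return "("                 # drop the ?P<name> part, keep the group

    return _TOKEN.sub(repl, pattern), names

def match_named(pattern, text):
    """Match text against pattern and pair group names with captures."""
    plain, names = strip_named_groups(pattern)
    m = re.fullmatch(plain, text)
    if m is None:
        return None
    return [(name, m.group(i)) for i, name in sorted(names.items())]

print(match_named("/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj", "/foo/hhhh/bar/jjj"))
```

Counting every capturing parenthesis, named or not, is what keeps the index-to-name map correct when the user mixes in unnamed groups.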

LogStash Grok regex backreferences

I'm really hoping I'm doing something silly and just can't see the problem... this would be trivial in Perl or other languages. Apparently backreferences are supported in grok https://grokconstructor.appspot.com/RegularExpressionSyntax.txt, but I can't make them work. I need to match on something basic:
identifier - Static Text identifier Rest Of Line
So my grok expression would be something like:
%{DATA:id_name} - Static Text \1 %{GREEDYDATA:rest_of_line}
But using http://grokdebug.herokuapp.com/ always produces a compile error. If I use any of the \k notation, same thing. I've tried wrapping the first variable in parentheses, double backslashes, random permutations, can't make it work.
Any help would be much appreciated. Thanks!
I don't think that the %{DATA:id_name} produces a named capture that you can use with custom regex back references. Instead, you could wrap %{DATA} in a named capture and then back reference to it, like so:
(?<id_name>%{DATA}) - Static Text \k<id_name> %{GREEDYDATA:rest_of_line}
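The underlying regex behaviour can be checked without Logstash. The sketch below uses Python's re module, so the named backreference is written (?P=id_name) instead of \k<id_name>, and the grok patterns are expanded by hand on the assumption that DATA is .*? and GREEDYDATA is .* (their usual definitions). The sample log lines are made up.

```python
import re

# Hand-expanded equivalent of:
#   (?<id_name>%{DATA}) - Static Text \k<id_name> %{GREEDYDATA:rest_of_line}
pattern = re.compile(
    r"(?P<id_name>.*?) - Static Text (?P=id_name) (?P<rest_of_line>.*)"
)

same_id = pattern.match("foo - Static Text foo the rest of the line")
different_id = pattern.match("foo - Static Text bar the rest of the line")

print(same_id.groupdict())
print(different_id)  # None: the second identifier differs from the first
```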