Regex replacement character only if group matches - regex

In a sed/egrep-style regular expression is it possible to print a character in the replacement string only if one of the groups matched?
For example, suppose I have an expression such as:
/^func([ \t]+\([^)]+\))?[ \t]+([a-zA-Z0-9_]+)/\1.\2/
Is it possible to print the period in the replacement only if the group \1 matched?
Specifically I'm trying to write an expression for the --regex-<LANG> option as described in http://ctags.sourceforge.net/ctags.html

The only thing I can think of is two replace commands:
/^func[ \t]+([a-zA-Z0-9_]+)/\1/
/^func([ \t]+\([^)]+\))[ \t]+([a-zA-Z0-9_]+)/\1.\2/
The documentation of ctags suggests that this is supported by simply specifying two --regex-<LANG> options:
The regular expression defined by this option is added to the current list of regular expressions for the specified language unless the parameter is omitted, in which case the current list is cleared.
In Perl, you can call an arbitrary function on the group matches, but that doesn't help here.
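For engines that do accept a replacement function, the conditional period can be sketched as follows. This is a minimal illustration in Python's re module, not something ctags itself supports; the helper name tag_name is hypothetical, and stripping the receiver's leading whitespace is an added assumption.

```python
import re

# The pattern from the question: group 1 is the optional receiver,
# group 2 is the function name.
PATTERN = re.compile(r'^func([ \t]+\([^)]+\))?[ \t]+([a-zA-Z0-9_]+)')

def tag_name(line):
    """Emit 'receiver.name' only when group 1 took part in the match."""
    def replace(m):
        receiver = m.group(1)  # None when the optional group did not match
        if receiver is None:
            return m.group(2)
        # Assumption: drop the whitespace captured before the receiver.
        return receiver.strip() + '.' + m.group(2)
    return PATTERN.sub(replace, line)
```

The period is produced inside the function, so it appears only when the optional group actually participated in the match.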

Related

A regular expression that replaces a group with hard coded text

First of all, I'm not sure whether this is something you can even do in regular expressions. If it is, I have no idea how to search for it.
Let's say I have text:
Click <a href="...">this link</a> for more information.
And a regular expression:
<a[^>]*>([^<]*)</a>
The application of the regular expression would yield this for group 1:
this link
Let's say I wanted to write the regular expression to instead return hard coded text for group 1
<a[^>]*>(${{replacement text}}[^<]*)</a>
(this is made up syntax by the way)
So that the application of the regular expression to the text would yield this for group 1:
replacement text
Is this possible?
Here's another example just to solidify my objective:
Examples of text:
serverNode1/appPortal
serverNode1/appPortal2
serverNode1/appPortal3
My regular expression
appPortal((?:?{{"1"}}\b)|(?:\d))
(using the same made up syntax)
The expected output for the first character group should be
1
2
3
(The point of the expression is to match the word break and replace it with "1", or otherwise use the digit character class to match a digit. The sub-groups are made non-capturing with ?:, so the outer group is still group 1.)
What is the point of this you may ask? I am using Splunk to do field extractions, and I'd like for the field to be extracted as 1, 2, or 3, like in my above example, and I can only rely on the regular expression groups to give me the fields (as in, I don't have anywhere to put code to say if group 1 == "" then change to "1").
As regular expressions are defined, this is not possible. By definition, a regular expression matches patterns in the text, and a regex engine returns matches that are always part of the original string, nothing more. Some regex extensions let you name a capturing group, but naming does not transform the match.
The behaviour you describe is easy to achieve by post-processing the regex match in any programming language, but it can also be achieved by combining a regex substitution with a second match.
For example, s/appPortal(?!\d)/appPortal1/ replaces every "appPortal" that has no digit after it with "appPortal1"; you can then apply another regex to build the match you want.
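The substitute-then-match idea can be sketched in Python as below. This only illustrates the two steps; whether Splunk can run a substitution before field extraction is not something this sketch assumes, and the function name portal_number is hypothetical.

```python
import re

def portal_number(text):
    """Return the portal digit, treating a bare 'appPortal' as portal 1."""
    # Step 1: append "1" wherever "appPortal" is not followed by a digit.
    normalized = re.sub(r'appPortal(?!\d)', 'appPortal1', text)
    # Step 2: a plain capture group now always has a digit to match.
    m = re.search(r'appPortal(\d)', normalized)
    return m.group(1) if m else None
```

Applied to the three sample lines from the question, group 1 of the second regex yields 1, 2 and 3 respectively.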

Regular expressions middle of string

How can I get part of a SIP URI?
For example, I have the URI sip:username#sip.somedomain.com and need just username. I use the expression [^sip:](.*)[$#]+, but the result is username#. How can I exclude # from the match?
This should do the job:
(?<=^sip:)(.*)(?=[$#])
Use a lookahead instead of actually matching #:
^sip:(.*?)(?=#|\$)
Either you are using a very strange regex flavor, or your starting character class is a mistake. [^sip:] matches a single character that isn't any of s,i,p or :. I am also not certain what the $ character is for, since that isn't a part of SIP syntax.
If lookaheads are not available in your regex flavour (for instance POSIX regexes lack them), you can still match parts of the string in your regex you don't eventually want to return, if you use capture groups and only grab the contents of some of them.
For example
^sip:(.*?)[$#]+ Then only return the contents of the first capture group
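Both variants can be compared side by side in a short Python sketch. The function names are hypothetical, and the sample URI is the one given in the question.

```python
import re

URI = 'sip:username#sip.somedomain.com'  # sample URI from the question

def user_by_capture(uri):
    # The delimiter is consumed by the match, but only group 1 is returned.
    m = re.search(r'^sip:(.*?)[$#]+', uri)
    return m.group(1) if m else None

def user_by_lookahead(uri):
    # The lookahead checks for the delimiter without consuming it.
    m = re.search(r'^sip:(.*?)(?=#|\$)', uri)
    return m.group(1) if m else None
```

In both cases group 1 holds only the user part. The difference shows up in the whole match, which includes the delimiter in the capture-group version and excludes it in the lookahead version.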

(vim) regex: masking text with help of pattern

Am I correct in understanding that the definition
:range s[ubstitute]/pattern/string/cgiI
suggests that only literal strings can be used in the string part, that is, patterns are not allowed there? What I would like to do is replace any N symbols at position M with N copies of X, so I would have liked to use something like this:
:%s/^\(.\{10}\).\{28}/\1X\{28}/g
Which does not work because \{28} is interpreted literally.
Is writing the 28 XXXXX...X in the replace part the only possibility?
You can use expressions in the replacement part via \=. You have to access the match via submatch(), and join it together with the static string, which you can generate via repeat():
:%s/^\(.\{10}\).\{28}/\=submatch(1) . repeat('X',28)/g
Of the regex constructs, only back-references such as \1 \2 \3 (and & for the whole match) are allowed in the replacement part. The repeating construct \{28} is not valid there, though it's a clever idea. Written literally, you'll have to use 28 X's.
An alternative is to use an expression in the replacement part:
:%s/^\(.\{10}\).\{28}/\=submatch(1).repeat("X",28)/g
The first matched group is obtained with submatch(1). For more information see :h sub-replace-expression.
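For comparison, the same masking can be sketched outside Vim with a replacement function. This is a Python analogue of the \= approach, not Vim syntax; the function name mask and its parameters are hypothetical.

```python
import re

def mask(line, keep=10, width=28, fill='X'):
    """Keep the first `keep` characters, then overwrite the next `width` with `fill`."""
    pattern = r'^(.{%d}).{%d}' % (keep, width)
    # m.group(1) plays the role of submatch(1); fill * width plays repeat('X', 28).
    return re.sub(pattern, lambda m: m.group(1) + fill * width, line)
```

Lines shorter than keep + width characters do not match the pattern and are returned unchanged.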

TR1 regex: capture groups?

I am using TR1 Regular Expressions (for VS2010) and what I'm trying to do is search for specific pattern for a group called "name", and another pattern for a group called "value". I think what I want is called a capture group, but I'm not sure if that's the right terminology. I want to assign matches to the pattern "[^:\r\n]+):\s" to a list of matches called "name", and matches of the pattern "[^\r\n]+)\r\n)+" to a list of matches called "value".
The regex pattern I have so far is
string pattern = "((?<name>[^:\r\n]+):\s(?<value>[^\r\n]+)\r\n)+";
But the TR1 regex header keeps throwing an exception when the program runs. What's wrong with the syntax of the pattern I have? Can someone show an example pattern that would do what I'm trying to accomplish?
Also, how would it be possible to include a substring within the pattern to match, but not actually include that substring in the results? For example, I want to match all strings of the pattern
"http://[[:alpha:]]\r\n"
, but I don't want to include the substring "http://" in the returned results of matches.
The C++ TR1 and C++11 regular expression grammars don't support named capture groups. You'll have to do unnamed capture groups.
Also, make sure you don't run into escaping issues. You'll have to escape some characters twice: once for being in a C++ string, and once for being in a regex. The pattern (([^:\r\n]+):\s\s([^\r\n]+)\r\n)+ can be written as a C++ string literal like this:
"(([^:\\r\\n]+):\\s\\s([^\\r\\n]+)\\r\\n)+"
// or, in C++11, as a raw string literal
R"xxx((([^:\r\n]+):\s\s([^\r\n]+)\r\n)+)xxx"
Lookbehinds are not supported either. You'll have to work around this limitation by using capture groups: use the pattern (http://)([[:alpha:]]\r\n) and grab only the second capture group.

Extract and use a part of string with a regex in GVIM

I've got a string:
doCall(valA, val.valB);
Using a regex in GVIM I would like to change this to:
valA = doCall(valA, val.valB);
How would I go about doing this? I use %s for basic regex search and replace in GVIM, but this is a bit different from my normal usage.
Thanks
You can use this:
%s/\vdoCall\(<(\w*)>,/\1 = doCall(\1,/
\v enables “more magic” in regular expressions – not strictly necessary here but I usually use it to make the expressions simpler. <…> matches word boundaries and the in-between part matches the first parameter and puts it in the first capture group. The replacement uses \1 to access that capture group and insert into the right two places.
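The same back-reference technique, sketched in Python for comparison. This is an analogue rather than Vim syntax: \b stands in for Vim's < and > word boundaries, \w+ is used instead of \w*, and the function name is hypothetical.

```python
import re

def assign_first_arg(line):
    # (\w+) captures the first argument; \1 inserts it twice in the replacement.
    return re.sub(r'\bdoCall\((\w+)\b,', r'\1 = doCall(\1,', line)
```

The rest of the line after the first comma is untouched, so the remaining arguments carry over as they are.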