Vim: Get content of syntax element under cursor - regex

I'm on a highlighted complex syntax element and would like to get its content. Can you think of any way to do this?
Maybe there's some way to search for a regular expression so that it contains the cursor?
Okay, example. The cursor is inside a string, and I want to get the text, the content of this syntactic element. Consider the following line:
String myString = "Foobar, [CURSOR]cool \"string\""; // And some other "rubbish"
I want to write a function that returns
"Foobar, cool \"string\""

If I understood the question correctly, this may help. I found this gem some time ago (I don't remember where), and I used it to understand how syntax highlighting works in Vim:
" Show syntax highlighting groups for word under cursor
nmap <leader>z :call <SID>SynStack()<CR>
function! <SID>SynStack()
    " Bail out on Vim builds that lack synstack()
    if !exists("*synstack")
        return
    endif
    echo map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")')
endfunction

The textobj-syntax plugin might help. It creates a custom text object, so that you can run viy to visually select the current syntax highlighted element. The plugin depends on the textobj-user plugin, which is a framework for creating custom text objects.

This is a good use for text objects (:help text-objects). To get the content you're looking for (Foobar, cool \"string\"), you can just do:
yi"
y = yank
i" = the text object "inner quoted string"
The yank command by default uses the unnamed register ("", see :help registers), so you can access the yanked contents programmatically using the getreg() function or the shorthand @{register-name}:
:echo 'String last yanked was:' getreg('"')
:echo 'String last yanked was:' @"
Or you can yank the contents into a different register:
"qyi"
yanks the inner quoted string into the "q register, so it doesn't conflict with standard register usage (and can be accessed as the @q variable).
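The question also asks whether a pattern can be matched so that it contains the cursor. Outside Vim, that idea can be sketched in Python: scan the line for string literals and keep the one whose span covers the cursor column. The name `string_at` and the 0-based column convention are assumptions made for this illustration, and the pattern only handles double-quoted strings with backslash escapes.

```python
import re

# A double-quoted literal: a run of escaped characters, or characters
# that are neither a quote nor a backslash, between two quotes.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')

def string_at(line, col):
    """Return the string literal covering 0-based column `col`, or None."""
    for m in STRING_LITERAL.finditer(line):
        if m.start() <= col < m.end():
            return m.group(0)
    return None

line = r'String myString = "Foobar, cool \"string\""; // And some other "rubbish"'
# Pretend the cursor sits on the word "cool".
print(string_at(line, line.index("cool")))
```

Unlike the syntax-based approaches, this knows nothing about comments, so the "rubbish" literal in the trailing comment is also treated as a string.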

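EDIT: Seeing that the plugin mentioned by nelstrom works similarly to my original approach, I settled on this slightly more elegant solution:
function! s:OnLink()
    let stack = synstack(line("."), col("."))
    return !empty(stack)
endfunction
" Remember the cursor position (mark c) and the line length.
normal mc
normal $
let lineLength = col(".")
normal `c
" Walk left until the syntax element ends, then set mark a at its start.
while col(".") > 1
    normal h
    if !s:OnLink()
        normal l
        break
    endif
endwhile
normal ma`c
" Walk right until the syntax element ends, then set mark b at its end.
while col(".") < lineLength
    normal l
    if !s:OnLink()
        normal h
        break
    endif
endwhile
" Select from mark a to mark b and yank.
normal mb`av`by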


RegEx to format Wikipedia's infoboxes code [SOLVED]

I am a contributor to Wikipedia and I would like to make a script with AutoHotKey that could format the wikicode of infoboxes and other similar templates.
Infoboxes are templates that display a box on the side of articles and show the values of the parameters entered (they are numerous, and they differ in number, length and type of characters used depending on the infobox).
Parameters are always preceded by a pipe (|) and end with an equal sign (=). On rare occasions, multiple parameters can be put on the same line, but I can sort this manually before running the script.
A typical infobox will be like this:
{{Infobox XYZ
 | first parameter  = foo
 | second_parameter =
 | 3rd parameter    = bar
 | 4th              = bazzzzz
 | 5th              =
 | etc.             =
But sometimes, (lazy) contributors put them like this:
{{Infobox XYZ
|first parameter=foo
|3rd parameter=bar
Which isn't very easy to read and modify.
I would like to know if it is possible to make a regex (or a series of regexes) that would transform the second example into the first.
The lines should start with a space, then a pipe, then another space, then the parameter name, then any number of spaces (to match the length of the other lines), then an equal sign, then another space, and if present, the parameter value.
I tried some things using multiple capturing groups, but I'm going nowhere... (I'm even ashamed to show my attempts as they really don't work).
Would someone have an idea on how to make it work?
Thank you for your time.
The lines should start with a space, then a pipe, then another space, then the parameter name, then a space, then an equal sign, then another space, and if present, the parameter value.
First the selection, it's relatively trivial:
Then the replacement, literally your description of what you want (note the space at the beginning):
 | $1 = $2
See it in action here.
The best code I have found so far is the following:
The problem is it doesn't align the equal signs vertically...
I got an answer on AutoHotKey forums:
^i::
out := ""
Send, ^x
regex := "O)\s*\|\s*(.*?)\s*=\s*(.*)", width := 1
; First pass: find the longest parameter name.
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        width := Max(width, StrLen(_[1]))
; Second pass: rebuild each parameter line, padded to that width.
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        out .= Format(" | {:-" width "} = {2}", _[1],_[2]) "`n"
    Else
        out .= A_LoopField "`n"
Clipboard := out
Send, ^v
Return
With this script, pressing Ctrl+i formats the infobox code just right (I guess a simple regex isn't enough to do the job).
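The same two-pass idea (measure the longest parameter name first, then pad every name to that width) can be sketched in Python for readers who do not use AutoHotKey. This is a minimal illustration, not the forum's code: the function name `align_infobox` is hypothetical, and it strips the trailing space after `=` on empty parameters, which the AutoHotKey version keeps.

```python
import re

# Same shape as the AutoHotKey pattern: pipe, name, equal sign, value.
PARAM = re.compile(r"^\s*\|\s*(.*?)\s*=\s*(.*)$")

def align_infobox(text):
    """Rewrite '|name=value' lines as ' | name   = value' with aligned '='."""
    lines = text.splitlines()
    matches = [PARAM.match(line) for line in lines]
    # Pass 1: the widest parameter name decides the padding.
    width = max((len(m.group(1)) for m in matches if m), default=1)
    out = []
    # Pass 2: rebuild parameter lines, leave everything else untouched.
    for line, m in zip(lines, matches):
        if m:
            name, value = m.group(1), m.group(2)
            out.append(f" | {name:<{width}} = {value}".rstrip())
        else:
            out.append(line)
    return "\n".join(out)

messy = "{{Infobox XYZ\n|first parameter=foo\n|second_parameter=\n|4th=bazzzzz"
print(align_infobox(messy))
```

A single regex substitution cannot do this on its own because the padding depends on every line in the block, which is why both versions need a measuring pass before the rewriting pass.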

Regex to Capture and wrap outline formatted text

I have source text that is not particularly clean or well formed but I have a need to find text and wrap a line in a tag. The text is in outline format.
1. becomes a <h1> tag
A. becomes a <h2> tag
(1) becomes a <h3> tag
and so on...
Here are some examples of the source.
1. PREPARE FOR TEST A. Open the door. B. Turn on the light.
The desired result would be
<h1>1. PREPARE FOR TEST</h1>
<h2>A. Open the door.</h2>
<h2>B. Turn on the light.</h2>
Unfortunately, the text could be on the same line, or it could be on multiple lines, or even have a different number of spaces between the outline number and the text. Another example:
(1) Check air inlet and air outlet valves are shown open if OAT is above > 53.6 deg F., or closed if OAT is below
48.2 deg F.
In this case the desired result would be
<h3>(1) Check skin air inlet and skin air outlet valves are shown open if temperature is above 53.6 deg F., or closed if temperature is below 48.2 deg F.</h3>
My questions are
How do I find an entire line of text that is associated with an outline level, i.e., the 1., A., (1) and so on?
How do I then wrap that text with the appropriate tag?
I'm not particularly strong at regex, I have been able to do some of the simpler things required of this project but this has me stumped a bit. Here's what I used to try to find the H1 lines, but as anyone that knows regex can plainly see, this won't work past the first word.
I'm using Python at the moment, but I'm better with PHP than with Python and can move to PHP if needed.
Thank you.
Since every regex needs a different substitution, you need to apply each regex in turn. Assuming that you want the match to always span an entire line, I'd suggest something like this:
import re
s = """1. becomes a h1 tag
A. becomes a h2 tag
(1) becomes a h3 tag
and so on..."""
# Each key matches only the outline marker; the value is the tag to wrap with.
regexes = {r"\d+\.": "h1",
           r"[A-Z]+\.": "h2",
           r"\(\d+\)": "h3",
          }
for regex in regexes:
    repl = regexes[regex]
    s = re.sub("(?m)^" + regex + ".*", "<" + repl + ">" + r"\g<0>" + "</" + repl + ">", s)
print(s)
Output:
<h1>1. becomes a h1 tag</h1>
<h2>A. becomes a h2 tag</h2>
<h3>(1) becomes a h3 tag</h3>
and so on...
Each of the regexes (which only match the actual identifiers) is modified to match from the start of the line until the end of the line:
"(?m)^" + regex + ".*" # (?m) allows ^ to match at the start of lines
The entire match is contained in group 0 which can be accessed in the replacement string via \g<0>.
"<" + repl + ">" + r"\g<0>" + "</" + repl + ">" # add tags around line
For future reference and to close this, what I eventually came up with was to run through the entire string of text and remove some trash first. There are actually 15 of these that I use for this step.
$regexes['lf'] = "/[\n\r]*/";
$regexes['tab-cr-lf'] = "/\t[\r\n]/";
$string = preg_replace($regexes, "", $string);
I then discovered that I could count on a space and \t after each header identifier, so I then run some more regexes on the string:
$regexes['step1'] = "/(\d{1,2}\..\t)/";
$regexes['step2'] = "/([A-Z]\. \t)/";
$replacements['step1'] = "\n\n<step1>$0";
$replacements['step2'] = "\n\n<step2>$0";
$string = preg_replace($this->headerRegexes, $replacements, $string);
These steps have given me some usable text that I can work with.
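For comparison with the Python answer earlier in the thread, the header-marking step can be sketched in Python as well. This is only an illustration of the two PHP patterns shown here: the function name `mark_headers` and the sample input (with a space and a tab after each identifier) are assumptions, and the earlier trash-removal regexes are not reproduced.

```python
import re

# (pattern, marker) pairs mirroring the PHP 'step1' and 'step2' entries.
HEADER_STEPS = [
    (r"\d{1,2}\..\t", "<step1>"),  # e.g. "1." + any character + tab
    (r"[A-Z]\. \t", "<step2>"),    # e.g. "A." + space + tab
]

def mark_headers(text):
    """Put a blank line and a marker in front of every header identifier."""
    for pattern, marker in HEADER_STEPS:
        text = re.sub(
            pattern,
            lambda m, marker=marker: "\n\n" + marker + m.group(0),
            text,
        )
    return text

sample = "1. \tPREPARE FOR TEST A. \tOpen the door. B. \tTurn on the light."
print(mark_headers(sample))
```

Because the markers are inserted wherever the identifier occurs, this also splits headers that share one line, which the line-anchored approach does not attempt.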
Thanks to everyone that chimed in, it gave me some things to think about as I tackled this problem.

Have Tabularize ignore some lines and align the others

I would like Tabularize to ignore lines that do not contain a particular character and align the remaining lines:
text1_temp = text_temp;
text2 = text_temp;
In the end I would like the following:
text1_temp = text_temp;
text2      = text_temp;
// The 2nd "=" is spaced/tabbed with relation to the first "="
If I run ":Tabularize /=" for the 3 lines together I get:
text1_temp = text_temp;
text2 = text_temp;
Where the two lines with "=" are aligned with respect to the length of the middle line
Any suggestions?
PS: I edited the post to explain the need better.
I am not sure how to do this with Tabular directly. You might be able to use Christian Brabandt's NrrwRgn plugin to filter out only the lines with = using :NRP, then run :NRM. This gives you a new buffer containing only the lines with =, so you can run :Tabularize /=/ and then save the buffer (:w, :x, etc.).
The easiest option is probably vim-easy-align, which seems to support this behavior out of the box. Example of using EasyAlign (with ga as EasyAlign's mapping):
What about a simple replace, like :g/=/s/\t/ /g ?
If that doesn't work, you can try this too: :g/=/s/ \+= \+/ = /g
The :g/=/s will find all the lines that contain '=', and do the replacement for them.
So, s/\t/ /g will replace tabs with spaces. These two things combined will do what you need.
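The behaviour asked for in the question (skip lines without the separator, align the rest among themselves) can be sketched outside Vim in a few lines of Python. This is not how Tabular or EasyAlign work internally; the function name `align_on` and the sample middle line are assumptions for illustration.

```python
def align_on(lines, sep="="):
    """Align `sep` across the lines that contain it; leave other lines alone."""
    # Split only the lines that contain the separator, on its first occurrence.
    parts = [line.split(sep, 1) if sep in line else None for line in lines]
    # The padding width comes from separator lines only, so a long line
    # without the separator cannot push the alignment to the right.
    width = max((len(p[0].rstrip()) for p in parts if p), default=0)
    out = []
    for line, p in zip(lines, parts):
        if p is None:
            out.append(line)
        else:
            left, right = p
            out.append(f"{left.rstrip():<{width}} {sep} {right.lstrip()}")
    return out

lines = [
    "text1_temp = text_temp;",
    "a_long_line_without_the_separator();",
    "text2 = text_temp;",
]
print("\n".join(align_on(lines)))
```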

In Vim, is there a "matching braces/parenthesis/etc" equivalent in substitute/search symbols?

I would like to replace for instance every occurrence of "foo{...}" with anything except newlines inside the bracket (there may be spaces, other brackets opened AND closed, etc) NOT followed by "bar".
For instance, the "foo{{ }}" in "foo{{ }}, bar" would match but not "foo{hello{}}bar".
I've tried /foo{.*}\(bar\)\@! and /foo{.\{-}}\(bar\)\@! but the first one would match "foo{}bar{}" and the second would match "foo{{}}bar" (only the "foo{{}" part).
this regex:
foo{{ }}
foo{{ }}, bar
but not:
It is impossible to correctly match an arbitrary level of nested parentheses using regular expressions. However, it is possible to construct a regex that supports a limited amount of nesting (I think this answer did not attempt to do so). – Ben
This does ...
for up to one level of inner braces:
for up to two levels of inner braces:
for up to three levels of inner braces:
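Patterns like these follow a mechanical construction: start from "no braces at all" and wrap it once per permitted nesting level. As a hedged illustration in Python rather than Vim regex syntax (the function names `nested_braces` and `foo_not_bar` are hypothetical, and `(?!bar)` plays the role of Vim's `\(bar\)\@!`):

```python
import re

def nested_braces(depth):
    """Regex source for text containing balanced {...} up to `depth` levels."""
    inner = r"[^{}\n]*"  # depth 0: no braces and no newlines
    for _ in range(depth):
        # One more level: plain text, then any number of {inner} groups,
        # each followed by more plain text.
        inner = r"[^{}\n]*(?:\{" + inner + r"\}[^{}\n]*)*"
    return inner

def foo_not_bar(depth):
    """Match foo{...} with bounded nesting, not followed by 'bar'."""
    return re.compile(r"foo\{" + nested_braces(depth) + r"\}(?!bar)")

pattern = foo_not_bar(3)
for text in ["foo{{ }}, bar", "foo{hello{}}bar", "foo{}bar{}"]:
    m = pattern.search(text)
    print(repr(text), "->", m.group(0) if m else None)
```

The pattern grows with every level, so this only stays practical for small depths; input nested deeper than `depth` simply fails to match.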
Depending on what replacement you want to perform exactly, you might be able to do that with macros.
For example: Given this text
line 1 -- -- -- -- array[a][b[1]]
line 2 -- array[c][d]
line 3 -- -- -- -- -- -- -- array[e[0]][f] + array[g[0]][h[0]]
replace array[A][B] with get(A, B).
To do that:
Position the cursor at the beginning of the text
qq to begin recording a macro
Do something to change the data independent of the content inside (use % to go to matching bracket, and some register/mark/plugin to delete around the bracket). For example cwget(<esc>ldi[vhpa, <esc>ldi[vhpa)<esc>n -- but macros are usually unreadable.
n to go to next match, q to stop recording
@q repeatedly (@@ can be used from the second time)
This is probably not very convenient because it's easy to make a mistake (press I, <home>, A for example) and you have to redo the macro from the beginning, but it works.
Alternatively, you can do something similar to eregex.vim plugin to extend vim's regex format to support this (so you don't have to retype the huge regex every time).
Proof of concept:
"does not handle different magic levels
"does not handle '\/' or different characters for substitution ('s#a#b#')
"does not handle brackets inside strings
" usage: `:M/pattern, use \zm for matching block/replacement/flags`
command -range -nargs=* M :call SubstituteWithMatching(<q-args>, <line1>, <line2>)
":M/ inspired from eregex.vim
function SubstituteWithMatching(command, line1, line2)
    let EscapeRegex={pattern->escape(pattern, '[]\')}
    let openbracket ='([{'
    let closebracket=')]}'
    let nonbracketR='[^'.EscapeRegex(openbracket.closebracket).']'
    let nonbracketsR=nonbracketR.'*'
    " NOTE: the body of this lambda was cut off in the source. What follows
    " is a reconstruction inferred from the test suite below (an assumption,
    " not the author's original text).
    let LiftLevel={pattern->
        \ nonbracketsR.'\%(['.EscapeRegex(openbracket).']'.pattern
        \ .'['.EscapeRegex(closebracket).']'.nonbracketsR.'\)*'}
    let matchingR=LiftLevel(LiftLevel(LiftLevel(nonbracketsR)))
    if v:false " optional test suite
        echo "return 0:"
        echo match('abc', '^'.matchingR.'$')
        echo match('abc(ab)de', '^'.matchingR.'$')
        echo match('abc(ab)d(e)f', '^'.matchingR.'$')
        echo match('abc(a[x]b)d(e)f', '^'.matchingR.'$')
        echo match('abc(a]b', '^'.matchingR.'$')
        "current flaw (not a problem if there's only one type of bracket, or if
        "the code is well-formed)
        echo "return -1:"
        echo match('abc(a(b', '^'.matchingR.'$')
        echo match('abc)a(b', '^'.matchingR.'$')
    endif
    let [pattern, replacement, flags]=split(a:command, "/")
    let pattern=substitute(pattern, '\\zm', EscapeRegex(matchingR), 'g')
    execute a:line1.','.a:line2.'s/'.pattern.'/'.replacement.'/'.flags
endfunction
After this, :'<,'>M/array\[\(\zm\)\]\[\(\zm\)\]/get(\1, \2)/g can be used to do the same task above (after selecting the text in visual mode)

Notepad++ RegEx group capture syntax

I have a list of label names in a text file I'd like to manipulate using Find and Replace in Notepad++, they are listed as follows:
I want to rename them in Notepad++ to the following:
The Regex I'm using in the Notepad++'s replace dialog to capture the label name is the following:
I want to replace each capture group as follows:
\1 = Label_
\2 = A_One
\3 = A_Two
\4 = A_Three
\5 = B_One
\6 = B_Two
\7 = B_Three
My problem is that Notepad++ doesn't register the syntax of the regex above. When I hit Count in the Replace dialog, it returns 0 occurrences. Not sure what's missing in the syntax. And yes, I made sure the Regular Expression radio button is selected. Help is appreciated.
Tried escaping the parenthesis, still didn't work:
Ed's response has shown a working pattern, since alternation isn't supported in Notepad++. However, the rest of your problem can't be handled by regex alone. What you're trying to do isn't possible with a regex find/replace approach. Your desired result involves logical conditions which can't be expressed in regex. All you can do with the replace method is rearrange items and refer to the captured items, but you can't tell it to use "A" for values 1-3, and "B" for 4-6. Furthermore, you can't assign placeholders like that. They are really capture groups that you are backreferencing.
To reach the results you've shown you would need to write a small program that would allow you to check the captured values and perform the appropriate replacements.
EDIT: here's an example of how to achieve this in C#
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

var numToWordMap = new Dictionary<int, string>();
numToWordMap[1] = "A_One";
numToWordMap[2] = "A_Two";
numToWordMap[3] = "A_Three";
numToWordMap[4] = "B_One";
numToWordMap[5] = "B_Two";
numToWordMap[6] = "B_Three";
string pattern = @"\bMyLabel_(\d+)\b";
string filePath = @"C:\temp.txt";
string[] contents = File.ReadAllLines(filePath);
for (int i = 0; i < contents.Length; i++)
{
    contents[i] = Regex.Replace(contents[i], pattern,
        m =>
        {
            int num = int.Parse(m.Groups[1].Value);
            if (numToWordMap.ContainsKey(num))
            {
                return "Label_" + numToWordMap[num];
            }
            // key not found, use original value
            return m.Value;
        });
}
File.WriteAllLines(filePath, contents);
You should be able to use this easily. Perhaps you can download LINQPad or Visual C# Express to do so.
If your files are too large this might be an inefficient approach, in which case you could use a StreamReader and StreamWriter to read from the original file and write it to another, respectively.
Also be aware that my sample code writes back to the original file. For testing purposes you can change that path to another file so it isn't overwritten.
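The same lookup-based replacement can be sketched in Python, where `re.sub` accepts a function as the replacement. This is a minimal illustration that assumes the labels look like `MyLabel_01` (as in the C# pattern) and works on a string rather than a file; the function name `rename_labels` is hypothetical.

```python
import re

# Same mapping as the C# dictionary.
NUM_TO_WORD = {
    1: "A_One", 2: "A_Two", 3: "A_Three",
    4: "B_One", 5: "B_Two", 6: "B_Three",
}

def rename_labels(text):
    """Replace MyLabel_<n> with Label_<word>, leaving unknown numbers as-is."""
    def repl(m):
        word = NUM_TO_WORD.get(int(m.group(1)))
        # Key not found: keep the original text of the match.
        return "Label_" + word if word else m.group(0)
    return re.sub(r"\bMyLabel_(\d+)\b", repl, text)

print(rename_labels("MyLabel_01\nMyLabel_06\nMyLabel_09"))
```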
Bar bar bar - Notepad++ thinks you're a barbarian.
(obsolete - see update below.) No vertical bars in Notepad++ regex - sorry. I forget every few months, too!
Use [123456] instead.
Update: Sorry, I didn't read carefully enough; on top of the barhopping problem, @Ahmad's spot-on - you can't do a mapping replacement like that.
Update: Version 6 of Notepad++ changed the regular expression engine to a Perl-compatible one, which supports "|". AFAICT, if you have a version 5., auto-update won't update to 6. - you have to explicitly download it.
A regular expression search and replace for
works on Notepad++ 6.3.2
The outermost pair of brackets is for grouping, they limit the scope of the first alternation; not sure whether they could be omitted but including them makes the scope clear. The pattern searches for a fixed string followed by one of the two-digit pairs. (The leading zero could be factored out and placed in the fixed string.) Each digit pair is wrapped in round brackets so it is captured.
In the replacement expression, the clause (?4A_Three) says that if capture group 4 matched something then insert the text A_Three, otherwise insert nothing. Similarly for the other clauses. As the 6 alternatives are mutually exclusive only one will match. Thus only one of the (?...) clauses will have matched and so only one will insert text.
The easiest way to do this that I would recommend is to use AWK. If you're on Windows, look for the mingw32 precompiled binaries out there for free download (it'll be called gawk).
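# The mapping in a is reconstructed from the question; its original definition was lost.
BEGIN { FS = "_0"; split("A_One A_Two A_Three B_One B_Two B_Three", a, " ") }
{ printf("Label_%s\n", a[$2]); }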
Execute on Windows as follows:
C:\Users\Mydir>gawk -f test.awk