Vim matches a rectangle area - regex

I want to match a rectangle area in Vim using a regex expression, for example:
abcd test1
abcd test2
I want to match test1 and test2 at once, but not abcds.
(test1 and test2 are constant, we don't need to consider [0-9], that's just an example)
I want to match every column-aligned test1 test2
This
test1
test2
the rectangle area may appear anywhere, I can't assume it is at "column 3" or something of that sort.
If they are not aligned, don't match it.
I tried \1\@<=test1\n\(.*\)\@<=test2 but no luck, because the lookbehind breaks the group reference. (from :help /\@<=)
Does anyone know how to do it with only vim-regex? Thanks.
Edit:
A complicated example may be this one:
aaaaaaaaa
b test1 b
c test2 c
ddddddddd
match only test1 and test2.
Using two or more regexes is acceptable (one for test1 and the other for test2?)
Edit2:
This is just for fun, I am just curious about how much vim can achieve, it's not a serious problem, it may be boring and meaningless for many people and that is fine with me, please don't be bothered, good night :)

Simply searching for /test[0-9] will suffice. But I think the spirit of the question is really more about visual blocks. In visual mode you can use text objects for movement. So, in this case:
Search for test1.
Press Control-V (to turn on visual block mode)
Press w to visually select the entire word.
Press j to visually select the next word in the column below the first one. (use a range to extend this rectangular block, e.g. 10j would visually select the next ten items in that column.)

Try the following to find the match you need
The syntax is
/somethingWeAreLooking\(\_.\)*followedByTheOtherThing
In this case it will be like this:
/test\(\_.\)*[1-9]

Related

RegEx - Order of OR'd values in capture group changes results

Visual Studio / XPath / RegEx:
Given Expression:
(?<TheObject>(Car|Car Blue)) +(?<OldState>.+) +---> +(?<NewState>.+)
Given Searched String:
Car Blue Flying ---> Crashed
I expected:
TheObject = "Car Blue"
OldState = "Flying"
NewState = "Crashed"
What I get:
TheObject = "Car"
OldState = "Blue Flying"
NewState = "Crashed"
Given new RegEx:
(?<TheObject>(Car Blue|Car)) +(?<OldState>.+) +---> +(?<NewState>.+)
Result is (what I want):
TheObject = "Car Blue"
OldState = "Flying"
NewState = "Crashed"
I conceptually get what's happening under the hood; the RegEx is putting the first (left-to-right) match it finds in the OR'd list into the <TheObject> group and then goes on.
The OR'd list is built at run time and cannot guarantee the order that "Car" or "Car Blue" is added to the OR'd list in <TheObject> group. (This is dramatically simplified OR'd list)
I could brute force it, by sorting the OR'd list from longest to shortest, but, I was looking for something a little more elegant.
Is there a way to make <TheObject> group capture the largest it can find in the OR'd list instead of the first it finds? (Without me having to worry about the order)
Thank you,
I would normally automatically agree with an answer like ltux's, but not in this case.
You say the alternation group is generated dynamically. How frequently is it generated dynamically? If it's every user request, it's probably faster to do a quick sort (either by longest length first, or reverse-alphabetically) on the object the expression is built from than to write something that turns (Car|Car Red|Car Blue) into (Car( Red| Blue)?).
The regex may take a bit longer (you probably won't even notice a difference in the speed of the regex) but the assembly operation may be much faster (depending on the architecture of the source of your data for the alternation list).
In a simple test of an alternation with 702 options, using three methods, the results are comparable with an option set like the one below. None of these results account for the time needed to build the string, which grows as the complexity of the string grows.
The options are all the same, just in different formats
zap
yes
xerox
...
apple
Using Google Chrome and Javascript, I tried three (edit: four) different formats and saw consistent results for all between 0-2ms.
'Optimized factoring' a(?:4|3|2|1)?
Reverse alphabetically sorting (?:a4|a3|a2|a1|a)
Factoring a(?:4)?|a(?:3)?|a(?:2)?|a(?:1)?
All are consistently coming in at 0 to 2ms (the difference being what else my machine might be doing at the moment, I suppose).
Update: I found a way that you may be able to do this without sorting in Regular Expressions, using a lookahead like this (?=a|a1|a2|a3|a4|a5)(.{15}|.{14}|.{13}|...|.{2}|.) where 15 is the upper bound counting all the way down to the lower bound.
Without some restraints on this method, I feel like it can lead to a lot of problems and false positives. It would be my least preferred result. If the lookahead matches, the capture group (.{15}|...) will capture more than you'll desire on any occasion where it can. In other words, it will reach ahead past the match.
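To make the over-capture concrete, here is a small Python sketch of the lookahead idea. The pattern and sample strings are my own simplification (two lengths only, 8 and 3), not the asker's real list, so treat it as an illustration rather than a recommendation.

```python
import re

# Lookahead confirms that one of the options starts here; the capture group
# then grabs a fixed number of characters, longest length first.
# Hypothetical option list: "Car" (3 chars) and "Car Blue" (8 chars).
OVERREACH = re.compile(r"(?=Car|Car Blue)(.{8}|.{3})")

def capture(line):
    """Return whatever the length-based group captured, or None."""
    m = OVERREACH.match(line)
    return m.group(1) if m else None

print(repr(capture("Car Blue Flying")))  # 'Car Blue'
print(repr(capture("Car Red Flying")))   # 'Car Red ' (reaches past "Car")
```

The second call shows the false positive: the lookahead is satisfied by "Car", but the group still takes eight characters because it can.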
Though I made up the term Optimized Factoring in comparison to my Factoring example, I can't recommend my Factoring example syntax for any reason. Sorted would be the most logical, coupled with easier to read/maintain than exploiting a lookahead.
You haven't given much insight into your data but you may still need to sort the sub groups or factor further if the sub-options can contain spaces and may overlap, further diminishing the value of "Optimized Factoring".
Edit: To be clear, I am providing a thorough examination as to why no form of factoring is a gain here. At least not in any way that I can see. A simple Array.Sort().Reverse().Join("|") gives exactly what anyone in this situation would need.
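As a rough sketch of the sort-then-join approach, here it is in Python (the question uses .NET-style (?<name>...) groups; Python spells them (?P<name>...)). The helper name build_pattern is made up for this example, and sorting by length is one of the two orderings mentioned above.

```python
import re

def build_pattern(objects):
    """Build the regex from a run-time list of object names.

    Sorting longest first means "Car Blue" is tried before its prefix "Car",
    whatever order the names arrived in.
    """
    ordered = sorted(objects, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(
        r"(?P<TheObject>" + alternation + r")"
        r" +(?P<OldState>.+) +---> +(?P<NewState>.+)"
    )

pattern = build_pattern(["Car", "Car Blue"])
result = pattern.match("Car Blue Flying ---> Crashed").groupdict()
print(result)
```

With the list sorted, TheObject comes out as "Car Blue" and OldState as "Flying" even though "Car" was added to the list first.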
In a backtracking regex engine, the | operator tries its alternatives from left to right and stops at the first one that lets the overall match succeed. We can't change the behaviour of the | operator.
So the solution is to avoid using | operator. Instead of (Car Blue|Car) or (Car|Car Blue), use (Car( Blue)?).
(?<TheObject>(Car( Blue)?)) +(?<OldState>.+) +---> +(?<NewState>.+)
Then the <TheObject> group will always be Car Blue in the presence of Blue.
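For comparison, a minimal Python version of the factored pattern (again with Python's (?P<name>...) spelling, and a non-capturing group for the optional suffix, which is my choice and not part of the answer):

```python
import re

# "Car" followed by an optional " Blue"; the optional part is greedy,
# so it is taken whenever it is present.
FACTORED = re.compile(
    r"(?P<TheObject>Car(?: Blue)?) +(?P<OldState>.+) +---> +(?P<NewState>.+)"
)

def parse(line):
    """Return the three named groups as a dict, or None if there is no match."""
    m = FACTORED.match(line)
    return m.groupdict() if m else None

print(parse("Car Blue Flying ---> Crashed"))
print(parse("Car Flying ---> Crashed"))
```

This only works by hand for a fixed list; with a list built at run time you would have to generate the factoring yourself.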

RegEx for square brackets' string but not vector's index, it's possible?

I'm using Harbour in Sublime Text 3.
How can I create a regex for square brackets string like below:
a:= [text] // same as a:= "text"
b:= [3] // same as b:= "3"
c:= {2,[text]} // same as c:= {2,"text"}
d:=[text] // same as d:="text"
Function([text]) // Same as Function("text")
but not include vector index, like:
aVet[index] // Same as aVet[1], aVet[2]...
e:= aVet[index] // Same as aVet[1], aVet[2]...
f:= aVet[2,3] // Same as aVet[1,2], aVet[2,5]...
g:= aVet[CONSTANT] // Same as aVet[FOO], aVet[BAR]...
this should work for you:
[^a-zA-Z0-9\s]\s*(\[.*?\])
Regex101
vet\[.*?\]|(\[.*?\])
This assumes that vector indices always start with vet. You should add a tag for whichever language you're using to clear up this confusion. Anyway, the code above should do the trick. Follow the link for a detailed breakdown of what's happening behind the curtain.
The code above might not be that intuitive, but the basic idea is this: the engine first looks for the word vet followed by square brackets. If it finds that, it matches it without capturing anything. Otherwise it captures the bracketed text with the right-hand alternative (which is what we want). The only issue is that if your code has comments with square brackets in them, it might capture those too. If you plan to do so, the regex needs to be modified for more conditions, but this will work as long as you don't do that. Let me know if that is not the case.
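Here is a small Python sketch of that "match what you don't want, capture what you do" idea. One assumption to flag: instead of the literal vet, the left-hand alternative uses \w\[ (a bracket glued directly to an identifier character), which is my guess at what distinguishes an index from a bracket string in the examples.

```python
import re

# Left alternative: a bracket directly after an identifier character
# (an index such as aVet[2,3]); it matches but captures nothing.
# Right alternative: any other bracket pair, captured in group 1.
PATTERN = re.compile(r"\w\[.*?\]|(\[.*?\])")

def bracket_strings(line):
    """Return the bracketed strings in a line, skipping vector indices."""
    return [m.group(1) for m in PATTERN.finditer(line) if m.group(1) is not None]

print(bracket_strings("c:= {2,[text]}"))    # ['[text]']
print(bracket_strings("f:= aVet[2,3]"))     # []
```

Only matches where group 1 took part are kept, so the indices are consumed and then dropped.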
First things first: I'm using Harbour in Sublime Text 3.
iismathwizard, your code almost does the magic.
[^a-zA-Z0-9\s]\s*(\[.*?\])
It also matches the character just before the opening square bracket, for example:
"=" in a, b and d examples
"," in c example
"(" in function example
However, it doesn't match any vector index.
user41235, your code does not exclude the vector indices.
vet\[.*?\]|(\[.*?\])
I added a few examples for more detail.
Sorry for my English...

EditPad: Need a regex that handles multiple possible data formats

First, I'm using EditPadPro for my regex cleaning, so any answers given should work within that environment.
I get a large spreadsheet full of data that I have to clean every day. I've managed to get it down to a couple of different regexes that I run, and this works... but I'm curious to see if it's possible to reduce down to a single regex.
Here is some sample data:
3-CPC_114851_70095_70095_CAN-bre
3-CPC_114851_70095_70095_CAN
b11-ao1-113775-bre
b7-ao-114441
b7-ao-114441-bre
b7-ao1-114441
b7-ao1-114441-bre
http://go.nlvid.com/results1/?http://bo
go.nlv/results1/?click
b4-sm-1359
b6-sm-1356-bre
1359_195_1453814569-bre
1356_104_1456856729
b15-rad-8905
b15-rad-8905-bre
Here is how the above data needs to end up:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
So, there are numerous rules, such as:
In cases of more than 2 underscores, the result needs to contain only the value immediately after the first underscore, and everything from the dash onwards.
In cases where the string contains "-ao-", "-ao1-", everything prior to the final numeric string should be removed.
If a question mark is present, everything from the mark onwards should be removed.
If the string contains "-sm-" or "-rad-", everything prior to those alpha strings should be removed.
If the string contains 2 underscores, everything after the first numeric string up to a dash (if present) should be removed, and the string "sm-" should be prepended.
Additionally there is other data that must be left untouched, including but not limited to:
113535|24905|24905
as well as many variations on this pattern of xxxxxx|yyyyy|zzzzz (and not always those string lengths)
This may be asking way too much of regex, I'm not sure as I'm not great with it. But I've seen some pretty impressive things done with it, so I thought I'd put this out to the community and see what you come back with.
Jonathan, I can wrap all of those into one regex, except the last one (where you prepend sm- to a string that does not contain sm). It is not possible in this context, because we cannot capture "sm" to reuse in the replacement, and because there is no "conditional replacement" syntax in EPP.
That being said, you can achieve what you want in EPP with two regexes and one macro to chain the two.
Here is how.
The solution below is tested in EPP.
Regex 1
Press Ctrl + Shift + F to enter Search / Replace mode
Enter the following Search and Replace in the appropriate boxes
At the top right of the Search bar, click the Favorite Searches pull-down, select "Add", give it a name, e.g. Regex 1
Search:
(?mx)^
(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
|
[^\r\n]*?-ao1?-\D*([^\r\n]+)
|
([^\r\n?]*)(?=\?)[^\r\n]+
|
[^\r\n]*?-((?:sm|rad)-[^\r\n]+)
Replace:
\1\2\3\4\5
Regex 2
Same 1-2-3 steps as above.
Search
^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)
Replace
sm-\1\2
Chaining Regex 1 and Regex 2
Top menu: Macros, Record Macro, give it a name.
Click the Favorite searches pulldown, select Regex 1
Hit Replace All.
Click the Favorite searches pulldown, select Regex 2
Hit Replace All.
Macros, Stop recording.
Whenever you want to do your sequence of replacements, pull it by name under the Macros menu.
Testing This
I have tested my "Jonathan macro" on your input. Here is the result:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
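If you ever want to run the same two passes outside EditPad Pro, the macro can be sketched in Python roughly as follows. The regexes are copied from above; the flags are passed to re.compile instead of using (?mx), and the function name clean is just something I picked. I have only traced this against the sample data in the question.

```python
import re

# Regex 1: the four rules that only remove text. Each branch fills its own
# capture group(s); groups from branches that did not match come back empty.
REGEX_1 = re.compile(r"""
    ^(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
  | [^\r\n]*?-ao1?-\D*([^\r\n]+)
  | ([^\r\n?]*)(?=\?)[^\r\n]+
  | [^\r\n]*?-((?:sm|rad)-[^\r\n]+)
""", re.MULTILINE | re.VERBOSE)

# Regex 2: lines with exactly two underscores get "sm-" prepended.
REGEX_2 = re.compile(
    r"^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)",
    re.MULTILINE,
)

def clean(text):
    """Apply Regex 1, then Regex 2, like the recorded macro does."""
    text = REGEX_1.sub(r"\1\2\3\4\5", text)
    return REGEX_2.sub(r"sm-\1\2", text)

print(clean("b7-ao1-114441-bre\n1359_195_1453814569-bre"))
```

Lines that match neither regex, such as 113535|24905|24905, pass through unchanged.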
Try this:
Toggle the Search Panel : SHIFT+CTRL+F
SEARCH: .*?((?:sm-|rad-)?(?:(?:\d+|[\w\.]+\/.*?))(?:-\w+)?$)
REPLACE: $1
Check REGEX and WORDS
Click Replace All or Hit CTRL+ALT+F3

How do I substitute selected contents despite any regex characters in vim?

In following code:
int return_int_func() { return 0; }
float fv = return_int_func();
Obviously, the compiler will warn me that fv may lose precision because of the implicit conversion. I face lots of these cases, so I want to fix them all with the substitute command. In short, I want this:
float fv = static_cast<float>(return_int_func());
But the real code has many forms like these:
float fv = obj.int_field;
float fv = obj->load_int_field("name");
float fv = xx.yy->zz;
I want to select my target (obj.int_field, obj->load_int_field("name") or xx.yy->zz) and replace it with static_cast<float>(\1). I tried this:
:'<,'>s/\%V/static_cast<float>(&)/g
But Vim replaces every character in the selected word with static_cast..., and that isn't what I want at all. Then I tried this:
:'<,'>s/\(\%V\)/static_cast<float>(\1)/g
Vim does the same thing. I have no idea how to replace the whole selection (regardless of any regex characters in it) with my pattern. Any suggestions?
The solution is almost too easy! Here it is.
:s/\%V.*\%V./static_cast<float>(&)/
This is actually almost the same as the example from the :help. We can take away from this that we should all just have looked up :h /\%V first thing in the morning ...
\%V is a zero-width atom that matches stuff that is selected in Visual mode. Here it can match at the start of the Visual area. .* then matches (greedily) as much as it can; its greediness is reined in by the final \%V., which requires the last character of the match also to lie within the Visual area.
Tip: If you need to make this change many times over many lines, define the following mappings (even better: put them in your vimrc permanently).
nnoremap & :&&<CR>
xnoremap & :&&<CR>
Then you can repeat the substitution shown above by simply selecting something, and then pressing & to perform the substitution.
Let me try to paraphrase your question: You would like to Visual select some text, and then perform a substitution, where the selected text is also part of the replacement text.
I think in this case a macro is a much better tool.
To create the macro, first select the first piece of text that you want to wrap in the static cast. For example, select return_int_func(). (For each step, I'll show what the buffer looks like.)
When you're ready, press qq to start recording into register q, then press c.
float fv = |;
Type the left part of your wrapper text, static_cast<float>(.
float fv = static_cast<float>(|;
Press CTRL-R " (Control-R followed by "): this will reinsert the original text.
float fv = static_cast<float>(return_int_func()|;
Type ) to complete the change, and then Escape to leave insert mode.
float fv = static_cast<float>(return_int_func()▉;
Finally, press q to stop recording.
At this point you have made the first change and also recorded it as a macro in register q.
For all remaining changes, simply select a target such as obj->load_int_field("name") and press @q to repeat the change.
Look up :help 10.1 for more information about macros.
The \%V facility is really not for acting on the selected text as a whole; more for searching inside of that text.
Assuming that you are going to be putting this into a function or maybe mapping this to a key combination, here is an alternative approach that does what you are looking for:
:exec 'normal! gv"adistatic_cast<float>('|exec 'normal! "apa)'
Note that this will use your a register, so if you want to use another register you can change the two instances of "a to "x, where x is the register you wish to use.
Basically this is going to programmatically yank the selected text, insert static_cast<float>(, paste the text that was selected, and then insert ).
You can try
:%s/float fv = \(.*\);$/float fv = static_cast<float>(\1);/
If I understood you correctly, you want to do text substitution only on the selected text. This is not easy to do, at least not as easy as a plain :s command, because your visual selection can be on a single line or span multiple lines, and it can be char-wise, line-wise, or block-wise.
But it can be done with this function:
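function! SubVisualText(pat, repl, flag)
  try
    " Save register v so it can be restored afterwards.
    let v_save = @v
    " Yank the last visual selection into register v.
    normal! gv"vy
    let s = @v
    let s = substitute(s, a:pat, a:repl, a:flag)
    " Keep the original selection type (char-, line- or block-wise).
    call setreg('v', s, visualmode())
    " Reselect and paste the substituted text over the selection.
    normal! gv"vp
  finally
    let @v = v_save
  endtry
endfunction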
you use it by:
source the function
visual select the area (could be done by v, V or Ctrl-V)
:<ctrl-u>call SubVisualText(pattern, replacement, flag)<Enter>
the <ctrl-u> is for removing the leading range, since the function doesn't need the range.

Merge multiple REGEX statements

I have an example string.
This should be **bold**, *indented* or ***bold and indented***.
The string is parsed using 3 regex that run one after another to get the following result:
This should be <b>bold</b>, <i>indented</i> or <b><i>bold and indented</i></b>.
It's simple and works fine. However, I'd like to save me a few lines (if it's possible, prettier, and more efficient, then why not eh?), and merge them. To make all the replacements in a single regex statement. Is it possible with extra efficiency? or should I leave it as is? (even if I should, I'd like to see a possible solution?)
My matching statements:
\*\*\*(.+?)\*\*\* -> <b><i>$1</i></b>
\*\*(.+?)\*\* -> <b>$1</b>
\*(.+?)\* -> <i>$1</i>
Honestly, keeping them as 3 separate regexes is almost certainly...
1. More readable
2. Simpler
3. (Due to #1 and #2) More maintainable.
Fewer lines is not always better, especially when it comes to regexes.
Also, you only actually need 2 regexes - the bold one and the italic one. Just always run the bold one first:
***foo***
becomes, after the bold regex...
*<b>foo</b>*
and then the italic regex makes that...
<i><b>foo</b></i>
Which is the correct output. (The reason for running the bold one first is because the italic one would match *** as <i>*</i> which is wrong.)
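A quick Python sketch of the two-pass version, bold first and italic second. One deliberate difference from the patterns in the question, and it is my assumption: the bold pass uses ([^*]+) instead of (.+?), so that on *** it skips the outer asterisk and consumes the inner two, as described above. With (.+?) the leftmost match would begin at the first asterisk instead.

```python
import re

# Bold: two asterisks, text without asterisks, two asterisks.
BOLD = re.compile(r"\*\*([^*]+)\*\*")
# Italic: whatever single asterisks are left after the bold pass.
ITALIC = re.compile(r"\*(.+?)\*")

def render(text):
    """Run the bold regex first, then the italic one."""
    text = BOLD.sub(r"<b>\1</b>", text)
    return ITALIC.sub(r"<i>\1</i>", text)

print(render("***foo***"))  # <i><b>foo</b></i>
```

Running the passes in the other order would let the italic regex pair up asterisks that belong to a bold marker.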