Putting text before the nth occurrence in each line in Vim - regex

I have a situation like
something 'text' something 'moretext'
I forgot to add more spacing the first time I created this file, and now on each line I need to put some whitespace before the 2nd occurrence of '.
Now I can't build a regex for this.
I know that:
my command should begin with :%s because I want it to be executed on all lines
I should use the {2} operator to pick the 2nd occurrence?
If my regex will match something I can put stuff before the match with &
The main problem for me is how to build a regex to match the second ' using the {} notation. It's frustrating because I don't know where it's supposed to be inserted, or whether I should use the magic or non-magic regex in Vim.
The result I'm looking for
something 'text ' something 'moretext'

You can use
:%s:\v(^[^']*'[^']*)':\1 ':
[^'] means everything except '
\1 is a backreference to the first captured group (...)
Basically, what you're doing here is capturing everything up to the second quote, and replacing the line up to (and including) this quote with what you've captured, a space, and a quote.
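As a cross-check outside Vim, the same capture-and-re-emit idea can be sketched in Python. This is only an illustration under the assumption that the lines are available as Python strings; the helper name add_space_before_second_quote is hypothetical and not part of the answer above.

```python
import re

# Capture everything up to (but not including) the second quote,
# then re-emit the capture followed by a space and the quote.
SECOND_QUOTE = re.compile(r"^([^']*'[^']*)'")

def add_space_before_second_quote(line: str) -> str:
    # count=1 mirrors :s without the g flag: at most one substitution per line.
    return SECOND_QUOTE.sub(r"\1 '", line, count=1)

print(add_space_before_second_quote("something 'text' something 'moretext'"))
# something 'text ' something 'moretext'
```

Lines with fewer than two quotes do not match and come back unchanged.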

{2} doesn't mean "the second match"; it means "two repetitions of the preceding atom", so by itself it won't select the second quote.
You could use a substitution like this one or the one in Robin's answer:
:%s/[^']*'[^']*\zs'/ '
Or you could use something like this:
:g/ '[^']*' /norm 2f'i<space>

Yet another way to do it:
:%s/\v%([^']*\zs\ze'){2}/ /
Note: I am using very magic, \v, to reduce amount of escaping.
This approach uses \zs and \ze to set the start and end of the match. Because of the {2} quantifier, \zs and \ze are set more than once: each repetition of the group moves them, so the final match is the empty position just before the second quote.
For more help see:
:h /\zs
:h /\v
Of course there is always sed, but the trick is getting the quote escaped correctly.
:%!sed -e 's/'\''/ &/2'
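For comparison, the "act on the nth occurrence" idea behind sed's /2 flag can also be written without any regex. The following is a minimal Python sketch, not something taken from the answers; the function name insert_before_nth and its parameters are hypothetical.

```python
def insert_before_nth(line: str, needle: str, n: int, text: str) -> str:
    """Insert `text` before the n-th (1-based) occurrence of `needle`.

    The line is returned unchanged if it has fewer than n occurrences.
    """
    pos = -1
    for _ in range(n):
        # Continue searching just past the previous hit.
        pos = line.find(needle, pos + 1)
        if pos == -1:
            return line
    return line[:pos] + text + line[pos:]

print(insert_before_nth("something 'text' something 'moretext'", "'", 2, " "))
# something 'text ' something 'moretext'
```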

Related

Append End of Line with Substring from Current Line [duplicate]

I've scoured Stack Overflow for something just like this and can't seem to come up with a solution. I've got some text that looks like this:
command.Parameters.Add("#Id
command.Parameters.Add("#IsDeleted
command.Parameters.Add("#MasterRecordId
command.Parameters.Add("#Name
...
And I would like the text to end up like this:
command.Parameters.Add("#Id", acct.Id);
command.Parameters.Add("#IsDeleted", acct.IsDeleted);
command.Parameters.Add("#MasterRecordId", acct.MasterRecordId);
command.Parameters.Add("#Name", acct.Name);
...
As you can see, I essentially want to append the end of the line with: ", acct.<word between # and second ">);
I'm trying this:
Find What: (?<=#).+?(?=\r) - This works, it finds the appropriate word.
Replace: \1", acct.\1); - This doesn't. It changes the line to (for Id):
command.Parameters.Add("#", acct.
Not sure what I'm doing wrong. I thought that \1 is supposed to be the "capture" from the "Find what" box, but it's not I guess?
The \1 backreference will only work if you have a capturing group in your pattern:
(?<=#)(.+?)(?=\r)
If you're not using a capturing group, you should use $& instead of \1 as a backreference for the entire match. Additionally, parentheses in the replacement string need to be escaped. So, the replacement string should be:
$&", acct.$&\);
You might also want to use $ instead of the Lookahead (?=\r) in case the last line isn't followed by an EOL character.
Having said all that, I personally prefer to be more explicit/strict when doing regex substitution to avoid messing up other lines (i.e., false positives). So I would go with something like this:
Find: (\bcommand\.Parameters\.Add\("#)(\w+)$
Replace: \1\2", acct.\2\);
Note that \w will only match word characters, which is likely the desired behavior here. Feel free to replace it with a character class if you think your identifiers might have other characters.
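The difference between a whole-match reference and numbered groups can be tried outside Notepad++ as well. Below is a rough Python sketch, assuming Python's re module rather than Notepad++'s engine: \g<0> plays the role of $&, the closing parenthesis needs no escaping in a Python replacement string, and \w+ is used instead of .+? as a stricter choice.

```python
import re

text = 'command.Parameters.Add("#Id\ncommand.Parameters.Add("#Name'

# Whole-match reference: no capturing group needed.
whole = re.sub(r"(?<=#)\w+$", r'\g<0>", acct.\g<0>);', text, flags=re.MULTILINE)

# Stricter variant with explicit capture groups, as in the Find/Replace pair above.
strict = re.sub(
    r'(\bcommand\.Parameters\.Add\("#)(\w+)$',
    r'\1\2", acct.\2);',
    text,
    flags=re.MULTILINE,
)

print(whole)
print(strict)
```

Both calls produce the same two lines for this input; the stricter pattern simply refuses to touch lines that do not start with command.Parameters.Add("#.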
You could also omit the lookbehind, and match the # and then use \K to clear the current match buffer.
Then you can match the rest of the line using .+
Note that you don't have to make the quantifier non greedy .*? as you are matching the rest of the line.
In the replacement, use the full match using $0
See a regex demo for the matches:
Find what:
#\K.+
Replace with:
$0", acct.$0\);
If there must be a newline to the right, you might also write the pattern as one of:
#\K.+(?=\r)
#\K.+(?=\R)

Escaping single quote for a specific pattern using vim

Consider the below line for example
'{"place":"buddy's home"}'
I want to replace the single quote in buddy's only. Single quotes at the start and end of line had to be intact. So the resulting line would look like.
'{"place":"buddy\'s home"}'
There could be multiple lines with multiple occurrences of such single quotes in each line. I have to escape all of them except at the start and end of line.
I'm able to find such quotes using the Vim search /.'. (the pattern .'. requires a character on each side of the single quote, so it is never at the start or the end of the line). But I'm having trouble replacing y's with y\'s in all places.
If the regex .'. is accurate enough then you can substitute all occurrences with:
:%s/.\zs'\ze./\\'/g
Instead of using \ze and \zs you could use groups (...) as well. However I find this version slightly more readable.
See :h /\zs and :h /\ze for further information.
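Outside Vim, lookarounds give a similar "match only the quote" effect. Here is a small Python sketch of that idea; it is an analogue rather than a translation of the Vim command, and the function name escape_inner_quotes is hypothetical.

```python
import re

def escape_inner_quotes(line: str) -> str:
    # (?<=.) and (?=.) require a character on each side of the quote
    # without consuming it, so only the quote itself is replaced.
    return re.sub(r"(?<=.)'(?=.)", r"\\'", line)

print(escape_inner_quotes("'{\"place\":\"buddy's home\"}'"))
# '{"place":"buddy\'s home"}'
```

The quotes at the very start and end of the line are left alone because one of the two lookarounds fails there.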
:%s/\(.\)'\(.\)/\1\\'\2/gc
:%s/ substitute over the whole buffer (see :help range to explain the %)
\(.\) match a character and save it in capture group 1 (see :help \()
' a literal '
\(.\) match a character and save it in capture group 2
/ replace by
\1 capture group 1 (see :help \1)
\\' this is a \' (you need to escape the backslash)
\2 capture group 2
/gc replace globally (the whole line) and ask for confirmation (see :help :s_flags)
You can omit the c option if you are sure all replaces are legit.
As kongo2002 says in his answer you could replace the capture groups by \zs and \ze:
\zs will start a match and discard everything before
\ze will end a match and discard everything after
See :help \ze and :help \zs.
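One practical difference between consuming the neighbours in capture groups and leaving them outside the match shows up when two quotes are separated by a single character. The Python sketch below illustrates this with Python's re module only; it is worth checking with the c flag whether your Vim command behaves the same way on such input.

```python
import re

sample = "a'b'c"

# Capture groups consume the characters around the quote, so the "b"
# is already used up when the second quote is reached.
with_groups = re.sub(r"(.)'(.)", r"\1\\'\2", sample)

# Lookarounds only inspect the neighbours, so both quotes are found.
with_lookarounds = re.sub(r"(?<=.)'(?=.)", r"\\'", sample)

print(with_groups)       # a\'b'c
print(with_lookarounds)  # a\'b\'c
```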

deciphering vim regex

I'm playing with vim-ruby indent, and there are some pretty complex regexes there:
" Regex used for words that, at the start of a line, add a level of indent.
let s:ruby_indent_keywords = '^\s*\zs\<\%(module\|class\|def\|if\|for' .
\ '\|while\|until\|else\|elsif\|case\|when\|unless\|begin\|ensure' .
\ '\|rescue\):\@!\>' .
\ '\|\%([=,*/%+-]\|<<\|>>\|:\s\)\s*\zs' .
\ '\<\%(if\|for\|while\|until\|case\|unless\|begin\):\@!\>'
With the help of vim documentation I deciphered it to mean:
start-of-line <any number of spaces> <start matching> <beginning of a word> /atom
<one of provided keywords> <colon character> <nothing> <end of word> ...
I have some doubts:
Is it really matching ':'? Doesn't seem to work like that, but I don't see anything about colon being some special character in regexes.
why is there \zs (start of the match) and no \ze (end of the match)?
what does \%() do? Is it just some form of grouping?
:\@! says to match only if there is not a colon, if I read it correctly. I am not familiar with the Ruby syntax that this is matching against, so this may not be quite correct. See :help /\@! and the surrounding topics for more info on lookarounds.
You can have a \zs with no \ze, it just means that the end of the match is at the end of the regex. The opposite is also true.
\%(\) just creates a grouping just as \(\) would except that the group is not available as a backreference (like would be used in a :substitute command).
you can check about matching ':' or any other string by copying the regex and using it to perform a search with / on the code you are working. Using :set incsearch may help you to see what is being matched while you type the regex.
\zs and \ze don't change what the pattern has to match; they determine which part of the matched text counts as the match, for example in :s or substitute(). You can check that by searching with / while the 'incsearch' option is set: start a search for a string in the text, which will be highlighted, then add \zs and \ze and watch the highlight change. There is no need to "close" \zs with \ze, as one can discard only the start or only the end of the match.
It is a form of grouping that is not saved in temporary variables for use with \1, \2 or submatch(), as stated in :h \%():
\%(\) A pattern enclosed by escaped parentheses.
Just like \(\), but without counting it as a sub-expression. This
allows using more groups and it's a little bit faster.
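For readers more used to PCRE-style syntax, the three constructs discussed here have rough counterparts: Vim's \%(...\) corresponds to (?:...), the negative lookahead (:help /\@!) to (?!...), and \zs can be approximated by reading a group's position instead of the whole match. The Python sketch below uses a shortened keyword list and a hypothetical function name; it is an analogy, not the vim-ruby pattern itself.

```python
import re

# (?:...) groups the alternatives without creating a backreference,
# (?!:) rejects a keyword that is directly followed by a colon, and
# the named group stands in for \zs (match reported after the indent).
KEYWORD = re.compile(r"^\s*(?P<kw>\b(?:module|class|def|if|while)(?!:)\b)")

def indent_keyword(line: str):
    m = KEYWORD.search(line)
    return m.group("kw") if m else None

print(indent_keyword("  def foo"))   # def
print(indent_keyword("  if: 1"))     # None
print(indent_keyword("  define x"))  # None
print(KEYWORD.groups)                # 1
```

Only the named group is counted; the non-capturing group adds nothing to KEYWORD.groups.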

Vim regex backreference

I want to do this:
%s/shop_(*)/shop_\1 wp_\1/
Why doesn't shop_(*) match anything?
There are several issues here.
Parens in Vim regexen are not for capturing by default -- you need to use \( \) for captures.
* doesn't mean what you think. It means "0 or more of the previous", so your regex means "a string that contains shop_ followed by 0+ ( and then a literal )". You're looking for ., which in regex means "any character". Put together with a star as .* it means "0 or more of any character". You probably want at least one character, so use .\+ (\+ means "1 or more of the previous").
Use this: %s/shop_\(.\+\)/shop_\1 wp_\1/.
Optionally end it with g after the final slash to replace for all instances on one line rather than just the first.
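Because .\+ is greedy, it captures everything to the end of the line, which matters once a line contains more than one shop_ name. The Python sketch below shows the effect; the sample line and the narrower \w+ variant are assumptions for illustration and are not part of the answer.

```python
import re

line = "shop_orders, shop_users"

# Greedy "any character" capture: the group swallows the rest of the line.
greedy = re.sub(r"shop_(.+)", r"shop_\1 wp_\1", line)

# Restricting the capture to word characters handles each name separately.
per_word = re.sub(r"shop_(\w+)", r"shop_\1 wp_\1", line)

print(greedy)    # shop_orders, shop_users wp_orders, shop_users
print(per_word)  # shop_orders wp_orders, shop_users wp_users
```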
If I understand correctly, you want %s/shop_\(.*\)/shop_\1 wp_\1/
Escape the capturing parenthesis and use .* to match any number of any character.
(Your search is searching for "shop_" followed by any number of opening parentheses followed by a closing parenthesis)
If you would like to avoid having to escape the capture parentheses and make the regex pattern syntax closer to other implementations (e.g. PCRE), add \v (very magic!) at the start of your pattern (see :help \magic for more info):
:%s/\vshop_(.*)/shop_\1 wp_\1/
@Luc, if you look here: regex-info, you'll see that Vim is behaving correctly. Here's a parallel from sed:
echo "123abc456" | sed 's#^([0-9]*)([abc]*)([456]*)#\3\2\1#'
sed: -e expression #1, char 35: invalid reference \3 on 's' command's RHS
whereas with the "escaped" parentheses, it works:
echo "123abc456" | sed 's#^\([0-9]*\)\([abc]*\)\([456]*\)#\3\2\1#'
456abc123
I hate to see vim maligned - especially when it's behaving correctly.
PS I tried to add this as a comment, but just couldn't get the formatting right.

RegEx: Grabbing values between quotation marks

I have a value like this:
"Foo Bar" "Another Value" something else
What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?
In general, the following regular expression fragment is what you are looking for:
"(.*?)"
This uses the non-greedy *? operator to capture everything up to but not including the next double quote. Then, you use a language-specific mechanism to extract the matched text.
In Python, you could do:
>>> import re
>>> string = '"Foo Bar" "Another Value"'
>>> print(re.findall(r'"(.*?)"', string))
['Foo Bar', 'Another Value']
I've been using the following with great success:
(["'])(?:(?=(\\?))\2.)*?\1
It supports nested quotes as well.
For those who want a deeper explanation of how this works, here's an explanation from user ephemient:
(["']) match a quote; (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
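To see the pattern at work, here is a short Python sketch. It assumes Python's re module, which accepts this pattern as written; the sample string is made up for the illustration.

```python
import re

QUOTED = re.compile(r"""(["'])(?:(?=(\\?))\2.)*?\1""")

# The sample is: "Foo \"Bar\"" 'Another Value' something else
sample = '"Foo \\"Bar\\"" \'Another Value\' something else'

matches = [m.group(0) for m in QUOTED.finditer(sample)]
for text in matches:
    print(text)
# "Foo \"Bar\""
# 'Another Value'
```

Note that the whole match still includes the surrounding quotes; the escaped quotes inside the first value do not end it early.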
I would go for:
"([^"]*)"
The [^"] is regex for any character except '"'
The reason I use this over the non greedy many operator is that I have to keep looking that up just to make sure I get it correct.
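Besides being easier to remember, the negated character class also behaves differently from the non-greedy dot when a quoted value spans several lines, because . does not match a newline by default. A quick Python sketch (the sample strings are assumptions):

```python
import re

one_line = '"Foo Bar" "Another Value" something else'
two_lines = 'x = "line one\nline two"'

print(re.findall(r'"([^"]*)"', one_line))   # ['Foo Bar', 'Another Value']

# [^"] happily crosses the newline; the lazy dot does not (without DOTALL).
negated = re.findall(r'"([^"]*)"', two_lines)
lazy_dot = re.findall(r'"(.*?)"', two_lines)
print(negated)   # ['line one\nline two']
print(lazy_dot)  # []
```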
Let's see two efficient ways that deal with escaped quotes. These patterns are not designed to be concise or aesthetic, but to be efficient.
These ways use first-character discrimination to quickly find quotes in the string without the cost of an alternation. (The idea is to quickly discard characters that are not quotes without testing the two branches of the alternation.)
Content between quotes is described with an unrolled loop (instead of a repeated alternation) to be more efficient too: [^"\\]*(?:\\.[^"\\]*)*
To deal with strings whose quotes are not balanced, you can use possessive quantifiers instead: [^"\\]*+(?:\\.[^"\\]*)*+ (or a workaround that emulates them) to prevent excessive backtracking. Alternatively, you can decide that a quoted part runs from an opening quote to the next non-escaped quote or to the end of the string. In that case there is no need for possessive quantifiers; you only need to make the closing quote optional.
Notice: sometimes quotes are not escaped with a backslash but by repeating the quote. In this case the content subpattern looks like this: [^"]*(?:""[^"]*)*
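As a concrete illustration of the unrolled content subpattern, here is a minimal Python sketch for double-quoted strings with backslash escapes. It assumes Python's re module and uses re.DOTALL so that an escaped newline is accepted too; the sample text is made up.

```python
import re

# opening quote, run of ordinary characters, then any number of
# "escape pair + run of ordinary characters", then the closing quote
DOUBLE_QUOTED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)

text = r'say "a \"quoted\" word" and "plain"'
found = DOUBLE_QUOTED.findall(text)
for item in found:
    print(item)
# "a \"quoted\" word"
# "plain"
```

Since the pattern has no capturing group, findall returns the whole matches, quotes included.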
The patterns avoid the use of a capture group and a backreference (I mean something like (["']).....\1) and use a simple alternation, but with ["'] factored out at the beginning.
Perl like:
["'](?:(?<=")[^"\\]*(?s:\\.[^"\\]*)*"|(?<=')[^'\\]*(?s:\\.[^'\\]*)*')
(Note that (?s:...) is syntactic sugar to switch on the dotall/singleline mode inside the non-capturing group. If this syntax is not supported, you can easily switch this mode on for the whole pattern or replace the dot with [\s\S].)
(The way this pattern is written is totally "hand-driven" and doesn't take into account possible internal engine optimizations.)
ECMA script:
(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')
POSIX extended:
"[^"\\]*(\\(.|\n)[^"\\]*)*"|'[^'\\]*(\\(.|\n)[^'\\]*)*'
or simply:
"([^"\\]|\\.|\\\n)*"|'([^'\\]|\\.|\\\n)*'
Peculiarly, none of these answers produce a regex where the returned match is the text inside the quotes, which is what is asked for. MA-Madden tries but only gets the inside match as a captured group rather than the whole match. One way to actually do it would be :
(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)
Examples for this can be seen in this demo https://regex101.com/r/Hbj8aP/1
The key here is the positive lookbehind at the start (the ?<= ) and the positive lookahead at the end (the ?=). The lookbehind looks behind the current character to check for a quote and, if one is found, starts from there; the lookahead then checks the character ahead for a quote and, if one is found, stops on that character. The lookbehind group (the ["']) is wrapped in parentheses to create a group for whichever quote was found at the start; this is then used in the end lookahead (?=\1) to make sure it only stops when it finds the corresponding quote.
The only other complication is that because the lookahead doesn't actually consume the end quote, it will be found again by the starting lookbehind which causes text between ending and starting quotes on the same line to be matched. Putting a word boundary on the opening quote (["']\b) helps with this, though ideally I'd like to move past the lookahead but I don't think that is possible. The bit allowing escaped characters in the middle I've taken directly from Adam's answer.
The RegEx of the accepted answer returns the values including their surrounding quotation marks: "Foo Bar" and "Another Value" as matches.
Here are RegExes which return only the values between quotation marks (as the questioner was asking for):
Double quotes only (use value of capture group #1):
"(.*?[^\\])"
Single quotes only (use value of capture group #1):
'(.*?[^\\])'
Both (use value of capture group #2):
(["'])(.*?[^\\])\1
All support escaped and nested quotes.
I liked Eugen Mihailescu's solution to match the content between quotes whilst allowing to escape quotes. However, I discovered some problems with escaping and came up with the following regex to fix them:
(['"])(?:(?!\1|\\).|\\.)*\1
It does the trick and is still pretty simple and easy to maintain.
Demo (with some more test-cases; feel free to use it and expand on it).
PS: If you just want the content between quotes in the full match ($0), and are not afraid of the performance penalty use:
(?<=(['"])\b)(?:(?!\1|\\).|\\.)*(?=\1)
Unfortunately, without the quotes as anchors, I had to add a boundary \b which does not play well with spaces and non-word boundary characters after the starting quote.
Alternatively, modify the initial version by simply adding a group and extract the string from $2:
(['"])((?:(?!\1|\\).|\\.)*)\1
PPS: If your focus is solely on efficiency, go with Casimir et Hippolyte's solution; it's a good one.
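The grouped variant can be exercised in Python as follows. This is a sketch assuming Python's re module; the helper name quoted_contents and the sample string are hypothetical.

```python
import re

# Group 1 is the opening quote, group 2 the content between the quotes.
QUOTED = re.compile(r"""(['"])((?:(?!\1|\\).|\\.)*)\1""")

def quoted_contents(text: str) -> list:
    return [m.group(2) for m in QUOTED.finditer(text)]

sample = r'''x = "a \"b\" c" + 'it\'s' end'''
for content in quoted_contents(sample):
    print(content)
# a \"b\" c
# it\'s
```

The escape sequences are returned as written; unescaping them is a separate step.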
A very late answer, but I'd still like to contribute:
(\"[\w\s]+\")
http://regex101.com/r/cB0kB8/1
The pattern (["'])(?:(?=(\\?))\2.)*?\1 above does the job, but I am concerned about its performance (it's not bad but could be better). Mine, below, is ~20% faster.
The pattern "(.*?)" is just incomplete. My advice for everyone reading this is just DON'T USE IT!!!
For instance it cannot capture many strings (if needed I can provide an exhaustive test-case) like the one below:
$string = 'How are you? I\'m fine, thank you';
The rest of them are just as "good" as the one above.
If you really care both about performance and precision then start with the one below:
/(['"])((\\\1|.)*?)\1/gm
In my tests it covered every string I met but if you find something that doesn't work I would gladly update it for you.
Check my pattern in an online regex tester.
This version
accounts for escaped quotes
controls backtracking
/(["'])((?:(?!\1)[^\\]|(?:\\\\)*\\[^\\])*)\1/
MORE ANSWERS! Here is the solution I used:
\"([^\"]*?icon[^\"]*?)\"
TL;DR:
Replace the word icon with what you're looking for in said quotes and voilà!
The way this works is that it looks for the keyword and doesn't care what else is between the quotes.
EG:
id="fb-icon"
id="icon-close"
id="large-icon-close"
The regex looks for a quote mark ",
then it looks for any possible group of characters that is not "
until it finds icon,
then any possible group of characters that is not ",
and finally it looks for a closing ".
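The steps above can be tried out with a small Python sketch. The helper name quoted_containing is hypothetical, and re.escape is added as an assumption so that a keyword containing regex metacharacters is treated literally.

```python
import re

def quoted_containing(keyword: str, text: str) -> list:
    # quote, non-quote characters, the keyword, non-quote characters, quote
    pattern = r'"([^"]*?' + re.escape(keyword) + r'[^"]*?)"'
    return re.findall(pattern, text)

html = 'id="fb-icon" class="btn" id="icon-close" id="large-icon-close"'
print(quoted_containing("icon", html))
# ['fb-icon', 'icon-close', 'large-icon-close']
```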
I liked Axeman's more expansive version, but had some trouble with it. For example, it didn't match
foo "string \\ string" bar
or
foo "string1" bar "string2"
correctly, so I tried to fix it:
# opening quote
(["'])
(
# repeat (non-greedy, so we don't span multiple strings)
(?:
# anything, except not the opening quote, and not
# a backslash, which are handled separately.
(?!\1)[^\\]
|
# consume any double backslash (unnecessary?)
(?:\\\\)*
|
# Allow backslash to escape characters
\\.
)*?
)
# same character as opening quote
\1
import re
string = "\" foo bar\" \"loloo\""
print(re.findall(r'"(.*?)"', string))
Just try this out, it works like a charm!
\ is the escape character.
My solution to this is below
(["']).*\1(?![^\s])
Demo link : https://regex101.com/r/jlhQhV/1
Explanation:
(["']) -> Matches either ' or " and stores it in capture group 1, available as the backreference \1.
.* -> Greedily matches as many characters as possible; the engine then backtracks until the rest of the pattern can match, so the match ends at the last suitable closing quote rather than the first.
\1 -> Matches the same quote character that the first capture group matched.
(?![^\s]) -> Negative lookahead ensuring that the closing quote is not followed by a non-whitespace character (i.e. it is followed by whitespace or the end of the string).
Unlike Adam's answer, I have a simple one that works:
(["'])(?:\\\1|.)*?\1
And just add parenthesis if you want to get content in quotes like this:
(["'])((?:\\\1|.)*?)\1
Then $1 matches quote char and $2 matches content string.
All the answers above are good... except that they do NOT support all Unicode characters in ECMAScript (JavaScript)!
If you are a Node user, you might want this modified version of the accepted answer, which supports all Unicode characters:
/(?<=((?<=[\s,.:;"']|^)["']))(?:(?=(\\?))\2.)*?(?=\1)/gmu
Try here.
echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'
This will result in: >Foo Bar<><>but this<
The results are wrapped in > and < for clarity. Using [^"]* as a stand-in for a non-greedy match, this sed command first throws away the junk before and after each "..." pair and then replaces the whole thing with the part between the quotes, surrounded by > and <.
From Greg H. I was able to create this regex to suit my needs.
I needed to match a specific value that was qualified by being inside quotes. It must be a full match; a partial match should not trigger a hit,
e.g. "test" could not match for "test2".
import re
# needle and haystack are assumed to be defined elsewhere
reg = r"""(['"])(%s)\1"""
if re.search(reg % needle, haystack, re.IGNORECASE):
    print("winning...")
If you're trying to find strings that only have a certain suffix, such as dot syntax, you can try this:
\"([^\"]*?[^\"]*?)\".localized
Where .localized is the suffix.
Example:
print("this is something I need to return".localized + "so is this".localized + "but this is not")
It will capture "this is something I need to return".localized and "so is this".localized but not "but this is not".
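The example above can be checked with a short Python sketch. Two details are assumptions of this sketch rather than part of the answer: the dot before localized is escaped so it only matches a literal dot, and the two adjacent character classes are folded into one.

```python
import re

LOCALIZED = re.compile(r'"([^"]*)"\.localized')

code = ('print("this is something I need to return".localized + '
        '"so is this".localized + "but this is not")')

print(LOCALIZED.findall(code))
# ['this is something I need to return', 'so is this']
```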
A supplementary answer for Microsoft VBA coders only: using the library Microsoft VBScript Regular Expressions 5.5 gives the following code:
Sub TestRegularExpression()
    '* Tools->References: Microsoft VBScript Regular Expressions 5.5
    Dim oRE As VBScript_RegExp_55.RegExp
    Set oRE = New VBScript_RegExp_55.RegExp
    oRE.Pattern = """([^""]*)"""
    oRE.Global = True

    Dim sTest As String
    sTest = """Foo Bar"" ""Another Value"" something else"
    Debug.Assert oRE.test(sTest)

    Dim oMatchCol As VBScript_RegExp_55.MatchCollection
    Set oMatchCol = oRE.Execute(sTest)
    Debug.Assert oMatchCol.Count = 2

    Dim oMatch As VBScript_RegExp_55.Match
    For Each oMatch In oMatchCol
        Debug.Print oMatch.SubMatches(0)
    Next oMatch
End Sub