How to trim characters of match in Vim? - regex

I want to trim the last five characters of a match in Vim. The search pattern is not a literal word; it is something like foo.*bar
Here I want to trim the last five characters of the above match.
I tried :g/foo.*bar/norm $5X
but this trims the last five characters of every line matching the pattern, not the last five characters of the match itself.

:global is not the right tool for the job, because it runs a command on each matching line rather than on the match itself (see :help :global).
With a substitution:
:%s/foo.*bar/\=submatch(0)[:-6]/g
See :help sublist and :help submatch().
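Here is a rough Python analogue of the same idea, using the `re` module, which the answer itself does not use. The replacement is computed from each match, as with Vim's `\=submatch(0)`, and slicing drops the last five characters. Vim's `[:-6]` has an inclusive end index while Python's `[:-5]` is exclusive, so both remove exactly five characters. The helper name `trim_last_five` is hypothetical.

```python
import re

def trim_last_five(text: str, pattern: str = r"foo.*bar") -> str:
    """Hypothetical helper: drop the last five characters of every match."""
    # Vim's submatch(0)[:-6] keeps indices 0..-6 inclusive;
    # Python's [:-5] stops before index -5, so the result is the same.
    return re.sub(pattern, lambda m: m.group(0)[:-5], text)

print(trim_last_five("xx foo123bar yy"))
print(trim_last_five("foo1bar foo2bar"))
```

Because `.*` is greedy, two foo...bar pieces on one line form a single match, so only the tail of the combined match is trimmed. That is why the second call trims only the end of the line.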

Related

Vim substitute - encase `\ref{eq:x}` with brackets

I have a LaTeX document with a bunch of strings of the form
Eq.~\ref{eq:x}
where x is in general a different string for each occurrence. I want to replace the above with
Eq.~(\ref{eq:x})
I can match some of the occurrences by searching with /\\ref{eq:.*\} but this doesn't work if there is something like
blah Eq.~\ref{eq:x} something something \cite{this}
Note that I don't want to replace \ref{eq: with a LaTeX macro that adds the parentheses internally.
* is a greedy quantifier that matches as many characters as possible. So, if there are several } on the line, .*} matches every character up to the last } on the line.
You should use a non-greedy quantifier instead:
/\\ref{eq:.\{-}\}
See :help /\{-.
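Here is a small Python sketch of the greedy versus non-greedy difference on the example line from the question. It assumes Python's `re`, where `.*?` plays the role of Vim's `.\{-}` and `\g<0>` in the replacement plays the role of Vim's `&`. Each match is wrapped in parentheses, which is the replacement the question asks for.

```python
import re

line = r"blah Eq.~\ref{eq:x} something something \cite{this}"

# Greedy: .* runs to the LAST } on the line, swallowing \cite{this}.
greedy = re.sub(r"\\ref\{eq:.*\}", r"(\g<0>)", line)

# Non-greedy: .*? stops at the FIRST } after eq: (Vim spells this .\{-}).
lazy = re.sub(r"\\ref\{eq:.*?\}", r"(\g<0>)", line)

print(greedy)  # blah Eq.~(\ref{eq:x} something something \cite{this})
print(lazy)    # blah Eq.~(\ref{eq:x}) something something \cite{this}
```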

Escaping single quote for a specific pattern using vim

Consider the following line, for example:
'{"place":"buddy's home"}'
I want to escape only the single quote in buddy's. The single quotes at the start and end of the line must stay intact, so the resulting line would look like this:
'{"place":"buddy\'s home"}'
There could be multiple lines, each with multiple such single quotes. I have to escape all of them except those at the start and end of the line.
I can find such quotes with the Vim regex :/.'. This pattern ensures that the single quote is surrounded by two characters and is therefore not at the start or end of the line. But I'm having trouble replacing y's with y\'s everywhere.
If the regex .'. is accurate enough then you can substitute all occurrences with:
:%s/.\zs'\ze./\\'/g
Instead of \zs and \ze you could also use capture groups \(...\), but I find this version slightly more readable.
See :h /\zs and :h /\ze for further information.
:%s/\(.\)'\(.\)/\1\\'\2/gc
:%s/ substitute over the whole buffer (see :help :range for the meaning of %)
\(.\) match any character and save it in capture group 1 (see :help /\()
' a literal '
\(.\) match any character and save it in capture group 2
/ replace with
\1 capture group 1 (see :help /\1)
\\' a literal \' (the backslash itself must be escaped)
\2 capture group 2
/gc replace every occurrence on the line (g) and ask for confirmation each time (c) (see :help :s_flags)
You can omit the c option if you are sure all replaces are legit.
As kongo2002 says in his answer, you could replace the capture groups with \zs and \ze:
\zs sets the start of the match; anything matched before it is required but not replaced
\ze sets the end of the match; anything matched after it is required but not replaced
See :help /\ze and :help /\zs.
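Here is a rough Python comparison of the two Vim answers, assuming Python's `re` as a stand-in. Lookarounds `(?<=.)` and `(?=.)` play the role of `\zs`/`\ze`. The second function mirrors the capture-group version. Both helper names are hypothetical. In Python, `re.sub` resumes scanning after each match. The group version therefore consumes the character after each quote and misses the second quote in `a'b'c`, while the lookaround version does not. Vim's `g` flag also continues after the previous match, so the same caveat probably applies to the `\(.\)'\(.\)` command there.

```python
import re

line = """'{"place":"buddy's home"}'"""

def escape_inner_quotes(s: str) -> str:
    # Counterpart of .\zs'\ze. : only the quote itself is replaced,
    # so the surrounding characters remain available to later matches.
    return re.sub(r"(?<=.)'(?=.)", r"\\'", s)

def escape_inner_quotes_groups(s: str) -> str:
    # Counterpart of \(.\)'\(.\) -> \1\\'\2 : the neighbours are consumed.
    return re.sub(r"(.)'(.)", r"\1\\'\2", s)

print(escape_inner_quotes(line))           # '{"place":"buddy\'s home"}'
print(escape_inner_quotes("a'b'c"))        # a\'b\'c
print(escape_inner_quotes_groups("a'b'c")) # a\'b'c
```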

regex to remove everything after the last comma in a string

I would like to write a Perl regex that removes everything after the last comma in a string. I know the substring after the last comma is a number or some other substring, so it contains no commas.
Example: some\string,/doesnt-really.metter,5.
I would like the regex to remove the last comma and the 5 so the output would be: some\string,/doesnt-really.metter
I am not allowed to use any additional modules, only a regex. So which regex should I use?
Another example:
string_with,,,,,_no_point,some_string => string_with,,,,,_no_point
If the comma is always followed by one or more digits, you can use: s/,\d+$//. More generally, use s/,[^,]*$// (match a comma followed by zero or more non-comma characters followed by end-of-string).
This regex captures everything before the last comma:
(.*),[^,]*$
perl -n -e 'chomp; s/(.*),[^,]*$/$1/; print "$_\n";' inputfile.txt
Run this command directly in a terminal. The greedy (.*) captures all text before the last comma, and the substitution keeps only that captured text.
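The two Perl substitutions above can be checked quickly in Python. This sketch assumes Python's `re`, whose `$`, `[^,]` and greedy `.*` behave like Perl's for these single-line inputs. The function names are hypothetical. It runs both variants on the two examples from the question.

```python
import re

def drop_after_last_comma(s: str) -> str:
    # Same as Perl s/,[^,]*$//: a comma followed only by non-commas
    # up to the end of the string can only be the LAST comma.
    return re.sub(r",[^,]*$", "", s)

def keep_before_last_comma(s: str) -> str:
    # Same as Perl s/(.*),[^,]*$/$1/: greedy (.*) backtracks to the
    # last comma, and only the captured prefix is kept.
    return re.sub(r"(.*),[^,]*$", r"\1", s)

for s in [r"some\string,/doesnt-really.metter,5",
          "string_with,,,,,_no_point,some_string"]:
    print(drop_after_last_comma(s), keep_before_last_comma(s))
```

A string without any comma is returned unchanged by both, because the pattern never matches.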

Regex to change all past a certain pattern to Uppercase

I have an XML file with a value like
JOBNAME="JBDSR14353_Some_other_Descriptor"
I am looking for an expression that will go through the file and change all of the characters inside the quotes to uppercase. Is there a regex that will search for JOBNAME="Anything within the quotes" and change it to uppercase? Or a command that will find JOBNAME= and change everything on that line to uppercase? I know I can search for JOBNAME=, use VU in Vim to uppercase the line, store that in a macro and run it, but I was wondering if there was a way to do this with a regex.
Here's an alternative with :substitute, as you originally intended. This works better than Zach's solution with gU_ when there is other text on the line:
:%s/JOBNAME="[^"]\+"/\U&/g
"[^"]\+" matches the quoted text; because only non-quote characters are allowed inside, the match stops at the first closing quote, which handles several quoted values on one line
\U turns the remainder of the replacement uppercase
for simplicity, the entire match (&) is uppercased here, but one could have also used capture groups (\(...\)), or match limiting with \zs
You can use the :g command which executes a command on lines that match a pattern:
:g/JOBNAME/norm! gU_
This executes gU_, which uppercases the whole line, on every line that matches JOBNAME.
If there is other text on the same line that you don't want to capitalize, here is a solution for only the text in quotes:
:g/JOBNAME/norm! f"gU;
f" goes to the next quote. gU capitalizes with a motion. The motion used is ; which searches for the next " (repeats the last f command).
To do this with substitution you can use the \U atom which makes everything after it uppercase.
:%s/JOBNAME="\zs.*\ze"/\U&
\zs and \ze mark the start and end of the match and & is the whole match. This means that only the part between quotes is replaced.
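For comparison outside Vim, here is a Python sketch that assumes the `re` module. It has no `\U` in replacement strings, so a function uppercases each match instead. The pattern mirrors the `"[^"]\+"` version, so other attributes on the same line are left alone. The helper name `upper_jobname` is hypothetical.

```python
import re

def upper_jobname(text: str) -> str:
    # "[^"]+" stops at the first closing quote, like Vim's "[^"]\+".
    # Python's re has no \U, so the replacement function uppercases the match.
    return re.sub(r'JOBNAME="[^"]+"', lambda m: m.group(0).upper(), text)

print(upper_jobname('JOBNAME="JBDSR14353_Some_other_Descriptor"'))
print(upper_jobname('<job JOBNAME="abc" owner="me"/>'))
```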

interpreting regular expression in perl

I am trying to reverse engineer a Perl script. One of the lines contains a matching operator that reads:
$line =~ /^\s*^>/
The input is just FASTA sequences with header information. The script is looking for a particular pattern in the header, I believe.
Here is an example of the files the script is applied to:
>mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131 5'pad=0 3'pad=0 strand=+
repeatMasking=none
ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC
CCTGCGG
>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419 5'pad=0 3'pad=0 strand=+
repeatMasking=none
GACCCTGCTGGGATTTTTGAGCTGGTGGAAGTGGTTGGAAATGGCACCTA
TGGACAAGTCTATAAG
This is a match operator asking whether the line, from its beginning, contains more than zero whitespace characters, but after that I lose track of its meaning.
This is how I have parsed the regex so far:
from the beginning [ /^... ], contains whitespace [ ...\s... ] more than zero times [ ...*... ].
Using RegexBuddy (or, as r3mus said, regex101.com, which is free):
Assert position at the beginning of the string «^»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the beginning of the string «^»
Match the character “>” literally «>»
EDIT: Birei's answer is probably more correct if the regex in question is actually wrong.
You have to get rid of the second ^ character. It is a metacharacter that means the beginning of the string (or of any line, with the /m modifier), and that is already achieved by the first one.
The character > still matches at the beginning of the line without the second ^, because the leading whitespace is optional (* quantifier). So, use:
$line =~ /^\s*>/
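Here is a quick Python check of the original and corrected patterns. It assumes Python's `re`, where `^` without `re.MULTILINE` anchors only at the start of the string, as Perl's does without `/m`. The indented header line is a made-up input. It shows that the original pattern only accepts a `>` in the very first column, while the fix also allows leading whitespace.

```python
import re

original = re.compile(r"^\s*^>")  # as found in the script
fixed = re.compile(r"^\s*>")      # with the second ^ removed

samples = [
    ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131",
    "   >hypothetical indented header",
    "ATGGCGAACGACTCTCCCGCG",
]
for line in samples:
    print(bool(original.search(line)), bool(fixed.search(line)), line[:20])
```

The second ^ forces \s* to match the empty string. Without a multiline flag, the original pattern is therefore equivalent to /^>/.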
It is much easier to reverse-engineer a Perl script with a debugger:
"perl -d script.pl", or, if you have ddd on Linux, "ddd script.pl &".
With a multiline regex (the /m modifier), this pattern matches any blank or whitespace-only lines followed by the > that begins the next FASTA header.
http://www.rexfiddle.net/c6locQg
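Here is a Python sketch of the multiline reading. It assumes Python's `re.MULTILINE` behaves like Perl's `/m` here: `^` also matches right after each newline, and `\s*` can itself consume newlines. The input text is hypothetical, modelled on the FASTA sample above with a whitespace-only line inserted.

```python
import re

# Hypothetical input: end of one record, a whitespace-only line, next header.
text = "CCTGCGG\n   \n>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419"

m = re.search(r"^\s*^>", text, re.MULTILINE)
print(repr(m.group(0)))  # '   \n>' : the blank line plus the header's '>'

# Without the multiline flag, ^ only matches at position 0, so nothing is found.
print(re.search(r"^\s*^>", text))  # None
```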