I'm cleaning up some CSS for our company and in many of the classes there are missing semicolons. I've tried a number of different ways to make this happen in Vim, but so far I haven't found a viable solution. Below is an example of one class that seems to be part of a theme in this file.
What I'm thinking... if there's a \w \s \{ || \w\; then don't return true. Otherwise if there's a \:\s\w and no ; at the end of a line then return true.
.ribbon_table,
.tabbed_interface_section_table,
table#monthview_table,
table#weekview_table_table {
border-collapse: collapse
}
.btn.active.focus,
.btn.active:focus,
.btn.focus,
.btn:active.focus,
.btn:active:focus,
.btn:focus,
a:focus,
button:focus,
form:focus,
input:focus,
select:focus,
textarea:focus {
outline: 0;
}
blockquote {
font-size: 14px;
border: 0;
}
.caret {
border-top-style: solid
}
I would do something like :g/\v^\s+\S+:.+[^;]$/norm A;.
In case you're unfamiliar with these commands, here's the vim documentation for :g (:global) and :norm (:normal).
:g :global E147 E148
:[range]g[lobal]/{pattern}/[cmd]
Execute the Ex command [cmd] (default ":p") on the
lines within [range] where {pattern} matches.
:norm[al][!] {commands} :norm :normal
Execute Normal mode commands {commands}. This makes
it possible to execute Normal mode commands typed on
the command-line. {commands} are executed like they
are typed. For undo all commands are undone together.
Execution stops when an error is encountered.
If the [!] is given, mappings will not be used.
Without it, when this command is called from a
non-remappable mapping (:noremap), the argument can
be mapped anyway.
{commands} should be a complete command. If
{commands} does not finish a command, the last one
will be aborted as if <Esc> or <C-C> was typed.
This implies that an insert command must be completed
(to start Insert mode, see :startinsert). A ":"
command must be completed as well. And you can't use
"Q" or "gQ" to start Ex mode.
The display is not updated while ":normal" is busy.
{commands} cannot start with a space. Put a count of
1 (one) before it, "1 " is one space.
The 'insertmode' option is ignored for {commands}.
This command cannot be followed by another command,
since any '|' is considered part of the command.
This command can be used recursively, but the depth is
limited by 'maxmapdepth'.
An alternative is to use :execute, which uses an
expression as argument. This allows the use of
printable characters to represent special characters.
Example:
:exe "normal \<c-w>\<c-w>"
On the example you gave, :%s/\(\w*:\s*\w*\)$/\1; works fine.
Since it is not very readable, let's add some details:
Look for a word \w*
Followed by a :
Followed by between 0 and n spaces \s*
Followed by a word \w*
Followed by the end of a line $
You save this whole expression (let's call it expr) using parentheses \(expr\), and reuse it in the replacement using \1.
Thus, you can add the missing ; using \1;.
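For anyone who would rather run this outside Vim, here is a minimal Python sketch of the same idea. The function name add_missing_semicolons and the exact pattern are my own assumptions, not part of the answer above. Unlike \w*, the pattern also allows hyphens in property names and multi-word values, and it skips lines ending in , or { so that selectors such as a:focus, are left alone.

```python
import re

# Assumed pattern (not from the answer above): a property name, a colon,
# and a value whose last character is not one of ; { } , or whitespace.
DECLARATION = re.compile(r"^(\s*[\w-]+\s*:\s*[^;{}]*[^;{},\s])\s*$")


def add_missing_semicolons(css: str) -> str:
    """Append ';' to declaration lines that lack one, line by line."""
    fixed = [DECLARATION.sub(r"\1;", line) for line in css.splitlines()]
    return "\n".join(fixed)


sample = """\
a:focus,
textarea:focus {
outline: 0;
}
.caret {
border-top-style: solid
}"""

print(add_missing_semicolons(sample))
```

One known limitation of this sketch: a selector such as button:focus that sits alone on a line, with its { on the next line, would be mistaken for a declaration.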
If any of you are familiar with mainframe JCL.
I'm trying to match the last line of the job card.
Basically the first line that starts with // and ends without a comma.
In the example I need the 3rd line or up to the 3rd line matched.
I'm using Ansible's lineinfile to dynamically insert a route card after the job card.
For example:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
So far I have this, which matches the leading // and anything after it, but I can't figure out the last part:
^(\Q//\E(.)*)
Parsing JCL in the general case is hard. As noted in the comments, the rules are full of caveats.
I have an ANTLR4 grammar for JCL, it's MIT licensed. Possibly of use. It reflects the beauty of JCL.
To match the whole job card (in this case 3 lines):
(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$
See live demo.
Breaking this down:
(?sm)
s enables the DOTALL flag (meaning . matches new lines too)
m enables the MULTILINE flag (meaning ^ and $ match at the start and end of each line)
\A means start of input (so it only matches at the very start)
.*? means anything, but as little as possible
//[^*] means // followed by any character except * (so the line is not a //* comment line)
((?!\/\*)[^\n])* means non-new lines, except the sequence /* (so don't match when a comment is put in line)
[^,] not a comma
$ end of line
In English: "match from the start until there's a non-comma at the end of a line that is not a comment, or does not end with a comment"
You would then replace with $0 (group zero is the entire match) followed by your injected content:
$0\\n*ROUTE statement
You can use a negative lookbehind for this: (?<!,).
But you'll also need to insert after the first match (firstmatch: true) and use backreferences (backrefs: true).
Given the task:
- lineinfile:
    path: file.jcl
    regexp: '^(\/\/.*)(?<!,)$'
    line: "\\1\\n//*ROUTE statement"
    firstmatch: true
    backrefs: true
You would end up, from your example, with:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID
//*ROUTE statement
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
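To see the matching logic outside Ansible, here is a minimal Python sketch. It is not how lineinfile is implemented; the function name insert_after_job_card is hypothetical, and the input is assumed to be a list of lines without trailing newlines or trailing blanks. It shares the limitation of the regexp above: inline comments and //* comment lines inside the job card are not handled.

```python
import re

# Same idea as the lineinfile regexp above: a line that starts with //
# and does not end with a comma.
JOB_CARD_END = re.compile(r"^//.*(?<!,)$")


def insert_after_job_card(lines, new_line):
    """Return a copy of lines with new_line added after the first match."""
    result = []
    inserted = False
    for line in lines:
        result.append(line)
        if not inserted and JOB_CARD_END.match(line):
            result.append(new_line)
            inserted = True
    return result


jcl = [
    "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,",
    "// REGION=0M,MSGCLASS=R,TIME=5,",
    "// NOTIFY=&SYSUID",
    "//STEPNAME EXEC PGM=BPXBATCH",
]
for line in insert_after_job_card(jcl, "//*ROUTE statement"):
    print(line)
```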
For the general case this is tougher than you think because of comments allowed within the scope of the JOB card.
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
The strings you show:
<--- start of job card
LINES=(999999,WARNING),
<--- end of job card
are all valid as comments in JCL because they follow a space.
You can even have whole comment lines within the JOB card. For example:
//name JOB (accounting info),'data capture ___',
//* TYPRUN=SCAN,
// NOTIFY=&SYSUID,
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=(5,00),
// REGION=5M
So you're not necessarily looking for the first card that doesn't end in a comma unless you can restrict the JCL you're looking at.
Your JOB card starts with //name JOB and ends just before the next //name card. *** edit *** As was correctly pointed out, the JOB card could be followed by a card which does not require a name field, like // SET for example. See https://www.ibm.com/docs/en/zos/2.4.0?topic=statements-jcl-statement-fields *** end of edit ***
It starts with ^(\Q//\E)[A-Z0-9]+\s+\QJOB\E.+
and ends just before the next named card ^(\Q//\E)[A-Z0-9]+\s+
But I don't know regular expressions well enough to find the "just before" point to insert your new line. Hopefully someone else can add that.
I have the following lines:
source = "git::ssh://git#github.abc.com/test//bar"
source = "git::ssh://git#github.abc.com/test//foo?ref=tf12"
resource = "bar"
I want to update any line that contains both the words source and git by adding ?ref=tf12 to the end of the line, but inside the closing ". If the line already contains ?ref=tf12, it should be skipped:
source = "git::ssh://git#github.abc.com/test//bar?ref=tf12"
source = "git::ssh://git#github.abc.com/test//foo?ref=tf12"
resource = "bar"
I have the following sed expression, but its output is wrong:
sed 's#source.*git.*//.*#&?ref=tf12#' file.tf
source = "git::ssh://git#github.abc.com/test//bar"?ref=tf12
source = "git::ssh://git#github.abc.com/test//foo"?ref=tf12?ref=tf12
resource = "bar"
Using simple regular expressions for this is rather brittle; if at all possible, using a more robust configuration file parser would probably be a better idea. If that's not possible, you might want to tighten up the regular expressions to make sure you don't modify unrelated lines. But here is a really simple solution, at least as a starting point.
sed -e '/^ *source *= *"git/!b' -e '/?ref=tf12" *$/b' -e 's/" *$/?ref=tf12"/' file.tf
This consists of three commands. Remember that sed examines one line at a time.
/^ *source *= *"git/!b - if this line does not begin with source = "git (with optional spaces between the tokens), leave it alone. (! means "does not match" and b means "branch (to the end of this script)", i.e. skip this line and fetch the next one.)
/?ref=tf12" *$/b similarly says to leave alone lines which match this regex. In other words, if the line ends with ?ref=tf12" (with optional spaces after), don't modify it.
s/" *$/?ref=tf12"/ says to replace the final double quote (and any trailing spaces) with ?ref=tf12". This will only happen on lines which were not skipped by the two previous commands.
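The same three steps can be sketched in Python, which may be easier to extend if the rules grow. This is only an illustration of the logic, not a replacement for a real configuration parser; the function name add_ref and its ref parameter are my own assumptions.

```python
import re


def add_ref(line: str, ref: str = "?ref=tf12") -> str:
    """Append ref inside the closing quote of a git source line."""
    # Step 1: leave the line alone unless it starts with source = "git
    if not re.match(r'^ *source *= *"git', line):
        return line
    # Step 2: leave the line alone if it already ends with the ref
    if re.search(re.escape(ref) + r'" *$', line):
        return line
    # Step 3: put the ref just before the final double quote
    return re.sub(r'" *$', lambda m: ref + '"', line)


lines = [
    'source = "git::ssh://git#github.abc.com/test//bar"',
    'source = "git::ssh://git#github.abc.com/test//foo?ref=tf12"',
    'resource = "bar"',
]
for line in lines:
    print(add_ref(line))
```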
sed '/?ref=tf12"/!s#\(source.*git.*//.*\)"#\1?ref=tf12"#' file.tf
/?ref=tf12"/! Only run the substitute command if this pattern (?ref=tf12") doesn't match.
\(...\)", \1 Instead of appending to the entire line using &, only match the line up to the last ". Use parentheses to capture everything before that " into a group, which I can then refer to with \1 in the replacement. (There we re-add the ", so that it doesn't get lost.)
The overall goal here is to remove a block of text starting with a particular string and ending with a positive lookahead. From the testing I've done, it seems that newlines are causing the problem, but I'm not sure what exactly is going on or the best way to fix it.
More context: I want to remove taxa from a .fasta file, including the taxon name and header information and the associated sequence. (fasta format begins with a header >locusname-locusnumber-species_name |locusname-locusnumber \n). Missing data in the sequence is coded as "-". Eventually I would like to do this for several species_names and do so for each of several thousand files in a directory.
I presumed this would be a simple task to do as a perl one-liner in bash (Ubuntu 18.04.2).
As an example, from the excerpt below I would like to remove the entire sequence of Pseudomyrmex seminole D1367, i.e. the string that starts with >uce-483_Pseudomyrmex_seminole_D1367 |uce-483 and ends with the newline before >uce-483_Pseudomyrmex_seminole_D1435.
For this, I have: perl -pe 's/>(.)+(Pseudomyrmex_seminole_D1367)[\s\S]+(?=>)//' infile.fasta > outfile.fasta
or equivalently perl -pe 's/>(.)+(Pseudomyrmex_seminole_D1367(.)+(?=>)//s' infile.fasta > outfile.fasta
Both of these seem to have no effect at all (i.e. diff infile.fasta outfile.fasta is empty.) If I remove the positive lookahead, it works correctly but only up to the first newline.
Here's an excerpt from the .fasta for context and testing:
>uce-483_Pseudomyrmex_seminole_D1366 |uce-483
------------------------------------------------------------
---------------------------------------------------tgtaaacgt
tataatacatgcgtatgaaaaaaaaaagtgaacacccggtacgtacccgtgctgaaacgt
tcagatttacatccatttgtagtagcattttcgctagttttttcaagagcaaaaaggaca
cattcaaaactgaatatacatgtcacagatgtttgtttgtgtgcaggtacctgtaatttt
gcaaacatatacctatatatgtgtgtcgcatatatatcatgtagtagatttccatgttat
gcaacatcttctcacaatgacaatcggtcgtttccttcactccgaaatgttcatgcgaac
agttaatctatatcccaagcagcgatgtaatgttatgcggcgcgcaagtctcattagact
tgtaaaccgtccgagtttcgacttaccata----tgtgtgtgtgtgcgcgcgtatgtgca
cgtac------acacgtttgtttatacatttgtctatacatttgcgtgtgaacgcgggat
gaacagagatttgcgcacacatagacatgagaaacgtcacttgtcgatgtagatactaat
tgtggaaaatacatattcctcttcagatacacgggaatgttgaattattttcactcgctc
cacgcgcgagtgttcgctccttttacgcacaacgagtccttctgctgcagc--gagatag
aaaatatttttgcgcggtaatcgtaaacgtatgagtgcctttcgacgtgaattctcttat
ggcagttctcacggtgtaaattataatcgaattaacattgcgagtgtgatctcaatataa
ttatagcgtctaagaacaaacacgtaacatgcacacacacacacacacac----------
---
>uce-483_Pseudomyrmex_seminole_D1367 |uce-483
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
--ttcaaaactgaatatacatgtcacagatgtttgtttgtgtgcaggtacctgtaatttt
gcaaacatatg---atatatatgtgtcgcatatatatcatgtagtagatttccatgttat
gcaacatcttctcacaatgacaatcggtcgtttccttcactctgaaatgttcatgcgaac
agttaatctatatcccaagcagcgatgtaatgttatgcggcgcgcaagtctcattagact
tgtaaaccgtccgagtttcgacttaccata--tgtgtgtgtgtgtgtgcgcgtatgtgca
cgtacgcgcgcacacgtttgtttatacatttgtctatacatttgcgtgtgaacgcgggat
gaacagagatttgcgcacacatagacatgagaaacgtcacttgtcgatg-----------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
---
>uce-483_Pseudomyrmex_seminole_D1435 |uce-483
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-------tacatccatttgtagtagcattttcgctagttttttcaagagcaaaaaggaca
cattcaaaactgaatatacatgtcacagatgtttgtttgtgtgcaggtacctgtaatttt
gcaaacatatacctatatatgtgtgtcgcatatatatcatgtagtagatttccatgttat
gcaacatcttctcacaatgacaatcggtcgtttccttcactccgaaatgttcatgcgaac
agttaatctatatcccaagcagcgatgtaatgttatgcggcgcgcaagtctcattagact
tgtaaaccgtccgagtttcgacttaccata--tgtgtgtgtgtgtgtgcgcgtatgtgca
cgtac------acacgtttgtttatacatttgtctatacatttgcgtgtgaacgcgggat
gaacagagatttgcgcacacatagacatgagaaacgtcacttgtcgatgtagatactaat
tgtggaaaatacatattcctcttcagatacacgggaa-----------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
---
With -p (or -n) the one-liner reads a line at a time, so it simply can't match multiline patterns. One solution is to "slurp" the whole file in, if it isn't too large (see the end for a line-by-line solution):
perl -0777 -pe'...' in > out
See Command Switches in perlrun.
Then, the code shown in the question has an unbalanced parenthesis and doesn't compile. Further, there is no reason to capture those .s, so drop the parentheses around them. Next, the pattern
s/>.+Pseudomyrmex_seminole_D1367...//;
matches everything from the very first > to the name of interest, so all preceding sequences are matched and removed as well. Instead, match >[^>]+...D1367 for example, so everything that isn't > after a >, to that phrase.
Finally, the last .+(?=>) will match everything to the very last > and thus the regex will remove all following sequences, not what you want according to the description. Instead, limit it to match to the first following >, either by making it "non-greedy" with .+?(?=>) or, more simply, with [^>]+.
All corrected
perl -0777 -pe's/>[^>]+?Pseudomyrmex_seminole_D1367[^>]+//' in > out
Note that there is no need for /s modifier now, since its purpose is to make . match a newline and here we don't need that since the [^>] does match newlines as well (anything other than >). The quantifier is +? to (hopefully) prevent backtracking each whole sequence that doesn't match.
Or, with your original use of lookahead
perl -0777 -pe's/>[^>]+?Pseudomyrmex_seminole_D1367.+?(?=>)//s' in > out
These work as expected with your sample, as well as with an extended example I made up with further sequences (>...) added.
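For comparison, the corrected pattern carries over to Python's re module almost unchanged. This is a minimal sketch on a shortened, made-up excerpt (the sequence data is abbreviated), and the function name remove_record is hypothetical.

```python
import re


def remove_record(text: str, name: str) -> str:
    """Remove the one record whose header contains name (whole-file approach)."""
    # [^>] also matches newlines, so no DOTALL flag is needed.
    record = re.compile(r">[^>]+?" + re.escape(name) + r"[^>]+")
    return record.sub("", text)


fasta = (
    ">uce-483_Pseudomyrmex_seminole_D1366 |uce-483\n"
    "--tgtaaacgt\n"
    ">uce-483_Pseudomyrmex_seminole_D1367 |uce-483\n"
    "--ttcaaaact\n"
    ">uce-483_Pseudomyrmex_seminole_D1435 |uce-483\n"
    "--tacatccat\n"
)

print(remove_record(fasta, "Pseudomyrmex_seminole_D1367"), end="")
```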
For reference, and since a fasta file can be too big to slurp into a string, here it is line by line.
Once you see the >... line of interest set a flag; print a line if that flag isn't set (and if we aren't on that very line). Once you reach the next > clear the flag (print that line, too).
perl -ne'
if (/^>.+?Pseudomyrmex_seminole_D1367/) { $f = 1 }
elsif (not $f) { print }
elsif (/^>/) { $f = 0; print }
' in > out
I suspect that this may also perform considerably better on very large files.
The regex in the first solution has to scan each sequence whole in order to find that it is not the one of interest; it is only once it hits the next > that it can decide that the sequence doesn't match (and with no backtracking, hopefully, since +? would've stopped it had the right phrase been encountered).
Here the code mostly checks the first character and a flag.
So it's an incomparably lesser workload here -- but here the regex engine is started up on every line, and that is expensive. I can't tell with confidence how they stack against each other without trying.
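The flag-based approach can also be sketched in Python, where the per-line work is a plain string test rather than a regex. The generator name drop_record is hypothetical, and the sample lines are abbreviated; in practice the lines would come from an open file, so nothing is slurped into memory.

```python
def drop_record(lines, name):
    """Yield every line except those of the record whose header has name."""
    skipping = False
    for line in lines:
        if line.startswith(">"):
            # A header line decides whether the following lines are kept.
            skipping = name in line
        if not skipping:
            yield line


sample = [
    ">uce-483_Pseudomyrmex_seminole_D1366 |uce-483\n",
    "--tgtaaacgt\n",
    ">uce-483_Pseudomyrmex_seminole_D1367 |uce-483\n",
    "--ttcaaaact\n",
    ">uce-483_Pseudomyrmex_seminole_D1435 |uce-483\n",
    "--tacatccat\n",
]
for line in drop_record(sample, "Pseudomyrmex_seminole_D1367"):
    print(line, end="")
```

Because the flag is recomputed on every header, the same function would also drop several records that share the name, which may or may not be what you want.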
You can also use > as the input record separator. This way you avoid slurping the whole file, and since the main loop reads the file block by block, you only have to test whether each block is the target and skip printing it if so (without having to describe the whole block in a pattern):
perl -ln076e's/\n$//;print ">$_" if $_ && !/Pseudomyrmex_seminole_D1367/' file
The l switch sets the output record separator to the input record separator (a newline by default).
The 0 switch sets the input record separator to > (76 in octal).
I would like to replace for instance every occurrence of "foo{...}" with anything except newlines inside the bracket (there may be spaces, other brackets opened AND closed, etc) NOT followed by "bar".
For instance, the "foo{{ }}" in "foo{{ }}, bar" would match but not "foo{hello{}}bar".
I've tried /foo{.*}\(bar\)\@! and /foo{.\{-}}\(bar\)\@! but the first one would match "foo{}bar{}" and the second would match "foo{{}}bar" (only the "foo{{}" part).
This regex:
foo{.*}\([}]*bar\)\@!
matches:
foo{{ }}
foo{{ }}, bar
but not:
foo{hello{}}bar
It is impossible to correctly match an arbitrary level of nested
parentheses using regular expressions. However, it is possible to
construct a regex to match supporting a limited amount of nesting (I
think this answer did not attempt to do so). – Ben
This does ...
for up to one level of inner braces:
/foo{[^{}]*\({[^{}]*}[^{}]*\)*}\(bar\)\@!
for up to two levels of inner braces:
/foo{[^{}]*\({[^{}]*\({[^{}]*}[^{}]*\)*}[^{}]*\)*}\(bar\)\@!
for up to three levels of inner braces:
/foo{[^{}]*\({[^{}]*\({[^{}]*\({[^{}]*}[^{}]*\)*}[^{}]*\)*}[^{}]*\)*}\(bar\)\@!
...
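Since each level is built from the previous one in the same way, the pattern can be generated rather than typed out. Here is a minimal Python sketch of that construction, using Python's regex syntax rather than Vim's; the function names balanced_body and foo_not_followed_by_bar are hypothetical, and I have additionally excluded newlines, as the question asks.

```python
import re


def balanced_body(depth: int) -> str:
    """Regex for brace contents with at most depth levels of inner braces."""
    plain = r"[^{}\n]*"  # anything except braces and newlines
    body = plain
    for _ in range(depth):
        # Wrap the previous level in one more pair of braces.
        body = plain + r"(?:\{" + body + r"\}" + plain + r")*"
    return body


def foo_not_followed_by_bar(depth: int) -> re.Pattern:
    """foo{...} with limited nesting, not immediately followed by bar."""
    return re.compile(r"foo\{" + balanced_body(depth) + r"\}(?!bar)")


pattern = foo_not_followed_by_bar(1)
print(pattern.search("foo{{ }}, bar"))
print(pattern.search("foo{hello{}}bar"))
```

As with the hand-written versions, nesting deeper than the chosen depth simply fails to match.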
Depending on exactly what replacement you want to perform, you might be able to do it with macros.
For example: Given this text
line 1 -- -- -- -- array[a][b[1]]
line 2 -- array[c][d]
line 3 -- -- -- -- -- -- -- array[e[0]][f] + array[g[0]][h[0]]
replace array[A][B] with get(A, B).
To do that:
Position the cursor at the beginning of the text
/array<cr>
qq to begin recording a macro
Do something to change the data independent of the content inside (use % to go to matching bracket, and some register/mark/plugin to delete around the bracket). For example cwget(<esc>ldi[vhpa, <esc>ldi[vhpa)<esc>n -- but macros are usually unreadable.
n to go to next match, q to stop recording
@q repeatedly (@@ can be used from the second time on)
This is probably not very convenient because it's easy to make a mistake (press I, <home>, A for example) and you have to redo the macro from the beginning, but it works.
Alternatively, you can do something similar to eregex.vim plugin to extend vim's regex format to support this (so you don't have to retype the huge regex every time).
Proof of concept:
"does not handle different magic levels
"does not handle '\/' or different characters for substitution ('s#a#b#')
"does not handle brackets inside strings
" usage: `:M/pattern, use \zm for matching block/replacement/flags`
command -range -nargs=* M :call SubstituteWithMatching(<q-args>, <line1>, <line2>)
":M/ inspired from eregex.vim
function SubstituteWithMatching(command, line1, line2)
    let EscapeRegex={pattern->escape(pattern, '[]\')}
    let openbracket ='([{'
    let closebracket=')]}'
    let nonbracketR='[^'.EscapeRegex(openbracket.closebracket).']'
    let nonbracketsR=nonbracketR.'*'
    let LiftLevel={pattern->
                \nonbracketsR
                \.'\%('
                \.'['.EscapeRegex(openbracket).']'
                \.pattern
                \.'['.EscapeRegex(closebracket).']'
                \.nonbracketsR
                \.'\)*'
                \}
    let matchingR=LiftLevel(LiftLevel(LiftLevel(nonbracketsR)))
    if v:false " optional test suite
        echo "return 0:"
        echo match('abc', '^'.matchingR.'$')
        echo match('abc(ab)de', '^'.matchingR.'$')
        echo match('abc(ab)d(e)f', '^'.matchingR.'$')
        echo match('abc(a[x]b)d(e)f', '^'.matchingR.'$')
        echo match('abc(a]b', '^'.matchingR.'$')
        "current flaw (not a problem if there's only one type of bracket, or if
        "the code is well-formed)
        echo "return -1:"
        echo match('abc(a(b', '^'.matchingR.'$')
        echo match('abc)a(b', '^'.matchingR.'$')
    endif
    let [pattern, replacement, flags]=split(a:command, "/")
    let pattern=substitute(pattern, '\\zm', EscapeRegex(matchingR), 'g')
    execute a:line1.','.a:line2.'s/'.pattern.'/'.replacement.'/'.flags
endfunction
After this, :'<,'>M/array\[\(\zm\)\]\[\(\zm\)\]/get(\1, \2)/g can be used to do the same task above (after selecting the text in visual mode)
I have a log file to analyze in which some lines contain a repetition of their own beginning, though not a complete repetition. For example:
Alex is here and Alex is here and we went out
We bothWe both went out
I want to remove the first occurrence and get
Alex is here and we went out
We both went out
Please share a regex to do this in Vim on Windows.
I don't recommend trying to use regex magic to solve this problem. Just write an external filter and use that.
Here's an external filter written in Python. You can use this to pre-process the log file, like so:
python prefix_chop.py logfile.txt > chopped.txt
But it also works by standard input:
cat logfile.txt | prefix_chop.py > chopped.txt
This means you can use it in vim with the ! command. Try these commands: go to line 1, then filter everything from the current line through the last line through the external program prefix_chop.py:
1G
!Gprefix_chop.py<Enter>
Or you can do it from ex mode:
:1,$!prefix_chop.py<Enter>
Here's the program:
#!/usr/bin/python
import sys

infile = sys.stdin if len(sys.argv) < 2 else open(sys.argv[1])

def repeated_prefix_chop(line):
    """
    Check line for a repeated prefix string. If one is found,
    return the line with that string removed, else return the
    line unchanged.
    """
    # Repeated string cannot be more than half of the line.
    # So, start looking at mid-point of the line.
    i = len(line) // 2 + 1
    while True:
        # Look for longest prefix that is found in the string after pos 0.
        # The prefix starts at pos 0 and always matches itself, of course.
        pos = line.rfind(line[:i])
        if pos > 0:
            return line[pos:]
        i -= 1
        # Stop testing before we hit a length-1 prefix, in case a line
        # happens to start with a word like "oops" or a number like "77".
        if i < 2:
            return line

for line in infile:
    sys.stdout.write(repeated_prefix_chop(line))
I put a #! comment on the first line, so this will work as a stand-alone program on Linux, Mac OS X, or on Windows if you are using Cygwin. If you are just using Windows without Cygwin, you might need to make a batch file to run this, or just type the whole command python prefix_chop.py. If you make a macro to run this you don't have to do the typing yourself.
EDIT: This program is pretty simple. Maybe it could be done in "vimscript" and run purely inside vim. But the external filter program can be used outside of vim... you can set things up so that the log file is run through the filter once per day every day, if you like.
Regex: \b(.*)\1\b
Replace with: \1 or $1
If you want to deal with more than two repeating sentences you can try this
\b(.+?\b)\1+\b
(The \b inside the group avoids matching individual repeated characters within a word, like xxx.)
NOTE: In Vim, use \< and \> instead of \b.
You could do it by matching as much as possible at the beginning of the line and then using a backreference to match the repeated bit.
For example, this command solves the problem you describe:
:%s/^\(.*\)\(\1.*\)/\2
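The same backreference trick works in Python's re module, which may be handy for checking the behaviour on a few sample lines before running it on a whole log. This is a minimal sketch; the function name chop_repeated_prefix is hypothetical, and the minimum prefix length of two characters is my own addition so that lines starting with "oops" or "77" are left alone.

```python
import re

# ^(.{2,}) grabs the longest prefix of two or more characters that is
# immediately repeated; the replacement keeps only the second copy onward.
REPEATED_PREFIX = re.compile(r"^(.{2,})(\1.*)")


def chop_repeated_prefix(line: str) -> str:
    """Drop the first copy of a prefix that is immediately repeated."""
    return REPEATED_PREFIX.sub(r"\2", line)


for line in [
    "Alex is here and Alex is here and we went out",
    "We bothWe both went out",
    "oops, nothing repeated here",
]:
    print(chop_repeated_prefix(line))
```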