Deleting comments in a large file - regex

I am trying to delete a bunch of comments that are all in the following format:
/**
* #ngdoc
... comment body (delete me, too!)
*/
I have tried using this command: %s/\/**\n * #ngdoc.\{-}*\///g
Here is the regex without the patterns: %s/pattern1.\{-}pattern2//g
Here are the individual patterns: \/**\n * #ngdoc and *\/
When I try my pattern in vim I get the following error:
E871: (NFA regexp) Can't have a multi follow a multi !
E61: Nested *
E476: Invalid command
Thanks for any help with this regexp nightmare!

Instead of trying to cram this into one complex regex, it's much easier to search for the start of a comment and delete from there to the end of the comment:
:g/^\/\*\*$/,/\*\/$/d_
This breaks down into
:g start a global command
/^\/\*\*$/ search for start of a comment: <sol>/**<eol>
,/\*\/$/ extend the range to the end of a comment: */<eol>
d delete the range
_ use the black hole register (performance optimization)

Your problem is that \{-} is followed by an unescaped *; those are the two multis the error message refers to. Escape the * (and use \_. instead of . so that the match can span lines):
%s/\/\*\*\n \* #ngdoc\_.\{-}\*\/\n//g
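For comparison, here is a minimal sketch of the same idea outside Vim, using Python's re module. The sample text and the names NGDOC_COMMENT and cleaned are made up for illustration; re.DOTALL plays the role of \_. and .*? plays the role of \{-}.

```python
import re

# Hypothetical sample; only the first comment carries the #ngdoc marker.
text = (
    "keep1\n"
    "/**\n"
    " * #ngdoc\n"
    " * comment body\n"
    " */\n"
    "keep2\n"
    "/**\n"
    " * other doc\n"
    " */\n"
    "keep3\n"
)

# Every literal * is escaped; .*? is the non-greedy "as few as possible" match,
# and DOTALL lets . cross line breaks.
NGDOC_COMMENT = re.compile(r"/\*\*\n \* #ngdoc.*?\*/\n", re.DOTALL)

cleaned = NGDOC_COMMENT.sub("", text)
print(cleaned)
```

The second comment is left alone because it does not start with the #ngdoc line.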

Using embedded newlines in the pattern is the wrong approach. You should instead use an address range. Something like:
sed '\#^/\*\*$#,\#^\*/$#d' file
This will delete all lines starting from one that matches /** anchored at column 1 to the line matching */ anchored at column 1. If your comments are well behaved (e.g., no trailing space after /**), this should do what you want.
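If it helps to see what the address range does, here is a rough Python sketch of the same line-by-line logic. It is not how sed is implemented; the function name delete_comment_ranges and the sample lines are invented for the example.

```python
import re

START = re.compile(r"^/\*\*$")  # same role as the first sed address
END = re.compile(r"^\*/$")      # same role as the second sed address


def delete_comment_ranges(lines):
    """Drop every run of lines from a START line through the next END line."""
    kept = []
    inside = False
    for line in lines:
        if inside:
            # Still inside a range: drop the line, and leave the range on END.
            if END.match(line):
                inside = False
            continue
        if START.match(line):
            inside = True
            continue
        kept.append(line)
    return kept


sample = ["keep1", "/**", "* #ngdoc", "... comment body", "*/", "keep2"]
print(delete_comment_ranges(sample))
```

As with the sed command, a closing */ that is indented or followed by spaces would not end the range.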

Try this, using gc to confirm each deletion:
%s/\v\/\*\*\n\s\*\s\#ngdoc\n((\s*\n)?(\s\*.*\n)?){-}\s?\*\///gc
It matches comments like:
/**
* #ngdoc
* ... comment body (delete me, too!)
*
*/

My approach consists of using a macro:
qa/\/\*\*<enter><shift-v>/\*\/<enter>d
qa ........ starts recording macro "a"
/\/\*\* ... searches for the comment beginning
<Enter> ... use Ctrl-v Enter
V ......... starts linewise visual mode (until...)
/\*\/ ..... end of your comment
<Enter> ... Ctrl-v Enter again
d ......... deletes the selected area
In order to insert <Enter> etc., press Ctrl-v followed by the key you want.

Replace a substring in the first column using vi

I have a huge file that has multiple columns as shown below:
J02-31 23.2 ...
J30-09 -45.4 ...
J05+30 56.1 ...
J00-20 -78.2 ...
J11-54 232.0 ...
... ... ...
I would like to replace - with $-$ only in the first column, i.e., my output should be like this:
J02$-$31 23.2 ...
J30$-$09 -45.4 ...
J05+30 56.1 ...
J00$-$20 -78.2 ...
J11$-$54 232.0 ...
... ... ...
Is there a way to do this using vi? I know that python/pandas can do it, but I am interested in vi usage.
I'd go with
:%s/^\S*\zs-/$-$/
which means:
%s/: apply this substitution for every line
^\S*: read as many non-whitespace characters from the start of the line as possible
\zs: actual match start (you could also capture the \S* above instead and insert it back too)
-: match the - (note: this will only match the last - in the first column, your question isn't really clear if there can be multiple there)
/$-$/: replace the matching part (which is only - thanks to the \zs) with $-$
You could do:
:g/^\S*-/s/-/$-$/
This performs the replacement s/-/$-$/ only on lines which match the pattern /^\S*-/ (i.e., those lines which have a - in the first column).
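Not vi, but for anyone who wants to check the pattern outside the editor, here is a small sketch of the \zs variant in Python. Python's re has no \zs, so the prefix is captured and put back, as the first answer mentions; the sample rows are shortened from the question.

```python
import re

# Sample rows based on the question (third column omitted).
rows = (
    "J02-31 23.2\n"
    "J30-09 -45.4\n"
    "J05+30 56.1\n"
    "J00-20 -78.2\n"
    "J11-54 232.0\n"
)

# ^(\S*)- : greedy run of non-whitespace from the start of the line, then a
# hyphen, so only the last hyphen of the first column is touched.
fixed = re.sub(r"^(\S*)-", r"\1$-$", rows, flags=re.MULTILINE)
print(fixed)
```

The minus signs in the second column are never reached, because \S* cannot cross the space after the first column.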

Regex to match the last line of a JCL job card or the whole card

This is for anyone familiar with mainframe JCL.
I'm trying to match the last line of the job card.
Basically the first line that starts with // and ends without a comma.
In the example I need the 3rd line or up to the 3rd line matched.
I'm using Ansible's lineinfile to dynamically insert a route card after the job card.
For example:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
So far I've got this, which matches the leading // and anything after it, but I can't figure out the last part:
^(\Q//\E(.)*)
Parsing JCL in the general case is hard. As noted in the comments, the rules are full of caveats.
I have an ANTLR4 grammar for JCL, it's MIT licensed. Possibly of use. It reflects the beauty of JCL.
To match the whole job card (in this case 3 lines):
(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$
See live demo.
Breaking this down:
(?sm)
s enables the DOTALL flag (meaning . matches new lines too)
m enables the MULTILINE flag (meaning ^ and $ match the start and end of lines)
\A means start of input (so it only matches at the very start)
.*? means anything, but as little as possible
\/\/[^*] means // followed by a character that is not * (so //* comment lines are skipped)
((?!\/\*)[^\n])* means non-new lines, except the sequence /* (so don't match when a comment is put in line)
[^,] not a comma
$ end of line
In English: "match from the start until there's a non-comma at the end of a line that is not a comment, or does not end with a comment"
You would then replace with $0 (group zero is the entire match) followed by your injected content:
$0\\n*ROUTE statement
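Here is a minimal sketch for trying the pattern in Python, whose re module accepts it as written. The function name insert_after_job_card is made up, the JCL is the question's example with the arrow annotations removed, and the inserted line follows the //*ROUTE spelling used elsewhere in this thread.

```python
import re

# The pattern from above, unchanged; (?sm) turns on DOTALL and MULTILINE.
JOB_CARD = re.compile(r"(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$")


def insert_after_job_card(jcl: str, new_line: str) -> str:
    """Append new_line right after the matched job card (group 0)."""
    # \A anchors the match at the start of the input, so at most one match exists.
    return JOB_CARD.sub(lambda m: m.group(0) + "\n" + new_line, jcl, count=1)


jcl = (
    "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,\n"
    "// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),\n"
    "// NOTIFY=&SYSUID\n"
    "//STEPNAME EXEC PGM=BPXBATCH\n"
    "//STDERR DD SYSOUT=*\n"
)

result = insert_after_job_card(jcl, "//*ROUTE statement")
print(result)
```

A callable is used for the replacement so that nothing in the inserted line is interpreted as an escape sequence.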
You can use a negative lookbehind for this: (?<!,).
But you'll also need to act on the first match only (firstmatch) and use backrefs.
Given the task:
- lineinfile:
    path: file.jcl
    regexp: '^(\/\/.*)(?<!,)$'
    line: "\\1\\n//*ROUTE statement"
    firstmatch: true
    backrefs: true
You would end up, from your example, with:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID
//*ROUTE statement
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
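To check the lookbehind pattern without running a playbook, here is a rough Python equivalent. It only imitates the effect of the task above (first matching line, backreference \1, then the new line) and does not use Ansible; the variable names are invented.

```python
import re

jcl = (
    "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,\n"
    "// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),\n"
    "// NOTIFY=&SYSUID\n"
    "//STEPNAME EXEC PGM=BPXBATCH\n"
)

# (?<!,)$ : the end of the line must not be preceded by a comma.
LAST_CARD_LINE = re.compile(r"^(//.*)(?<!,)$", re.MULTILINE)

# count=1 mimics firstmatch: only the first line without a trailing comma changes.
patched = LAST_CARD_LINE.sub(r"\1\n//*ROUTE statement", jcl, count=1)
print(patched)
```

The first two lines end in a comma, so the lookbehind rejects them and the third line is the first match.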
For the general case this is tougher than you think because of comments allowed within the scope of the JOB card.
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
The strings you show:
<--- start of job card
LINES=(999999,WARNING),
<--- end of job card
are all valid as comments in JCL because they follow a space.
You can even have whole comment lines within the JOB card. For example:
//name JOB (accounting info),'data capture ___',
//* TYPRUN=SCAN,
// NOTIFY=&SYSUID,
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=(5,00),
// REGION=5M
So you're not necessarily looking for the first card that doesn't end in a comma unless you can restrict the JCL you're looking at.
Your JOB card starts with //name JOB and ends just before the next //name card. *** edit *** As was correctly pointed out, the JOB card could be followed by a card which does not require a name field, like // SET for example. See https://www.ibm.com/docs/en/zos/2.4.0?topic=statements-jcl-statement-fields *** end of edit ***
It starts with ^(\Q//\E)[A-Z0-9]+\s+\QJOB\E.+
and ends just before the next named card ^(\Q//\E)[A-Z0-9]+\s+
But I don't know regular expressions well enough to find the "just before" point to insert your new line. Hopefully someone else can add that.
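One possible way to find that "just before" point is to scan line by line instead of using a single regex. This is only a sketch under the assumptions stated in this answer: names are limited to [A-Z0-9], and the card after the JOB card has a name field (so it would miss a following // SET, as the edit above points out). The function name and sample JCL are made up; \Q...\E is not supported by Python, so the slashes are written literally.

```python
import re

JOB_START = re.compile(r"^//[A-Z0-9]+\s+JOB\b")  # //name JOB ...
NAMED_CARD = re.compile(r"^//[A-Z0-9]+\s+")      # any card with a name field


def insert_before_next_named_card(lines, new_line):
    """Insert new_line just before the first named card after the JOB card."""
    for i, line in enumerate(lines):
        if JOB_START.match(line):
            for j in range(i + 1, len(lines)):
                # Continuation lines ("// ...") and comment lines ("//*...")
                # do not match NAMED_CARD, so they are skipped.
                if NAMED_CARD.match(lines[j]):
                    return lines[:j] + [new_line] + lines[j:]
            return lines + [new_line]
    return list(lines)  # no JOB card found: leave the input unchanged


jcl = [
    "//SPOOL1 JOB (UU999999999,1103),'Programmer',",
    "//* TYPRUN=SCAN,",
    "// NOTIFY=&SYSUID,",
    "// REGION=5M",
    "//STEPNAME EXEC PGM=BPXBATCH",
    "//STDOUT DD SYSOUT=*",
]

patched = insert_before_next_named_card(jcl, "//*ROUTE statement")
print("\n".join(patched))
```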

Remove Multiple Periods Up To Bracket From String

Would like to know how to create an Emacs macro that will
Find the first instance of multiple periods in string
Set mark
Move to the first closed bracket in string
Remove all chars between mark and closed bracket
Here is an example string. I'd like to go from this:
* [This is Chapter 1.......................................................... 1-83](chapter1.md)
To this:
* [This is Chapter 1](chapter1.md)
Can anyone assist?
Thanks
Here's the hacky way I accomplished it. I'm sure there is a cleaner way.
Start with the cursor at the beginning of the line
M-x start-kbd-macro
C-s RET .. to search for first instance of ".." in the string
C-SPACE to set mark
C-s ] to search for first instance of "]" in the string
DEL to remove everything marked
BKSP BKSP to remove the final two ".."
DWN ARROW to get to next line
C-a to get to the beginning of the line
M-x end-kbd-macro
I know it's lame, but it worked! I have ~100 pages of docs to do this to, so next I need to figure out how to reliably perform this on the entire doc.
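For running this over the whole doc, the macro steps can also be written as one regex substitution. Here is a sketch in Python rather than Emacs, just to show the pattern; the name DOT_LEADER is made up and the sample line is the one from the question.

```python
import re

# Two or more periods, then everything up to (but not including) the first "]".
DOT_LEADER = re.compile(r"\.{2,}[^\]]*(?=\])")

line = "* [This is Chapter 1" + "." * 58 + " 1-83](chapter1.md)"

# count=1: only the first run of periods on the line is removed.
cleaned = DOT_LEADER.sub("", line, count=1)
print(cleaned)
```

The single period in chapter1.md is untouched because the pattern requires at least two periods in a row.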

How to select section in regular expression in linux commands

I have a file in which each section begins with a word followed by an equals sign and then several lines of text, and I would like to select each section. For example:
delete = \account
user\
admin
admin right is good.
add = \
nothing
no out
input output is not good
edit = permission
bob
admin
alice killed bob!!!
I want to select a section for example:
add = \
nothing
no out
input output is not good
I would like to do it with a regular expression.
Your question is a bit vague but you could try the following ...
/\s*(\w+) = ([^=]*\n)*/m
... subject to the requirement that the last section is terminated with \n.
This works by:
'\s*' matching some optional leading whitespace
'(\w+)' capturing the name of the section
' = ' matches the space equals space separator
'([^=]*\n)' it then captures a string that does not include an equals and ends with a newline
'*' and it does that last bit multiple times
The m flag is then required to set multi-line.
See the following to quickly see the groups that are output for each match ...
https://regex101.com/r/oDKSy9/1
(NOTE: The g flag will probably not be required depending on how you use the regex.)
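If you want to try it from a script, here is a minimal sketch with Python's re module. The pattern is the one above; the dictionary name sections and the way the sample text is built are just for the example, and the last section is terminated with \n as required.

```python
import re

# Sample data from the question; the last line ends with a newline.
text = "\n".join([
    "delete = \\account",
    "user\\",
    "admin",
    "admin right is good.",
    "add = \\",
    "nothing",
    "no out",
    "input output is not good",
    "edit = permission",
    "bob",
    "admin",
    "alice killed bob!!!",
]) + "\n"

SECTION = re.compile(r"\s*(\w+) = ([^=]*\n)*", re.MULTILINE)

# Map each section name (group 1) to the full text of that section (group 0).
sections = {m.group(1): m.group(0) for m in SECTION.finditer(text)}
print(sections["add"])
```

Note that a section body containing an equals sign would end the match early, since the body is defined as "anything that is not =".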
Solution by OP.
I found this solution:
csplit -k fileName '/.*=/' '{*}'
Thanks #haggisandchips

Regular expression which cannot be used

I want to extract C-like comments from source code, e.g. from
(updated example)
/**
* base comment
* (c) SOMEBODY SOMETIME
* something
*/
///<!-- ------metadata-XML------- -->
/// <module type="javascript"> A
///<desc> some desc
/// </desc>
(function( a /* param A */) { // programmers comment ... enclosure
/*! user doc
this module ....
* reguired
.....
*/
var b={}; // programmers in line comment
// single line comments
// The cookie spec says up to 4k per cookie, so at ~50 bytes per entry
// that gives a maximum of around 80 items as a max value for this field
b.a=a;
var str = " tttt \/\/this is not comment ! tttt "
var str2 = " tttt \/\* this is not comment too ! \
.............. \*\/ ttt ";
global.b = b;
}(global);
///</module>
regexp which I use is
^\s*\/\*(.*[\r\n]*)*\*\/
The problem is that this regexp hangs the regex engine. RegexCouch becomes unresponsive,
and using it in a browser makes the page unresponsive.
What is wrong with this regexp? How is it possible that the regex engine cannot solve it?
Are there regexps (syntactically correct, I think) which cannot be used?
This is called catastrophic backtracking. Your regex has to check too many possibilities because you are nesting quantifiers:
^\s*\/\*(.*[\r\n]*)*\*\/
         ^^      ^ ^
A better approach would be this:
/^\s*\/\*.*?\*\//gms
See it here in action.
You need the s option to make . match newlines, and the m option to make ^ match at the start of each line.
.*? matches as few characters as possible.
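Here is a small sketch of the lazy version in Python, where re.S and re.M correspond to the s and m options. The sample source is a shortened, made-up variant of the code in the question, and BLOCK_COMMENT is just a name chosen for the example.

```python
import re

# Lazy .*? with DOTALL: stop at the first closing */ instead of nesting quantifiers.
BLOCK_COMMENT = re.compile(r"^\s*/\*.*?\*/", re.S | re.M)

source = (
    "/**\n"
    " * base comment\n"
    " */\n"
    "var b = {}; // programmers in line comment\n"
    "  /*! user doc\n"
    "   * required\n"
    "   */\n"
    "b.a = a;\n"
)

comments = [m.group(0).strip() for m in BLOCK_COMMENT.finditer(source)]
for c in comments:
    print(c)
    print("---")
```

Because of the leading ^\s*, only block comments that start a line are found; the // comment and any /* ... */ in the middle of a line are ignored.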
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
This will match C-like comments.
If you use a PCRE-like regex flavor, you can use this:
\s*+\/\*(?>[^*]++|\*++(?!\/))*\*\/
If your regex flavor doesn't support atomic groups and possessive quantifiers, use this:
\s*\/\*(?:[^*]+|\*+(?!\/))*\*\/