Exclude match in Vim with ‘\%’? - regex

I have a CSV file that looks like so:
productSku-1,attribute1,2,3
productSku-1,attribute4,5
productSku-1,attribute4,5
productSku-2,attribute1,2,3
productSku-2,attribute4,5
productSku-3,attribute1,1
I am trying to “collapse” the same product attributes into one line, while getting rid of the extra instances of productSku. So, I match product to the next line and then remove the next productSku lines as well as the line break to compress it into a single line. In the above example, the result should look like so:
productSku1,attribute1,2,3,attribute4,5,attribute4,5
productSku2,attribute1,2,3,attribute4,5
productSku3,attribute1,1
I thought the following substitution command would work, but I have never used the \% sign.
:%s/(^[A-Za-z0-9-]+)(.)((\%(\n\1)(.))+)/\1\2\3
I thought it would exclude the match from match \3… but it isn’t working.

One way to solve this problem is like this:
:sort
qqjkdt,:g/<C-r>"/norm dt,-gJ<Enter>0P+q
10000#q
Step by step
:sort - sort all of the lines by productSku
qq - start a recording, see :help qq
jk - this is a little macro hack... if we get to the last line it will throw an error and the rest of the command won't execute.
dt, - delete the productSku
:g/ - start a :global command, see :help :global
<C-r>" - insert the contents of the " register (the productSku we deleted)
norm - short for :help :normal
dt, - delete the productSku since we don't want them
- - go to the previous line
gJ - join the lines without any spaces
<Enter> - finish the :global command
0P - put the productSku back at the beginning of the line
+ - move down one line
q - finish the recording
10000#q repeat the recording 10k times (or more if you need)

I would use the following efficient command.
:g/\%(^\1,.*\n\)\#<=\([^,]*\)/s/$/,/|-j!

Related

Regex to match the last line of a JCL job card or the whole card

If any of you are familiar with mainframe JCL.
I'm trying to match the last line of the job card.
Basically the first line that starts with // and ends without a comma.
In the example I need the 3rd line or up to the 3rd line matched.
I'm using Ansible's lineinfile to dynamically insert a route card after the job card.
For example:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
So far I got this, which matches the start of // and anything after, but, I cant figure out the last part
^(\Q//\E(.)*)
Parsing JCL in the general case is hard. As noted in the comments, the rules are full of caveats.
I have an ANTLR4 grammar for JCL, it's MIT licensed. Possibly of use. It reflects the beauty of JCL.
To match the whole job card (in this case 3 lines):
(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$
See live demo.
Breaking this down:
(?sm)
s enables the DOTALL flag (meaning . matches new lines too)
m enables the MUTLILINE flag (meaning ^ and $ match start and end of lines
\A means start of input (so it only matches at the very start)
.*? means anything, but as little as possible
//[^*]
((?!\/\*)[^\n])* means non-new lines, except the sequence /* (so don't match when a comment is put in line)
[^,] not a comma
$ end of line
In English: "match from the start until there's a non-comma at the end of a line that is not a comment, or does not end with a comment"
You would then replace with $0 (group zero is the entire match) followed by your injected content:
$0\\n*ROUTE statement
You can use a negative lookbehind for this: (?<!,).
But you'll also need to insert after the firstmatch and use backrefs.
Given the task:
- lineinfile:
path: file.jcl
regexp: '^(\/\/.*)(?<!,)$'
line: "\\1\\n//*ROUTE statement"
firstmatch: true
backrefs: true
You would end up, from your example, with:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID
//*ROUTE statement
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
For the general case this is tougher than you think because of comments allowed within the scope of the JOB card.
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
The strings you show:
<--- start of job card
LINES=(999999,WARNING),
<--- end of job card
are all valid as comments in JCL because they follow a space.
You can even have whole comment lines within the JOB card. For example:
//name JOB (accounting info),'data capture ___',
//* TYPRUN=SCAN,
// NOTIFY=&SYSUID,
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=(5,00),
// REGION=5M
So you're not necessarily looking for the first card that doesn't end in a comma unless you can restrict the JCL you're looking at.
Your JOB card starts with //name JOB and ends just before the next //name card. *** edit *** As was correctly pointed out, the JOB card could be followed by a card which does not require a name field, like // SET for example. See https://www.ibm.com/docs/en/zos/2.4.0?topic=statements-jcl-statement-fields *** end of edit ***
It starts with ^(\Q//\E)[A-Z0-9]+\s+\QJOB\E.+
and ends just before the next named card ^(\Q//\E)[A-Z0-9]+\s+
But I don't know regular expressions well enough to find the "just before" point to insert your new line. Hopefully someone else can add that.

Format a text file by regex match and replace

I have a text file that looks like the following:
Chanelle
Jettie
Winnie
Jen
Shella
Krysta
Tish
Monika
Lynwood
Danae
2649
2466
2890
2224
2829
2427
2816
2648
2833
2453
I need to make it look like this
Chanelle 2649
Jettie 2466
... ...
I tried a lot on sublime editor but couldn't figure out the regex to do that. Can somebody demonstrate if it can be done.
I tested the following in Notepad++ but it should work universally.
Use this as the search string:
(?:(\s+[A-Za-z]+)(\r?\n))((?:\s*[A-Za-z]*\r?\n)+)\s+(\d+)
and this as the replacement:
$1 $4$2$3
Running a replace with it once will do one line at a time, if you run it multiple times it'll continue to replace lines until there are no matching lines left.
Alternatively, you can use this as the replacement if you want to have the values aligned by tabs, but it's not going to match in all cases:
$1\t\t$4$2$3
While the regex answer by SeinopSys will work, you don't need a regex to do this - instead, you can take advantage of Sublime's multiple cursors.
Place your cursor at the beginning of line 1, then hold down Shift↓ to select all the names.
Hit CtrlShiftL (Selection -> Split into Lines) to split the selection into lines.
CtrlC to copy.
Place your cursor on line 11 (the first number line) and press CtrlShift↓ (Windows/OS X) or AltShift↓ (Linux) to place a cursor at the beginning of each number line.
Hit CtrlV to paste the names before the numbers.
You can now delete the names at the top and you're all set. Alternatively, you could use CtrlX to cut the names in step 3.

add character before first word in line

I want to add a minus sign "-" infront of the first word in a line on the editor VIM. The lines contains spaces for indentation. The indentation shall not be touched. E.g
As Is
list point 1
sub list point 2
and so on...
I want
- list point 1
- sub list point 2
- and so on...
I can find the first word, but i struggle with replacing it in the correct way.
^\s*\w
in Vim
/^\s*\w
But in the replacement I always remove the complete found part....
:s/^\s*\w/- \w/
Which leads to
- ist point 1
- ub list point 2
- nd so on...
Use & which is replaced with the matched string:
:%s/\w/- &
I'm late to the party but:
:%norm! I- <CR>
And another one with :s:
:%s/^\s*/&- /
An alternative to falsetrue's answer: You can capture the first word character and print it out along with the leading -:
%s/\(\w\)/- \1/
:normal cmd may help too:
:%norm! wi-
note that after - there is a space.

How to join lines adding a separator?

The command J joins lines.
The command gJ joins lines removing spaces
Is there also a command to Join lines adding a separator between the lines?
Example:
Input:
text
other text
more text
text
What I want to do:
- select these 4 lines
- if there are spaces at start and/or EOL remove them
- join lines adding a separator '//' between them
Output:
text//other text//more text//text
You can use :substitute for that, matching on \n:
:%s#\s*\n\s*#//#g
However, this appends the separator at the end, too (because the last line in the range also has a newline). You could remove that manually, or specify the c flag and quit the substitution before the last one, or reduce the range by one and :join the last one instead:
:1,$-1s#\s*\n\s*#//#g|join
I wrote a plugin "Join", could do what you wanted, and more.
https://github.com/sk1418/Join
Except for all features provided by the build-in :join command, Join can:
Join lines with separator (string)
Join lines with or without trimming the leading/trailing whitespaces
Join lines with negative count (backwards join)
Join lines in reverse
Join lines and keep joined lines (without removing joined lines)
Join lines with any combinations of above options
check the homepage for details and examples/screenshots.
There are few ways to do it, but I would recommend going by simplest route possible - recording a macro or doing multi-step command, for example by:
Appending to all lines excluding last by
Using substitution (:1,$-1s#$#//#)
Appending (:1,$-1norm A//)
And then join using visual selection (vGgJ) or by any other method.
Unless you're doing this operation very often you most likely forget any complex commands or existence of specialized plugin in your config, thus my recommendation of using generic, often used sub steps.
Another substitution, for the sake of diversity:
:%s:\n\ze.://
Will list 50 items per line seperated by spaces:
seq 0 70 | xargs -L 50 | sed 's/ /,/g'
Output:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70

Remove the first character of each line and append using Vim

I have a data file as follows.
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
Using vim, I want to reomve the 1's from each of the lines and append them to the end. The resultant file would look like this:
14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065,1
13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050,1
13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185,1
14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480,1
13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735,1
I was looking for an elegant way to do this.
Actually I tried it like
:%s/$/,/g
And then
:%s/$/^./g
But I could not make it to work.
EDIT : Well, actually I made one mistake in my question. In the data-file, the first character is not always 1, they are mixture of 1, 2 and 3. So, from all the answers from this questions, I came up with the solution --
:%s/^\([1-3]\),\(.*\)/\2,\1/g
and it is working now.
A regular expression that doesn't care which number, its digits, or separator you've used. That is, this would work for lines that have both 1 as their first number, or 114:
:%s/\([0-9]*\)\(.\)\(.*\)/\3\2\1/
Explanation:
:%s// - Substitute every line (%)
\(<something>\) - Extract and store to \n
[0-9]* - A number 0 or more times
. - Every char, in this case,
.* - Every char 0 or more times
\3\2\1 - Replace what is captured with \(\)
So: Cut up 1 , <the rest> to \1, \2 and \3 respectively, and reorder them.
This
:%s/^1,//
:%s/$/,1/
could be somewhat simpler to understand.
:%s/^1,\(.*\)/\1,1/
This will do the replacement on each line in the file. The \1 replaces everything captured by the (.*)
:%s/1,\(.*$\)/\1,1/gc
.........................
You could also solve this one using a macro. First, think about how to delete the 1, from the start of a line and append it to the end:
0 go the the start of the line
df, delete everything to and including the first ,
A,<ESC> append a comma to the end of the line
p paste the thing you deleted with df,
x delete the trailing comma
So, to sum it up, the following will convert a single line:
0df,A,<ESC>px
Now if you'd like to apply this set of modifications to all the lines, you will first need to record them:
qj start recording into the 'j' register
0df,A,<ESC>px convert a single line
j go to the next line
q stop recording
Finally, you can execute the macro anytime you want using #j, or convert your entire file with 99#j (using a higher number than 99 if you have more than 99 lines).
Here's the complete version:
qj0df,A,<ESC>pxjq99#j
This one might be easier to understand than the other solutions if you're not used to regular expressions!