vim & csv file: put header info into a new column - regex

I have a large number of csv files that look like this:
xxxxxxxx
xxxxx
Shipment,YD564n
xxxxxxxxx
xxxxx
1,RR1760
2,HI3503
3,HI4084
4,HI1824
I need to make them look like the following:
xxxxxxxx
xxxxx
Shipment,YD564n
xxxxxxxxx
xxxxx
YD564n,1,RR1760
YD564n,2,HI3503
YD564n,3,HI4084
YD564n,4,HI1824
YD564n is a shipment number and will be different for every csv file. But it always comes right after "Shipment,".
What vim command(s) can I use?

In one file type the following in normal mode:
qqgg/^Shipment,<CR>ww"ay$}j:.,$s/^/<C-R>a,<CR>q
Note that <CR> is the ENTER key, and <C-R> is CTRL-R.
This will update that file and record the commands in register q.
Then in each of the other files type @q (also in normal mode) to play back register q.

You can do this using a macro, and applying it over several files.
Here's one example. Type the following in as is:
3gg$"ayiw:6,$s/^/<C-R>a,/<CR>:w<CR>:bn<CR>
Now that looks horrendous. Let me see if I can explain that a bit better.
3gg$ : Go to the end of the third line.
"ayiw : Copy the last word into register a.
:6,$s/^/<C-R>a,/<CR> : On every line from the 6th onwards, insert the contents of register a, followed by a comma, at the beginning of the line.
:w<CR>:bn<CR> : Save and go to the next buffer.
Now you can map this to a key with
:nnoremap <C-A> 3gg$"ayiw:6,$s/^/<C-R>a,/<CR>:w<CR>:bn<CR>
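Then if you have, say, 200 csv files, you open vim as
vim *.csv
and press Ctrl-A once per file.
Note that a count such as 200<C-A> will not repeat the mapping: Vim prepends the count to the first command of the mapping, so 3gg would become 2003gg.
To run it 200 times in one go, record the same keystrokes into a register (qq ... q) and play it back with 200@q.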
That said, I'd definitely be more comfortable doing this in a proper scripting language, it'd be much more straightforward.

This could be done as a Perl one-liner:
perl -i.bak -e' $c = do {local $/; <>};
($n) = ($c =~ /Shipment,(\w+)/);
$c =~ s/^(\d+,)/$n,$1/gm;
print $c' shipment.csv
This will read the contents of shipment.csv into $c, extract the shipment ID into $n, and prepend the shipment number to every line that starts with a number followed by a comma. The file will be modified in-place with a backup saved to shipment.csv.bak.
To do this from within Vim, adapt it as a filter:
:%!perl -e' $c = do {local $/; <>}; ($n) = ($c =~ /Shipment,(\w+)/); $c =~ s/^(\d+,)/$n,$1/gm; print $c'

Well, don't bash me, but... you could consider: Don't do this in vim!!
This is a classic usage example for scripting languages.
Take a basic python, perl or ruby tutorial. The solution for this would
be in it.
The regex for this might not be too difficult and it is doable in vim.
But there are much easier alternatives out there.
And much more flexible ones.
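For what it's worth, here is roughly what such a script could look like, as a minimal sketch using awk from a POSIX shell. It assumes the Shipment line comes before the data rows and that data rows are the ones starting with a digit, as in the sample. The function name prepend_shipment is made up for this example.

```shell
#!/bin/sh
# prepend_shipment FILE: print FILE with the shipment number added in front
# of every data row (rows starting with a digit). Hypothetical helper name.
prepend_shipment() {
    awk -F, '
        $1 == "Shipment"     { id = $2 }           # remember the shipment number
        id != "" && /^[0-9]/ { $0 = id "," $0 }    # prefix data rows that follow it
        { print }
    ' "$1"
}

# Usage sketch: write the result next to the original instead of overwriting it.
# for f in *.csv; do prepend_shipment "$f" > "$f.new"; done
```

Writing to a separate output file keeps the originals intact until you have checked the result.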

Why vim?
Try this shell script:
#!/bin/sh
input=$1
shipment=$(grep '^Shipment,' "$input" | awk -F, '{print $2}')
mv "$input" "$input.orig"
sed -e "s/^\([0-9]\)/$shipment,\1/" "$input.orig" > "$input"
You could iterate through specific files:
for input in *.csv
do
    ./script.sh "$input"
done

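I also think this isn't well suited for vim. How about Bash instead?
FILENAME='filename.csv' && SHIPMENT=$(grep '^Shipment,' "$FILENAME" | sed 's/^Shipment,//') && sed -i.bak "s/^[0-9]/$SHIPMENT,&/" "$FILENAME"
Note that sed -i.bak edits the file in place and keeps a backup in filename.csv.bak. Redirecting the output of sed straight back into $FILENAME would truncate the file before it is read.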

Related

Why isn't this regex executing?

I'm attempting to convert my personal wiki from Foswiki to Markdown files and then to a JAMstack deployment. Foswiki uses flat files and stores metadata in the following format:
%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%
I want to use a git repo for versioning and will worry about linking that to article metadata later. At this point I simply want to convert these blocks to something that looks like this:
---
author: Teoti Nathaniel
revdate: 1539108277
---
After a bit of tweaking I have constructed the following regex:
author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]
According to regex101 this works and my two capture groups contain the desired results. Attempting to actually run it:
perl -0777 -pe 's/author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]/author: $1\nrevdate: $2/gms' somefile.txt
gets me only this:
>
My previous attempt (which breaks if the details aren't in a specific order) looked like this and executed correctly:
perl -0777 -pe 's/%META:TOPICINFO\{author="(.*)"\ date="(.*)"\ format="(.*)"\ (.*)\}\%/author:$1 \nrevdate:$2/gms' somefile.txt
I think that this is an escape character problem but can't figure it out. I even went and found this tool to make sure that they are correct.
Brute-forcing my way to understanding here is feeling both inefficient and frustrating, so I'm asking the community for help.
The first major problem is that you're trying to use a single quote (') in the program, when the program is being passed to the shell in single quotes.
Escape any instance of ' in the program by using '\''. You could also use \x27 if the quote happens to be in a double-quoted string literal or regex literal (as is the case for every instance in your program).
perl -0777pe's/author=['\''"].../.../gs'
perl -0777pe's/author=[\x27"].../.../gs'
I would try to break it down into a clean data structure and then process it. By separating the data processing from the printing, you can modify it to add extra data later. It also makes it far more readable. Please see the example below.
#!/usr/bin/env perl
use strict;
use warnings;
## yaml to print the data, not required for operation
use YAML::XS qw(Dump);
my $yaml;
my @lines = '%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%';
for my $str (@lines)
{
    ### split line into component parts
    my ( $type, $subject, $data ) = $str =~ /\%(.*?):(.*?)\{(.*)\}\%/;
    ## break data in {} into a hash
    my %info = map( split(/=/), split(/\s+/, $data) );
    ## strip quotes if any exist
    s/^"(.*)"$/$1/ for values %info;
    ## add to data structure
    $yaml->{$type}{$subject} = \%info;
}
## yaml to print the data, not required for operation
print Dump($yaml);
## loop data and print
for my $t (keys %{ $yaml } ) {
    for my $s (keys %{ $yaml->{$t} } ) {
        print "-----------\n";
        print "author: ".$yaml->{$t}{$s}{"author"}."\n";
        print "date: ".$yaml->{$t}{$s}{"date"}."\n";
    }
}
Ok, I kept fooling around with it by reducing the execution to a single term and expanding. I soon got to here:
$ perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=\['\"\]\(\\w\+\)\['\"\]/author\: \$1\\nrevdate\: \$2/gms' somefile.txt
Unmatched [ in regex; marked by <-- HERE in m/author=["](\w+)["](?:.*)date=\["](\w+)[ <-- HERE \"\]/ at -e line 1.
This eventually got me to here:
perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=['\"]\(\\w\+\)['\"]/\nauthor\ $1\nrevdate\:$2\n/gms' somefile.txt
Which produces a messy output but works. (Note: the output is a proof of concept, and this can now be used within a Python script to programmatically generate Markdown metadata.)
Thanks for being my rubber duckie, StackOverflow. Hopefully this is useful to someone, somewhere, somewhen.

Extract Filename before date Bash shellscript

I am trying to extract a part of the filename: everything before the date and suffix. I am not sure of the best way to do it in a Bash script. Regex?
The names are part of the filename. I am trying to store the result in a shell script variable. The prefixes will not contain strange characters. The suffix will be the same. The files are stored in a directory, and I will use a loop to extract the portion of the filename for each file.
Expected input files:
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Expected Extract:
EXAMPLE_FILE
EXAMPLE_FILE_2
Attempt:
filename=$(basename "$file")
folder=sed '^s/_[^_]*$//)' $filename
echo 'Filename:' $filename
echo 'Foldername:' $folder
$ cat file.txt
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
$
$ cat file.txt | sed 's/_[0-9]*-[0-9]*-[0-9]*\.out$//'
EXAMPLE_FILE
EXAMPLE_FILE_2
$
No need for useless use of cat, expensive forks and pipes. The shell can cut strings just fine:
$ file=EXAMPLE_FILE_2_2017-10-12.out
$ echo ${file%%_????-??-??.out}
EXAMPLE_FILE_2
Read all about how to use the %%, %, ## and # operators in your friendly shell manual.
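As a quick illustration of those four operators, here is a minimal sketch applied to one of the sample names. The arrows in the comments show what each expansion yields for this particular value.

```shell
#!/bin/sh
file=EXAMPLE_FILE_2_2017-10-12.out

echo "${file%_*}"     # shortest match removed from the end   -> EXAMPLE_FILE_2
echo "${file%%_*}"    # longest match removed from the end    -> EXAMPLE
echo "${file#*_}"     # shortest match removed from the start -> FILE_2_2017-10-12.out
echo "${file##*_}"    # longest match removed from the start  -> 2017-10-12.out
```

For this question the first form (or the more specific pattern shown above) is the one that gives the part before the date.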
Bash itself has regex capability so you do not need to run a utility. Example:
for fn in *.out; do
[[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
done
With the example files, output is:
EXAMPLE_FILE_2017-09-12.out => EXAMPLE_FILE
EXAMPLE_FILE_2_2017-10-12.out => EXAMPLE_FILE_2
Using Bash itself will be faster, more efficient than spawning sed, awk, etc for each file name.
Of course in use, you would want to test for a successful match:
for fn in *.out; do
if [[ $fn =~ ^(.*)_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} ]]; then
cap="${BASH_REMATCH[1]}"
printf "%s => %s\n" "$fn" "$cap"
else
echo "$fn no match"
fi
done
As a side note, you can use Bash parameter expansion rather than a regex if you only need to trim the string after the last _ in the file name:
for fn in *.out; do
cap="${fn%_*}"
printf "%s => %s\n" "$fn" "$cap"
done
And then test $cap against $fn. If they are equal, the parameter expansion did not trim the file name after _ because it was not present.
The regex allows a test that a date-like string \d\d\d\d-\d\d-\d\d is after the _. Up to you which you need.
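If you want the date check but need to stay within plain POSIX sh (no [[ ... =~ ... ]]), a case pattern can do a similar job. This is only a sketch: the function name strip_date is made up, and the glob checks for digits in a YYYY-MM-DD shape without validating that it is a real date.

```shell
#!/bin/sh
# strip_date NAME: print the part of NAME before "_YYYY-MM-DD.out".
# Hypothetical helper; returns non-zero if NAME does not end that way.
strip_date() {
    case $1 in
        *_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].out)
            printf '%s\n' "${1%_????-??-??.out}" ;;
        *)
            return 1 ;;
    esac
}

for fn in EXAMPLE_FILE_2017-09-12.out EXAMPLE_FILE_2_2017-10-12.out notes.out; do
    if cap=$(strip_date "$fn"); then
        printf '%s => %s\n' "$fn" "$cap"
    else
        echo "$fn no match"
    fi
done
```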
Code
^\w+(?=_)
Results
Input
EXAMPLE_FILE_2017-09-12.out
EXAMPLE_FILE_2_2017-10-12.out
Output
EXAMPLE_FILE
EXAMPLE_FILE_2
Explanation
^ Assert position at start of line
\w+ Match any word character (a-zA-Z0-9_) between 1 and unlimited times
(?=_) Positive lookahead ensuring what follows is an underscore _ character
Simply with sed:
sed 's/_[^_]*$//' file
The output:
EXAMPLE_FILE
EXAMPLE_FILE_2
----------
In case of iterating through the list of files with extension .out - bash solution:
for f in *.out; do echo "${f%_*}"; done
awk -F_ 'NF-=1' OFS=_ file
EXAMPLE_FILE
EXAMPLE_FILE_2
Here is an awk solution too, which handles all the .out files. Note that it was written and tested with GNU awk.
awk --re-interval 'FNR==1{if(val){close(val)};split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}");print array[1];val=FILENAME;nextfile}' *.out
My awk version is old, so I am using --re-interval; with a recent version of GNU awk you should not need it.
Explanation: here is the same solution in non-one-liner form, with comments.
awk --re-interval '   ## --re-interval enables {n} interval expressions on old GNU awk versions; drop it on newer ones.
FNR==1{               ## On the first line of each input file, do the following.
  if(val){            ## If a previous file name is stored in val,
    close(val)        ## close that file, so that we do not run into a too-many-open-files error.
  };
  split(FILENAME, array,"_[0-9]{4}-[0-9]{2}-[0-9]{2}"); ## Split FILENAME into array on the _YYYY-MM-DD part (4 digits, 2 digits, 2 digits).
  print array[1];     ## Print the first element, which is the part before the date.
  val=FILENAME;       ## Remember the current file name in val so that it can be closed later.
  nextfile            ## Skip the remaining lines of the current file and move on to the next file.
}
' *.out               ## Run over all the .out files.

How to use Perl one-liner to add line based on first line pattern match?

My boss needs to change a particular routing file on some dozens (hundreds) of hosts by adding a line like:
10.11.0.0/16 via 172.16.2.XX dev tun0
... where XX is based on the octet preceding the "dev" keyword on the first line of the same file.
He wants it to be an automated in-place edit. The first lines of the existing file look like:
10.12.123.0/22 via 172.16.2.24 dev tun0
10.13.234.0/23 via 172.16.2.22 dev tun0
So the results should look like:
10.12.123.0/22 via 172.16.2.24 dev tun0
10.13.234.0/23 via 172.16.2.22 dev tun0
10.11.0.0/16 via 172.16.2.24 dev tun0
... where the last line has simply been added and the last octet in that line has been copied from the last octet on the first line.
It seems you're missing a reset of the line counter $. for each input file; close(ARGV) does just that:
perl -i.bak -pe'
$octet = $1 if /(\d+)\s+dev/ and $. ==1;
$_ .= "10.11.0.0/16 via 172.16.2.$octet dev tun0\n", close(ARGV) if eof;
' "$filenames"
Sure, you could cram this into a one liner, but why punish yourself (and the poor sod who has to maintain this down the road). Things like -i work in programs. Here's the basic pattern you're looking for.
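#!/usr/bin/perl -ni
# (On Linux, "#!/usr/bin/env perl -n -i" hands "perl -n -i" to env as a single argument, so the switches go directly on the perl path here.)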
print $_;
if( /...whatever you want to match.../ ) {
print "...whatever extra line you want to add...";
}
-n says to iterate line by line, as if there's a while loop around the program. Unlike -p it doesn't automatically print the line. Sure, you could append to $_, but this gives us better control.
-i says to edit the file in place rather than just print to STDOUT.
Here's my attempt ... which does seem to work for a single file:
perl -pi.bak -e '$octet = $1 if /(\d+)\s+dev/ and $. == 1;\
$line="10.11.0.0/16 via 172.16.2.$octet dev tun0\n";\
$_ = $_ . $line if eof;' "$filenames"
Here's a safer variation which only appends the intended line if a matching pattern is found on the first line of each file. (It also resets $. as described by Сухой27):
perl -pi.bak -e '$line ="10.11.0.0/16 via 172.16.2.$1 dev tun0\n"\
if /(\d+)\s+dev/ && $. == 1;\
close(ARGV), $_ = $_ . $line if eof;' "$filenames"
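(If no match is found for the regular expression then $line is empty, and appending an empty string to $_ is harmless. Note that $line is not reset between files, so when several files are processed, a file whose first line does not match reuses the value set by an earlier file.)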
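If a plain shell loop is acceptable, the same edit can also be sketched as a small function that is run once per file, which sidesteps the $. reset entirely. This is only an illustration: add_route is a made-up name, and it assumes each file already ends with a newline.

```shell
#!/bin/sh
# add_route FILE: append the new route to FILE, reusing the last octet found
# before "dev" on the first line. Hypothetical helper; returns non-zero and
# leaves FILE untouched if the first line does not match.
add_route() {
    octet=$(sed -n '1s/.*\.\([0-9][0-9]*\)[[:space:]][[:space:]]*dev.*/\1/p' "$1")
    [ -n "$octet" ] || return 1
    printf '10.11.0.0/16 via 172.16.2.%s dev tun0\n' "$octet" >> "$1"
}

# Usage sketch: keep a backup, then edit each file independently.
# for f in "$@"; do cp -p "$f" "$f.bak" && add_route "$f"; done
```

Because every file is handled by its own call, a file with no match on its first line cannot pick up an octet from a previous file.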

How to extract two numbers from a word and store then in two separate variables in bash?

I believe my question is very simple for someone who knows how to use regular expressions, but I am very new at it and I can't figure out a way to do it. I found many questions similar to this, but none could solve my problem.
In bash, I have a few variables that are of the form
nw=[:digit:]+.a=[:digit:]+
for example, some of these are nw=323.a=42 and nw=90.a=5
I want to retrieve these two numbers and put them in the variables $n and $a.
I tried several tools, including perl, sed, tr and awk, but couldn't get any of these to work, even though I've been googling and trying to fix it for an hour now. tr seems to be the best fit, though.
I'd like a piece of code which would achieve the following:
#!/bin/bash
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for dir in $ldir; do
retrieve the number following nw and place it in $n
retrieve the number following a and place it in $a
done
... more things...
If you trust your input, you can use eval:
for dir in $ldir ; do
dir=${dir/w=/=} # remove 'w' before '='
eval ${dir/./ } # replace '.' by ' ', evaluate the result
echo $n, $a # show the result so we can check the correctness
done
If you do not trust your input :) use this:
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for v in $ldir; do
[[ "$v" =~ ([^\.]*)\.(.*) ]]
declare "n=$(echo ${BASH_REMATCH[1]}|cut -d'=' -f2)"
declare "a=$(echo ${BASH_REMATCH[2]}|cut -d'=' -f2)"
echo "n=$n; a=$a"
done
This results in:
n=64; a=2
n=132; a=3
n=4949; a=30
For sure there are more elegant ways; this is just a quick working hack.
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"
for dir in $ldir; do
#echo --- line: $dir
for item in $(echo $dir | sed 's/\./ /'); do
val=${item#*=}
name=${item%=*}
#echo ff: $name $val
let "$name=$val"
done
echo retrieve the number following nw and place it in $nw
echo retrieve the number following a and place it in $a
done
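Another option, if you'd rather avoid both eval and external commands, is to rely on parameter expansion alone. The following is a minimal sketch that should work in any POSIX shell. The function name parse_pair is made up, and it assumes each word has exactly the nw=<digits>.a=<digits> shape from the question.

```shell
#!/bin/sh
ldir="nw=64.a=2 nw=132.a=3 nw=4949.a=30"

# parse_pair WORD: set n and a from a word shaped like nw=<digits>.a=<digits>.
# Hypothetical helper; returns non-zero if WORD does not have that shape.
parse_pair() {
    case $1 in
        nw=*.a=*) ;;
        *) return 1 ;;
    esac
    n=${1#nw=}      # drop the leading "nw="        (nw=64.a=2 -> 64.a=2)
    n=${n%%.*}      # drop everything from the dot  (64.a=2 -> 64)
    a=${1##*.a=}    # drop everything up to ".a="   (nw=64.a=2 -> 2)
    # Reject empty values and anything that is not purely digits.
    case $n in ''|*[!0-9]*) return 1 ;; esac
    case $a in ''|*[!0-9]*) return 1 ;; esac
}

for dir in $ldir; do
    parse_pair "$dir" && echo "n=$n; a=$a"
done
```

Since nothing from the input is ever evaluated as code, this is also fine for input you do not trust.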

shell script: search and replace over multiple lines

I'm looking for a way to search and replace over multiple lines through a shell script. This is what I'm trying to do:
source:
[stuff before]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
[stuff here, possibly multiple lines]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]
target:
[stuff before]
[new content]
[stuff after]
In short, I want to delete the comments and everything between them and replace with some new content. Basically, I want to do a simple sed command over multiple lines, and if possible just using some basic *nix tools, no additional scripting language.
If you only need to match complete lines then you can do this task with awk. Something like:
awk -v NEWTEXT=foo 'BEGIN{n=0} /COMMENT_BEGIN/ {n=1} {if (n==0) {print $0}} /COMMENT_END/ {print NEWTEXT; n=0}' < myfile.txt
If the file is not so well formatted, with comments on the same line as text you want to keep or remove, then I would use perl: read the entire file into a single string, do a regular expression match and replace on that string, then write the new string to a new file. This is not so simple, and you need to write a perl script to do the work.
Something like:
#!/usr/bin/perl
$newtext = "foo\nbar";
undef $/; # no input record separator, so the whole file is read at once
$s = <>; # read the whole input (stdin or the files named on the command line)
$startPattern = quotemeta('<!--WIERD_SPECIAL_COMMENT_BEGIN-->');
$endPattern = quotemeta('<!--WIERD_SPECIAL_COMMENT_END-->');
$pattern = $startPattern . '.+?' . $endPattern; # non-greedy, so each block is matched separately
$s =~ s/$pattern/$newtext/sg;
print $s;
sed does this just fine. The following is as simple as it gets; if you need to extract stuff from the delimiter line before the start delimiter or after the end delimiter, that's going to be a little more complex. Note that this command only deletes the block, markers included; it does not insert the new content.
sed '/<!--WIERD_SPECIAL_COMMENT_BEGIN-->/,/<!--WIERD_SPECIAL_COMMENT_END-->/d' input >output
If you have any control over this, fix the spelling of "weird".
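To also put new content where the block was, one possible variation is to delete every line of the range except the end marker, and then turn the end marker line into the replacement. This is a sketch under the assumption that each marker sits on its own line. The function name replace_block and the hard-coded replacement text are just for illustration.

```shell
#!/bin/sh
# replace_block: read stdin and replace the marked block (markers included)
# with one line of new content. Hypothetical helper name.
replace_block() {
    sed '/<!--WIERD_SPECIAL_COMMENT_BEGIN-->/,/<!--WIERD_SPECIAL_COMMENT_END-->/{
        /<!--WIERD_SPECIAL_COMMENT_END-->/!d
        s/.*/[new content]/
    }'
}

# Usage sketch:
# replace_block < input > output
```

If the replacement text comes from a variable, characters such as /, & and \ would need escaping before being placed in the s command.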
Another solution: this can be done in a one-liner using Perl regular expressions, which I find easier to work with than sed or awk (which are cumbersome for multi-line match and replace):
perl -0 -i -pe 's/<!--WIERD_SPECIAL_COMMENT_BEGIN-->[\s\S]*<!--WIERD_SPECIAL_COMMENT_END-->/your new content here/gim' yourfile1.txt
Please note that this will replace the file with the new, changed content.