replace patterns in a text file with an update file - regex

I have these two CSV files:
old.csv
station,32145,80
station,32145,60
new.csv
station,32145,80
station,32145,801
Expected result:
result.csv
station,32145,80,no change
station,32145,801,new
station,32145,60,Delete
I have used diff and awk to do the job, but I have a slight issue. The row with no change and the deleted row come out correctly, but the new one does not. Can anyone show me where my mistake is?
diff -W999 --side-by-side old.csv new.csv |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" No Change"};/[\t] *<$/{split($0,a,"[|][\t]* *<$");print a[1]" Delete"};/>[\t]/{split($0,a,">[\t]");print a[2]" New"}'

This should work:
awk -F, '
NR==FNR && NF {a[$0]++; next}
NF {print (($0 in a) ? $0",no change" : $0",new"); delete a[$0]}
END {for (x in a) print x",delete"}' old.csv new.csv
Output:
station,32145,80,no change
station,32145,801,new
station,32145,60,delete
Update based on comments: to handle random . characters in the second column:
awk 'BEGIN{FS=OFS=","}
NR==FNR {gsub(/[.]/,"",$2); a[$0]++; next}
NF {gsub(/[.]/,"",$2); print (($0 in a) ? $0",no change" : $0",new"); delete a[$0]}
END {for (x in a) print x",delete"}' old.csv new.csv
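In both snippets, NR==FNR is the standard awk two-file idiom: NR counts records across all inputs while FNR restarts for each file, so the test is true only while the first file is being read. A minimal sketch of the idiom on its own (file names are placeholders):
awk 'NR==FNR {seen[$0]; next}          # index every line of the first file
($0 in seen) {print "in both:", $0}' first.csv second.csv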

Code for awk:
For input without trailing commas:
awk -v OFS="," 'NR==FNR {a[$0]=$0;next};{b[$0]=$0};$0==a[$0] {print $0, "no change"};a[$0]==0 {print $0, "new"};END {for (x in a) {if (b[x]==0) {print a[x], "Delete"}}}' old new
For input lines that already end with a trailing comma (an empty OFS lets the comma already in the data supply the separator):
awk -v OFS="" '
NR==FNR {a[$0]=$0; next}
{b[$0]=$0}
$0==a[$0] {print $0, "no change"}
a[$0]==0 {print $0, "new"}
END {for (x in a) if (b[x]==0) print a[x], "Delete"}' old new
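To try either snippet on the sample data without creating files, process substitution works (assuming a bash shell):
awk -F, 'NR==FNR && NF {a[$0]++; next}
NF {print (($0 in a) ? $0",no change" : $0",new"); delete a[$0]}
END {for (x in a) print x",delete"}' \
  <(printf 'station,32145,80\nstation,32145,60\n') \
  <(printf 'station,32145,80\nstation,32145,801\n')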

How to print a line with no field separator in awk?

I have data like this (file is called list-in.dat)
a ; b ; c ; i
d
e ; f ; a ; b
g ; h ; i
and I want a list like this (output file list-out.dat) with all items, in alphabetical order (case-insensitive), and each unique item only once.
a
b
c
d
e
f
g
h
i
My attempt is:
awk -F " ; " ' BEGIN { OFS="\n" ; } {for(i=0; i<=NF; i++) print $i} ' file-in.dat | uniq | sort -uf > file-out.dat
But I end up with all entries except those from the lines which have only one item:
a
b
c
e
f
g
h
i
How can I get all (unique, sorted) items no matter how many items are in one line / if the field separator is missing?
Using gnu-awk:
awk -F '[[:blank:]]*;[[:blank:]]*' '{
for (i=1; i<=NF; i++) uniq[$i]
}
END {
PROCINFO["sorted_in"]="#ind_str_asc"
for (i in uniq)
print i
}' file
a
b
c
d
e
f
g
h
i
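PROCINFO["sorted_in"] is a gawk-only hook that controls the traversal order of for (i in array); "#ind_str_asc" walks the indices in ascending string order. Since the question asks for case-insensitive ordering, note that gawk also accepts the name of a user-defined comparison function there; a minimal sketch, gawk only:
awk -F '[[:blank:]]*;[[:blank:]]*' '
function ci_cmp(i1, v1, i2, v2,    a, b) {   # compare array indices case-insensitively
    a = tolower(i1); b = tolower(i2)
    return (a < b) ? -1 : (a > b)
}
{ for (i=1; i<=NF; i++) uniq[$i] }
END {
    PROCINFO["sorted_in"] = "ci_cmp"
    for (i in uniq) print i
}' file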
For non-gnu awk use:
awk -F '[[:blank:]]*;[[:blank:]]*' '{for (i=1; i<=NF; i++) uniq[$i]}
END{for (i in uniq) print i}' file | sort
awk -F' ; ' -v OFS='\n' '{$1=$1} 1' ip.txt | sort -fu
-F' ; ' sets space-semicolon-space as the field separator
-v OFS='\n' sets newline as the output field separator
{$1=$1} forces $0 to be rebuilt with the new OFS
1 prints $0
sort -fu sorts uniquely, ignoring case
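The rebuild step is easy to see in isolation: assigning to any field (even to itself) makes awk reassemble $0 using the current OFS:
$ printf 'a ; b ; c ; i\n' | awk -F' ; ' -v OFS='\n' '{$1=$1} 1'
a
b
c
i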
Could you please try the following awk + sort solution, written and tested with the shown samples. In case you want to ignore case, add IGNORECASE=1 (GNU awk only) to the awk code.
awk '
BEGIN{
FS=" ; "
}
{
for(i=1;i<=NF;i++){
if(!a[$i]++){ print $i }
}
}
' Input_file | sort
Explanation: a detailed breakdown of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=" ; " ##Setting field separator as space semi-colon space here.
}
{
for(i=1;i<=NF;i++){ ##Starting a for loop till NF here for each line.
if(!a[$i]++){ print $i } ##Checking condition if current field is NOT present in array a then printing that field value here.
}
}
' Input_file | sort ##Mentioning Input_file name here and passing it to sort as Input to sort the data.
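The !a[$i]++ test is the usual awk first-occurrence filter: the postfix ++ makes the expression true only the first time a given value is seen. Applied to whole lines it is the classic dedupe one-liner:
$ printf 'x\ny\nx\n' | awk '!seen[$0]++'
x
y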

how to replace everything except string of interest

file.txt
fruits:banana,apple,grape,limon,orange,tomate,
fruits:apple,limon,
fruits:banana,grape,limon,
fruits:orange,tomate,grape,
fruits:banana,
fruits:apple,
fruits:banana,apple,
I need to replace everything different from "banana" with FRUIT, and get output like this:
fruits:banana,FRUIT,FRUIT,FRUIT,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,
fruits:banana,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,FRUIT,
fruits:banana,
fruits:FRUIT,
fruits:FRUIT,apple,
I tried using awk, but I can only replace specific, known strings.
For example, replacing every "apple" with FRUIT2, or every "apple" with FRUIT2 and every "tomate" or "orange" with FRUIT3:
awk -F":" '{ gsub(/apple/,"FRUIT2",$2); print }' OFS="," file.tx
or
awk -F":" '{ gsub(/apple/,"FRUIT2",$2);;gsub(/tomate|orange/,"FRUIT3",$2); print }' OFS="," file.txt |sed "s/./:/7"
fruits:banana,FRUIT2,grape,limon,FRUIT3,FRUIT3,
fruits:FRUIT2,limon,
fruits:banana,grape,limon,
fruits:FRUIT3,FRUIT3,grape,
fruits:banana,
fruits:FRUIT2,
fruits:banana,FRUIT2
But what I really need is to replace everything else, whatever string it is, with e.g. FRUIT4.
How can I generate output like this?
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT3,FRUIT3,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT4,FRUIT4,FRUIT4,
fruits:FRUIT3,FRUIT3,FRUIT4,
fruits:FRUIT4,
fruits:FRUIT2,
fruits:FRUIT4,FRUIT2
This awk should work. With , as the field separator the first field is "fruits:banana", so the (^|:) alternation allows for the fruits: prefix, and sub(/[^:]+$/, "FRUIT", $i) replaces only the text after the colon:
awk -F, -v OFS=, '{
for (i=1; i<=NF; i++)
if ($i !~ /(^|:)banana$/)
sub(/[^:]+$/, "FRUIT", $i)
} 1' file
Output:
fruits:banana,FRUIT,FRUIT,FRUIT,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,
fruits:banana,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,FRUIT,
fruits:banana,
fruits:FRUIT,
fruits:banana,FRUIT,
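If more than one name should be kept unchanged, the same pattern extends with alternation in the regex; a sketch that keeps both banana and apple (tested only against the shown sample):
awk -F, -v OFS=, '{
for (i=1; i<=NF; i++)
if ($i !~ /(^|:)(banana|apple)$/)   # names listed here are left as-is
sub(/[^:]+$/, "FRUIT", $i)
} 1' file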
To make the process automated, you can do
awk -F '[:,]' -v OFS=, '
{
    for (i=2; i<=NF; i++)
        if ($i)
            if (seen[$i])
                $i = seen[$i]                  # reuse the name already assigned
            else
                $i = seen[$i] = "FRUIT" ++n    # first sighting gets the next number
    sub(OFS, ":")                              # restore the ":" after the first field
    print
}
END {
    print "map:"
    for (key in seen)
        print key "\t" seen[key]
}
' file
fruits:FRUIT1,FRUIT2,FRUIT3,FRUIT4,FRUIT5,FRUIT6,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT1,FRUIT3,FRUIT4,
fruits:FRUIT5,FRUIT6,FRUIT3,
fruits:FRUIT1,
fruits:FRUIT2,
fruits:FRUIT1,FRUIT2,
map:
orange FRUIT5
tomate FRUIT6
apple FRUIT2
limon FRUIT4
banana FRUIT1
grape FRUIT3
If you'd like some flexibility in being able to specify your mapping of old to new names on the command line:
$ cat tst.awk
BEGIN {
    FS="[:,]"; OFS=","
    split(map,t)                  # map is a flat "old1,new1,old2,new2,..." list
    for (i=1; i in t; i+=2) {
        m[t[i]] = t[i+1]          # build the old-name -> new-name lookup
    }
}
{
    printf "%s:", $1
    for (i=2;i<=NF;i++) {
        if ($i in m) { $i = m[$i] }
        else if ("*" in m) { $i = m["*"] }    # "*" acts as a catch-all default
        printf "%s%s", $i, (i<NF?OFS:ORS)
    }
}
$ awk -v map='apple,FRUIT2,tomate,FRUIT3,*,FRUIT4' -f tst.awk file
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT4,FRUIT3,FRUIT4
fruits:FRUIT2,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT3,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4
fruits:FRUIT2,FRUIT4
fruits:FRUIT4,FRUIT2,FRUIT4
$ awk -v map='apple,BAZINGA,*,VEGGIE' -f tst.awk file
fruits:VEGGIE,BAZINGA,VEGGIE,VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE
fruits:VEGGIE,BAZINGA,VEGGIE
$ awk -v map='apple,FRUIT2,tomate,FRUIT3' -f tst.awk file
fruits:banana,FRUIT2,grape,limon,orange,FRUIT3,
fruits:FRUIT2,limon,
fruits:banana,grape,limon,
fruits:orange,FRUIT3,grape,
fruits:banana,
fruits:FRUIT2,
fruits:banana,FRUIT2,

Parse IP and Download-Total from mikrotik

I want to extract the IP and download total from the MikroTik command /queue simple print stat.
Here's an example:
0 name="101" target=192.168.10.101/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=17574842/389197663 total-bytes=0 packets=191226/308561
total-packets=0 dropped=9/5899 total-dropped=0
1 name="102" target=192.168.10.102/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=65593392/183786457 total-bytes=0 packets=163260/166022
total-packets=0 dropped=175/2403 total-dropped=0
2 name="103" target=192.168.10.103/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=3263234/67407044 total-bytes=0 packets=41437/52602
total-packets=0 dropped=0/546 total-dropped=0
All I need is:
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044
But I get
target=192.168.10.101/32
bytes=17574842/389197663
target=192.168.10.102/32
bytes=65593392/183786457
target=192.168.10.103/32
bytes=3263234/67407044
I tried it with grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' | sed 's/^ //g'.
So, how can I parse it? Sorry for my bad English.
Just continue your line of parsing with more pipes (the easiest way, I think):
grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' file | sed 's/^ //g' | sed -r 's/target=([^/]*)[/].*/\1/; s/bytes=[^/]*[/]//' | sed 'N; s/\n/ /'
Output:
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044
sed '/^[0-9]\{1,\}[[:blank:]]\{1,\}name/,/^[[:blank:]]*$/ {
/^[0-9]/{
s#.*target=\([^/]*\).*#\1#;h;d
}
\#^[[:blank:]]*bytes=[0-9]*/\([0-9]*\).*# !d
s//\1/
G
s/\(.*\)\n\(.*\)/\2 \1/p
}
d
' YourFile
A bit long, but it does the job in one sed.
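The h and G commands are sed's hold-space round trip, which this script relies on: h saves a copy of the pattern space (the extracted IP), and G later appends the held line after a newline so both pieces can be rearranged by one s///. A minimal illustration (GNU sed):
$ printf 'one\ntwo\n' | sed -n '1h; 2{G;p}'
two
one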
awk '{
if ( $3 ~ /target=/ ) split( $3, aIP, "[=/]")      # remember the IP from the header line
if ( $1 ~ /^[[:blank:]]*bytes=[0-9]*/ ) {          # the bytes= line carries the up/down totals
split( $1, aByt, "/")
print aIP[2] " " aByt[2]                           # IP plus the download half
}
}' YourFile
The same job in awk.
And if the structure is always exactly the same:
awk 'BEGIN{ RS="" }
{ split( $3, aIP, "[=/]"); split( $12, aByt, "/")
print aIP[2] " " aByt[2]
}' YourFile
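RS="" switches awk to paragraph mode: each block separated by blank lines becomes a single record, and fields are numbered straight across the embedded newlines, which is why target= lands in $3 and the bytes= pair in $12 here. Assuming the real MikroTik output does separate entries with blank lines (paragraph mode needs them), this dumps the field numbers of the first record so you can check the layout:
awk 'BEGIN{RS=""} NR==1 {for (i=1; i<=NF; i++) print i, $i}' YourFile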

Command line to match lines with matching first field (sed, awk, etc.)

What is a fast and succinct way to select the lines of a text file whose first field matches the first field of another line?
Sample input:
a|lorem
b|ipsum
b|dolor
c|sit
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output:
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output, alternative:
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
I can imagine many ways to write this, but I suspect there's a smart way to do it, e.g., with sed, awk, etc. My source file is approx 0.5 GB.
There are some related questions here, e.g., "awk | merge line on the basis of field matching", but that other question loads too much content into memory. I need a streaming method.
Here's a method where you only have to remember the previous line (it therefore requires the input file to be sorted by the first field):
awk -F \| '
$1 == prev_key {print prev_line; matches ++}
$1 != prev_key {
if (matches) print prev_line
matches = 0
prev_key = $1
}
{prev_line = $0}
END { if (matches) print $0 }
' filename
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
Alternate output
awk -F \| '
$1 == prev_key {
if (matches == 0) printf "%s", $1
printf "%s%s", FS, prev_value
matches ++
}
$1 != prev_key {
if (matches) printf "%s%s\n", FS, prev_value
matches = 0
prev_key = $1
}
{prev_value = $2}
END {if (matches) printf "%s%s\n", FS, $2}
' filename
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
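Both variants assume the input is already grouped by its first field. If it is not, a pre-sort keeps the whole pipeline streaming, since sort(1) spills to temporary files instead of holding the 0.5 GB in memory (group.awk is a hypothetical file containing either script above):
sort -t '|' -k1,1 filename | awk -F \| -f group.awk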
For fixed-width keys you can use uniq (-w 1 compares only the first character here, and -D prints all duplicate lines):
$ uniq -Dw 1 file
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
If you don't have fixed width fields here are two awk solution:
awk -F'|' '{a[$1]++;b[$1]=(b[$1])?b[$1]RS$0:$0}END{for(k in a)if(a[k]>1)print b[k]}' file
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
awk -F'|' '{a[$1]++;b[$1]=b[$1]FS$2}END{for(k in a)if(a[k]>1)print k b[k]}' file
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
Using awk:
awk -F '|' '!($1 in a){a[$1]=$2; next} $1 in a{b[$1]=b[$1] FS a[$1] FS $2}
END{for(i in b) print i b[i]}' file
d|amet|consectetur
e|adipisicing|elit
b|ipsum|dolor
This might work for you (GNU sed):
sed -r ':a;$!N;s/^(([^|]*\|).*)\n\2/\1|/;ta;/^([^\n|]*\|){2,}/P;D' file
This reads 2 lines into the pattern space, then checks whether the keys in both lines are the same. If so, it removes the second key and repeats. If not, it checks whether more than two fields exist in the first line; if so it prints the line and then deletes it, otherwise it just deletes the first line.
$ awk -F'|' '$1 == prev {rec = rec RS $0; size++; next} {if (size>1) print rec; rec=$0; size=1} {prev = $1} END{if (size>1) print rec}' file
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
$ awk -F'|' '$1 == prev {rec = rec FS $2; size++; next} {if (size>1) print rec; rec=$0; size=1} {prev = $1} END{if (size>1) print rec}' file
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit

compare fields after removing spaces in two files with Regex tools

I am new to awk and need a statement to compare two fields in the files below.
The columns are comma-separated.
1.csv
_________
1space, aspace
2,b
space3space,c
2.csv
____________
1space,spacea
space2,bspace
3,spacecspace
The statement below works fine if there are no leading or trailing spaces in the fields of either file:
nawk -F, 'NR==FNR{a[$1,$2]++;next} !(a[$1,$2])' 2.tsv 1.tsv
Kindly let me know how we can modify the above statement to trim leading and trailing spaces and then compare. Thanks for the help.
awk -F, '
{ key=$1; gsub(/^[[:space:]]+|[[:space:]]+$/,"",key) }
NR==FNR { a[key]; next }
!(key in a)
' 2.tsv 1.tsv
Do this:
awk '
BEGIN {FS=OFS=","}
NR==FNR {
gsub(/^ *| *$/,"",$1)
a[$1]++
next
}
{
gsub(/^ *| *$/,"",$1);
if (!($1 in a)) {
print
}
}' 2.tsv 1.tsv
Code for GNU sed (the first sed turns file1 into a generated sed script, which is then applied to file2):
sed -r 's#\s*(\S+)\s*,\s*(\S+)\s*#/\\s*\1\\s*,\\s*\2\\s*/p#' file1|sed -f - file2
$ cat file1
1 , a
2,b
3 ,c
$ cat file2
1 ,a
2,b
3,c
$ sed -r 's#\s*(\S+)\s*,\s*(\S+)\s*#/\\s*\1\\s*,\\s*\2\\s*/d#' file1|sed -nf - file2
You need to trim all the spaces from $1 before trying to locate it in array a:
awk -F"," 'NR==FNR{$1=$1;a[$1]++;next} {f1=$1; gsub(/ /, "", f1);
if (!a[f1]) print}' 2.tsv 1.tsv
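For completeness, here is a sketch that trims both fields on both sides and compares the ($1,$2) pair, as the original nawk attempt did (assuming the two-column layout shown):
awk -F, '
function trim(s) { gsub(/^[[:space:]]+|[[:space:]]+$/, "", s); return s }
NR==FNR { a[trim($1) SUBSEP trim($2)]; next }    # index 2.tsv by its trimmed pairs
!(trim($1) SUBSEP trim($2) in a)                 # print 1.tsv lines with no match
' 2.tsv 1.tsv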