How to replace everything except a string of interest (regex)

file.txt
fruits:banana,apple,grape,limon,orange,tomate,
fruits:apple,limon,
fruits:banana,grape,limon,
fruits:orange,tomate,grape,
fruits:banana,
fruits:apple,
fruits:banana,apple,
I need to replace everything other than "banana" with FRUIT, to get output like this:
fruits:banana,FRUIT,FRUIT,FRUIT,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,
fruits:banana,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,FRUIT,
fruits:banana,
fruits:FRUIT,
fruits:banana,FRUIT,
I tried using awk, but I can only replace specific strings.
For example, I can replace every "apple" with FRUIT2, or every "apple" with FRUIT2 and every "tomate" or "orange" with FRUIT3:
awk -F":" '{ gsub(/apple/,"FRUIT2",$2); print }' OFS="," file.txt
or
awk -F":" '{ gsub(/apple/,"FRUIT2",$2); gsub(/tomate|orange/,"FRUIT3",$2); print }' OFS="," file.txt |sed "s/./:/7"
fruits:banana,FRUIT2,grape,limon,FRUIT3,FRUIT3,
fruits:FRUIT2,limon,
fruits:banana,grape,limon,
fruits:FRUIT3,FRUIT3,grape,
fruits:banana,
fruits:FRUIT2,
fruits:banana,FRUIT2,
What I really need is to also replace everything else with another string, e.g. FRUIT4.
How can I generate output like this?
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT3,FRUIT3,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT4,FRUIT4,FRUIT4,
fruits:FRUIT3,FRUIT3,FRUIT4,
fruits:FRUIT4,
fruits:FRUIT2,
fruits:FRUIT4,FRUIT2,

This awk should work:
awk -F, -v OFS=, '{
    for (i=1; i<=NF; i++)
        if ($i !~ /(^|:)banana$/)
            sub(/[^:]+$/, "FRUIT", $i)
} 1' file
Output:
fruits:banana,FRUIT,FRUIT,FRUIT,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,
fruits:banana,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,FRUIT,
fruits:banana,
fruits:FRUIT,
fruits:banana,FRUIT,
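The command above hard-codes banana in the regex. Here is a minimal sketch of the same idea with the kept word passed in as a variable. The names mask_except, keep and repl are made up for this example and are not part of the answer above. It assumes every item is followed by a comma, as in the sample file, and it compares fields as plain strings rather than with a regex:

```shell
# mask_except KEEP [REPL]: replace every non-empty item except KEEP.
# (mask_except, keep and repl are hypothetical names for this sketch.)
mask_except() {
    awk -F'[:,]' -v keep="$1" -v repl="${2:-FRUIT}" '
    {
        out = $1 ":"                      # the "fruits" prefix stays as-is
        for (i = 2; i <= NF; i++) {
            if ($i == "") continue        # empty field after the trailing comma
            out = out ($i == keep ? $i : repl) ","
        }
        print out
    }'
}

printf 'fruits:banana,apple,grape,\nfruits:apple,\n' | mask_except banana
```

Because the comparison is a string equality test, a word such as bananas would not be kept by accident.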

To automate the mapping, so that each distinct name gets its own numbered label, you can do:
awk -F '[:,]' -v OFS=, '
{
    for (i=2; i<=NF; i++)
        if ($i)
            if (seen[$i])
                $i = seen[$i]
            else
                $i = seen[$i] = "FRUIT" ++n
    sub(OFS, ":")
    print
}
END {
    print "map:"
    for (key in seen)
        print key "\t" seen[key]
}
' file
fruits:FRUIT1,FRUIT2,FRUIT3,FRUIT4,FRUIT5,FRUIT6,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT1,FRUIT3,FRUIT4,
fruits:FRUIT5,FRUIT6,FRUIT3,
fruits:FRUIT1,
fruits:FRUIT2,
fruits:FRUIT1,FRUIT2,
map:
orange FRUIT5
tomate FRUIT6
apple FRUIT2
limon FRUIT4
banana FRUIT1
grape FRUIT3

If you'd like some flexibility in being able to specify your mapping of old to new names on the command line:
$ cat tst.awk
BEGIN {
    FS="[:,]"; OFS=","
    split(map,t)
    for (i=1; i in t; i+=2) {
        m[t[i]] = t[i+1]
    }
}
{
    printf "%s:", $1
    for (i=2;i<=NF;i++) {
        if ($i in m ) { $i = m[$i] }
        else if ("*" in m) { $i = m["*"] }
        printf "%s%s", $i, (i<NF?OFS:ORS)
    }
}
.
$ awk -v map='apple,FRUIT2,tomate,FRUIT3,*,FRUIT4' -f tst.awk file
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT4,FRUIT3,FRUIT4
fruits:FRUIT2,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT3,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4
fruits:FRUIT2,FRUIT4
fruits:FRUIT4,FRUIT2,FRUIT4
$ awk -v map='apple,BAZINGA,*,VEGGIE' -f tst.awk file
fruits:VEGGIE,BAZINGA,VEGGIE,VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE
fruits:VEGGIE,BAZINGA,VEGGIE
$ awk -v map='apple,FRUIT2,tomate,FRUIT3' -f tst.awk file
fruits:banana,FRUIT2,grape,limon,orange,FRUIT3,
fruits:FRUIT2,limon,
fruits:banana,grape,limon,
fruits:orange,FRUIT3,grape,
fruits:banana,
fruits:FRUIT2,
fruits:banana,FRUIT2,
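Note that in the runs with a * entry, the empty field after each line's trailing comma is also mapped to the wildcard. That is why those lines end in an extra FRUIT4 or VEGGIE instead of a comma. If that is not wanted, here is a hedged variant of the same loop that leaves empty fields alone. The wrapper name remap is an assumption for illustration, and the map format is the same as above:

```shell
# remap MAP: MAP is old1,new1,old2,new2,... with "*" as the catch-all.
# (remap is a hypothetical wrapper name for this sketch.)
remap() {
    awk -v map="$1" '
    BEGIN {
        FS = "[:,]"; OFS = ","
        n = split(map, t, ",")
        for (i = 1; i < n; i += 2)
            m[t[i]] = t[i+1]
    }
    {
        printf "%s:", $1
        for (i = 2; i <= NF; i++) {
            if ($i != "") {                       # skip the empty trailing field
                if ($i in m)        $i = m[$i]
                else if ("*" in m)  $i = m["*"]
            }
            printf "%s%s", $i, (i < NF ? OFS : ORS)
        }
    }'
}

printf 'fruits:banana,apple,tomate,\n' | remap 'apple,FRUIT2,tomate,FRUIT3,*,FRUIT4'
```

With this guard the trailing comma of each input line survives, whether or not a * entry is present.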

Related

How do I detect embedded field names and reorder fields using awk?

I have the following data:
"b":1.14105,"a":1.14106,"x":48,"t":1594771200000
"a":1.141,"b":1.14099,"x":48,"t":1594771206000
...
I am trying to display the data in a given order and only for three fields. As the field order is not guaranteed, I need to read the "tag" of each comma-separated column on each line.
I have tried to solve this task using awk:
awk -F',' '
{
for(i=1; i<=$NF; i++) {
if(index($i,"\"a\":")!=0) a=$i;
if(index($i,"\"b\":")!=0) b=$i;
if(index($i,"\"t\":")!=0) t=$i;
}
printf("%s,%s,%s\n",a,b,t);
}
'
But I get:
,,
,,
...
In the above data sample, I would expect:
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
...
Note: I am using the awk shipped with FreeBSD
$ cat tst.awk
BEGIN {
    FS = "[,:]"
    OFS = ","
}
{
    for (i=1; i<NF; i+=2) {
        f[$i] = $(i+1)
    }
    print p("a"), p("b"), p("t")
}
function p(tag, t) {
    t = "\"" tag "\""
    return t ":" f[t]
}
.
$ awk -f tst.awk file
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
With awk and an array:
awk -F '[:,]' '{for(i=1; i<=NF; i=i+2){a[$i]=$(i+1)}; print "\"a\":" a["\"a\""] ",\"b\":" a["\"b\""] ",\"t\":" a["\"t\""]}' file
or
awk -F '[":,]' '{for(i=2; i<=NF; i=i+4){a[$i]=$(i+2)}; print "\"a\":" a["a"] ",\"b\":" a["b"] ",\"t\":" a["t"]}' file
Output:
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
A similar awk script where you can specify the fields and their order:
$ awk -F'[:,]' -v fields='"a","b","t"' 'BEGIN{n=split(fields,f)}
{for(i=1;i<NF;i+=2) map[$i]=$(i+1);
for(i=1;i<=n;i++) printf "%s", f[i]":"map[f[i]] (i==n?ORS:",")}' file
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000

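For what it's worth, the attempt in the question prints only commas because its loop condition uses $NF (the value of the last field) where NF (the number of fields) was meant. With this data that comparison is already false on the first iteration, so the loop body never runs and a, b and t stay empty. Below is a sketch of the original approach with that fix, plus a per-line reset of the variables. The function name pick_abt is made up for this example:

```shell
# pick_abt: print the "a", "b" and "t" columns of each input line, in that order.
# (pick_abt is a hypothetical name for this sketch.)
pick_abt() {
    awk -F',' '
    {
        a = b = t = ""                      # forget values from the previous line
        for (i = 1; i <= NF; i++) {         # NF, not $NF
            if (index($i, "\"a\":") != 0) a = $i
            if (index($i, "\"b\":") != 0) b = $i
            if (index($i, "\"t\":") != 0) t = $i
        }
        printf("%s,%s,%s\n", a, b, t)
    }'
}

printf '%s\n' '"b":1.14105,"a":1.14106,"x":48,"t":1594771200000' | pick_abt
```
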
awk regex for nested curly braces

I have a structure like this:
label1 {
label1_1 {
item1_1_1: "value1_1_1";
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
item1_1_3: "value1_1_3";
};
label1_2 {...};
...
};
label2 {
item2_1: "value2_1";
label2_1:{
item2_1_1: "value2_1_1";
...
};
};
A section may be on one line or span several lines, and may contain empty lines. I'm trying to use awk to get the section with a given label name:
section=$(awk -v RS='' -v ORS='\n\n' "/($2)\s(\{([^{}]|(?R)|\n)*\})/" $1)
where $1 is the file name and $2 is the label name. It works when the section contains no empty lines, as with "label2", but fails for the others.
What's the correct regex I should use?
Here's one way to do what you want, assuming neither { nor } can occur within quoted strings and using GNU awk 4.* for a couple of extensions:
$ cat tst.awk
BEGIN { RS="^$" }
{
    tmp = $0
    while ( match(tmp,/(\<([[:alnum:]_]+):?\s*{[^{}]+};)/,a) ) {
        start[a[2]] = RSTART
        lgth[a[2]] = RLENGTH
        tmp = substr(tmp,1,RSTART-1) sprintf("%*s",length(a[1]),"") substr(tmp,RSTART+RLENGTH)
    }
}
label in start { print substr($0,start[label],lgth[label]) }
.
$ awk -v label='label2' -f tst.awk file
label2 {
item2_1: "value2_1";
label2_1:{
item2_1_1: "value2_1_1";
...
};
};
$ awk -v label='label1_1' -f tst.awk file
label1_1 {
item1_1_1: "value1_1_1";
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
item1_1_3: "value1_1_3";
};
$ awk -v label='label1_1_2' -f tst.awk file
label1_1_2:{ item1_1_2_1: "value1_1_2_1";};
You can call awk as either awk -f scriptfile inputfile or awk 'script' inputfile, so using the above script inline instead of storing it in a file is just:
awk -v label='label2' '
BEGIN { RS="^$" }
{
    tmp = $0
    while ( match(tmp,/(\<([[:alnum:]_]+):?\s*{[^{}]+};)/,a) ) {
        start[a[2]] = RSTART
        lgth[a[2]] = RLENGTH
        tmp = substr(tmp,1,RSTART-1) sprintf("%*s",length(a[1]),"") substr(tmp,RSTART+RLENGTH)
    }
}
label in start { print substr($0,start[label],lgth[label]) }
' file
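If GNU awk is not available, one possible alternative is to skip the nested regex altogether and count brace depth character by character. The following is only a sketch under the same assumption as above (no { or } inside quoted strings), and it further assumes the label is a plain word with no regex metacharacters. The function name section is hypothetical:

```shell
# section LABEL FILE: print the first block introduced by LABEL, braces balanced.
# (section is a hypothetical name; LABEL is assumed to be a plain word.)
section() {
    awk -v label="$1" '
    { buf = buf $0 "\n" }                   # slurp the file, keeping newlines
    END {
        # label, optional colon, optional white space, then the opening brace
        re = "(^|[^A-Za-z0-9_])" label ":?[ \t\n]*[{]"
        if (!match(buf, re)) exit 1
        start = RSTART
        if (substr(buf, start, length(label)) != label)
            start++                         # skip the non-word char before the label
        depth = 0
        for (p = RSTART + RLENGTH - 1; p <= length(buf); p++) {
            c = substr(buf, p, 1)
            if (c == "{") depth++
            else if (c == "}" && --depth == 0) break
        }
        stop = p
        if (substr(buf, stop + 1, 1) == ";") stop++   # include the closing ";"
        print substr(buf, start, stop - start + 1)
    }' "$2"
}
```

It would be called as section label1_1 file. Empty lines inside a section are not a problem here, because the whole file is scanned as a single string.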

How to replace pipe delimiters with white space in a specific range of columns?

I have used awk and sed to replace pipes with white space. Here's my code:
awk -F "|" -v OFS=" " ' $1=$1 '
sed "s/|/ /g" try.log
But it replaces all the pipes in my data. Here's some sample data:
JAP|09|7777|TECHNOLOGY|AGRICULTURE|INDUSTRY
The result I want is this:
JAP 09 7777|TECHNOLOGY|AGRICULTURE|INDUSTRY
Thanks in advance.
Perl solution:
perl -pe '$c = 0; s/\|/ /, $c++ while $c < 2'
This might work for you (GNU sed):
sed 's/|/ /;s// /' file
A programmatic solution might be:
sed 'y/|/\n/;s/\n/|/3g;y/\n/ /' file
Try this one liner awk:
awk -F"|" '{a=""; for (i=1;i<=NF; i++)if(i <=3) a=a " "$i; else a=a "|"$i; sub("^[ ]","",a); print a}'
Long format:
BEGIN {
    FS="|";
}
{
    a="";
    for (i=1; i<= NF; i++)
    {
        if (i <= 3)
        {
            a=a " "$i;
        }
        else
        {
            a=a "|"$i;
        }
    }
    sub("^[ ]","",a);
    print a;
}
Output:
echo "JAP|09|7777|TECHNOLOGY|AGRICULTURE|INDUSTRY" | awk -f script.awk
JAP 09 7777|TECHNOLOGY|AGRICULTURE|INDUSTRY
awk '{sub(/\|09\|/," 09 ")}1' file
JAP 09 7777|TECHNOLOGY|AGRICULTURE|INDUSTRY
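The answers above all hard-code "the first two pipes" in one way or another. As a small sketch, the count can be made a parameter. The function name pipes_to_spaces and the variable n are assumptions for this example:

```shell
# pipes_to_spaces N: turn the first N pipes of each stdin line into spaces.
# (pipes_to_spaces and n are hypothetical names for this sketch.)
pipes_to_spaces() {
    awk -F'|' -v n="$1" '
    {
        out = $1
        for (i = 2; i <= NF; i++)
            out = out (i <= n + 1 ? " " : "|") $i   # separator placed before field i
        print out
    }'
}

echo "JAP|09|7777|TECHNOLOGY|AGRICULTURE|INDUSTRY" | pipes_to_spaces 2
```

Rebuilding the line field by field keeps the remaining pipes untouched, and it does not depend on the actual values in the data.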

Awk BEGIN example

$ cat tables.txt | awk 'BEGIN {
RS="\nStation"
FS="\n"
}
{ print $1 }
'
Running the above command in the above format or as a script gives me the desired output.
08594: SAL , CAPE VERDE
But if I try running the same thing on the command line as a single line, I get a syntax error. What am I doing wrong here?
$ awk 'BEGIN { RS="\nStation" FS="\n" }{ print $1 }' tables.txt
You can use:
awk 'BEGIN { RS="\nStation"; FS="\n" }{ print $1 }' tables.txt
i.e. use ; to terminate one assignment (RS="\nStation") before starting the next one (FS="\n").
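To illustrate the rule on neutral data: inside an awk block, a newline or a semicolon ends a statement, so the multi-line form needs no semicolons while the one-line form does. The two function names below are made up for this sketch, and FS/OFS stand in for the RS/FS assignments of the question:

```shell
# Same program written two ways; both join the colon-separated fields with "-".
# (one_line and multi_line are hypothetical names for this sketch.)
one_line() {
    awk 'BEGIN { FS=":"; OFS="-" } { $1=$1; print }'
}

multi_line() {
    awk 'BEGIN {
        FS=":"
        OFS="-"
    }
    { $1=$1; print }'
}

echo 'a:b:c' | one_line
echo 'a:b:c' | multi_line
```
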

using Regex in AWK seems to not find pattern

Hi, I am trying to split the following string into fields, to no avail:
echo '[xxAA][xxBxx][C]' | awk -F '/\[.*\]/' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
I basically want each field to be the contents of one pair of brackets, such that
field 1 = xxAA
field 2 = xxBxx
field 3 = C
but I keep getting the following result:
-->[xxAA][xxBxx][C]<--
Any pointers on where I am going wrong?
You can use a regex as the field separator. We put [ and ] each in a bracket expression so that they are treated as literals, and join the two with |, which is alternation (logical OR). Since both brackets act as field separators, the wanted values land in the even-numbered fields, so we iterate over just those.
$ echo '[xxAA][xxBxx][C]' | awk -v FS="[]]|[[]" '{ for (i=2;i<=NF;i+=2) print $i }'
xxAA
xxBxx
C
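In -F '/\[.*\]/' the slashes are not regex delimiters but literal characters of the separator, so it never matches this input and the whole line stays a single field. Even without the slashes, \[.*\] would match the entire input, because the greedy .* matches the ][ inside the input as well as the letters.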
You could split fields on the ']' character instead, then put it back again in the output:
echo '[xxAA][xxBxx][C]' | awk -F ']' '{ for (i = 1; i <= NF; i++) if ($i != "") printf "-->%s]<--\n", $i }'
This is a job for GNU awk's FPAT variable which lets you specify the pattern of the fields rather than the pattern of the field separators:
$ echo '[xxAA][xxBxx][C]' | awk -v FPAT='[^][]+' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--
With other awks I'd use:
$ echo '[xxAA][xxBxx][C]' | awk -F'\\]\\[' '{ gsub(/^\[|\]$/,""); for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--
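Another option that does not rely on field splitting at all is to pull out one bracketed item at a time with match(). This is a sketch that should run in any POSIX awk. The function name bracket_items is hypothetical:

```shell
# bracket_items: print the contents of every [...] group on each stdin line.
# (bracket_items is a hypothetical name for this sketch.)
bracket_items() {
    awk '{
        s = $0
        # [^][]* is any run of characters that are neither ] nor [
        while (match(s, /\[[^][]*\]/)) {
            printf "-->%s<--\n", substr(s, RSTART + 1, RLENGTH - 2)
            s = substr(s, RSTART + RLENGTH)     # continue after this group
        }
    }'
}

echo '[xxAA][xxBxx][C]' | bracket_items
```

Because the bracket expression excludes both brackets, each match stops at the first closing ], which avoids the greedy .* problem described above.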