Tokenize and capture with sed

Tokenize and capture with sed - regex

Suppose we have a string like
"dir1|file1|dir2|file2"
and would like to turn it into
"-f dir1/file1 -f dir2/file2"
Is there an elegant way to do this with sed or awk for a general case of n > 2?
My attempt was to try
echo "dir1|file1|dir2|file2" | sed 's/\(\([^|]\)|\)*/-f \2\/\4 -f \6\/\8/'

An awk solution:
awk -F'|' '{ for (i=1;i<=NF;i+=2) printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") }' \
<<<"dir1|file1|dir2|file2"
-F'|' splits the input into fields by |
for (i=1;i<=NF;i+=2) loops over the field indices in increments of 2
printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") prints pairs of consecutive fields joined with / and prefixed with -f<space>
((i==NF-1) ? "\n" : " ") terminates each field-pair either with a space, if more fields follow, or a \n to terminate the overall output.
In a comment, the OP suggests a shorter variation, which may be of interest if you don't need/want the output to be \n-terminated:
awk -F'|' '{ for (i=1;i<=NF;++i) printf "%s", (i%2 ? " -f " $i : "/" $i ) }' \
<<<"dir1|file1|dir2|file2"

This might work for you (GNU sed):
sed 's/\([^|]*\)|\([^|]*\)|\?/-f \1\/\2 /g;s/ $//' file
This will work for dir1|file1|dir2|file2|dirn|filen type strings
The regexp forms two back references (\1,\2 used in the replacement part of the substitution command s/pattern/replacement/), the first is all non-|'s, then a |, the second is all non-|'s then an optional | i.e. for the first application of the substitution (N.B. the g flag is implemented and so the substitutions may be multiple) dir1 becomes \1 and file1 becomes \2. All that remains is to prepend -f and replace the first | by / and the second | by a space. The last space is not needed at the end of the line and is removed in the second substitution command.

$ awk -v RS='|' 'NR%2{p=$0;next} {printf " -f %s/%s", p, $0}' <<< 'dir1|file1|dir2|file2'
-f dir1/file1 -f dir2/file2

A gnu-awk solution:
s="dir1|file1|dir2|file2"
awk 'BEGIN{ FPAT="[^|]+\\|[^|]+" } {
for (i=1; i<=NF; i++) {
sub(/\|/, "/", $i);
if (i>1)
printf " ";
printf "-f " $i
};
print ""
}' <<< "$s"
-f dir1/file1 -f dir2/file2
FPAT is used for grabbing dir1|file2 into single field.

Related

Removing multiple delimiters between outside delimiters on each line

Using awk or sed in a bash script, I need to remove comma separated delimiters that are located between an inner and outer delimiter. The problem is that wrong values ends up in the wrong columns, where only 3 columns are desired.
For example, I want to turn this:
2020/11/04,Test Account,569.00
2020/11/05,Test,Account,250.00
2020/11/05,More,Test,Accounts,225.00
Into this:
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
I've tried to use a few things, testing regex:
But I cannot find a solution to only select the commas in order to remove.

awk -F, '{ printf "%s,",$1;for (i=2;i<=NF-2;i++) { printf "%s ",$i };printf "%s,%s\n",$(NF-1),$NF }' file
Using awk, print the first comma delimited field and then loop through the rest of the field up to the last but 2 field printing the field followed by a space. Then for the last 2 fields print the last but one field, a comma and then the last field.

With GNU awk for the 3rd arg to match():
$ awk -v OFS=, '{
match($0,/([^,]*),(.*),([^,]*)/,a)
gsub(/,/," ",a[2])
print a[1], a[2], a[3]
}' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
or with any awk:
$ awk '
BEGIN { FS=OFS="," }
{
n = split($0,a)
gsub(/^[^,]*,|,[^,]*$/,"")
gsub(/,/," ")
print a[1], $0, a[n]
}
' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

Use this Perl one-liner:
perl -F',' -lane 'print join ",", $F[0], "#F[1 .. ($#F-1)]", $F[-1];' in.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array #F on whitespace or on the regex specified in -F option.
-F',' : Split into #F on comma, rather than on whitespace.
$F[0] : first element of the array #F (= first comma-delimited value).
$F[-1] : last element of #F.
#F[1 .. ($#F-1)] : elements of #F between the second from the start and the second from the end, inclusive.
"#F[1 .. ($#F-1)]" : the above elements, joined on blanks into a string.
join ",", ... : join the LIST "..." on a comma, and return the resulting string.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

perl -pe 's{,\K.*(?=,)}{$& =~ y/,/ /r}e' file
sed -e ':a' -e 's/\(,[^,]*\),\([^,]*,\)/\1 \2/; t a' file
awk '{$1=$1","; $NF=","$NF; gsub(/ *, */,","); print}' FS=, file
awk '{for (i=2; i<=NF; ++i) $i=(i>2 && i<NF ? " " : ",") $i} 1' FS=, OFS= file

awk doesn't support look arounds, we could have it by using match function of awk; using that could you please try following, written and tested with shown samples in GNU awk.
awk '
match($0,/,.*,/){
val=substr($0,RSTART+1,RLENGTH-2)
gsub(/,/," ",val)
print substr($0,1,RSTART) val substr($0,RSTART+RLENGTH-1)
}
' Input_file

Yet another perl
$ perl -pe 's/(?:^[^,]*,|,[^,]*$)(*SKIP)(*F)|,/ /g' ip.txt
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
(?:^[^,]*,|,[^,]*$) matches first/last field along with the comma character
(*SKIP)(*F) this would prevent modification of preceding regexp
|, provide , as alternate regexp to be matched for modification
With sed (assuming \n is supported by the implementation, otherwise, you'll have to find a character that cannot be present in the input)
sed -E 's/,/\n/; s/,([^,]*)$/\n\1/; y/,/ /; y/\n/,/'
s/,/\n/; s/,([^,]*)$/\n\1/ replace first and last comma with newline character
y/,/ / replace all comma with space
y/\n/,/ change newlines back to comma

A similar answer to Timur's, in awk
awk '
BEGIN { FS = OFS = "," }
function join(start, stop, sep, str, i) {
str = $start
for (i = start + 1; i <= stop; i++) {
str = str sep $i
}
return str
}
{ print $1, join(2, NF-1, " "), $NF }
' file.csv
It's a shame awk doesn't ship with a join function builtin

How to reverse all the words in a file with bash in Ubuntu?

I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but I have got all the reverse output: taolf/h.e.moc
Is there a way I can get the desired output. Do let me know. Thank you.
Here is teh link of teh sample file: Sample Text

You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed we insert \n before and after all non-word characters.
Using tac we reverse the output lines.
Using tr we strip all new lines.
If you have gnu-awk then you can do all this in a single awk command using 4 argument split function call that populates split strings and delimiters separately:
awk '{
s = ""
split($0, arr, /\W+/, seps)
for (i=length(arr); i>=1; i--)
s = s seps[i] arr[i]
print s
}' file
For non-gnu awk, you can use:
awk '{
r = $0
i = 0
while (match(r, /[^a-zA-Z0-9_]+/)) {
a[++i] = substr(r, RSTART, RLENGTH) substr(r, 0, RSTART-1)
r = substr(r, RSTART+RLENGTH)
}
s = r
for (j=i; j>=1; j--)
s = s a[j]
print s
}' file

Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses all the lines of f, according to PO's criteria.
If prefer a less parentesis version:
perl -nlE 'say reverse split "([/.])"' f

For portability, this can be done using any awk (not just GNU) using substrings:
$ awk '{
while (match($0,/[[:alnum:]]+/)) {
s=substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s;
$0=substr($0,RLENGTH+2)
} print s
}' <<<"com.e.h/float"
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.

Using GNU awk's split, splitting from separators . and /, define more if you wish.
$ cat program.awk
{
for(n=split($0,a,"[./]",s); n>=1; n--) # split to a and s, use n from split
printf "%s%s", a[n], (n==1?ORS:s[(n-1)]) # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--); printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt

Parsing a .csv-like file in bash

I have a file formatted as follows:
string1,string2,string3,...
...
I have to analyze the second column, counting the occurrences of each string, and producing a file formatted as follows:
"number of occurrences of x",x
"number of occurrences of y",y
...
I managed to write the following script, that works fine:
#!/bin/bash
> output
regExp='^\s*([0-9]+) (.+)$'
while IFS= read -r line
do
if [[ "$line" =~ $regExp ]]
then
printf "${BASH_REMATCH[1]},${BASH_REMATCH[2]}\n" >> output
fi
done <<< "`gawk -F , '!/^$/ {print $2}' $1 | sort | uniq -c`"
My question is:
There is a better and simpler way to do the job?
In particular I don't know how to fix that:
gawk -F , '!/^$/ {print $2}' miocsv.csv | sort | uniq -c | gawk '{print $1","$2}'
The problem is that string2 can contain whitespaces and, if so, the second call on gawk will truncate the string.
Neither i know how to print all the field "from 2 to NF", maintaining the delimiter, which can occur several times in succession.
Thank very much,
Goodbye
EDIT:
As asked, here there is some sample data:
(It is an exercise, sorry for the inventive)
Input:
*,*,*
test, test ,test
prova, * , prova
test,test,test
prova, prova ,prova
leonardo,da vinci,leonardo
in,o u t ,pr
, spaces ,
, spaces ,
leonardo,da vinci,leonardo
leonardo,da vinci,leonardo
leonardo,da vinci,leonardo
in,o u t ,pr
test, test ,test
, tabs ,
, tabs ,
po,po,po
po,po,po
po,po,po
prova, * , prova
prova, * , prova
*,*,*
*,*,*
*,*,*
, spaces ,
, tabs ,
Output:
3, *
4,*
4,da vinci
2,o u t
3,po
1, prova
3, spaces
3, tabs
1,test
2, test

A one-liner in awk:
awk -F, 'x[$2]++ { } END { for (i in x) print x[i] "," i }' input.csv
It stores the count for each 2nd column string in the associative array x, and in the end loops through the array and prints the results.
To get the exact output you showed for this example, you need to pipe it to sort(1), setting the field delimiter to , and the sort key to the 2nd field:
awk -F, 'x[$2]++ { } END { for (i in x) print x[i] "," i }' input.csv | sort -t, -k2,2
The only condition, of course, is that the 2nd column of each line doesn't contain a ,

You can make your final awk:
gawk '{ sub(" *","",$0); sub(" ",",",$0); print }'
or use sed for this sort of thing:
sed 's/ *\([0-9]*\) /\1,/'

Here is a Perl one-liner, similar to Filipe's awk solution:
perl -F, -lane '$x{$F[1]}++; END{ for $i (sort keys %x) { print "$x{$i},$i" } }' input.csv
The output is sorted alphabetically according to the second column.
The #F autosplit array starts at index $F[0] while awk fields start with $1

Parse IP and Download-Total from mikrotik

I wanna extract IP and download-total from mikrotik command /queue simple print stat
Here's some example :
0 name="101" target=192.168.10.101/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=17574842/389197663 total-bytes=0 packets=191226/308561
total-packets=0 dropped=9/5899 total-dropped=0
1 name="102" target=192.168.10.102/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=65593392/183786457 total-bytes=0 packets=163260/166022
total-packets=0 dropped=175/2403 total-dropped=0
2 name="103" target=192.168.10.103/32 rate=0bps/0bps total-rate=0bps
packet-rate=0/0 total-packet-rate=0 queued-bytes=0/0
total-queued-bytes=0 queued-packets=0/0 total-queued-packets=0
bytes=3263234/67407044 total-bytes=0 packets=41437/52602
total-packets=0 dropped=0/546 total-dropped=0
All that I need is :
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044
But I get
target=192.168.10.101/32
bytes=17574842/389197663
target=192.168.10.102/32
bytes=65593392/183786457
target=192.168.10.103/32
bytes=3263234/67407044
I try it with grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' | sed 's/^ //g'.
So, how can I parse it? Sorry for bad english..

Just continue your line of parsing with another pipes (most easy way i think)
grep -oP 'target=.*?\ |[^\-]bytes=.*?\ ' file | sed 's/^ //g' | sed -r 's/target=([^/]*)[/].*/\1/; s/bytes=[^/]*[/]//' | sed 'N; s/\n/ /'
output
192.168.10.101 389197663
192.168.10.102 183786457
192.168.10.103 67407044

sed '/^[0-9]\{1,\}[[:blank:]]\{1,\}name/,/^[[:blank:]]*$/ {
/^[0-9]/{
s#.*target=\([^/]*\).*#\1#;h;d
}
\#^[[:blank:]]*bytes=[0-9]*/\([0-9]*\).*# !d
s//\1/
G
s/\(.*\)\n\(.*\)/\2 \1/p
}
d
' YourFile
A bit long but do the job in 1 sed
awk '{
if ( $3 ~ /target=/ ) split( $3, aIP, "[=/]")
if ( $1 ~ /^[[:blank:]]*bytes=[0-9]*/ ) {
split( $1, aByt, "/")
print aIP[2] " " aByt[2]
}
}' YourFile
same in awk
if always same exact structure
awk 'BEGIN{ RS="" }
{ split( $3, aIP, "[=/]"); split( $12, aByt, "/")
print aIP[2] " " aByt[2]
}' YourFile

filtering some text from line using sed linux

I have a following content in the file:
NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"
I just want to filter out NAME , SYSPORT and ALM field using sed

Try the below sed command to filter out NAME,SYSPORT,ALM fields ,
$ sed 's/.*\(NAME=[^ ]*\).*\(SYSPORT=[^ ]*\).*\(ALM:[^;]*\).*/\1 \2 \3/g' file
NAME=ALARMCARDSLOT137 SYSPORT=2629 ALM:20063

why not using grep?
grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
test with your text:
kent$ echo 'NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"'|grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063

Here is another awk
awk -F" |;" -v RS=" " '/NAME|SYSPORT|ALM/ {print $1}'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063

Whenever there are name=value pairs in input files, I find it best to first create an array mapping the names to the values and then operating on the array using the names of the fields you care about. For example:
$ cat tst.awk
function bldN2Varrs( i, fldarr, fldnr, subarr, subnr, tmp ) {
for (i=2;i<=NF;i+=2) { gsub(/ /,RS,$i) }
split($0,fldarr,/[[:blank:]]+/)
for (fldnr in fldarr) {
split(fldarr[fldnr],tmp,/=/)
gsub(RS," ",tmp[2])
gsub(/^"|"$/,"",tmp[2])
name2value[tmp[1]] = tmp[2]
split(tmp[2],subarr,/ /)
for (subnr in subarr) {
split(subarr[subnr],tmp,/:/)
subName2value[tmp[1]] = tmp[2]
}
}
}
function prt( fld, subfld ) {
if (subfld) print fld "/" subfld "=" subName2value[subfld]
else print fld "=" name2value[fld]
}
BEGIN { FS=OFS="\"" }
{
bldN2Varrs()
prt("NAME")
prt("SYSPORT")
prt("DYNDATA","ALM")
}
.
$ awk -f tst.awk file
NAME=ALARMCARDSLOT137
SYSPORT=2629
DYNDATA/ALM=20063;1406718801,
and if 20063;1406718801, isn't the desired value for the ALM field and you just want some subsection of that, simply tweak the array construction function to suit whatever your criteria is.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Tokenize and capture with sed - regex

Suppose we have a string like "dir1|file1|dir2|file2" and would like to turn it into "-f dir1/file1 -f dir2/file2" Is there an elegant way to do this with sed or awk for a general case of n > 2? My attempt was to try echo "dir1|file1|dir2|file2" | sed 's/\(\([^|]\)|\)*/-f \2\/\4 -f \6\/\8/'

$ awk -v RS='|' 'NR%2{p=$0;next} {printf " -f %s/%s", p, $0}' <<< 'dir1|file1|dir2|file2' -f dir1/file1 -f dir2/file2

A gnu-awk solution: s="dir1|file1|dir2|file2" awk 'BEGIN{ FPAT="[^|]+\\|[^|]+" } { for (i=1; i<=NF; i++) { sub(/\|/, "/", $i); if (i>1) printf " "; printf "-f " $i }; print "" }' <<< "$s" -f dir1/file1 -f dir2/file2 FPAT is used for grabbing dir1|file2 into single field.

Related

Removing multiple delimiters between outside delimiters on each line

How to reverse all the words in a file with bash in Ubuntu?

Parsing a .csv-like file in bash

Parse IP and Download-Total from mikrotik

filtering some text from line using sed linux

Categories

Resources