decimal pattern matching - regex

I have a big file whose lines follow the pattern below:
MDQ[11:15],IO,MDQ[10:14],,,,MDQ[12:16],TPR_AAWD[11:15]
I want to expand each line into the following:
MDQ[11],IO,MDQ[10],,,,MDQ[12],TPR_AAWD[11]
MDQ[12],IO,MDQ[11],,,,MDQ[13],TPR_AAWD[12]
MDQ[13],IO,MDQ[12],,,,MDQ[14],TPR_AAWD[13]
MDQ[14],IO,MDQ[13],,,,MDQ[15],TPR_AAWD[14]
How can I implement this in sed/awk/perl/csh/vim?
Please help.

awk -F '[][]' '{
    split($2, a, /:/)
    split($4, b, /:/)
    split($6, c, /:/)
    split($8, d, /:/)
    for (i = 0; i < a[2] - a[1]; i++) {
        printf("%s[%d]%s[%d]%s[%d]%s[%d]\n",
               $1, a[1]+i,
               $3, b[1]+i,
               $5, c[1]+i,
               $7, d[1]+i)
    }
}'

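Hope the below helps. Note that it only strips the upper bound from each range, so it yields the first line of the desired output rather than the full expansion:
sed -e 's/:[0-9]*//g'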
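The awk answer above hard-codes four bracketed ranges per line. As a rough sketch only, assuming every range on a line has the same width and that the upper bound is excluded (as in the expected output of the question), the same idea can be generalised to any number of ranges. The function name expand_ranges is made up for illustration.

```shell
# Hypothetical helper: expand every [lo:hi] range on each input line.
expand_ranges() {
    awk '{
        n = split($0, part, /[][]/)   # odd parts are text, even parts are "lo:hi"
        if (n < 3) { print; next }    # no bracketed range: pass the line through
        split(part[2], r, ":")
        count = r[2] - r[1]           # upper bound excluded, as in the expected output
        for (i = 0; i < count; i++) {
            line = ""
            for (k = 1; k <= n; k++) {
                if (k % 2) {
                    line = line part[k]
                } else {
                    split(part[k], r, ":")
                    line = line "[" (r[1] + i) "]"
                }
            }
            print line
        }
    }' "$@"
}
```

It can be used as a filter (some_command | expand_ranges) or with file names as arguments (expand_ranges file).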

Related

How to reverse all the words in a file with bash in Ubuntu?

I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but that reverses every character: taolf/h.e.moc
Is there a way I can get the desired output? Do let me know. Thank you.
Here is the link to the sample file: Sample Text
You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed we insert \n before and after all non-word characters.
Using tac we reverse the output lines.
Using tr we strip all new lines.
If you have gnu-awk then you can do all this in a single awk command using 4 argument split function call that populates split strings and delimiters separately:
awk '{
s = ""
split($0, arr, /\W+/, seps)
for (i=length(arr); i>=1; i--)
s = s seps[i] arr[i]
print s
}' file
For non-gnu awk, you can use:
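awk '{
    r = $0
    i = 0
    while (match(r, /[^a-zA-Z0-9_]+/)) {
        # store the separator followed by the word that preceded it;
        # substr() positions start at 1 (a start of 0 is handled differently across awks)
        a[++i] = substr(r, RSTART, RLENGTH) substr(r, 1, RSTART-1)
        r = substr(r, RSTART+RLENGTH)
    }
    s = r
    for (j=i; j>=1; j--)
        s = s a[j]
    print s
}' file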
Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses each line of f according to the OP's criteria.
If you prefer a version with fewer parentheses:
perl -nlE 'say reverse split "([/.])"' f
For portability, this can be done using any awk (not just GNU) using substrings:
$ awk '{
    s = ""    # reset the output string for every input line
    while (match($0,/[[:alnum:]]+/)) {
        s=substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s;
        $0=substr($0,RLENGTH+2)
    }
    print s
}' <<<"com.e.h/float"
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.
Using GNU awk's split(), splitting on the separators . and / (add more if you wish):
$ cat program.awk
{
for(n=split($0,a,"[./]",s); n>=1; n--) # split to a and s, use n from split
printf "%s%s", a[n], (n==1?ORS:s[(n-1)]) # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--) printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt

Printing the actual field delimiter value not the regular expression

Given the following input:
check1;check2
check1;;check2
check1,check2
and the awk command:
awk -F';+|,' '{print $1 FS $2}'
Shouldn't FS contain the delimiter that was actually matched?
How can you print the delimiter that was matched, i.e. one of ;, ;; or ,, rather than the regular expression that describes the delimiters?
If the input is check1;check2 then the output should be check1;check2.
If you're using GNU Awk (gawk) you can use the 4th argument of split():
gawk '{split($0, a, /;+|,/, seps); print a[1] seps[1] a[2]}' file
Output:
check1;check2
check1;;check2
check1,check2
Using it within a loop is also easy to handle:
gawk '{nf = split($0, a, /;+|,/, seps); for (i = 1; i <= nf; ++i) printf "%s%s", a[i], seps[i]; print ""}' file
22011,25029;;3331,25275
6740,16516;;27292,1217
13480,31488;;7947,18804
328,30623;;12470,6883
If you only need the fields, you only have to touch a. The separators are stored separately in seps, whose indices are aligned with those of a.
I don't think awk stores the matched delimiter anywhere. If you use GNU awk, you can do it yourself:
gawk '{match($0, /([^;,]*)(;+|,)(.*)/, a); print a[1], a[2], a[3]}'
GNU awk has this feature for records not fields so you could also do something like this:
$ awk '{printf "%s%s",$0,RT}' RS=';+|,|\n' file
check1;check2
check1;;check2
check1,check2
Where RT is the text matched by RS for the given record, which you can see with:
$ awk '{printf "%s",RT}' RS=';+|,|\n' file
;
;;
,
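The answers above rely on GNU awk extensions (the four-argument split(), the three-argument match(), and RT). For other awks, a minimal sketch using only the POSIX match() function with RSTART and RLENGTH might look like the following. The function name show_delim and the | output marker are assumptions made for illustration, and only the first delimiter on each line is handled.

```shell
# Hypothetical helper: print field 1, the delimiter actually matched, and the rest.
show_delim() {
    awk '{
        if (match($0, /;+|,/)) {
            left  = substr($0, 1, RSTART - 1)
            delim = substr($0, RSTART, RLENGTH)   # the literal text that matched
            rest  = substr($0, RSTART + RLENGTH)
            print left "|" delim "|" rest
        } else {
            print                                 # no delimiter on this line
        }
    }' "$@"
}
```

Because match() reports where the match starts and how long it is, the matched text can be recovered with substr() even though FS itself only holds the regular expression.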

Regex with awk or gawk

I'm a beginner user of awk/gawk.
If I run below, the shell gives me nothing. Please help!
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9"|awk 'BEGIN{
n = split($0, arr, /,(?=\\w+=)/)
for (x=1; x<n; x++) printf "arr[%d]=%s\n", x, arr[x]
}'
.....................................................
I am trying to parse:
A=1,B=2,3,C=,D=5,6,E=7,8,9
Expected Output:
A=1
B=2,3
C=
D=5,6
E=7,8,9
I bet there's something wrong with my awk.
gawk doesn't support look-ahead.
If you want gawk to parse it as you expected, try this:
awk '{n=split(gensub(/,([A-Z])/, " \\1","g" ),arr," ");for(x=1;x<=n;x++)print arr[x]}'
test with your example:
kent$ echo "A=1,B=2,3,C=,D=5,6,E=7,8,9"|awk '{n=split(gensub(/,([A-Z])/, " \\1","g" ),arr," ");for(x=1;x<=n;x++)print arr[x]}'
A=1
B=2,3
C=
D=5,6
E=7,8,9
This might be easier with sed:
$ echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | sed 's/,\(\w\+=\)/\n\1/g'
A=1
B=2,3
C=
D=5,6
E=7,8,9
If you are using gnu awk, you could do:
awk '{printf "%s\n%s", $0, substr(RT, 2)}' RS=',[A-Z]'
As nhahtdh said, there is no lookahead in awk. But you could use a different separator for the assignments. Why not "A=1;B=2,3,4;C=5..."?
If your input must have that format, try flex...
You could also use comma as the record separator:
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" |
awk -v RS=, '{sep=","} /=/ {sep="\n"} NR==1 {sep=""} {printf "%s%s", sep, $0}'
outputs
A=1
B=2,3
C=
D=5,6
E=7,8,9
You have two problems. First, you don't want a BEGIN clause; you just want this to run on every input line. Second, you are trying to use regular expression features that AWK does not support.
Instead of trying to use a fancy pattern that splits the string, loop and call match() to parse out the features you want.
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | awk '
{
    line = $0
    for (;;)
    {
        # the three-argument match() is a gawk extension
        i = match(line, /([A-Z]+)=([0-9,]*)(,|$)/, arr)
        if (0 == i)
            break
        key = arr[1]
        value = arr[2]
        # skip past "key=value," to the start of the next assignment
        l = length(key "=" value ",") + 1
        line = substr(line, l)
        printf "DEBUG: key %s value %s\n", key, value
    }
}'
This prints:
DEBUG: key A value 1
DEBUG: key B value 2,3
DEBUG: key C value
DEBUG: key D value 5,6
DEBUG: key E value 7,8,9
Another way, using gawk's gensub():
awk '{print gensub(/,([A-Z]+=)/, "\n\\1","g")}' temp.txt
Output
A=1
B=2,3
C=
D=5,6
E=7,8,9
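Most of the answers above need GNU extensions (gensub(), RT, or the three-argument match()). As a sketch that should work in any POSIX awk, one can split on every comma and then decide, piece by piece, whether the piece starts a new assignment. The function name split_assignments is hypothetical, and the sketch assumes that keys look like identifiers followed by =.

```shell
# Hypothetical helper: put each KEY=value[,value...] group on its own line.
split_assignments() {
    awk '{
        n = split($0, piece, ",")
        out = piece[1]
        for (i = 2; i <= n; i++) {
            # a piece that starts with "name=" begins a new assignment;
            # anything else is a continuation of the previous value list
            sep = (piece[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=/) ? "\n" : ","
            out = out sep piece[i]
        }
        print out
    }' "$@"
}
```

This takes the place of the look-ahead: instead of asking the regex engine to peek at what follows a comma, the loop inspects the text after each comma itself.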

Working with AWK regex

I have a file with values in the following format:
20/01/2012 01:14:27;UP;UserID;User=bob email=abc#sample.com
I want to pick each value from this file (not the labels). By label I mean that for the string email=abc#sample.com I only want to pick abc#sample.com, and for the string User=bob I only want to pick bob. All the space-separated values are easy to pick, but I am unable to pick the values separated by semicolons. Below is the awk command I am using:
awk '{print "1=",$1} /;/{print "2=",$2,"3=",$3}' sample_file
In $2 I get the complete string up to bob, and the rest of the string is assigned to $3. I could work with awk's substr, but I want to be on the safe side because the string length may vary.
Can somebody tell me how to design a regex to parse my file?
You can set multiple delimiters using awk -F:
awk -F "[ \t;=]+" '{ print $1, $2, $3, $4, $5, $6, $7, $8 }' file.txt
Results:
value1 value2 value3 value4 label1 value5 label2 value6
EDIT:
You can remove anything before the equal signs using sub (/[^=]*=/,"", $i). This will allow you to just print the 'values':
awk 'BEGIN { FS="[ \t;]+"; OFS=" " } { for (i=1; i<=NF; i++) { sub (/[^=]*=/,"", $i); line = (line ? line OFS : "") $i } print line; line = "" }' file.txt
Results:
20/01/2012 01:14:27 UP UserID bob abc#sample.com

Grepping only certain lines of files

I have a collection of files in a directory which I would like to search for a particular regular expression, which happens to be =([14-9]|[23][0-9]). But I only care when this pattern falls on the second, sixth, tenth, ..., (4n+2)-th line.
Is there a good way to do this?
A modification of the answer that avoids the extra grep:
awk '/=([14-9]|[23][0-9])/ && FNR % 4 == 2 {print FNR":"$0}' inputFile
You should pass it through awk first to get rid of the unwanted lines (and optionally put on line numbers so that you can still tell what the real lines are):
pax> echo 'L1
...> L2
...> L3
...> L4
...> L5
...> L6
...> L7
...> L8
...> L9
...> L10
...> L11
...> L12' | awk '{if ((FNR % 4)==2) {print FNR":"$0}}'
2:L2
6:L6
10:L10
(just use '{if ((FNR % 4)==2) {print}}' if you don't care about the line numbers). So something like:
awk '{if ((FNR % 4)==2) {print FNR":"$0}}' inputFile | grep -E '=([14-9]|[23][0-9])'
should do the trick.
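Since the question mentions a collection of files, here is a small sketch that wraps the same FNR test in a shell function and also prints the file name. The function name grep_every_fourth is hypothetical. The pattern is passed with awk -v, so backslashes in it would be subject to awk escape processing; the pattern from the question contains none.

```shell
# Hypothetical helper: $1 is an extended regex, the remaining arguments are files.
grep_every_fourth() {
    pattern=$1
    shift
    # FNR restarts at 1 for each file, so the 2nd, 6th, 10th, ... line
    # is selected per file rather than across the whole input.
    awk -v re="$pattern" 'FNR % 4 == 2 && $0 ~ re {
        print FILENAME ":" FNR ":" $0
    }' "$@"
}
```

A call such as grep_every_fourth '=([14-9]|[23][0-9])' dir/* would then report matches in a file:line:text layout similar to grep -n with several files.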
Try to do this with awk. Something like:
BEGIN { n = 2 }               # n is the next line number of interest
FNR == 1 { n = 2 }            # start over for each input file
FNR == n {
    if ($0 ~ /yourregex/) print $0
    n += 4                    # 2, 6, 10, ...
}