Using a regex in AWK seems to not find the pattern

Hi, I am trying to match the following string, to no avail:
echo '[xxAA][xxBxx][C]' | awk -F '/\[.*\]/' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
I basically want each field to be the content enclosed by a pair of brackets, such that
field 1 = xxAA
field 2 = xxBxx
field 3 = C
but I keep getting the following result:
-->[xxAA][xxBxx][C]<--
Any pointers on where I am going wrong?

You can use a regex as the Field Separator. We enclose [ and ] each in a character class so they are treated as literals, and separate the two classes with |, which is logical OR. Since we target the brackets as field separators, we just iterate over the even-numbered fields to get the output.
$ echo '[xxAA][xxBxx][C]' | awk -v FS="[]]|[[]" '{ for (i=2;i<=NF;i+=2) print $i }'
xxAA
xxBxx
C

The regex \[.*\] matches the entire input, because the greedy .* matches the ][ sequences inside the input as well as the letters. (Note too that with -F the surrounding / characters are treated as literal characters to match, not as regex delimiters.)
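A quick way to see this (dropping the stray / delimiters from the original command; purely illustrative):
$ echo '[xxAA][xxBxx][C]' | awk -F '\\[.*\\]' '{ print NF; print "-->" $1 "<--"; print "-->" $2 "<--" }'
2
--><--
--><--
The entire bracketed run is consumed as a single separator, leaving just two empty fields.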
You could split fields on the ']' character instead, then put it back again in the output:
echo '[xxAA][xxBxx][C]' | awk -F ']' '{ for (i = 1; i <= NF; i++) if ($i != "") printf "-->%s]<--\n", $i }'
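For the sample input, that prints each field with its closing ] restored (the opening [ stays attached, since only ] acts as the separator):
-->[xxAA]<--
-->[xxBxx]<--
-->[C]<--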

This is a job for GNU awk's FPAT variable, which lets you specify the pattern of the fields rather than the pattern of the field separators:
$ echo '[xxAA][xxBxx][C]' | awk -v FPAT='[^][]+' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--
With other awks I'd use:
$ echo '[xxAA][xxBxx][C]' | awk -F'\\]\\[' '{ gsub(/^\[|\]$/,""); for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--

Related

How to use sed to extract numbers from a comma separated string?

I managed to extract the following response and comma-separate it. It's a comma-separated string, and I'm only interested in the comma-separated values of the account_ids. How do you pattern match using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input
I tried the below, but it joins all the numbers together; I would like to preserve the comma ',' between them:
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf "%s%s", sep, $i; sep=","}}'
If you really want sed, you can go for
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
$ awk '
BEGIN {
FS = OFS = ","
}
{
c = 0
for (i = 1; i <= NF; i++) {
if ($i == "ACCOUNT_ID") {
printf "%s%s", (c++ ? OFS : ""), $(i + 1)
}
}
print ""
}' file
711111111119,111111111115

How to replace the second pattern (dot) after a pattern (comma) in bash

How do I replace the second dot after a comma?
This is the closest I could get:
echo '0.592922148,0.821504176,1.174.129.731' | xargs -d ',' -n1 echo | sed 's/\([^\.]*\.[^\.]*\)\./\1/' | sed 's/\([^\.]*\.[^\.]*\)\./\1/'
Output:
0.592922148
0.821504176
1.174129731
Expected output:
0.592922148,0.821504176,1.174129731
You may use
sed -e ':a' -e 's/\(\.[^.,]*\)\./\1/' -e 't a'
See this sed demo:
s='0.592922148,0.821504176,1.174.129.731'
sed -e ':a' -e 's/\(\.[^.,]*\)\./\1/' -e 't a' <<< "$s"
Details
:a - label a
s/\(\.[^.,]*\)\./\1/ - finds and captures into Group 1 a dot, then any 0+ chars other than dot and comma, and then just matches a dot, and replaces this match with the value in Group 1 (thus, removing the second matched dot)
t a - if the substitution made a replacement, branch back to label a and try again.
While I think the sed solution is your best choice, since you have tagged your question with both sed and awk, an awk solution is fairly straightforward as well using split() and basic string concatenation (just not nearly as short). For example you could do:
awk -v OFS=, -F, '{
for (i=1; i<=NF; i++) {
n=split ($i, a,".")
if (n > 2) {
s=a[1] "." a[2]
for (j=3; j<=n; j++)
s = s a[j]
$i=s
}
}
}1'
Here you define the field separator and output field separator as ','. Then, looping over each field, check the return of split(), which splits the field on '.' into array a. If the resulting number of elements is greater than 2, put the first two elements back together, restoring the first '.' in the number, and then simply concatenate the remaining elements. The 1 at the end is the default "print record" action, which prints the updated record.
Example Use/Output
$ echo '0.592922148,0.821504176,1.174.129.731' |
> awk -v OFS=, -F, '{
> for (i=1; i<=NF; i++) {
> n=split ($i, a,".")
> if (n > 2) {
> s=a[1] "." a[2]
> for(j=3;j<=n;j++)
> s = s a[j]
> $i=s
> }
> }
> }1'
0.592922148,0.821504176,1.174129731
Could you please try the following.
echo '0.592922148,0.821504176,1.174.129.731' |
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
ind=index($i,".")
if(ind){
val1=substr($i,1,ind)
val2=substr($i,ind+1)
gsub(/\./,"",val2)
$i=val1 val2
}
}
val1=val2=""
}
1'
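With the sample line shown above, this should print:
0.592922148,0.821504176,1.174129731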
Explanation: here is a detailed explanation of the above code.
echo '0.592922148,0.821504176,1.174.129.731' | ##Printing values as per OP mentioned and using pipe to send its output as standard input for awk command.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program here.
FS=OFS="," ##Setting FS and OFS as comma for each line of Input_file here.
} ##Closing BEGIN BLOCK here.
{
for(i=1;i<=NF;i++){ ##Starting a for loop to traverse the fields of the line.
ind=index($i,".") ##Checking index of DOT in current field and saving it into ind variable.
if(ind){ ##Checking whether variable ind is non-zero (a dot was found).
val1=substr($i,1,ind) ##Creating variable val1 from sub-string in current field from 1 to ind value.
val2=substr($i,ind+1) ##Creating variable val2 from sub-string in current field from ind+1 value to till complete length of current field.
gsub(/\./,"",val2) ##Globally substituting DOTs with NULL in val2 variable.
$i=val1 val2 ##Re-creating the current field from val1 and val2.
} ##Closing BLOCK for if condition.
} ##Closing BLOCK for for loop.
val1=val2="" ##Nullifying val1 and val2 variables here.
} ##Closing main code BLOCK here.
1' ##Mentioning 1 prints the edited/non-edited line.
An awk version:
echo '0.592922148,0.821504176,1.174.129.731' | awk -F, '{for (i=1;i<=NF;i++) {sub(/\./,"#",$i);gsub(/\./,"",$i);sub(/#/,".",$i);print $i}}'
0.592922148
0.821504176
1.174129731
It splits the line into multiple fields on ','. It then replaces the first '.' with '#', removes the remaining '.' characters, replaces the '#' back with '.', and prints the field.
Edit
awk -F, '{for (i=1;i<=NF;i++) {sub(/\./,"#",$i);gsub(/\./,"",$i);sub(/#/,".",$i);a=a (i==1?"":",")$i}print a}' file
0.592922148,0.821504176,1.174129731

How to reverse all the words in a file with bash in Ubuntu?

I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but that reverses the entire text: taolf/h.e.moc
Is there a way I can get the desired output? Do let me know. Thank you.
Here is the link to the sample file: Sample Text
You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed, we insert a \n before and after every run of non-word characters.
Using tac, we reverse the order of those lines.
Using tr, we strip all the newlines.
If you have gnu-awk, then you can do all this in a single awk command using the 4-argument split() call, which populates the split pieces and the delimiters in separate arrays:
awk '{
s = ""
split($0, arr, /\W+/, seps)
for (i=length(arr); i>=1; i--)
s = s seps[i] arr[i]
print s
}' file
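For reference, run against the sample string instead of a file (GNU awk required for the four-argument split):
$ echo 'com.e.h/float' | awk '{ s=""; split($0, arr, /\W+/, seps); for (i=length(arr); i>=1; i--) s = s seps[i] arr[i]; print s }'
float/h.e.com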
For non-gnu awk, you can use:
awk '{
r = $0
i = 0
while (match(r, /[^a-zA-Z0-9_]+/)) {
a[++i] = substr(r, RSTART, RLENGTH) substr(r, 1, RSTART-1)
r = substr(r, RSTART+RLENGTH)
}
s = r
for (j=i; j>=1; j--)
s = s a[j]
print s
}' file
Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses all the lines of f, according to the OP's criteria.
If you prefer a version with fewer parentheses:
perl -nlE 'say reverse split "([/.])"' f
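For reference, fed the sample string directly instead of the file f:
$ perl -nlE 'say reverse split "([/.])"' <<< 'com.e.h/float'
float/h.e.com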
For portability, this can be done with any awk (not just GNU), using substrings:
$ awk '{
while (match($0,/[[:alnum:]]+/)) {
s=substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s;
$0=substr($0,RLENGTH+2)
} print s
}' <<<"com.e.h/float"
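With the here-string above, this should print:
float/h.e.com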
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.
Using GNU awk's split(), splitting on the separators . and /; define more if you wish.
$ cat program.awk
{
for(n=split($0,a,"[./]",s); n>=1; n--) # split to a and s, use n from split
printf "%s%s", a[n], (n==1?ORS:s[(n-1)]) # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as a one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--) printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt

Tokenize and capture with sed

Suppose we have a string like
"dir1|file1|dir2|file2"
and would like to turn it into
"-f dir1/file1 -f dir2/file2"
Is there an elegant way to do this with sed or awk for a general case of n > 2?
My attempt was to try
echo "dir1|file1|dir2|file2" | sed 's/\(\([^|]\)|\)*/-f \2\/\4 -f \6\/\8/'
An awk solution:
awk -F'|' '{ for (i=1;i<=NF;i+=2) printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") }' \
<<<"dir1|file1|dir2|file2"
-F'|' splits the input into fields by |
for (i=1;i<=NF;i+=2) loops over the field indices in increments of 2
printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") prints pairs of consecutive fields joined with / and prefixed with -f<space>
((i==NF-1) ? "\n" : " ") terminates each field-pair either with a space, if more fields follow, or a \n to terminate the overall output.
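Putting it together, the sample string yields:
-f dir1/file1 -f dir2/file2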
In a comment, the OP suggests a shorter variation, which may be of interest if you don't need/want the output to be \n-terminated:
awk -F'|' '{ for (i=1;i<=NF;++i) printf "%s", (i%2 ? " -f " $i : "/" $i ) }' \
<<<"dir1|file1|dir2|file2"
This might work for you (GNU sed):
sed 's/\([^|]*\)|\([^|]*\)|\?/-f \1\/\2 /g;s/ $//' file
This will work for dir1|file1|dir2|file2|dirn|filen type strings.
The regexp forms two back references (\1 and \2, used in the replacement part of the substitution command s/pattern/replacement/): the first is a run of non-| characters followed by a |, the second is a run of non-| characters followed by an optional |. On the first application of the substitution (N.B. the g flag is in effect, so multiple substitutions are made), dir1 becomes \1 and file1 becomes \2. All that remains is to prepend -f, replace the first | with / and the second | with a space. The space at the end of the line is not needed and is removed by the second substitution command.
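For instance, reading from standard input instead of a file:
$ echo 'dir1|file1|dir2|file2' | sed 's/\([^|]*\)|\([^|]*\)|\?/-f \1\/\2 /g;s/ $//'
-f dir1/file1 -f dir2/file2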
$ awk -v RS='|' 'NR%2{p=$0;next} {printf " -f %s/%s", p, $0}' <<< 'dir1|file1|dir2|file2'
-f dir1/file1 -f dir2/file2
A gnu-awk solution:
s="dir1|file1|dir2|file2"
awk 'BEGIN{ FPAT="[^|]+\\|[^|]+" } {
for (i=1; i<=NF; i++) {
sub(/\|/, "/", $i);
if (i>1)
printf " ";
printf "-f " $i
};
print ""
}' <<< "$s"
-f dir1/file1 -f dir2/file2
FPAT is used to grab dir1|file1 (and dir2|file2) as single fields.

replace nth occurrence of character in a file using awk regardless of the line

I am trying to replace the nth occurrence of a character or string regardless of the line using awk.
So if our data was this
|||||||
||||||
|||||
|||
and we were trying to replace | with A
then the output should look like this, assuming we want to replace every 3rd occurrence
||A||A|
|A||A|
|A||A
||A
The current awk command I am using is this
awk '/|/{c++;if(c==3){sub(/|/,"A");c=0}}1' test.data
and it wrongly outputs this
|||||||
||||||
A||||
|||
Also, the data can look like this
|||xfsafrwe|||asfasdf|
|safasf|||asfasdf||
||asfasf|||
|||
and the expected result for that data is this
||Axfsafrwe||Aasfasdf|
|safasfA||asfasdfA|
|Aasfasf||A
||A
Thanks
With GNU awk:
awk '{
for (i = 0; ++i <= NF;)
++c % n || $i = v
}1' OFS= FS= n=3 v=A infile
Adjusted after OP clarification:
awk '{
for (i = 0; ++i <= NF;)    # FS="" makes every character its own field (gawk behavior)
if ($i == o)               # only count fields equal to the target character o
++C % c || $i = n          # on every c-th occurrence (C carries across lines) the || falls through to $i = n
} 1' FS= OFS= c=3 o=\| n=A infile
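For instance, with the first sample saved as infile, the adjusted command is expected to reproduce the requested output (GNU awk, because of the empty FS):
$ awk '{ for (i = 0; ++i <= NF;) if ($i == o) ++C % c || $i = n } 1' FS= OFS= c=3 o='|' n=A infile
||A||A|
|A||A|
|A||A
||A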