Combination of same delimiters in awk - regex

I have a string:
1__2_3__4_5_6
I want to set '__' (two underscores) as the delimiter in awk, so that:
$1 should be 1
$2 should be 2_3
$3 should be 4_5_6

Just set FS to __. A multi-character FS is treated as a regular expression, so you could also pass a regex as the FS value.
$ echo '1__2_3__4_5_6' | awk -v FS="__" '{print $1}'
1
$ echo '1__2_3__4_5_6' | awk -v FS="__" '{print $2}'
2_3
$ echo '1__2_3__4_5_6' | awk -v FS="__" '{print $3}'
4_5_6
$ echo '1__2_3__4_5_6' | awk -v FS="_{2}" '{print $3}'
4_5_6
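_{2} matches exactly two underscores. Some older awk versions do not support interval expressions such as {2}; in that case the plain string "__" is the safer choice.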
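To see all the fields at once, a small loop over NF can help. This is only a sketch: the function name show_fields is made up for illustration, and it uses the plain string "__" as the separator rather than an interval expression.

```shell
# Hypothetical helper: print every field of a string split on "__".
show_fields() {
    printf '%s\n' "$1" |
        awk -F'__' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

show_fields '1__2_3__4_5_6'
# $1 = 1
# $2 = 2_3
# $3 = 4_5_6
```

The single underscores inside 2_3 and 4_5_6 are left alone because the separator only matches two underscores in a row.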

Related

Awk regex expression to select for a certain number of delimiters in a field

I am trying to select for a field which has exactly a certain number of commas. For example, I can select for 1 comma in a field as follows:
$ echo jkl,abc | awk '$1 ~ /[a-z],[a-z]/{print $0}'
jkl,abc
The expected output, "jkl,abc", is seen.
However, when I try for 2 commas it doesn't work.
$ echo jkl,abc,xyz | awk '$1 ~ /[a-z],[a-z],[a-z]/{print $0}'
(no output)
Any thoughts?
Thanks!
/[a-z],[a-z],[a-z]/ doesn't match jkl,abc,xyz because, without quantifiers, each [a-z] matches exactly one letter, and there are three letters (abc) between the two commas. The right regex would be /^[a-z]+,[a-z]+,[a-z]+$/, e.g.:
awk '/^[a-z]+,[a-z]+,[a-z]+$/' <<< 'jkl,abc,xyz'
However, to validate the number of commas, it is better to set FS to "," and compare the number of fields (n commas give n + 1 fields), like this:
$ awk -F, 'NF == 2' <<< 'jkl,abc'
jkl,abc
$ awk -F, 'NF == 3' <<< 'jkl,abc,xyz'
jkl,abc,xyz
It should be like:
echo jkl,abc,xyz | awk '/[a-z]+,[a-z]+,[a-z]+/{print $0}'
OR
echo jkl,abc,xyz | awk '/[a-z]+,[a-z]+,[a-z]+/'
Why the OP's code does not work:
The OP's regex allows only a single [a-z] character next to each comma, but the line has more than one character between the commas, so it does not match. With the given input, $1 is not required because the regex is matched against the whole line, so I have removed the $1 part from the solution.
If you have multiple fields (separated by spaces) and want to test only the first one, you could use:
echo "jkl,abc,xyz blabla" | awk '$1 ~ /[a-z]+,[a-z]+,[a-z]+/'
The middle segment of your regexp did not account for more than one letter between the commas. Only that part needs to change: make it [a-z]* or [a-z]+, depending on whether zero letters between the commas should be accepted.
Some approaches to consider to find 2 or more commas in a field:
$ echo jkl,abc,xyz | awk '$1 ~ /[a-z],[a-z]*,[a-z]/'
jkl,abc,xyz
$ echo jkl,abc,xyz | awk '$1 ~ /([a-z]*,){2,}/'
jkl,abc,xyz
$ echo jkl,abc,xyz | awk '$1 ~ /[^,],[^,]*,[^,]/'
jkl,abc,xyz
$ echo jkl,abc,xyz | awk '$1 ~ /([^,]*,){2,}/'
jkl,abc,xyz
$ echo jkl,abc,xyz | awk 'gsub(/,/,"&",$1) > 1'
jkl,abc,xyz
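The commands above all accept two or more commas. If the requirement is exactly n commas, one option is to count them using the return value of gsub on a copy of the field. This is a sketch under that assumption; the function name with_n_commas is hypothetical.

```shell
# Hypothetical helper: print lines whose first field has exactly N commas.
# Usage: with_n_commas N < input
with_n_commas() {
    awk -v n="$1" '{
        field = $1                      # work on a copy so $1 is not modified
        if (gsub(/,/, "", field) == n)  # gsub returns the number of replacements
            print
    }'
}

printf '%s\n' 'jkl,abc' 'jkl,abc,xyz' 'a,b,c,d' | with_n_commas 2
# jkl,abc,xyz
```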

Sed command is misbehaving when an exclamation mark is included in the regex

I have this task where I have to extract the usernames and the hashed passwords from /etc/shadow but I'm having a problem when replacing the stream with the sed command.
I have tried this command:
sed 's/*/NoPassword/; s/!/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
It works fine when it comes to replacing "!" with LockedPassword, but some users have "!!" in the field rather than "!", so I have tried other commands.
These give no result at all; password fields containing one or more exclamation marks stay the same:
sed 's/*/NoPassword/; s/!+/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
sed 's/*/NoPassword/; s/!{1,2}/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
What seems to be the problem? I'm only a beginner for both linux and regex.
You never need sed when you're using awk. Your whole command line can just be:
awk -F':' '{sub(/\*/,"NoPassword"); sub(/!+/,"LockedPassword"); print $1, $2}' /etc/shadow > passwords.txt
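Note that sub() without a third argument works on the whole line, so it replaces the first * or ! found anywhere in it. A variant that only touches the password field might look like the sketch below. The function name mask_passwords and the sample data are made up, and it assumes a field consisting only of "*" or only of "!" characters is what should be replaced.

```shell
# Hypothetical helper: print "user password-status" from shadow-style input.
mask_passwords() {
    awk -F: '{
        sub(/^\*$/, "NoPassword", $2)      # field is exactly "*"
        sub(/^!+$/, "LockedPassword", $2)  # field is one or more "!" and nothing else
        print $1, $2
    }' "$@"
}

# Made-up sample lines in /etc/shadow layout.
sample='root:*:19000:0:99999:7:::
alice:!!:19000:0:99999:7:::
bob:!:19000:0:99999:7:::
carol:$6$salt$hash:19000:0:99999:7:::'

printf '%s\n' "$sample" | mask_passwords
# root NoPassword
# alice LockedPassword
# bob LockedPassword
# carol $6$salt$hash
```

With a real file it would be called as mask_passwords /etc/shadow > passwords.txt.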
In sed's default basic regular expressions (BRE), + and { } are literal characters unless escaped (\+ is a GNU extension; \{ \} is standard). So both of your attempts should work after fixing this:
sed 's/\*/NoPassword/; s/!\+/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
sed 's/\*/NoPassword/; s/!\{1,2\}/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
Use the -E flag, which switches sed to extended regular expressions so that + needs no escaping:
sed -E 's/\*/NoPassword/; s/!+/LockedPassword/' /etc/shadow | awk -F: '{ print $1" "$2 }' > passwords.txt
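The difference between the two regex flavours can be checked on a few made-up lines instead of the real /etc/shadow. This sketch writes the BRE version with the interval \{1,\} and the ERE version with +; the variable names and sample data are only for illustration.

```shell
# Made-up sample lines in /etc/shadow layout.
sample='root:*:19000
alice:!!:19000
bob:!:19000'

# BRE (sed default): "one or more" written as the interval \{1,\}.
bre_out=$(printf '%s\n' "$sample" |
    sed 's/\*/NoPassword/; s/!\{1,\}/LockedPassword/')

# ERE (sed -E): + works without a backslash.
ere_out=$(printf '%s\n' "$sample" |
    sed -E 's/\*/NoPassword/; s/!+/LockedPassword/')

printf '%s\n' "$bre_out"
# root:NoPassword:19000
# alice:LockedPassword:19000
# bob:LockedPassword:19000
```

Both commands should produce the same text, so comparing "$bre_out" with "$ere_out" is a quick sanity check.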

Keep tab separation in incremental operation in a specific cell in a table [duplicate]

How do I select the first column from the TAB separated string?
# echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}'
The above will return the entire line and not just "LOAD_SETTLED" as expected.
Update:
I need to change the third column in the tab separated values.
The following does not work.
echo $line | awk 'BEGIN { -v var="$mycol_new" FS = "[ \t]+" } ; { print $1 $2 var $4 $5 $6 $7 $8 $9 }' >> /pdump/temp.txt
This however works as expected if the separator is comma instead of tab.
echo $line | awk -v var="$mycol_new" -F'\t' '{print $1 "," $2 "," var "," $4 "," $5 "," $6 "," $7 "," $8 "," $9}' >> /pdump/temp.txt
You need to set the OFS variable (output field separator) to be a tab:
echo "$line" |
awk -v var="$mycol_new" -F'\t' 'BEGIN {OFS = FS} {$3 = var; print}'
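(Make sure you quote the $line variable in the echo statement; unquoted, the shell splits it on whitespace and echo rejoins the words with single spaces, so the tabs are lost.)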
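Here is a self-contained version of the same idea that can be run without the original $line and $mycol_new variables. The function name replace_third_column and the replacement value NEW are made up for the example.

```shell
# Hypothetical helper: replace column 3 of tab-separated input, keeping tabs.
# Usage: replace_third_column VALUE < input
replace_third_column() {
    awk -v var="$1" 'BEGIN { FS = OFS = "\t" } { $3 = var; print }'
}

printf 'LOAD_SETTLED\tLOAD_INIT\t2011-01-13\t03:50:01\n' |
    replace_third_column 'NEW'
# Prints LOAD_SETTLED, LOAD_INIT, NEW and 03:50:01 separated by tabs.
```

Assigning to $3 makes awk rebuild the record with OFS, which is why OFS has to be a tab as well; with the default OFS the output would be joined with spaces.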
Make sure they're really tabs! In bash, you can insert a tab using C-v TAB
$ echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F$'\t' '{print $1}'
LOAD_SETTLED
Use:
awk -v FS='\t' -v OFS='\t' ...
Example from one of my scripts.
I use the FS and OFS variables to manipulate BIND zone files, which are tab delimited:
awk -v FS='\t' -v OFS='\t' \
-v record_type="$record_type" \
-v hostname="$hostname" \
-v ip_address="$ip_address" '
$1==hostname && $3==record_type {$4=ip_address}
{print}
' "$zone_file" > "$temp"
This is a clean and easy to read way to do this.
You can set the Field Separator:
... | awk 'BEGIN {FS="\t"}; {print $1}'
Excellent read:
https://docs.freebsd.org/info/gawk/gawk.info.Field_Separators.html
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -v var="test" 'BEGIN { FS = "[ \t]+" } ; { print $1 "\t" var "\t" $3 }'
If your fields are separated by tabs, this works for me on Linux:
awk -F'\t' '{print $1}' < tab_delimited_file.txt
I use this to process data generated by mysql, which generates tab-separated output in batch mode.
From awk man page:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
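In each command below, the first argument (for example NF=1) is the awk program itself: assigning to NF rebuilds the record, and the true result triggers the default print. The remaining arguments are variable assignments.
1st column only:
awk NF=1 FS='\t'
LOAD_SETTLED
First 3 columns:
awk NF=3 FS='\t' OFS='\t'
LOAD_SETTLED LOAD_INIT 2011-01-13
Except first 2 columns:
{g,n}awk NF=NF OFS= FS='^([^\t]+\t){2}'
{m}awk NF=NF OFS= FS='^[^\t]+\t[^\t]+\t'
2011-01-13 03:50:01
Last column only:
awk '($!NF=$NF)^_' FS='\t', or
awk NF=NF OFS= FS='^.*\t'
03:50:01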
Should this not work?
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk '{print $1}'

Search regex on a specific field using awk

In awk I can search a field for a value like:
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2=="eaae" {print $0};'
dd,eaae,ff
And I can search by regular expressions like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; /[a]{2}/ {print $0};'
aa,bb,cc
dd,eaae,ff
Can I force awk to apply the regexp search to a specific field? I'm looking for something like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2==/[a]{2}/ {print $0};'
expecting result:
dd,eaae,ff
Anyone know how to do it using awk?
Accepted response - Operator "~" (thanks to hek2mgl):
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2 ~ /[a]{2}/ {print $0};'
You can use:
$2 ~ /REGEX/ {ACTION}
if the regex should apply only to the second field (for example).
In your case this would lead to:
awk -F, '$2 ~ /[a]{2}/' <<< $'aa,bb,cc\ndd,eaae,ff'
You may wonder why the awk program contains only the pattern and no print. That is because your action, print $0 (print the current line), is awk's default action when a pattern has no action block.
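The field number and the regex can also be passed in as variables, since awk accepts a string on the right-hand side of ~ and treats it as a dynamic regex. The sketch below assumes comma-separated input; the function name match_field is hypothetical.

```shell
# Hypothetical helper: print lines whose field COL matches regex RE.
# Usage: match_field COL RE < input
match_field() {
    awk -F, -v col="$1" -v re="$2" '$col ~ re'
}

printf 'aa,bb,cc\ndd,eaae,ff\n' | match_field 2 'aa'
# dd,eaae,ff
```

Using !~ instead of ~ would select the lines where the field does not match.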

regex to search for a string between two slashes

I have a question in bash shell scripting. I am looking to search a string between two slashes. Slash is a delimiter here.
Let's say the string is /one/two/; I want to be able to pick up just one.
How can I achieve this in a shell script? Any pointers are greatly appreciated.
Use the -F flag of awk to set the delimiter to /. Because the string starts with /, $1 is empty, so the first and second segments are $2 and $3.
$ cat /my/file
/one/two/
$ awk -F/ '{print $2}' /my/file
one
$ awk -F/ '{print $3}' /my/file
two
If the string is in a variable, you can pipe it to awk.
#!/bin/bash
var=/one/two/
echo "$var" | awk -F/ '{print $2}'
echo "$var" | awk -F/ '{print $3}'
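The off-by-one between segment number and awk field number can be hidden in a small function. This is a sketch assuming the path always starts with /; the name nth_segment is made up.

```shell
# Hypothetical helper: print the Nth segment of a path that starts with "/".
# Usage: nth_segment PATH N
nth_segment() {
    # The leading "/" makes $1 empty, so segment N is awk field N + 1.
    printf '%s\n' "$1" | awk -F/ -v n="$2" '{ print $(n + 1) }'
}

nth_segment '/one/two/' 1
# one
nth_segment '/one/two/' 2
# two
```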
path="/one/two/"
path=${path#/} # Remove leading /
path=${path%%/*} # Remove everything after first /
echo "$path" # Is now "one"
Using a bash regular expression:
$ str="/one/two/"
$ re="/([^/]*)/[^/]*/"
$ [[ $str =~ $re ]] && echo "${BASH_REMATCH[1]}"
one
$
Using cut:
$ str="/one/two/"
$ echo "$str" | cut -d/ -f2
one
$
Convert your string to an array, delimited with / and read the necessary element:
$ str="/one/two/"
$ IFS='/' read -ra a <<< "$str"; echo "${a[1]}"
one
$
And a couple more:
> cut -f 2 -d "/" <<< "/one/two"
one
> awk -F "/" '{print $2}' <<< "/one/two"
one
> oldifs="$IFS"; IFS="/"; var="/one/two/"; set -- $var; echo "$2"; IFS="$oldifs"
one